A text classification and extraction method for Chinese news emergencies

A text classification and emergency event extraction technology, applied in text database clustering/classification, unstructured text data retrieval, special data processing applications, etc.

Active Publication Date: 2019-02-01
CHINASO INFORMATION TECH

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a text classification and extraction method for Chinese news emergencies that, on the basis of classifying news texts, uses an event extraction method driven by event instances to extract news events, thereby solving the foregoing problems in the prior art.



Examples


Embodiment

[0042] In this embodiment, taking the content of a news website as an example, a method for text classification and event extraction is provided, and the specific steps are as follows:

[0043] S1, use crawlers to obtain the content of multiple news websites as the initial news data set, denoted S = {s_1, s_2, ..., s_N}, where s_i is the i-th news text in the news data set, i = 1, 2, ..., N, and N is the total number of news texts in the news data set;

[0044] S2, classify the news data set S obtained in step S1 to obtain the classified news data set, as shown in Figure 1. The specific steps are as follows:

[0045] S21, extract the title and body of the first news text s_1 in the news data set S to obtain the news text s′_1 = {t_1, c_1}, where t_1 is the title of s_1 and c_1 is the body of s_1;

[0046] S22, traverse all the news texts in the news data set S, repeating step S21, to obtain the news data set S′, where N is the tot...
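Steps S21 and S22 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the record schema (the "title" and "body" keys), the NewsText class, and both function names are assumptions introduced here, and the two sample records are invented placeholders standing in for crawler output.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NewsText:
    """One element s'_i = {t_i, c_i}: the title and body of a news text."""
    title: str
    body: str


def extract_title_and_body(raw: Dict[str, str]) -> NewsText:
    # Step S21 for one record. The "title" and "body" keys are an assumed
    # crawler output schema, not something the source specifies.
    return NewsText(title=raw.get("title", "").strip(),
                    body=raw.get("body", "").strip())


def build_dataset(raw_records: List[Dict[str, str]]) -> List[NewsText]:
    # Step S22: repeat S21 over every news text in S to obtain S'.
    return [extract_title_and_body(record) for record in raw_records]


# Placeholder data set S with N = 2 (illustrative records only).
S = [
    {"title": " Earthquake strikes coastal city ", "body": "A strong earthquake was reported."},
    {"title": "Factory fire contained", "body": "Firefighters contained the fire overnight."},
]
S_prime = build_dataset(S)
```

Keeping the title and body as separate fields, rather than concatenating them at this stage, leaves later steps free to weight the two differently.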



Abstract

The invention discloses a text classification and extraction method for Chinese news emergencies, belonging to the field of natural language processing. For news text classification, the invention uses a joint representation based on both the headline and the body content, integrates the contribution of part of speech to text classification into the traditional TF-IDF algorithm, and uses the resulting weights on Word2Vec word vectors to generate short-text vectors. This avoids the information loss caused by using the title or the content alone, and the reduction in classification accuracy caused by words in the text differing in importance. Finally, event instances are extracted using a news emergency extraction method driven by event instances. This overcomes the imbalance between positive and negative examples and the data sparseness problem, removes the limitation of pre-defined event categories, and makes it convenient for journalists and public opinion analysts to use the event extraction results to analyze news quickly.
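The weighting idea in the abstract can be sketched as below, under stated assumptions. The excerpt does not give the patent's formula, so this sketch simply multiplies a smoothed TF-IDF score by a per-part-of-speech factor and uses the result to take a weighted average of word vectors. The POS_WEIGHT values, the function names, and the toy two-dimensional embeddings are all hypothetical; in practice the embeddings would come from a trained Word2Vec model and the tokens from a Chinese segmenter with part-of-speech tagging.

```python
import math
from collections import Counter

# Hypothetical part-of-speech contribution factors (not from the source).
POS_WEIGHT = {"n": 1.0, "v": 0.8, "a": 0.5}
DEFAULT_POS_WEIGHT = 0.2


def tfidf_pos_weights(doc, corpus):
    """doc: list of (word, pos) pairs; corpus: list of such docs.

    Returns a dict mapping each word in doc to TF * IDF * POS factor.
    """
    n_docs = len(corpus)
    tf = Counter(word for word, _ in doc)
    pos_of = {word: pos for word, pos in doc}  # last tag wins for repeated words
    weights = {}
    for word, count in tf.items():
        df = sum(1 for d in corpus if any(w == word for w, _ in d))
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed IDF (an assumed variant)
        pos_factor = POS_WEIGHT.get(pos_of[word], DEFAULT_POS_WEIGHT)
        weights[word] = (count / len(doc)) * idf * pos_factor
    return weights


def text_vector(doc, corpus, embeddings):
    """Weighted average of the word vectors of doc; unknown words are skipped."""
    weights = tfidf_pos_weights(doc, corpus)
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    norm = 0.0
    for word, weight in weights.items():
        vec = embeddings.get(word)
        if vec is None:
            continue
        for k in range(dim):
            total[k] += weight * vec[k]
        norm += weight
    return [x / norm for x in total] if norm > 0 else total


# Toy stand-ins for tagged title+body tokens and for Word2Vec vectors.
doc = [("fire", "n"), ("big", "a")]
corpus = [doc, [("flood", "n"), ("big", "a")]]
embeddings = {"fire": [1.0, 0.0], "big": [0.0, 1.0], "flood": [0.5, 0.5]}
vector = text_vector(doc, corpus, embeddings)
```

In this toy case the noun "fire" receives a larger weight than the adjective "big", so the resulting vector leans toward the embedding of "fire".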

Description

Technical field

[0001] The invention relates to the field of natural language processing, in particular to a text classification and extraction method for Chinese news emergencies.

Background art

[0002] In recent years, online news has developed rapidly. Compared with traditional media news, online news is faster, more flexible and more convenient, and it has gradually become the most common way for people to obtain news information. How to use text classification technology to classify a large volume of news automatically and quickly, and how to extract the events in the news effectively while preserving the main content of the original news, are the main difficulties in current research. News text classification and event extraction have always been extremely challenging issues in the field of information extraction, involving technologies and methods in multiple disciplines such as natural language processing, data mining, and machine learning, and ha...


Application Information

IPC(8): G06F16/35, G06F16/9537
Inventor 滕辉龙飞
Owner CHINASO INFORMATION TECH