Text feature quantification method and device based on information entropy, and text classification method and device

A text feature quantification and text classification technology, applied in the field of text classification methods and devices, that addresses the problem of existing methods performing poorly or unstably across different corpora.

Inactive Publication Date: 2016-01-06
CENT SOUTH UNIV
3 Cites, 43 Cited by

AI Technical Summary

Problems solved by technology

Although some existing methods perform well on certain corpora, they perform poorly on others, so their performance is unstable.

Embodiment Construction

[0072] The specific embodiment of the present invention will be described in detail below with reference to the accompanying drawings and specific cases, and relevant experimental results will be provided. In order to highlight the novelty of the present invention, some technical details well known in the art will be omitted.

[0073] As shown in Figure 1, the specific implementation steps of the text feature quantification method based on information entropy of the present invention are as follows:

[0074] Step s1: text preprocessing;

[0075] Prepare a batch of pre-classified texts and divide them into a training set and a test set according to a certain ratio. Import the classified and to-be-classified text sets, and perform word segmentation and denoising on all texts. Denoising removes noise information from the text, including punctuation marks, numbers, stop words, etc.; English letters are also converted to lowercase, to extract the roots of English ...
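Step s1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the names `STOP_WORDS`, `preprocess`, and `split_train_test`, the tiny stop-word list, and the 0.8 split ratio are all hypothetical. Whitespace splitting stands in for word segmentation (Chinese text would need a dedicated segmenter), and root extraction is left out because the source passage is truncated at that point.

```python
import random
import re

# Hypothetical stop-word list; a real system would use a much fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are"}


def preprocess(text):
    """Lowercase the text, strip punctuation and digits, drop stop words."""
    text = text.lower()
    # Replace every run of characters that are not a-z with a single space.
    text = re.sub(r"[^a-z]+", " ", text)
    # Assumption: whitespace splitting stands in for word segmentation.
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def split_train_test(labeled_docs, train_ratio=0.8, seed=0):
    """Shuffle pre-classified (text, label) pairs and split them by ratio."""
    docs = list(labeled_docs)
    random.Random(seed).shuffle(docs)
    cut = int(len(docs) * train_ratio)
    return docs[:cut], docs[cut:]
```

For instance, `preprocess("The 3 Cats, and 2 DOGS!")` keeps only the tokens `cats` and `dogs`.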

Abstract

The invention discloses a text feature quantification method and device based on information entropy, and a text classification method and device. In the text feature quantification method, the weight of each feature word in a document is calculated from the term frequency of the feature word in the document and the information entropy of its distribution over the different text classes. The inter-class distribution entropy of a feature word is calculated in different ways depending on how unbalanced the class sizes of the text set are. In addition, the inverse document frequency is introduced where needed, according to the distribution of each feature word in the text set, and the local term-frequency factor is appropriately reduced, so that the weights of the feature words in a document are reasonably distributed and the generated document feature vectors fully reflect the feature differences between classes of text. The disclosed text feature quantification device and text classification device provide a number of options and parameters that can be adjusted to achieve the best classification effect. The method improves text classification accuracy and performs stably on different text sets.
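The idea of weighting a feature word by its term frequency and by the entropy of its distribution over classes can be illustrated with a short Python sketch. The formula below is an assumption made for illustration only; this page does not reproduce the patent's actual weighting formula. The function names, the `class_sizes` option (one hypothetical way to handle unbalanced class sizes), and the logarithmic damping of term frequency are all choices made here, not the inventors'.

```python
import math


def interclass_entropy(class_counts, class_sizes=None):
    """Shannon entropy (in bits) of a term's distribution over classes.

    class_counts[i] is how often the term occurs in class i. If class_sizes
    is given, counts are first divided by class size (ASSUMPTION: one simple
    way to compensate for unbalanced classes).
    """
    if class_sizes is not None:
        values = [c / s for c, s in zip(class_counts, class_sizes)]
    else:
        values = list(class_counts)
    total = sum(values)
    if total == 0:
        return 0.0
    probs = [v / total for v in values if v > 0]
    return -sum(p * math.log2(p) for p in probs)


def entropy_weight(tf, class_counts, class_sizes=None):
    """Illustrative weight: damped term frequency times an entropy factor.

    ASSUMPTION: this formula is not taken from the patent. A term spread
    evenly over all classes (maximum entropy) gets weight 0; a term found
    in only one class keeps its full damped term frequency.
    """
    n_classes = len(class_counts)
    max_entropy = math.log2(n_classes) if n_classes > 1 else 1.0
    entropy = interclass_entropy(class_counts, class_sizes)
    global_factor = 1.0 - entropy / max_entropy
    local_factor = math.log2(1 + tf)  # reduced local term-frequency factor
    return local_factor * global_factor
```

Under these assumptions, a word that occurs only in one class is weighted purely by its damped frequency, while a word spread uniformly over all classes contributes nothing to the feature vector.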

Description

Technical field

[0001] The invention belongs to the technical field of text mining and machine learning, and in particular relates to a text feature quantification method and device based on information entropy and a text classification method and device.

Background

[0002] The organization and mining of large-scale text data often rely on automatic text classification techniques. Automatic text classification generally requires the features of a text to be quantified before classification, so that a computer can apply supervised machine learning methods to classify it. The features of a text can be described by words: quantifying text features means selecting some words from the text as feature words and assigning them different weights, so that each text document is represented by a vector composed of the weight values of multiple feature words (called a feature vector). It can be seen that the quantification of text fe...
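The feature-vector representation described in paragraph [0002] can be sketched in Python as follows. This is a minimal illustration, not the patent's method: the function name `to_feature_vector` and the pluggable `weight_fn` parameter are hypothetical, and raw term frequency is used as the default weight only because it is the simplest choice.

```python
from collections import Counter


def to_feature_vector(tokens, feature_words, weight_fn=None):
    """Represent a tokenized document as a vector of feature-word weights.

    feature_words fixes the order of the vector's dimensions. weight_fn
    maps (word, term_frequency) to a weight; by default the raw term
    frequency is used (ASSUMPTION: placeholder for a real weighting scheme).
    """
    counts = Counter(tokens)
    if weight_fn is None:
        weight_fn = lambda word, tf: float(tf)
    return [weight_fn(word, counts[word]) for word in feature_words]
```

With the feature words `["cat", "dog", "fish"]`, the document `["cat", "dog", "cat"]` becomes a three-dimensional vector whose entries are the weights of those three words in that order.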

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/35
Inventor: 陈科文, 张祖平, 龙军, 胡扬
Owner: CENT SOUTH UNIV