Text classification method and system based on class perception feature selection framework

A text classification and feature selection technology, applied in character and pattern recognition, instruments, computer parts, etc. It addresses three problems: sparse features for small categories, feature words chosen without regard to their ability to distinguish between classes, and poor discrimination on imbalanced data sets. The effect is to overcome this one-sidedness and achieve a better text classification result.

Active Publication Date: 2019-08-20
GUANGDONG UNIVERSITY OF FOREIGN STUDIES

AI Technical Summary

Problems solved by technology

However, this kind of global feature extraction does not discriminate well on imbalanced data sets.
When a data set has many categories and is unbalanced, traditional feature extraction methods consider only the features with the highest global class discrimination. The features extracted for some small-sample category clusters are therefore sparse, which reduces classification accuracy for those clusters.
In addition, the feature extraction methods that existing text classification methods rely on consider only the class inclination of the feature words and not their inter-class discrimination ability. This one-sidedness limits the classification accuracy of existing text classification methods.



Examples


Embodiment Construction

[0033] This embodiment includes a text classification method. Referring to Figure 1, the method includes the following steps:

[0034] S1. Preprocess multiple category clusters to obtain a feature word set; each category cluster includes multiple words of the same category, the category clusters together form a training set, and the training set is used to train a classifier;

[0035] S2. Calculate the class relevance score and the class distinction score between each feature word in the feature word set and each category cluster;

[0036] S3. Assign each feature word in the feature word set to the category cluster for which it has the highest class relevance score;

[0037] S4. Reorder the words in each category cluster according to the class distinction score between that cluster and the feature words assigned to it;

[0038] S5. Select feature subsets from the reordered clusters; al...
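Steps S1 to S5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented formulas: the excerpt does not define the two scores, so `relevance` and `distinction` below are hypothetical stand-ins based on per-class document frequency, and the function name `class_aware_select` and the parameter `per_class_k` are likewise invented for the example. The selection in S5 is assumed to be a simple top-k cut per cluster.

```python
from collections import Counter, defaultdict


def class_aware_select(clusters, per_class_k):
    """clusters maps a class label to a list of tokenized documents."""
    # S1: build the feature word set and per-class document frequencies.
    doc_freq = {c: Counter() for c in clusters}
    for c, docs in clusters.items():
        for doc in docs:
            doc_freq[c].update(set(doc))
    vocab = sorted(set().union(*doc_freq.values()))

    def rate(word, c):
        # Fraction of documents in class c that contain the word.
        return doc_freq[c][word] / len(clusters[c])

    # S2: hypothetical scores (assumptions, not the patent's definitions).
    def relevance(word, c):
        return rate(word, c)

    def distinction(word, c):
        others = [rate(word, o) for o in clusters if o != c]
        return rate(word, c) - (max(others) if others else 0.0)

    # S3: assign each word to the cluster with its highest relevance score.
    assigned = defaultdict(list)
    for word in vocab:
        best = max(clusters, key=lambda c: relevance(word, c))
        assigned[best].append(word)

    # S4: reorder the words of each cluster by distinction score.
    for c in assigned:
        assigned[c].sort(key=lambda w: distinction(w, c), reverse=True)

    # S5 (assumed form): keep the top-k words of every cluster.
    return {c: assigned[c][:per_class_k] for c in clusters}


toy = {
    "sports": [["ball", "team", "win"], ["ball", "score"]],
    "tech": [["code", "bug", "team"]],
}
selected = class_aware_select(toy, per_class_k=2)
```

Because every cluster contributes its own top-ranked words, a small cluster such as `tech` in the toy data still ends up with features of its own, which is the property the steps above are aiming for.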


Abstract

The invention discloses a text classification method. The method comprises the following steps: preprocessing a plurality of category clusters to obtain a feature word set; calculating a class relevance score and a class distinction score between each feature word and each category cluster; assigning each feature word to the category cluster for which it has the highest class relevance score; reordering the words in each category cluster; selecting a feature subset from each category cluster; reordering the feature subsets within the total feature set to obtain a final feature set; and inputting the vector representation of the text to be classified into a classifier, which outputs a classification result. With this method, the data processed by the classifier simultaneously reflects the properties of the different category clusters, the intra-class relevance and inter-class distinction of the feature words, and other information. It thereby overcomes the one-sidedness of the prior art and can achieve a better text classification effect. The text classification method is widely applicable in the technical field of text classification.
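The last steps of the abstract, merging the per-cluster subsets into a final feature set and representing a text as a vector, might look like the sketch below. The abstract does not say how the subsets are reordered, so the round-robin interleaving in `merge_subsets` is an assumption chosen only to keep every cluster represented near the top. Both function names are hypothetical, and the count-based vector is one possible representation among many.

```python
from collections import Counter
from itertools import zip_longest


def merge_subsets(subsets):
    """Interleave per-cluster feature lists round-robin, dropping duplicates.

    Assumption: the source does not specify the merge order.
    """
    final, seen = [], set()
    for group in zip_longest(*subsets.values()):
        for word in group:
            if word is not None and word not in seen:
                seen.add(word)
                final.append(word)
    return final


def vectorize(tokens, features):
    """Represent a tokenized text as term counts over the final feature set."""
    counts = Counter(tokens)
    return [counts[f] for f in features]


subsets = {"sports": ["ball", "score"], "tech": ["bug", "code", "team"]}
features = merge_subsets(subsets)
vector = vectorize(["ball", "ball", "code", "unknown"], features)
```

The resulting vector can be passed to any classifier trained on vectors built the same way. Words outside the final feature set, such as `unknown` here, are ignored.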

Description

Technical field

[0001] The invention relates to the technical field of text classification, in particular to a text classification method and system based on a class-aware feature selection framework.

Background technique

[0002] Text classification technology is widely used in practical application scenarios such as information retrieval, text mining, public opinion analysis, and spam identification. Most text classification technologies are implemented based on classifiers, and the training set used to train the classifiers contains as many as hundreds of thousands of feature words. Therefore, feature extraction is an important part of text classification technology.

[0003] The purpose of feature extraction is to extract feature words that are more capable of identifying cluster categories. Most of the existing feature extraction methods extract feature words that can best identify cluster categories from a global perspective. Taking information gain as an example, its...
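For readers unfamiliar with information gain as a global selection criterion, the standard textbook definition is IG(t) = H(C) - [P(t) H(C|t) + P(not t) H(C|not t)], where H is the entropy of the class distribution. The sketch below computes it from document counts. It illustrates the general criterion only, not text recovered from the truncated paragraph, and the function and parameter names are chosen for this example.

```python
import math


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in counts if n > 0)


def information_gain(with_term, without_term):
    """Information gain of a term over the class variable.

    with_term[i] and without_term[i] are the numbers of class-i documents
    that do and do not contain the term.
    """
    n_with, n_without = sum(with_term), sum(without_term)
    total = n_with + n_without
    if total == 0:
        return 0.0
    class_totals = [a + b for a, b in zip(with_term, without_term)]
    conditional = (n_with / total) * entropy(with_term) \
        + (n_without / total) * entropy(without_term)
    return entropy(class_totals) - conditional


# A term that perfectly separates two balanced classes.
perfect = information_gain([10, 0], [0, 10])
# A term spread evenly across both classes carries no information.
useless = information_gain([5, 5], [5, 5])
```

Because the score is a single number computed over all classes at once, ranking by it says nothing about which class a term serves, which is the global perspective the paragraph above describes.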


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62
CPC: G06F18/24155, G06F18/24, G06F18/214
Inventors: 李霞, 刘汉锋
Owner: GUANGDONG UNIVERSITY OF FOREIGN STUDIES