Text classification method lacking negative examples

A text classification technology, applied in text database clustering/classification, unstructured text data retrieval, instruments, etc. It addresses the lack of statistical theory support, poor accuracy, and missing negative example data, with the aim of improving classification accuracy and achieving effective, efficient classification.

Active Publication Date: 2020-02-14
南京稷图数据科技有限公司
Cites: 6; Cited by: 9

AI Technical Summary

Problems solved by technology

Research in this area includes the LSI (Latent Semantic Indexing) model, sometimes called LSA (Latent Semantic Analysis), which obtains a largely compressed representation of text through SVD (Singular Value Decomposition). Such methods also have defects, however: for example, the resulting model has no probabilistic interpretation and lacks support from statistical theory.
[0007] (2) In the absence of negative example data, it is difficult to select training data for a classification model.
For example, scholars have used PCA, decision trees, Bayesian frameworks, S-EM and other methods to select negative examples, but these are not classifiers with strong generalization ability, so the final results fall somewhat short.
[0008] (3) When many categories must be judged, scoring every category with the trained classification model takes a great deal of time, which seriously limits its use in a production environment.
[0009] The above analysis shows that text classification without negative example data is difficult: accuracy is poor, results are unsatisfactory, and efficiency is low.
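Problem (2) is the core difficulty: with only positive examples, some negatives must be picked out of the unlabelled texts. The excerpt does not describe how the invention does this. However, the abstract names an "ROC-SVM combination", and in the positive-unlabelled (PU) learning literature the name Roc-SVM is used for Rocchio-based extraction of reliable negatives followed by SVM training. The following is a minimal sketch of the Rocchio step only, under that assumption. The function names, the NumPy-only implementation, and the weights alpha=16, beta=4 are illustrative choices, not taken from the patent.

```python
import numpy as np

def _unit_rows(X: np.ndarray) -> np.ndarray:
    """L2-normalise each row; all-zero rows are left as zeros."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)

def rocchio_reliable_negatives(P, U, alpha=16.0, beta=4.0):
    """Hypothetical helper: indices of rows of U judged to be reliable negatives.

    P: (n_pos, d) feature vectors of one custom (positive) category.
    U: (n_unl, d) feature vectors of unlabelled texts.
    alpha, beta: illustrative Rocchio weights (an assumption, not from the source).
    """
    Pn, Un = _unit_rows(P), _unit_rows(U)
    p_mean, u_mean = Pn.mean(axis=0), Un.mean(axis=0)
    # Rocchio prototypes, treating every unlabelled text as a provisional negative.
    c_pos = alpha * p_mean - beta * u_mean
    c_neg = alpha * u_mean - beta * p_mean
    # Rows of Un are unit length, so dividing by the prototype norm gives cosine similarity.
    sim_pos = Un @ c_pos / (np.linalg.norm(c_pos) or 1.0)
    sim_neg = Un @ c_neg / (np.linalg.norm(c_neg) or 1.0)
    # Unlabelled texts closer to the negative prototype are kept as reliable negatives.
    return np.flatnonzero(sim_neg > sim_pos)

if __name__ == "__main__":
    P = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]])
    U = np.array([[1.0, 0.05, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.1, 0.9, 0.2]])
    print(rocchio_reliable_negatives(P, U))  # → [1 2 3]
```

In the Roc-SVM scheme as usually described, an SVM is then trained on P against the returned negatives. That step is left out here to keep the sketch dependency-free.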

Embodiment Construction

[0043] The present invention is further described below with reference to the embodiments and the accompanying drawings.

[0044] The present invention provides a text classification method lacking negative examples. The method is used for text classification; note that it is not limited to a single field and can be applied in various fields. With reference to Figure 1, the specific steps of the method are as follows:

[0045] S1: Determine the texts to be classified and the classification categories

[0046] Determine the data texts to be classified and customize the text classification categories; the customized categories serve as the positive example categories.

[0047] When performing text classification, it is first necessary to determine which texts are to be classified. The user can customize the classification categories as needed, deciding which categories the data texts are to be divided into, and these given categories are...

Abstract

The invention discloses a text classification method lacking negative examples and belongs to the technical field of machine learning and text classification. The method comprises the following steps. First, the data texts to be classified are determined and the text classification categories are customized; a TF-IDF model and an LSI model are trained on the obtained corpus; feature vectors of the text are constructed with the trained TF-IDF model and LSI model respectively, and a combined text feature vector is constructed based on an ensemble method. Second, a Basic classifier is trained with an ROC-SVM combination algorithm and further trained in combination with a k-means clustering method, while a label classifier is trained at the same time. Finally, the text to be classified is initially classified with the Basic classifier, screened with Elasticsearch to determine candidate categories, and then accurately assigned by the label classifier to one or more of the custom categories. Text data lacking negative examples can thus be classified effectively, with high accuracy, good results, and high efficiency.
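The feature-construction step described above can be illustrated with a small NumPy sketch. It assumes a raw-count document-term matrix and the idf variant ln(N/df). Because the excerpt does not define the "ensemble method", the sketch simply concatenates each TF-IDF vector with its LSI coordinates. All function names are hypothetical. The LSI part is the truncated-SVD compression referred to in problem (1).

```python
import numpy as np

def tfidf(counts: np.ndarray) -> np.ndarray:
    """Raw counts (n_docs, n_terms) -> L2-normalised TF-IDF rows.

    Assumption: idf = ln(n_docs / df); a term present in every document gets weight 0.
    """
    n_docs = counts.shape[0]
    df = np.count_nonzero(counts, axis=0)        # document frequency per term
    idf = np.log(n_docs / np.maximum(df, 1))     # max(df, 1) avoids division by zero
    X = counts * idf                             # broadcast idf across all rows
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)

def fit_lsi(X: np.ndarray, k: int) -> np.ndarray:
    """(n_terms, k) basis of the top-k latent dimensions, i.e. LSI via thin SVD."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].T

def combined_features(X: np.ndarray, V_k: np.ndarray) -> np.ndarray:
    """Hypothetical combination: TF-IDF rows concatenated with their k LSI coordinates."""
    return np.hstack([X, X @ V_k])

if __name__ == "__main__":
    counts = np.array([[2, 1, 0, 0, 1],
                       [0, 1, 2, 0, 0],
                       [1, 0, 0, 3, 0],
                       [0, 0, 1, 1, 2]], dtype=float)
    X = tfidf(counts)
    V2 = fit_lsi(X, k=2)
    print(combined_features(X, V2).shape)  # → (4, 7)
```

For training documents, `X @ V_k` equals U_k·Σ_k from the SVD. A new document would need the training corpus's idf values, which this sketch does not store. The LSI coordinates can also be negative, which is one concrete reason such a model has no direct probabilistic reading.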

Description

Technical field

[0001] The invention belongs to the technical field of machine learning and text classification, and in particular relates to a text classification method lacking negative examples.

Background art

[0002] With the development of the Internet, the number of Internet texts has increased dramatically, and the resulting demand for text classification has grown accordingly. Manually classifying such massive volumes of text is clearly impossible, but the rise of machine learning offers a way to meet this demand, and researchers have proposed a series of methods in this field. For example, machine learning methods such as naive Bayes, decision trees, k-nearest neighbors, and support vector machines have been successfully applied to text classification with good results. However, because the data texts in different fields are intricate and the mechanisms of many methods are diffe...

Application Information

Patent type & authority: Applications (China)
IPC(8): G06F16/35; G06F40/216; G06F40/242; G06K9/62
CPC: G06F16/35; G06F16/355; G06F18/23213; G06F18/2411; G06F18/22; G06F18/241; G06F18/214
Inventors: 吴刚, 王楠
Owner: 南京稷图数据科技有限公司