
Text classification auxiliary labeling method based on collaborative training

A collaborative training technique for assisted labeling in text classification. It addresses the difficulty and high skill requirements of labeling text with many classes, with the stated effects of improving labeling accuracy and time efficiency, reducing redundant work, and improving resource utilization.

Active Publication Date: 2019-09-17
广州探域科技有限公司
6 Cites, 3 Cited by

AI Technical Summary

Problems solved by technology

[0005] Machine learning labeling methods, such as the supervised methods logistic regression, support vector machines, and naive Bayes, can label a corpus quickly. However, training these models depends on high-quality labeled data, and they perform poorly on multi-category corpora.
[0006] Crowdsourced labeling is based on crowdsourcing theory: a dedicated labeling system and labeling tasks are specified for each job, samples are drawn at random, and the samples are handed to specialized personnel for labeling. This places relatively high demands on those professionals, depends heavily on them, and requires a large amount of data review work.
[0007] In text classification within natural language processing, multi-class problems are common, and there may be hundreds of labels, which requires a very large amount of labeling. Ordinary labelers are better at binary classification of a corpus; choosing among a large number of labels at one time is very difficult and inefficient for them.




Detailed Description of the Embodiments

[0028] The technical solutions in the embodiments of the present invention are described clearly and completely below. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments, without creative effort, fall within the protection scope of the present invention.

[0029] Referring to Figure 1, the present invention provides a text classification auxiliary labeling method based on collaborative training, comprising the following steps:

[0030] S1: Label the sample data so that each sample corresponds to one label. Starting from a large unlabeled data set U, tens of thousands of items are randomly selected from U and labeled to serve as sample data. Each item is a short text, and its label is the intent of that short text, whic...



Abstract

The invention discloses a text classification auxiliary labeling method based on collaborative training, and relates to the technical field of text classification. The text classification auxiliary labeling method comprises the following steps: labeling sample data; constructing a data set; training two classifiers; classifying and recording classification results; extracting correct features and error features of the classifiers; performing sample optimization; retraining the replacement data set until the accuracy reaches a confidence threshold; and meanwhile, performing classification by using the two classifiers, and outputting a result under the condition that classification results are the same. According to the text classification auxiliary labeling method, most of simple data can be automatically labeled with high quality, so that the labeling accuracy and efficiency are greatly improved.
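The final step of the abstract (classify with two classifiers and output a result only when they agree) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the two feature views (`word_features`, `char_bigram_features`), the small `NaiveBayes` class, and the `assist_label` helper are all hypothetical names chosen for this example, and the sample-optimization and retraining steps are omitted because the excerpt does not describe them in detail.

```python
import math
from collections import Counter, defaultdict


def word_features(text):
    """View 1 (assumed for this sketch): lowercase word tokens."""
    return text.lower().split()


def char_bigram_features(text):
    """View 2 (assumed for this sketch): character bigrams, spaces removed."""
    s = text.lower().replace(" ", "")
    return [s[i:i + 2] for i in range(len(s) - 1)]


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing."""

    def __init__(self, featurize):
        self.featurize = featurize

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = self.featurize(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        feats = self.featurize(text)
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each feature.
            score = math.log(count / total)
            for f in feats:
                score += math.log((counts[f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def assist_label(labeled, unlabeled):
    """Auto-label texts on which both classifiers agree.

    Returns (auto, manual): auto is a list of (text, label) pairs,
    manual is the list of texts left for human labelers.
    """
    texts, labels = zip(*labeled)
    clf_a = NaiveBayes(word_features).fit(texts, labels)
    clf_b = NaiveBayes(char_bigram_features).fit(texts, labels)
    auto, manual = [], []
    for text in unlabeled:
        a, b = clf_a.predict(text), clf_b.predict(text)
        if a == b:
            auto.append((text, a))
        else:
            manual.append(text)
    return auto, manual
```

In this sketch, texts on which the two classifiers disagree are returned in `manual`, which corresponds to leaving the harder items for human labelers while the simpler items are labeled automatically.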

Description

Technical field

[0001] The present invention relates to the technical field of text classification, in particular to a text classification auxiliary labeling method based on collaborative training.

Background

[0002] In recent years, with the advent of the big data era and the rapid development of machine learning and artificial intelligence, practitioners' demand for data has become increasingly urgent, and the limited capacity of human labelers is increasingly unable to meet the demand for massive data. Manual labeling also has many disadvantages: its time and economic costs are high, and it inevitably introduces errors, which lowers the quality of the labeled corpus.

[0003] Artificial intelligence, especially natural language processing and image processing, is facing a difficult situation: supervised learning methods need to obtain a large amount of labeled corpus, and to obtain thes...


Application Information

IPC(8): G06F16/35
CPC: G06F16/35
Inventor 张丰琪
Owner 广州探域科技有限公司