Text classification model optimization method based on crowdsourcing feedback and active learning

An active learning and text classification technique, applied in the field of crowdsourcing and machine learning, that reduces annotation overhead and improves classification accuracy.

Inactive Publication Date: 2017-09-15
EAST CHINA NORMAL UNIV

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to address a classification problem that is common in practice: text classification tasks for which only a small amount of labeled data is available. To overcome the shortcomings of existing crowdsourcing-based text classification methods, the invention proposes a model optimization method that uses the labeling reasons collected through a crowdsourcing platform, and introduces active learning to reduce overhead.



Examples

Embodiment Construction

[0027] The present invention is described in further detail below with reference to specific embodiments and the accompanying drawings. Except for the content specifically mentioned below, the processes, conditions, experimental methods, etc. for implementing the present invention are common knowledge in this field, and the present invention places no special limitation on them.

[0028] As shown in Figure 1, a text classification model optimization method based on crowdsourcing feedback and active learning according to an embodiment of the present invention includes the following steps:

[0029] Step 1: Select a text dataset and divide the text dataset into an initial training set and a remaining dataset.

[0030] Step 2: Preprocess the text dataset to obtain words from it.

[0031] Step 3: Using each word as a feature, construct a feature set of the text data set, and calculate the weight value corresponding to the feature to vectoriz...
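Steps 1 to 3 can be illustrated with a minimal sketch. The listing does not say which preprocessing or weighting scheme the patent uses, so this example assumes a simple regular-expression tokenizer and a smoothed TF-IDF weight. All function names (`split_dataset`, `tokenize`, `build_vocabulary`, `tfidf_vectors`) are hypothetical and chosen only for illustration.

```python
import math
import random
import re
from collections import Counter


def split_dataset(texts, initial_size, seed=0):
    """Step 1: split texts into an initial training set and a remaining set."""
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)
    initial = [texts[i] for i in indices[:initial_size]]
    remaining = [texts[i] for i in indices[initial_size:]]
    return initial, remaining


def tokenize(text):
    """Step 2: lowercase the text and extract word tokens.

    Assumption: a stand-in for whatever preprocessing the patent applies.
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(token_lists):
    """Step 3 (features): every distinct word becomes one feature index."""
    words = sorted({tok for toks in token_lists for tok in toks})
    return {word: idx for idx, word in enumerate(words)}


def tfidf_vectors(token_lists, vocab):
    """Step 3 (weights): one dense vector per document.

    Assumption: smoothed TF-IDF; the source only says a weight is computed.
    """
    n_docs = len(token_lists)
    doc_freq = Counter(tok for toks in token_lists for tok in set(toks))
    vectors = []
    for toks in token_lists:
        vec = [0.0] * len(vocab)
        for word, count in Counter(toks).items():
            tf = count / len(toks)
            idf = math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0
            vec[vocab[word]] = tf * idf
        vectors.append(vec)
    return vectors


if __name__ == "__main__":
    corpus = ["Great movie, loved it", "Terrible plot", "Loved the plot", "Not great"]
    initial_set, remaining_set = split_dataset(corpus, initial_size=2)
    tokens = [tokenize(t) for t in corpus]
    vocabulary = build_vocabulary(tokens)
    print(len(initial_set), len(remaining_set), len(vocabulary))
```

Words that occur in fewer documents receive a larger weight under this scheme, which is one common way to make distinctive words count for more in the vectorized text.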


Abstract

The invention discloses a text classification model optimization method based on crowdsourcing feedback and active learning. The method comprises the following steps: selecting a text dataset and dividing it into an initial training set and a remaining dataset; obtaining words from the text dataset; constructing the feature set of the text dataset and vectorizing the text dataset; and introducing active learning on a classification model, predicting the sentiment polarity of the vectorized text dataset, and combining crowdsourcing feedback information to optimize the model and obtain a text classification result. The method collects the reasons behind manual annotations, which yields more user information and captures people's subjective judgments. The crowdsourcing feedback information is fused into the model by changing weights, and the text classification model is optimized so as to improve classification performance. An active learning algorithm is also introduced: the samples with the highest annotation value are selected and handed to a crowdsourcing platform for annotation, which lowers annotation cost. Under a limited budget, annotation accuracy is improved, and the shortage of labeled data in text classification tasks is addressed.
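The two ideas in the abstract, picking the most valuable samples for annotation and fusing annotator reasons into the model through a weight change, can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: "highest value" is approximated here by uncertainty sampling on a binary sentiment probability, and the weight change is approximated by multiplying the feature weights of words that annotators cited as reasons. The names `select_most_uncertain`, `boost_reason_words`, and the `boost` parameter are hypothetical.

```python
def select_most_uncertain(probabilities, batch_size):
    """Return indices of the samples whose predicted positive-class
    probability is closest to 0.5 (assumed proxy for annotation value)."""
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:batch_size]


def boost_reason_words(vector, vocab, reason_words, boost=2.0):
    """Return a copy of a feature vector in which the weights of words
    cited in annotators' reasons are multiplied by `boost`.

    Assumption: a multiplicative boost; the source only states that
    feedback is fused into the model by changing weights.
    """
    boosted = list(vector)
    for word in set(reason_words):
        idx = vocab.get(word)
        if idx is not None:  # ignore reason words outside the feature set
            boosted[idx] *= boost
    return boosted


if __name__ == "__main__":
    # Hypothetical model outputs for five unlabeled texts.
    probs = [0.95, 0.52, 0.10, 0.45, 0.70]
    to_annotate = select_most_uncertain(probs, batch_size=2)
    print(to_annotate)

    vocab = {"great": 0, "plot": 1, "boring": 2}
    vector = [0.5, 0.25, 0.0]
    print(boost_reason_words(vector, vocab, ["great", "unknown"]))
```

In a full loop, the selected texts would be sent to the crowdsourcing platform, the returned labels and reasons would update the training set and feature weights, and the classifier would be retrained before the next selection round.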

Description

Technical field

[0001] The invention relates to the field of crowdsourcing and machine learning, in particular to a text classification model optimization method based on crowdsourcing feedback and active learning.

Background technique

[0002] Crowdsourcing is a research topic that has emerged in recent years. A large number of problems that are difficult for machines to handle can be assigned to online users through crowdsourcing platforms. At present, existing methods that use crowdsourcing to optimize text classification often only collect labels for some of the unlabeled data and add them to the training set, without learning in depth from human subjective judgment and understanding, which limits the final performance of the classification model. Therefore, the present invention proposes an optimization method for a text classification model, which optimizes the model by collecting reasons for manual annotation, and can improve the accu...


Application Information

Patent Type & Authority Applications(China)
IPC IPC(8): G06F17/30
CPC G06F16/355
Inventor 杨静陈博闻江雨
Owner EAST CHINA NORMAL UNIV