
Method and apparatus for selecting text classification training sets

A technology for selecting text classification training sets, applied in the computer field. It addresses the rigidity, low labeling accuracy, and large errors of keyword-based labeling, and achieves faster classification, quick creation of text classifiers, and high accuracy.

Active Publication Date: 2017-04-05
BEIJING GRIDSUM TECH CO LTD
Cites: 5 | Cited by: 24

AI Technical Summary

Problems solved by technology

The conventional method, which labels training texts by matching a selected keyword, is relatively rigid: a text whose meaning is similar to the keyword but that does not contain the keyword cannot be labeled correctly.
As a result, this method suffers from low labeling accuracy and large errors in the selection of text training sets.
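
The limitation can be illustrated with a minimal sketch. The function name `label_by_keyword`, the keyword map, and the sample sentences are hypothetical and do not come from the patent; the sketch only shows how pure keyword matching misses a text on the same topic that lacks the keyword.

```python
def label_by_keyword(text, keywords):
    """Return the label of the first keyword found in the text, else None."""
    lowered = text.lower()
    for keyword, label in keywords.items():
        if keyword in lowered:
            return label
    return None


# Hypothetical keyword-to-label map (an assumption for illustration).
KEYWORDS = {"football": "sports"}

with_keyword = "The football match ended in a draw"
without_keyword = "The striker scored twice in the cup final"

# The first text contains the keyword and receives a label.
print(label_by_keyword(with_keyword, KEYWORDS))
# The second text is about the same topic but lacks the keyword,
# so keyword matching leaves it unlabeled.
print(label_by_keyword(without_keyword, KEYWORDS))
```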

Examples

Embodiment Construction

[0024] Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided for more thorough understanding of the present disclosure and to fully convey the scope of the present disclosure to those skilled in the art.

[0025] An embodiment of the present invention provides a method for selecting a text classification training set. As shown in Figure 1, the method marks and classifies the texts in a training set so that a text classifier can be built to classify text accurately. The specific steps include:

[0026] 101. Using cosine similarity according to a predetermined clustering algorithm, perform similar clust...
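
For reference, the standard definition of cosine similarity between two vectors is given below. Treating each text as a term-weight vector (a for one text, b for another) is an assumption here, since the source paragraph is truncated before it states how texts are represented.

```latex
\[
\cos(\mathbf{a}, \mathbf{b})
  = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
  = \frac{\sum_{i} a_i b_i}{\sqrt{\sum_{i} a_i^{2}} \, \sqrt{\sum_{i} b_i^{2}}}
\]
```

For vectors with non-negative term weights the value lies between 0 (no shared terms) and 1 (identical direction).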

Abstract

The invention discloses a method and an apparatus for selecting text classification training sets, relates to the technical field of computers, and solves the problems of low speed, large error and low efficiency of an existing text training set classification mode. According to the main technical scheme, the method comprises the steps of performing similarity clustering on texts in training sets according to a predetermined clustering algorithm by utilizing cosine similarity to obtain a plurality of text clusters; extracting a representative text from each text cluster, wherein the representative text and other texts in the cluster in which the representative text is located have common similar features; determining a text classification tag of the representative text according to a predetermined keyword; and adding all texts in the text cluster, in which the representative text is located, to a text training set corresponding to the text classification tag. The method and the apparatus are mainly used for classification selection of the text training sets.
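
The four steps in the abstract can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the abstract does not name the clustering algorithm, the text representation, or the rule for choosing a representative text, so the greedy single-pass clustering, the bag-of-words vectors, the 0.5 threshold, the "most similar to the rest of the cluster" rule, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (assumed representation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_texts(vectors, threshold):
    """Greedy single-pass clustering (assumed algorithm): a text joins the
    first cluster whose seed text is similar enough, else starts a new one."""
    clusters = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def pick_representative(cluster, vectors):
    """Assumed rule: the member with the highest total similarity to the rest."""
    return max(
        cluster,
        key=lambda i: sum(cosine(vectors[i], vectors[j]) for j in cluster if j != i),
    )


def build_training_sets(texts, keywords, threshold=0.5):
    """Label each cluster through its representative text, then add every
    text of the cluster to the training set for that label."""
    vectors = [vectorize(t) for t in texts]
    training_sets = {}
    for cluster in cluster_texts(vectors, threshold):
        rep = pick_representative(cluster, vectors)
        label = next((lab for kw, lab in keywords.items() if kw in vectors[rep]), None)
        if label is not None:
            training_sets.setdefault(label, []).extend(texts[i] for i in cluster)
    return training_sets


if __name__ == "__main__":
    sample_texts = [
        "football match score goal",
        "stock market price shares",
        "match score goal striker",      # no keyword, labeled via its cluster
        "market price shares investor",  # no keyword, labeled via its cluster
        "football match score referee",
        "stock market price dividend",
    ]
    sample_keywords = {"football": "sports", "stock": "finance"}
    for label, members in build_training_sets(sample_texts, sample_keywords).items():
        print(label, members)
```

In this sketch only the representative text is checked against the keywords, so texts that lack the keyword still enter the training set through their cluster, which is the effect the abstract describes.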

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to a method and device for selecting a text classification training set.

Background

[0002] In the information age, information resources on the Internet are growing rapidly, and merging and classifying information texts has become a problem that must be solved in information management. Automatic text classification technology emerged to address this. Automatic text classification is the process of automatically assigning texts of unknown categories to known categories. To realize automatic text classification, a text classifier must first be created, and it is built on a large number of classified text training sets.

[0003] Classifying a text training set means marking the texts in the training set according to the classification rules. The general method is to select a keyword, judge the relevanc...

Application Information

IPC(8): G06F17/30
CPC: G06F16/355
Inventor: 林漫鹏
Owner: BEIJING GRIDSUM TECH CO LTD