
Method and device for constructing text classifier with reference to external knowledge

A text classification and external knowledge technology, applied in the fields of instruments, special data processing applications, electrical digital data processing, etc. It addresses the problem that classifiers depend on the data distribution of the labeled text and therefore have poor generalization ability and robustness. Its effects are improved generalization ability and robustness, increased diversity, and improved category representativeness.

Status: Inactive
Publication Date: 2015-09-30
NEC (CHINA) CO LTD
6 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

Therefore, the final text classifier is entirely determined by the given labeled text, which results in poor generalization and robustness.
[0012] Although the prior art includes other training text selection methods, they rely mainly on the internal knowledge of the given labeled text set. That is, the features and weights they use depend entirely on the data distribution of the given labeled text set, so the selected training text is strongly biased.
This bias propagates to the classification orientation of the final classifier, which greatly harms its generalization ability and robustness and ultimately makes its performance unsatisfactory.



Embodiment Construction

[0033] For convenience of description, some technical terms used in the present invention are first briefly explained:

[0034]

Term: machine-learning-based text categorization
Definition: Machine learning is the mainstream approach to text classification. It generally uses a set of text (i.e., labeled text) to supervise the learning process of the classifier.

Term: selection of training text
Definition: Training text selection removes from the given labeled text set those texts that are irrelevant to the final decision function used by the classifier, in order to improve the effectiveness and efficiency of classifier construction.

Term: word dictionary
Definition: A word-meaning dictionary is a container that defines the meanings of words used in natural language and their mutual semantic relations. Depending on the language, there may be monolingual, multilingual, or cross-lingu...
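As a rough illustration of the word dictionary described above, the following Python sketch stores a meaning and some semantic relations for each word. The `Entry` class, the `WORD_DICTIONARY` contents, the `related_words` helper, and the choice of synonyms and hypernyms as the relations are all assumptions made for this example; the source does not specify a data layout.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One dictionary entry: a meaning plus relations to other words (hypothetical layout)."""
    meaning: str
    synonyms: set = field(default_factory=set)
    hypernyms: set = field(default_factory=set)


# Toy monolingual dictionary; the contents are invented for illustration only.
WORD_DICTIONARY = {
    "car": Entry("a road vehicle with an engine", {"automobile"}, {"vehicle"}),
    "automobile": Entry("a road vehicle with an engine", {"car"}, {"vehicle"}),
    "vehicle": Entry("a thing used for transporting people or goods"),
}


def related_words(word, dictionary):
    """Return the words linked to `word` by any stored semantic relation."""
    entry = dictionary.get(word)
    if entry is None:
        return set()
    return entry.synonyms | entry.hypernyms


car_relations = related_words("car", WORD_DICTIONARY)
```

A lookup for a word that is not in the dictionary returns an empty set, so callers can treat unknown words as having no external knowledge.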


Abstract

The invention provides a method and device for constructing a text classifier with reference to external knowledge. The method comprises the steps of: inputting a labeled text set; extracting internal features of the labeled text set; constructing external features of the labeled text set with reference to an external knowledge source (such as a dictionary); selecting training texts from the labeled text set by jointly considering its internal and external features; and learning the text classifier from the selected training texts. Because the external features are generated automatically from the external knowledge source, they can help correct the sample distribution bias of the labeled text set, so that the trained classifier has better generalization ability and robustness.
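The steps in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented method: the toy dictionary, the word-frequency internal features, the linear weighting with `alpha`, the top-k selection rule, and the word-overlap classifier are all hypothetical stand-ins chosen to keep the example short.

```python
from collections import Counter

# Hypothetical external knowledge source: category -> words related to it.
TOY_DICTIONARY = {
    "sports": {"game", "team", "match", "score", "player"},
    "finance": {"bank", "stock", "market", "loan", "price"},
}


def tokenize(text):
    return text.lower().split()


def internal_features(labeled_texts):
    """Per-category word counts computed only from the labeled set."""
    counts = {}
    for text, label in labeled_texts:
        counts.setdefault(label, Counter()).update(tokenize(text))
    return counts


def external_features(labels, dictionary):
    """Per-category word sets taken from the external knowledge source."""
    return {label: set(dictionary.get(label, ())) for label in labels}


def score(text, label, internal, external, alpha=0.5):
    """Blend internal and external evidence; alpha is an assumed weight."""
    tokens = tokenize(text)
    total = sum(internal[label].values())
    if not tokens or total == 0:
        return 0.0
    internal_part = sum(internal[label][t] for t in tokens) / (total * len(tokens))
    external_part = sum(t in external[label] for t in tokens) / len(tokens)
    return alpha * internal_part + (1 - alpha) * external_part


def select_training_texts(labeled_texts, dictionary, keep_per_label=2, alpha=0.5):
    """Keep the highest-scoring texts of each category."""
    internal = internal_features(labeled_texts)
    external = external_features(internal.keys(), dictionary)
    selected = []
    for label in internal:
        candidates = [t for t, l in labeled_texts if l == label]
        candidates.sort(
            key=lambda t: score(t, label, internal, external, alpha), reverse=True
        )
        selected.extend((t, label) for t in candidates[:keep_per_label])
    return selected


def train_classifier(training_texts):
    """Stand-in learner: pick the category whose training words overlap most."""
    model = internal_features(training_texts)

    def classify(text):
        tokens = tokenize(text)
        return max(model, key=lambda label: sum(model[label][t] for t in tokens))

    return classify


labeled = [
    ("the team won the match", "sports"),
    ("player score in the game", "sports"),
    ("the the the of of", "sports"),  # uninformative text
    ("bank raised the loan price", "finance"),
    ("stock market price fell", "finance"),
    ("of the of the and", "finance"),  # uninformative text
]
selected = select_training_texts(labeled, TOY_DICTIONARY)
classify = train_classifier(selected)
```

In this toy data the uninformative texts score highest on the internal word counts alone, because they repeat the most frequent words. The external term pulls them below the texts that contain dictionary words, which is the kind of bias correction the abstract describes.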

Description

Technical field

[0001] The present invention relates generally to information retrieval and text classification. More specifically, it relates to methods and devices for constructing text classifiers with reference to external knowledge.

Background

[0002] With the rapid development of electronic office work and the Internet, the amount of electronic text information has exploded, and large-scale automatic information processing has become both a necessary means and a challenge for making better use of this information.

[0003] Information retrieval refers to the process and technology of organizing information in a certain way and finding relevant information according to the needs of information users. Automatic text classification is one of the main supporting technologies for information retrieval. Its basic purpose is to assign text to predefined categories, which is an effective means to help people search, query, filter and ut...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
Inventors: 李建强, 赵彧, 刘博
Owner: NEC (CHINA) CO LTD