Abstract sample information retrieval system based on context, and abstract sample characterization representing method of abstract sample information retrieval system

A context-based information retrieval technology, applied in the field of information retrieval, that addresses the problems of forming sample feature representations from word vectors and extracting word meaning features, with the effect of improving accuracy and expanding construction methods.

Active Publication Date: 2016-11-09
长源动力(北京)科技有限公司

AI Technical Summary

Problems solved by technology

The purpose of the present invention is to overcome the difficulty in the prior art of forming a feature representation of a sample from Word2vector word vectors, and to solve the problem of word meaning feature extraction in the feature representation of an abstract sample.




Embodiment Construction

[0026] The present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0027] As shown in Figure 1, the context-based abstract sample information retrieval system of the present invention includes a word segmentation function module, a word meaning feature extraction module, an abstract word feature substitution representation module, an ST-IDF module and a classification module.

[0028] The abstract sample characterization method of the abstract sample information retrieval system includes the following steps:

[0029] Step 1: Use the word segmentation function module to segment the sample into abstract words. When a sample records information entirely in abstract words, the abstract words cannot be segmented according to a dictionary or thesaurus. Therefore, this step treats each abstract word only as a string of ASCII characters. When the sample is a data link mes...
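The part of Step 1 that is stated above (treating each abstract word as an opaque string of ASCII characters, with no dictionary lookup) can be sketched as follows. This is only an illustrative sketch: the helper name `segment_abstract_words`, the choice of whitespace and non-ASCII characters as delimiters, and the sample string are assumptions, not details taken from the patent, whose description of this step is truncated.

```python
import re


def segment_abstract_words(sample: str) -> list[str]:
    """Split a sample into opaque ASCII tokens (hypothetical helper).

    Assumption: abstract words are runs of printable, non-space ASCII
    characters; whitespace and any non-ASCII character act as delimiters.
    No dictionary or thesaurus is consulted.
    """
    return re.findall(r"[!-~]+", sample)


# Made-up sample for illustration only; not a real message format.
tokens = segment_abstract_words("J3.2 TN0417\tA7F\n")
print(tokens)
```

Because the tokens are never interpreted, any meaning has to come from their context, which is what the later word vector steps rely on.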



Abstract

The invention proposes a context-based abstract sample information retrieval system and an abstract sample characterization method for that system. The method comprises the following steps: using Word2vector to extract semantic features and obtain the word vector of each abstract word; then performing "optimal fitness division" clustering on the word vectors of the abstract words and replacing each abstract word with its cluster centroid according to the clustering result; and finally, from each centroid and the word frequency of the abstract words it represents, forming a word vector cluster centroid frequency model (ST-IDF, where IDF stands for inverse document frequency) that characterizes the abstract sample. The system reduces the number of times clustering and fitness calculation are executed, improves the performance of similarity analysis of abstract samples, and improves sample classification accuracy.
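The last two steps of the abstract (replacing each abstract word with its cluster centroid, then weighting centroids by frequency) can be illustrated with a small sketch. Several things here are assumptions rather than the patent's method: plain k-means is swapped in for the "optimal fitness division" clustering, which this page does not specify; the ST-IDF weighting is approximated by ordinary TF-IDF computed over centroid ids; and the two-dimensional word vectors are made-up toy values standing in for trained Word2vector output. All function names are hypothetical.

```python
import math
from collections import Counter


def nearest(vec, centroids):
    """Index of the centroid closest to vec (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(vec, centroids[k])))


def kmeans(vectors, k, iters=20):
    """Plain k-means, a stand-in for the unspecified clustering step."""
    centroids = [list(v) for v in vectors[:k]]  # deterministic init: first k vectors
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        for i, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def centroid_tfidf(samples, word_vecs, centroids):
    """Replace words by centroid ids, then weight each id by TF-IDF.

    Assumes every sample is non-empty and every word has a vector.
    """
    docs = [[nearest(word_vecs[w], centroids) for w in s] for s in samples]
    df = Counter(c for d in docs for c in set(d))  # document frequency per centroid
    n = len(docs)
    feats = []
    for d in docs:
        tf = Counter(d)
        feats.append([tf[c] / len(d) * math.log(n / df[c]) if df[c] else 0.0
                      for c in range(len(centroids))])
    return docs, feats


# Toy word vectors (assumed values, not trained embeddings).
word_vecs = {"A1": [0.0, 0.1], "B1": [5.0, 5.1], "A2": [0.1, 0.0], "B2": [5.1, 5.0]}
samples = [["A1", "A2", "A1"], ["B1", "B2"], ["A2", "B1"]]

centroids = kmeans(list(word_vecs.values()), k=2)
docs, feats = centroid_tfidf(samples, word_vecs, centroids)
print(docs)
print(feats)
```

Each sample ends up as a fixed-length vector with one weight per centroid, so samples built from different but contextually similar abstract words can still be compared or passed to a classifier.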

Description

Technical field

[0001] The invention relates to the field of information retrieval of data link messages, semi-structured texts or ordinary texts, in particular to sample similarity analysis and classification based on word vector (Word2vector).

Background technique

[0002] Abstract words refer to special words in information retrieval samples that cannot be directly interpreted by language, that is, no known language rules (word meaning, grammar, word order) can directly identify their actual semantics. A large number of abstract words exist in information retrieval samples to varying degrees, such as military data link messages (Link-16, Link-22), semi-structured text (XML) or ordinary text for data exchange. At the same time, there are a large number of data link messages, semi-structured texts or ordinary texts that completely use abstract words to record information. For this situation, we call such messages or texts in information retrieval tasks abstract samples. ...


Application Information

IPC(8): G06F17/30, G06K9/62
CPC: G06F16/3329, G06F16/355, G06F18/23213
Inventor: 吴琳, 韩广, 袁鑫攀, 李亚楠
Owner: 长源动力(北京)科技有限公司