Document and label word semantic association method and device thereof

A technique for semantically associating tag words with documents, applied in the computer field. It addresses the poor update timeliness of thesauri, incomplete thesaurus data, and missing synonyms and upper- and lower-level (hypernym/hyponym) related words, and thereby improves the recall rate, efficiency, and accuracy of the association.

Active Publication Date: 2016-06-29
NAT UNIV OF DEFENSE TECH
3 Cites, 12 Cited by

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to provide a method and device for semantically associating documents and tag words. It addresses the technical problem that existing thesauri require manual maintenance, are not updated in a timely manner, and contain incomplete data, so that the upper- and lower-level related words corresponding to synonyms are missing.


Examples


Specific example

[0077] Referring to Figure 8, a specific example includes the following steps:

[0078] a) Obtain news articles that relate to a specific field and were published within the last month, use them as the document corpus, and store them in the document library. The articles can be obtained from the Internet or from other media; for timeliness, the Internet is preferred. The one-month window is only an example; it is enough to extract time-sensitive articles as needed.

[0079] b) Construct a syntactic pattern of the form (S, W, N), where S is the domain tag word, W is the pattern identifier word, and N is the set of all nouns that follow the pattern identifier word. Pattern identifiers fall into two categories, synonym pattern identifiers and sub-concept pattern identifiers:

[0080] 1. Synonym pattern identifiers (including but not limited to): "that is", "also called", "abbreviated", "aliased", "or", "commonly known"

[0081] 2. Sub-concept identifi...
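The (S, W, N) matching in step b) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the English identifier strings, and the `is_noun` callback (a stand-in for a real part-of-speech tagger) are all assumptions made for the example.

```python
import re

# Synonym pattern identifiers, using the English renderings listed in [0080].
SYNONYM_IDENTIFIERS = ["that is", "also called", "abbreviated",
                       "aliased", "or", "commonly known"]


def match_pattern(sentence, tag_word, identifiers, is_noun):
    """Return (S, W, N) triples found in one sentence.

    S is the tag word, W is an identifier that occurs after S, and N is the
    list of nouns that follow W. `is_noun` is a hypothetical callback that
    stands in for a part-of-speech tagger.
    """
    lowered = sentence.lower()
    start = lowered.find(tag_word.lower())
    if start < 0:
        return []  # the tag word S does not occur in this sentence
    rest = lowered[start + len(tag_word):]
    triples = []
    for w in identifiers:
        m = re.search(r"\b" + re.escape(w) + r"\b", rest)
        if m is None:
            continue
        # Tokenize everything after the identifier and keep only nouns.
        tail = re.findall(r"[a-z0-9\-]+", rest[m.end():])
        nouns = [t for t in tail if is_noun(t)]
        if nouns:
            triples.append((tag_word, w, nouns))
    return triples


def candidate_set(sentences, tag_word, identifiers, is_noun):
    """Merge the N parts of all matches into one candidate word set."""
    candidates = set()
    for s in sentences:
        for _, _, nouns in match_pattern(s, tag_word, identifiers, is_noun):
            candidates.update(nouns)
    return candidates


# Toy noun lexicon and sentences, invented for this example.
TOY_NOUNS = {"btc", "week", "gold"}
SENTENCES = [
    "Bitcoin, also called BTC, rose sharply this week.",
    "Bitcoin, commonly known as digital gold, fell.",
]
C = candidate_set(SENTENCES, "Bitcoin", SYNONYM_IDENTIFIERS,
                  lambda t: t in TOY_NOUNS)
```

Because N collects every noun after the identifier, the candidate set also picks up unrelated nouns such as "week" here, which is consistent with the abstract's separate filtering step on the candidate set.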


Abstract

The invention provides a method and device for semantically associating documents and tag words. The method comprises the following steps: S100, documents that belong to the field relevant to the tag words and are timely are obtained and used as a document corpus; S200, a syntactic pattern is constructed, syntactic pattern matching is performed on the document corpus, and the results that match the syntactic pattern are merged into a candidate mention-related word set C; S300, the candidate mention-related word set C is filtered through a trained Word2vec model to obtain a mention-related word set V; S400, according to the mention-related words and the tag words, the relevancy between each of multiple documents and the tag words is calculated, and a relevancy database of tag words and documents is constructed. The method constructs semantic associations dynamically from a real-time document corpus and does not depend on a synonym thesaurus that is static and costly to maintain.
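Steps S300 and S400 can be sketched roughly as below. The abstract does not state how the Word2vec model filters candidates or how relevancy is scored, so both choices here are assumptions: filtering uses a cosine-similarity threshold against the tag word's vector, and relevancy is the share of document tokens that are the tag word or a mention-related word. The vectors are made-up stand-ins for a trained model, and all names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def filter_candidates(tag_word, candidates, vectors, threshold=0.6):
    """S300 sketch: keep candidates whose vector is close to the tag word's.

    The threshold rule is an assumption; the source only says the candidate
    set is filtered through a trained Word2vec model.
    """
    tag_vec = vectors[tag_word]
    return {c for c in candidates
            if c in vectors and cosine(tag_vec, vectors[c]) >= threshold}


def relevancy(doc_tokens, tag_word, related):
    """S400 sketch (assumed scoring): share of tokens that hit the tag word
    or one of its mention-related words."""
    if not doc_tokens:
        return 0.0
    hits = sum(1 for t in doc_tokens if t == tag_word or t in related)
    return hits / len(doc_tokens)


# Toy vectors standing in for a trained Word2vec model (values are made up).
VECTORS = {
    "bitcoin": [0.9, 0.1, 0.0],
    "btc": [0.8, 0.2, 0.1],
    "week": [0.0, 0.1, 0.9],
}

# Candidate set C -> filtered mention-related word set V.
V = filter_candidates("bitcoin", {"btc", "week"}, VECTORS)

# Relevancy "database": document id -> score for the tag word "bitcoin".
DOCS = {
    "d1": ["btc", "price", "rose"],
    "d2": ["weather", "this", "week"],
}
RELEVANCY_DB = {doc_id: relevancy(tokens, "bitcoin", V)
                for doc_id, tokens in DOCS.items()}
```

In this toy run, document d1 never contains the tag word itself yet still receives a non-zero score through the related word "btc", which is the recall gain over plain keyword matching that the patent is aiming for.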

Description

Technical field

[0001] The invention relates to the field of computer technology, in particular to a method and device for semantically associating documents and tag words.

Background technique

[0002] The Internet generates massive amounts of news every day, while an individual's capacity to read and understand it is limited. Internet users nevertheless need to know which events have happened and which are being discussed online. For example, financial industry analysts and investors need to consult a large amount of information to understand the current hot events in the industry and to find the news related to them.

[0003] At present, the more common method of associating domain tags with documents is keyword matching: if a specific tag word appears in a document, the document is considered related to the tag and is extracted as a pending document. The problem is that the r...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/35
Inventor: 陈发君, 刘忠, 黄金才, 修保新, 朱承, 程光权, 陈超, 冯旸赫
Owner: NAT UNIV OF DEFENSE TECH