Active learning method for domain hierarchical dictionary mining construction

An active learning method for mining and constructing domain hierarchical dictionaries, applied in natural language processing. It addresses problems such as high manual labor requirements, vocabulary analysis that is hard to migrate between domains, and insufficient domain specificity.

Active Publication Date: 2019-11-26
同方知网数字出版技术股份有限公司

AI Technical Summary

Problems solved by technology

Current computers have strong memory but limited reasoning ability. For recognizing domain-specific words, methods based on hand-crafted rules analyze grammatical structure to construct word-formation rules, then apply those rules to discover words when analyzing a corpus. This approach places high demands on the linguistic and domain expertise of the participants, omissions in human reasoning and design inevitably leave a large share of the vocabulary uncovered, and the resulting system is not easy to migrate to other fields. Probability-based methods instead analyze the likelihood of lexical composition from natural language text. Because they need a sufficiently large volume of in-domain training data, they require a great deal of manual labeling. In practice, most current approaches train a single model on an annotated corpus regardless of domain and then apply it across different fields, which lowers accuracy.

Domain-specific vocabulary is mined and compiled into a dedicated dictionary to support downstream application tasks. However, because general-purpose methods are not targeted at any one domain, mixing in specific words from different fields can cause those downstream tasks to fail.



Examples


Embodiment Construction

[0019] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the embodiments and accompanying drawings.

[0020] As shown in Figure 1, the active learning method for mining and constructing a domain hierarchical dictionary includes the following steps:

[0021] Step 101: construct a domain-specific term entity extraction model, and extract terms from domain articles to generate the bottom-layer raw term lexicon;

[0022] Step 102: filter the bottom-layer raw term lexicon using information entropy, support, and part-of-speech templates to construct a domain dictionary;

[0023] Step 103: generate a domain synonym dictionary from the domain dictionary by combining multiple synonym generation methods;

[0024] Step 104: construct initial seed words for the domain hierarchy from open web resources, train a hierarchical word prediction...
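The filtering in Step 102 can be illustrated with a minimal Python sketch. The excerpt does not give the exact formulas or thresholds, so everything here is an assumption: "support" is taken to be the raw occurrence count of a candidate, "information entropy" is taken to be the Shannon entropy of the tokens immediately to the left and right of the candidate (a common boundary-entropy heuristic), and the names `filter_candidates`, `pos_of`, `allowed_templates`, `min_support`, and `min_entropy` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def neighbor_entropy(neighbors):
    """Shannon entropy (in bits) of a Counter of neighbouring tokens."""
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in neighbors.values())


def filter_candidates(sentences, candidates, pos_of, allowed_templates,
                      min_support=2, min_entropy=1.0):
    """Keep candidates passing support, boundary-entropy and POS-template checks.

    sentences: list of token lists (the tokenized domain corpus)
    candidates: iterable of token tuples (the raw term lexicon)
    pos_of: dict token -> POS tag (assumed output of some tagger)
    allowed_templates: set of POS-tag tuples treated as valid term shapes
    """
    candidates = set(candidates)
    max_len = max((len(c) for c in candidates), default=0)
    support = Counter()
    left, right = defaultdict(Counter), defaultdict(Counter)

    for tokens in sentences:
        n = len(tokens)
        for i in range(n):
            for k in range(1, min(max_len, n - i) + 1):
                gram = tuple(tokens[i:i + k])
                if gram not in candidates:
                    continue
                support[gram] += 1
                # Sentence boundaries are counted as ordinary neighbour tokens.
                left[gram][tokens[i - 1] if i > 0 else "<s>"] += 1
                right[gram][tokens[i + k] if i + k < n else "</s>"] += 1

    kept = []
    for cand in sorted(candidates):
        if support[cand] < min_support:
            continue  # too rare in the corpus
        if min(neighbor_entropy(left[cand]), neighbor_entropy(right[cand])) < min_entropy:
            continue  # context too fixed: likely a fragment of a longer term
        if tuple(pos_of.get(t, "X") for t in cand) not in allowed_templates:
            continue  # POS pattern is not an accepted template
        kept.append(cand)
    return kept
```

A candidate that always appears with the same neighbour on one side has zero entropy on that side, which suggests it is part of a longer unit rather than a complete term. The thresholds would need tuning per domain.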


Abstract

The invention discloses an active learning method for mining and constructing domain hierarchical dictionaries. The method comprises the following steps: constructing a domain-specific term entity extraction model and extracting terms from domain articles to generate a bottom-layer raw term lexicon; filtering that lexicon using information entropy, support, and part-of-speech templates to construct a domain dictionary; generating a domain synonym dictionary from the domain dictionary by combining multiple synonym generation methods; constructing initial seed words for the domain hierarchy from open web resources, training a hierarchical word prediction model, and summarizing the related optimization and filtering rules; and, based on the domain dictionary and the synonym dictionary, combining the hierarchical word prediction model with the optimization rules to complete the superior-subordinate expansion and same-level expansion of the domain hierarchical word list. With this method, a computer can automatically extract domain-specific terms from broad natural language corpora and construct a hierarchical dictionary, which makes the approach easier to extend and apply in different domains.
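The final expansion step named in the abstract can be pictured with a small Python sketch. The abstract does not describe the prediction model or the filtering rules, so this is only an illustration under stated assumptions: the hierarchy is a parent-to-children mapping, the model is replaced by a hypothetical callable `predict_parent(word, parents)` returning a `(parent, score)` pair, a single score threshold `min_score` stands in for the optimization filtering rules, and same-level expansion is assumed to mean that synonyms of a placed word join the same parent.

```python
from collections import defaultdict


def expand_hierarchy(seed_hierarchy, domain_words, synonyms,
                     predict_parent, min_score=0.5):
    """Return parent -> sorted children after both expansion passes.

    seed_hierarchy: dict parent -> iterable of child words (seed words)
    domain_words: iterable of words from the domain dictionary
    synonyms: dict word -> iterable of synonyms (the synonym dictionary)
    predict_parent: hypothetical stand-in for the hierarchical word
        prediction model; callable(word, parents) -> (parent or None, score)
    """
    hierarchy = defaultdict(set)
    for parent, children in seed_hierarchy.items():
        hierarchy[parent].update(children)

    parents = sorted(hierarchy)
    placed = set(parents) | {c for cs in hierarchy.values() for c in cs}

    # Superior-subordinate expansion: attach unplaced dictionary words
    # to the parent proposed by the predictor, if the score is high enough.
    for word in domain_words:
        if word in placed:
            continue
        parent, score = predict_parent(word, parents)
        if parent is not None and score >= min_score:
            hierarchy[parent].add(word)
            placed.add(word)

    # Same-level expansion: synonyms of a placed word share its parent.
    for parent in list(hierarchy):
        for child in list(hierarchy[parent]):
            for syn in synonyms.get(child, ()):
                if syn not in placed:
                    hierarchy[parent].add(syn)
                    placed.add(syn)

    return {p: sorted(cs) for p, cs in hierarchy.items()}
```

In an active learning setting, words whose score falls near the threshold would be natural candidates to send to a human annotator, though the excerpt does not say how the method selects them.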

Description

Technical field

[0001] The invention relates to the computer technology field of natural language processing, in particular to an active learning method for mining and constructing domain-level dictionaries.

Background art

[0002] Natural language is an information-carrying communication symbol formed by human beings in their long-term life. The meaning of this symbolic language is influenced by people's living environment, field division of labor, and work experience. As words are the basic elements of language information expression, people with common experience will splice words together to form special vocabulary in order to express an entity or behavior in a specific field.

[0003] With the continuous differentiation of social division of labor, the types of fields that people are engaged in are increasing, and the number of special vocabulary produced in each field has also become huge, and the meanings of words in different fields are not the same. The cogni...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/36, G06F16/33, G06F17/27
CPC: G06F16/374, G06F16/3344
Inventor: 梅珊熊海柴庆凤贺惠新
Owner: 同方知网数字出版技术股份有限公司