Word bank updating method and device and electronic equipment

A lexicon updating method and technology, applied in the fields of electronic digital data processing, digital data information retrieval, and special data processing applications. It addresses the problems of incomplete data, easily missed professional terms, and unbalanced data quality in the terminology database, and achieves term discovery, improved efficiency, and balanced data quality.

Pending Publication Date: 2020-06-23
STATE GRID BEIJING ELECTRIC POWER +2

AI Technical Summary

Problems solved by technology

[0005] Embodiments of the present invention provide a method and device for updating a lexicon, and electronic equipment, to solve at least the technical problem that professional terms are easily missed when extracting corpora in the related art, which results in unbalanced data quality and incomplete data in the terminology database.



Examples


Embodiment Construction

[0024] To enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention, without creative effort, shall fall within the protection scope of the present invention.

[0025] It should be noted that the terms "first" and "second" in the description and claims of the present invention and in the above drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It is to be understood that the data so used are interchangeable under appropriate ...


Abstract

The invention discloses a lexicon updating method and device, and electronic equipment. The method comprises: acquiring a new audit corpus, which comprises basic audit words and electric power data statements; labeling the new audit corpus to obtain basic sample data; identifying new words in the basic sample data with a new word retrieval model, where the model is trained on multiple sets of data and each set comprises lexicon words and new word prediction probabilities; and updating the audit data lexicon based on the new words. The method and device solve the technical problem in the related art that professional terms are easily missed when corpora are extracted, which leaves the term database with unbalanced data quality and incomplete data.
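
The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the source does not describe the model architecture or the labeling scheme, so tokenized sentences stand in for the labeled sample data, and `toy_probability` is a hypothetical placeholder for the trained new word retrieval model. The names `candidate_ngrams`, `update_lexicon`, and the `threshold` parameter are likewise invented for this example.

```python
from collections import Counter
from typing import Callable, Iterable, List, Set


def candidate_ngrams(tokens: List[str], max_n: int = 3) -> Iterable[str]:
    """Yield every contiguous token n-gram (1 <= n <= max_n), joined by a space."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def update_lexicon(
    lexicon: Set[str],
    sentences: List[List[str]],
    new_word_probability: Callable[[str, int], float],
    threshold: float = 0.5,
) -> Set[str]:
    """Return the lexicon extended with candidates scored at or above threshold."""
    # Count how often each candidate appears across the sample sentences.
    counts = Counter(c for s in sentences for c in candidate_ngrams(s))
    new_words = {
        cand
        for cand, freq in counts.items()
        if cand not in lexicon and new_word_probability(cand, freq) >= threshold
    }
    return lexicon | new_words


def toy_probability(candidate: str, freq: int) -> float:
    """Hypothetical stand-in for the trained model (an assumption):
    multi-token candidates seen at least twice get a high score."""
    return 0.9 if freq >= 2 and " " in candidate else 0.1


lexicon = {"audit", "load"}
sentences = [
    ["audit", "smart", "meter", "load"],
    ["smart", "meter", "reading"],
]
updated = update_lexicon(lexicon, sentences, toy_probability)
print(sorted(updated))
```

Passing the scoring function in as a parameter keeps the update step independent of how the new word prediction probability is actually produced, so a trained model could replace the placeholder without changing the rest of the sketch.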

Description

Technical field

[0001] The present invention relates to the technical field of power data processing, and in particular to a method and device for updating a lexicon, and electronic equipment.

Background

[0002] In the field of electric power, unregistered words are a major problem when performing word analysis on an unprocessed original corpus. Unregistered words are words that are not included in the electric power word segmentation vocabulary but must still be segmented, including all kinds of proper nouns (names of people, places, companies, etc.), abbreviations, new vocabulary, etc. Moreover, most unregistered words are technical terms in the field of electric power, so term discovery is an urgent problem. Term discovery directly affects the quality of the corpus. In the process of term discovery, the main task is term extraction, that is, extracting terms from the processed corpus to ensure the comprehen...
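
To make the notion of an unregistered word concrete, the sketch below segments a string with greedy forward maximum matching against a small vocabulary and collects the runs of characters that no vocabulary entry covers. Forward maximum matching is a common dictionary-based segmentation technique chosen here for illustration; the source does not say which segmenter it uses. The vocabulary, the sample text, and the function names are all assumptions.

```python
from typing import List, Set, Tuple


def forward_max_match(text: str, vocab: Set[str], max_len: int = 4) -> List[Tuple[str, bool]]:
    """Greedily segment text; return (piece, in_vocab) pairs.

    A character that starts no vocabulary word is emitted on its own
    and flagged in_vocab=False.
    """
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest window first, then shrink it.
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in vocab:
                pieces.append((text[i:i + n], True))
                i += n
                break
        else:
            pieces.append((text[i], False))
            i += 1
    return pieces


def unregistered_spans(pieces: List[Tuple[str, bool]]) -> List[str]:
    """Merge adjacent out-of-vocabulary characters into candidate unregistered words."""
    spans, buf = [], ""
    for piece, known in pieces:
        if known:
            if buf:
                spans.append(buf)
                buf = ""
        else:
            buf += piece
    if buf:
        spans.append(buf)
    return spans


# Illustrative vocabulary: "electric power", "company", "audit".
vocab = {"电力", "公司", "审计"}
# Sample text: the abbreviation 国网 ("State Grid") is not in the vocabulary.
pieces = forward_max_match("国网电力公司审计", vocab)
print(unregistered_spans(pieces))
```

Here the abbreviation is the only span left uncovered, which is the kind of candidate a term discovery step would then need to accept or reject.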


Application Information

IPC(8): G06F40/242, G06F40/109, G06F16/332
CPC: G06F16/332
Inventors: 尚颖, 马薇, 徐光兵, 黄松, 李彦龙, 梁卫泉, 丁勇, 王端瑞, 侯本忠, 张永强, 闫丽飞
Owner: STATE GRID BEIJING ELECTRIC POWER