Information extraction method and device

An information extraction and rule technology, applied in special data processing applications, instruments, electrical digital data processing, etc., that reduces the manpower and time needed to build extraction rules

Active Publication Date: 2018-10-12
ULTRAPOWER SOFTWARE
20 Cites, 14 Cited by

AI Technical Summary

Problems solved by technology

[0004] A good rule model can achieve high accuracy and precision, but building one not only requires professional modelers; every text element that needs to be matched must also be exhaustively enumerated, which consumes a lot of manpower and time.


Embodiment Construction

[0052] The embodiments of the present application will be described in detail below.

[0053] In the rule-based extraction method, the regular expressions contain information extraction rules, which are used to extract the information the user wants from a text. For example, when the information extraction rule "medium body size | average body shape" is matched against a text, every occurrence of "medium body size" or "average body shape" is extracted as information describing body shape. To extract information more comprehensively, modelers must enumerate every possible wording when constructing the regular expressions, which consumes a lot of manpower and time.
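The rule-based matching described in [0053] can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `extract_body_shape` and the sample sentences are hypothetical, and the rule is written as an ordinary regular-expression alternation rather than in the patent's own rule syntax, which this excerpt does not show.

```python
import re

# Information extraction rule from [0053], written as a regex alternation.
# The patent's actual rule syntax is not shown in this excerpt.
BODY_SHAPE_RULE = re.compile(r"medium body size|average body shape")


def extract_body_shape(text):
    """Return every substring of text that matches the rule, in order."""
    return BODY_SHAPE_RULE.findall(text)


# Hypothetical sample sentences, not taken from the patent.
print(extract_body_shape(
    "She is of medium body size, and he has an average body shape."
))
# A wording the modeler did not enumerate is simply missed.
print(extract_body_shape("She is of medium build."))
```

The second call returns an empty list, which is the weakness the paragraph points out: any phrasing that was not enumerated in the rule is not extracted.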

[0054] Besides rule-based extraction methods, statistics-based extraction methods can also be used to extract information. That is, a corpus annotated with the information the user wants to extract is first used to train the statistical...

Abstract

The embodiment of the present invention discloses an information extraction method and device. The method comprises the following steps: acquiring a text from which information is to be extracted and an extraction expression, wherein the extraction expression comprises an area determination rule and an information extraction rule, the area determination rule comprises a statistical operator, and the statistical operator represents a statistical model for identifying named entities and/or dependent components in the text; using the statistical model to identify the named entities and/or dependent components in the text, and marking each identified named entity and/or dependent component with a corresponding identification tag; using the identification tags to match the area determination rule against the text so as to determine a valid extraction area in the text; and extracting a character string matching the information extraction rule from the valid extraction area. The method calls the statistical model from within a rule, which is convenient and flexible, broadens the range of vocabulary that can be recognized, reduces the effort of rule construction, and extracts the information required by the user more accurately.
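The steps listed in the abstract can be sketched in Python as follows. This is a minimal sketch under several assumptions, not the patented implementation: `stub_ner` is a hypothetical stand-in for the statistical model (a real system would call a trained named-entity recognizer), the tag name `PER`, the sentence-level form of the area determination rule, and the sample text are all invented for illustration, and the excerpt does not show the actual extraction-expression syntax.

```python
import re


def stub_ner(text):
    """Hypothetical stand-in for the statistical model.

    Returns (start, end, tag) spans. A real system would call a trained
    recognizer; this stub only looks up a fixed list of names.
    """
    spans = []
    for name in ("Zhang Wei", "Li Na"):
        for m in re.finditer(re.escape(name), text):
            spans.append((m.start(), m.end(), "PER"))
    return sorted(spans)


def valid_areas(text, spans, tag):
    """Assumed area determination rule: a sentence is a valid extraction
    area if it contains an entity marked with the given tag."""
    areas = []
    for m in re.finditer(r"[^.!?]+[.!?]?", text):
        if any(t == tag and m.start() <= s and e <= m.end()
               for s, e, t in spans):
            areas.append((m.start(), m.end()))
    return areas


def extract(text, area_tag, info_rule):
    """Tag the text, determine valid areas, then apply the extraction rule
    only inside those areas."""
    spans = stub_ner(text)
    results = []
    for start, end in valid_areas(text, spans, area_tag):
        results.extend(
            m.group(0) for m in re.finditer(info_rule, text[start:end])
        )
    return results


# Hypothetical sample text: only the first sentence mentions a person.
sample = "Zhang Wei has a medium body size. The statue has an average body shape."
print(extract(sample, "PER", r"medium body size|average body shape"))
```

In this sketch the second sentence matches the information extraction rule but is skipped because it lies outside every valid extraction area, which illustrates how the area determination rule narrows where the extraction rule is applied.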

Description

Technical field

[0001] The invention relates to the fields of text processing and information extraction, and in particular to an information extraction method. In addition, the invention also relates to an information extraction device.

Background technique

[0002] Information extraction is a text processing technology that extracts specified types of factual information, such as entities, relationships, and events, from natural language texts and outputs it as structured data. It can serve as a preprocessing step for operations such as intelligent question answering, deep mining of semantic information, and standardized information extraction.

[0003] The main method of information extraction is the rule-based extraction method, which generally includes two stages: constructing regular expressions, and applying the regular expressions to obtain the information needed by users. Regular expressions are mainly constructed by modelers based on extra...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/22, G06F17/27
CPC: G06F40/131, G06F40/295, G06F40/14, G06F40/30
Inventor: 李德彦, 晋耀红, 吴相博
Owner: ULTRAPOWER SOFTWARE