System and method for extracting Chinese event trigger words

A technique for extracting event trigger words, applicable to unstructured text data retrieval, natural language data processing, and special-purpose data processing applications. It addresses problems such as relatively poor extraction performance.

Active Publication Date: 2015-02-04
SUZHOU UNIV
2 Cites, 37 Cited by

AI Technical Summary

Problems solved by technology

These inherent characteristics of Chinese make syntactic information less effective for event extraction in Chinese than in English, so Chinese event extraction performs comparatively poorly.

Examples

Example 2

[0097] For example, Example 2: The robot army/ORG attacks the Galactic Republic/GPE Naboo planet/LOC with missiles/WEA and injures 3/NUM Galactic Republic/GPE civilians/PER. Here the entity categories "ORG", "WEA", "GPE", "LOC", "NUM" and "PER" denote organizations, weapons and equipment, political entities, locations, quantities and people, respectively. Other commonly used entity categories include "TIME", "JOB", "FAC" and "VEH", which denote time, job position, place and means of transportation, respectively.
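The entity annotation in Example 2 can be read back into tokens and typed entities with a few lines of code. The sketch below is illustrative only: it assumes a whitespace-tokenized `mention/TYPE` format with multi-word mentions joined by underscores, which the source does not specify, and `parse_tagged` is a hypothetical helper name.

```python
import re

# Entity categories listed in paragraph [0097].
ENTITY_TYPES = {
    "ORG": "organization", "WEA": "weapons and equipment",
    "GPE": "political entity", "LOC": "location",
    "NUM": "quantity", "PER": "person", "TIME": "time",
    "JOB": "job position", "FAC": "place", "VEH": "means of transportation",
}

# Assumed token format: mention/TYPE (not confirmed by the source).
TAGGED = re.compile(r"^(.+)/([A-Z]+)$")

def parse_tagged(sentence):
    """Return (tokens, entities); entities are (token_index, mention, type)."""
    tokens, entities = [], []
    for raw in sentence.split():
        m = TAGGED.match(raw)
        if m and m.group(2) in ENTITY_TYPES:
            tokens.append(m.group(1))
            entities.append((len(tokens) - 1, m.group(1), m.group(2)))
        else:
            tokens.append(raw)
    return tokens, entities

example = ("robot_army/ORG attacks Galactic_Republic/GPE Naboo_planet/LOC "
           "with missiles/WEA and injures 3/NUM Galactic_Republic/GPE civilians/PER")
tokens, entities = parse_tagged(example)
for index, mention, etype in entities:
    print(index, mention, etype, ENTITY_TYPES[etype])
```

Keeping the token index alongside each entity makes it possible to align entities with the parse and dependency output produced in later steps.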

[0098] S103. Call a syntax analysis tool to perform syntax analysis on each document in the second document collection to obtain a third document collection.

[0099] Specifically, the syntactic structure obtained by parsing Example 2 is shown in Example 3. Example 3: ((IP(NP(NR robot army))(VP(VP(PP(for P)(NP(NN missile)) )(VP(VV attack)(NP(NR Galactic Republic)(NR Naboo planet)))(PU,)(CC and)(VP(VV cause)(AS)(NP(CD3 name)(...

Example 7

[0126] Syntax: ((IP(NP(CP(IP(NP(NT9 morning))(NP(NN one))(VP(VV wear)(NP(ADJP(JJ white))(NP(NN clothes)))) )(DEC))(NP(NN Junior)))(VP(PP(P)(LCP(NP(NN Street))(LC))(PP(P)(CLP(M Stick)) )(VP(VV beat)(AS up)(NP(NN one)(NN middle-aged women)))(PU.))).

[0127] Partial dependencies: nsubj(hit-13, juvenile-7), prep(hit-13, used-11), dep(used-11, stick-12), dobj(hit-13, middle-aged woman-16).
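Dependency output in the `relation(head-index, dependent-index)` notation of [0127] is straightforward to load into structured records. The following is a minimal sketch; the `Dep` record and `parse_dependencies` function are hypothetical names introduced for illustration, not part of the patent.

```python
import re
from typing import NamedTuple

class Dep(NamedTuple):
    rel: str        # dependency relation label, e.g. "nsubj"
    head: str       # governing word
    head_idx: int   # position of the governing word
    dependent: str  # dependent word (may contain spaces or hyphens)
    dep_idx: int    # position of the dependent word

# relation(head-idx, dependent-idx); lazy groups allow hyphenated words.
DEP = re.compile(r"(\w+)\(\s*(.+?)-(\d+)\s*,\s*(.+?)-(\d+)\s*\)")

def parse_dependencies(text):
    """Parse a comma-separated list of dependency triples."""
    return [Dep(r, h, int(hi), d, int(di))
            for r, h, hi, d, di in DEP.findall(text)]

text = ("nsubj(hit-13, juvenile-7), prep(hit-13, used-11), "
        "dep(used-11, stick-12), dobj(hit-13, middle-aged woman-16)")
deps = parse_dependencies(text)
for dep in deps:
    print(dep)
```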

[0128] S303. According to the pre-selected trigger word features, extract the features of each trigger word in the training-set trigger word set from the dependency and syntax training set to form the training-set feature set.

[0129] In S303, the feature set of each trigger word tr_i is:

[0130] (n: trigger word of an event of type n (n > 0); 0: non-event trigger word)

[0131] tr_i; part of speech of tr_i; previous word of tr_i + tr_i; part of speech of the preceding word + part of speech of tr_i; tr_i + following word of tr_i; part of speech of tr_i + part of speech of followin...
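The features in [0131] combine a candidate trigger word and its part of speech with those of its neighbours. Because the list is truncated in the source, the sketch below covers only word and part-of-speech context features; the feature names, the `BOS`/`EOS` boundary markers, and the toy sentence with its tags are all assumptions made for illustration.

```python
def context_features(words, pos_tags, i):
    """Word and part-of-speech context features for the candidate at index i."""
    # Boundary markers are an assumption; the source does not say how
    # sentence-initial or sentence-final candidates are handled.
    prev_w = words[i - 1] if i > 0 else "BOS"
    prev_p = pos_tags[i - 1] if i > 0 else "BOS"
    next_w = words[i + 1] if i + 1 < len(words) else "EOS"
    next_p = pos_tags[i + 1] if i + 1 < len(words) else "EOS"
    return {
        "word": words[i],
        "pos": pos_tags[i],
        "prev_word+word": f"{prev_w}+{words[i]}",
        "prev_pos+pos": f"{prev_p}+{pos_tags[i]}",
        "word+next_word": f"{words[i]}+{next_w}",
        "pos+next_pos": f"{pos_tags[i]}+{next_p}",
    }

# Toy sentence loosely based on Example 7; the tags are illustrative.
words = ["juvenile", "with", "stick", "hit", "woman"]
pos_tags = ["NN", "P", "NN", "VV", "NN"]
features = context_features(words, pos_tags, 3)
print(features)
```

A dictionary of named string features like this can be passed to most off-the-shelf classifiers after one-hot encoding, with the class label from [0130] as the target.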

Example 9

[0152]

[0153]

[0154] S602. For each quadruple in the core entity trigger word set, obtain the dependency path between the quadruple's core entity and its trigger word from the dependency and syntax training set, form a core template quintuple, and thereby obtain the set of core templates for the training set. The core template quintuple is expressed as .

[0155] Specifically, for the two quadruples in Example 9, the dependency path between "juvenile" and "hit" is "nsubj", while the dependency path between "middle-aged woman" and "hit" is "dobj". The resulting core template quintuples are shown in Example 10:

[0156]

[0157]
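The dependency-path lookup described in [0155] can be sketched as a shortest-path search over the dependency triples of [0127]. This is an assumption-laden illustration: the source only shows single-relation paths, so the undirected breadth-first search and the handling of multi-hop paths are choices made here, and `dependency_path` is a hypothetical helper. The quintuple layout is not reproduced because it is missing from the source.

```python
from collections import deque

def dependency_path(deps, source, target):
    """Return the relation labels on the shortest path, or None if unconnected."""
    # Build an undirected graph so paths can run head-to-dependent or back.
    graph = {}
    for rel, head, dependent in deps:
        graph.setdefault(head, []).append((dependent, rel))
        graph.setdefault(dependent, []).append((head, rel))
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for nxt, rel in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [rel]))
    return None

# Triples from paragraph [0127], written as (relation, head, dependent).
deps = [
    ("nsubj", "hit-13", "juvenile-7"),
    ("prep", "hit-13", "used-11"),
    ("dep", "used-11", "stick-12"),
    ("dobj", "hit-13", "middle-aged woman-16"),
]
print(dependency_path(deps, "juvenile-7", "hit-13"))
print(dependency_path(deps, "middle-aged woman-16", "hit-13"))
```

Using word-position suffixes as node identifiers keeps repeated words in the same sentence distinct.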

[0158] S603. For each trigger word in the training-set trigger word set, obtain the auxiliary entities and their entity types corresponding to all auxiliary roles of the event type, according to the event type with which the trigger word is annotated in the training corpus; each auxiliary entity and its enti...

Abstract

The invention provides a system and a method for extracting Chinese event trigger words. The system comprises a syntax and dependency analysis module, a core and auxiliary role definition module, a training corpus feature extraction module, a candidate trigger word extraction module, a basic feature extraction module, a training set template extraction module, a candidate template extraction module, an entity feature extraction module and a trigger word recognition module. Based on the observation that role semantics is one form of expressing event semantics, the invention proposes representing role semantics with a core role and auxiliary roles, and uses this representation to extract Chinese event trigger words. Compared with the best existing method and system for Chinese event extraction, the invention significantly improves Chinese event trigger word extraction performance.

Description

Technical field

[0001] The invention belongs to the field of natural language processing, and in particular relates to a system and method for extracting Chinese event trigger words.

Background

[0002] An event is one of the main forms of information representation. It is an objective fact (also called a "natural event") in which specific people and things interact at a specific time and place, such as casualty incidents and food additive incidents. Event extraction is a subtask of information extraction and one of its research hotspots; its goal is to automatically discover events of specific types, together with their event elements, in natural-language text. Events are one of the basic elements of natural language understanding and underpin major applications such as automatic summarization, machine translation, question answering systems, and decision-making systems. Fo...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F16/36, G06F40/211
Inventor: 李培峰, 周国栋, 朱巧明, 孔芳, 朱晓旭
Owner: SUZHOU UNIV