Method for extracting hyponymy relation of field terms from Wikipedia

A technique for extracting hyponymy relations between domain terms from Wikipedia. It addresses two problems: text preprocessing results cannot be guaranteed to be correct, which limits the performance of hyponymy relation extraction, and extraction accuracy is low.

Status: Inactive; Publication Date: 2014-04-02
XIAN JIAOTONG UNIV CITY COLLEGE

AI Technical Summary

Problems solved by technology

Approaches that rely only on text features such as sentence structure, term frequency, and part of speech achieve low extraction accuracy. In addition, term extraction involves word segmentation and part-of-speech tagging, and the results of this text preprocessing cannot be guaranteed to be completely correct, which limits the performance of subsequent hyponymy relation extraction.


Examples

Embodiment Construction

[0051] The specific technical solutions of the present invention will be further described in detail below in conjunction with the accompanying drawings.

[0052] In the present invention, domain terms are words or phrases that express specific concepts or relationships in a subject area. For example, in the field of Data mining, typical domain terms include Cluster analysis, k-means algorithm, Classification, and Support vector machines. The hyponymy relation is a semantic relation between domain terms that indicates one of two types of affiliation between terms: kind-of (subclass and class) and is-a (instance and class). For example, a hyponymy relation holds between k-means algorithm and Cluster analysis, and between Support vector machines and Classification.
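As a minimal sketch of how such relations could be represented, the following Python snippet stores each relation as a (hyponym, hypernym, kind) record. The names `Hyponymy`, `RELATIONS`, and `hypernyms_of` are hypothetical, and the `kind` label attached to each example pair is an illustrative assumption, since the text does not say which of the two types applies to these pairs.

```python
from typing import List, NamedTuple


class Hyponymy(NamedTuple):
    """One hyponymy relation between two domain terms."""
    hyponym: str   # the lower (more specific) term
    hypernym: str  # the upper (more general) term
    kind: str      # "kind-of" (subclass/class) or "is-a" (instance/class)


# Term pairs come from paragraph [0052]; the kind labels are assumed.
RELATIONS = [
    Hyponymy("k-means algorithm", "Cluster analysis", "kind-of"),
    Hyponymy("Support vector machines", "Classification", "kind-of"),
]


def hypernyms_of(term: str, relations: List[Hyponymy]) -> List[str]:
    """Return the direct hypernyms recorded for `term`."""
    return [r.hypernym for r in relations if r.hyponym == term]


print(hypernyms_of("k-means algorithm", RELATIONS))
```

A directed record is used because the relation is asymmetric: k-means algorithm is below Cluster analysis, not the other way round.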

[0053] The method of the invention for extracting hyponymy relations between domain terms from Wikipedia comprises three steps, as shown in the accompanying drawings, and the specific process is as...


Abstract

The invention relates to a method for extracting hyponymy relations between domain terms from Wikipedia. The method comprises the following steps: (1) taking the Wikipedia page corresponding to the domain name as the starting page, performing a breadth-first traversal to a depth of 3, using a URL (uniform resource locator) regular expression to filter out hyperlinks that do not point to domain terms, and storing the traversed pages and hyperlinks as a page text collection and a collection of two-tuples, respectively; (2) obtaining the bidirectional link feature, edge betweenness feature, and clustering coefficient feature from the two-tuple collection, obtaining the anchor text location feature and anchor text context feature from the page text collection, and building five-dimensional feature vectors; (3) using a Random Forest classifier to perform binary classification of the hyperlinks in the two-tuple collection into hyponymy relations and non-hyponymy relations. The method combines text features with hyperlink topology features, so that hyponymy relations can be extracted automatically from Wikipedia.
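The traversal in step (1) and two of the topology features in step (2) can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the stored hyperlink collection is modelled as a set of (source, target) pairs, the regular expression `TERM_URL` is a hypothetical stand-in for the patent's URL filter (which is not given in this text), pages are supplied by a caller-provided `get_links` function instead of being fetched from Wikipedia, and pages at the depth limit are not expanded. The edge betweenness feature and the two anchor text features are omitted.

```python
import re
from collections import deque

# Hypothetical filter: accept /wiki/ paths without ":" or "#", which rejects
# namespace pages such as "File:" or "Category:". Not the patent's own regex.
TERM_URL = re.compile(r"^/wiki/[^:#]+$")


def crawl(start, get_links, max_depth=3):
    """Breadth-first traversal from `start`, keeping only term-like links.

    Returns (set of visited pages, set of (source, target) link pairs).
    """
    visited = {start}
    pairs = set()
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # assumption: pages at the depth limit are not expanded
        for target in get_links(url):
            if target == url or not TERM_URL.match(target):
                continue
            pairs.add((url, target))
            if target not in visited:
                visited.add(target)
                queue.append((target, depth + 1))
    return visited, pairs


def is_bidirectional(pairs, source, target):
    """Bidirectional link feature: does the target page link back?"""
    return (target, source) in pairs


def clustering_coefficient(pairs, node):
    """Local clustering coefficient on the undirected link graph."""
    neighbours = {}
    for a, b in pairs:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    nbrs = neighbours.get(node, set())
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u in nbrs for v in nbrs if u < v and v in neighbours[u])
    return 2.0 * links / (k * (k - 1))


# Toy link graph standing in for real Wikipedia pages (illustrative only).
TOY = {
    "/wiki/Data_mining": ["/wiki/Cluster_analysis", "/wiki/File:Logo.png"],
    "/wiki/Cluster_analysis": ["/wiki/Data_mining", "/wiki/K-means_clustering"],
    "/wiki/K-means_clustering": ["/wiki/Cluster_analysis", "/wiki/Data_mining"],
}
pages, pairs = crawl("/wiki/Data_mining", lambda url: TOY.get(url, []))
```

On the toy graph, the `File:` link is filtered out, and the pair between Data_mining and Cluster_analysis is bidirectional. In a full pipeline, these values would be combined with the remaining features into the five-dimensional vector passed to the classifier in step (3).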

Description

Technical field

[0001] The invention relates to an information extraction method, in particular to a method for extracting hyponymy relations between domain terms from Wikipedia.

Background

[0002] The hyponymy relation is the most basic semantic relation between domain terms, and it mainly describes the affiliation between them. For example, there is a hyponymy relation between the two terms K-means algorithm and Cluster analysis in the field of "data mining". The hyponymy relation is the foundation of a classification system, and it plays a fundamental role in the organization, management, classification, and retrieval of massive digital resources, especially domain-related digital resources (such as professional literature, textbooks, etc.). However, hyponymy relations are usually implicit in domain-related texts, and manual labeling of hyponymy relations is not only time-consuming and laborious, but also re...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F16/3344, G06F16/35
Inventors: 何绯娟, 缪相林
Owner: XIAN JIAOTONG UNIV CITY COLLEGE