
Sparse sample-oriented focus type Web information extraction system and method

An information extraction and focusing technology in the field of information extraction, which addresses problems such as the difficulty of extracting information from the Web.

Status: Inactive; publication date: 2016-08-31
SHANGHAI UNIV
Cites: 0; cited by: 37

AI Technical Summary

Problems solved by technology

The semi-structured form of Web pages gives extraction methods richer cues to work with, but it also makes page presentation diverse: some information appears as plain text, some as tables, and some as XML. This diversity makes Web information extraction difficult.
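To make the difficulty concrete, the sketch below renders one hypothetical field (a product price) as free text, as an HTML table, and as XML, and shows that each form needs its own extraction logic. The page strings and the function names `price_from_text`, `price_from_table`, and `price_from_xml` are illustrative assumptions, not part of the invention; only the Python standard library is used.

```python
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical pages: the same price presented in three different forms.
TEXT_PAGE = "Special offer! Price: 1299.00 yuan, ships tomorrow."
TABLE_PAGE = "<table><tr><td>Price</td><td>1299.00</td></tr></table>"
XML_PAGE = "<product><name>Phone</name><price>1299.00</price></product>"


def price_from_text(text: str) -> str:
    # Free text: only a lexical pattern can locate the value.
    match = re.search(r"Price:\s*([\d.]+)", text)
    return match.group(1) if match else ""


class _CellCollector(HTMLParser):
    """Collects the stripped text of every <td> cell in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.cells: list[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data.strip()


def price_from_table(html: str) -> str:
    # Table: the value is the cell that immediately follows the "Price" label cell.
    collector = _CellCollector()
    collector.feed(html)
    collector.close()
    cells = collector.cells
    for label, value in zip(cells, cells[1:]):
        if label == "Price":
            return value
    return ""


def price_from_xml(xml: str) -> str:
    # XML: the element name itself identifies the field.
    node = ET.fromstring(xml).find("price")
    return node.text if node is not None else ""


if __name__ == "__main__":
    print(price_from_text(TEXT_PAGE), price_from_table(TABLE_PAGE), price_from_xml(XML_PAGE))
```

All three functions return the same value, but none of them can be reused on the other two forms, which is the heterogeneity problem stated above.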

Examples

Embodiment Construction

[0123] The implementation of the present invention is described in detail below with reference to the accompanying drawings and examples, so that the way the invention applies technical means to solve the technical problems and achieve its technical effects can be fully understood and put into practice.

[0124] Based on a review of current Web information extraction technology, the design goal of the present invention is to let ordinary users extract the content they are interested in from a page through simple operations, convert that content into a structured form, and at the same time receive personalized search services based on their interests. Taking the extraction of the product name, price, and sales ranking from a product detail page on Yixun.com as an example, the following describes in detail how to use the Web information extraction service provided by the present invention.
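As a minimal sketch of extraction from a single labeled sample, assuming a simple positional tag-path wrapper (a common baseline, not necessarily the sample feature modeling and feature selection the invention uses): the user marks the product name, price, and sales ranking once on a sample page, the positions of those values are recorded as a template, and the template is then applied to a structurally similar page. The page markup, the field names, and the helpers `PathTextParser`, `learn_template`, and `apply_template` are hypothetical.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "meta", "link", "input", "hr"}  # elements with no end tag


class PathTextParser(HTMLParser):
    """Maps each text node to a positional path such as
    (("html", 0), ("body", 0), ("div", 1), ("span", 1))."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[tuple[str, int]] = []
        self.child_counts: list[dict[str, int]] = [{}]  # per open element, plus the root
        self.texts: dict[tuple, str] = {}

    def handle_starttag(self, tag, attrs):
        counts = self.child_counts[-1]
        index = counts.get(tag, 0)  # position among same-tag siblings
        counts[tag] = index + 1
        if tag in VOID_TAGS:
            return
        self.stack.append((tag, index))
        self.child_counts.append({})

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1][0] == tag:
            self.stack.pop()
            self.child_counts.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            path = tuple(self.stack)
            self.texts[path] = (self.texts.get(path, "") + " " + text).strip()


def page_paths(html: str) -> dict[tuple, str]:
    parser = PathTextParser()
    parser.feed(html)
    parser.close()
    return parser.texts


def learn_template(sample_html: str, labels: dict[str, str]) -> dict[str, tuple]:
    """For each labeled field, record the path of the text node equal to the label."""
    paths = page_paths(sample_html)
    template = {}
    for field, value in labels.items():
        for path, text in paths.items():
            if text == value:
                template[field] = path
                break
    return template


def apply_template(html: str, template: dict[str, tuple]) -> dict[str, str]:
    """Read each field from the same position on another page."""
    paths = page_paths(html)
    return {field: paths.get(path, "") for field, path in template.items()}


# Hypothetical product detail pages with identical layout.
SAMPLE_PAGE = """<html><body>
<div class="title"><h1>Phone X</h1></div>
<div class="buy"><span>Price</span><span>1299.00</span></div>
<div class="rank"><span>Sales rank</span><span>3</span></div>
</body></html>"""
SIMILAR_PAGE = SAMPLE_PAGE.replace("Phone X", "Tablet Y").replace("1299.00", "2199.00").replace(
    "<span>3</span>", "<span>12</span>")
LABELS = {"name": "Phone X", "price": "1299.00", "sales_rank": "3"}

if __name__ == "__main__":
    template = learn_template(SAMPLE_PAGE, LABELS)
    print(apply_template(SIMILAR_PAGE, template))
```

Because the template stores only positions, it fails when a similar page inserts or reorders elements; that fragility is one reason a practical system would model richer features of the samples.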

[0125] Figure 1 is the overall framework diagram of the focused...

Abstract

The invention provides a sparse sample-oriented focused Web information extraction system and method. The system includes: a webpage interaction module, which provides extraction template definition and a search service over the structured extraction results; an extraction engine module, which provides similar-webpage acquisition, sample feature modeling, feature selection, and information extraction; and a data service module, which provides relational and non-relational data services to the front end and back end of the system. Information can be extracted efficiently from only a small number of samples, and structured information can be extracted from the domains to which different samples belong.
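The abstract does not state how the extraction engine decides which pages are similar to a sample. One simple possibility, shown here purely as an assumption, is to compare the sets of root-to-element tag paths of two pages with Jaccard similarity and keep the candidates above a threshold. The names `tag_paths`, `jaccard`, and `similar_pages`, the threshold of 0.6, and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "meta", "link", "input", "hr"}  # elements with no end tag


class TagPathCollector(HTMLParser):
    """Collects the set of root-to-element tag paths, e.g. 'html/body/div/span'."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []
        self.paths: set[str] = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()


def tag_paths(html: str) -> set[str]:
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    return collector.paths


def jaccard(a: set[str], b: set[str]) -> float:
    # |A ∩ B| / |A ∪ B|; two empty sets are treated as identical.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similar_pages(sample_html: str, candidates: dict[str, str], threshold: float = 0.6) -> list[str]:
    """Return the keys of candidate pages whose structure resembles the sample page."""
    sample = tag_paths(sample_html)
    return [key for key, html in candidates.items()
            if jaccard(sample, tag_paths(html)) >= threshold]


# Hypothetical pages: one more product page and one unrelated news page.
SAMPLE_PAGE = "<html><body><div><h1>Phone X</h1></div><div><span>1299.00</span></div></body></html>"
CANDIDATES = {
    "product_2": "<html><body><div><h1>Tablet Y</h1></div><div><span>2199.00</span></div></body></html>",
    "news": "<html><body><article><p>Headline</p><p>Story text</p></article></body></html>",
}

if __name__ == "__main__":
    print(similar_pages(SAMPLE_PAGE, CANDIDATES))
```

Here the second product page shares every tag path with the sample (similarity 1.0), while the news page shares only `html` and `html/body` out of seven distinct paths (about 0.29), so only the product page is kept.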

Description

Technical field
[0001] The invention relates to information extraction technology, and in particular to a sparse sample-oriented focused Web information extraction system and method.
Background
[0002] The main problem that information extraction addresses is information overload. Information extraction technology is expected to allow information to be compared and displayed directly in a structured form (for example, as a table). Information extraction can be defined as a method of extracting structured information from semi-structured or unstructured text. Compared with information retrieval, the content it obtains is richer and more detailed and supports structured queries, so it can be regarded as an extension of information retrieval technology. Web information extraction further extends this to the process of extracting a specified type of information from web page text and converting it into structured data.
[0003] Web information extracti...
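Once records are in structured form they can be compared and queried directly, which is the advantage over plain retrieval noted in [0002]. A minimal sketch, assuming hypothetical product records and using Python's built-in `sqlite3` module as a stand-in for a relational store (the table name `product`, its columns, and the helper `to_table` are illustrative):

```python
import sqlite3

# Hypothetical records, as an extractor might emit them for product detail pages.
records = [
    {"name": "Phone X", "price": 1299.00, "sales_rank": 3},
    {"name": "Tablet Y", "price": 2199.00, "sales_rank": 12},
    {"name": "Earbuds Z", "price": 199.00, "sales_rank": 1},
]


def to_table(rows: list[dict]) -> sqlite3.Connection:
    """Load extracted records into an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product (name TEXT, price REAL, sales_rank INTEGER)")
    # Named placeholders are filled from each record dict.
    conn.executemany("INSERT INTO product VALUES (:name, :price, :sales_rank)", rows)
    return conn


conn = to_table(records)
# A structured query: products under 1500, cheapest first.
cheap = conn.execute(
    "SELECT name, price FROM product WHERE price < ? ORDER BY price", (1500,)
).fetchall()
print(cheap)
```

The same filter over the raw pages would require re-parsing every page; over the table it is a single query.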

Application Information

IPC(8): G06F17/30; G06K9/62
CPC: G06F16/951; G06F16/9566; G06F18/22
Inventors: 朱文浩 (Zhu Wenhao), 郭心怡 (Guo Xinyi), 刘懿霆 (Liu Yiting), 陈洁 (Chen Jie), 徐钊 (Xu Zhao), 姚文心 (Yao Wenxin)
Owner: SHANGHAI UNIV