
Method and device for extracting structured data

A technique for extracting structured data, applied in the field of data processing. It addresses problems such as the insufficient accuracy of network data mining, achieving accurate result data and improved capability.

Active Publication Date: 2018-08-07
深圳市国信互联科技有限公司

AI Technical Summary

Problems solved by technology

[0003] The technical problem to be solved by the embodiments of the present invention is to provide an efficient and accurate method and device for extracting structured data in view of the defect of insufficient accuracy of network data mining in the prior art

Embodiment Construction

[0044] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of the present invention.

[0045] A vertical search engine is a search engine service model proposed in response to the shortcomings of general search engines, such as an excessive volume of information, inaccurate queries, and insufficient depth. It provides information and related services of a certain value, is characterized by "specialization, precision, and depth", and has industry-specific characteristics. Compared with the disordered massive information of general search engines, vertical search engines are more foc...

Abstract

The embodiment of the invention discloses a method and a device for extracting structured data. The method comprises the following steps: dividing a webpage into a body area and an auxiliary area, and constructing an XPath (XML Path Language) tag tree representing the body area; mapping the XPath nodes in the XPath tag tree to CSS (Cascading Style Sheets) labels, and grouping the CSS labels according to the similarity of their content; and, if the CSS labels of all XPath nodes in the XPath tag tree belong to the same group and are sub-labels of the same parent label, extracting the data corresponding to each XPath node in the XPath tag tree as the main body content.
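
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the body area has already been located (here a hypothetical `div` with `id="main"`), treats each node's `class` attribute as its CSS label, and measures label similarity with `difflib.SequenceMatcher` against a hypothetical threshold of 0.8. The sample page, the function names, and the threshold are illustrative assumptions, not taken from the source.

```python
import difflib
import xml.etree.ElementTree as ET

# Hypothetical cut-off for "similar" CSS labels; the source gives no value.
SIMILARITY_THRESHOLD = 0.8


def group_by_similarity(labels, threshold=SIMILARITY_THRESHOLD):
    """Greedily group labels whose text is similar to a group's first member."""
    groups = []
    for label in labels:
        for group in groups:
            ratio = difflib.SequenceMatcher(None, label, group[0]).ratio()
            if ratio >= threshold:
                group.append(label)
                break
        else:
            groups.append([label])
    return groups


def extract_main_content(root, nodes):
    """Return the text of `nodes` only if their CSS labels fall into a single
    group and all nodes are children of the same parent element."""
    # ElementTree has no parent pointer, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    labels = [node.get("class", "") for node in nodes]
    same_group = len(group_by_similarity(labels)) == 1
    same_parent = len({parent_of.get(node) for node in nodes}) == 1
    if same_group and same_parent:
        return [(node.text or "").strip() for node in nodes]
    return []


# Assumed, well-formed sample page: "nav" plays the auxiliary area,
# "main" plays the body area.
PAGE = """
<html><body>
  <div id="nav"><a class="menu">Home</a></div>
  <div id="main">
    <p class="item-text">First record</p>
    <p class="item-text">Second record</p>
    <p class="item-txt">Third record</p>
  </div>
</body></html>
""".strip()

root = ET.fromstring(PAGE)
body_area = root.find(".//div[@id='main']")   # body area, assumed known
nodes = list(body_area)                        # candidate nodes of the tag tree
nav_link = root.find(".//div[@id='nav']/a")    # a node from the auxiliary area

main_content = extract_main_content(root, nodes)
mixed_content = extract_main_content(root, nodes + [nav_link])
```

In this sketch `main_content` holds the three record strings, because the labels `item-text` and `item-txt` fall into one group and the three `p` nodes share a parent. `mixed_content` is empty, because the navigation link has a dissimilar label and a different parent. Real web pages are rarely well-formed XML, so a practical version would need a tolerant HTML parser.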

Description

Technical field

[0001] The invention relates to the field of data processing, in particular to a method and device for extracting structured data.

Background art

[0002] Documents published on the Internet are generally called web pages and are generally written in a language called HTML, which specifies a standard format for documents. Although network information presented in HTML format is convenient for users to read, it is difficult to retrieve information from HTML documents for automatic processing. This is because the data in a web page is complex, and some information, such as navigation (menu) information and advertisements, introduces a large amount of junk information into search engine results, which reduces the accuracy of web mining.

Summary of the invention

[0003] The technical problem to be solved by the embodiments of the present invention is to provide an efficient and accurate method and device ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
CPC: G06F16/9535
Inventor: 欧阳科, 杜建欣, 齐彦申
Owner 深圳市国信互联科技有限公司