An Online Web News Content Extraction System
An online Web news content extraction technique that addresses the problems of wrapper failure, high cost, and reliance on a single perspective, and that achieves improved adaptability, improved generality, and strong real-time performance.
Embodiment Construction
[0049] Referring to Figure 1, the online Web news content extraction method in this embodiment proceeds as follows:
[0050] Step 1: use an HTML parser to parse the Web news page to be extracted and obtain its DOM tree. Fetch the HTML text of the news page from its URL, and use JTidy to correct errors in the HTML text, including tag matching errors, tag spelling errors, and HTML encoding errors. Then use the HTML parser HTMLParser to scan the characters of the HTML text one by one, analyze the structural hierarchy of the HTML text, and obtain the DOM tree of the page.
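As a rough illustration of Step 1, the sketch below builds a small DOM-like tree from HTML text. It is not the patent's implementation: it substitutes the `html.parser` module from Python's standard library for JTidy and the Java HTMLParser library, skips fetching the page from its URL, and only tolerates mismatched closing tags rather than repairing the full range of errors described. The names `Node`, `DomBuilder`, `parse_dom`, and `VOID_TAGS` are hypothetical and chosen for this example.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    """A minimal DOM node: an element (tag set) or a text node (text set)."""

    def __init__(self, tag=None, text=None, parent=None):
        self.tag = tag
        self.text = text
        self.parent = parent
        self.children = []


class DomBuilder(HTMLParser):
    """Builds a Node tree while the parser scans the HTML text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node(tag="#document")
        self._current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag=tag, parent=self._current)
        self._current.children.append(node)
        if tag not in VOID_TAGS:
            self._current = node  # descend into the new element

    def handle_endtag(self, tag):
        # Tolerate mismatched tags: close up to the nearest open ancestor
        # with this name, or ignore the stray closing tag if none is open.
        node = self._current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self._current = node.parent

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only runs between tags
            self._current.children.append(
                Node(text=text, parent=self._current))


def parse_dom(html_text):
    """Parse HTML text and return the synthetic '#document' root node."""
    builder = DomBuilder()
    builder.feed(html_text)
    builder.close()
    return builder.root


# The second <p> is deliberately left unclosed to exercise the recovery path.
dom = parse_dom(
    "<html><body><div><p>Headline</p><p>Body <b>text</b></div></body></html>")
```

In this sketch the unclosed second `<p>` is closed implicitly when `</div>` arrives, which loosely mirrors the tag matching repair the step assigns to JTidy.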
[0051] Step 2: traverse the DOM tree, visiting each node in turn, and construct the text node information sequence and the tag path information sequence of the text nodes. Each element in the text node information sequence has two attributes, which a...
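The traversal in Step 2 can be sketched roughly as follows. Because the source text is cut off before it names the two attributes of each sequence element, this example records only each text node's content and the tag path leading to it; the `Node` class, the `collect_text_and_paths` function, and the `/`-joined path format are assumptions made for illustration, not details taken from the patent.

```python
class Node:
    """A minimal DOM node: an element (tag set) or a text node (text set)."""

    def __init__(self, tag=None, text=None, children=()):
        self.tag = tag
        self.text = text
        self.children = list(children)


def collect_text_and_paths(root):
    """Visit every node in document order and return two parallel lists:
    the text of each text node, and the tag path from the root to it."""
    texts, paths = [], []
    stack = [(root, [])]  # (node, tags of its ancestors)
    while stack:
        node, path = stack.pop()
        if node.text is not None:
            texts.append(node.text)
            paths.append("/".join(path))
            continue
        child_path = path + [node.tag]
        # Push children in reverse so they are popped in document order.
        for child in reversed(node.children):
            stack.append((child, child_path))
    return texts, paths


# A small hand-built tree standing in for the DOM tree from Step 1.
tree = Node("html", children=[
    Node("body", children=[
        Node("div", children=[
            Node("p", children=[Node(text="Headline")]),
            Node("p", children=[
                Node(text="Body"),
                Node("b", children=[Node(text="text")]),
            ]),
        ]),
    ]),
])

texts, paths = collect_text_and_paths(tree)
for text, path in zip(texts, paths):
    print(f"{path}: {text}")
```

An explicit stack is used instead of recursion so that deeply nested pages do not hit Python's recursion limit; the two lists stay index-aligned, so position i in one sequence corresponds to position i in the other.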