Distributed real-time news information acquisition system
A distributed information collection technology, applied in the field of information collection, that addresses problems such as the URL distribution module becoming a bottleneck, and achieves good scalability, low cost, and easy deployment.
Embodiment Construction
[0031] The present invention is further described below with reference to the accompanying drawings:
[0032] In the present invention, each independent collection node is responsible for collecting news pages. All collection nodes communicate with the central node over TCP/IP, and the sub-collection nodes forward the pages they collect to the central node. The central node is responsible for storing all downloaded news pages in the database.
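The forwarding path in [0032] can be illustrated with a minimal Python sketch. It is not the patented implementation: the length-prefixed JSON framing, the function names `send_page` and `serve_one_node`, the `pages` table, and the use of an in-memory SQLite database are all assumptions made for illustration, since the text only states that pages are forwarded over TCP/IP and stored in a database.

```python
import json
import socket
import sqlite3
import struct
import threading


def send_page(sock, url, html):
    """Collection node side: send one page as a length-prefixed JSON frame (assumed framing)."""
    payload = json.dumps({"url": url, "html": html}).encode("utf-8")
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def _recv_exact(sock, n):
    """Read exactly n bytes, or return None if the peer closed the connection first."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            return None
        buf += chunk
    return buf


def serve_one_node(server_sock, db):
    """Central node side: accept one collection node and store every page it forwards."""
    conn, _ = server_sock.accept()
    with conn:
        while True:
            header = _recv_exact(conn, 4)
            if header is None:
                break
            (length,) = struct.unpack("!I", header)
            body = _recv_exact(conn, length)
            if body is None:
                break
            page = json.loads(body.decode("utf-8"))
            db.execute(
                "INSERT OR REPLACE INTO pages (url, html) VALUES (?, ?)",
                (page["url"], page["html"]),
            )
    db.commit()


def demo():
    """Run one central node and one collection node on the loopback interface."""
    db = sqlite3.connect(":memory:", check_same_thread=False)
    db.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, html TEXT)")
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    central = threading.Thread(target=serve_one_node, args=(server, db))
    central.start()
    with socket.create_connection(("127.0.0.1", port)) as node:
        send_page(node, "http://example.com/news/1", "<html>one</html>")
        send_page(node, "http://example.com/news/2", "<html>two</html>")
    central.join()
    server.close()
    return db.execute("SELECT url, html FROM pages ORDER BY url").fetchall()
```

Calling `demo()` returns the rows the central node stored. A real deployment would accept many collection nodes concurrently and use a persistent database; both are left out to keep the sketch short.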
[0033] The Web information collection system can be divided into seven parts: the URL processor, protocol processor, duplicate content detector, URL extractor, Meta information acquirer, semantic information parser, and database, which work together to obtain information from the Web.
[0034] 1. URL processor: This component sorts the URLs to be collected and assigns them to the protocol processor according to a certain strategy. Depending on the scale of the collection system, the URL can be multiple collect...
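The sort-then-assign behaviour of the URL processor can be sketched as follows. The source does not say which ordering or assignment strategy is used, so everything specific here is an assumption: URLs carry a numeric priority (lower is collected first), and each URL is routed to a protocol processor by hashing its host name, so that pages from one site go to the same processor. The name `assign_urls` is hypothetical.

```python
import hashlib
from urllib.parse import urlsplit


def assign_urls(urls, num_processors):
    """Sort (priority, url) pairs and distribute the URLs over per-processor queues.

    Assumed strategy: order by priority, then route by a stable hash of the host.
    """
    queues = [[] for _ in range(num_processors)]
    for _priority, url in sorted(urls):
        host = urlsplit(url).hostname or ""
        # hashlib gives a hash that is stable across runs, unlike the built-in hash().
        digest = hashlib.md5(host.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % num_processors
        queues[index].append(url)
    return queues


pending = [
    (2, "http://a.example/x"),
    (1, "http://a.example/y"),
    (3, "http://b.example/z"),
]
queues = assign_urls(pending, 3)
```

Host-based routing is only one possible choice. It keeps per-site politeness limits easy to enforce on a single processor, at the cost of uneven load when a few sites dominate the URL set.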