Topic crawler system based on social labels
A topic crawler system based on social annotation, applied in the fields of topic crawlers and social technology. It addresses the problems of topic drift (the search deviating from the topic), ignored relevance, and increased computational complexity, and achieves high program operating efficiency, improved crawler efficiency, and high network bandwidth utilization.
Embodiment Construction
[0024] The present invention is described in further detail below with reference to the accompanying drawings and examples.
[0025] As shown in Figure 1, the present invention proposes a web crawler strategy based on social labeling and, based on this strategy, a multi-threaded crawler system using asynchronous IO. The system includes a page acquisition module 100, a page processing module 200, a correlation calculation module 300, a storage module 400, a link extraction module 500, and a link analysis module 600.
[0026] The page acquisition module 100 is responsible for fetching web pages. It fetches pages according to the robots.txt file (the robots exclusion protocol file) of the target website, the network bandwidth limit, and the priority of each web page. The page acquisition module 100 hands each acquired page over to the page processing module 200 for processing.
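The fetch rules in paragraph [0026] can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Frontier` class, the `make_robots` helper, the `topic-crawler` user agent, and the numeric priority scores are all hypothetical names chosen for illustration, and bandwidth limiting is left out. The sketch only shows one plausible way to combine a robots.txt check (via the standard-library `urllib.robotparser`) with priority-ordered fetching.

```python
import heapq
from urllib.robotparser import RobotFileParser


def make_robots(robots_txt: str) -> RobotFileParser:
    """Build a robots.txt parser from text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class Frontier:
    """Hypothetical URL frontier: drops disallowed URLs, pops highest priority first."""

    def __init__(self, robots: RobotFileParser, user_agent: str = "topic-crawler"):
        self._robots = robots
        self._agent = user_agent  # assumed user-agent string, not from the source
        self._heap = []
        self._count = 0  # tie-breaker so equal priorities pop in insertion order

    def add(self, url: str, priority: float) -> bool:
        """Queue a URL unless robots.txt forbids it; return whether it was queued."""
        if not self._robots.can_fetch(self._agent, url):
            return False
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self._heap, (-priority, self._count, url))
        self._count += 1
        return True

    def pop(self):
        """Return the queued URL with the highest priority, or None if empty."""
        return heapq.heappop(self._heap)[2] if self._heap else None
```

In this sketch, a URL under a `Disallow` rule never enters the queue, so the fetcher only ever sees permitted pages, ordered by whatever priority score the rest of the system assigns.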
[0027] The page acquisition module 100 starts from the list of seed URL...