Dynamic webpage crawling method and device
A webpage crawling technology in the field of Internet applications. It addresses the problems of repeated webpage crawling and the high time complexity of crawling, and improves both the efficiency of enqueue and dequeue operations and the overall efficiency of crawling.
Embodiment Construction
[0046] The present invention will be described in further detail below through specific embodiments and in conjunction with the accompanying drawings.
[0047] Existing webpage crawling generally crawls the link library through a scheduling strategy, and the crawling process follows the priority of the webpages in the webpage database. However, when the number of webpages reaches the million level, the crawler can only wait each time the list of URLs to fetch is being selected, which wastes the crawler's crawling capacity.
[0048] In order to solve the above problems, the present invention provides a dynamic-based webpage crawling method. As shown in Figure 1, the method includes the following steps:
[0049] S101. Set up at least two queues, store the URLs and priorities of the webpages to be crawled in the at least two queues, and schedule crawling according to the priorities of the URLs stored in those queues....
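Step S101 can be illustrated with a minimal sketch, under stated assumptions. The source text is truncated and does not say how URLs are assigned to queues or how the queues are scheduled, so everything below is hypothetical: the class name `MultiQueueScheduler`, the `thresholds` parameter that splits URLs into priority bands, the rule that higher-priority queues are drained first, and the `seen` set used to avoid crawling the same URL twice. Each queue is a binary heap, so enqueue and dequeue cost O(log n) rather than requiring a scan of the whole URL list.

```python
import heapq
import itertools


class MultiQueueScheduler:
    """Hypothetical scheduler that spreads URLs over several priority queues."""

    def __init__(self, thresholds):
        # thresholds: descending priority cut-offs (an assumption of this sketch).
        # [5] yields two queues: priority >= 5, and everything below 5.
        self.thresholds = list(thresholds)
        self.queues = [[] for _ in range(len(self.thresholds) + 1)]
        self.seen = set()                  # URLs already enqueued
        self._counter = itertools.count()  # tie-breaker: FIFO among equal priorities

    def _queue_index(self, priority):
        # Pick the first band whose cut-off the priority reaches.
        for i, cutoff in enumerate(self.thresholds):
            if priority >= cutoff:
                return i
        return len(self.thresholds)

    def push(self, url, priority):
        # Skip URLs that were enqueued before, so no page is crawled twice.
        if url in self.seen:
            return False
        self.seen.add(url)
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        entry = (-priority, next(self._counter), url)
        heapq.heappush(self.queues[self._queue_index(priority)], entry)
        return True

    def pop(self):
        # Serve the highest-priority non-empty queue first.
        for queue in self.queues:
            if queue:
                neg_priority, _, url = heapq.heappop(queue)
                return url, -neg_priority
        return None


scheduler = MultiQueueScheduler(thresholds=[5])
scheduler.push("http://example.com/a", 3)
scheduler.push("http://example.com/b", 9)
scheduler.push("http://example.com/c", 7)
duplicate_accepted = scheduler.push("http://example.com/b", 1)  # already seen

order = []
while (item := scheduler.pop()) is not None:
    order.append(item)
```

In this sketch `order` lists the URLs from highest to lowest priority, and the second push of `http://example.com/b` is rejected. Splitting the URLs across several smaller heaps keeps each enqueue and dequeue cheap, which is one plausible reading of the efficiency gain the method claims.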