Webpage crawling cycle adjusting method and device

A webpage crawling cycle adjustment technology, applied in fields such as special data processing applications, instruments, and electrical digital data processing. It addresses problems such as wasted resources, excessively high crawl frequency, and crawl cycles that fail to reflect how often the target webpage is actually updated.

Active Publication Date: 2013-05-08
人民数据管理(北京)有限公司
5 Cites, 37 Cited by

AI Technical Summary

Problems solved by technology

In this case, a new link on the target webpage does not necessarily mean that the target webpage itself has been updated.
If the crawl cycle of the target webpage is adjusted only according to the number and proportion of newly added links, the resulting cycle may not match the actual update frequency of the target webpage, leading to an excessively high crawl frequency and wasted resources.
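
The problem can be illustrated with a minimal sketch, assuming a naive metric that counts every newly seen link equally. The function name `new_link_ratio` and the example URLs are hypothetical and do not come from the patent text.

```python
def new_link_ratio(previous_links, current_links):
    """Share of links in the current crawl that were absent from the previous crawl."""
    current = set(current_links)
    if not current:
        return 0.0
    return len(current - set(previous_links)) / len(current)


# Hypothetical crawl results: the site's own article links are unchanged,
# but two links pointing to another site appeared on the page.
previous = [
    "https://news.example.com/a/1.html",
    "https://news.example.com/a/2.html",
]
current = previous + [
    "https://ads.example.net/promo?id=7",
    "https://ads.example.net/promo?id=8",
]

ratio = new_link_ratio(previous, current)
print(ratio)
```

Half of the links count as new here even though the page published no new content of its own, so a cycle driven only by this ratio would be shortened without cause.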



Embodiment Construction

[0050] The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, but not all of the embodiments. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present application.

[0051] The present application may be used in numerous general purpose or special purpose computing device environments or configurations. For example: personal computers, server computers, handheld or portable devices, tablet-type devices, multi-processor devices, distributed computing environments including any of the above, and the like.

[0052] The application may be described in the general context of computer-executable instructio...


Abstract

The invention provides a method and device for adjusting a webpage crawling cycle. The method comprises: acquiring the link set contained in a target webpage currently crawled by a web crawler, together with the information pages pointed to by the links in that set; determining the newly generated links in the link set that belong to the target webpage, and taking, as links to be analyzed, those newly generated links that belong to the same website as a first website, wherein the webpage address in the breadcrumb navigation of the information page pointed to by each link to be analyzed is the same as a first webpage address; and adjusting the crawling cycle of the target webpage according to the links to be analyzed found in the target webpage at different crawling moments within a specified time period. The method improves the accuracy of the determined crawling cycle and reduces wasted resources.
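
The filtering and adjustment steps in the abstract might be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation. The names `InfoPage`, `links_to_analyze`, and `adjust_cycle` are hypothetical, a breadcrumb match is assumed to mean that the first webpage address appears among the addresses in the information page's breadcrumb navigation, and the halve-or-double rule with its 0.8 and 0.2 thresholds is invented for the example because the abstract does not state the adjustment formula.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class InfoPage:
    url: str  # address of the information page that a new link points to
    breadcrumb_urls: list = field(default_factory=list)  # addresses in its breadcrumb navigation


def links_to_analyze(new_pages, first_site, first_page_url):
    """Keep new links on `first_site` whose breadcrumb includes `first_page_url`."""
    kept = []
    for page in new_pages:
        same_site = urlparse(page.url).netloc == first_site
        if same_site and first_page_url in page.breadcrumb_urls:
            kept.append(page.url)
    return kept


def adjust_cycle(cycle_s, counts_per_crawl, min_s=300, max_s=86400):
    """Hypothetical rule: halve the cycle when most crawls in the window found
    links to analyze, double it when few did, and clamp to [min_s, max_s]."""
    if not counts_per_crawl:
        return cycle_s
    hit_rate = sum(1 for c in counts_per_crawl if c > 0) / len(counts_per_crawl)
    if hit_rate > 0.8:
        cycle_s = cycle_s // 2
    elif hit_rate < 0.2:
        cycle_s = cycle_s * 2
    return max(min_s, min(max_s, cycle_s))


# Hypothetical new links found in one crawl of a section page.
pages = [
    InfoPage("https://news.example.com/a/3.html",
             ["https://news.example.com/", "https://news.example.com/a/"]),
    InfoPage("https://ads.example.net/promo", []),
    InfoPage("https://news.example.com/b/9.html",
             ["https://news.example.com/", "https://news.example.com/b/"]),
]
kept = links_to_analyze(pages, "news.example.com", "https://news.example.com/a/")
print(kept)
print(adjust_cycle(3600, [1, 2, 1, 3]))
```

Only the first link survives the filter: the second is on another site, and the third belongs to a different section according to its breadcrumb. Counting only such links per crawl is what keeps unrelated links from shortening the cycle.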

Description

Technical field

[0001] The present application relates to the technical field of network information processing, and in particular to a method and device for adjusting a webpage crawling cycle.

Background

[0002] A web crawler is a program that automatically extracts web pages and is an important part of search engines. The crawler crawls the webpage information of a webpage, and the crawled information is analyzed to determine whether a new link has been generated in the webpage, and hence whether new content (that is, the actual page content pointed to by the link) has been generated, so that changes to the webpage can be monitored.

[0003] Under normal circumstances, the web crawler crawls the webpage information of the target webpage at fixed time intervals according to the crawling cycle of the target webpage, but if the crawling frequency of the target webpage is too low, it is likely to miss the target webpage. Correspo...


Application Information

IPC(8): G06F17/30
Inventor 崔世起, 杨青
Owner 人民数据管理(北京)有限公司