Automatic optimized crawler grab method

An automatically optimized crawler technique, applied in special-purpose data processing, instruments, electrical digital data processing and related fields. It addresses the reduced system performance, increased resource consumption and unnecessarily high capture frequency of fixed-rate crawling, so as to optimize system resources and improve efficiency and system performance.

Active Publication Date: 2008-05-28
北京酷讯科技有限公司

AI Technical Summary

Problems solved by technology

[0005] However, the above method of using crawlers to grab online information assumes an ideal state. In practice, crawling efficiency cannot be maximized this way, because the release of new information is often extremely time-sensitive: releases are concentrated in particular periods and relatively quiet at other times. For example, sales of train tickets, air tickets and long-distance bus tickets peak every year during the winter and summer vacations and Golden Week, and other information peaks in the period around the time college graduates leave school each year.
If the same frequency is used to capture information during both peak and low release periods, the highest capture efficiency obviously cannot be achieved. For the peak release period the capture frequency is relatively low, which hurts the timeliness of the captured information; for the low release period it is relatively high, which reduces system performance and adds unnecessary resource consumption.
Until now, there has been no effective way to solve this problem.

Embodiment Construction

[0014] The present invention is described in further detail below with reference to the accompanying drawings:

[0015] Kuxun's crawler scheduling algorithm calculates the refresh rate from several observed factors: whether the index page was downloaded successfully, whether its size changed, whether the page information meets requirements, whether it contains links to valid information, the number of valid information items to be crawled, and the crawling time. The method corrects the frequency of information capture in the computer system according to the following formula.

[0016] freq(n, ch, t) = f_CH((α K_down (1 - ...

Abstract

The invention discloses a capture method for an automatically optimized crawler. Existing crawlers capture web pages at the same frequency during release peaks and release troughs, which impairs the timeliness of the captured information, reduces system efficiency and increases needless resource consumption. To solve this problem, the invention comprises the following steps: first, information is extracted from an information page captured from the Internet; if the extraction succeeds, the frequency at which that page is captured again is increased; otherwise, the frequency is decreased. Second, step one is repeated when the time given by the amended frequency is reached. The invention is applicable to various existing search engines.
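
The feedback loop described in the abstract can be sketched as follows. This is a minimal illustration and not the patented formula: the class name `AdaptiveSchedule`, the multiplicative factors and the interval bounds are all assumptions chosen for the example. The source only states that a successful extraction raises the frequency of the next capture and a failed one lowers it.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveSchedule:
    """Hypothetical per-page schedule; all names and constants are illustrative."""
    interval_s: float = 600.0        # current wait between captures (assumed start)
    min_interval_s: float = 60.0     # assumed lower bound on the wait
    max_interval_s: float = 86400.0  # assumed upper bound on the wait
    speed_up: float = 0.5            # assumed factor after a successful extraction
    slow_down: float = 2.0           # assumed factor after a failed extraction

    def update(self, extracted_new_info: bool) -> float:
        """Shorten the interval on success, lengthen it otherwise, within bounds."""
        factor = self.speed_up if extracted_new_info else self.slow_down
        self.interval_s = min(self.max_interval_s,
                              max(self.min_interval_s, self.interval_s * factor))
        return self.interval_s


def simulate(outcomes, schedule=None):
    """Apply a sequence of extraction outcomes and return the resulting intervals."""
    schedule = schedule or AdaptiveSchedule()
    return [schedule.update(ok) for ok in outcomes]


if __name__ == "__main__":
    # Two successful extractions (peak period), then three failures (quiet period).
    print(simulate([True, True, False, False, False]))
    # -> [300.0, 150.0, 300.0, 600.0, 1200.0]
```

In a real crawler, the value returned by `update` would determine when step one is repeated for that page. The bounds keep a long quiet period from pushing the wait out indefinitely and a burst of releases from driving it toward zero.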

Description

Technical field

[0001] The invention relates to a method for grabbing information with a web crawler, in particular to a method by which a search engine grabs information using crawler technology and automatically optimizes the grabbing frequency.

Background

[0002] The search engine is a technology widely used on the Internet today. By entering a few keywords for the information they are looking for, people can find a large amount of related information through search engines such as Google and Baidu.

[0003] Search engines draw information from various sources. Some of it is bid-based advertising: the advertiser pays an advertising fee to the search engine operator, who then publishes a brief description of the advertisement and a link to it in its own search engine. Much more of it is non-advertising information, such as news and academi...

Application Information

IPC(8): G06F17/30
Inventor: 陈华
Owner: 北京酷讯科技有限公司