Method and device for obtaining proxy of web crawler

A proxy acquisition technique for web crawlers, applied in the field of big data analysis. It addresses three problems: low crawler collection efficiency, the lack of quality-based proxy tiers, and the inability to replenish the crawler's proxy queue in a timely manner.

Active Publication Date: 2018-03-23
BEIJING JINTI TECH CO LTD
AI Technical Summary

Problems solved by technology

[0005] The technical problem addressed by the present invention is that existing crawler programs do not distinguish between proxies by "quality" and cannot replenish the crawler's proxy queue in time, which results in low collection efficiency.

Examples

Detailed Description of the Embodiments

[0059] To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments, without creative effort, fall within the protection scope of the present invention.

[0060] Figure 1 is a schematic flow chart of the proxy acquisition method for a crawler program provided in this embodiment. Referring to Figure 1, the method includes:

[0061] 101: Obtain a proxy that has passed the first test as an available proxy, add the available proxy to the first queue, obtain an available proxy that has passed the...
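The opening of step 101 (test candidate proxies and enqueue the ones that pass) can be illustrated with a minimal Python sketch. The source does not say what the first test checks, so the test is passed in as a function. `fill_first_queue` and `looks_like_host_port` are hypothetical names, and the format check below is only a stand-in assumption for whatever check the patent actually uses.

```python
from collections import deque
from typing import Callable, Deque, Iterable


def fill_first_queue(
    candidates: Iterable[str],
    first_test: Callable[[str], bool],
    first_queue: Deque[str],
) -> int:
    """Append every candidate that passes first_test; return how many were added."""
    added = 0
    for proxy in candidates:
        if first_test(proxy):
            first_queue.append(proxy)
            added += 1
    return added


def looks_like_host_port(proxy: str) -> bool:
    """Assumed stand-in for the first test: accept only 'host:port' strings."""
    host, sep, port = proxy.rpartition(":")
    return bool(host) and sep == ":" and port.isdigit()


first_queue: Deque[str] = deque()
added = fill_first_queue(
    ["10.0.0.1:8080", "bad-entry", "10.0.0.2:3128"],
    looks_like_host_port,
    first_queue,
)
```

Passing the test in as a parameter keeps the queueing logic separate from the check itself, so a real connectivity probe could be substituted without changing `fill_first_queue`.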

Abstract

An embodiment of the invention discloses a method and device for obtaining proxies for a web crawler. For the proxies in a first queue, the method obtains valid proxies through a second detection. After the web crawler receives the return information for the page it crawled through a valid proxy, the evaluation parameters of that proxy are updated according to the return information. Proxies that the evaluation parameters identify as "high quality" are added to a second queue. Because "high quality" proxies are moved from the second queue to the first queue only once per first preset time period, the crawler does not fetch the same valid proxy from the first queue so frequently that the proxy is banned. The second queue thus grades proxies by "quality", and the periodic transfer from the second queue to the first queue avoids overusing "high quality" proxies, replenishes the first queue, and improves the efficiency of the crawler's information collection.
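The two-queue scheme in the abstract can be sketched in Python under some loudly stated assumptions. The source does not define the evaluation parameters or the "high quality" criterion, so this sketch assumes success and failure counts with a success-rate threshold. `TwoTierProxyPool`, `ProxyStats`, `refill_interval`, and `min_success_rate` are hypothetical names, and dropping proxies that are not promoted is a simplification rather than something the source states.

```python
import time
from collections import deque
from dataclasses import dataclass


@dataclass
class ProxyStats:
    """Assumed evaluation parameters: counts of good and bad responses."""
    successes: int = 0
    failures: int = 0

    @property
    def success_rate(self) -> float:
        attempts = self.successes + self.failures
        return self.successes / attempts if attempts else 0.0


class TwoTierProxyPool:
    def __init__(self, refill_interval=60.0, min_success_rate=0.8,
                 clock=time.monotonic):
        self.first_queue = deque()   # proxies the crawler takes from
        self.second_queue = deque()  # "high quality" proxies held back
        self.stats = {}
        self.refill_interval = refill_interval    # the "first preset time period"
        self.min_success_rate = min_success_rate  # assumed quality threshold
        self._clock = clock
        self._last_refill = clock()

    def add_available(self, proxy):
        self.stats.setdefault(proxy, ProxyStats())
        self.first_queue.append(proxy)

    def take(self):
        """Return the next proxy from the first queue, or None if it is empty."""
        self._maybe_refill()
        return self.first_queue.popleft() if self.first_queue else None

    def report(self, proxy, ok):
        """Update evaluation parameters from the crawl result; promote if good."""
        stats = self.stats.setdefault(proxy, ProxyStats())
        if ok:
            stats.successes += 1
        else:
            stats.failures += 1
        promoted = stats.success_rate >= self.min_success_rate
        if promoted and proxy not in self.second_queue:
            self.second_queue.append(proxy)

    def _maybe_refill(self):
        """Move held-back proxies to the first queue once per interval."""
        now = self._clock()
        if now - self._last_refill >= self.refill_interval:
            while self.second_queue:
                self.first_queue.append(self.second_queue.popleft())
            self._last_refill = now


# Usage with a fake clock so the interval can be advanced by hand.
now = [0.0]
pool = TwoTierProxyPool(refill_interval=10.0, clock=lambda: now[0])
pool.add_available("p1")
pool.add_available("p2")
pool.report(pool.take(), ok=True)    # p1 succeeds and is held in the second queue
pool.report(pool.take(), ok=False)   # p2 fails and is not promoted
empty_before_interval = pool.take()  # first queue is empty until the refill
now[0] = 10.0
after_interval = pool.take()         # p1 returns after the interval elapses
```

Holding promoted proxies in the second queue until the interval elapses is what rate-limits their reuse in this sketch: a good proxy cannot be taken again until the next refill, however often the crawler calls `take`.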

Description

Technical field

[0001] The embodiments of the present invention relate to the technical field of big data analysis, and in particular to a method and device for obtaining proxies for a crawler program.

Background technique

[0002] A crawler is a program or script that automatically collects information according to certain rules. With the development of the big data industry, crawler technology has become an important part of data collection. Crawler programs usually rotate through a large number of third-party proxy IPs when collecting information, so as to avoid being banned for accessing a site too frequently.

[0003] However, a large share of the third-party proxies obtained are not necessarily valid or available, and many are invalid; some proxies have slow access speeds and low collection efficiency; and sometimes the same proxies are used too frequently, which results in those proxies being blocked. In addition, the proxies provided by the proxy provider general...

Application Information

IPC(8): G06F17/30, G06F11/34
CPC: G06F11/3452, G06F16/951
Inventors: 吕光增, 柳超
Owner: BEIJING JINTI TECH CO LTD