
Method of adopting proxy IP to crawl website data, storage medium and server

A technology for crawling website data through proxy IPs. It addresses the problem of proxy IPs being blacklisted during website data crawling, with the effect of ensuring reliable crawling.

Active Publication Date: 2018-07-31
ONE CONNECT SMART TECH CO LTD SHENZHEN
2 Cites, 20 Cited by

AI Technical Summary

Problems solved by technology

[0004] However, repeatedly using the same proxy IP to crawl website data will also cause the website to blacklist that proxy IP, which is extremely unfavorable to crawling website data.



Embodiment Construction

[0032] The embodiment of the present invention provides a method, a storage medium and a server for crawling website data by proxy IP, which are used to solve the problem that the proxy IP is easily blocked when crawling website data.

[0033] To make the purpose, features and advantages of the present invention clearer and easier to understand, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

[0034] Referring to Figure 1, in an embodiment of the present invention, an embodiment of a method for crawling ...


Abstract

The invention discloses a method of crawling website data through a proxy IP, a storage medium and a server, which are used to solve the problem that proxy IPs are easily blocked when website data is crawled. The method includes: determining a target website whose data is to be crawled; determining, according to attribute information of the target website, an IP class suitable for the target website, wherein, for each preset IP class, the attribute information of the websites for which that class is suitable is preset; screening out, from an IP address pool, all proxy IPs that belong to the determined IP class, wherein the IP address pool contains multiple mutually distinct proxy IPs that are collected and classified in advance; selecting one proxy IP from the screened-out proxy IPs; and using the selected proxy IP to access the target website and execute the current data crawling task.
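The steps in the abstract (map website attributes to a preset IP class, screen the pre-classified pool by that class, pick one proxy) can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: the class names, the `CLASS_RULES` table, the `ProxyIP` record, and the single-string "site attribute" are all hypothetical, and because the abstract does not say how the one proxy is chosen from the screened set, a uniform random choice is used as a stand-in.

```python
import random
from dataclasses import dataclass

# Hypothetical preset table: for each IP class, the website attributes it suits.
CLASS_RULES = {
    "residential": {"e-commerce", "social"},
    "datacenter": {"news", "blog"},
}


@dataclass(frozen=True)
class ProxyIP:
    address: str   # "host:port"; placeholder documentation addresses below
    ip_class: str  # class assigned when the pool was pre-classified


def determine_ip_class(site_attribute: str) -> str:
    """Return the preset IP class whose rule lists this website attribute."""
    for ip_class, attributes in CLASS_RULES.items():
        if site_attribute in attributes:
            return ip_class
    raise LookupError(f"no preset IP class suits attribute {site_attribute!r}")


def screen_pool(pool: list[ProxyIP], ip_class: str) -> list[ProxyIP]:
    """Keep only the proxies in the pool that belong to the given class."""
    return [proxy for proxy in pool if proxy.ip_class == ip_class]


def select_proxy(site_attribute: str, pool: list[ProxyIP], rng=random) -> ProxyIP:
    """Classify the site, screen the pool, then pick one candidate.

    Assumption: random choice; the source does not specify the selection rule.
    """
    ip_class = determine_ip_class(site_attribute)
    candidates = screen_pool(pool, ip_class)
    if not candidates:
        raise LookupError(f"pool has no proxies of class {ip_class!r}")
    return rng.choice(candidates)
```

For example, with a pool containing one residential and two datacenter proxies, `select_proxy("news", pool)` always returns one of the two datacenter entries. Passing a seeded `random.Random` as `rng` makes the choice reproducible in tests.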

Description

Technical Field

[0001] The invention relates to the technical field of data processing, and in particular to a method, a storage medium and a server for crawling website data by proxy IP.

Background Art

[0002] In the Internet environment, data is a very important asset, and a crawler system is one of the important means of obtaining data effectively.

[0003] At present, many websites adopt anti-crawler technology. A crawler that uses the same IP to crawl website data is often identified by the website and blacklisted, so that the website data can no longer be crawled. In this situation, using a proxy IP to access the website is one of the effective ways to continue obtaining its data.

[0004] However, repeatedly using the same proxy IP to crawl website data will also cause the website to blacklist that proxy IP, which is extremely unfavorable to crawling website data.

Contents of the Invention

[0005] Embodi...
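Paragraph [0003] notes that routing requests through a proxy IP lets a crawler keep reaching a site. With Python's standard library this can be sketched with `urllib.request.ProxyHandler`; the helper names `build_proxy_opener` and `fetch_via_proxy` are hypothetical, and the address used below is a placeholder from a documentation IP range rather than a real proxy.

```python
import urllib.request


def build_proxy_opener(proxy_address: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS requests through proxy_address.

    HTTPS traffic is tunnelled through the same plain-HTTP proxy (CONNECT),
    which is the common setup for forward proxies.
    """
    proxies = {
        "http": f"http://{proxy_address}",
        "https": f"http://{proxy_address}",
    }
    # Passing our own ProxyHandler replaces the default environment-based one.
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))


def fetch_via_proxy(url: str, proxy_address: str, timeout: float = 10.0) -> bytes:
    """Fetch url through the given proxy and return the raw response body."""
    opener = build_proxy_opener(proxy_address)
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

A crawler would call `fetch_via_proxy(url, proxy.address)` with a proxy chosen per task; building a fresh opener per call keeps each request bound to exactly one proxy, which matters when proxies are rotated to avoid the blacklisting described in [0004].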


Application Information

IPC(8): G06F17/30
CPC: G06F16/951, G06F16/955
Inventors: 李晨光 (Li Chenguang), 王盼 (Wang Pan)
Owner: ONE CONNECT SMART TECH CO LTD SHENZHEN