Data crawling method and system

A data crawling method and system that addresses the complexity and rising cost of crawler programs when websites deploy anti-crawling measures.

Inactive Publication Date: 2016-09-07
ZHUHAI GOTECH INTELLIGENT TECH

AI Technical Summary

Problems solved by technology

If a webpage uses many anti-crawling measures, the crawler program becomes very complicated, which directly increases the cost of crawling.


Embodiment Construction

[0036] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of the present invention.

[0037] The embodiment of the invention discloses a data crawling method and system for crawling data at low cost. Note that the method in this embodiment first requires installing Selenium, a tool for testing web applications. Selenium tests run directly in the browser, just as a real user would operate it. Selenium supports the current mainstream browsers and runs on multiple platforms.
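As a minimal sketch of how page source might be obtained through a real browser, the following assumes the Selenium Python bindings and a Chrome driver are installed. The helper names `open_browser` and `fetch_page_source` are hypothetical and do not come from the patent text; only `webdriver.Chrome`, `get`, and `page_source` are Selenium calls.

```python
def open_browser():
    """Start a browser session (assumes Selenium and a Chrome driver are installed)."""
    # Imported lazily so that fetch_page_source below has no hard dependency.
    from selenium import webdriver
    return webdriver.Chrome()


def fetch_page_source(driver, url):
    """Load `url` in the browser behind `driver` and return the page's HTML."""
    driver.get(url)            # the browser issues the request, like a real user
    return driver.page_source  # HTML as currently held by the browser
```

A typical session would call `open_browser()` once, pass the returned driver to `fetch_page_source` for each address, and call `driver.quit()` when finished.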

[0038] Referring to Figure 1, a data crawling method provi...

Abstract

The invention discloses a data crawling method and system. The method comprises the following steps: obtaining a target website address from a URL queue and obtaining the source code of the target website; saving the source code into an HTML queue and parsing the final data of the target website from that source code; determining whether the source code contains a URL; and, if it does, extracting the URL from the source code and saving it into the URL queue. In this embodiment, source code is obtained from the pre-stored website addresses in the URL queue, URLs extracted from the source code are added back to the URL queue, and the final data is parsed from the source code held in the HTML queue. Because pages are accessed through a browser, anti-crawling measures are bypassed, the specified information can be obtained, data is crawled quickly, and the cost of crawling is reduced.
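The loop described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the names `crawl`, `LinkExtractor`, `fetch`, and `parse` are hypothetical, `fetch` stands in for whatever browser-based retrieval is used, and the `seen` set and `max_pages` limit are additions that keep the sketch from looping forever.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, fetch, parse, max_pages=100):
    """Run the URL-queue / HTML-queue loop and return the parsed results.

    fetch(url) -> page source; parse(url, source) -> final data (both supplied by the caller).
    """
    url_queue = deque(seed_urls)   # addresses waiting to be visited
    html_queue = deque()           # page sources waiting to be parsed
    seen = set(seed_urls)          # assumption: skip addresses already queued
    results = []
    pages = 0

    while url_queue and pages < max_pages:
        url = url_queue.popleft()
        source = fetch(url)
        pages += 1
        html_queue.append((url, source))

        # If the source contains URLs, extract them and put them in the URL queue.
        extractor = LinkExtractor()
        extractor.feed(source)
        for href in extractor.links:
            link = urljoin(url, href)
            if link not in seen:
                seen.add(link)
                url_queue.append(link)

        # Parse the final data from the sources held in the HTML queue.
        while html_queue:
            page_url, page_source = html_queue.popleft()
            results.append(parse(page_url, page_source))

    return results
```

Keeping `fetch` and `parse` as parameters lets the same loop run against a browser-backed fetcher or against canned pages; in a concurrent design the two queues could be drained by separate workers instead of in one loop.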

Description

Technical field

[0001] The present invention relates to the technical field of data crawling, and more specifically, to a data crawling method and system.

Background technique

[0002] In the daily work of Web front-end development, it is often necessary to collect a large amount of information from the Internet. If it is done manually, it will consume a lot of manpower and time, so a better way is to write crawler scripts to help us complete the information collection. The crawler program will always send http requests to the server, and the server needs to receive these requests, process them accordingly, and finally return the data. However, crawlers can also use this principle to maliciously attack the server, using multiple programs to send http requests to the same server at the same time, causing the server to be busy with processing, thereby reducing server performance and affecting server stability. Therefore, some servers take measures to prevent their content fr...

Application Information

IPC(8): G06F17/30
CPC: G06F16/951
Inventor 祝奔
Owner ZHUHAI GOTECH INTELLIGENT TECH