A method, storage medium and server for crawling website data

A technique for crawling website data, applied in the field of data processing, that addresses problems such as a crawler system being unable to access a website.

Active Publication Date: 2021-02-05
ONE CONNECT SMART TECH CO LTD SHENZHEN
2 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

Crawler systems are currently one of the main ways to obtain data efficiently. However, many websites block crawlers by requiring a verification code to be entered, so the crawler cannot access these websites or complete data crawling.

Embodiment Construction

[0030] Embodiments of the present invention provide a method, storage medium and server for crawling website data. They address the problem that many websites block crawler systems by requiring a verification code to be entered, which leaves the crawler unable to crawl data.

[0031] To make the purpose, features and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The embodiments described below are only some, not all, embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.

[0032] See Figure 1, ...

Abstract

The invention discloses a method, a storage medium and a server for crawling website data, which address the problem that many websites block crawler systems by requiring a verification code to be entered, leaving the crawler unable to crawl data. The method includes: initiating an access request to the target website from which data is to be crawled; after receiving feedback information that the target website requires input of a verification code, obtaining from the target website the target verification code picture corresponding to that feedback information; feeding the target verification code picture into a pre-trained machine learning model for recognition and obtaining the verification code answer output by the model; performing the verification operation required by the target website by entering the verification code according to the output answer; and crawling data from the target website after passing its verification.
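The control flow described in the abstract can be sketched as follows. This is only an illustration, not the patent's implementation: `Response`, `FakeSite`, `crawl`, `toy_model` and the `max_attempts` retry limit are all hypothetical names introduced here. The site is an in-memory stand-in so the example runs without network access, and the "model" simply decodes the fake image bytes where a real system would call a trained recognizer.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Response:
    """Hypothetical response: page body, plus a CAPTCHA image if challenged."""
    body: str
    captcha_image: Optional[bytes] = None


class FakeSite:
    """In-memory stand-in for a target website (assumption, not a real API)."""

    def __init__(self, answer: str, page: str) -> None:
        self._answer = answer
        self._page = page
        self._verified = False

    def request(self) -> Response:
        if self._verified:
            return Response(body=self._page)
        # Feedback that a verification code must be entered.
        return Response(body="", captcha_image=self._answer.encode())

    def submit(self, answer: str) -> bool:
        self._verified = answer == self._answer
        return self._verified


def crawl(site: FakeSite, recognize: Callable[[bytes], str],
          max_attempts: int = 3) -> str:
    """Request the page, solving verification challenges with `recognize`."""
    response = site.request()                  # initiate the access request
    attempts = 0
    while response.captcha_image is not None:  # site demands a verification code
        if attempts == max_attempts:
            raise RuntimeError("verification failed after retries")
        attempts += 1
        answer = recognize(response.captcha_image)  # model outputs the answer
        site.submit(answer)                    # perform the verification operation
        response = site.request()              # retry the access request
    return response.body                       # verified: crawl the data


def toy_model(image: bytes) -> str:
    """Placeholder for the pre-trained model: the fake image encodes its answer."""
    return image.decode()
```

Calling `crawl(FakeSite("7G4K", "<html>data</html>"), toy_model)` returns the page body after one solved challenge. A recognizer that keeps answering wrongly exhausts `max_attempts` and raises `RuntimeError`. Passing the recognizer in as a callable keeps the crawl loop independent of whichever model is actually trained.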

Description

Technical field

[0001] The invention relates to the technical field of data processing, in particular to a method for crawling website data, a storage medium and a server.

Background

[0002] In the Internet environment, data is a very important asset. Crawler systems are currently one of the main ways to obtain data efficiently. However, many websites block crawlers by requiring a verification code to be entered, so the crawler cannot access these websites or complete data crawling.

Summary of the invention

[0003] The embodiments of the present invention provide a method, a storage medium and a server for crawling website data, which can automatically complete the verification required by the target website, overcome the obstacles the website places in the way of data crawling, and enable the crawler system to crawl the data on the website smoothly.

[0004] In a first aspect, a method for crawling website data is provided, including:

[0005] Initiate an...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/953, G06K9/62
CPC: G06F16/951, G06F18/2411, G06F18/214
Inventors: 李晨光, 王盼
Owner: ONE CONNECT SMART TECH CO LTD SHENZHEN