Webpage data capturing method and device, storage medium and equipment

A web page data capture technology, applied in the computer field, that addresses problems such as interrupted web page data capture, inconvenient handling of cookie sharing across domain names, and weak redirection between domain names.

Pending Publication Date: 2020-08-18
BEIJING MININGLAMP SOFTWARE SYST CO LTD
6 Cites, 8 Cited by

AI Technical Summary

Problems solved by technology

Under a complex login process such as Single Sign On (SSO), current web crawlers such as the Scrapy framework are weak at following redirects between different domain names, and they handle the sharing of cookies (data stored on the user's local terminal) across different domain names poorly. When a crawler encounters content that requires user authentication and authorization, or triggers the website's anti-robot detection, it stops capturing web page data.



Examples


Embodiment Construction

[0024] The technical solutions in the embodiments of the present application will be clearly and completely described below in conjunction with the drawings in the embodiments of the present application.

[0025] An embodiment of the present application provides a method for capturing web page data. Figure 1 is a schematic flow chart of the web page data capture method in the embodiment of the present application. As shown in Figure 1, the method may include:

[0026] Step 101: After the target web page has been successfully logged in to through the headless browser, if the headless browser monitors an AJAX (Asynchronous JavaScript and XML) request of the target web page, the headless browser stores the authorization authentication information carried in the monitored AJAX request in the cache;

[0027] Step 102: Read the authorization authentication information from the cache through the data capture script, and add th...
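Step 101 above can be sketched as a request listener, as a minimal illustration under stated assumptions rather than the applicant's implementation. The source does not say which fields carry the authorization authentication information or what the cache is, so the header names in AUTH_HEADER_NAMES, the resource-type labels in AJAX_RESOURCE_TYPES, the function name store_auth_from_request, and the plain dict used as the cache are all hypothetical. In a real setup, a function like this would be called from the headless browser's request event with the request's type and headers.

```python
from typing import Dict, Mapping

# Assumption: these request headers are what counts as "authorization
# authentication information". The source does not name specific fields.
AUTH_HEADER_NAMES = ("authorization", "cookie")

# Assumption: labels a headless browser might use for AJAX traffic.
AJAX_RESOURCE_TYPES = ("xhr", "fetch")


def store_auth_from_request(resource_type: str,
                            headers: Mapping[str, str],
                            cache: Dict[str, str]) -> bool:
    """Copy auth headers from a monitored AJAX request into the cache.

    Returns True if at least one header was stored, False otherwise.
    """
    # Ignore page loads, images, scripts, and other non-AJAX requests.
    if resource_type not in AJAX_RESOURCE_TYPES:
        return False

    stored = False
    for name, value in headers.items():
        key = name.lower()  # header names are case-insensitive
        if key in AUTH_HEADER_NAMES:
            cache[key] = value  # newer values overwrite older ones
            stored = True
    return stored


# A plain dict stands in for the cache here (hypothetical choice); a
# shared store would be needed if the browser and the data capture
# script run in separate processes.
auth_cache: Dict[str, str] = {}
```

Overwriting on every monitored request keeps the cache aligned with the most recent credentials the page itself is sending, which matters if the site rotates tokens during the session.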



Abstract

The embodiment of the invention provides a web page data capturing method and device, a storage medium and equipment. The method comprises the following steps: after a target web page is successfully logged in to through a headless browser, if the headless browser monitors an AJAX request of the target web page, the headless browser stores the authorization authentication information carried in the monitored AJAX request in a cache; the data capture script reads the authorization authentication information from the cache and adds it to an access request for capturing web page data; and, based on the access request containing the authorization authentication information, the data capture script captures the web page data that the server returns once authentication based on that information has passed. In this way, the web page data can be captured effectively.
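The data capture script's side of the method (read from the cache, add the information to the access request, capture what the server returns) might look like the sketch below. It is an illustration only: the source names no HTTP library, so Python's standard urllib.request is a swapped-in choice, and the function names build_capture_request and capture, the dict-shaped cache of header name to value, and the example URL are all assumptions.

```python
import urllib.request
from typing import Mapping


def build_capture_request(url: str,
                          cache: Mapping[str, str]) -> urllib.request.Request:
    """Build an access request that carries the cached auth information.

    `cache` is assumed to map header names to values, as written by the
    headless browser side. The source does not specify this layout.
    """
    if not cache:
        # Nothing has been cached yet, so the server would reject us.
        raise LookupError("no authorization authentication information cached")
    return urllib.request.Request(url, headers=dict(cache))


def capture(url: str, cache: Mapping[str, str],
            timeout: float = 10.0) -> bytes:
    """Send the access request and return the body the server sends back."""
    request = build_capture_request(url, cache)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Hypothetical cache contents and URL, for illustration only.
    demo_cache = {"authorization": "Bearer example-token"}
    demo_request = build_capture_request("https://example.com/api/data",
                                         demo_cache)
    print(demo_request.header_items())
```

Building the request separately from sending it makes the header handling checkable without network access, and leaves the login flow, redirects, and cookie handling entirely to the headless browser.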

Description

Technical field

[0001] The present application relates to the field of computer technology, and in particular to a method, device, storage medium and equipment for capturing web page data.

Background technique

[0002] With the rapid development of Internet information technology, more and more websites have emerged. Data captured from a website can serve as the basis for further processing. For example, data services such as data analysis and intelligent recommendation can be provided by capturing the relevant data of a website and then analyzing and processing it.

[0003] At present, large-scale web application clusters often use Single Sign On (SSO) for user authentication. With this authentication method, a user logs in only once with the same account and can then access the other mutually trusted application systems. However, under such a complex login process, current web crawlers such as Scrapy (crawling) framework...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/951, G06F21/52
CPC: G06F16/951, G06F21/52
Inventor: 张雨, 张树强
Owner: BEIJING MININGLAMP SOFTWARE SYST CO LTD