Cascade crawling method and device for multi-level pages based on web crawlers
The invention relates to web crawler and page technology, applied in the field of data crawling. It addresses three problems: data that is difficult to access, failure to restore the original data hierarchy, and task identifiers that do not reflect hierarchical relationships. Its effect is to ensure data integrity and accuracy.
Examples
Embodiment 1
[0030] As shown in Figure 1, a web crawler-based cascade crawling method for multi-level pages includes the following steps: crawl the upper-level page data and store the captured data in an upper-level page data analysis table; in that table, set a primary key value for each object whose lower-level page must also be crawled. Each primary key value is unique, so no two objects share the same value. The primary key value identifies the upper-level page on which the object is located and associates the lower-level page with it; follow the URL link on the upper-level page, access the lower-level page through crawler simulation, capture the lower-level page data, and store it in a lower-level page data analysis table; and in the lower-level page data analysis table, set the foreign key value that associates the upper-level pa...
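The steps in this embodiment can be sketched in Python with the standard-library sqlite3 module. This is only an illustration under stated assumptions, not the patent's implementation: the table names upper_page and lower_page, the column names, the use of a UUID as the primary key value, and the in-memory FAKE_SITE dictionary that stands in for real page access are all hypothetical.

```python
import sqlite3
import uuid

# Hypothetical in-memory "site" (URL -> parsed page content).
# It stands in for real HTTP access so the sketch runs offline.
FAKE_SITE = {
    "https://example.com/list": {
        "items": [
            {"title": "Item A", "url": "https://example.com/a"},
            {"title": "Item B", "url": "https://example.com/b"},
        ]
    },
    "https://example.com/a": {"detail": "Details for A"},
    "https://example.com/b": {"detail": "Details for B"},
}


def fetch(url):
    """Stand-in for the crawler's simulated page access (assumption)."""
    return FAKE_SITE[url]


def cascade_crawl(conn, list_url):
    """Crawl one upper-level page and the lower-level page of each object."""
    # Upper-level page data analysis table: one row per object, unique primary key.
    conn.execute(
        "CREATE TABLE upper_page (pk TEXT PRIMARY KEY, title TEXT, url TEXT)"
    )
    # Lower-level page data analysis table: upper_pk is the foreign key
    # that points back to the object on the upper-level page.
    conn.execute(
        "CREATE TABLE lower_page ("
        " id INTEGER PRIMARY KEY,"
        " upper_pk TEXT NOT NULL REFERENCES upper_page(pk),"
        " detail TEXT)"
    )
    for item in fetch(list_url)["items"]:
        pk = uuid.uuid4().hex  # unique primary key value for this object
        conn.execute(
            "INSERT INTO upper_page (pk, title, url) VALUES (?, ?, ?)",
            (pk, item["title"], item["url"]),
        )
        # Follow the object's URL link to its lower-level page.
        lower = fetch(item["url"])
        conn.execute(
            "INSERT INTO lower_page (upper_pk, detail) VALUES (?, ?)",
            (pk, lower["detail"]),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
cascade_crawl(conn, "https://example.com/list")
```

Because each lower-level row carries the primary key value of its parent object, the two tables stay linked even if the pages are crawled at different times or by different tasks.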
Embodiment 2
[0042] A web crawler-based cascade crawling device for multi-level pages includes a microprocessor and a memory. A program is stored in the memory, and the microprocessor runs the program to perform the following steps: crawl the upper-level page data and store the captured data in an upper-level page data analysis table; in that table, set a primary key value for each object whose lower-level page must also be crawled. The primary key value is unique. The primary key value identifies the upper-level page on which the object is located and associates the lower-level page with it; follow the URL link on the upper-level page, access the lower-level page through crawler simulation, capture the lower-level page data, and store it in a lower-level page data analysis table; and set the foreign key value used to associate the upper-level page with the lower-level page dat...
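One way to see why the primary key and foreign key association matters is to rebuild the page hierarchy from the two tables after crawling. The following Python sketch uses sqlite3 and assumes a hypothetical schema (upper_page with column pk, lower_page with column upper_pk) and hand-written sample rows; none of these names or values come from the patent text.

```python
import sqlite3


def build_sample_tables(conn):
    """Create two analysis tables with hypothetical sample rows."""
    conn.executescript(
        """
        CREATE TABLE upper_page (pk TEXT PRIMARY KEY, title TEXT);
        CREATE TABLE lower_page (
            id INTEGER PRIMARY KEY,
            upper_pk TEXT NOT NULL REFERENCES upper_page(pk),
            detail TEXT
        );
        INSERT INTO upper_page VALUES ('k1', 'Item A'), ('k2', 'Item B');
        INSERT INTO lower_page (upper_pk, detail) VALUES
            ('k1', 'A page 1'), ('k1', 'A page 2');
        """
    )


def restore_hierarchy(conn):
    """Map each upper-level object to the lower-level data linked to it."""
    rows = conn.execute(
        """
        SELECT u.title, l.detail
        FROM upper_page AS u
        LEFT JOIN lower_page AS l ON l.upper_pk = u.pk
        ORDER BY u.pk, l.id
        """
    )
    tree = {}
    for title, detail in rows:
        children = tree.setdefault(title, [])
        # A NULL detail means no lower-level page was stored for this object.
        if detail is not None:
            children.append(detail)
    return tree
```

The LEFT JOIN keeps upper-level objects that have no lower-level rows yet, so an empty list in the result flags an object whose lower-level page still needs to be crawled.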