Website asynchronous sequence data intelligent acquisition method based on dynamic self-adaption

A dynamically self-adaptive sequence data acquisition technology, applied to network data indexing, network data retrieval, and other database retrieval. It addresses the high cost and low efficiency of the rule engine, prevents the data from being analyzed, and ensures the continuity and security of acquisition.

Pending Publication Date: 2022-04-08
THE 28TH RES INST OF CHINA ELECTRONICS TECH GROUP CORP +1

AI Technical Summary

Problems solved by technology

The advantage of the rule-based approach is that the rules are clear and reliable, and they can be set in real time according to the characteristics of the crawlers that have been detected, so as to counter those crawlers. However, as the rules grow more complex and more numerous, the rule engine becomes less efficient and more costly to run.
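The cost problem described above can be illustrated with a minimal sketch of a rule-based request filter. This is not the patent's rule engine: the `Rule` class, the `first_matching_rule` function, the two example rules, and the request fields (`user_agent`, `requests_per_minute`) are all hypothetical names chosen for illustration. The sketch only shows that a request matching no rule has to be checked against every rule, so per-request work grows with the size of the rule set.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

Request = Dict[str, Any]  # hypothetical request record, for illustration only


@dataclass
class Rule:
    name: str
    matches: Callable[[Request], bool]


def first_matching_rule(rules: List[Rule], request: Request) -> Tuple[Optional[str], int]:
    """Return (name of the first matching rule or None, number of rules evaluated)."""
    evaluated = 0
    for rule in rules:
        evaluated += 1
        if rule.matches(request):
            return rule.name, evaluated
    # No rule matched: every rule in the list was evaluated.
    return None, evaluated


# Hypothetical rules; the thresholds are assumptions, not values from the source.
rules = [
    Rule("empty_user_agent", lambda r: not r.get("user_agent")),
    Rule("too_many_requests", lambda r: r.get("requests_per_minute", 0) > 120),
]

normal = {"user_agent": "Mozilla/5.0", "requests_per_minute": 10}
suspect = {"user_agent": "", "requests_per_minute": 10}

print(first_matching_rule(rules, normal))
print(first_matching_rule(rules, suspect))
```

In this sketch, ordinary traffic is the expensive case, because a request that matches nothing is evaluated against the whole list. That is consistent with the observation that a large, complex rule set makes the engine slower and more costly.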

Embodiment Construction

[0056] Account registration and keep-alive technology based on user behavior learning provides automatic updating and dynamic keep-alive of the collection resource account pool. A crawler scheduling model based on learning users' Internet access behavior provides intelligent allocation and scheduling of crawler clusters and IP resources, which improves the concealment of the crawlers and keeps them highly resistant to anti-crawling measures while the website monitors for abnormal access. Restricted-content access technology that does not rely on hacking or intrusion, together with automatic allocation from the account pool, ensures that the crawlers can smoothly collect the resources of the target website. Intelligent automatic compilation technology and adaptive matching and extraction of web page elements achieve high-precision identification of target website resources.

[0057] Step 1. Crawler scheduling based on user Internet access behavior learning: By using t...

Abstract

The invention provides an intelligent acquisition method for website asynchronous sequence data based on dynamic self-adaptation. Building on an existing crawler cluster, account pool, and IP pool, it targets common anti-crawling technologies such as AJAX dynamic loading, dynamic cookie access restriction, system security protection, abnormal access behavior monitoring, and abnormal account monitoring. A dynamic self-adaptive intelligent acquisition system collects data from the target mainstream website in real time, and secure return of data from distributed web crawlers provides continuous acquisition and hidden transmission of the target website's data, laying a foundation for further data analysis.

Description

Technical field

[0001] The invention relates to the field of data collection, in particular to an intelligent collection method for website asynchronous sequence data based on dynamic self-adaptation.

Background

[0002] With the open and explosive growth of the Internet, data has become more and more valuable, especially for e-commerce, media, social networking, and similar businesses. It is no exaggeration to compare data to gold. As a result, web crawler technology was born. Hackers can obtain valuable data in batches by calling the free interfaces that a website opens, and use it for data mining and analysis of industry conditions. However, a large number of illegal crawlers puts huge pressure on the website server and can even affect access for normal users, and the theft of valuable data also harms the commercial interests of the website.

[0003] Therefore, anti-crawler technology came into being. Anti-crawler technology g...

Application Information

IPC(8): G06F16/951, G06F16/955, G06F12/12, G06F9/48
Inventor: 梁增玉, 卜华奇, 贺成龙, 丁灿, 顾学海, 刘蛰, 张志垚, 尹晓阳, 吴嘉逸, 刘佳林
Owner: THE 28TH RES INST OF CHINA ELECTRONICS TECH GROUP CORP