Directional and quantitative Internet data acquisition method and system

A data collection and Internet technology, applied to network data query, network data retrieval, network data browsing optimization, and related areas. It addresses the problems of heavy resource-node occupancy, long collection time, and missed target data, and achieves lower collection-node occupancy, shorter collection time, and no missed data.

Inactive Publication Date: 2020-09-29
浪潮卓数大数据产业发展有限公司

AI Technical Summary

Problems solved by technology

[0006] The technical task of the present invention is to provide a directional and quantitative Internet data collection method and system that avoids the long collection time, heavy resource-node occupancy, and missed target data caused by a large collection range.

Examples

Embodiment 1

[0040] As shown in Figure 1, the directional quantitative Internet data collection method of the present invention sends a retrieval request to a website with a self-defined data display upper limit and offset value, obtains the associated customized retrieval result, and obtains the full data by traversing with one or a few requests. The retrieved results are then merged, structured, and saved to the database, which achieves the purpose of data collection. Here, the associated customized retrieval result refers to the response data consistent with the customized data display upper limit and offset value. Obtaining the full data through one or a few request traversals means that the number of access requests needed to obtain all the data published by the website is one, or otherwise fewer than the total number of pages the website displays by default. The details are as follows:

[004...
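The traversal described in Embodiment 1 can be illustrated with a minimal sketch. It assumes a hypothetical in-memory stand-in for the website (`query_site`), a site default of 10 records per page, and an offset that is a zero-based block index; none of these details come from the patent text.

```python
from typing import Dict, List

# Hypothetical stand-in for the target website's data: 95 records.
RECORDS: List[Dict[str, int]] = [{"id": i} for i in range(95)]
request_count = 0


def query_site(limit: int, offset: int) -> List[Dict[str, int]]:
    """Simulate one retrieval request.

    Assumption: offset is a zero-based block index, so the block
    starts at record offset * limit.
    """
    global request_count
    request_count += 1
    start = offset * limit
    return RECORDS[start:start + limit]


def collect_all(limit: int) -> List[Dict[str, int]]:
    """Traverse offsets until a short block is returned, merging the results."""
    merged: List[Dict[str, int]] = []
    offset = 0
    while True:
        block = query_site(limit, offset)
        merged.extend(block)
        if len(block) < limit:  # a short (or empty) block marks the end
            break
        offset += 1
    return merged


request_count = 0
default_result = collect_all(limit=10)   # assumed site default page size
default_requests = request_count

request_count = 0
custom_result = collect_all(limit=100)   # customized display upper limit
custom_requests = request_count

print(default_requests, custom_requests)
```

In this sketch both runs return the same records, but raising the display upper limit reduces the number of requests from one per default page to a single request.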

Embodiment 2

[0052] The directional quantitative Internet data collection system of the present invention includes,

[0053] The default parameter acquisition module is used to intercept the retrieval request or page-turning request sent to the target website, using browser developer tools or data collection tools, and to obtain the name and value of each request parameter, including the display upper limit per page and the current page number (i.e., the offset value);
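As a rough illustration of reading parameter names and values out of an intercepted request, the sketch below parses a query string with Python's standard library. The URL and the parameter names `pageSize` and `pageNum` are hypothetical; a real site may use different names or send the parameters in a request body instead.

```python
from typing import Dict
from urllib.parse import parse_qsl, urlsplit

# Hypothetical request URL, as it might be copied from browser developer tools.
intercepted_url = "https://example.com/search?keyword=data&pageSize=10&pageNum=1"


def extract_request_params(url: str) -> Dict[str, str]:
    """Return each query parameter name and its value as strings."""
    return dict(parse_qsl(urlsplit(url).query))


params = extract_request_params(intercepted_url)

# Assumed mapping: pageSize is the display upper limit, pageNum the offset.
default_limit = int(params["pageSize"])
default_offset = int(params["pageNum"])
print(params)
```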

[0054] The parameter customization module is used to manually increase the value of the display upper limit according to the total amount of target data on the website, set a reasonable offset, and divide the full data into a number of blocks smaller than the website's total number of pages. The product of the maximum offset value and the per-page display upper limit is less than or equal to the total amount of target data.
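The block arithmetic in [0054] can be sketched as follows. The function name `plan_offsets` and the sample totals are hypothetical, and the offset is assumed to be a zero-based block index, which is one reading under which the stated constraint (maximum offset times display upper limit is at most the total) always holds.

```python
import math
from typing import List


def plan_offsets(total: int, limit: int) -> List[int]:
    """Return the zero-based offsets needed to cover `total` records."""
    if total < 0 or limit <= 0:
        raise ValueError("total must be >= 0 and limit must be > 0")
    blocks = math.ceil(total / limit)
    offsets = list(range(blocks))
    # Constraint from [0054]: max offset * display upper limit <= total.
    if offsets and max(offsets) * limit > total:
        raise AssertionError("offset plan exceeds the total amount of data")
    return offsets


default_plan = plan_offsets(total=950, limit=10)   # assumed site default
custom_plan = plan_offsets(total=950, limit=200)   # customized upper limit
print(len(default_plan), custom_plan)
```

With these sample numbers, the customized limit covers the same 950 records in 5 blocks instead of 95 default pages.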

[0055] The test request sending module is used fo...

Embodiment 3

[0059] Embodiments of the present invention also provide an electronic device, including a memory and at least one processor;

[0060] wherein the memory stores computer-executable instructions;

[0061] The at least one processor executes the computer-executable instructions stored in the memory, so that the at least one processor executes the directional quantitative Internet data collection method of Embodiment 1.

Abstract

The invention discloses a directional and quantitative Internet data acquisition method and system, and belongs to the field of big data application and analysis. The technical problem to be solved by the invention is how to avoid the long acquisition time, heavy resource-node occupancy, and missed target data caused by a large acquisition range. According to the technical scheme, the method comprises the following steps: a retrieval request is sent to a website with a self-defined data display upper limit and offset value, an associated customized retrieval result is obtained, the full data is obtained by traversing with one or a few requests, and the obtained retrieval results are merged, structured, and stored in the database, which achieves the purpose of data collection. The system comprises a default parameter acquisition module, a parameter customization module, a test request sending module, a full data acquisition module, and a data processing and storage module.
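The final merge, structure, and store step might look like the sketch below. The record fields (`id`, `title`), the table name, and the use of an in-memory SQLite database are assumptions chosen for illustration; the source does not specify a schema or a database engine.

```python
import sqlite3
from typing import Dict, List


def merge_blocks(blocks: List[List[Dict]]) -> List[Dict]:
    """Merge retrieved blocks, dropping records whose id was already seen."""
    seen = set()
    merged = []
    for block in blocks:
        for record in block:
            if record["id"] not in seen:
                seen.add(record["id"])
                merged.append(record)
    return merged


def save_records(conn: sqlite3.Connection, records: List[Dict]) -> int:
    """Store structured records and return the number of rows in the table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records (id INTEGER PRIMARY KEY, title TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO records (id, title) VALUES (:id, :title)",
        records,
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]


# Hypothetical blocks, with one record repeated across two responses.
blocks = [
    [{"id": 1, "title": "a"}, {"id": 2, "title": "b"}],
    [{"id": 2, "title": "b"}, {"id": 3, "title": "c"}],
]
records = merge_blocks(blocks)
conn = sqlite3.connect(":memory:")
stored = save_records(conn, records)
print(stored)
```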

Description

Technical field

[0001] The invention relates to the field of big data application and analysis, in particular to data acquisition methods and web crawler technology in the field of data mining, and more specifically to a directional quantitative Internet data acquisition method and system.

Background technique

[0002] Today's society is developing rapidly. With the rapid development and popularization of computer and information technology, the scale of industrial application systems has expanded rapidly, and the data generated by industrial applications has exploded. People are increasingly aware of the importance of data. The concept of data has attracted widespread attention from practitioners and users in various industries. The mining and application of massive data heralds the arrival of a new wave of productivity growth and consumer surplus. With the development of government information disclosure and enterprise digitization, a large amount of valuable data can ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/953, G06F16/957, G06F16/958
CPC: G06F16/953, G06F16/9577, G06F16/958
Inventor: 邢荣, 李一峰
Owner: 浪潮卓数大数据产业发展有限公司