Data acquisition method and device, electronic device and storage medium

A data acquisition technology involving electronic equipment, applied in the field of web crawlers, which addresses the low efficiency of data crawling and improves crawling efficiency.

Status: Inactive. Publication date: 2019-06-25
BEIJING DAJIA INTERNET INFORMATION TECH CO LTD
Cites: 5; cited by: 2

AI Technical Summary

Problems solved by technology

[0003] Some related web crawlers crawl network data on a single device, so their crawling efficiency is low.


Examples


Embodiment approach

[0110] As an embodiment, the device further includes:

[0111] The conduit module is configured to perform structured processing on the data crawled by the crawling module to obtain structured data, and to store the structured data in the database of the cluster.
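The structuring-and-storage step can be sketched in Python as follows. Everything concrete here is an assumption made for illustration: the `PageRecord` fields, the raw format (title on the first line, body after it), the helper names `structure`, `store` and `conduit_module`, and the use of an in-memory `sqlite3` database in place of the cluster's database. The patent text specifies no schema and no particular database.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class PageRecord:
    """Hypothetical preset structure for one crawled page."""
    url: str
    title: str
    body: str


def structure(url, raw):
    """Structured processing: map raw crawled text to a PageRecord.

    Assumes (hypothetically) the title is on the first line and the body follows.
    """
    title, _, body = raw.partition("\n")
    return PageRecord(url=url, title=title.strip(), body=body.strip())


def store(conn, record):
    """Persist one record; sqlite3 is only a stand-in for the cluster's database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT, body TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, title, body) VALUES (?, ?, ?)",
        (record.url, record.title, record.body),
    )
    conn.commit()


def conduit_module(conn, url, raw):
    """Structure the crawled data, store it, and return the structured record."""
    record = structure(url, raw)
    store(conn, record)
    return record
```

For example, calling `conduit_module(sqlite3.connect(":memory:"), "http://example.com/a", "Page A\nHello world")` returns a `PageRecord` and leaves one row in the `pages` table.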

[0112] The conduit module may be a Pipeline (a computing term meaning a conduit). The Pipeline can also filter the data and construct instance objects from it; constructing instance objects is the structured processing described above.

[0113] In one case, the conduit module can use the data crawled by the crawling module directly as the crawling result. In another case, the conduit module can perform structured processing on the crawled data, that is, convert it into data with a preset structure (referred to here as structured data), and then use the structured data as the crawling result. Or, in yet another case, the conduit module can also filter the data cr...
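The alternatives in [0113] can be sketched as a single Python function with a mode switch. The mode names, the `StructuredItem` type, the function name `conduit_process`, and the `keep` predicate used for filtering are all hypothetical. The source text breaks off partway through the filtering case, so this sketch shows filtering only in its simplest assumed form: items that fail a predicate are dropped.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StructuredItem:
    """Hypothetical instance object built from crawled data."""
    url: str
    text: str


def conduit_process(url: str, raw: str, mode: str = "raw",
                    keep: Optional[Callable[[str], bool]] = None):
    """Turn crawled data into a crawling result; None means it was filtered out."""
    if mode == "raw":
        # Case 1: the crawled data is used directly as the crawling result.
        return raw
    if mode == "structured":
        # Case 2: convert the data into a preset structure (an instance object).
        return StructuredItem(url=url, text=raw.strip())
    if mode == "filtered":
        # Case 3 (assumed form): keep only data that passes the `keep` predicate.
        if keep is None:
            raise ValueError("filtered mode requires a keep predicate")
        return raw if keep(raw) else None
    raise ValueError(f"unknown mode: {mode!r}")
```

A call such as `conduit_process(url, raw, mode="filtered", keep=lambda r: not r.startswith("ad:"))` would discard pages whose text begins with `ad:`. Returning `None` for dropped items lets the caller skip them without raising an exception.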



Abstract

The embodiments of the invention provide a data acquisition method and device, an electronic device, and a storage medium. In the method, each processing module in a cluster obtains a crawling request from a distributed queue. After crawling data according to the obtained request, if the data still contains a link, the processing module packages the link into a new crawling request and delivers it to the distributed queue. In this way, the processing modules crawl data in parallel, which improves crawling efficiency.
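The crawl loop described in the abstract can be illustrated with a minimal Python sketch built on several assumptions. A thread-safe `queue.Queue` stands in for the distributed queue, and threads stand in for the processing modules in a cluster. A hypothetical in-memory `FAKE_WEB` dictionary replaces real HTTP fetching. The names `CrawlRequest`, `processing_module` and `crawl` are illustrative. The `seen` set that stops a link from being enqueued twice is also an assumption: the abstract does not say how duplicate links are handled, and the set is added here so that cyclic links terminate.

```python
import queue
import threading
from dataclasses import dataclass

# Hypothetical in-memory web: URL -> (page data, outgoing links).
# This is an assumption standing in for real page fetching.
FAKE_WEB = {
    "http://example.com/a": ("page A", ["http://example.com/b", "http://example.com/c"]),
    "http://example.com/b": ("page B", ["http://example.com/c"]),
    "http://example.com/c": ("page C", ["http://example.com/a"]),
}


@dataclass(frozen=True)
class CrawlRequest:
    """Hypothetical crawling request; here it only carries a URL."""
    url: str


def processing_module(requests, results, seen, lock):
    """One processing module: take a request, crawl it, enqueue any new links."""
    while True:
        req = requests.get()
        if req is None:  # shutdown sentinel
            requests.task_done()
            return
        data, links = FAKE_WEB.get(req.url, ("", []))
        new_links = []
        with lock:
            results[req.url] = data
            for url in links:
                if url not in seen:  # assumed de-duplication
                    seen.add(url)
                    new_links.append(url)
        for url in new_links:
            # Package the link into a new request and deliver it to the shared queue.
            requests.put(CrawlRequest(url))
        requests.task_done()


def crawl(seed, num_modules=3):
    """Crawl from `seed` with `num_modules` workers sharing one request queue."""
    requests = queue.Queue()  # local stand-in for the distributed queue
    results, seen, lock = {}, {seed}, threading.Lock()
    requests.put(CrawlRequest(seed))
    workers = [
        threading.Thread(target=processing_module, args=(requests, results, seen, lock))
        for _ in range(num_modules)
    ]
    for w in workers:
        w.start()
    requests.join()  # returns once every request, including newly added ones, is done
    for _ in workers:
        requests.put(None)  # one shutdown sentinel per worker
    for w in workers:
        w.join()
    return results
```

Each worker enqueues its new requests before calling `task_done()` on the current one. The queue's unfinished-task count therefore cannot reach zero while links remain to be crawled, so `requests.join()` does not return early. In a real cluster the queue and the `seen` set would live in shared external storage that every machine can reach. Threads in one process only demonstrate the control flow.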

Description

Technical Field

[0001] The present application relates to the technical field of web crawlers, and in particular to a data acquisition method, apparatus, electronic device, and storage medium.

Background

[0002] The amount of data on the network keeps growing, and manual search and visual analysis alone are far from sufficient to make effective use of it. At present, network data is generally collected with a web crawler. A web crawler (also known as a web spider, web robot, etc.) is a program or script that automatically crawls data on the web according to certain rules.

[0003] Some related web crawlers crawl network data on a single device, so their crawling efficiency is low.

Summary of the Invention

[0004] The purpose of the embodiments of the present application is to provide a data acquisition method, apparatus, electronic device, and storage medium, so as to improve the ...
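The single-device limitation in [0003] can be made concrete with a minimal Python sketch of a sequential crawler. The breadth-first order, the function name `crawl_single_device`, and the `FAKE_WEB` dictionary that stands in for real page fetches are illustrative assumptions. All fetches run one after another in a single loop, so total crawl time grows with the number of pages and the work cannot be spread across machines.

```python
from collections import deque

# Hypothetical in-memory web: URL -> (page data, outgoing links).
FAKE_WEB = {
    "http://example.com/a": ("page A", ["http://example.com/b", "http://example.com/c"]),
    "http://example.com/b": ("page B", ["http://example.com/c"]),
    "http://example.com/c": ("page C", ["http://example.com/a"]),
}


def crawl_single_device(seed):
    """Breadth-first crawl that handles one request at a time on one device."""
    pending = deque([seed])
    seen = {seed}
    results = {}
    while pending:
        url = pending.popleft()
        data, links = FAKE_WEB.get(url, ("", []))  # one fetch at a time
        results[url] = data
        for link in links:
            if link not in seen:  # avoid re-crawling pages on cyclic links
                seen.add(link)
                pending.append(link)
    return results
```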


Application Information

IPC(8): G06F16/951
Inventors: 胡凌云 (Hu Lingyun), 丁国航 (Ding Guohang)
Owner: BEIJING DAJIA INTERNET INFORMATION TECH CO LTD