Cloud computing system

A cloud computing and crawler technology, applied in the field of cloud computing systems, that addresses problems such as non-distributed crawling and crawlers being discovered and banned by the target website.

Inactive Publication Date: 2019-10-08
BEIJING DIDI INFINITY TECH & DEV

AI Technical Summary

Problems solved by technology

[0004] (1) Existing crawlers can only crawl webpages, or can only follow new links in a simple way;
[0005] (2) Only HTML pages can be captured; valid data on dynamic pages cannot be captured;
[0006] (3) Crawling is not distributed; it runs on a single machine or a simple homogeneous cluster, so data reading and parsing are inefficient;
[0007] (4) Crawling operations lack egress pressure control (limits on outbound request traffic), so they are easily discovered and blocked by the target website;
[0008] (5) If the egress IP (Internet Protocol) address belongs to an IP address segment provided by a local operator in China, it is easily blocked by the target website;
[0009] (6) The capture systems of different companies and business lines cannot be combined into a platform, so the cost of independent maintenance and development is extremely high.
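
Item (4) concerns pressure control on outbound crawl traffic. The visible text does not say how the patented system implements it, so the following is only a minimal sketch of one common approach, a minimum interval between requests to the same host. The class name `HostThrottle`, its parameters, and the injectable `clock` and `sleep` hooks are all hypothetical and do not come from the patent.

```python
import time
from typing import Callable, Dict
from urllib.parse import urlsplit


class HostThrottle:
    """Hypothetical per-host throttle: at most one request per min_interval seconds."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        # Earliest time at which each host may be contacted again.
        self._next_allowed: Dict[str, float] = {}

    def wait(self, url: str) -> float:
        """Block until the URL's host may be contacted; return the delay applied."""
        host = urlsplit(url).netloc
        now = self._clock()
        ready_at = self._next_allowed.get(host, now)
        delay = max(0.0, ready_at - now)
        if delay > 0:
            self._sleep(delay)
        # Reserve the next slot for this host, counted from when this request goes out.
        self._next_allowed[host] = max(now, ready_at) + self.min_interval
        return delay
```

A crawler would call `wait(url)` immediately before each fetch. Requests to different hosts do not delay each other, while repeated requests to one host are spaced out.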

Examples

Embodiment 1

[0063] Figure 1 shows a schematic block diagram of a cloud computing system according to an embodiment of the present invention.

[0064] Figure 3 shows a schematic diagram of data interaction of a cloud computing system according to an embodiment of the present invention.

[0065] As shown in Figures 1 and 3, the cloud computing system 100 according to an embodiment of the present invention includes: an application program interface 102, which provides a user interface for obtaining a crawling task submitted by a user; a seed library 104, connected to the application program interface 102, for pre-storing the resource locators corresponding to the crawling task; a task generator 106, connected to the seed library 104, for obtaining the resource locators and delivering them to the corresponding crawler module 108; and the crawler module 108, connected to the task generator 106 and configured to capture corresponding website data and/or w...
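
The chain of components in paragraph [0065] (application program interface, seed library, task generator, crawler module) can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: all class and method names are hypothetical, the round-robin choice of crawler is an assumption because the text does not say how the "corresponding" crawler module is selected, and `fake_fetch` stands in for a real HTTP request.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SeedLibrary:
    """Pre-stores the resource locators (URLs) registered for each crawl task."""
    seeds: Dict[str, List[str]] = field(default_factory=dict)

    def add(self, task_id: str, urls: List[str]) -> None:
        self.seeds.setdefault(task_id, []).extend(urls)

    def take(self, task_id: str) -> List[str]:
        # Hand the stored URLs over and forget them, so a task is dispatched once.
        return self.seeds.pop(task_id, [])


class ApplicationInterface:
    """Entry point through which a user submits a crawl task."""

    def __init__(self, seed_library: SeedLibrary) -> None:
        self.seed_library = seed_library

    def submit_task(self, task_id: str, urls: List[str]) -> None:
        self.seed_library.add(task_id, urls)


class CrawlerModule:
    """Captures data for each URL it is given, using an injected fetch function."""

    def __init__(self, name: str, fetch: Callable[[str], str]) -> None:
        self.name = name
        self.fetch = fetch
        self.captured: Dict[str, str] = {}

    def crawl(self, url: str) -> None:
        self.captured[url] = self.fetch(url)


class TaskGenerator:
    """Reads URLs from the seed library and delivers them to crawler modules."""

    def __init__(self, seed_library: SeedLibrary, crawlers: List[CrawlerModule]) -> None:
        if not crawlers:
            raise ValueError("at least one crawler module is required")
        self.seed_library = seed_library
        self.crawlers = crawlers

    def dispatch(self, task_id: str) -> int:
        urls = self.seed_library.take(task_id)
        for index, url in enumerate(urls):
            # Assumption: round-robin assignment; the source does not specify a policy.
            self.crawlers[index % len(self.crawlers)].crawl(url)
        return len(urls)


def fake_fetch(url: str) -> str:
    """Stand-in for a network fetch so the sketch runs offline."""
    return f"<html>{url}</html>"
```

Injecting the fetch function keeps the sketch testable without network access; a real crawler module would replace `fake_fetch` with an HTTP client call.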

Embodiment 2

[0113] Figure 2 shows a schematic block diagram of a cloud computing system according to another embodiment of the present invention.

[0114] Figure 3 shows a schematic diagram of data interaction of a cloud computing system according to an embodiment of the present invention.

[0115] As shown in Figures 2 and 3, the cloud computing system 200 according to the embodiment of the present invention includes a webpage crawling subsystem and a webpage parsing subsystem. The two subsystems are functionally independent; their data flows are decoupled through HDFS, and each runs its own task management system.

[0116] As shown in Figures 1 and 3, the webpage crawling subsystem receives URL submissions and task creation/deletion requests through the API 102, and the webpage parsing subsystem receives operation requests, such as submitting and editing analysis algorithms/applications, through the API 102.
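
The decoupling described in paragraphs [0115] and [0116] can be illustrated with a small sketch in which the crawling side only writes raw pages to shared storage and the parsing side only reads from it. To keep the example runnable, a local temporary directory stands in for HDFS, which is a simplification. The function names, the JSON record layout, and the title-extracting parser are hypothetical and not taken from the patent.

```python
import hashlib
import json
import re
import tempfile
from pathlib import Path
from typing import Callable, Dict


def store_page(shared_dir: Path, url: str, html: str) -> Path:
    """Crawling side: persist one fetched page as a JSON record in shared storage."""
    name = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16] + ".json"
    path = shared_dir / name
    path.write_text(json.dumps({"url": url, "html": html}), encoding="utf-8")
    return path


def extract_title(html: str) -> str:
    """Example of a user-submitted parsing algorithm: pull out the <title> text."""
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else ""


def parse_all(shared_dir: Path, parser: Callable[[str], str] = extract_title) -> Dict[str, str]:
    """Parsing side: read every stored record and apply the parser to its HTML."""
    results: Dict[str, str] = {}
    for path in sorted(shared_dir.glob("*.json")):
        record = json.loads(path.read_text(encoding="utf-8"))
        results[record["url"]] = parser(record["html"])
    return results


def run_demo() -> Dict[str, str]:
    """Write two pages, then parse them; the two steps share only the directory."""
    with tempfile.TemporaryDirectory() as tmp:
        shared = Path(tmp)
        store_page(shared, "http://example.com/a", "<html><title> A page </title></html>")
        store_page(shared, "http://example.com/b", "<html><body>no title</body></html>")
        return parse_all(shared)
```

Because the two sides communicate only through stored records, the parser can be swapped or re-run over already crawled pages without touching the crawler.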

[0117] Specificall...

Abstract

The invention provides a cloud computing system comprising: an application program interface, which provides a user interface for obtaining a crawling task submitted by a user; a seed library, which is connected to the application program interface and pre-stores the resource locators corresponding to the crawling task; a task generator, which is connected to the seed library, acquires the resource locators, and delivers them to the corresponding crawler modules; and a crawler module, which is connected to the task generator and captures the corresponding website data and/or webpage data according to the resource locators. The technical scheme of the invention supports capturing data from the whole network and improves the reliability of capturing valid data.

Description

Technical field

[0001] The present invention relates to the field of network technology, in particular to a cloud computing system.

Background

[0002] Web scraping (also known as web data extraction or web crawling) refers to obtaining data from the Internet, converting the obtained unstructured data into structured data, and finally storing the structured data in a local computer or database for further data analysis.

[0003] In the related art, web crawling has at least the following technical defects:

[0004] (1) Existing crawlers can only crawl webpages, or can only follow new links in a simple way;

[0005] (2) Only HTML pages can be captured; valid data on dynamic pages cannot be captured;

[0006] (3) Crawling is not distributed; it runs on a single machine or a simple homogeneous cluster, so data reading and parsing are inefficient;

[0007] (4) Lack of egress pressure control on crawling operations, which is easy to ...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/951, G06F16/955, G06F16/18
CPC: G06F16/1815, G06F16/951, G06F16/958, G06F16/9566, G06F9/541, H04L67/02
Inventor: 陈桦
Owner: BEIJING DIDI INFINITY TECH & DEV