Spark Streaming-based big data stream processing method and system

A processing method and system applied in the field of big data stream processing. It addresses problems such as incorrect updates of mutable state, difficulty in understanding the implementation, and lack of native YARN support, and it achieves better fault tolerance, faster processing, and improved processing efficiency.

Inactive Publication Date: 2016-09-07

北京思特奇信息技术股份有限公司

Cites: 3; cited by: 14

Problems solved by technology

[0003] Storm has its own flaws, however. In terms of fault tolerance and data guarantees, each individual record must be tracked as it passes through the system, so Storm guarantees only that each record is processed at least once. Duplicate records are allowed during recovery from errors, which means that mutable state may be incorrectly updated twice. In terms of implementation and programming API, the core of Storm is written in Clojure (although most of the extension work is written in Java), which makes its implementation harder to understand. In terms of cluster management integration, Storm can run on its own cluster and on Mesos, but running on YARN requires a third-party component, Storm on YARN, and is not natively supported.
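
The double-update problem described in [0003] can be illustrated with a minimal sketch in plain Python. It does not use the Storm API. The record shape `(record_id, key, amount)`, the function names, and the de-duplication by record ID are all assumptions introduced only for this illustration; the patent does not describe them.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record shape for this sketch: (record_id, key, amount).
Record = Tuple[str, str, int]


def apply_naive(state: Dict[str, int], records: Iterable[Record]) -> None:
    """Add each record's amount to the state; a replayed record is counted again."""
    for _record_id, key, amount in records:
        state[key] = state.get(key, 0) + amount


def apply_deduplicated(
    state: Dict[str, int], seen: Set[str], records: Iterable[Record]
) -> None:
    """Add each record's amount once, skipping record IDs that were already applied."""
    for record_id, key, amount in records:
        if record_id in seen:
            continue  # replayed duplicate: ignore it
        seen.add(record_id)
        state[key] = state.get(key, 0) + amount


# At-least-once delivery: record "r2" is replayed after a simulated failure.
delivered = [("r1", "clicks", 1), ("r2", "clicks", 1), ("r2", "clicks", 1)]

naive_state: Dict[str, int] = {}
apply_naive(naive_state, delivered)

dedup_state: Dict[str, int] = {}
apply_deduplicated(dedup_state, set(), delivered)

print(naive_state)  # {'clicks': 3}  (the replayed record was counted twice)
print(dedup_state)  # {'clicks': 2}
```

The naive handler shows why at-least-once delivery alone is not enough when the state is mutable: the replayed record changes the total a second time.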

Embodiment Construction

[0049] The principles and features of the present invention are described below in conjunction with the accompanying drawings, and the examples given are only used to explain the present invention, and are not intended to limit the scope of the present invention.

[0050] Spark Streaming is an extension of the Spark core API that enables high-throughput, fault-tolerant processing of real-time data streams. It supports many data sources, including Kafka, Flume, Twitter, ZeroMQ, and traditional TCP sockets.

[0051] Unlike Storm, Spark Streaming does not process a data stream one record at a time; it first segments the stream into batch jobs by time interval and then processes them. Spark's abstraction for a continuous data stream is called a DStream (discretized stream). A DStream is a sequence of micro-batched RDDs (resilient distributed datasets), and an RDD is a distributed dataset that can be operated on in parallel in two ways, namely...
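
The micro-batching idea in [0051] can be sketched in plain Python, without the Spark API. The function name `micro_batches`, the `(timestamp, value)` event shape, and the one-second interval are assumptions made for this illustration. For simplicity the sketch skips intervals that contain no events.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def micro_batches(
    events: Iterable[Tuple[float, str]], interval: float
) -> List[List[str]]:
    """Group (timestamp_seconds, value) events into fixed-width time batches."""
    buckets: Dict[int, List[str]] = defaultdict(list)
    for timestamp, value in events:
        # Events whose timestamps fall in the same interval share a batch.
        buckets[int(timestamp // interval)].append(value)
    # Return the batches in time order.
    return [buckets[index] for index in sorted(buckets)]


events = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (2.5, "d"), (2.7, "e")]
batches = micro_batches(events, interval=1.0)

# Each batch is then processed as a unit, here by counting its elements.
batch_sizes = [len(batch) for batch in batches]

print(batches)      # [['a', 'b'], ['c'], ['d', 'e']]
print(batch_sizes)  # [2, 1, 2]
```

Each inner list plays the role that one batch of the stream plays in the description above: the stream is cut by time interval first, and the processing step only ever sees whole batches.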

Abstract

The invention relates to a Spark Streaming-based big data stream processing method and system. The method includes: step S1, receiving data sent by a data source at a specified location, then executing step S2 if the data source is HDFS, or step S3 if the data source is Flume; step S2, storing the data as a file, then executing step S3; step S3, processing the received data or file with Spark Streaming; and step S4, writing the processing result for the file or data to a result directory with Spark Streaming at a set time interval. In terms of fault tolerance and data guarantees, the method and system provide fault-tolerant stateful computation. In terms of the programming API, they support both Scala and Java. In terms of cluster management integration, Spark Streaming can run on its own cluster as well as on YARN and Mesos.
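
The control flow of steps S1 to S4 can be sketched in plain Python as follows. This is only an illustration of the branching, not the patented implementation and not Spark code. Every function name, the file names, the `"hdfs"` and `"flume"` labels, and the word count used as the processing step are assumptions, since the abstract does not say what processing is applied.

```python
import os
import tempfile
from collections import Counter
from typing import List


def s2_store_as_file(lines: List[str], staging_dir: str) -> str:
    """S2: store the received data as a file and return its path."""
    path = os.path.join(staging_dir, "input.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines))
    return path


def s3_process(lines: List[str]) -> Counter:
    """S3: process the data (a word count stands in for the real processing)."""
    return Counter(word for line in lines for word in line.split())


def s4_write_result(counts: Counter, result_dir: str, batch_index: int) -> str:
    """S4: write one interval's result to the result directory."""
    path = os.path.join(result_dir, f"batch-{batch_index}.txt")
    with open(path, "w", encoding="utf-8") as handle:
        for word, count in sorted(counts.items()):
            handle.write(f"{word} {count}\n")
    return path


def run_once(source_type: str, lines: List[str], staging_dir: str,
             result_dir: str, batch_index: int) -> str:
    """S1: receive data, then branch on the kind of data source."""
    if source_type == "hdfs":
        # HDFS path: S2 (store as a file), then S3 reads that file.
        path = s2_store_as_file(lines, staging_dir)
        with open(path, encoding="utf-8") as handle:
            lines = handle.read().splitlines()
    elif source_type != "flume":
        raise ValueError(f"unsupported data source: {source_type}")
    # Flume path goes straight to S3.
    counts = s3_process(lines)
    return s4_write_result(counts, result_dir, batch_index)


with tempfile.TemporaryDirectory() as work_dir:
    result_path = run_once("hdfs", ["a b", "b"], work_dir, work_dir, batch_index=0)
    with open(result_path, encoding="utf-8") as handle:
        result_text = handle.read()

print(result_text)
```

In a real deployment the time interval in S4 would come from the streaming engine's batch interval; here `batch_index` simply names the output file for one interval.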

Description

Technical field

[0001] The invention relates to the field of big data stream processing, and in particular to a Spark Streaming-based big data stream processing method and system.

Background

[0002] In the prior art, Storm is often used to implement a data flow model. In such a model, data flows continuously through a network of transformation entities. The abstraction of a data flow is called a stream, which is an unbounded sequence of tuples. A tuple is like a structure that uses some additional serialization code to represent standard data types (such as integers, floats, and byte arrays) or user-defined types. Each stream is identified by a unique ID, which can be used to build a topology of data sources and sinks.

[0003] But Storm has its own flaws. For example, in terms of fault tolerance and data guarantees, each individual record in Storm must be tracked as it passes through the system, so Storm can at least guarantee that each rec...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06F17/30

CPC: G06F16/24568

Inventor: 杜旭苗

Owner: 北京思特奇信息技术股份有限公司
