Data processing method and device based on Spark

A Spark-based data processing technique, applied in the computer field, that addresses the problem of reduced data processing efficiency. It saves storage space and data processing time, and thereby improves data processing efficiency.

Active Publication Date: 2018-06-26
BEIJING GRIDSUM TECH CO LTD
4 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0005] The main purpose of the present invention is to provide a Spark-based data processing method and device that solve the problem of reduced data processing efficiency caused by existing data processing methods.


Examples

Embodiment 1

[0023] According to an embodiment of the present invention, a Spark-based data processing method is provided. As shown in Figure 1, the method includes:

[0024] S102, acquiring data to be processed;

[0025] S104, extracting the feature identifier of the data to be processed, wherein the feature identifier is used to identify the file type of the data to be processed;

[0026] S106, writing the data to be processed into the target file corresponding to the feature identifier, according to the feature identifier.
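
Steps S102 to S106 can be illustrated with a minimal sketch in plain Python. It does not use Spark and is not the patented implementation; it only shows the routing idea on a single machine. The record layout (`type` and `line` fields), the function names, and the `<feature identifier>.log` file naming are all assumptions made for illustration.

```python
import os
import tempfile
from collections import defaultdict


def extract_feature_id(record):
    # S104 (assumed rule): the feature identifier is the record's "type" field.
    return record["type"]


def write_by_feature_id(records, out_dir):
    """Write each record into the target file matching its feature identifier."""
    grouped = defaultdict(list)
    for record in records:  # S102: iterate over the acquired data to be processed
        grouped[extract_feature_id(record)].append(record["line"])

    paths = {}
    for feature_id, lines in grouped.items():  # S106: one target file per identifier
        # Hypothetical naming scheme: "<feature identifier>.log"
        path = os.path.join(out_dir, f"{feature_id}.log")
        with open(path, "a", encoding="utf-8") as fh:
            fh.writelines(line + "\n" for line in lines)
        paths[feature_id] = path
    return paths


# Example input: parsed records of two different (made-up) types.
records = [
    {"type": "click", "line": "u1 clicked /home"},
    {"type": "view", "line": "u2 viewed /cart"},
    {"type": "click", "line": "u3 clicked /buy"},
]
out_dir = tempfile.mkdtemp()
paths = write_by_feature_id(records, out_dir)
```

Grouping before writing means each target file is opened once per batch, so records that share a feature identifier end up together in one file.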

[0027] Optionally, in this embodiment, the above Spark-based data processing method may be applied to, but is not limited to, the process of writing log data. For example, the data to be processed is the log data obtained after parsing a log file; the feature identifier of the log data is extracted from it, and the log data is written into the corresponding file according to the feature identifier, so that the log data with the same characte...

Embodiment 2

[0063] According to an embodiment of the present invention, a Spark-based data processing device for implementing the above Spark-based data processing method is also provided. As shown in Figure 3, the device includes:

[0064] 1) an acquisition unit 302, configured to acquire data to be processed;

[0065] 2) an extraction unit 304, configured to extract the feature identifier of the data to be processed, wherein the feature identifier is used to identify the file type of the data to be processed;

[0066] 3) a processing unit 306, configured to write the data to be processed into the target file corresponding to the feature identifier, according to the feature identifier.
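
The division into three units can be pictured as three small cooperating components. The sketch below is illustrative only: the class names mirror units 302, 304 and 306, but the method names, the `type` and `line` record fields, and the in-memory dictionary standing in for target files are assumptions, not details taken from the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

Record = Dict[str, str]


@dataclass
class AcquisitionUnit:  # plays the role of acquisition unit 302
    source: Iterable[Record]

    def acquire(self) -> List[Record]:
        return list(self.source)


@dataclass
class ExtractionUnit:  # plays the role of extraction unit 304
    key: str = "type"  # assumed field that carries the feature identifier

    def extract(self, record: Record) -> str:
        return record[self.key]


@dataclass
class ProcessingUnit:  # plays the role of processing unit 306
    # In-memory stand-in for target files: feature identifier -> written lines.
    targets: Dict[str, List[str]] = field(default_factory=dict)

    def write(self, feature_id: str, record: Record) -> None:
        self.targets.setdefault(feature_id, []).append(record["line"])


@dataclass
class Device:
    acquisition: AcquisitionUnit
    extraction: ExtractionUnit
    processing: ProcessingUnit

    def run(self) -> Dict[str, List[str]]:
        for record in self.acquisition.acquire():
            self.processing.write(self.extraction.extract(record), record)
        return self.processing.targets


device = Device(
    AcquisitionUnit([
        {"type": "error", "line": "disk full"},
        {"type": "access", "line": "GET /index"},
        {"type": "error", "line": "timeout"},
    ]),
    ExtractionUnit(),
    ProcessingUnit(),
)
result = device.run()
```

Keeping extraction separate from writing means the rule that decides a record's file type can change without touching the output logic.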

[0067] Optionally, in this embodiment, the above Spark-based data processing device may be applied to, but is not limited to, the process of writing log data. For example, the data to be processed is the log data obtained after parsing a log file, from which the c...

Abstract

The invention discloses a Spark-based data processing method and device. The method comprises the steps of: obtaining data to be processed; extracting a feature identifier of the data to be processed, wherein the feature identifier is used to identify the file type of the data to be processed; and writing the data to be processed into the target file corresponding to the feature identifier, according to the feature identifier. The method and device solve the technical problem of reduced data processing efficiency caused by existing data processing approaches.

Description

Technical field

[0001] The present invention relates to the field of computers, and in particular to a Spark-based data processing method and device.

Background

[0002] At present, in order to record users' daily operations, a system usually saves user operation logs. When logs are processed, the input log data may be a one-hour log file or log files spanning multiple days. Because the data volume of log files is huge, they usually need to be processed in parallel by a computer cluster. Currently the most popular parallel computing framework is Spark, which uses a unified RDD data structure for data processing. However, in the official application programming interface (API), an RDD data structure can only generate one kind of file output, and multiple files cannot be output directly. Yet the data ParsedObject obtained after parsing each log line set in the input log file may belong to differen...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
CPC: G06F16/1734, G06F16/1737
Inventor: 饶峰云
Owner: BEIJING GRIDSUM TECH CO LTD