Data processing and distribution method and system based on the Hadoop system

A data processing and distribution technology, applied in the field of big data processing

Inactive Publication Date: 2017-01-04
BEIJING GEO POLYMERIZATION TECH
7 Cites, 1 Cited by

AI Technical Summary

Problems solved by technology

[0005] However, in this method the data is screened before any interaction takes place, and the data does not need to be numbered sequentially. When the data size is very large, memory and bandwidth limits restrict the execution of the task.

Embodiment Construction

[0015] As shown in Figure 1, this Hadoop-based data processing and distribution method includes the following steps:

[0016] (1) Number the massive data sequentially using multiple tasks, so that the number of each data item is unique;

[0017] (2) Transmit the massive data concurrently using multiple tasks, each task transmitting one part of the numbered data.

[0018] By numbering the massive data sequentially and then transmitting it with multiple concurrent tasks, the present invention ensures that task execution is not limited by system memory and bandwidth, even when the data scale is extremely large.
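Step (2) can be illustrated with a minimal sketch, which is not the patent's implementation. It assumes the data has already been numbered as `(number, payload)` pairs and uses a thread pool in place of Hadoop tasks. The names `split_by_number`, `transmit`, and `concurrent_transmit` are hypothetical, and `transmit` only stands in for a real network transfer.

```python
from concurrent.futures import ThreadPoolExecutor


def split_by_number(records, num_tasks):
    """Split numbered records into at most num_tasks contiguous slices."""
    size = max(1, -(-len(records) // num_tasks))  # ceiling division, at least 1
    return [records[i:i + size] for i in range(0, len(records), size)]


def transmit(chunk):
    """Hypothetical stand-in for a real transfer: returns the numbers it handled."""
    return [number for number, _payload in chunk]


def concurrent_transmit(records, num_tasks=4):
    """Start several workers, each transmitting one part of the numbered data."""
    chunks = split_by_number(records, num_tasks)
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        # pool.map preserves chunk order, so results line up with the slices.
        return list(pool.map(transmit, chunks))


if __name__ == "__main__":
    numbered = [(i, f"row{i}") for i in range(1, 11)]
    print(concurrent_transmit(numbered, num_tasks=3))
```

Because every record carries a unique number, each worker handles a disjoint slice and no single worker needs to hold the whole data set.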

[0019] In addition, step (1) includes the following sub-steps:

[0020] (1.1) Start multiple tasks, each processing one part of the data, numbering the items within its part, and recording the maximum number it assigned;

[0021] (1.2) On the basis of these partial numbers, scan the numbered data of each task, add the maximum value of the previous task, output the data, and...
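The source text is truncated here, so the following is only a sketch of the part of sub-steps (1.1) and (1.2) that is stated. It assumes that "the maximum value of the previous task" means the running total of the maxima of all earlier tasks, which is what keeps the final numbers unique. The function names are hypothetical, and the partitions are processed in a plain loop rather than as separate Hadoop tasks.

```python
from itertools import accumulate


def number_locally(partition):
    """Sub-step (1.1): number one partition 1..n and record its maximum."""
    numbered = [(i, item) for i, item in enumerate(partition, start=1)]
    return numbered, len(partition)


def number_globally(partitions):
    """Sub-step (1.2): shift each partition's local numbers by an offset.

    Assumption: the offset is the cumulative maximum of all earlier partitions.
    """
    local = [number_locally(p) for p in partitions]
    maxima = [maximum for _numbered, maximum in local]
    # Offset of partition k = sum of the maxima of partitions 0..k-1.
    offsets = [0] + list(accumulate(maxima))[:-1]
    return [
        [(number + offset, item) for number, item in numbered]
        for (numbered, _maximum), offset in zip(local, offsets)
    ]


if __name__ == "__main__":
    print(number_globally([["a", "b"], ["c"], ["d", "e", "f"]]))
```

Only the per-partition maxima need to be gathered in one place, so the first pass can run independently on each part of the data.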

Abstract

The invention discloses a data processing and distribution method based on the Hadoop system, which prevents task execution from being limited by system memory and bandwidth when the data scale is extremely large. The method comprises the following steps: (1) numbering the massive data sequentially using multiple tasks, so that the serial number of each piece of data is unique; and (2) transmitting the massive data concurrently using multiple tasks, each task transmitting one part of the numbered data. The invention also provides a data processing and distribution system based on the Hadoop system.

Description

Technical field

[0001] The present invention relates to the technical field of big data processing, in particular to a Hadoop system-based data processing and distribution method, and a Hadoop system-based data processing and distribution system.

Background art

[0002] The Chinese patent "A Method and System for Distributing Hadoop Cluster Management Tasks" (CN 201510347803.9) provides a method and device for distributing Hadoop cluster management tasks. The method first plans the management tasks into stages according to the dependencies among the Hadoop components, then processes the management tasks of each stage in turn, grouping the management tasks assigned to the same component node within the same stage into a sub-stage. After entering a scheduling cycle, it scans all sub-stages currently awaiting scheduling and sorts them. Finally, according to the preset filter conditions, it judges whether the current sub-stage is suitable for task distribution i...

Application Information

IPC(8): G06F9/48, G06F9/50
CPC: G06F9/4881, G06F9/5088, G06F2209/483, G06F2209/5018
Inventor 孙超齐振华王俊邱鹿于勇新崔晶晶林佳婕
Owner BEIJING GEO POLYMERIZATION TECH