An optimal localization task scheduling method based on MapReduce

A task scheduling technology, applied in the computer field, that addresses two problems of existing methods: they cannot achieve both optimal data localization and optimal overall job execution time, and their scope of application is narrow. The method shortens overall execution time, improves the degree of parallelism, and reduces network bandwidth usage.

Inactive Publication Date: 2017-06-09
UNIV OF ELECTRONICS SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

[0004] Various scheduling methods exist to improve the degree of data localization in the Map stage, but they suffer from low practicability and a limited scope of application.
Zaharia et al. proposed a delay scheduling algorithm that effectively improves the degree of data localization ("Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling," in Proceedings of the 5th European Conference on Computer Systems. ACM, 2010, pp. 265–278). However, delay scheduling gains locality at the expense of the execution efficiency of individual jobs, and it is not widely applicable: when only one or a few jobs are running, it cannot achieve optimal data localization or optimal overall job execution time.
Xie et al. proposed distributing data in advance according to the performance of the computing nodes ("Improving MapReduce performance through data placement in heterogeneous Hadoop clusters," in Parallel & Distributed Processing, Workshops and PhD Forum (IPDPSW), 2010 IEEE International Symposium on. IEEE, 2010, pp. 1–9). This method requires the performance of each computing node to be measured beforehand, which is impractical on a MapReduce platform where the computing resources of each node can be changed dynamically by adjusting parameters.

Examples

Embodiment

[0068] This embodiment was evaluated on a Hadoop cluster of 11 physical computing nodes (1 master node and 10 slave nodes) with a data block size of 128 MB. The test cases wc16, wc22, wc38, wc60, and wc98 were run, where wc16 denotes a WordCount test case whose input consists of 16 data blocks. Figure 4 shows the resulting improvement in localization: the degree of localization improved by up to 17.9%. Figure 5 shows the performance improvement of the Map stage and of the entire MapReduce stage when the network is not congested: the performance of the Map stage improved by 19.7%, and the performance of the entire MapReduce stage improved by 17.8%. Figure 6 shows the same comparison when the network is congested: the performance of the Map stage improved by 70.4%, and the performance of the entire MapReduce stage has bee...

Abstract

The invention proposes a MapReduce task scheduling algorithm that works in both homogeneous and heterogeneous cluster environments, belonging to the field of computer technology. The scheduling algorithm takes the processing performance of every computing node in the cluster into account, abstracts the computing nodes and computing tasks into a bipartite graph, and produces the final global task scheduling scheme by appropriately expanding the bipartite graph and applying the Kuhn-Munkres (KM) weighted optimal matching algorithm. Experimental data show that the scheduling algorithm can raise the degree of data localization in the Map stage to nearly 100%, and that the overall execution time of a MapReduce job can be reduced by 67.1%.
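
The bipartite-graph formulation in the abstract can be illustrated with a small sketch. The code below is not the patented procedure; it is a minimal illustration under stated assumptions: each node is expanded into one vertex per free Map slot, an edge has weight 1 when the node holds a replica of the task's input block and 0 otherwise, and a standard Kuhn-Munkres (Hungarian) solver picks the assignment with the highest total weight. The names `hungarian_min_cost`, `schedule`, `replicas`, and `slots` are hypothetical.

```python
from typing import Dict, List, Set


def hungarian_min_cost(cost: List[List[float]]) -> List[int]:
    """Kuhn-Munkres solver: returns assignment[row] = column with minimum
    total cost. Requires len(cost) <= len(cost[0])."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)      # row potentials (1-based)
    v = [0.0] * (m + 1)      # column potentials (1-based)
    p = [0] * (m + 1)        # p[j] = row currently matched to column j
    way = [0] * (m + 1)      # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:                # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def schedule(replicas: Dict[str, Set[str]], slots: Dict[str, int]) -> Dict[str, str]:
    """Assign each Map task to a node, maximizing the number of local tasks.

    replicas: task -> nodes holding a replica of its input block (assumed input)
    slots:    node -> number of free Map slots (assumed input)
    """
    # Expand the bipartite graph: one column per free slot on each node.
    columns = [node for node, k in slots.items() for _ in range(k)]
    tasks = list(replicas)
    if len(tasks) > len(columns):
        raise ValueError("more tasks than free slots in this round")
    # Local placement costs -1, remote placement costs 0, so the minimum
    # cost assignment is the one with the most data-local tasks.
    cost = [[-1.0 if node in replicas[t] else 0.0 for node in columns]
            for t in tasks]
    return {t: columns[c] for t, c in zip(tasks, hungarian_min_cost(cost))}


replicas = {"t1": {"A", "B"}, "t2": {"A"}, "t3": {"B", "C"}}
slots = {"A": 1, "B": 1, "C": 1}
plan = schedule(replicas, slots)
print(plan)
```

In this toy input, a greedy scheduler that places t1 on node A would force t2 to run remotely, whereas the global matching places every task on a node that holds its block. Weighting edges by node processing performance, as the abstract describes for heterogeneous clusters, would only change how the `cost` matrix is filled in.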

Description

Technical field

[0001] The invention belongs to the technical field of computers, and in particular relates to an optimal localization task scheduling method based on MapReduce.

Background

[0002] MapReduce task scheduling directly affects the execution time of MapReduce computing jobs, and an efficient scheduling algorithm can effectively improve job execution efficiency.

[0003] The degree of data localization directly affects the execution efficiency of MapReduce jobs. A MapReduce job consists mainly of the Map stage and the Reduce stage. The intermediate output produced by the computing nodes in the Map stage must be transmitted over the network to the computing nodes of the Reduce stage, where it serves as their input. This intermediate stage is called Shuffle. The network bandwidth consumed by data transmission in the Shuffle stage and by the persistent storage of data in the Reduce stage is unavoidable. Under the conditi...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F9/50
Inventor: 高胜立, 薛瑞尼, 敖立翔, 管仲洋
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA