
A Task Scheduling Method for Distributed Computing System

A distributed computing and task scheduling technology, applied in the field of cloud computing, that addresses system bottlenecks, a high risk of single point of failure, and a large computational load, and thereby achieves self-management, reduces system bottlenecks, and improves efficiency.

Active Publication Date: 2016-01-20
中金数谷科技有限公司

AI Technical Summary

Problems solved by technology

[0005] The above description of the MapReduce computing model and its running examples shows that, during computation, and especially in the final Reduce stage that aggregates the results, the computing results of dozens, hundreds, or even thousands of nodes must be aggregated. The amount of computation is very large, and this load is concentrated on the master node, which must also handle the heavy work of task distribution. This easily creates a system bottleneck and also increases the risk of a single point of failure in the system.



Examples


Embodiment 1

[0034] As shown in Figure 2, in a task scheduling method for a distributed computing system, the nodes in the distributed computing system cluster include a master node and a plurality of working nodes, wherein the master node schedules tasks for the plurality of working nodes. The method includes the following steps:

[0035] S1: The user program starts the MapReduce program and at the same time enters a dormant state. The master node obtains the input file and sends a request to execute processing of the input file; the distributed computing system responds to the request and divides the input file into multiple data segments. Based on the MapReduce model, the master node generates a plurality of Map subtasks according to the number of data segments and copies the Map subtasks to each working node, wherein each Map subtask executes a request for processing one data segment; a...
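The part of step S1 that is visible above (splitting the input into data segments and generating one Map subtask per segment) can be sketched as follows. This is only an illustrative sketch, not the patent's implementation: the names `MapSubtask`, `split_into_segments`, `generate_map_subtasks`, and `assign_round_robin` are hypothetical, the fixed segment size is an assumption, and the round-robin assignment is an assumed stand-in for however the master node actually copies subtasks to the working nodes.

```python
from dataclasses import dataclass


@dataclass
class MapSubtask:
    """Hypothetical record: one Map subtask processes exactly one data segment."""
    task_id: int
    segment: str


def split_into_segments(data: str, segment_size: int) -> list[str]:
    # Assumption: segments are fixed-size slices of the input.
    return [data[i:i + segment_size] for i in range(0, len(data), segment_size)]


def generate_map_subtasks(segments: list[str]) -> list[MapSubtask]:
    # The number of Map subtasks equals the number of data segments.
    return [MapSubtask(task_id=i, segment=seg) for i, seg in enumerate(segments)]


def assign_round_robin(subtasks: list[MapSubtask], workers: list[str]) -> dict[str, list[MapSubtask]]:
    # Assumption: subtasks are handed out to working nodes in round-robin order.
    assignment: dict[str, list[MapSubtask]] = {w: [] for w in workers}
    for i, task in enumerate(subtasks):
        assignment[workers[i % len(workers)]].append(task)
    return assignment


if __name__ == "__main__":
    segments = split_into_segments("abcdefghij", segment_size=4)
    subtasks = generate_map_subtasks(segments)
    for worker, tasks in assign_round_robin(subtasks, ["w1", "w2"]).items():
        print(worker, [t.task_id for t in tasks])
```

The master node's role in this sketch is limited to splitting and assignment; no segment is processed on the master itself.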

Embodiment 2

[0052] As shown in Figure 3, based on the same inventive concept, the present invention also provides a task scheduling system for a distributed computing system, including a starting device Q1, an allocating device Q2, a judging device Q3, a partitioning device Q4, an acquiring device Q5, and a waking device Q6;

[0053] wherein:

[0054] The starting device Q1 is used for the user program to start the MapReduce program, with the user program entering a dormant state at the same time. The master node obtains the input file and sends a request to execute processing of the input file; the distributed computing system responds to the request and divides the input file into multiple data segments. The master node generates multiple Map subtasks according to the number of data segments and copies the Map subtasks to each working node, wherein each Map subtask executes a request for processing one of the data segments; meanwhile,...



Abstract

The invention relates to a task scheduling method and a task scheduling system for a distributed computing system. The method comprises the following steps: (1) designating the nodes in a distributed computing system cluster as a master node and working nodes; (2) partitioning an input file into a plurality of data segments by the master node; (3) distributing the data segments and tasks to the working nodes by the master node, extracting initial key-value pairs and processing them to generate intermediate key-value pairs, and storing the intermediate key-value pairs in a public area; (4) judging, by the master node, whether all data segments have been processed, and if so, proceeding to step (5), otherwise returning to step (3); (5) partitioning the intermediate key-value pairs into a plurality of regions and sorting them; (6) processing the intermediate key-value pairs by the working nodes that have finished processing their data segments, and storing each final result in the region belonging to that working node; and (7) returning. In this method and system, the master node is used only to distribute tasks, while the final aggregation and computation of the processing results are completed by the working nodes, which resolves the system bottleneck at the Reduce stage.
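The seven steps in the abstract can be illustrated with a small single-process simulation. It is a rough sketch under several assumptions that do not come from the patent: the job is a word count, a data segment is a fixed number of lines, working nodes are simulated sequentially rather than run in parallel, and keys are assigned to regions by a simple byte-sum rule. The names `map_segment` and `run_job` are hypothetical.

```python
from collections import defaultdict


def map_segment(segment: str) -> list[tuple[str, int]]:
    # Map work done by a (simulated) working node: emit (word, 1) pairs.
    return [(word, 1) for word in segment.split()]


def run_job(text: str, num_workers: int = 2, segment_lines: int = 1) -> list[dict[str, int]]:
    lines = text.splitlines()
    # Step 2: the master node partitions the input into data segments.
    segments = ["\n".join(lines[i:i + segment_lines])
                for i in range(0, len(lines), segment_lines)]

    public_area: list[tuple[str, int]] = []   # shared store for intermediate pairs
    pending = list(enumerate(segments))
    finished: set[int] = set()

    # Steps 3-4: distribute segments until the master sees that all are processed.
    while len(finished) < len(segments):
        index, segment = pending.pop(0)
        public_area.extend(map_segment(segment))
        finished.add(index)

    # Step 5: partition the intermediate pairs into regions and sort each region.
    # Assumption: one region per working node, chosen by a deterministic byte sum.
    regions: list[list[tuple[str, int]]] = [[] for _ in range(num_workers)]
    for key, value in public_area:
        regions[sum(key.encode("utf-8")) % num_workers].append((key, value))
    for region in regions:
        region.sort()

    # Step 6: each working node aggregates its own region and keeps the result
    # in its own area, so no aggregation happens on the master node.
    worker_areas: list[dict[str, int]] = []
    for region in regions:
        totals: dict[str, int] = defaultdict(int)
        for key, value in region:
            totals[key] += value
        worker_areas.append(dict(totals))

    # Step 7: return the per-worker results.
    return worker_areas


if __name__ == "__main__":
    for worker_id, area in enumerate(run_job("a b a\nb c")):
        print(worker_id, area)
```

Because every key lands in exactly one region, the per-worker results never overlap, which is why no final merge on the master node is needed in this sketch.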

Description

Technical field

[0001] The invention relates to the technical field of cloud computing, and in particular to a task scheduling method for a distributed computing system.

Background

[0002] In the field of cloud computing, the current mainstream computing model is the MapReduce model proposed by Google, together with models derived from it. For general users, the MapReduce model is a programming paradigm: programs written according to it can run in parallel on multiple computer nodes in a cluster. For cloud service providers, MapReduce is an architecture for the computing environment, which organizes multiple computer nodes into a large cluster to run MapReduce programs. MapReduce divides the entire job into multiple subtasks according to the size of the data, and these subtasks run in parallel on the computer nodes in the cluster.

[0003] In the MapReduce model, a MapReduce job (Job) is divided into...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): H04L29/08
Inventor: 岳洋钮毅
Owner: 中金数谷科技有限公司