Task scheduling method and system in distributed data warehouse

A task scheduling technology for distributed data warehouses, applied in the field of data processing. It addresses unreasonable resource allocation and the failure of existing scheduling to meet the needs of distributed data warehouses, and achieves a rational allocation of resources.

Active Publication Date: 2011-11-16
SHENZHEN TENCENT COMP SYST CO LTD
2 Cites, 31 Cited by

AI Technical Summary

Problems solved by technology

[0005] However, this traditional task scheduling method allocates tasks and resources unfairly.
For example, when large tasks and small tasks are running together, some small tasks in the queue may be mixed with large

Embodiment Construction

[0028] As shown in Figure 2, a task scheduling method in a distributed data warehouse includes the following steps:

[0029] Step S10: divide the tasks into multiple task groups according to their types, and set the proportion of resources to be allocated to each task group. In one embodiment, the tasks are divided by type into groups such as a critical task group, a real-time task group, and a non-real-time task group. The critical task group contains very important tasks that must produce output on a schedule, such as departmental daily and monthly reports; the real-time task group contains small tasks that need to be processed promptly; the non-real-time task group contains large tasks that do not need to be processed promptly. After tasks are grouped, priorities can be set for the different task groups, for example, tasks in the critical task group are processed first,...
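
The grouping in step S10 can be sketched as follows. This is only an illustration, not the patent's implementation: the `Task` class, the `group_tasks` helper, the task names, and the share values in `GROUP_SHARES` are all hypothetical, since the text above names the three groups but gives no concrete proportions.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical resource shares (percent) per task group.
# The source names these groups but does not state any proportions.
GROUP_SHARES = {"critical": 50, "real_time": 30, "non_real_time": 20}


@dataclass
class Task:
    name: str
    task_type: str  # expected to be one of the keys of GROUP_SHARES


def group_tasks(tasks):
    """Bucket tasks into task groups by their type (sketch of step S10)."""
    groups = defaultdict(list)
    for task in tasks:
        if task.task_type not in GROUP_SHARES:
            raise ValueError(f"unknown task type: {task.task_type!r}")
        groups[task.task_type].append(task)
    return dict(groups)


# Illustrative tasks only; the names are made up for this example.
tasks = [
    Task("daily_report", "critical"),
    Task("ad_hoc_query", "real_time"),
    Task("full_history_scan", "non_real_time"),
    Task("monthly_report", "critical"),
]
groups = group_tasks(tasks)
```

Keeping the shares in one table next to the group names makes it easy to check that they add up to the whole cluster before any resources are handed out.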

Abstract

The invention provides a task scheduling method and system in a distributed data warehouse. The method comprises the following steps: A, dividing tasks into a plurality of task groups according to their types, and setting the proportion of resources to be allocated to each task group; and B, allocating resources to the task groups according to those proportions. The system comprises a grouping module and a resource allocation module. The grouping module divides tasks into a plurality of task groups according to their types and sets the proportion of resources to be allocated to each task group; the resource allocation module allocates resources to the task groups according to those proportions. With this method and system, resources can be allocated reasonably, meeting both the need to compute small tasks in real time and the need to compute large tasks that are not time-critical.
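
Step B can be illustrated with a minimal sketch that splits a fixed number of resource units among task groups in proportion to their configured shares. The function name `allocate_slots`, the notion of "slots", the weights, and the largest-remainder rounding rule are assumptions made for this example; the abstract does not say how fractional allocations are handled.

```python
def allocate_slots(total_slots, weights):
    """Split total_slots among groups in proportion to integer weights.

    Rounding uses the largest-remainder rule (an assumption of this sketch):
    each group gets the floor of its exact share, and leftover slots go to
    the groups with the largest remainders.
    """
    if total_slots < 0:
        raise ValueError("total_slots must be non-negative")
    total_weight = sum(weights.values())
    if total_weight <= 0 or any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative and sum to a positive value")

    allocation = {}
    remainders = {}
    for group, weight in weights.items():
        # Integer arithmetic avoids floating-point rounding surprises.
        allocation[group], remainders[group] = divmod(total_slots * weight, total_weight)

    leftover = total_slots - sum(allocation.values())
    by_remainder = sorted(remainders, key=remainders.get, reverse=True)
    for group in by_remainder[:leftover]:
        allocation[group] += 1
    return allocation


# Hypothetical weights; the source gives no concrete proportions.
example = allocate_slots(100, {"critical": 50, "real_time": 30, "non_real_time": 20})
```

Because every slot is assigned to exactly one group, the allocations always sum to `total_slots`, so no capacity is silently lost to rounding.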

Description

【Technical field】

[0001] The invention relates to the technical field of data processing, in particular to a task scheduling method and system in a distributed data warehouse.

【Background】

[0002] A data warehouse is a structured data environment that serves as the data source for decision support systems and online analysis applications; it addresses the problem of obtaining information from databases. A distributed data warehouse is a data warehouse solution that provides massive storage and computing services based on GFS (Google File System, a scalable distributed file system), MapReduce (a programming model for parallel computing over large-scale data sets), and related technologies.

[0003] A distributed data warehouse implemented with the MapReduce programming model usually adopts the FIFO (First In, First Out) scheduling strategy when scheduling multiple tasks, that is, after the user submits a task (job), the time and task The pri...

Application Information

IPC(8): G06F9/48, G06F9/50
Inventor: 李均, 郭玮, 洪坤乾, 赵伟
Owner: SHENZHEN TENCENT COMP SYST CO LTD