Data block balancing method in operation process of HDFS (Hadoop Distributed File System)

A data block balancing technology, applied in the computer field, that addresses low map task data locality and the uneven distribution of HDFS data blocks, with the effect of improving task balance, data locality, and execution balance.

Inactive Publication Date: 2013-02-20
XI AN JIAOTONG UNIV
Cites: 4 | Cited by: 21

AI Technical Summary

Problems solved by technology

[0013] The purpose of the present invention is to solve the problem of low map task data locality in the Map stage caused by the uneven distribution of HDFS data blocks, and to provide a method for balancing HDFS data blocks at runtime. The method proposes an HDFS balancing strategy based on moving data blocks at runtime: by predicting node task requests, it judges in advance which map tasks are likely to execute non-locally and moves appropriate data blocks between the corresponding nodes, so that when nodes send their actual task requests they receive local map task assignments, thereby improving the completion efficiency of the Map stage.

Embodiment Construction

[0044] The present invention will be described in detail below in conjunction with the accompanying drawings.

[0045] The specific implementation steps of the HDFS data block balancing strategy based on runtime data block movement are as follows:

[0046] The first step is preprocessing of each node's local task list. The local task list of each node is divided into a fully local task part and a non-fully local task part. Taken together, the fully local task parts of all nodes process the input dataset completely, and no task appears in more than one of them. Ideally, if every node is assigned all of its fully local tasks, the distribution of HDFS data blocks matches the scheduler's allocation to each node, that is, the placement of HDFS data blocks is balanced. On this basis, conflicting task allocations can be determined by predicting each node's future task requests, and possible non-local task allocations can be judged...
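The preprocessing step can be illustrated with a minimal sketch. The source text does not say how the fully local part of each node is chosen, so the greedy rule used here (give each block to the replica holder that currently has the fewest fully local tasks) is an assumption, as are the function name `split_local_task_lists`, the `replicas` input format, and the simplification of one map task per block.

```python
from collections import defaultdict

def split_local_task_lists(replicas):
    """Split each node's local task list into a fully local part and the rest.

    replicas: dict mapping block id -> non-empty list of nodes holding a replica
              (one map task per block is assumed).
    Returns (fully_local, not_fully_local), both dict node -> list of blocks.
    """
    # Local task list of a node: every block it holds a replica of.
    local = defaultdict(list)
    for block, nodes in replicas.items():
        for node in nodes:
            local[node].append(block)

    # Assumed greedy rule (not taken from the source): give each block to
    # the holder with the fewest fully local tasks so far, so the parts
    # cover the whole input exactly once and never overlap.
    fully_local = {node: [] for node in local}
    for block in sorted(replicas):
        owner = min(replicas[block], key=lambda n: (len(fully_local[n]), n))
        fully_local[owner].append(block)

    # Everything else on a node's local list is its non-fully local part.
    not_fully_local = {
        node: [b for b in blocks if b not in fully_local[node]]
        for node, blocks in local.items()
    }
    return fully_local, not_fully_local


replicas = {
    "b1": ["n1", "n2"],
    "b2": ["n1", "n3"],
    "b3": ["n1", "n2"],
    "b4": ["n2", "n3"],
}
fully_local, not_fully_local = split_local_task_lists(replicas)
```

With this hypothetical four-block input, every block ends up in exactly one node's fully local part, which is the non-overlap and full-coverage property the paragraph describes.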

Abstract

The invention discloses a method for balancing data blocks during operation of HDFS (Hadoop Distributed File System). The method comprises the following steps: first, pre-processing the local task list of each node and dividing it into fully local tasks and non-fully local tasks, which provides the basis for deciding when to start HDFS data block balancing; second, estimating the running rate of each node and predicting its task requests; third, designing and implementing the assignment process for each node after the preceding steps; fourth, selecting suitable nodes and moving data blocks between them so that the distribution of data blocks matches the predicted sequence of node task requests; and finally, balancing the data blocks. By predicting node task requests, the method judges in advance which map tasks are likely to execute non-locally and moves the appropriate data blocks between the corresponding nodes, so that nodes receive local map task assignments when they send their actual task requests. The completion efficiency of the Map stage is thereby improved.
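The prediction and block-moving steps of the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented procedure: the rate model (a fixed estimated time per map task on each node), the single task slot per node, the rule of taking a block from the node with the longest backlog, and the names `plan_block_moves`, `local_blocks`, and `task_seconds` are all hypothetical.

```python
import heapq

def plan_block_moves(local_blocks, task_seconds):
    """Predict task requests and list block moves that keep them local.

    local_blocks: dict node -> set of pending blocks stored on that node
                  (each block is listed on exactly one node for simplicity).
    task_seconds: dict node -> estimated time to run one map task.
    Returns a list of (block, source_node, target_node) moves.
    """
    pending = {node: sorted(blocks) for node, blocks in local_blocks.items()}
    # Every node is assumed to request its first task at time 0.
    requests = [(0.0, node) for node in sorted(task_seconds)]
    heapq.heapify(requests)
    moves = []

    while any(pending.values()):
        now, node = heapq.heappop(requests)
        if pending.get(node):
            pending[node].pop(0)  # predicted local assignment
        else:
            # Predicted non-local request: take a block from the node with
            # the longest backlog and plan to move it before the request.
            source = max(pending, key=lambda n: (len(pending[n]), n))
            block = pending[source].pop(0)
            moves.append((block, source, node))
        # The node is predicted to ask again once this task finishes.
        heapq.heappush(requests, (now + task_seconds[node], node))
    return moves


moves = plan_block_moves(
    {"fast": {"b1"}, "slow": {"b2", "b3", "b4"}},
    {"fast": 1.0, "slow": 3.0},
)
```

In this hypothetical two-node case the faster node runs out of local blocks first, so the plan moves blocks from the slower node to it ahead of its predicted requests. A real scheduler would also have to account for replicas, transfer cost, and multiple slots, none of which are modelled here.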

Description

Technical field

[0001] The invention belongs to the technical field of computers and relates to a method for balancing data blocks, in particular to a method for balancing data blocks during HDFS (Hadoop Distributed File System) operation in a cloud computing environment.

Background technique

[0002] Hadoop is a highly reliable and highly scalable storage and distributed parallel computing platform developed by the Apache open source organization. It was first developed as the base platform of the open source search engine project Nutch, and later separated from the Nutch project to become one of the typical open source cloud computing platforms. The Hadoop core implements a block-based distributed file system (Hadoop Distributed File System, HDFS) and a MapReduce computing model for distributed computing. HDFS provides Hadoop clusters with a storage system composed of many nodes. When storing large-scale data files, the files are divided into multiple data bloc...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/50, G06F17/30
Inventor: 曹海军, 伍卫国, 董小社, 樊源泉, 魏伟, 朱霍
Owner: XI AN JIAOTONG UNIV