Improved MapReduce data processing method under virtual machine cluster

A data processing technology for virtual machine clusters, applied in transmission systems, electrical components, etc. It addresses performance lag, MapReduce's lack of awareness of the underlying virtualized environment, and the absence of a mature method for combining the two, and it achieves lower overhead and shorter operation time.

Status: Inactive. Publication date: 2011-03-16
HUAZHONG UNIV OF SCI & TECH
Cites: 4 | Cited by: 22

AI Technical Summary

Problems solved by technology

However, MapReduce was not designed with virtual machine cluster environments in mind, and so far no mature method exists for combining the advantages of the two.
Take Amazon's EC2 (Elastic Compute Cloud) as an example. After a user starts MapReduce on a custom virtual machine cluster, MapReduce does not perceive the change in the underlying operating environment and still executes according to its original strategy. This causes a serious performance lag: once the virtual cluster is enabled, data must be imported from the physical cluster into the virtual cluster before a MapReduce job can use it, and it must be exported again after the computation completes. For large-scale data, this import and export takes a long time; for example, moving 100 TB of data over a 10 Gb/s channel takes about a day.
If the data is not exported after the computation completes, the virtual machines cannot be shut down, which degrades performance for other users of the physical cluster and consumes additional power.
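The "about a day" figure for moving 100 TB over a 10 Gb/s channel can be checked with a short calculation. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 Gb/s = 10^9 bits per second) and an ideal link with no protocol overhead or contention; the function name is illustrative and does not come from the patent.

```python
def transfer_time_seconds(data_bytes: float, link_bits_per_second: float) -> float:
    """Ideal transfer time: total bits divided by link rate (no overhead assumed)."""
    return data_bytes * 8 / link_bits_per_second


TB = 10**12          # decimal terabyte in bytes (assumption)
GBIT_PER_S = 10**9   # decimal gigabit per second (assumption)

seconds = transfer_time_seconds(100 * TB, 10 * GBIT_PER_S)
hours = seconds / 3600

print(f"{seconds:.0f} s = {hours:.1f} h")  # prints "80000 s = 22.2 h"
```

Roughly 22 hours for one direction is consistent with the one-day estimate, and a real link would take longer once overhead is included.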



Examples


Embodiment Construction

[0019] The present invention re-establishes the MapReduce working environment on the virtual machine cluster by decoupling the storage and computing units. As shown in Figure 1, virtual machines 1.1.1 and 1.1.2 are hosted on physical machine 1.1; likewise, virtual machines 1.2.1 and 1.2.2 are hosted on physical machine 1.2, and virtual machines 1.N.1 and 1.N.2 are hosted on physical machine 1.N.
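The labelling in paragraph [0019] implies a simple mapping from a virtual machine to its host: dropping the last component of the label gives the physical machine. The following is a minimal sketch of that mapping, assuming the dotted labels from Figure 1 are used directly as identifiers; `host_of` and `vms_by_host` are hypothetical helper names, not part of the patent.

```python
from collections import defaultdict


def host_of(vm_id: str) -> str:
    """Map a VM label such as '1.2.1' to its host label '1.2'."""
    parts = vm_id.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected a label of the form 'c.n.k', got {vm_id!r}")
    return ".".join(parts[:2])


def vms_by_host(vm_ids: list[str]) -> dict[str, list[str]]:
    """Group VM labels by the physical machine that hosts them."""
    groups: dict[str, list[str]] = defaultdict(list)
    for vm_id in vm_ids:
        groups[host_of(vm_id)].append(vm_id)
    return dict(groups)


print(host_of("1.2.1"))                          # prints "1.2"
print(vms_by_host(["1.1.1", "1.1.2", "1.2.1"]))
```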

[0020] In a virtual machine cluster environment, the data distribution strategy is critical. Because the density of computing units is higher, if the density of data stored on each physical machine remains the same as before, computing units will inevitably fail to find local data to read. The input data therefore needs to be redistributed across the physical cluster, adjusted according to the number of physical nodes and virtual nodes required by the user; and it is necessary to ensure that the physical nodes between users do not overlap as muc...



Abstract

The invention discloses an improved MapReduce data processing method for a virtual machine cluster, comprising the following steps: a task dispatching and management center distributes tasks to virtual machines; each virtual machine that receives a task queries a data storage and management center for the physical machines on which the task's input data is located; and each virtual machine selects, from the physical machines returned, the one topologically nearest to itself for data reading and data processing. With this method, data storage and data processing are separate and independent: the physical machines are dedicated to storing data, the virtual machines are dedicated to processing it, and the data reading strategy is designed for the virtual machine cluster. This shortens operation time, resolves the performance lag, reduces resource consumption, and improves MapReduce data processing performance.
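The read-selection step in the abstract can be illustrated with a small sketch. The patent text here does not define how topological distance is measured, so the code assumes a common three-level model: distance 0 when the physical machine is the virtual machine's own host, 1 when it is in the same rack as the host, and 2 otherwise. The class names, the `rack` field, and both functions are hypothetical and only show one way the selection could work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalMachine:
    name: str
    rack: str  # assumed topology attribute, not specified in the source


@dataclass(frozen=True)
class VirtualMachine:
    name: str
    host: PhysicalMachine


def topology_distance(vm: VirtualMachine, pm: PhysicalMachine) -> int:
    """Assumed metric: 0 = own host, 1 = same rack as host, 2 = different rack."""
    if pm == vm.host:
        return 0
    if pm.rack == vm.host.rack:
        return 1
    return 2


def pick_nearest(vm: VirtualMachine, holders: list[PhysicalMachine]) -> PhysicalMachine:
    """Choose the physical machine holding the input data that is nearest to vm."""
    if not holders:
        raise ValueError("no physical machine holds the requested input data")
    return min(holders, key=lambda pm: topology_distance(vm, pm))


pm11 = PhysicalMachine("1.1", rack="A")
pm12 = PhysicalMachine("1.2", rack="A")
pm21 = PhysicalMachine("2.1", rack="B")
vm = VirtualMachine("1.1.1", host=pm11)

print(pick_nearest(vm, [pm21, pm12]).name)        # prints "1.2" (same rack beats remote rack)
print(pick_nearest(vm, [pm21, pm12, pm11]).name)  # prints "1.1" (own host is nearest)
```

In this sketch the list of `holders` stands in for the answer returned by the data storage and management center; ties are broken by list order because `min` returns the first minimal element.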

Description

Technical field

[0001] The invention belongs to the field of distributed computing models, and specifically relates to improving MapReduce performance on a virtual machine cluster.

Background technique

[0002] Cloud computing technology is developing rapidly and its applications are becoming increasingly widespread. Cloud computing has two important components. The first is virtualization technology, a decoupling technology that separates the underlying physical equipment from the upper-layer operating system and software. It enables efficient, flexible, and fuller use of computing resources to meet increasingly diverse computing needs, letting people use computing resources transparently, efficiently, and in a customizable way, and thereby truly realizing flexible construction and on-demand computing. The second is large-scale data processing middleware, a large-scale data proc...


Application Information

IPC(8): H04L29/08
Inventors: 金海, 吴松, 石宣化, 黄大川
Owner: HUAZHONG UNIV OF SCI & TECH