An automatic data equalization method and tool based on Hadoop

A data balancing technology, applied in the fields of data exchange networks, electronic digital data processing, digital data information retrieval, etc. It addresses the problems of uneven data distribution, unnecessary network bandwidth consumption, and ineffective utilization. Its effects are a lower manual operation and maintenance cost, a reasonable data balance, and guaranteed performance.

Inactive Publication Date: 2019-06-25
SHANDONG LANGCHAO YUNTOU INFORMATION TECH CO LTD
8 Cites, 0 Cited by
AI Technical Summary

Problems solved by technology

[0003] In a Hadoop cluster, when a new node is added or an existing node is removed, data becomes unevenly distributed across the cluster if the data balancing service (load balancing) is not enabled. As a result, the advantages of MapReduce (MR) localized computing cannot be used effectively. Typically, the data required by a map task running on node A is not on node A but on node B, so the data must be read across nodes, which consumes network bandwidth unnecessarily.

Embodiment Construction

[0028] The present invention provides a Hadoop-based automatic data balancing method. A Java application calls the Hadoop API to obtain the disk usage rate of each data node, calculates the standard deviation of those usage rates, and compares it in real time with a preset start threshold for launching the data balancing script. When the current standard deviation is greater than the start threshold, the application obtains the current cluster load through the Hadoop API and sets the data balancing bandwidth and the number of threads according to the load value. It then starts the data balancing script in the background to begin data balancing, and the current balancing progress can be checked through the application.
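
The trigger step in [0028] can be sketched as follows. This is a minimal illustration, not the patented implementation: the class and method names are hypothetical, the usage rates are passed in as plain percentages rather than fetched from the Hadoop API, and the population standard deviation is an assumption, since the text does not say which form is used.

```java
import java.util.Arrays;

// Hypothetical helper: decides whether data balancing should be started.
final class BalanceTrigger {

    // Population standard deviation of per-data-node disk usage (in percent).
    // Assumption: the source does not specify population vs. sample form.
    static double stdDev(double[] usagePercent) {
        if (usagePercent.length == 0) {
            throw new IllegalArgumentException("at least one data node is required");
        }
        double mean = Arrays.stream(usagePercent).average().orElse(0.0);
        double sumSquares = 0.0;
        for (double u : usagePercent) {
            double d = u - mean;
            sumSquares += d * d;
        }
        return Math.sqrt(sumSquares / usagePercent.length);
    }

    // Balancing starts only when the deviation is strictly greater than
    // the preset start threshold, as described in [0028].
    static boolean shouldStartBalancing(double[] usagePercent, double startThreshold) {
        return stdDev(usagePercent) > startThreshold;
    }

    public static void main(String[] args) {
        double[] usage = {20.0, 40.0, 60.0, 80.0}; // illustrative values only
        System.out.printf("stddev=%.2f start=%b%n",
                stdDev(usage), shouldStartBalancing(usage, 10.0));
    }
}
```

In a real deployment the usage array would presumably be filled from the data node report that the Hadoop API exposes, and the start threshold would come from configuration.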

[0029] A Hadoop-based automatic data balancing tool corresponding to the above method is also provided. It comprises a calling unit, a calculation unit, an analysis unit, a setting unit and a startup unit, and the communication connection ...

Abstract

The invention discloses an automatic data equalization method and tool based on Hadoop, and relates to the technical field of software information. The method comprises the following steps: calling the Hadoop API through a Java application program; obtaining the disk utilization rate of each data node; calculating the standard deviation of the disk utilization rates; and comparing it in real time with a preset start threshold for launching the data balancing script. When the current standard deviation is greater than the start threshold, the method obtains the current cluster load through the Hadoop API, sets the data balancing bandwidth and the number of threads according to the load value, starts the data balancing script in the background through the Java application program to begin data balancing, and checks the current balancing progress through the application program.
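
The setting and startup steps can be sketched in the same spirit. Everything specific here is an assumption: the abstract gives no mapping from load to bandwidth or thread count, so the three tiers, the load scale of 0 to 1, and the class name `BalancerSettings` are hypothetical. The sketch only builds command lines and does not run them. `hdfs dfsadmin -setBalancerBandwidth` and `hdfs balancer -threshold` are standard HDFS commands; passing `dfs.balancer.moverThreads` with `-D` is one plausible way to control the thread count and should be checked against the Hadoop version in use.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical holder for the balancer parameters chosen from the cluster load.
final class BalancerSettings {
    final long bandwidthBytesPerSec;
    final int moverThreads;

    BalancerSettings(long bandwidthBytesPerSec, int moverThreads) {
        this.bandwidthBytesPerSec = bandwidthBytesPerSec;
        this.moverThreads = moverThreads;
    }

    // Maps a load value in [0, 1] to settings: the busier the cluster,
    // the less bandwidth and the fewer threads balancing may use.
    // The tier boundaries and values are illustrative, not from the source.
    static BalancerSettings forLoad(double load) {
        if (load < 0.0 || load > 1.0) {
            throw new IllegalArgumentException("load must be in [0, 1]");
        }
        final long mb = 1024L * 1024L;
        if (load < 0.3) return new BalancerSettings(100 * mb, 16);
        if (load < 0.7) return new BalancerSettings(50 * mb, 8);
        return new BalancerSettings(10 * mb, 2);
    }

    // Command that sets the per-data-node balancing bandwidth (bytes per second).
    List<String> bandwidthCommand() {
        return Arrays.asList("hdfs", "dfsadmin", "-setBalancerBandwidth",
                Long.toString(bandwidthBytesPerSec));
    }

    // Command that starts the balancer. Note that -threshold here is the
    // balancer's own disk-usage tolerance in percent, not the standard
    // deviation start threshold described in the abstract.
    List<String> balancerCommand(double thresholdPercent) {
        return Arrays.asList("hdfs", "balancer",
                "-D", "dfs.balancer.moverThreads=" + moverThreads,
                "-threshold", Double.toString(thresholdPercent));
    }
}
```

The resulting lists could be handed to `java.lang.ProcessBuilder` to launch the script in the background, which matches the description of starting the balancing script from the Java application.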

Description

Technical field

[0001] The invention discloses a Hadoop-based automatic data equalization method and tool, and relates to the technical field of software information.

Background

[0002] Hadoop is a distributed system infrastructure developed by the Apache Foundation. Users can develop distributed programs without knowing the underlying details of the distribution, making full use of the power of the cluster for high-speed computing and storage. Hadoop implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS is highly fault-tolerant and designed to be deployed on inexpensive hardware.

[0003] In a Hadoop cluster, when a new node is added or an existing node is removed, data becomes unevenly distributed across the cluster if the data balancing service (load balancing) is not enabled. As a result, the advantages of MapReduce (MR) localized computing cannot be used effectively. Generally speaking, the data required for the map t...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): H04L29/08, H04L12/26, G06F9/50, G06F16/182
Inventors: 赵明超, 贾德星, 臧勇真
Owner SHANDONG LANGCHAO YUNTOU INFORMATION TECH CO LTD