Method and device for determining duplicate copy placement nodes on basis of hadoop

A technology for determining replica placement nodes, applied in special-purpose data processing applications, instruments, electrical digital data processing, and related fields. It addresses problems such as high node load, reduced data transmission efficiency, and random distribution of replicas.

Status: Inactive; publication date: 2018-02-23
ZHENGZHOU YUNHAI INFORMATION TECH CO LTD
Cites: 3; Cited by: 9

AI Technical Summary

Problems solved by technology

[0003] However, the default replica placement strategy has certain limitations, mainly the following: replica storage nodes are selected by picking machines at random. Although HDFS also takes into account load information in the form of the number of working connections on each data node, this check is rather simple: it is applied only after a storage node has already been chosen at random. Such a placement method leads to a random distribution of replicas. In a heterogeneous environment in particular, the nodes that receive more replicas may well be nodes with poor performance, which in turn leaves some nodes heavily loaded and others idle, and lowers data transmission efficiency.
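The limitation described in [0003] can be illustrated with a simplified model. The sketch below is not HDFS source code; the `DataNode` record, the `default_style_choose` function, and the `max_connections` threshold are hypothetical names chosen for illustration. It only shows the shape of the problem: a node is drawn at random first and its connection count is checked afterwards, so among the nodes that pass the check, a slow or busy one is just as likely to be chosen as an idle one.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DataNode:
    name: str
    rack: str
    connections: int  # current number of working connections on the node


def default_style_choose(nodes: Sequence[DataNode], max_connections: int,
                         rng: random.Random,
                         max_tries: int = 10) -> Optional[DataNode]:
    """Simplified model: random draw first, connection check afterwards.

    Hypothetical illustration only. A node is drawn uniformly at random and
    accepted if its connection count is below the threshold; otherwise another
    random draw is made. Nothing here prefers a lightly loaded or faster node.
    """
    if not nodes:
        return None
    for _ in range(max_tries):
        candidate = rng.choice(nodes)
        if candidate.connections < max_connections:
            return candidate
    return None  # gave up after max_tries random draws
```

Because the draw is uniform, repeated calls spread replicas without regard to node performance, which is the behavior the invention aims to replace.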

Embodiment Construction

[0044] The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

[0045] The embodiments of the present invention disclose a Hadoop-based replica placement node determination method, apparatus, device, and computer-readable storage medium, which determine the node on which a replica is placed, improve load balance across cluster nodes, and thereby improve data transmission efficiency.

[0046] Referring to Figure 1, a Hadoop-based replica placement node determination method provided by an embodiment of the present inventio...

Abstract

The invention discloses a Hadoop-based method, apparatus, and device for determining replica placement nodes, and a computer-readable storage medium. The method includes: determining target rack servers according to the replica type of a target replica; selecting candidate nodes for the replica to be placed from the target rack servers and forming them into a candidate node cluster; selecting, from that cluster, the nodes whose number of working connections is below a connection threshold; and determining, among those nodes, the node with the minimum real-time load. The node with the minimum real-time load is used as the node on which the target replica is placed. Because the scheme considers both the real-time load of the nodes and the number of HDFS working connections when selecting replica placement nodes, replicas are distributed more reasonably, and the optimized placement strategy is more targeted than the default strategy. Selecting the node with the minimum real-time load wherever possible keeps replicas off heavily loaded nodes and thereby shortens replica transmission time.
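The selection steps listed in the abstract can be sketched as a filter-then-minimize procedure. This is a minimal sketch under stated assumptions, not the patented implementation: the target racks are passed in already determined (the first step), `realtime_load` is an opaque hypothetical score where lower means less loaded (the abstract does not say how real-time load is measured), and the `exclude` parameter is a hypothetical addition so that a node already holding a replica of the same block is not chosen again.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Sequence


@dataclass
class DataNode:
    name: str
    rack: str
    connections: int      # number of working HDFS connections on the node
    realtime_load: float  # hypothetical real-time load score; lower is better


def choose_placement_node(nodes: Sequence[DataNode], target_racks: List[str],
                          connection_threshold: int,
                          exclude: FrozenSet[str] = frozenset()) -> Optional[DataNode]:
    # Candidate cluster: nodes on the target rack servers (minus excluded nodes).
    cluster = [n for n in nodes
               if n.rack in target_racks and n.name not in exclude]
    # Keep only nodes whose working connections are below the threshold.
    eligible = [n for n in cluster if n.connections < connection_threshold]
    if not eligible:
        return None  # no candidate passes the connection filter
    # Among the remaining nodes, pick the one with the minimum real-time load.
    return min(eligible, key=lambda n: n.realtime_load)
```

Applying the connection threshold before minimizing load mirrors the order in the abstract: the threshold discards nodes that are busy with HDFS traffic, and the load comparison then picks the least loaded of the remaining candidates. If no node passes the threshold, the sketch returns `None`; the source does not say how the method handles that case.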

Description

Technical field

[0001] The present invention relates to the technical field of replica storage in distributed file systems, and more specifically to a Hadoop-based method, apparatus, device, and computer-readable storage medium for determining replica placement nodes.

Background

[0002] Hadoop is currently the mainstream enterprise big data analysis platform, and it uses the HDFS distributed file system for data storage. HDFS follows a master/slave architecture: an HDFS cluster consists of one name node (NameNode) and several data nodes (DataNode). HDFS uses a three-replica redundancy mechanism to keep data safe. The principle of the HDFS default replica placement strategy is to store, where possible, two replicas of a data block on one rack and the third replica on another rack, which strikes a good balance between bandwidth usage and reliability.

[0003] However, the default replica placement ...
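The default principle in [0002] (two replicas on one rack, the third on another) can be expressed as a mapping from a replica's position to a target rack, which is also one plausible reading of the abstract's step of determining target racks from the replica type. In this sketch the replica type is assumed to be the replica's position (first, second, third), and `default_target_racks` is a hypothetical helper. The remote rack is chosen deterministically (first in sorted order) for reproducibility, whereas HDFS itself chooses the remote rack at random.

```python
from typing import List


def default_target_racks(writer_rack: str, racks: List[str],
                         replication: int = 3) -> List[str]:
    """Map replicas 1..replication to racks under the two-rack principle.

    Replica 1 goes to the writer's (local) rack; replicas 2 and 3 share one
    remote rack, so two replicas end up on one rack and one on another.
    """
    if not 1 <= replication <= 3:
        raise ValueError("this sketch only covers 1 to 3 replicas")
    # Pick a remote rack deterministically; HDFS itself picks one at random.
    remote = next((r for r in sorted(racks) if r != writer_rack), None)
    if remote is None:
        # Single-rack cluster: every replica has to stay on the writer's rack.
        return [writer_rack] * replication
    plan = [writer_rack, remote, remote]
    return plan[:replication]
```

For example, a writer on rack `/r1` in a cluster with racks `/r1`, `/r2`, and `/r3` yields `['/r1', '/r2', '/r2']`.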

Application Information

Patent type & authority: Application (China)
IPC(8): G06F17/30
CPC: G06F16/113, G06F16/13, G06F16/182
Inventors: Wang Yiyan (王宜燕), Jiang Chao (江超)
Owner: ZHENGZHOU YUNHAI INFORMATION TECH CO LTD