Dynamic copy management method based on HDFS
A replica management technology, applied in the fields of electrical digital data processing, special data processing applications, and instruments, that achieves improved concurrent performance and accuracy.
Examples
Embodiment 1
[0048] An HDFS-based dynamic replica management method, comprising a replica placement strategy, a dynamic replica creation strategy and a dynamic replica deletion strategy, characterized in that the replica placement strategy includes the placement strategy for the master replica and the default replicas and the placement strategies for other replicas. It is an active balancing strategy: load balancing is considered at the moment a replica is created, and each replica is actively placed in the best position, defined as the position with the lightest load, so as to eliminate the potential risk of load imbalance as far as possible. Rather than placing a newly created replica arbitrarily within the storage system, the strategy judges the best location according to the computing power of each storage node and the number of data blocks it already stores.
Embodiment 2
[0050] On the basis of Embodiment 1, in the replica placement strategy of this embodiment, the placement strategy for the master replica and the default replicas is as follows: for each data block in HDFS, when the file is written into the file system, there are by default one master replica and two default replicas. The master replica and one of the default replicas are saved on the local rack (the cluster under the same router as the location from which the file is uploaded), and the other default replica is placed on any rack other than the local rack.
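The rack-level rule of this embodiment can be illustrated with a minimal Python sketch. The function name plan_replica_racks, the dictionary keys, and the rack labels are hypothetical and chosen only for illustration; the source does not specify how the remote rack is picked beyond "any rack other than the local rack", so the sketch simply returns all remote candidates.

```python
from typing import Dict, List, Union


def plan_replica_racks(local_rack: str, racks: List[str]) -> Dict[str, Union[str, List[str]]]:
    """Return the rack assignment for the three initial replicas of one block.

    Hypothetical helper: the master replica and one default replica go to the
    local rack; the other default replica may go to any remote rack.
    """
    # Every rack except the local one is a valid home for the remote replica.
    remote_candidates = [r for r in racks if r != local_rack]
    if not remote_candidates:
        raise ValueError("at least one rack other than the local rack is required")
    return {
        "master": local_rack,
        "default_local": local_rack,
        "default_remote_candidates": remote_candidates,
    }
```

Returning the candidate list rather than a single rack leaves the final choice open, since the text allows any remote rack at this stage.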
Embodiment 3
[0052] On the basis of Embodiment 2, the selection of machines within a rack in this embodiment uses two parameter indexes:
[0053] the number of data blocks already stored;
[0054] CPU processing performance.
[0055] Let the number of stored data blocks of the i-th machine be N_i and its CPU processing performance be CA_i, and let the variable P be computed from these two quantities, where k_1 and k_2 are constant coefficients (the defining formula is not reproduced in this text). Calculate the P value of all nodes in the local rack and select the two machines with the smallest P values to hold the master replica and one of the default replicas. Then calculate the P value of all nodes in the remote rack and select the machine with the smallest P value to hold the other default replica. During selection, machines that already hold a replica of this data block are skipped; free space is also checked, and machines with insufficient space to store the replica are skipped.
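The node selection procedure described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the Node class, the pick_nodes function, and all field names are hypothetical, and because the formula for P is not available in this text, the scoring function assumed_p below is an assumed form (more stored blocks raise P, higher CPU performance lowers it), not the formula of the patent. Any other scoring function can be passed in its place.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Node:
    name: str
    stored_blocks: int        # N_i: number of data blocks already stored
    cpu_performance: float    # CA_i: CPU processing performance (must be > 0)
    free_space: int           # free space in bytes
    blocks: Set[str] = field(default_factory=set)  # ids of blocks held


def assumed_p(node: Node, k1: float = 1.0, k2: float = 1.0) -> float:
    # ASSUMPTION: the source formula is missing; this form is illustrative only.
    return k1 * node.stored_blocks + k2 / node.cpu_performance


def pick_nodes(
    nodes: List[Node],
    block_id: str,
    block_size: int,
    count: int,
    p: Callable[[Node], float] = assumed_p,
) -> List[Node]:
    """Pick the `count` eligible nodes with the smallest P value."""
    eligible = [
        n for n in nodes
        if block_id not in n.blocks      # skip nodes that already hold a replica
        and n.free_space >= block_size   # skip nodes without enough space
    ]
    return sorted(eligible, key=p)[:count]
```

Under these assumptions, calling pick_nodes with count=2 on the nodes of the local rack would choose the hosts for the master replica and one default replica, and calling it with count=1 on the nodes of a remote rack would choose the host for the other default replica.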