A data placement method for a distributed file system supporting deep learning

A distributed file system and deep learning technology in the field of data placement for distributed file systems. It addresses problems such as large network overhead and the inability to support data locality for upper-layer computation, with the effect of improving operating efficiency, reducing network overhead, and improving data locality.

Active Publication Date: 2022-07-08
SUZHOU INST FOR ADVANCED STUDY USTC
Cites: 2; Cited by: 0

AI Technical Summary

Problems solved by technology

With this approach, when reading the data blocks of a file, the compute boards on each node must fetch data from other nodes, which generates large network overhead. Fetching data in batches along a user-specified path likewise results in reads from other nodes, so the data locality of upper-level computation is poorly supported.
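To make this overhead concrete, the following minimal Python sketch counts how many megabytes one node must fetch over the network to consume a batch of files under two hypothetical placements: files scattered round-robin across nodes, and the whole batch co-located on the reading node. The node names, file names, and sizes are illustrative assumptions, not figures from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileBlock:
    name: str
    size_mb: int
    node: str  # node that stores this block (hypothetical placement)


def remote_read_mb(batch: list[FileBlock], reader_node: str) -> int:
    """Megabytes the reader must pull from other nodes to consume the batch."""
    return sum(b.size_mb for b in batch if b.node != reader_node)


def round_robin(names: list[str], sizes: list[int], nodes: list[str]) -> list[FileBlock]:
    # Scatter files across nodes one by one, ignoring how they will be batched.
    return [FileBlock(n, s, nodes[i % len(nodes)]) for i, (n, s) in enumerate(zip(names, sizes))]


def co_located(names: list[str], sizes: list[int], node: str) -> list[FileBlock]:
    # Keep every file of the batch on the node that will compute on it.
    return [FileBlock(n, s, node) for n, s in zip(names, sizes)]


if __name__ == "__main__":
    nodes = ["node0", "node1", "node2", "node3"]
    names = [f"img_{i:03d}.jpg" for i in range(8)]
    sizes = [4] * 8
    print(remote_read_mb(round_robin(names, sizes, nodes), "node0"))  # 24
    print(remote_read_mb(co_located(names, sizes, "node0"), "node0"))  # 0
```

With four nodes and eight equally sized files, the scattered layout forces three quarters of the batch (24 of 32 MB) to be read remotely, while the co-located layout reads everything locally.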

Embodiment Construction

[0039] Figure 1 shows a schematic diagram of an HDFS-based distributed file system framework that supports deep learning. The overall system consists of the Namenode, which manages the metadata of the entire system; the Datanodes, which store the actual data; and the DFS Client, which interacts with users. The DFS Client sends file query and storage requests to the Namenode, and the Namenode returns storage locations and other information to the DFS Client. The Namenode directs the DFS Client's file storage and read operations against the Datanodes. At the same time, the Namenode can obtain storage status from the Datanodes and issue related instructions to them. The Namenode contains the system namespace and the upper-layer logical storage. The system namespace is the mechanism HDFS uses to present data files to users, and the upper-layer logical storage determines the storage method and placement strategy of data files. The Namenode is also responsible for obtain...
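The division of labour described above (metadata on the Namenode, data on the Datanodes, the client querying one and then reading from the other) can be illustrated with a small in-memory Python sketch. The class and method names (`Namenode.allocate`, `Namenode.locate`, `DFSClient.write`, `DFSClient.read`) are hypothetical and are not the HDFS Java API; the round-robin block choice is a placeholder, not the placement strategy of the invention.

```python
from dataclasses import dataclass, field


@dataclass
class Datanode:
    """Stores raw block data (hypothetical in-memory stand-in for a Datanode)."""
    node_id: str
    blocks: dict[str, bytes] = field(default_factory=dict)  # block id -> bytes

    def read_block(self, block_id: str) -> bytes:
        return self.blocks[block_id]


class Namenode:
    """Holds metadata only: the namespace and the block-to-Datanode map."""

    def __init__(self, datanode_ids: list[str]):
        self.datanode_ids = list(datanode_ids)
        self.namespace: dict[str, list[str]] = {}   # path -> ordered block ids
        self.block_locations: dict[str, str] = {}   # block id -> Datanode id
        self._next_block = 0

    def allocate(self, path: str, n_blocks: int) -> list[tuple[str, str]]:
        """Choose a Datanode for each new block (placeholder round-robin policy)."""
        plan = []
        for _ in range(n_blocks):
            block_id = f"blk_{self._next_block}"
            node = self.datanode_ids[self._next_block % len(self.datanode_ids)]
            self._next_block += 1
            self.block_locations[block_id] = node
            plan.append((block_id, node))
        self.namespace[path] = [b for b, _ in plan]
        return plan

    def locate(self, path: str) -> list[tuple[str, str]]:
        """Answer a client query with (block id, Datanode id) pairs."""
        return [(b, self.block_locations[b]) for b in self.namespace[path]]


class DFSClient:
    """Asks the Namenode where data lives, then talks to Datanodes directly."""

    def __init__(self, namenode: Namenode, datanodes: dict[str, Datanode]):
        self.namenode = namenode
        self.datanodes = datanodes

    def write(self, path: str, data: bytes, block_size: int) -> None:
        # Split into fixed-size blocks, get targets from the Namenode, ship data to Datanodes.
        chunks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
        for (block_id, node), chunk in zip(self.namenode.allocate(path, len(chunks)), chunks):
            self.datanodes[node].blocks[block_id] = chunk

    def read(self, path: str) -> bytes:
        # Metadata query first, then block reads straight from the Datanodes.
        return b"".join(
            self.datanodes[node].read_block(block_id)
            for block_id, node in self.namenode.locate(path)
        )
```

For example, with `Namenode(["dn0", "dn1"])`, writing `b"abcdefgh"` with a block size of 3 produces `blk_0` on `dn0`, `blk_1` on `dn1` and `blk_2` on `dn0`, and reading the path back returns the original bytes. The Namenode never touches file contents, only the two metadata maps.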

Abstract

The invention discloses a data placement method for a distributed file system supporting deep learning, together with HDFS-based storage, reading and deletion methods that support deep learning. The data placement method comprises three designs: a data aggregation unit called the Pile; a data placement strategy based on the Pile; and a Pile-based layout of data on physical storage. The Pile, the smallest unit of data processing, matches the batch-oriented data access of deep learning computation, takes the data locality of upper-level computing systems into account, and allows files to be load-balanced. The Pile-based placement strategy plans how Piles are stored and read for different users and different datasets. The Pile-based physical storage strategy governs how the files inside a Pile are organized on the underlying physical storage; it reduces memory pressure on the master node, effectively supports deep learning inference, and supports the storage and computation of massive deep learning datasets.
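The abstract does not spell out the Pile algorithms, so the following Python sketch is only one plausible reading, under stated assumptions: a Pile packs one user's dataset files, in order, up to a hypothetical byte capacity; each Pile is assigned to the currently least-loaded node (a greedy load-balancing stand-in for the patent's unspecified strategy); and a per-Pile offset index lets the files be laid out contiguously on physical storage. All names (`Pile`, `build_piles`, `place_piles`, `pile_capacity`) are hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Pile:
    """Hypothetical aggregation unit: files of one user's dataset stored together."""
    pile_id: str
    user: str
    dataset: str
    files: list[tuple[str, int]] = field(default_factory=list)  # (file name, size in bytes)

    @property
    def size(self) -> int:
        return sum(s for _, s in self.files)

    def index(self) -> dict[str, tuple[int, int]]:
        """(offset, length) of each file in the Pile's contiguous physical layout."""
        out, offset = {}, 0
        for name, size in self.files:
            out[name] = (offset, size)
            offset += size
        return out


def build_piles(user: str, dataset: str, files: list[tuple[str, int]],
                pile_capacity: int) -> list[Pile]:
    """Pack a dataset's files, in order, into Piles of at most pile_capacity bytes.

    A single file larger than the capacity gets a Pile of its own.
    """
    piles: list[Pile] = []
    current = None
    for name, size in files:
        if current is None or current.size + size > pile_capacity:
            current = Pile(f"{user}/{dataset}/pile_{len(piles)}", user, dataset)
            piles.append(current)
        current.files.append((name, size))
    return piles


def place_piles(piles: list[Pile], nodes: list[str]) -> dict[str, str]:
    """Assign each Pile to the currently least-loaded node (greedy load balancing)."""
    heap = [(0, n) for n in nodes]  # (bytes already placed, node id)
    heapq.heapify(heap)
    placement = {}
    for pile in sorted(piles, key=lambda p: p.size, reverse=True):
        load, node = heapq.heappop(heap)
        placement[pile.pile_id] = node
        heapq.heappush(heap, (load + pile.size, node))
    return placement  # one master-node record per Pile, not per file
```

Packing ten 40-byte files with a capacity of 100 bytes yields five Piles, which the greedy step spreads three and two across two nodes. Because the master node then needs one placement record per Pile rather than per file, with per-file offsets kept in the Pile's own index, this is one way the reduced master-node memory pressure mentioned in the abstract could arise; consecutive files of a batch also land on the same node, which favours data locality.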

Description

Technical field [0001] The present invention relates to data placement in distributed file systems, in particular to an HDFS-based data placement method for a distributed file system supporting deep learning, and to HDFS-based storage, reading and deletion methods supporting deep learning.
Background [0002] Unlike a traditional centralized storage system, a distributed file system is a kind of distributed storage system. It connects a large number of PC servers over a local area network and provides storage services as a whole. It scales well: data is distributed across the servers of the entire cluster, and a management server tracks all storage locations and storage status information. This structure improves not only the reliability and scalability of the system but also the overall efficiency of data access. Distributed file systems are widely used in many fields of current life, suc...

Application Information

Patent type & authority: Patents (China)
IPC (8): G06F16/182; G06F16/13; G06F16/16
CPC: Y02D10/00
Inventors: 李曦 (Li Xi), 周学海 (Zhou Xuehai), 王超 (Wang Chao), 谭璐超 (Tan Luchao)
Owner: SUZHOU INST FOR ADVANCED STUDY USTC