Cluster packet synchronization optimization method and system for distributed deep neural network

A deep neural network and group synchronization technology, applied in the field of distributed optimization of deep neural networks. It addresses problems such as an increased number of parameter-server requests, a zigzagging parameter-update direction, and poor model convergence, achieving high resource utilization and reducing the impact of stale gradients on the weights.

Active Publication Date: 2017-08-04
HUAZHONG UNIV OF SCI & TECH
Cites: 1 · Cited by: 55

AI Technical Summary

Problems solved by technology

However, the synchronization overhead between nodes is relatively high: while a node waits for the other nodes to complete the current round of iteration, its own computing and network resources sit idle. In heterogeneous clusters and large-scale homogeneous clusters, this phenomenon is particularly serious.
In a heterogeneous cluster, large differences in hardware configuration cause obvious performance differences among nodes: some nodes run fast while others run slowly. In each round of iteration, the fast nodes must wait for the slow ones, leaving the fast nodes' resources idle, so the training bottleneck lies with the slowest nodes. In a large-scale homogeneous cluster, although node performance is nominally identical, the large number of nodes reduces the overall stability of the cluster, and performance fluctuations on some nodes are inevitable. At the same time, the number of requests the parameter server must process grows greatly, resulting in a relatively large synchronization overhead for each round of iteration.
The asynchronous parallel mechanism eliminates the time overhead of nodes waiting for one another, because a node need not consider the state of other nodes during each round of iteration; node resource utilization is therefore high and training is fast. However, because there is no synchronized parameter-update operation, the stale-gradient problem arises: in the high-dimensional space of model parameters, the update direction zigzags more, so for the same number of iterations the convergence of the model is worse than under the synchronous parallel mechanism.
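One common way to realize the "reducing the impact of stale gradients" effect mentioned in the summary above is to discount each asynchronously delivered gradient by its staleness before applying it. The sketch below is illustrative only: the decay rule `lr / (1 + staleness)` and all names are assumptions for exposition, not the patent's exact formula.

```python
def apply_update(params, grad, lr, current_step, grad_step):
    """Apply an asynchronous gradient update, discounted by staleness.

    staleness = number of global steps that elapsed since the gradient
    was computed; staler gradients receive proportionally less weight.
    The decay rule here is an illustrative assumption.
    """
    staleness = current_step - grad_step
    effective_lr = lr / (1.0 + staleness)  # hypothetical discount rule
    return [p - effective_lr * g for p, g in zip(params, grad)]

params = [0.0, 0.0, 0.0]
# A fresh gradient (staleness 0) gets the full learning rate...
fresh = apply_update(params, [1.0, 1.0, 1.0], lr=0.1, current_step=10, grad_step=10)
# ...while a gradient computed 5 steps ago is discounted by 1/6.
stale = apply_update(params, [1.0, 1.0, 1.0], lr=0.1, current_step=10, grad_step=5)
```

This keeps a slow node's contribution from dragging the parameters backward along an outdated direction, while still using its work.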

Method used




Embodiment Construction

[0044] In order to make the objects, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present invention, not to limit it. In addition, the technical features involved in the various embodiments of the present invention described below may be combined with each other as long as they do not conflict.

[0045] The technical terms involved in the present invention are first explained and illustrated below:

[0046] Training data: also known as input data, that is, the processing objects input to the network model when training the neural network, such as images, audio, or text;

[0047] Model parameters: the weights of the interconnections of neurons in the neural network model and t...



Abstract

The invention discloses a cluster packet (group) synchronization optimization method and system for a distributed deep neural network. The method comprises the following steps: grouping the nodes in a cluster according to performance; allocating training data according to node performance; using a synchronous parallel mechanism within the same group; using an asynchronous parallel mechanism between different groups; and using different learning rates in different groups. Nodes with similar performance are placed in one group, so synchronization overhead is reduced; more training data is allocated to the better-performing nodes, so resource utilization is improved; the synchronous parallel mechanism is used within groups, where synchronization overhead is small, so the good convergence behavior of the synchronous parallel mechanism is retained; the asynchronous parallel mechanism is used between groups, where synchronization overhead is large, so that overhead is avoided; and different groups use different learning rates to facilitate model convergence. By applying this group synchronization method to the parameter synchronization process of a distributed deep neural network in a heterogeneous cluster, the model convergence rate is greatly increased.
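The grouping and data-allocation steps summarized in the abstract can be sketched as follows. This is an illustrative reconstruction under stated assumptions: the throughput measure, the `ratio_threshold` grouping rule, and all names are hypothetical, not taken from the claims.

```python
def group_nodes(throughputs, ratio_threshold=1.25):
    """Group nodes by performance: within a group, the fastest node is at
    most `ratio_threshold` times faster than the slowest (hypothetical rule)."""
    # Sort nodes by measured throughput (e.g. samples/sec), fastest first.
    order = sorted(throughputs, key=throughputs.get, reverse=True)
    groups, current = [], []
    for node in order:
        # Start a new group when this node is too slow for the current one.
        if current and throughputs[current[0]] / throughputs[node] > ratio_threshold:
            groups.append(current)
            current = []
        current.append(node)
    if current:
        groups.append(current)
    return groups

def allocate_data(throughputs, nodes, total_samples):
    """Allocate training data to nodes in proportion to their throughput."""
    total = sum(throughputs[n] for n in nodes)
    return {n: int(total_samples * throughputs[n] / total) for n in nodes}

# Two fast nodes and two slow nodes end up in two separate groups;
# within a group, the faster node receives a larger data share.
tp = {"n1": 100, "n2": 95, "n3": 50, "n4": 48}
groups = group_nodes(tp)                      # [["n1", "n2"], ["n3", "n4"]]
shares = allocate_data(tp, groups[0], 10000)  # n1 gets more samples than n2
```

Each group would then run the synchronous parallel mechanism internally, while the groups exchange updates with the parameter server asynchronously, each with its own learning rate.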

Description

Technical Field

[0001] The invention belongs to the technical field of distributed optimization of deep neural networks, and more particularly relates to a method and system for grouping synchronization optimization of distributed deep neural network clusters.

Background Technique

[0002] At present, the Deep Neural Network (DNN) has been applied in many fields such as image, speech, and natural language processing, and has achieved many breakthroughs. Because of the large scale of its training data and trained model parameters, a deep neural network requires sufficient computing and storage resources. The traditional single-machine training mode can therefore no longer meet the requirements, and distributed computing modes such as clusters must be used.

[0003] Distributed deep learning usually adopts the data-parallel mode for model training. As shown in figure 1, data parallelism refers to splitting the training data, storing one or more splits of the training data on eac...
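The data-parallel split described in [0003] can be illustrated minimally as follows; the round-robin sharding scheme and all names here are illustrative assumptions, not the patent's mechanism.

```python
def split_data(samples, num_workers):
    """Split training samples into near-equal shards, one per worker node.
    Each worker trains a full model replica on its own shard."""
    shards = [[] for _ in range(num_workers)]
    for i, s in enumerate(samples):
        shards[i % num_workers].append(s)  # round-robin assignment
    return shards

shards = split_data(list(range(10)), 3)
# shard sizes: [4, 3, 3]
```

In the patent's setting, this uniform split is exactly what the grouping method improves on: shard sizes are instead made proportional to node performance.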

Claims


Application Information

IPC(8): H04L29/08, H04L12/24, G06N3/08
Inventor 蒋文斌金海叶阁焰张杨松马阳祝简彭晶
Owner HUAZHONG UNIV OF SCI & TECH