Sparse accelerator applied to on-chip training

A sparse accelerator applied to on-chip training, relating to biological neural network models, neural architectures, neural learning methods, and the like. It addresses the problems that invalid operations in on-chip training cannot be eliminated and that efficient on-chip training cannot be guaranteed, thereby improving hardware utilization and enabling accurate elimination of invalid operations.

Pending Publication Date: 2022-05-13
NANJING UNIV


Problems solved by technology

[0006] This application provides a sparse accelerator applied to on-chip training, which addresses the technical problem that existing accelerators cannot eliminate all invalid operations in on-chip training and therefore cannot guarantee the most efficient implementation of on-chip training on terminal devices.
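To make "invalid operations" concrete: in sparse on-chip training, a multiply-accumulate whose activation or weight operand is zero contributes nothing to the result and can be skipped. A minimal sketch (illustrative only; the function name and shapes are assumptions, not from the patent):

```python
# Hypothetical sketch of zero-skipping, the principle behind eliminating
# "invalid operations": multiplications with a zero operand are never issued.
def sparse_dot(activations, weights):
    """Dot product that skips zero-operand multiplications.

    Returns the result and the number of MACs actually performed.
    """
    total, macs = 0.0, 0
    for a, w in zip(activations, weights):
        if a == 0.0 or w == 0.0:
            continue  # invalid operation: contributes nothing to the sum
        total += a * w
        macs += 1
    return total, macs

# With ReLU activations and pruned weights, many operands are zero:
acts = [0.0, 1.5, 0.0, 2.0, 0.0, 3.0]
wts  = [0.5, 0.0, 1.0, 2.0, 0.0, 1.0]
result, macs = sparse_dot(acts, wts)
# result == 7.0, using only 2 of the 6 possible MACs
```

A hardware accelerator realizes the same skip decision with masks and gating rather than a branch, but the arithmetic saved is the same.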




Embodiment Construction

[0057] In order to make the purpose, technical solutions, and advantages of the present application clearer, the embodiments of the present application are described in further detail below in conjunction with the accompanying drawings.

[0058] The terms used in the following embodiments are for the purpose of describing particular embodiments only and are not intended to limit the application. As used in the specification and appended claims of this application, the singular forms "a", "an", "said", "above", "the", and "this" are intended to also include expressions such as "one or more", unless the context clearly indicates otherwise. It should also be understood that in the following embodiments of the present application, "at least one" and "one or more" mean one, two, or more, and "multiple" means two or more. The term "and/or" describes the relationship between associated objects and indicates that three relationships may exist; for ex...



Abstract

According to the sparse accelerator applied to on-chip training provided by the embodiments of the invention, the input values in the input-value buffer module, the reference values in the reference-value buffer module, and the mask in the mask buffer module are dynamically adjusted across the different acceleration stages. A coarse-grained unit performs a first, coarse-grained screening of invalid operations, and a fine-grained unit included in each processing unit of the processing module performs a further, fine-grained screening, thereby eliminating all invalid operations in the three training stages of on-chip training. In addition, the multiple computing cores can perform acceleration processing in parallel, and the multiple processing units included in the processing modules of those cores can also operate in parallel, which further improves the hardware utilization of the sparse accelerator. The sparse accelerator applied to on-chip training can therefore efficiently and accurately eliminate all invalid operations in the three stages of on-chip training.
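The coarse-then-fine screening described above can be sketched in software as a two-level mask: a coarse-grained pass rejects entire all-zero blocks with one mask bit per block, and a fine-grained pass skips the remaining zero elements inside surviving blocks. This is a minimal illustrative sketch; the function, block size, and mask representation are assumptions, not the patent's hardware design:

```python
# Hypothetical two-level screening sketch: coarse-grained block rejection
# followed by fine-grained per-element rejection, returning the indices
# of elements that actually need to be computed on.
def two_level_screen(values, block_size=4):
    """Return indices of nonzero elements, screened coarse-then-fine."""
    survivors = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        # Coarse-grained unit: one mask bit per block; skip all-zero blocks.
        if not any(block):
            continue
        # Fine-grained unit: per-element mask inside each surviving block.
        for offset, v in enumerate(block):
            if v != 0:
                survivors.append(start + offset)
    return survivors

vals = [0, 0, 0, 0,   5, 0, 7, 0,   0, 0, 0, 0,   1, 2, 0, 0]
# Blocks 0 and 2 are rejected wholesale; elements 4, 6, 12, 13 survive.
```

The benefit of the coarse pass is that an all-zero block is dismissed with a single mask check instead of `block_size` element checks, which in hardware saves both buffer reads and control cycles.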

Description

Technical field

[0001] This application relates to the field of computer and electronic information technology, and in particular to a sparse accelerator applied to on-chip training.

Background technique

[0002] In recent years, convolutional neural networks (CNNs) have performed well in computer vision, speech recognition, and natural language processing. To improve recognition accuracy, it is necessary to train the CNN model on-chip: the CNN model is fine-tuned with the user's data on the device, which then improves the model's accuracy in use.

[0003] The on-chip training process includes three stages, namely the forward propagation (FP) stage, the backward propagation (BP) stage, and the weight gradient (WG) calculation stage. In the FP stage, the activation value of the previous layer is used as the input activation value of the current layer, combined with the convolution kernel weight cor...
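The three training stages in [0003] can be summarized for a single fully connected layer (an illustrative simplification; the patent targets convolutional layers, and these function names are assumptions):

```python
# Minimal sketch of the three on-chip training stages for one fully
# connected layer with weight matrix w of shape (inputs x outputs).
def fp(x, w):
    # Forward propagation: previous layer's activation x through weights w.
    return [sum(xi * wij for xi, wij in zip(x, col)) for col in zip(*w)]

def bp(grad_out, w):
    # Backward propagation: push the output gradient back through w
    # to obtain the gradient with respect to the layer's input.
    return [sum(g * wij for g, wij in zip(grad_out, row)) for row in w]

def wg(x, grad_out):
    # Weight gradient stage: outer product of the input activation
    # and the output gradient gives the per-weight gradient.
    return [[xi * g for g in grad_out] for xi in x]
```

All three stages are dominated by multiply-accumulates over the same operands (activations, weights, gradients), which is why zero-skipping applies to each of them.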

Claims


Application Information

IPC(8): G06N3/04; G06N3/063; G06N3/08
CPC: G06N3/063; G06N3/082; G06N3/084; G06N3/045
Inventors: 王中风, 黄健, 鲁金铭
Owner: NANJING UNIV