Convolution operation memory access optimization method based on GPU

A memory access and convolution operation technology, applied in the field of convolution operation memory access optimization, which can solve the problems of high memory access overhead and an excessive number of memory accesses in convolution operations.

Active Publication Date: 2020-10-20
HARBIN INST OF TECH


Problems solved by technology

[0004] The purpose of the present invention is to solve the problems in the prior art that the memory access cost of the convolution operation is large and that the excessive number of memory accesses reduces the performance of the convolution operation.



Examples


Embodiments

[0072] Embodiments of the present invention are demonstrated below through specific examples.

[0073] In order to achieve the purpose of memory access optimization, an embodiment of the present invention, as shown in Figure 6, includes the following steps:

[0074] S1: Load the convolution kernel data into the shared memory.
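
For concreteness, a minimal CUDA sketch of step S1 follows; the kernel name, the fixed 3x3 filter size and the launch configuration are assumptions for illustration and are not taken from the patent.

```cuda
// Minimal sketch of S1 (assumed names, fixed 3x3 filter; not the patent's code):
// each thread block copies the convolution kernel weights into shared memory once,
// so that later multiply-accumulates read the weights from on-chip storage.
#define KERNEL_H 3
#define KERNEL_W 3

__global__ void conv2d_load_filter_sketch(const float* __restrict__ filter /*, ... */)
{
    __shared__ float s_filter[KERNEL_H * KERNEL_W];

    // Cooperative copy: global memory -> shared memory.
    for (int i = threadIdx.x; i < KERNEL_H * KERNEL_W; i += blockDim.x)
        s_filter[i] = filter[i];
    __syncthreads();   // all weights are visible to the whole block from here on

    // ... steps S2-S4 would then read s_filter instead of global memory ...
}
```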

[0075] S2: Divide the convolution output into sub-blocks in units of 32 columns, obtaining several sub-blocks containing 32 columns of data and one sub-block containing fewer than 32 columns. This is the division method illustrated in Figure 5.
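
A host-side sketch of this 32-column tiling may look as follows; the struct and function names are illustrative, and how the patent actually schedules the narrower tail sub-block is not specified here.

```cuda
// Illustrative tiling of the convolution output into 32-column sub-blocks (S2).
struct ColumnTiling {
    int full_blocks;   // sub-blocks containing exactly 32 output columns
    int tail_cols;     // width of the last sub-block with fewer than 32 columns (0 if none)
};

static ColumnTiling tile_output_columns(int out_w)
{
    ColumnTiling t;
    t.full_blocks = out_w / 32;
    t.tail_cols   = out_w % 32;
    return t;
}
// Sub-block b then covers output columns [b * 32, min((b + 1) * 32, out_w)).
```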

[0076] S3: Assume there are N threads for processing the sub-blocks; each thread calculates the index of the first data element it requires. The first data element, together with the neighbouring data to its left and right required by each thread, is shown in FIG. 2. All other required data can be obtained by index arithmetic starting from this first index.
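
As an illustration, under the assumptions of a row-major input, stride 1, no padding and one thread per output column (assumptions of this sketch, not statements from the patent), the first-index computation of S3 reduces to simple address arithmetic:

```cuda
// Sketch of S3: each thread derives the index of the first (top-left) input element
// of its receptive field; all other inputs are reached by fixed offsets from it
// (+1 per column, +in_w per row). Assumes row-major input, stride 1, no padding.
__device__ int first_input_index(int sub_block, int out_row, int in_w)
{
    int out_col = sub_block * 32 + threadIdx.x;   // output column handled by this thread
    return out_row * in_w + out_col;
}
```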

[0077] S4: Each thread acquires the remaining required input data, starting from the index of the first data, through the column reuse algorithm, and passes the acquired input data to the row reuse algorithm.
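
The visible description stops here; according to the Abstract, the row reuse algorithm then computes the output result, stores it in the register variable sum, and the sum is written to global memory. The column and row reuse algorithms themselves are not spelled out in the text shown, so the CUDA kernel below only sketches the general register-reuse idea (keeping the input rows shared by consecutive outputs in registers while accumulating into a register sum) for a 3x3, stride-1, unpadded convolution; the kernel name, the filter size and ROWS_PER_THREAD are assumptions, not the patent's parameters.

```cuda
// Illustrative register-reuse kernel combining S1-S4 for a 3x3, stride-1,
// unpadded 2D convolution. This is NOT the patent's exact column/row reuse
// algorithm; it only demonstrates reusing already-loaded input data across
// consecutive output rows and accumulating the result in a register.
#define KH 3
#define KW 3
#define ROWS_PER_THREAD 4      // assumed: each thread walks down 4 output rows

__global__ void conv2d_reuse_sketch(const float* __restrict__ input,
                                    const float* __restrict__ filter,
                                    float* __restrict__ output,
                                    int in_h, int in_w)
{
    __shared__ float s_filter[KH * KW];
    for (int i = threadIdx.x; i < KH * KW; i += blockDim.x)
        s_filter[i] = filter[i];                         // S1: filter -> shared memory
    __syncthreads();

    int out_w = in_w - KW + 1;
    int out_h = in_h - KH + 1;
    int out_col = blockIdx.x * 32 + threadIdx.x;         // S2: 32-column sub-block
    int row0    = blockIdx.y * ROWS_PER_THREAD;          // first output row of this thread
    if (out_col >= out_w || row0 >= out_h) return;

    const float* in0 = input + row0 * in_w + out_col;    // S3: first required input element

    float win[KH][KW];                                   // receptive-field window in registers
    for (int r = 0; r < KH; ++r)
        for (int c = 0; c < KW; ++c)
            win[r][c] = in0[r * in_w + c];

    for (int r = 0; r < ROWS_PER_THREAD && row0 + r < out_h; ++r) {
        float sum = 0.0f;                                // accumulate the result in a register
        for (int i = 0; i < KH; ++i)
            for (int j = 0; j < KW; ++j)
                sum += win[i][j] * s_filter[i * KW + j];
        output[(row0 + r) * out_w + out_col] = sum;      // write the result to global memory

        // Row reuse: slide the window down one row; only the new bottom row is loaded.
        for (int i = 0; i < KH - 1; ++i)
            for (int j = 0; j < KW; ++j)
                win[i][j] = win[i + 1][j];
        if (row0 + r + 1 < out_h)
            for (int j = 0; j < KW; ++j)
                win[KH - 1][j] = in0[(r + KH) * in_w + j];
    }
}
```

A matching launch would use 32 threads per block so that one block covers one 32-column sub-block, e.g. conv2d_reuse_sketch<<<dim3((out_w + 31) / 32, (out_h + ROWS_PER_THREAD - 1) / ROWS_PER_THREAD), 32>>>(input, filter, output, in_h, in_w);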



Abstract

A convolution operation memory access optimization method based on a GPU relates to a convolution operation memory access optimization technology. The invention addresses the defect of high memory access overhead of convolution operations in the prior art. The method comprises the steps of: loading the convolution kernel data into shared memory; dividing the convolution output into sub-blocks in units of 32 columns to obtain a plurality of sub-blocks containing 32 columns of data and one sub-block containing fewer than 32 columns of data; enabling each thread to calculate the index of the first data it requires; enabling each thread to obtain the remaining required input data from the index of the first data through a column reuse algorithm and transmit the obtained input data to a row reuse algorithm; calculating an output result through the row reuse algorithm and storing the output result in the register variable sum; writing the sum into global memory; and calculating the other to-be-calculated data in the convolution output. The method is used for memory access optimization of convolution operations in the fields of image processing, video processing and machine learning.

Description

Technical field

[0001] The invention relates to a convolution operation memory access optimization technology, in particular to a GPU-based convolution operation memory access optimization method.

Background technique

[0002] In the fields of image processing, video processing and machine learning, the convolution operation has become a core computing mode. 2D convolution is widely used in image filtering and frame difference. Depth-wise convolution is often used in mobile neural networks. Multi-channel 2D convolution is the core operation in neural networks. However, convolution operations consume a lot of computing resources and memory resources, and convolution operations occupy 90% of the execution time in image processing and machine learning. Many optimization methods for convolution operations have been proposed, among which methods based on GEMM (matrix multiplication), FFT and Winograd are the most widely used. However, these methods need to convert the input and out...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G06N3/063; G06N3/04; G06F9/50
CPC: G06N3/063; G06F9/5016; G06N3/045; Y02D10/00
Inventors: 张伟哲, 鲁刚钊, 王峥, 李克勤, 孙广中
Owner: HARBIN INST OF TECH