An integrated performance prediction method for CUDA programs based on multi-feature coupling

A performance prediction and multi-feature coupling technology, applied in the field of electronics and information technology, that addresses the low accuracy of CUDA program performance prediction.

Active Publication Date: 2022-06-07

HARBIN INST OF TECH

4 Cites, 0 Cited by


Problems solved by technology

[0003] To solve the problem of low prediction accuracy for CUDA program performance in the prior art, the present invention provides a multi-feature coupling-based integrated performance prediction method for CUDA programs.


Examples


Specific Embodiment 1

[0032] Specific Embodiment 1: This embodiment is described with reference to Figure 1. The multi-feature coupling-based integrated performance prediction method for CUDA programs provided in this embodiment includes the following steps:

[0033] Step 1: extract the inherent characteristics of the data, the program (or algorithm), and the GPU hardware. Specifically: obtain the running time Wtime of a Warp, the kernel unit of the CUDA program, and the Warp size Wsize; design the kernel execution configuration parameters <<<Dg, Db>>>; calculate the number of Warps contained in a thread block, NBW (Number of Block Warps), from Db; calculate the number of registers requested per thread, RPT (Registers per Thread), and the size of shared memory requested per thread block, SMPB (Shared Memory per Block); obtain the compute capability of the GPU device, Capability, the number of CUDA Cores, NCC (Number of CUDA Cores), and the number of streaming multiprocessors (SMs), NSM (Number of Streaming Multiprocessors); Warp is the thread warp; Core is the ha...
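The derivation of NBW from Db in Step 1 can be illustrated with a minimal Python sketch. It assumes that Db is given as the total number of threads per block and that NBW is Db divided by Wsize, rounded up, which is how CUDA groups the threads of a block into Warps; the excerpt above does not spell out the formula. The function name `warps_per_block` is hypothetical.

```python
def warps_per_block(db: int, wsize: int = 32) -> int:
    """Return NBW, the number of Warps needed to hold db threads.

    Assumption: db is the total thread count of one block, and a partially
    filled Warp still counts as a whole Warp (ceiling division).
    """
    if db <= 0 or wsize <= 0:
        raise ValueError("db and wsize must be positive")
    return (db + wsize - 1) // wsize


print(warps_per_block(256))  # 8
print(warps_per_block(100))  # 4 (the last Warp is only partly filled)
```

For a multi-dimensional Db, the same sketch applies after multiplying the block dimensions together.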

Specific Embodiment 2

[0039] Specific Embodiment 2: This embodiment differs from Specific Embodiment 1 in that NAW described in Step 2 is calculated as follows:

[0040] NAW=calculator(Db,RPT,SMPB,Capability) (1)

[0041] The actual NKW (Dg×NBW) Warps cannot always be divided evenly among the SMs, so the above formula is used to calculate and correct NKW, so that the NSMW calculated from NKW represents the kernel load (equivalent to the load assigned to the maximum number of Warps).

[0042] Other steps and parameters are the same as those in the first embodiment.
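The patent excerpt does not show the internals of calculator in formula (1). The sketch below is one plausible reading, assuming it behaves like an occupancy calculation: the number of thread blocks resident on one SM is limited by Warp slots, registers, and shared memory, and NAW is that block count times the Warps per block. The `SMLimits` type, its field names, and the limit values in the usage lines are hypothetical placeholders that stand in for a lookup keyed by Capability. Resource allocation granularity is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    """Per-SM resource limits (hypothetical stand-in for a Capability lookup)."""
    max_warps: int         # maximum resident Warps per SM
    max_blocks: int        # maximum resident thread blocks per SM
    registers: int         # 32-bit registers per SM
    shared_mem_bytes: int  # shared memory per SM, in bytes


def calculator(db: int, rpt: int, smpb: int, limits: SMLimits, wsize: int = 32) -> int:
    """Estimate NAW, the number of Warps that can be active on one SM."""
    nbw = -(-db // wsize)  # Warps per block, rounded up
    blocks_by_warps = limits.max_warps // nbw
    regs_per_block = rpt * wsize * nbw
    # A resource that is not used places no limit on the block count.
    blocks_by_regs = limits.registers // regs_per_block if regs_per_block else limits.max_blocks
    blocks_by_smem = limits.shared_mem_bytes // smpb if smpb else limits.max_blocks
    active_blocks = min(limits.max_blocks, blocks_by_warps, blocks_by_regs, blocks_by_smem)
    return active_blocks * nbw


# Illustrative limits only; real values depend on the device's compute capability.
limits = SMLimits(max_warps=64, max_blocks=32, registers=65536, shared_mem_bytes=98304)
print(calculator(db=256, rpt=32, smpb=0, limits=limits))      # 64
print(calculator(db=256, rpt=64, smpb=0, limits=limits))      # 32 (register-limited)
print(calculator(db=256, rpt=32, smpb=49152, limits=limits))  # 16 (shared-memory-limited)
```

Passing the limits explicitly keeps the sketch independent of any particular device table; a fuller version would map Capability to these limits and round register and shared-memory requests up to the hardware allocation unit.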

Specific Embodiment 3

[0043] Specific Embodiment 3: This embodiment differs from Specific Embodiment 1 in that the device parallel space DPS described in Step 2 is calculated as follows:

[0044] DPS=NAW×NSM (2)

[0045] Other steps and parameters are the same as those in Embodiment 1 or 2.
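As a worked example of formula (2) and of the kernel Warp count NKW = Dg×NBW mentioned in paragraph [0041], the following minimal Python sketch computes both quantities. The function names and the input values are hypothetical and chosen only for illustration; Dg is assumed to be the total number of thread blocks in the grid.

```python
def device_parallel_space(naw: int, nsm: int) -> int:
    """DPS = NAW x NSM: Warps the whole device can keep active at once."""
    return naw * nsm


def kernel_warps(dg: int, nbw: int) -> int:
    """NKW = Dg x NBW: total Warps launched by the kernel."""
    return dg * nbw


# Hypothetical inputs: 64 active Warps per SM, 80 SMs, 1024 blocks of 8 Warps.
dps = device_parallel_space(naw=64, nsm=80)
nkw = kernel_warps(dg=1024, nbw=8)
print(dps, nkw)  # 5120 8192
```

With these inputs the kernel launches more Warps than the device can hold at once, so the Warps would have to be processed in more than one pass over the device parallel space.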


Abstract

The invention provides an integrated performance prediction method for CUDA programs based on multi-feature coupling, which belongs to the field of electronics and information technology. The method first extracts the inherent characteristics of the data, the program, and the GPU hardware, and defines high-level performance-related features such as the device parallel space DPS, the device parallel space idleness DPSID, the SM Warp load NSMW, and the parallel effect factor PEF. It then compares Dg with APDG, and DPSID with 1, to determine the state of the CUDA program kernel and obtain the corresponding kernel duration volume KDTV, from which the kernel duration KDT is obtained, completing the prediction. The invention solves the problem of low prediction accuracy for CUDA program performance in the prior art and can be used for accurate prediction of parallel program performance.

Description

Technical field

[0001] The invention relates to an integrated performance prediction method for CUDA programs, which belongs to the field of electronics and information technology.

Background

[0002] Program performance prediction is an important part of parallel program design. It plays an important role in locating performance bottlenecks and optimizing program performance. For a serial program, the order-of-magnitude growth of running time with problem size can be described by the time complexity of the algorithm. However, the instruction execution mechanism of a GPU (Graphics Processing Unit) program is specially designed for parallelism and is closely tied to the hardware architecture of the device; in addition, the running state of a CUDA (Compute Unified Device Architecture) program is also affected by the computing instructions. The influence o...


Application Information


Patent Type & Authority: Patents (China)

IPC(8): G06F11/34

CPC: G06F11/3419, G06F11/3433, Y02D10/00

Inventor: 陈浩, 曲海城, 于思淼, 陈稳

Owner: HARBIN INST OF TECH
