An integrated performance prediction method for CUDA programs based on multi-feature coupling

A multi-feature performance prediction technology in the field of electronics and information technology, which addresses the low accuracy of CUDA program performance prediction.

Active Publication Date: 2022-06-07
HARBIN INST OF TECH

AI Technical Summary

Problems solved by technology

[0003] To solve the problem of low prediction accuracy of CUDA program performance in the prior art, the present invention provides a CUDA program integrated performance prediction method based on multi-feature coupling.

Examples

Specific Embodiment 1

[0032] Specific Embodiment 1: This embodiment is described with reference to Figure 1. The CUDA program integrated performance prediction method based on multi-feature coupling provided in this embodiment specifically includes the following steps:

[0033] Step 1: extract the inherent characteristics of the data, the program (or algorithm), and the GPU hardware. Specifically: obtain the running time Wtime of a Warp, the execution unit of the CUDA program kernel, and the Warp size Wsize; set the kernel execution configuration parameters <<<Dg, Db>>>, and calculate the number of Warps NBW (Number of Block Warps) contained in a thread block according to Db; calculate the number of registers RPT (Register per Thread) requested by each thread and the size of shared memory SMPB (Shared Memory per Block) requested by each thread block; obtain the compute capability Capability of the GPU device, its number of CUDA cores NCC (Number of CUDA Cores), and its number of streaming multiprocessors NSM (Number of Streaming Multiprocessors); Warp is the thread warp; Core is the ha...
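The feature extraction in Step 1 can be illustrated with a minimal Python sketch that groups the extracted features and derives NBW from Db. It assumes that Dg and Db are the total number of blocks per grid and threads per block (a 1-D launch, or the product of the dim3 components), and that NBW is Db divided by Wsize rounded up, which is how CUDA partitions a block into Warps. It also computes NKW = Dg × NBW, the kernel's total Warp count used in Step 2. The class name `KernelFeatures`, its field names, the time unit, and the example values (V100-class: compute capability 7.0, 80 SMs, 5120 CUDA cores; `wtime` is a placeholder) are assumptions for illustration only.

```python
from dataclasses import dataclass
import math


@dataclass
class KernelFeatures:
    """Hypothetical container for the Step 1 features (names follow the text)."""
    wtime: float      # Wtime: running time of one Warp (unit assumed: seconds)
    wsize: int        # Wsize: Warp size (32 threads on current NVIDIA GPUs)
    dg: int           # Dg: blocks per grid (assumed total over all dimensions)
    db: int           # Db: threads per block (assumed total over all dimensions)
    rpt: int          # RPT: registers requested per thread
    smpb: int         # SMPB: shared memory requested per block, in bytes
    capability: str   # Capability: compute capability, e.g. "7.0"
    ncc: int          # NCC: number of CUDA cores
    nsm: int          # NSM: number of streaming multiprocessors

    @property
    def nbw(self) -> int:
        """NBW: Warps per block; a partial Warp still occupies a whole Warp."""
        return math.ceil(self.db / self.wsize)

    @property
    def nkw(self) -> int:
        """NKW = Dg x NBW: total Warps launched by the kernel."""
        return self.dg * self.nbw


# Illustrative inputs only; wtime is a placeholder, not a measurement.
feat = KernelFeatures(wtime=1.5e-6, wsize=32, dg=1000, db=100, rpt=32,
                      smpb=0, capability="7.0", ncc=5120, nsm=80)
print(feat.nbw, feat.nkw)  # 4 4000
```

With Db = 100 and Wsize = 32, each block needs 4 Warps even though the last one is only partly filled, which is why the ceiling is used rather than integer division.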

Specific Embodiment 2

[0039] Specific Embodiment 2: This embodiment differs from Specific Embodiment 1 in that NAW in Step 2 is calculated as follows:

[0040] NAW=calculator(Db,RPT,SMPB,Capability) (1)

[0041] In practice, the NKW (= Dg × NBW) Warps of the kernel cannot all be scheduled onto the SMs at the same time, so Formula (1) is used to calculate NAW and thereby correct NKW, so that the SM Warp load NSMW calculated from NKW represents the kernel load (equivalent to the load assigned to the maximum number of Warps).
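The text does not spell out `calculator`; its arguments (Db, RPT, SMPB, Capability) match the inputs of NVIDIA's CUDA Occupancy Calculator, so a simplified stand-in can be sketched under that assumption. NAW is treated here as the number of active Warps per SM, which is consistent with DPS = NAW × NSM in Specific Embodiment 3. Instead of looking limits up from Capability, the sketch takes per-SM hardware limits as a dictionary; the values shown are those commonly listed for compute capability 7.0 and should be checked against the CUDA Programming Guide for a given device. It ignores register and shared-memory allocation granularity and reserved shared memory, so it can overestimate NAW compared with the official calculator. The names `active_warps_per_sm` and `LIMITS_CC70` are hypothetical.

```python
import math

# Assumed per-SM limits for compute capability 7.0 (verify for your device).
LIMITS_CC70 = {
    "max_warps_per_sm": 64,
    "max_blocks_per_sm": 32,
    "regs_per_sm": 65536,
    "smem_per_sm": 96 * 1024,  # bytes
}


def active_warps_per_sm(db: int, rpt: int, smpb: int, limits: dict,
                        wsize: int = 32) -> int:
    """Simplified stand-in for NAW = calculator(Db, RPT, SMPB, Capability).

    Returns how many Warps can be resident on one SM at once, limited by
    Warp slots, block slots, registers and shared memory. Allocation
    granularity is ignored (a simplifying assumption).
    """
    nbw = math.ceil(db / wsize)                       # Warps per block (NBW)
    by_warps = limits["max_warps_per_sm"] // nbw      # limit from Warp slots
    # Registers are allocated per Warp, so count whole Warps per block.
    regs_per_block = rpt * wsize * nbw
    by_regs = (limits["regs_per_sm"] // regs_per_block
               if regs_per_block > 0 else math.inf)   # limit from registers
    by_smem = (limits["smem_per_sm"] // smpb
               if smpb > 0 else math.inf)             # limit from shared memory
    blocks = min(limits["max_blocks_per_sm"], by_warps, by_regs, by_smem)
    return int(blocks) * nbw


print(active_warps_per_sm(256, 32, 0, LIMITS_CC70))          # 64
print(active_warps_per_sm(256, 64, 0, LIMITS_CC70))          # 32
print(active_warps_per_sm(256, 32, 48 * 1024, LIMITS_CC70))  # 16
```

The three calls show the correction described above: with 256 threads per block (NBW = 8), doubling RPT from 32 to 64 halves the resident blocks from 8 to 4, and requesting 48 KB of shared memory per block leaves room for only 2 blocks, so NAW falls from 64 to 16.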

[0042] Other steps and parameters are the same as in Specific Embodiment 1.

Specific Embodiment 3

[0043] Specific Embodiment 3: This embodiment differs from Specific Embodiment 1 in that the device parallel space DPS in Step 2 is computed as follows:

[0044] DPS=NAW×NSM (2)
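Formula (2) can be checked with a small worked example. The sketch below reads DPS as the number of Warps the whole device can hold at once (NAW per SM times the number of SMs), which is an interpretation of the name rather than a definition given in the text. The inputs NAW = 64 (a fully occupied compute capability 7.0 SM) and NSM = 80 (a V100-class device) are illustrative only, and the helper name `device_parallel_space` is hypothetical.

```python
def device_parallel_space(naw: int, nsm: int) -> int:
    """DPS = NAW x NSM, interpreted as the Warps the device holds at once."""
    return naw * nsm


dps = device_parallel_space(naw=64, nsm=80)  # illustrative inputs
print(dps)  # 5120
```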

[0045] Other steps and parameters are the same as in Specific Embodiment 1 or 2.

Abstract

The invention provides a CUDA program integrated performance prediction method based on multi-feature coupling, which belongs to the field of electronics and information technology. The invention first extracts the inherent characteristics of the data, the program, and the GPU hardware, and defines high-level performance-related features such as the device parallel space DPS, the device parallel space idleness DPSID, the SM Warp load NSMW, and the parallel effect factor PEF. It then determines the state of the CUDA program kernel by comparing Dg with APDG and DPSID with 1, obtains the corresponding kernel duration volume KDTV, and finally computes the kernel duration KDT, completing the prediction. The invention solves the problem of low prediction accuracy of CUDA program performance in the prior art and can be used for accurate prediction of parallel program performance.

Description

Technical Field

[0001] The invention relates to a CUDA program integrated performance prediction method and belongs to the field of electronics and information technology.

Background Art

[0002] Program performance prediction is an important step in parallel program design: it plays an important role in locating program performance bottlenecks and optimizing program performance. For a serial program, how the order of magnitude of the running time grows with problem size can be described by the time complexity of the algorithm. However, the instruction execution mechanism of a GPU (Graphics Processing Unit) program is designed specifically for parallelism and is closely tied to the hardware architecture of the device; in addition, the running state of a CUDA (Compute Unified Device Architecture) program is also affected by the computing instructions. The influence o...

Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F11/34
CPC: G06F11/3419; G06F11/3433; Y02D10/00
Inventors: 陈浩, 曲海城, 于思淼, 陈稳
Owner: HARBIN INST OF TECH