An integrated performance prediction method for CUDA programs based on multi-feature coupling
A multi-feature performance prediction technology, applied in the field of electronics and information, that addresses the low accuracy of CUDA program performance prediction
Examples
Specific Embodiment 1
[0032] Specific Embodiment 1: This embodiment is described with reference to Figure 1. The multi-feature-coupling-based integrated performance prediction method for CUDA programs provided in this embodiment comprises the following steps:
[0033] Step 1: extract the inherent characteristics of the data, the program (or algorithm), and the GPU hardware. Specifically: obtain the running time Wtime of a Warp, the execution unit of a CUDA program kernel, and the Warp size Wsize; design the kernel execution configuration parameters <<<Dg, Db>>>, and calculate the number of Warps contained in a thread block, NBW (Number of Block Warps), from Db; calculate the number of registers requested per thread, RPT (Registers per Thread), and the amount of shared memory requested per thread block, SMPB (Shared Memory per Block); obtain the GPU device's compute capability Capability, its number of CUDA cores NCC (Number of CUDA Cores), and its number of streaming multiprocessors (SMs) NSM (Number of Streaming Multiprocessors). Warp is the thread warp; Core is the ha...
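The derivation of NBW from Db in Step 1 can be illustrated with a minimal sketch. It assumes that Db is given as the total number of threads per block (for a multi-dimensional Db, the product of its dimensions) and that a partially filled Warp still occupies a whole Warp, so the division rounds up. The function name `warps_per_block` and the default Wsize of 32 are assumptions made for illustration and do not come from the patent text.

```python
def warps_per_block(db_threads: int, wsize: int = 32) -> int:
    """Return NBW, the number of Warps needed to hold db_threads threads.

    Assumption: a partially filled Warp still counts as one Warp,
    so the result is the ceiling of db_threads / wsize.
    """
    if db_threads <= 0 or wsize <= 0:
        raise ValueError("db_threads and wsize must be positive")
    # Integer ceiling division, avoids floating-point rounding.
    return (db_threads + wsize - 1) // wsize


if __name__ == "__main__":
    for db in (100, 256, 1024):
        print(f"Db={db:5d} threads -> NBW={warps_per_block(db)}")
```

Under these assumptions a block of 100 threads needs 4 Warps, the last of which is only partly used.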
Specific Embodiment 2
[0039] Specific Embodiment 2: This embodiment differs from Specific Embodiment 1 in that the calculation of NAW described in Step 2 is:
[0040] NAW=calculator(Db,RPT,SMPB,Capability) (1)
[0041] The NKW (Dg×NBW) Warps that the kernel actually launches cannot all be scheduled on the SMs at once, so the formula above is used to calculate NAW and correct NKW accordingly, so that the NSMW calculated from NKW represents the kernel load (equivalent to the load assigned to the maximum number of Warps).
[0042] Other steps and parameters are the same as those in Specific Embodiment 1.
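The patent text does not spell out the body of `calculator` in formula (1). The sketch below is one plausible reading, assuming it behaves like a simplified occupancy calculation: the number of thread blocks resident on one SM is capped by the block limit, the Warp limit, the register file, and shared memory, and NAW is that block count times NBW. The `SMLimits` type, the `LIMITS_BY_CAPABILITY` table, and its single entry for compute capability 3.5 are illustrative assumptions and should be checked against NVIDIA's documentation; allocation granularity is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    """Per-SM resource limits (hypothetical container, not from the patent)."""
    max_warps: int
    max_blocks: int
    registers: int
    shared_mem_bytes: int


# Assumed example values for compute capability 3.5; verify before use.
LIMITS_BY_CAPABILITY = {
    "3.5": SMLimits(max_warps=64, max_blocks=16,
                    registers=65536, shared_mem_bytes=49152),
}


def calculator(db_threads: int, rpt: int, smpb: int,
               capability: str, wsize: int = 32) -> int:
    """Estimate NAW, the number of Warps resident on one SM.

    Simplified: ignores register and shared-memory allocation granularity.
    """
    lim = LIMITS_BY_CAPABILITY[capability]
    nbw = (db_threads + wsize - 1) // wsize           # Warps per block
    by_warps = lim.max_warps // nbw                   # Warp-limit cap
    regs_per_block = rpt * wsize * nbw                # registers one block needs
    by_regs = lim.registers // regs_per_block if regs_per_block else lim.max_blocks
    by_smem = lim.shared_mem_bytes // smpb if smpb else lim.max_blocks
    blocks = min(lim.max_blocks, by_warps, by_regs, by_smem)
    return blocks * nbw


if __name__ == "__main__":
    print(calculator(256, 32, 0, "3.5"))      # register use fits the Warp limit
    print(calculator(256, 64, 0, "3.5"))      # registers become the bottleneck
    print(calculator(256, 32, 16384, "3.5"))  # shared memory becomes the bottleneck
```

Taking the minimum over the four caps reflects the idea that whichever resource runs out first determines how many Warps an SM can hold, which is why RPT and SMPB appear as inputs alongside Db and Capability.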
Specific Embodiment 3
[0043] Specific Embodiment 3: This embodiment differs from Specific Embodiment 1 in that the calculation of the device parallel space DPS described in Step 2 is:
[0044] DPS=NAW×NSM (2)
[0045] Other steps and parameters are the same as those in Embodiment 1 or 2.
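Formula (2) can be checked with a short worked example. The function name `device_parallel_space` and the sample values (NAW = 64 Warps per SM, NSM = 13 SMs) are hypothetical and chosen only to show the arithmetic.

```python
def device_parallel_space(naw: int, nsm: int) -> int:
    """DPS = NAW x NSM: Warps the whole device can hold concurrently."""
    if naw < 0 or nsm < 0:
        raise ValueError("naw and nsm must be non-negative")
    return naw * nsm


if __name__ == "__main__":
    naw, nsm = 64, 13  # hypothetical sample values
    print(f"DPS = {naw} x {nsm} = {device_parallel_space(naw, nsm)}")
```

With these sample values the device parallel space is 832 Warps.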