L1 cache sharing method for GPU

An L1 cache sharing technique, applied in the GPU field, addresses problems such as large memory access overhead, unbalanced resource utilization, and the inability to fully utilize L1 cache resources.

Active Publication Date: 2021-09-10
NAT INNOVATION INST OF DEFENSE TECH PLA ACAD OF MILITARY SCI
Cites: 2; Cited by: 0

AI Technical Summary

Problems solved by technology

[0004] Although a spatially multitasking GPU that runs computation-intensive and memory-intensive programs at the same time can effectively improve overall resource utilization, running different programs on different SMs (streaming multiprocessors) leaves SM resources, especially the L1 cache (Level 1 cache), unevenly utilized, which limits further improvement of multitasking GPU performance.
Specifically, an SM running a memory-intensive program generates a large number of memory access requests. This overuses its L1 cache and produces a high L1 miss rate, and the missed requests are sent through the on-chip interconnection network to the L2 cache (Level 2 cache) and the memory system, which incurs a large memory access overhead. An SM running a computation-intensive program issues very few memory access requests, so its L1 cache resources are underutilized.
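The imbalance described above can be illustrated with a toy model. The following is a minimal sketch, not taken from the patent: it assumes an LRU replacement policy, an 8-block L1 cache, and two synthetic access streams (`memory_intensive` and `compute_intensive` are hypothetical names and patterns chosen only for illustration).

```python
from collections import OrderedDict


def simulate_l1(accesses, capacity):
    """Return (miss_rate, peak_occupancy) for an LRU cache of `capacity` blocks."""
    cache = OrderedDict()
    misses = 0
    peak = 0
    for addr in accesses:
        if addr in cache:
            cache.move_to_end(addr)        # hit: refresh LRU position
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[addr] = True
        peak = max(peak, len(cache))
    return misses / len(accesses), peak


CAPACITY = 8  # assumed L1 size in blocks, for illustration only

# Assumed workload: cycles over 16 blocks, twice the cache capacity.
memory_intensive = [i % 16 for i in range(160)]
# Assumed workload: few accesses, touching only 2 blocks.
compute_intensive = [i % 2 for i in range(20)]

print(simulate_l1(memory_intensive, CAPACITY))   # every access misses, cache is full
print(simulate_l1(compute_intensive, CAPACITY))  # few misses, most of the cache unused
```

In this toy setup the memory-intensive stream thrashes its cache while the computation-intensive stream leaves six of eight blocks idle, which is the kind of idle capacity an L1 sharing scheme aims to reuse.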

Examples

Embodiment Construction

[0033] To make the purpose, technical solution, and advantages of the present invention clearer, the technical solution is described clearly and completely below in conjunction with specific embodiments of the present invention and the corresponding drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments, without creative effort, fall within the protection scope of the present invention.

[0034] The technical solution provided by an embodiment of the present invention will be described in detail below with reference to the accompanying drawings.

[0035] Referring to Figure 1, an embodiment of the present invention provides an L1 cache sharing method for a GPU. The method is used for a spatially multitasking GPU that simultaneously runs a computati...

Abstract

The invention discloses an L1 cache sharing method for a GPU (Graphics Processing Unit), which comprises the following steps:

S11: judge whether the local memory access request queue is empty; if so, execute S21; if not, execute S12.
S12: take out a request and access the L1 cache.
S13: judge whether the request hits; if so, return the data; if not, execute S14.
S14: judge whether the program is a memory-intensive program; if so, send the request to the other SM and execute S15; if not, send the request to the L2 cache.
S15: judge whether a cache data block needs to be replaced; if so, send a data block replacement request to the other SM.
S21: judge whether the remote memory access request queue is empty; if not, execute S22.
S22: take out a request and access the L1 cache.
S23: judge whether the request hits; if so, return the data and execute S24; if not, send the request to the L2 cache and execute S24.
S24: judge whether the remote data request queue is empty; if not, store the data block to be replaced into the L1 cache.

With this method, an SM running a memory-intensive program can use the L1 cache of an SM running a computation-intensive program.
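The control flow of steps S11 to S24 can be sketched in software. The following is a hedged illustration only, not the patented hardware design: the class names `L1Cache` and `SM`, the LRU policy, the queue names, and the event log are all hypothetical. Two points the abstract leaves open are filled with labeled assumptions: a missed block is installed in the local L1 immediately (so that S15 has a victim to send), and S24 is also checked when S21 finds the remote request queue empty.

```python
from collections import OrderedDict, deque


class L1Cache:
    """Tiny LRU cache keyed by block address (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()

    def lookup(self, addr):
        """Return True on a hit and refresh the block's LRU position."""
        if addr in self.blocks:
            self.blocks.move_to_end(addr)
            return True
        return False

    def insert(self, addr):
        """Install a block; return the evicted address, or None."""
        victim = None
        if addr not in self.blocks and len(self.blocks) >= self.capacity:
            victim, _ = self.blocks.popitem(last=False)
        self.blocks[addr] = True
        self.blocks.move_to_end(addr)
        return victim


class SM:
    def __init__(self, memory_intensive, capacity):
        self.memory_intensive = memory_intensive
        self.l1 = L1Cache(capacity)
        self.local_requests = deque()   # requests from this SM's own program
        self.remote_requests = deque()  # requests forwarded by the peer SM
        self.remote_blocks = deque()    # replaced blocks sent by the peer SM
        self.peer = None
        self.log = []                   # hypothetical event trace

    def step(self):
        if self.local_requests:                         # S11
            addr = self.local_requests.popleft()        # S12
            if self.l1.lookup(addr):                    # S13: hit
                self.log.append(("local_hit", addr))
                return
            if not self.memory_intensive:               # S14: go to L2
                self.log.append(("to_l2", addr))
                self.l1.insert(addr)                    # assumption: immediate fill
                return
            self.peer.remote_requests.append(addr)      # S14: go to the other SM
            self.log.append(("to_peer", addr))
            victim = self.l1.insert(addr)               # assumption: immediate fill
            if victim is not None:                      # S15: block replaced
                self.peer.remote_blocks.append(victim)
            return
        if self.remote_requests:                        # S21
            addr = self.remote_requests.popleft()       # S22
            hit = self.l1.lookup(addr)                  # S23
            self.log.append(("remote_hit" if hit else "remote_to_l2", addr))
        if self.remote_blocks:                          # S24 (assumption: always checked)
            self.l1.insert(self.remote_blocks.popleft())


def demo():
    a = SM(memory_intensive=True, capacity=2)   # runs the memory-intensive program
    b = SM(memory_intensive=False, capacity=2)  # runs the computation-intensive program
    a.peer, b.peer = b, a
    a.local_requests.extend([0, 1, 2, 0])
    for _ in range(4):
        a.step()
    for _ in range(4):
        b.step()
    return a, b
```

Calling `demo()` drives four local requests through SM `a` and then lets SM `b` serve the forwarded requests. Block 0, evicted from `a`, ends up in `b`'s L1, so the last forwarded request for block 0 hits there instead of going to the L2 cache.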

Description

Technical field

[0001] The invention relates to the technical field of GPUs, in particular to an L1 cache sharing method for GPUs.

Background technique

[0002] A Graphics Processing Unit (GPU) is a microprocessor used for image and graphics-related calculations. Because of their powerful computing capabilities, GPUs are widely used in cloud computing platforms and data centers to provide users with computing power. Compared with a single-task GPU, which runs only one task at a time, a multitasking GPU can run multiple tasks at the same time, which effectively improves resource utilization. Specifically, a multitasking GPU can run a computation-intensive program and a memory-intensive program simultaneously on one GPU, so that the GPU's computation resources and memory resources are both fully utilized.

[0003] At present, spatial multitasking is the main method used to let a GPU run multiple tasks at the same time. Specifically, in the sp...

Application Information

IPC(8): G06F12/0811, G06T1/20
CPC: G06T1/20, G06F12/0811, Y02D10/00
Inventor: 赵夏, 何益百, 张拥军, 张光达, 陈任之, 隋京高, 王承智, 王璐, 王君展
Owner: NAT INNOVATION INST OF DEFENSE TECH PLA ACAD OF MILITARY SCI