Instruction execution based on outstanding load operations

A technique for instruction execution based on outstanding load operations, applied in the field of program execution, that addresses problems such as complicated thread schedulers.

Status: Inactive
Publication Date: 2012-03-29
NVIDIA CORP
5 Cites, 13 Cited by

AI Technical Summary

Benefits of technology

[0007] One embodiment of the present invention sets forth a technique for scheduling and executing dependent instructions based on outstanding load operations. This invention sets forth a multi-threaded processor architecture intended to reduce the area and power of a thread execution unit and increase thread processing efficiency. A two-level scheduler maintains a small set of active threads, called strands, to hide function unit pipeline latency and local memory access latency. The strands are a subset of a larger set of pending threads that is also maintained by the two-level scheduler. The non-strand threads among the pending threads have encountered a latency event, such as a non-local memory access, so the threads are separated into two sets to hide the longer main memory access latency. Pending threads are promoted to strands, and strands are demoted to pending threads, based on latency characteristics, including whether outstanding load operations have been executed. The two-level scheduler selects strands for execution based on strand state.
[0009] The outstanding load count enables a scheduler to track the number of outstanding load operations that need to complete execution before a dependent instruction may be executed.
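
The outstanding load count described in [0009] can be illustrated with a minimal sketch. This is not the patented implementation: the class `LoadTracker`, its method names, and the rule that a dependent instruction waits for the count to reach zero are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LoadTracker:
    """Hypothetical per-thread counter of loads issued but not yet returned."""
    outstanding: int = 0

    def issue_load(self) -> None:
        # A load leaves for memory; one more result is now pending.
        self.outstanding += 1

    def complete_load(self) -> None:
        # A load result has returned; refuse to let the count go negative.
        if self.outstanding == 0:
            raise RuntimeError("no load is outstanding")
        self.outstanding -= 1

    def can_issue_dependent(self) -> bool:
        # Assumption: the dependent instruction needs every earlier load.
        return self.outstanding == 0


tracker = LoadTracker()
tracker.issue_load()
tracker.issue_load()
blocked = tracker.can_issue_dependent()  # checked while two loads are in flight
tracker.complete_load()
tracker.complete_load()
ready = tracker.can_issue_dependent()    # checked after both loads return
```

In this sketch a scheduler would only need to read one small integer per thread to decide whether a dependent instruction is eligible, rather than tracking each load individually.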

Problems solved by technology

Extreme multi-threading requires a complicated thread scheduler.

Embodiment Construction

[0021] In the following description, numerous specific details are set forth to provide a more thorough understanding of the present invention. However, it will be apparent to one of skill in the art that the present invention may be practiced without one or more of these specific details. In other instances, well-known features have not been described in order to avoid obscuring the present invention.

System Overview

[0022] FIG. 1 is a block diagram illustrating a computer system 100 configured to implement one or more aspects of the present invention. Computer system 100 includes a central processing unit (CPU) 102 and a system memory 104 communicating via an interconnection path that may include a memory bridge 105. Memory bridge 105, which may be, e.g., a Northbridge chip, is connected via a bus or other communication path 106 (e.g., a HyperTransport link) to an I/O (input/output) bridge 107. I/O bridge 107, which may be, e.g., a Southbridge chip, receives user input from one or mor...

Abstract

One embodiment of the present invention sets forth a technique for scheduling thread execution in a multi-threaded processing environment. A two-level scheduler maintains a small set of active threads, called strands, to hide function unit pipeline latency and local memory access latency. The strands are a subset of a larger set of pending threads that is also maintained by the two-level scheduler. Pending threads are promoted to strands, and strands are demoted to pending threads, based on latency characteristics, such as whether outstanding load operations have been executed. The longer latency of the pending threads is hidden by selecting strands for execution. When the latency for a pending thread has expired, the pending thread may be promoted to a strand and begin (or resume) execution. When a strand encounters a latency event, the strand may be demoted to a pending thread while the latency is incurred.
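
The promotion and demotion behavior summarized in the abstract can be sketched roughly as follows. This is an illustrative model only, not the claimed hardware: the names `Thread`, `TwoLevelScheduler`, `latency_left`, and `max_strands`, the cycle-based `tick`, and the round-robin selection policy are all assumptions.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Thread:
    tid: int
    latency_left: int = 0  # hypothetical: cycles until a long-latency event resolves


class TwoLevelScheduler:
    def __init__(self, threads, max_strands):
        self.max_strands = max_strands
        self.strands = deque()          # small active set
        self.pending = deque(threads)   # larger set of waiting threads
        self.promote()

    def promote(self):
        # Move pending threads whose latency has expired into free strand slots.
        still_waiting = deque()
        while self.pending:
            thread = self.pending.popleft()
            if thread.latency_left == 0 and len(self.strands) < self.max_strands:
                self.strands.append(thread)
            else:
                still_waiting.append(thread)
        self.pending = still_waiting

    def demote(self, thread, latency):
        # A strand hit a latency event: park it with the pending threads.
        self.strands.remove(thread)
        thread.latency_left = latency
        self.pending.append(thread)

    def tick(self):
        # One cycle passes: age the pending latencies, then refill strand slots.
        for thread in self.pending:
            if thread.latency_left > 0:
                thread.latency_left -= 1
        self.promote()

    def select(self):
        # Assumed policy: round-robin over the strands only.
        if not self.strands:
            return None
        thread = self.strands.popleft()
        self.strands.append(thread)
        return thread


sched = TwoLevelScheduler([Thread(i) for i in range(4)], max_strands=2)
active_before = [t.tid for t in sched.strands]
first = sched.select()
sched.demote(first, latency=2)  # e.g. a non-local memory access
sched.tick()
active_after = [t.tid for t in sched.strands]
```

Because `select` only ever looks at the small strand set, the per-cycle scheduling decision stays cheap while the longer latencies are absorbed by threads waiting in the pending set.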

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application is a continuation of U.S. patent application titled, “Two-Level Scheduler for Multi-Threaded Processing”, filed on Jun. 1, 2011 and having Ser. No. 13/151,094 (Attorney Docket Number NVDA/SC-10-0208-US0-US1) which claims priority benefit to U.S. provisional patent application titled, “Strands: Exploiting Sub-Threads Free from Long-Latency Operations”, filed on Sep. 24, 2010 and having Ser. No. 61/386,248 (Attorney Docket Number NVDA/SC-10-0208-US0). This application also claims priority benefit to U.S. patent application titled, “Multi-Stranding,” filed on Sep. 24, 2010 and having Ser. No. 61/386,244 (Attorney Docket Number NVDA/SC-10-0209-US0). These related applications are also hereby incorporated by reference in their entirety.

BACKGROUND OF THE INVENTION

[0002] 1. Field of the Invention

[0003] The present invention generally relates to program execution and more specifically to instruction execution based on outstanding l...

Application Information

IPC(8): G06F9/30, G06F9/312
CPC: G06F9/3851, G06F9/3887, G06F9/4881
Inventor: DALLY, WILLIAM JAMES; LINDHOLM, JOHN ERIK
Owner NVIDIA CORP