
Method For Changing A Thread Priority In A Simultaneous Multithread Processor

A multithread processor priority technology, applied in the field of processor methods and circuitry, addressing problems such as diminishing returns from instruction-level parallelism, the tendency of instruction results to depend on the outcomes of prior instructions, and a marked slowdown in the rate of performance improvement.

Inactive Publication Date: 2008-05-08
IBM CORP
Cites: 18 · Cited by: 4

AI Technical Summary

Benefits of technology

The present invention relates to a system for altering the priority of threads under software control. The system uses a special form of a "no operation" (NOP) instruction, termed a thread priority NOP, to change the priority of a thread. The NOP is decoded, and a special code is written into the completion table, indicating which instruction group contains the thread priority NOP. The priority of a thread is changed only when that instruction group completes, which prevents the priority of a thread from changing speculatively. The technical effect of the invention is to improve the efficiency and accuracy of software control over threads.

Problems solved by technology

The only problem is that these instructions also have a tendency to depend upon the outcome of prior instructions.
Moreover, this technique has seen a marked slowdown in the rate of performance improvement and has in fact been showing diminishing returns.
Assuming that the application is written to execute in a parallel manner (multithreaded), there are inherent difficulties in making the program written in this fashion execute faster proportional to the number of added processors.
However, there are problems with CMP.
In this way, a CMP chip is less flexible for general use, because if there is only one thread, an entire half of the allotted resources sits idle and completely unused (just as adding another processor to a system running a single-threaded program is useless in a traditional multiprocessor (MP) system).
So, the processor can only run as many threads as it has physical locations in which to store each thread's execution state.
Whereas much of a CMP processor remains idle when running a single thread (and the more processors on the CMP chip, the more pronounced this problem becomes), an SMT processor can dedicate all functional units to the single thread.
However, in some instances, this disrupts the traditional organization of data, as well as instruction flow.
The branch prediction unit becomes less effective when shared, because it has to keep track of more threads with more instructions and will therefore be less efficient at giving an accurate prediction.
This means that the pipeline will need to be flushed more often due to misprediction, but the ability to run multiple threads more than makes up for this deficit.
The penalty for a misprediction is greater due to the longer pipeline used by an SMT architecture (by two stages), which is in turn due to the rather large register file required.
Another issue is the number of threads in relation to the size of caches, the line sizes of caches, and the bandwidth afforded by them.
As is the case for single-threaded programs, increasing the cache-line size decreases the miss rate but also increases the miss penalty.
Supporting more threads, each of which touches different data, exacerbates this problem, leaving less of the cache effectively useful to each thread.
As before, increasing the associativity of blocks improved performance in all cases; however, increasing the block size decreased performance when more than two threads were in use.
Indeed, the gain from higher associativity could not make up for the deficit caused by the greater miss penalty of the larger block size.
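The block-size trade-off above can be illustrated with the standard average memory access time (AMAT) formula, AMAT = hit time + miss rate × miss penalty. The numbers below are purely hypothetical and chosen only to show how a larger block size can lower the miss rate yet still raise the average cost when threads share the cache; they are not measurements from the patent.

```python
# Illustrative AMAT arithmetic for the cache block-size trade-off.
# All cycle counts and miss rates are hypothetical examples.

def amat(hit_time, miss_rate, miss_penalty):
    """AMAT = hit time + miss rate * miss penalty (in cycles)."""
    return hit_time + miss_rate * miss_penalty

# Smaller cache lines: higher miss rate, but each refill is cheap.
small_lines = amat(hit_time=1, miss_rate=0.05, miss_penalty=20)  # ≈ 2.0 cycles

# Larger cache lines: with several threads competing for the cache, the
# miss rate does not drop enough to offset the larger refill penalty.
large_lines = amat(hit_time=1, miss_rate=0.04, miss_penalty=40)  # ≈ 2.6 cycles

print(small_lines, large_lines)
```

Under these assumed numbers, the larger block size loses despite its lower miss rate, matching the observation in the text.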
A variety of conditions may lead to pipeline stalls wherein instructions from a thread cannot be immediately executed.
If the logic for examining NOP instructions for conditions that allow thread priority modification is designed incorrectly, it may require special processing by the fixed-point units (FXUs) or slow down the processing of all NOP instructions.
Deciding when to switch from one thread to another is made more difficult by the fact that instructions are pipelined and may execute out of order.


Embodiment Construction

[0030] In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention. However, it will be obvious to those skilled in the art that the present invention may be practiced without such specific details. In other instances, well-known circuits may be shown in block diagram form in order not to obscure the present invention in unnecessary detail. For the most part, details concerning timing, data formats within communication protocols, and the like have been omitted inasmuch as such details are not necessary to obtain a complete understanding of the present invention and are within the skills of persons of ordinary skill in the relevant art.

[0031] Refer now to the drawings wherein depicted elements are not necessarily shown to scale and wherein like or similar elements are designated by the same reference numeral through the several views.

[0032] Referring to FIG. 1, there are illustrated details of CPU 410. CPU 410 i...


Abstract

An SMT system is designed to allow software alteration of thread priority. In one case, the system signals a change in a thread priority based on the state of instruction execution and in particular when the instruction has completed execution. To alter the priority of a thread, the software uses a special form of a “no operation” (NOP) instruction (hereafter termed thread priority NOP). When the thread priority NOP is dispatched, its special NOP is decoded in the decode unit of the IDU into an operation that writes a special code into the completion table for the thread priority NOP. A “trouble” bit is also set in the completion table that indicates which instruction group contains the thread priority NOP. The trouble bit indicates that special processing is required after instruction completion. The thread priority instruction is processed after completion using the special code to change a thread's priority.
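The abstract's mechanism (dispatch the thread priority NOP, decode it into a special code plus a "trouble" bit in the completion table, and apply the priority change only at instruction-group completion) can be sketched as a small software model. This is a minimal illustration of the control flow only; the class and field names are invented for the sketch and do not come from the patent, which describes hardware, not Python objects.

```python
# Minimal sketch of the completion-table mechanism described above.
# Names (CompletionEntry, trouble, special_code) are illustrative only.

from dataclasses import dataclass

@dataclass
class CompletionEntry:
    thread_id: int
    special_code: int = 0      # encodes the requested new priority
    trouble: bool = False      # group needs special post-completion processing
    completed: bool = False

class CompletionTable:
    def __init__(self, priorities):
        self.priorities = priorities   # thread_id -> current priority
        self.entries = []

    def dispatch_priority_nop(self, thread_id, new_priority):
        # Decode writes the special code and sets the trouble bit for
        # the instruction group containing the thread priority NOP.
        entry = CompletionEntry(thread_id, special_code=new_priority, trouble=True)
        self.entries.append(entry)
        return entry

    def complete(self, entry):
        # The priority change is applied only at completion,
        # never while the instruction is still speculative.
        entry.completed = True
        if entry.trouble:
            self.priorities[entry.thread_id] = entry.special_code

# Usage: the priority stays unchanged until the group completes.
table = CompletionTable({0: 3, 1: 3})
e = table.dispatch_priority_nop(thread_id=0, new_priority=5)
assert table.priorities[0] == 3   # dispatched but not completed: no change
table.complete(e)
assert table.priorities[0] == 5   # change applied at completion
```

The key design point the sketch captures is the separation of decode (recording the request) from completion (applying it), which is what prevents speculative priority changes.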

Description

TECHNICAL FIELD [0001] The present invention relates in general to methods and circuitry for a processor having simultaneous multithreading (SMT) and single thread operation modes. BACKGROUND INFORMATION [0002] For a long time, the secret to more performance was to execute more instructions per cycle, otherwise known as Instruction Level Parallelism (ILP), or decreasing the latency of instructions. To execute more instructions each cycle, more functional units (e.g., integer, floating point, load / store units, etc.) have to be added. In order to more consistently execute multiple instructions, a processing paradigm called out-of-order processing (OOP) may be used, and in fact, this type of processing has become mainstream. [0003] OOP arose because many instructions are dependent upon the outcome of other instructions, which have already been sent into the processing pipeline. To help alleviate this problem, a larger number of instructions are stored in order to allow immediate execut...

Claims


Application Information

Patent Type & Authority: Application (United States)
IPC (8): G06F 9/30; G06F 9/38; G06F 15/00
CPC: G06F 9/30076; G06F 9/3009; G06F 9/3865; G06F 9/3851; G06F 9/3842
Inventors: BURKY, WILLIAM E.; KALLA, RONALD N.; SCHROTER, DAVID A.; SINHAROY, BALARAM
Owner IBM CORP