Processor device, instruction execution method thereof and computing equipment

A processor device and instruction execution method, applied in the field of processors and computing equipment. It addresses the problems that each warp has too few predicate registers, that predicate registers cannot be shared between warps, and that system performance is lost as a result. It achieves fuller utilization of processor resources and reduces instruction overhead.

Active Publication Date: 2022-05-13
METAX INTEGRATED CIRCUITS (SHANGHAI) CO LTD

AI Technical Summary

Problems solved by technology

[0003] However, the inventors of the present application found that in current GPU chip systems the predicate registers corresponding to each warp are fixed and too few in number. As a result, in complex scenarios such as deeply nested IF-ELSE and Switch-Case branch condition instructions, there are not enough predicate registers; and even when resources are limited, each warp can still use only 7 predicate registers, so the remaining resources cannot be fully utilized.
In addition, the predicate registers of each warp cannot be shared between multiple warps. The existing mechanism therefore also causes a loss of system performance, because more instructions must be executed to achieve the desired function.


Examples

Example 1

[0093] set_predicate_base 32

[0094] @P1 FMA R4,R5,R6,R7

[0095] The above instructions first execute the special predicate-base-address setting instruction, which sets the predicate base address register PredicateBase to 32. General register W32 is then used as predicate register P0 of the current warp, and PredicateBase no longer holds its default value of 0.

[0096] Assume general-purpose registers W32=0x01FFFFFF and W33=0x80FFFFFF, and thread mask register LaneMask=0xFFFFFFFF. The instruction at the current program counter PC is a fused multiply-add (FMA) instruction, and it is executed under the control of predicate register P1.

[0097] Since general-purpose register W33 serves as predicate register P1, this FMA instruction is invalid for threads 24-30 in the warp and valid for threads 0-23 (bit 31 of W33 is also set, so thread 31 is enabled as well), that is, for threads 0-23, perfo...
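The lookup described in Example 1 can be sketched in a few lines of Python. This is only an illustrative model, not the patented hardware: the function name `active_lanes` and the dictionary used as a register file are assumptions made for the sketch, and bit i of a predicate value is taken to control thread i, as the examples in the text imply.

```python
LANES = 32            # threads per warp, as in the example
MASK32 = 0xFFFFFFFF   # 32-bit register width


def active_lanes(regs, predicate_base, pred_index, lane_mask=MASK32):
    """Return the thread indices for which @P<pred_index> is true.

    P<n> is resolved to general register W[predicate_base + n];
    the result is additionally gated by the thread mask LaneMask.
    """
    pred_value = regs[predicate_base + pred_index] & lane_mask
    return [lane for lane in range(LANES) if (pred_value >> lane) & 1]


# Register values taken from Example 1 (hypothetical dict-based register file).
regs = {32: 0x01FFFFFF, 33: 0x80FFFFFF}

# set_predicate_base 32, then "@P1 FMA ...": P1 resolves to W33.
lanes = active_lanes(regs, predicate_base=32, pred_index=1)
print(lanes)  # threads 0-23 and thread 31
```

With these values the FMA is suppressed only for threads 24-30, which matches the bit pattern 0x80FFFFFF.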

Example 2

[0099] set_predicate_base 8

[0100] @!P0 sub R7, R1, R2

[0101] The above instructions execute the special predicate-base-address setting instruction to set the predicate base address register PredicateBase to 8, so that general register W8 serves as predicate register P0 of the current warp. The SUB instruction is then executed under the negated value of P0.

[0102] Assuming general register W8=0x00FFFFFF, then !P0=0xFF000000. The SUB instruction is therefore invalid for threads 0-23 and valid for threads 24-31; that is, the subtraction of the operands in registers R1 and R2 is performed on threads 24-31, and the result is written to register R7.
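The negation in Example 2 is a bitwise complement restricted to the 32-bit register width. A minimal Python sketch follows; the helper name `negate_predicate` is an assumption for illustration. Python integers are unbounded, so the complement has to be masked back to 32 bits explicitly.

```python
MASK32 = 0xFFFFFFFF  # 32-bit register width


def negate_predicate(value):
    """Bitwise NOT of a 32-bit predicate value (models the '!' prefix)."""
    return ~value & MASK32


w8 = 0x00FFFFFF                 # value of W8 from Example 2
not_p0 = negate_predicate(w8)   # P0 resolves to W8 when PredicateBase is 8
print(hex(not_p0))              # 0xff000000

# Threads whose bit is set in !P0 execute the SUB instruction.
enabled = [lane for lane in range(32) if (not_p0 >> lane) & 1]
print(enabled)                  # threads 24-31
```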

Example 3

[0104] set_predicate_base 16

[0105] @!P1 sub R7,R1,R2

[0106] The above instructions execute the special predicate-base-address setting instruction to set the predicate base address register PredicateBase to 16, so that general register W16 serves as predicate register P0 of the current warp and W17 serves as predicate register P1.

[0107] Assuming general register W17=0x000FFFFF, then !P1=0xFFF00000. The SUB instruction is therefore invalid for threads 0-19 and valid for threads 20-31; that is, the subtraction of the operands in registers R1 and R2 is performed on threads 20-31, and the result is written to register R7.
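To illustrate how the predicate gates the actual per-thread result in Example 3, the sketch below models R1, R2 and R7 as one value per thread. The function name `predicated_sub` and the operand values (10 and 3) are assumptions chosen for illustration; the source gives only the value of W17. Disabled threads are assumed to keep their previous R7 value.

```python
LANES = 32
MASK32 = 0xFFFFFFFF


def predicated_sub(dst, a, b, pred_value, negate=False, lane_mask=MASK32):
    """Per-thread dst = a - b, applied only where the predicate bit is set.

    Threads whose predicate bit is clear keep their old dst value.
    """
    if negate:
        pred_value = ~pred_value & MASK32
    enabled = pred_value & lane_mask
    return [
        (a[i] - b[i]) if (enabled >> i) & 1 else dst[i]
        for i in range(LANES)
    ]


# W17 is taken from Example 3; W16 is not given there and is set to 0 here.
w = {16: 0x00000000, 17: 0x000FFFFF}
predicate_base = 16                    # set_predicate_base 16

r1 = [10] * LANES                      # illustrative operand values
r2 = [3] * LANES
r7 = [0] * LANES

# "@!P1 sub R7,R1,R2": P1 resolves to W[predicate_base + 1] = W17.
r7 = predicated_sub(r7, r1, r2, w[predicate_base + 1], negate=True)
print(r7)  # threads 0-19 keep 0, threads 20-31 hold 7
```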

[0108] Figure 4 is a schematic flowchart of a method for executing an instruction by a processor device according to an embodiment of the present application. As shown in Figure 4, the instruction execution method in the embodiment of the present application is applicable to the processor dev...

Abstract

The embodiment of the invention discloses a processor device, an instruction execution method thereof, and computing equipment. The device comprises one or more single-instruction multi-thread (SIMT) processing units, each comprising one or more warps (thread bundles) used for executing instructions; a shared register group including a plurality of general-purpose registers shared among the warps; and a predicate base address register, provided for each warp, that indicates the base address within the shared register group of the set of general-purpose registers used as that warp's predicate registers. Each warp performs predicated execution of instructions based on the predicate values in the set of general-purpose registers used as its predicate registers. According to the embodiment of the invention, the dedicated predicate registers built into each warp in the original processor architecture can be eliminated, the predicate register resources of each warp can be expanded dynamically, processor resources are fully utilized, the overhead of switching instructions is reduced, and instruction processing performance is improved.
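The arrangement in the abstract, a shared register group plus one predicate base address register per warp, can be modeled roughly as follows. This is a sketch under stated assumptions rather than the claimed design: the class names `SharedRegisterFile` and `Warp`, the register-file size of 64, and the bounds checks are all hypothetical. The default base of 0 follows paragraph [0095].

```python
class SharedRegisterFile:
    """General-purpose registers W0..W(size-1), shared among warps."""

    def __init__(self, size):
        self.regs = [0] * size


class Warp:
    """A warp holding only a predicate base, not dedicated predicate registers."""

    def __init__(self, shared, lane_mask=0xFFFFFFFF):
        self.shared = shared
        self.lane_mask = lane_mask
        self.predicate_base = 0  # default value, per paragraph [0095]

    def set_predicate_base(self, base):
        """Model of the special set_predicate_base instruction."""
        if not 0 <= base < len(self.shared.regs):
            raise ValueError("predicate base outside the shared register group")
        self.predicate_base = base

    def predicate(self, n):
        """Resolve P<n> to W[predicate_base + n], gated by LaneMask."""
        index = self.predicate_base + n
        if not 0 <= index < len(self.shared.regs):
            raise IndexError("predicate register outside the shared register group")
        return self.shared.regs[index] & self.lane_mask


shared = SharedRegisterFile(64)          # size is an arbitrary assumption
shared.regs[32] = 0x01FFFFFF
shared.regs[33] = 0x80FFFFFF

warp_a, warp_b = Warp(shared), Warp(shared)
warp_a.set_predicate_base(32)
warp_b.set_predicate_base(33)            # overlapping window into the same registers

print(hex(warp_a.predicate(1)))          # 0x80ffffff (W33)
print(hex(warp_b.predicate(0)))          # 0x80ffffff (also W33)
```

Because a predicate index is just an offset from the base, nothing in this model limits a warp to seven predicates, and two warps can read the same general register as a predicate by choosing overlapping bases.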

Description

Technical field

[0001] The present application relates to the technical field of processors, and in particular, to a processor device, an instruction execution method thereof, and a computing device.

Background

[0002] At present, single-instruction multi-thread (SIMT) processing cores are commonly used in multi-core processing chips such as CPUs and GPUs. A SIMT core can have multiple basic units for executing instructions, called thread warps (WARP). The size of the warp determines the number of parallel threads for single-instruction multi-thread processing. Each warp usually includes a variety of special-purpose registers, among which the predicate (Predicate) register is a key control mechanism for implementing efficient branch instructions in the GPU. In NVIDIA's chip instruction system, a thread warp usually has 32 parallel threads (Thread), and each thread warp has seven 32-bit Predicate registers P0-P6, each Predicate...

Application Information

IPC(8): G06F9/30, G06F9/38, G06F9/445
CPC: G06F9/30098, G06F9/3891, G06F9/4451, G06F9/38, G06F9/445, G06F9/30
Inventor: 李颖
Owner: METAX INTEGRATED CIRCUITS (SHANGHAI) CO LTD