Patents
Literature
Hiro is an intelligent assistant for R&D personnel, combined with Patent DNA, to facilitate innovative research.
Hiro

74 results about "Superscalar" patented technology

A superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. In contrast to a scalar processor that can execute at most one single instruction per clock cycle, a superscalar processor can execute more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor. It therefore allows for more throughput (the number of instructions that can be executed in a unit of time) than would otherwise be possible at a given clock rate. Each execution unit is not a separate processor (or a core if the processor is a multi-core processor), but an execution resource within a single CPU such as an arithmetic logic unit.

Cycle segmented prefix circuits

The poor scalability of existing superscalar processors has been of great concern to the computer engineering community. In particular, the critical-path delays of many components in existing implementations grow quadratically with the issue width and the window size. This patent presents a novel way to reimplement these components and reduce their critical-path delay growth. It then describes an entire processor microarchitecture, called the Ultrascalar processor, that has better critical-path delay growth than existing superscalars. Most of our scalable designs are based on a single circuit, a cyclic segmented parallel prefix (cspp). We observe that processor components typically operate on a wrap-around sequence of instructions, computing some associative property of that sequence. For example, to assign an ALU to the oldest requesting instruction, each instruction in the instruction sequence must be told whether any preceding instructions are requesting an ALU. Similarly, to read an argument register, an instruction must somehow communicate with the most recent preceding instruction that wrote that register. A cspp circuit can implement such functions by computing for each instruction within a wrap-around instruction sequence the accumulative result of applying some associative operator to all the preceding instructions. A cspp circuit has a critical path gate delay logarithmic in the length of the instruction sequence. Depending on its associative operation and its layout, a cspp circuit can have a critical path wire delay sublinear in the length of the instruction sequence.
Owner:YALE UNIV

Fast just-in-time (JIT) scheduler

A just-in-time (JIT) compiler typically generates code from bytecodes that have a sequence of assembly instructions forming a "template". It has been discovered that a just-in-time (JIT) compiler generates a small number, approximately 2.3, assembly instructions per bytecode. It has also been discovered that, within a template, the assembly instructions are almost always dependent on the next assembly instruction. The absence of a dependence between instructions of different templates is exploited to increase the size of issue groups using scheduling. A fast method for scheduling program instructions is useful in just-in-time (JIT) compilers. Scheduling of instructions is generally useful for just-in-time (JIT) compilers that are targeted to in-order superscalar processors because the code generated by the JIT compilers is often sequential in nature. The disclosed fast scheduling method has a complexity, and therefore an execution time, that is proportional to the number of instructions in an instruction block (N complexity), a substantial improvement in comparison to the N2 complexity of conventional compiler schedulers. The described fast scheduler advantageously reorders instructions with a single pass, or few passes, through a basic instruction block while a conventional compiler scheduler such as the DAG scheduler must iterate over an instruction basic block many times. A fast scheduler operates using an analysis of a sliding window of three instructions, applying two rules within the three instruction window to determine when to reorder instructions. The analysis includes acquiring the opcodes and operands of each instruction in the three instruction window, and determining register usage and definition of the operands of each instruction with respect to the other instructions within the window. The rules are applied to determine ordering of the instructions within the window.
Owner:ORACLE INT CORP

Instruction issue control within a multi-threaded in-order superscalar processor

A multi-threaded in-order superscalar processor 2 is described having a fetch stage 8 within which thread interleaving circuitry 36 interleaves instructions taken from different program threads to form an interleaved stream of instructions which is then decoded and subject to issue. Hint generation circuitry 62 within the fetch stage 8 adds hint data to the threads indicating that parallel issue of an associated instruction is permitted with one of more other instructions.
Owner:ARM LTD

Efficient circuits for out-of-order microprocessors

The poor scalability of existing superscalar processors has been of great concern to the computer engineering community. In particular, the critical-path delays of many components in existing implementations grow quadratically with the issue width and the window size. This patent presents a novel way to reimplement these components and reduce their critical-path delay growth. It then describes an entire processor microarchitecture, called the Ultrascalar processor, that has better critical-path delay growth than existing superscalars. Most of our scalable designs are based on a single circuit, a cyclic segmented parallel prefix (cspp). We observe that processor components typically operate on a wrap-around sequence of instructions, computing some associative property of that sequence. For example, to assign an ALU to the oldest requesting instruction, each instruction in the instruction sequence must be told whether any preceding instructions are requesting an ALU. Similarly, to read an argument register, an instruction must somehow communicate with the most recent preceding instruction that wrote that register. A cspp circuit can implement such functions by computing for each instruction within a wrap-around instruction sequence the accumulative result of applying some associative operator to all the preceding instructions. A cspp circuit has a critical path gate delay logarithmic in the length of the instruction sequence. Depending on its associative operation and its layout, a cspp circuit can have a critical path wire delay sublinear in the length of the instruction sequence.
Owner:YALE UNIV

Multi-Level Dispatch for a Superscalar Processor

In an embodiment, a processor includes a multi-level dispatch circuit configured to supply operations for execution by multiple parallel execution pipelines. The multi-level dispatch circuit may include multiple dispatch buffers, each of which is coupled to multiple reservation stations. Each reservation station may be coupled to a respective execution pipeline and may be configured to schedule instruction operations (ops) for execution in the respective execution pipeline. The sets of reservation stations coupled to each dispatch buffer may be non-overlapping. Thus, if a given op is to be executed in a given execution pipeline, the op may be sent to the dispatch buffer which is coupled to the reservation station that provides ops to the given execution pipeline.
Owner:APPLE INC

Configuration steering for a reconfigurable superscalar processor

A reconfigurable processor including a plurality of reconfigurable execution units, a memory, an instruction queue, a configuration selection unit, and a configuration loader. The memory stores a plurality of steering vector processing hardware configurations for configuring the reconfigurable execution units. The instruction queue stores a plurality of instructions to be executed by at least one of the reconfigurable execution units. The configuration selection unit analyzes the instructions stored in the instruction queue and chooses one of the steering vector processing hardware configurations. The configuration loader determines whether one of the reconfigurable slots is available and reconfigures at least one of the reconfigurable slots with at least a part of the chosen steering vector processing hardware configuration responsive to at least one of the reconfigurable slots being available.
Owner:THE BOARD OF RGT UNIV OF OKLAHOMA

Multi-instruction out-of-order transmitting method based on instruction withering and processor

ActiveCN111538534ASolve the problem of not being able to increase the number of entries in the launch queueSolve the problem of increasing latencyConcurrent instruction executionEnergy efficient computingEngineeringLow delay
The invention discloses a multi-instruction out-of-order transmitting method based on instruction withering and a processor, and belongs to the field of processor design. According to the invention, aredundant arbitration structure in a traditional transmitting architecture is abandoned, an instruction withering circuit is added, and an instruction age array is adopted to represent the storage time of instructions in a CPU. In addition, an awakening state bit is added, the instructions exceeding the withering threshold value are stored in a settling pond so that a CPU can directly transmit the instructions, circuit structures such as an instruction request circuit, an instruction distribution circuit and an awakening circuit are improved, and the time sequence of a key path in the processor for multi-instruction transmission is effectively improved; and when an instruction is awakened, delayed awakening is performed on an instruction with a short execution period, the instruction witha long execution period is awakened in advance so as to ensure that the instruction can be executed back to back, the requirements of high power consumption ratio, low delay and high IPC in a modernsuperscalar out-of-order processor are met, and the problems that in the prior art, the number of items of a launch queue table of a processor cannot be increased day by day, and delay is also increased day by day are solved.
Owner:JIANGNAN UNIV
Who we serve
  • R&D Engineer
  • R&D Manager
  • IP Professional
Why Eureka
  • Industry Leading Data Capabilities
  • Powerful AI technology
  • Patent DNA Extraction
Social media
Try Eureka
PatSnap group products