33 results about "Superscalar microprocessor" patented technology

A superscalar processor is a microprocessor that uses instruction-level parallelism to execute more than one instruction per clock cycle.

Data address prediction structure and a method for operating the same

A data address prediction structure for a superscalar microprocessor is provided. The data address prediction structure predicts a data address that a group of instructions is going to access while that group of instructions is being fetched from the instruction cache. The data bytes associated with the predicted address are placed in a relatively small, fast buffer. The decode stages of instruction processing pipelines in the microprocessor access the buffer with addresses generated from the instructions, and if the associated data bytes are found in the buffer they are conveyed to the reservation station associated with the requesting decode stage. Therefore, the implicit memory read associated with an instruction is performed prior to the instruction arriving in a functional unit. The functional unit is occupied by the instruction for a fewer number of clock cycles, since it need not perform the implicit memory operation. Instead, the functional unit performs the explicit operation indicated by the instruction.
Owner:GLOBALFOUNDRIES US INC
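A minimal C sketch of the kind of small, fast buffer lookup described above, performed at decode time; the buffer size, field names, and function name are illustrative assumptions rather than the patent's design.

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical model of the small, fast buffer that holds the data bytes
     * fetched at the predicted data address (sizes and names are illustrative). */
    #define PRED_BUF_ENTRIES 4
    #define LINE_BYTES       16

    struct pred_buf_entry {
        bool     valid;
        uint32_t base_addr;              /* predicted data address (line-aligned)   */
        uint8_t  bytes[LINE_BYTES];      /* data bytes prefetched from that address */
    };

    static struct pred_buf_entry pred_buf[PRED_BUF_ENTRIES];

    /* Called from a decode stage: if the address the instruction actually
     * generates hits in the buffer, the data is forwarded to the reservation
     * station and the functional unit can skip the implicit memory read. */
    bool lookup_predicted_data(uint32_t addr, uint8_t *out, size_t len)
    {
        for (int i = 0; i < PRED_BUF_ENTRIES; i++) {
            const struct pred_buf_entry *e = &pred_buf[i];
            if (e->valid &&
                addr >= e->base_addr &&
                addr + len <= e->base_addr + LINE_BYTES) {
                memcpy(out, &e->bytes[addr - e->base_addr], len);
                return true;     /* hit: data goes to the reservation station   */
            }
        }
        return false;            /* miss: the functional unit performs the load */
    }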

System and method for handling load and/or store operations in a superscalar microprocessor

The present invention provides a system and method for managing load and store operations necessary for reading from and writing to memory or I/O in a superscalar RISC architecture environment. To perform this task, a load store unit is provided whose main purpose is to make load requests out of order whenever possible to get the load data back for use by an instruction execution unit as quickly as possible. A load operation can only be performed out of order if there are no address collisions and no write pendings. An address collision occurs when a read is requested at a memory location where an older instruction will be writing. Write pending refers to the case where an older instruction requests a store operation, but the store address has not yet been calculated. The data cache unit returns 8 bytes of unaligned data. The load/store unit aligns this data properly before it is returned to the instruction execution unit. Thus, the three main tasks of the load store unit are: (1) handling out of order cache requests; (2) detecting address collisions; and (3) alignment of data.
Owner:SEIKO EPSON CORP
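A minimal C sketch of the two checks described above that decide whether a load may be performed out of order; structure fields and function names are assumptions for illustration, not the patent's terminology.

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative record of an older, not-yet-completed store. */
    struct store_entry {
        bool     valid;        /* older store still pending              */
        bool     addr_known;   /* store address already calculated?      */
        uint32_t addr;
        uint32_t size;
    };

    /* A load may bypass older stores only if no older store writes the bytes
     * it reads (address collision) and every older store address is already
     * known (otherwise a "write pending" blocks the load). */
    bool load_may_issue_out_of_order(uint32_t load_addr, uint32_t load_size,
                                     const struct store_entry *older, int n)
    {
        for (int i = 0; i < n; i++) {
            if (!older[i].valid)
                continue;
            if (!older[i].addr_known)
                return false;                           /* write pending     */
            if (load_addr < older[i].addr + older[i].size &&
                older[i].addr < load_addr + load_size)
                return false;                           /* address collision */
        }
        return true;
    }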

Instruction alignment unit for routing variable byte-length instructions

An instruction alignment unit is provided which is capable of routing variable byte-length instructions simultaneously to a plurality of decode units which form fixed issue positions within a superscalar microprocessor. The instruction alignment unit may be implemented with a relatively small number of cascaded levels of logic gates, thus accommodating very high frequencies of operation. In one embodiment, the superscalar microprocessor includes an instruction cache for storing a plurality of variable byte-length instructions and a predecode unit for generating predecode tags which identify the location of the start byte of each variable byte-length instruction. An instruction alignment unit is configured to channel a plurality of the variable byte-length instructions simultaneously to predetermined issue positions depending upon the locations of their corresponding start bytes in a cache line. The issue position or positions to which an instruction may be dispatched are limited depending upon the position of the instruction's start byte within a line. By limiting the number of issue positions to which a given instruction within a line may be dispatched, the number of cascaded levels of logic required to implement the instruction alignment unit may be advantageously reduced.
Owner:GLOBALFOUNDRIES INC
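A hypothetical C sketch of one way a start-byte position could restrict the issue positions an instruction may occupy; the line size, number of issue positions, and the specific limiting rule are assumptions, not the patent's exact scheme.

    #include <stdint.h>

    #define LINE_BYTES   16
    #define NUM_ISSUE    4
    #define NO_INSTR    -1

    /* Assumed limiting rule: the earliest issue position an instruction may
     * occupy is determined by which quarter of the line its start byte lies
     * in, so each issue position multiplexes from only part of the line and
     * the routing logic stays shallow. */
    void route_line(const uint8_t start_bit[LINE_BYTES],
                    int issue_slot_start_byte[NUM_ISSUE])
    {
        int next_slot = 0;

        for (int s = 0; s < NUM_ISSUE; s++)
            issue_slot_start_byte[s] = NO_INSTR;

        for (int b = 0; b < LINE_BYTES && next_slot < NUM_ISSUE; b++) {
            if (!start_bit[b])
                continue;                       /* byte is not an instruction start */
            int min_slot = b / (LINE_BYTES / NUM_ISSUE);
            if (next_slot < min_slot)
                next_slot = min_slot;           /* skip positions it may not use    */
            issue_slot_start_byte[next_slot++] = b;
        }
    }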

Register renaming system and method for managing and renaming registers

The invention relates to a register renaming system and a method for managing and renaming registers. Specifically, the invention provides a register renaming system that manages and renames registers using multiple renamed-register queues, the system comprising a physical register group, a register alias table (RAT), an architecture register mapping table (ARMT), a selection pointer for the renamed-register queues, a decoder, a logical register renaming device, an RAT modifying device, and an updating device for the renamed-register queues. The invention further provides a method for managing and renaming registers using the multiple renamed-register queues. With the technical scheme of the invention, renaming can be performed on multiple registers simultaneously within the same cycle, the implementation is simple, and the timing cost is small, making the system and method suitable for superscalar microprocessors with wide issue width.
Owner:北京国睿中数科技股份有限公司
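A toy C sketch of renaming several destination registers in one cycle by drawing each new physical register from a separate free-register queue; the table sizes, queue count, and initialization are illustrative assumptions.

    #include <stdbool.h>
    #include <stdint.h>

    /* Sizes are illustrative; initialization of the free queues with the free
     * physical registers is omitted. */
    #define ARCH_REGS   32
    #define PHYS_REGS  128
    #define NUM_QUEUES   4                         /* renames supported per cycle */
    #define Q_DEPTH     (PHYS_REGS / NUM_QUEUES)

    static uint8_t rat[ARCH_REGS];                 /* register alias table (RAT)  */
    static uint8_t free_q[NUM_QUEUES][Q_DEPTH];
    static int     head[NUM_QUEUES], count[NUM_QUEUES];

    /* Rename up to NUM_QUEUES destination registers at once: slot i draws its
     * new physical register from queue i, so the slots never contend for a
     * single free list and can all be served in the same cycle. */
    bool rename_group(const uint8_t *arch_dst, uint8_t *phys_dst, int n)
    {
        if (n > NUM_QUEUES)
            return false;
        for (int i = 0; i < n; i++)
            if (count[i] == 0)
                return false;                      /* not enough free registers */
        for (int i = 0; i < n; i++) {
            uint8_t p = free_q[i][head[i]];
            head[i] = (head[i] + 1) % Q_DEPTH;
            count[i]--;
            rat[arch_dst[i]] = p;                  /* update the alias table */
            phys_dst[i] = p;
        }
        return true;
    }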

Multi-pipe dispatch and execution of complex instructions in a superscalar processor

In a computer system, a method and apparatus for dispatching and executing multi-cycle and complex instructions. The method achieves maximum performance for such instructions without impacting other areas of the processor such as the decode, grouping, or dispatch units. The invention allows multi-cycle and complex instructions to be dispatched to one port but executed in multiple execution pipes, without cracking the instruction and without limiting it to a single execution pipe. Some control signals are generated in the dispatch unit and dispatched with the instruction to the Fixed Point Unit (FXU). The FXU logic then executes these instructions on the available FXU pipes. This method results in optimum performance with little or no added complication. The presented technique places the flexibility of how these instructions are executed in the FXU, where the actual execution takes place, rather than in the instruction decode or dispatch units or in cracking by the compiler.
Owner:IBM CORP
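A hedged C sketch of the idea of dispatching through one port while occupying several FXU pipes, driven by control signals generated at dispatch; the signal names, pipe count, and allocation policy are assumptions for illustration.

    #include <stdbool.h>
    #include <stdint.h>

    /* Illustrative control signals generated at dispatch: the instruction goes
     * to a single dispatch port, but the flags tell the FXU how many execution
     * pipes it will occupy. */
    struct dispatch_ctrl {
        bool    multi_cycle;      /* instruction needs more than one cycle      */
        uint8_t pipes_needed;     /* how many FXU dataflow pipes it uses        */
    };

    #define FXU_PIPES 3           /* e.g. dataflow pipes X, Y, Z                */

    /* FXU side: claim the first pipes_needed pipes that are free this cycle;
     * return false to hold the instruction if not enough pipes are available. */
    bool fxu_issue(struct dispatch_ctrl c, bool pipe_busy[FXU_PIPES],
                   int chosen[FXU_PIPES])
    {
        int found = 0;
        for (int p = 0; p < FXU_PIPES && found < c.pipes_needed; p++)
            if (!pipe_busy[p])
                chosen[found++] = p;
        if (found < c.pipes_needed)
            return false;                  /* wait for pipes to free up          */
        for (int i = 0; i < found; i++)
            pipe_busy[chosen[i]] = true;   /* execute across multiple pipes      */
        return true;
    }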

Apparatus and method for predicting a first microcode instruction of a cache line and using predecode instruction data to identify instruction boundaries and types

A superscalar microprocessor predecodes instruction data to identify the boundaries of instructions and the type of instruction. To expedite the dispatch of instructions, when a cache line is scanned, the first scanned instruction is predicted to be a microcode instruction and is dispatched to the MROM unit. A microcode scan circuit uses the microcode pointer and the functional bits of the predecode data to multiplex instruction-specific bytes of the first microcode instruction to the MROM unit. If the predicted first microcode instruction is not the actual first microcode instruction, then in a subsequent clock cycle the actual microcode instruction is dispatched to the MROM unit and the incorrectly predicted microcode instruction is canceled.
Owner:GLOBALFOUNDRIES INC
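A speculative C sketch of predicting that the first scanned instruction of a line is a microcode (MROM) instruction and detecting a misprediction from the predecode bits; the field names and scan policy are assumptions.

    #include <stdbool.h>

    /* Illustrative predecode metadata per byte. */
    struct predecode {
        bool start;    /* byte begins an instruction                */
        bool mrom;     /* functional bit: instruction is microcoded */
    };

    /* Predict that the first instruction starting at or after the microcode
     * pointer is an MROM instruction and hand it to the MROM unit speculatively;
     * if the prediction was wrong, the dispatch is cancelled and the actual
     * microcode instruction is dispatched in a later cycle. */
    int predict_first_mrom(const struct predecode *pd, int line_len, int mrom_ptr,
                           bool *mispredicted)
    {
        int predicted = -1;
        for (int b = mrom_ptr; b < line_len; b++) {
            if (pd[b].start) { predicted = b; break; }
        }
        *mispredicted = (predicted >= 0) && !pd[predicted].mrom;
        return predicted;    /* byte offset handed to the MROM unit, or -1 */
    }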

Superscalar microprocessor having multi-pipe dispatch and execution unit

In a computer system for use as a symmetrical multiprocessor, a superscalar microprocessor apparatus allows dispatching and executing multi-cycle and complex instructions. Some control signals are generated in the dispatch unit and dispatched with the instruction to the Fixed Point Unit (FXU). Multiple execution pipes correspond to the instruction dispatch ports, and the execution unit is a Fixed Point Unit (FXU) which contains three execution dataflow pipes (X, Y and Z) and one control pipe (R). The FXU logic then executes these instructions on the available FXU pipes. This results in optimum performance with little or no added complication. The presented technique places the flexibility of how these instructions are executed in the FXU, where the actual execution takes place, rather than in the instruction decode or dispatch units or in cracking by the compiler.
Owner:IBM CORP

Pipelined instruction dispatch unit in a superscalar processor

A pipelined instruction dispatch or grouping circuit allows instruction dispatch decisions to be made over multiple processor cycles. In one embodiment, the grouping circuit performs resource allocation and data dependency checks on an instruction group, based on a state vector which includes a representation of the source and destination registers of the instructions within that instruction group, and on corresponding state vectors for instruction groups of a number of preceding processor cycles.
Owner:SUN MICROSYSTEMS INC
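A simplified C sketch of checking an instruction group against state vectors kept for preceding cycles; representing each group's source and destination registers as bitmasks, and the hazard rules shown, are assumptions for illustration.

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical state vector: bit i set means register i is read (src) or
     * written (dst) by some instruction in that group. */
    typedef uint64_t regmask_t;

    struct group_state {
        regmask_t dst;     /* destination registers of the group */
        regmask_t src;     /* source registers of the group      */
    };

    #define DEPTH 3        /* number of preceding cycles kept    */

    /* Grouping decision spread over several cycles: the current group is
     * compared against the stored state vectors of the last DEPTH groups. */
    bool group_may_dispatch(const struct group_state *cur,
                            const struct group_state prev[DEPTH])
    {
        for (int i = 0; i < DEPTH; i++) {
            if (cur->src & prev[i].dst)   /* RAW: source depends on older dest */
                return false;
            if (cur->dst & prev[i].dst)   /* WAW hazard on the same register   */
                return false;
        }
        return true;
    }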

System and method for handling load and/or store operations in a superscalar microprocessor

The present invention provides a system and method for managing load and store operations necessary for reading from and writing to memory or I/O in a superscalar RISC architecture environment. To perform this task, a load store unit is provided whose main purpose is to make load requests out of order whenever possible to get the load data back for use by an instruction execution unit as quickly as possible. A load operation can only be performed out of order if there are no address collisions and no write pendings. An address collision occurs when a read is requested at a memory location where an older instruction will be writing. Write pending refers to the case where an older instruction requests a store operation, but the store address has not yet been calculated. The data cache unit returns 8 bytes of unaligned data. The load/store unit aligns this data properly before it is returned to the instruction execution unit. Thus, the three main tasks of the load store unit are: (1) handling out of order cache requests; (2) detecting address collisions; and (3) alignment of data.
Owner:SEIKO EPSON CORP

System and method for retiring approximately simultaneously a group of instructions in a superscalar microprocessor

A system and method for retiring instructions in a superscalar microprocessor which executes a program comprising a set of instructions having a predetermined program order, the retirement system simultaneously retiring groups of instructions executed in or out of order by the microprocessor. The retirement system comprises a done block for monitoring the status of the instructions to determine which instruction or group of instructions have been executed, a retirement control block for determining whether each executed instruction is retirable, a temporary buffer for storing results of instructions executed out of program order, and a register array for storing retirable-instruction results. In addition, the retirement control block further controls the retiring of a group of instructions determined to be retirable, by simultaneously transferring their results from the temporary buffer to the register array, and retires instructions executed in order by storing their results directly in the register array. The method comprises the steps of monitoring the status of the instructions to determine which group of instructions have been executed, determining whether each executed instruction is retirable, storing results of instructions executed out of program order in a temporary buffer, storing retirable-instruction results in a register array, retiring a group of retirable instructions by simultaneously transferring their results from the temporary buffer to the register array, and retiring instructions executed in order by storing their results directly in the register array.
Owner:SAMSUNG ELECTRONICS CO LTD
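A minimal C sketch of retiring a group of instructions at once by moving their results from a temporary buffer into the register array; buffer layout and sizes are illustrative assumptions.

    #include <stdbool.h>
    #include <stdint.h>

    #define NUM_REGS   32
    #define BUF_SLOTS  16

    /* Results of out-of-order instructions sit in a temporary buffer until the
     * retire logic decides the whole group is retirable, then all of them move
     * to the register array at once. */
    struct temp_slot {
        bool     done;        /* instruction has executed             */
        bool     valid;       /* slot holds a not-yet-retired result  */
        uint8_t  dest_reg;
        uint32_t result;
    };

    static uint32_t register_array[NUM_REGS];

    /* Retire the oldest `group' slots together only if every one is done. */
    bool retire_group(struct temp_slot *buf, int head, int group)
    {
        for (int i = 0; i < group; i++) {
            const struct temp_slot *s = &buf[(head + i) % BUF_SLOTS];
            if (!s->valid || !s->done)
                return false;               /* group not yet retirable       */
        }
        for (int i = 0; i < group; i++) {   /* simultaneous transfer         */
            struct temp_slot *s = &buf[(head + i) % BUF_SLOTS];
            register_array[s->dest_reg] = s->result;
            s->valid = false;
        }
        return true;
    }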

A device for extending the capacity of the access queue through allocation control

The invention discloses a device for extending the capacity of the access queue through allocation control in a superscalar microprocessor. In the instruction pipeline, an access queue allocator is placed at the register renaming stage, and an access instruction issue controller is placed at the instruction issue stage. When the allocator assigns an access queue entry number to an access instruction, it checks whether the new entry number to be allocated matches an entry number carried by an access instruction already in the issue queue: if not, it allocates the new entry and sends the access instruction to the issue queue; if so, it allocates no new entry and stalls the access instruction at the register renaming stage. When an access instruction is ready to issue, the issue controller adds one judgment to the normal issue conditions: it checks whether the entry number carried by the access instruction matches the entry number carried by an access instruction that has issued but not yet retired; if not, the instruction is allowed to issue to the execution units; if so, issue is stopped and the instruction is held in the issue queue. By pre-allocating, to new access instructions, queue entries still occupied by access instructions that have not yet retired, and buffering those new instructions in the existing issue queue, the device increases the number of access instructions in flight without overwriting access queue information, indirectly extending the capacity of the access queue and making up the performance loss of common access queue control methods.
Owner:上海高性能集成电路设计中心
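A minimal C sketch of the two entry-number checks described above, one applied at allocation (rename) time and one added to the issue conditions; the data layout and function names are assumptions for illustration.

    #include <stdbool.h>

    /* Rename-stage check: a new access instruction may take entry number e
     * only if no instruction already waiting in the issue queue carries the
     * same entry number; otherwise the instruction stalls at rename. */
    bool entry_free_for_allocation(int e, const int *queued_entries, int n)
    {
        for (int i = 0; i < n; i++)
            if (queued_entries[i] == e)
                return false;     /* stall at rename: entry still referenced  */
        return true;
    }

    /* Issue-stage check added to the normal issue conditions: issue only if no
     * issued-but-not-retired access instruction carries the same entry number. */
    bool entry_free_for_issue(int e, const int *inflight_entries, int n)
    {
        for (int i = 0; i < n; i++)
            if (inflight_entries[i] == e)
                return false;     /* hold the instruction in the issue queue  */
        return true;
    }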

Method, system, computer program product, and hardware product for implementing result forwarding between differently sized operands in a superscalar processor

Result and operand forwarding is provided between differently sized operands in a superscalar processor by grouping a first set of instructions for operand forwarding and a second set of instructions for result forwarding. The first set of instructions comprises a first source instruction having a first operand and a first dependent instruction having a second operand, the first dependent instruction depending from the first source instruction; the second set of instructions comprises a second source instruction having a third operand and a second dependent instruction having a fourth operand, the second dependent instruction depending from the second source instruction. Operand forwarding is performed by forwarding the first operand, either whole or in part, as it is being read, to the first dependent instruction prior to execution; result forwarding is performed by forwarding a result of the second source instruction, either whole or in part, to the second dependent instruction after execution. The operand forwarding is performed by executing the first source instruction together with the first dependent instruction, and the result forwarding is performed by executing the second source instruction together with the second dependent instruction.
Owner:IBM CORP
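A toy C sketch contrasting operand forwarding (the dependent instruction consumes the source instruction's operand as it is read) with result forwarding (the dependent consumes the source's result after execution); the add operations are placeholders, and the differing-operand-size aspect of the patent is omitted.

    #include <stdint.h>

    struct insn { uint32_t a, b; };

    /* Operand forwarding: source and dependent execute together, and the
     * dependent consumes the source's operand `a' directly as it is read. */
    void exec_operand_forwarded(struct insn src, struct insn dep,
                                uint32_t *src_res, uint32_t *dep_res)
    {
        *src_res = src.a + src.b;
        *dep_res = src.a + dep.b;       /* uses the operand, not the result */
    }

    /* Result forwarding: the dependent consumes the source's result after
     * the source has executed. */
    void exec_result_forwarded(struct insn src, struct insn dep,
                               uint32_t *src_res, uint32_t *dep_res)
    {
        *src_res = src.a + src.b;
        *dep_res = *src_res + dep.b;    /* uses the freshly computed result */
    }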

Method for pipelined retirement of store instructions in a superscalar microprocessor

The invention relates to a method for pipelined retirement of store instructions in a superscalar microprocessor. The method exploits the automatic ordering behavior of three types of buffers and improves their interface protocols, relaxing the conditions under which a store instruction is considered executed and thereby speeding up store retirement. By properly balancing the number of instructions retired per clock cycle against the number of store instructions that write the level-one data cache per clock cycle, store instructions can be retired in a pipelined fashion when a continuous sequence of store instructions executes with store addresses that hit the level-one data cache and have write permission, which noticeably improves microprocessor performance.
Owner:上海高性能集成电路设计中心

Implementation method of vector aggregation loading instruction

The invention relates to the field of microprocessor design, in particular to a method for implementing a vector aggregation (gather) load instruction, comprising the following steps: splitting the vector aggregation load instruction into a plurality of single-element ordinary load micro-operations; sending the split micro-operations and their corresponding element numbers to the instruction queue; once the operands are ready, issuing the single-element load micro-operations to the memory pipeline to obtain the data; writing the obtained data into the corresponding elements of the corresponding data cache entry; and, after all element data of the data cache entry have been written, writing the result data from the data cache to the result bus, completing execution of the vector aggregation load instruction. The method effectively improves the execution performance of vector aggregation load instructions, reuses the path of an ordinary load instruction to the maximum extent, is suitable for high-performance out-of-order superscalar microprocessors, and has the advantages of simple implementation and high performance.
Owner:NAT UNIV OF DEFENSE TECH
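A minimal C sketch of cracking a vector aggregation (gather) load into single-element load micro-operations that carry their element numbers; the vector length and the base-plus-scaled-index addressing are assumptions for illustration.

    #include <stddef.h>
    #include <stdint.h>

    #define VLEN 8   /* illustrative number of vector elements */

    /* Hypothetical micro-operation record produced when the gather load is
     * split into single-element ordinary loads. */
    struct load_uop {
        uint64_t addr;       /* element address = base + index[i] * scale */
        uint8_t  element;    /* which element of the destination it fills */
    };

    /* Split one vector gather load into VLEN single-element load micro-ops;
     * each micro-op carries its element number to the instruction queue. */
    size_t split_gather(uint64_t base, const int64_t index[VLEN], int scale,
                        struct load_uop out[VLEN])
    {
        for (size_t i = 0; i < VLEN; i++) {
            out[i].addr    = base + (uint64_t)index[i] * (uint64_t)scale;
            out[i].element = (uint8_t)i;
        }
        return VLEN;
    }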

Zero-value register implementation method and device

The invention relates to out-of-order superscalar microprocessor design, in particular to a zero-value register implementation method and device. The method comprises the following steps: adding an identification field is_zero to the register renaming mapping table, where is_zero indicates whether the register is a zero-value register; and reading the field at the register renaming stage and passing it down the pipeline, stage by stage, to the execution units, where is_zero serves as a selection signal so that the value 0 is selected directly when a source operand comes from a zero-value register, without fetching data from the bypass network or the physical register file. The method reduces the complexity of register renaming, physical-register writes, and data-bypass logic, and has the advantage of simple logic implementation.
Owner:NAT UNIV OF DEFENSE TECH
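A minimal C sketch of the is_zero flag carried in the rename map and used as a selection signal at operand read; the structure layout and helper names are illustrative.

    #include <stdbool.h>
    #include <stdint.h>

    #define ARCH_REGS 32

    /* Rename-map entry extended with the is_zero flag described above. */
    struct rename_entry {
        uint16_t phys;       /* physical register holding the value       */
        bool     is_zero;    /* source is the architectural zero register */
    };

    static struct rename_entry rename_map[ARCH_REGS];

    /* At operand read: if is_zero travelled down the pipeline with the uop,
     * select the constant 0 instead of reading bypass or the register file. */
    uint64_t read_source(uint8_t arch_reg,
                         uint64_t (*read_phys)(uint16_t))
    {
        const struct rename_entry *e = &rename_map[arch_reg];
        return e->is_zero ? 0 : read_phys(e->phys);
    }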

Instruction processing method and superscalar pipelined microprocessor using the same

A superscalar pipelined microprocessor and an instruction processing method. The superscalar pipelined microprocessor has a set of registers defined by its instruction set architecture, a cache memory, execution units, and a load unit coupled to the cache memory. The load unit is distinct from the other execution units of the microprocessor and includes an arithmetic logic unit. The load unit receives a first instruction; the first instruction specifies a first memory address of a first source operand, an operation to be performed on the first source operand to generate a result, and a first destination register in the register set for storing the result. The load unit reads the first source operand from the cache memory. The arithmetic logic unit performs the operation on the first source operand to generate the result, rather than forwarding the first source operand to any of the other execution units to perform the operation. The load unit further outputs the result for subsequent retirement to the first destination register.
Owner:VIA TECH INC
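A hedged C sketch of a load unit that applies an operation to the loaded operand with its own ALU instead of handing the value to another execution unit; the operation list and function signature are assumptions for illustration.

    #include <stdint.h>

    /* Illustrative operations the ALU inside the load unit might perform. */
    enum load_op { LOAD_NEG, LOAD_NOT, LOAD_SEXT8 };

    /* Model of the load unit handling a "load and operate" instruction: it
     * reads the source operand from the cache and applies the operation with
     * its own ALU, then outputs the result for retirement to the destination. */
    uint32_t load_unit_execute(enum load_op op, uint32_t mem_addr,
                               uint32_t (*read_cache)(uint32_t))
    {
        uint32_t v = read_cache(mem_addr);           /* read source operand from cache */
        switch (op) {                                /* ALU inside the load unit       */
        case LOAD_NEG:   return (uint32_t)(-(int32_t)v);
        case LOAD_NOT:   return ~v;
        case LOAD_SEXT8: return (uint32_t)(int32_t)(int8_t)(v & 0xff);
        }
        return v;
    }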

Retransmission self-trapping immediate processing method in superscalar microprocessor

The invention relates to a method for rapid handling of retransmission self-traps in a superscalar microprocessor. The method comprises the following steps: after arbitration in the memory-access unit, the retransmission self-trap is reported to the reorder buffer and to the integer execution unit; after a fixed two-cycle delay, the reorder buffer records the received retransmission self-trap information in a reorder buffer entry; after a fixed two-cycle delay of the information it received, the integer execution unit arbitrates and broadcasts the pipeline-flush information; on the cycle after receiving the retransmission self-trap, the reorder buffer sends a retransmission self-trap request and a fetch address to the instruction fetch unit, restarting the pipeline; and after the retransmission self-trap in the reorder buffer entry reaches the head of the buffer, it is deleted from the entry and issue of the retransmitted instruction resumes. The method shortens the processing time of a retransmission self-trap, so the instruction pipeline can be recovered and restarted more quickly.
Owner:上海高性能集成电路设计中心