Patents
Literature
Hiro is an intelligent assistant for R&D personnel, combined with Patent DNA, to facilitate innovative research.
Hiro

153 results about "Floating-point unit" patented technology

A floating-point unit (FPU, colloquially a math coprocessor) is a part of a computer system specially designed to carry out operations on floating point numbers. Typical operations are addition, subtraction, multiplication, division, square root, and bitshifting. Some systems (particularly older, microcode-based architectures) can also perform various transcendental functions such as exponential or trigonometric calculations, though in most modern processors these are done with software library routines.

Vector register addressing

A floating point unit 26 is provided with a register bank 38 comprising 32 registers that may be used as either vector registers V or scalar registers S. Data values are transferred between memory 30 and the registers within the register bank 38 using contiguous block memory access instructions. Vector processing instructions specify a sequence of processing operations to be performed upon data values within a sequence of registers. The register address is incremented between each operation by an amount controlled by a stride value. Accordingly, the register address can be incremented by values such as 0, 1, 2 or 4 between each iteration. This provides a mechanism for retaining block memory access instructions to contiguous memory addresses whilst supporting vector matrix and / or complex operations in which the data values needed for each iteration are not adjacent to one another in the memory 30.
Owner:ARM LTD

Rapid execution of FCMOV following FCOMI by storing comparison result in temporary register in floating point unit

A microprocessor with a floating point unit configured to rapidly execute floating point compare (FCOMI) type instructions that are followed by floating point conditional move (FCMOV) type instructions is disclosed. FCOMI-type instructions, which normally store their results to integer status flag registers, are modified to store a copy of their results to a temporary register located within the floating point unit. If an FCMOV-type instruction is detected following an FCOMI-type instruction, then the FCMOV-type instruction's source for flag information is changed from the integer flag register to the temporary register. FCMOV-type instructions are thereby able to execute earlier because they need not wait for the integer flags to be read from the integer portion of the microprocessor. A computer system and method for rapidly executing FCOMI-type instructions followed by FCMOV-type instructions are also disclosed.
Owner:ADVANCED MICRO DEVICES INC

Method for checkpointing instruction groups with out-of-order floating point instructions in a multi-threaded processor

A method and apparatus are provided for dispatch group checkpointing in a microprocessor, including provisions for handling partially completed dispatch groups and instructions which modify system coherent state prior to completion. An instruction checkpoint retry mechanism is implemented to recover from soft errors in logic. The processor is able to dispatch fixed point unit (FXU), load / store unit (LSU), and floating point unit (FPU) or vector multimedia extension (VMX) instructions on the same cycle. Store data is written to a store queue when a store instruction finishes executing. The data is held in the store queue until the store instruction is checkpointed, at which point it can be released to the coherently shared level 2 (L2) cache.
Owner:IBM CORP

Smartcard power management

Portable smartcard devices, methods of operating smartcard devices, systems including a smartcard device and a terminal, and computer readable storage media including instructions for smartcard devices are provided. According to some embodiments, the smartcard includes a controller for executing commands received from a terminal, where execution of at least one command affects a power consumption of the smartcard device during subsequent execution of at least one other command. Optionally, the command which modifies smartcard power consumption is issued by the terminal in accordance with a power consumption decision. According to some embodiments, as an allowed power consumption of the smartcard device increases, the performance of the smartcard increases, and vice versa. According to some embodiments, the execution of the at least one command sets an operating parameter of the smartcard device, such as a clock frequency or a time of a non-volatile memory operation, thereby affecting the power consumption of the smartcard device during the subsequent executing. Alternatively or additionally, the execution of the at least one command enables or disables a functional unit of the smartcard device. Exemplary functional units include but are not limited to floating-point units and cryptographic units. Alternatively or additionally, the execution of the at least one command enables or disables memory such as non-volatile memory of the smartcard device. According to some embodiments, the execution of the at least one command sets an operating parameter of internal circuitry of the smartcard.
Owner:WESTERN DIGITAL ISRAEL LTD

Apparatus and method for processing data having a mixed vector/scalar register file

A floating point unit is provided with a register bank comprising 32 registers that may be used as either vector registers of scalar registers. A data processing instruction includes at least one register specifying field pointing to a register containing a data value to be used in that operation. An increase in the instruction bit space available to encode more opcodes or to allow for more registers is provided by encoding whether a register is to be treated as a vector or a scalar within the register field itself. Further, the register field for one register of the instruction may encode whether another register is a vector or a scalar. The registers can be initially accessed using the values within the register fields of the instruction independently of the opcode allowing for easier decode.
Owner:ARM LTD

Processor including hybrid redundancy for logic error protection

A processor core includes an instruction decode unit that may dispatch a same integer instruction stream to a plurality of integer execution units and may consecutively dispatch a same floating-point instruction stream to a floating-point unit. The integer execution units may operate in lock-step such that during each clock cycle, each respective integer execution unit executes the same integer instruction. The floating-point unit may execute the same floating-point instruction stream twice. Prior to the integer instructions retiring, compare logic may detect a mismatch between execution results from each of the integer execution units. In addition, prior to the results of the floating-point instruction stream transferring out of the floating-point unit, the compare logic may also detect a mismatch between results of execution of each consecutive floating-point instruction stream. Further, in response to detecting any mismatch, the compare logic may cause instructions causing the mismatch to be re-executed.
Owner:GLOBALFOUNDRIES INC

Instruction group formation and mechanism for SMT dispatch

A more efficient method of handling instructions in a computer processor, by associating resource fields with respective program instructions wherein the resource fields indicate which of the processor hardware resources are required to carry out the program instructions, calculating resource requirements for merging two or more program instructions based on their resource fields, and determining resource availability for simultaneously executing the merged program instructions based on the calculated resource requirements. Resource vectors indicative of the required resource may be encoded into the resource fields, and the resource fields decoded at a later stage to derive the resource vectors. The resource fields can be stored in the instruction cache associated with the respective program instructions. The processor may operate in a simultaneous multithreading mode with different program instructions being part of different hardware threads. When the resource availability equals or exceeds the resource requirements for a group of instructions, those instructions can be dispatched simultaneously to the hardware resources. A start bit may be inserted in one of the program instructions to define the instruction group. The hardware resources may in particular be execution units such as a fixed-point unit, a load / store unit, a floating-point unit, or a branch processing unit.
Owner:IBM CORP

Floating point unit using a central window for storing instructions capable of executing multiple instructions in a single clock cycle

A floating point unit capable of executing multiple instructions in a single clock cycle using a central window and a register map is disclosed. The floating point unit comprises: a plurality of translation units, a future file, a central window, a plurality of functional units, a result queue, and a plurality of physical registers. The floating point unit receives speculative instructions, decodes them, and then stores them in the central window. Speculative top of stack values are generated for each instruction during decoding. Top of stack relative operands are computed to physical registers using a register map. Register stack exchange operations are performed during decoding. Instructions are then stored in the central window, which selects the oldest stored instructions to be issued to each functional pipeline and issues them. Conversion units convert the instruction's operands to an internal format, and normalization units detect and normalize any denormal operands. Finally, the functional pipelines execute the instructions.
Owner:ADVANCED MICRO DEVICES INC

Rapid execution of floating point load control word instructions

A microprocessor with a floating point unit configured to rapidly execute floating point load control word (FLDCW) type instructions in an out of program order context is disclosed. The floating point unit is configured to schedule instructions older than the FLDCW-type instruction before the FLDCW-type instruction is scheduled. The FLDCW-type instruction acts as a barrier to prevent instructions occurring after the FLDCW-type instruction in program order from executing before the FLDCW-type instruction. Indicator bits may be used to simplify instruction scheduling, and copies of the floating point control word may be stored for instruction that have long execution cycles. A method and computer configured to rapidly execute FLDCW-type instructions in an out of program order context are also disclosed.
Owner:ADVANCED MICRO DEVICES INC

Apparatus and method for performing fused multiply add floating point operation

A fused multiply add floating point unit 1 includes multiplying circuitry 4 and adding circuitry 8. The multiply circuitry 4 multiplies operands B and C having N-bit significands to generate an unrounded product B*C. The unrounded product B*C has an M-bit significand, where M>N. The adding circuitry 8 receives an operand A that is input at a later processing cycle than a processing cycle at which the multiplying circuitry 4 receives operands B and C. The adding circuitry 8 commences processing of the operand A after the unrounded product B*C is generated by the multiplying circuitry 4. The adding circuitry 8 adds the operand A to the unrounded product B*C and outputs a rounded result A+B*C.
Owner:ARM LTD

Power throttling method and apparatus

Disclosed is an apparatus which deactivates both the AC as well as the DC component of power for various functions in a CPU. The CPU partitions dataflow registers and arithmetic units such that voltage can be removed from the upper portion of dataflow registers when the software is not utilizing same. Clock signals are also prevented from being applied to these non-utilized components. As an example, if a 64 bit CPU (processor unit) is to be used with both 32 and 64 bit software, the mentioned components may be partitioned in equal sized upper and lower portions. The logic signal for activating the removal of voltage may be obtained from a software-accessible architected control register designated as a machine state register in some CPUs. The same logic may be used in connection with removing voltage and clocks from other specialized functional components such as the floating point unit when software instructions do not presently require same.
Owner:IBM CORP

Floating-point processor with operating mode having improved accuracy and high performance

Floating-point units (FPUs) and processors having a “flush-to-nearest” operating mode that provides improved accuracy over a conventional “flush-to-zero” mode. The FPU or processor includes an operand processing section and an operand flush section. For each floating-point operation, the operand processing section receives and processes one or more input operands to provide a preliminary result. The operand flush section determines whether the preliminary result falls within one of a number of ranges of values and sets the preliminary result to one of a number of set values if the preliminary result falls within one of the ranges. In a specific implementation, a first range of values is defined to include values greater than zero and less than half of a minimum normalized number (i.e., 0<|y|<+amin / 2), a second range of values is defined to include values equal to or greater than +amin / 2 and less than +an, (i.e., amin / 2≦|y|<amin), and the preliminary result is set to zero if it falls within the first range and to +amin or −amin (depending on the sign bit) if it falls within the second range.
Owner:ARM FINANCE OVERSEAS LTD

Floating point unit with fused multiply add and method for calculating a result with a floating point unit

The invention proposes a Floating Point Unit (1) with fused multiply add, with one addend operand (eb, fb) and two multiplicand operands (ea, fa; ec, fc), with a shift amount logic (2) which based on the exponents of the operands (ea, eb and ec) computes an alignment shift amount, with an alignment logic (3) which uses the alignment shift amount to align the fraction (fb) of the addend operand, with a multiply logic (4) which multiplies the fractions of the multiplicand operands (fa, fc), with a adder logic (5) which adds the outputs of the alignment logic (3) and the multiply logic (4), with a normalization logic (6) which normalizes the output of the adder logic (5), which is characterized in that a leading zero logic (7) is provided which computes the number of leading zeros of the fraction of the addend operand (fb), and that a compare logic (8) is provided which based on the number of leading zeros and the alignment shift amount computes select signals that indicate whether the most significant bits of the alignment logic (3) output have all the same value in order to: a) control the carry logic of the adder logic (5) and / or b) control a stage of the normalization logic (6).
Owner:IBM CORP

Paralleling microprocessor and its realization method

The invention relates to a parallel microprocessor and a corresponding realization method which are based on FPGA development. The parallel microprocessor comprises a CPU which is a 32-bit fixed-point CPU formed by a fetch decoding unit, a process management unit and an integer instruction execution unit; a communication module formed by a plurality of units of LINK channels and In / Out controllers; an arbitration controller used for arbitrating internal and external address buses and a data bus of the CPU; an external memory interface used for providing reading / writing time-sequence logic for an external memory; an interruption / time-sequence controller used for providing timing and interruption for the CPU; an internal memory used for providing the instructions of the CPU and quickly accessing data. The CPU is also provided with a floating point unit (FPU) combining the 32-bit fixed-point CPU to form a 64-bit floating point CPU. The 32-bit fixed-point parallel microprocessor and the 64-bit floating point parallel microprocessor provided by the invention work stably, bring convenience for system modification and debugging, accelerate verification speed and provide a low-cost operation platform for programs written in OccamII language.
Owner:NEUSOFT MEDICAL SYST CO LTD

Handling floating point operations

A computing system capable of handling floating point operations during program code conversion is described, comprising a processor including a floating point unit and an integer unit. The computing system further comprises a translator unit arranged to receive subject code instructions including at least one instruction relating to a floating point operation and in response to generate corresponding target code for execution on said processor. To handle floating point operations a floating point status unit and a floating point control unit are provided within the translator. These units are cause the translator unit to generate either: target code for performing the floating point operations directly on the floating point unit; or target code for performing the floating point operations indirectly, for example using a combination of the integer unit and the floating point unit. In this way the efficiency of the computing system is improved.
Owner:IBM CORP

Processor that predicts floating point instruction latency based on predicted precision

A processor includes a prediction circuit and a floating point unit. The prediction circuit is configured to predict an execution latency of a floating point operation. The floating point unit is coupled to receive the floating point operation for execution, and is configured to detect a misprediction of the execution latency. In some embodiments, an exception may be taken in response to the misprediction. In other embodiments, the floating point operation may be rescheduled with the corrected execution latency.
Owner:GLOBALFOUNDRIES INC

Floating point multiply-accumulate unit

A floating point unit 10 provides a multiply-accumulate operation to determine a result B+(A*C). The multiplier 20 takes several processing cycles to determine the product (A*C). Whilst the multiplier 20 and its subsequent carry-save-adder 26 operate, an aligned value B' of the addend B is generated by an alignment-shifter 34. The aligned-addend B' may only partially overlap with the product (A*C) to which it is to be added using an adder 44. Any high-order-portion HOP of the aligned-addend B' that does not overlap with the product (A*C) must be subsequently concatenated with the output of the adder 44 that sums the product (A*C) with the overlapping portion of the aligned-addend B'. If the sum performed by the adder 44 generates a carry then it is an incremented version IHOP of the high-order-portion that should be concatenated with the output of the adder 44. This incremented-high-order-portion is generated by the adder 44 during otherwise idle processing cycles present due to the multiplier 20 operating over multiple cycles.
Owner:ARM LTD

Instruction group formation and mechanism for SMT dispatch

A more efficient method of handling instructions in a computer processor, by associating resource fields with respective program instructions wherein the resource fields indicate which of the processor hardware resources are required to carry out the program instructions, calculating resource requirements for merging two or more program instructions based on their resource fields, and determining resource availability for simultaneously executing the merged program instructions based on the calculated resource requirements. Resource vectors indicative of the required resource may be encoded into the resource fields, and the resource fields decoded at a later stage to derive the resource vectors. The resource fields can be stored in the instruction cache associated with the respective program instructions. The processor may operate in a simultaneous multithreading mode with different program instructions being part of different hardware threads. When the resource availability equals or exceeds the resource requirements for a group of instructions, those instructions can be dispatched simultaneously to the hardware resources. A start bit may be inserted in one of the program instructions to define the instruction group. The hardware resources may in particular be execution units such as a fixed-point unit, a load / store unit, a floating-point unit, or a branch processing unit.
Owner:INT BUSINESS MASCH CORP

Fast operand formatting for a high performance multiply-add floating point-unit

Disclosed are a floating point execution unit, and a method of operating a floating point unit, to perform multiply / add operations using a plurality of operands from an instruction having a plurality of operand positions. The floating point unit comprises a multiplier for calculating a product of two of the operands, and an aligner for combining said product and a third of the operands. A first data path is used to supply to the multiplier operands from a first and a second of the operand positions of the instruction, and a second data path is used to supply the third operand to the aligner. The floating point unit further comprises a multiplexer on the second data path for selecting, for use by the aligner, either the operand from the second operand position or the operand from the third operand position of the instruction.
Owner:IBM CORP

Floating point stack manipulation using a register map and speculative top of stack values

A floating point unit capable of executing multiple instructions in a single clock cycle using a central window and a register map is disclosed. The floating point unit comprises: a plurality of translation units, a future file, a central window, a plurality of functional units, a result queue, and a plurality of physical registers. The floating point unit receives speculative instructions, decodes them, and then stores them in the central window. Speculative top of stack values are generated for each instruction during decoding. Top of stack relative operands are computed to physical registers using a register map. Register stack exchange operations are performed during decoding. Instructions are then stored in the central window, which selects the oldest stored instructions to be issued to each functional pipeline and issues them. Conversion units convert the instruction's operands to an internal format, and normalization units detect and normalize any denormal operands. Finally, the functional pipelines execute the instructions.
Owner:ADVANCED MICRO DEVICES INC

Processing pipeline having parallel dispatch and method thereof

One or more processor cores of a multiple-core processing device each can utilize a processing pipeline having a plurality of execution units (e.g., integer execution units or floating point units) that together share a pre-execution front-end having instruction fetch, decode and dispatch resources. Further, one or more of the processor cores each can implement dispatch resources configured to dispatch multiple instructions in parallel to multiple corresponding execution units via separate dispatch buses. The dispatch resources further can opportunistically decode and dispatch instruction operations from multiple threads in parallel so as to increase the dispatch bandwidth. Moreover, some or all of the stages of the processing pipelines of one or more of the processor cores can be configured to implement independent thread selection for the corresponding stage.
Owner:MEDIATEK INC

Floating point unit with fused multiply add and method for calculating a result with a floating point unit

The invention proposes a Floating Point Unit (1) with fused multiply add, with one addend operand (eb, fb) and two multiplicand operands (ea, fa; ec, fc), with a shift amount logic (2) which based on the exponents of the operands (ea, eb and ec) computes an alignment shift amount, with an alignment logic (3) which uses the alignment shift amount to align the fraction (fb) of the addend operand, with a multiply logic (4) which multiplies the fractions of the multiplicand operands (fa, fc), with a adder logic (5) which adds the outputs of the alignment logic (3) and the multiply logic (4), with a normalization logic (6) which normalizes the output of the adder logic (5), which is characterized in that a leading zero logic (7) is provided which computes the number of leading zeros of the fraction of the addend operand (fb), and that a compare logic (8) is provided which based on the number of leading zeros and the alignment shift amount computes select signals that indicate whether the most significant bits of the alignment logic (3) output have all the same value in order to: a) control the carry logic of the adder logic (5) and / or b) control a stage of the normalization logic (6).
Owner:INT BUSINESS MASCH CORP

Method for floating point round to integer operation

An apparatus and method for computing a rounded floating point number. A floating point unit (FPU) receives an instruction to round a floating point number to a nearest integral value and retrieves a binary source operand having an exponent of a fixed first number of bits and a mantissa of a fixed second number of bits. If the unbiased exponent value is greater than or equal to zero and less than the fixed second number, the FPU generates a mask having N consecutive ‘1’ bits beginning with the least significant bit and whose remaining bits have a value of ‘0’, where N is equal to the fixed second number minus the unbiased exponent value. The FPU computes a bitwise OR of the source operand with the mask, increments the result if the instruction is to round up, and computes a bitwise AND of the result with the inverse of the mask.
Owner:ADVANCED MICRO DEVICES INC

Piping rounding mode bits with floating point instructions to eliminate serialization

A floating point unit is provided which conveys the rounding mode in effect upon dispatch of a particular instruction with that particular instruction into the execution pipeline of the floating point unit. Upon dispatch of a control word update instruction into the execution pipeline, the rounding mode is updated according to the updated control word provided for the control word update instruction. Instructions subsequent to the control word update instruction thereby receive the updated rounding mode as those instructions are dispatched. The updated rounding mode is available to the subsequent instructions prior to retiring the control word update instruction. The rounding mode is therefore updated without serializing the update. If the control word update instruction modifies the value in a field other than the rounding mode, the instructions subsequent to the control word update instruction may be discarded and re-executed subsequent to updating the control word register with the updated control word. In this manner, the control word update is effectively serialized for cases in which a field other than the rounding mode is updated.
Owner:ADVANCED MICRO DEVICES INC

Multiple processor core device having shareable functional units for self-repairing capability

Multiple processor cores are implemented on a single integrated circuit chip, each having its own respective shareable functional units, which are preferably floating point units. A failure of a shareable unit in one processor causes that processor to share the corresponding unit in another processor on the same chip. Preferably, a functional unit is shared on a cycle interleaved basis.
Owner:IBM CORP

Processing pipeline having stage-specific thread selection and method thereof

One or more processor cores of a multiple-core processing device each can utilize a processing pipeline having a plurality of execution units (e.g., integer execution units or floating point units) that together share a pre-execution front-end having instruction fetch, decode and dispatch resources. Further, one or more of the processor cores each can implement dispatch resources configured to dispatch multiple instructions in parallel to multiple corresponding execution units via separate dispatch buses. The dispatch resources further can opportunistically decode and dispatch instruction operations from multiple threads in parallel so as to increase the dispatch bandwidth. Moreover, some or all of the stages of the processing pipelines of one or more of the processor cores can be configured to implement independent thread selection for the corresponding stage.
Owner:ADVANCED MICRO DEVICES INC

Circuit and method for normalizing and rounding floating-point results and processor incorporating the circuit or the method

For use in a floating-point unit that supports floating-point formats having fractional parts of varying widths and employs a datapath wider than the fractional parts, a circuit and method for normalizing and rounding floating-point results and processor incorporating the circuit or the method. In one embodiment, the circuit includes: (1) left-shift circuitry for aligning a fractional part of the floating-point result with a most significant bit of the datapath and irrespective of a width of the fractional part to yield a shifted fractional part and (2) rounding circuitry, coupled to the shift circuitry, that rounds the shifted fractional part.
Owner:AVAGO TECH WIRELESS IP SINGAPORE PTE

Number format pre-conversion instructions

Apparatus for processing data includes processing circuitry 16, 18, 20, 22, 24, 26 and decoder circuitry 14 for decoding program instructions. The program instructions decoded include a floating point pre-conversion instruction which performs round-to-nearest ties to even rounding upon the mantissa field of an input floating number to generate an output floating point number with the same mantissa length but with the mantissa rounded to a position corresponding to a shorter mantissa field. The output mantissa field includes a suffix of zero values concatenated the rounded value. The decoder for circuitry 14 is also responsive to an integer pre-conversion instruction to quantise and input integer value using round-to-nearest ties to even rounding to form an output integer operand with a number of significant bits matched to the mantissa size of a floating point number to which the integer is later to be converted using an integer-to-floating point conversion instruction.
Owner:ARM LTD

Optimized allocation of multi-pipeline executable and specific pipeline executable instructions to execution pipelines based on criteria

A microprocessor with a floating point unit configured to efficiently allocate multi-pipeline executable instructions is disclosed. Multi-pipeline executable instructions are instructions that are not forced to execute in a particular type of execution pipe. For example, junk ops are multi-pipeline executable. A junk op is an instruction that is executed at an early stage of the floating point unit's pipeline (e.g., during register rename), but still passes through an execution pipeline for exception checking. Junk ops are not limited to a particular execution pipeline, but instead may pass through any of the microprocessor's execution pipelines in the floating point unit. Multi-pipeline executable instructions are allocated on a per-clock cycle basis using a number of different criteria. For example, the allocation may vary depending upon the number of multi-pipeline executable instructions received by the floating point unit in a single clock cycle.
Owner:GLOBALFOUNDRIES INC
Who we serve
  • R&D Engineer
  • R&D Manager
  • IP Professional
Why Patsnap Eureka
  • Industry Leading Data Capabilities
  • Powerful AI technology
  • Patent DNA Extraction
Social media
Patsnap Eureka Blog
Learn More
PatSnap group products