Patents
Literature
Hiro is an intelligent assistant for R&D personnel, combined with Patent DNA, to facilitate innovative research.
Hiro

69results about "Number-of-one counters" patented technology

Microprocessors

A processor (100) is provided that is a programmable fixed point digital signal processor (DSP) with variable instruction length, offering both high code density and easy programming. Architecture and instruction set are optimized for low power consumption and high efficiency execution of DSP algorithms, such as for wireless telephones, as well as pure control tasks. The processor includes an instruction buffer unit (106), a program flow control unit (108), an address / data flow unit (110), a data computation unit (112), and multiple interconnecting busses. Dual multiply-accumulate blocks improve processing performance. A memory interface unit (104) provides parallel access to data and instruction memories. The instruction buffer is operable to buffer single and compound instructions pending execution thereof. A decode mechanism is configured to decode instructions from the instruction buffer. The use of compound instructions enables effective use of the bandwidth available within the processor. A soft dual memory instruction can be compiled from separate first and second programmed memory instructions. Instructions can be conditionally executed or repeatedly executed. Bit field processing and various addressing modes, such as circular buffer addressing, further support execution of DSP algorithms. The processor includes a multistage execution pipeline with pipeline protection features. Various functional modules can be separately powered down to conserve power. The processor includes emulation and code debugging facilities with support for cache analysis.
Owner:TEXAS INSTR INC

Multiplier accumulator circuits

A multiply-accumulate (MAC) unit, having a first binary operand X, a second binary operand Y, a third binary operand, Booth recode logic for generating a plurality of partial products from said first and second operands, a Wallace tree adder for reducing the partial products and for selectively arithmetically combining the reduced partial products with said third operand, a final adder for generating a final sum, and a saturation circuitry for selectively rounding or saturating said final sum is provided. A dual MAC unit is also provided.
Owner:TEXAS INSTR INC

Library of low-cost low-power and high-performance multipliers

Disclosed is an apparatus and method for producing a library of low-cost, low-power multipliers which are easy to build, have self testing capabilities, and are regular. The multipliers multiply a first word having N bits by a second word having M bits and include a plurality of smaller multipliers each including a single array of borrow parallel counters for receiving a trisected input and processing at least part of a trisected input according to a predetermined formula, an x:2 (where x=3, 2) counter which may be coupled with at least one borrow parallel counter to form a synthesized borrow parallel counter, and an adder coupled to an output of at least one of the borrow parallel counters, the adder for summing the output of the at least one borrow parallel adder. Each of the smaller multipliers receives a trisected input and an adder for receiving and summing the outputs of the smaller multipliers.
Owner:THE RES FOUND OF STATE UNIV OF NEW YORK

Rounding mechanisms in processors

An arithmetic unit, for example a multiply and accumulate (MAC) unit 42, for a processing engine includes a partial product reduction tree 480. The partial product reduction tree will generate carry results and provides a final output to a final adder 470 connected to the partial production reduction tree. Unbiased rounding logic 476 is provided. A carry propagation tree is responsive to the carry results for anticipating a zero on each of N least significant bits of the final adder. When zero is anticipated on each of N least significant bits of the final adder, the carry propagation tree is operable to generate an output signal 477 which is used by the unbiased rounding stage to force the (N+1)th least significant bit of the final adder to zero. Through the use of a carry propagation tree to predict, or anticipate zeros on the N least significant bits, unbiased rounding can be effected without a time penalty in that a carry propagation tree can be configured to be at least a rapid as the carry propagation of the final adder. Where a zero anticipation function is provided, this can also be mapped onto the carry propagation tree, thus providing an efficient hardware implementation through sharing of that hardware between functions.
Owner:TEXAS INSTR INC

Methods, apparatus, instructions, and logic to provide vector address conflict resolution with vector population count functionality

Instructions and logic provide SIMD address conflict resolution with vector population count functionality. Some embodiments include processors with a register with a variable plurality of data fields, each of the data fields to store a variable second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of bits set to one for corresponding data fields. Responsive to decoding a vector population count instruction, execution units count the number of bits set to one for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector population count instructions can be used with variable sized elements and conflict masks to generate iteration counts and completion masks to be used each iteration to resolve dependencies in gather-modify-scatter SIMD operations.
Owner:INTEL CORP

Reconfigurable matrix multiplier architecture and extended borrow parallel counter and small-multiplier circuits

A dynamically or run-time reconfigurable matrix multiplier architecture with a reconfiguration mechanism for computing the product of matrices Xp×r and Yr×q for any integers p, q, r and any item precision b, i.e., bitwidth, ranging from 4 to 64 bits is described. The reconfigurable matrix multiplier uses borrow parallel counters with new circuits, 6—0, and 6—1 and the improved small multiplier library. The reconfigurable matrix multiplier architecture is based on a novel scheme of trading data bitwidth for processing array or matrix size. The matrix multiplier achieves an extra compact, low power, high speed design through the use of a borrow parallel counters and a library of small borrow parallel multiplier circuits. The matrix multiplying processor using area comparable with a single 64×64-b multiplier constructed of very large-scale integrated (VLSI) circuits, can be reconfigured to produce the product of two matrices X(4×4) and Y(4×4) of 8, 16, and 32-bit data items in every 1, 4, and 16 pipeline cycles, respectively, or the product of two 64-b numbers in every pipeline cycle.
Owner:THE RES FOUND OF STATE UNIV OF NEW YORK

A processor with apparatus for verifying instruction parallelism

A processing engine 10 for executing instructions in parallel comprises an instruction buffer 600 for holding at least two instructions, with the first instruction 602 in a first position and the second instruction 604 in a second position. A first decoder 612 provides decoding of the first instruction and generates first control signals. The first control signals include first resource control signals, first address generation control signals, and a first validity signal indicative of the validity of the first instruction in the first position. A second decoder 614 provides decoding of the second instruction and generates second control signals. The second control signals include second resource control signals, second address generation control signals, and a second validity signal indicative of the validity of the second instruction in the second position. Arbitration and merge logic 628, 630 is provided for arbitrating between the first and second control signals and for merging the first and second control signals for controlling power of execution of the instructions in accordance with a set of parallelism rules. A conditional execution unit 634 is responsive to false condition signals from the arbitration and merge logic to inhibit or modify the effect of the control signals. The parallelism rules provide for efficient instruction execution, and the avoidance of resource conflicts.
Owner:TEXAS INSTR INC

High speed low power 4-2 compressor

A high speed low powered 4-2 compressor according to the present invention performs an XOR / XNOR operation of input data by using a single input type NAND / NOR logic circuit and a dual input type NAND / NOR logic circuit. Thus, delays to generate complementary signals are avoided. In addition, the 4-2 compressor uses a single railed multiplexer instead of a dual railed multiplexer, so that gate drive nodes and internal load capacitance are reduced. As a result, circuit area and power consumption of the 4-2 compressor are reduced.
Owner:SAMSUNG ELECTRONICS CO LTD

Family of low power, regularly structured multipliers and matrix multipliers

A family of embodiments of a new class of CMOS VLSI computer multiplier circuits that are simpler to fabricate, smaller, faster, more efficient in their use of power, and easier to scale in size than the prior art. The normal binary adder circuit unit is replaced by the innovative shift switch circuit unit. Use of the shift switch circuit sharply reduces fluctuations of power caused by plurality variations in the bit representations of the input, intermediate and output numbers. Reduced-scale devices are used in shift-switch pass-transistor signal restoration circuits, significantly reducing the size, power demand, and power dissipation of internal circuitry, in contrast to ordinary multiplier design. The simplicity of the circuit design allows multiplier partial-product reduction in fewer logic stages than existing comparable designs allow, showing speed improvement over such designs. The circuit design simplicity and the use of reduced-scale devices require less VLSI area than existing designs need, facilitating integration in VLSI microprocessors. Modular circuit organization simplifies scaling for larger operands without the circuit complications of existing designs. The design includes a critical flip of the physical layout of the partial-product matrix at each size level, simplifying the layout of traces in the circuit at all size scales. Finally, the application of reconfigurable design principles to the easily-scaled layout reduces significantly the mean demand for computing resources over a wide range of multiplication bit-width scales, as compared to existing designs. Overall, the orchestrated integration of these diverse design innovations makes possible the implementation of simpler, faster, smaller, more efficient, more flexible, and easier-to-build VLSI multiplication circuits than the current art reveals.
Owner:THE RES FOUND OF STATE UNIV OF NEW YORK

Data value addition

A data processing apparatus operable to sum data values said data processing apparatus comprising: a plurality of adder logic stages arranged in parallel with each other; control logic operable in response to receipt of a request to sum two data values to forward portions of said two data values to respective ones of said plurality of adder logic stages, such that a first adder logic stage receives a predetermined number of lowest significant bits from each of said two data values and subsequent adder logic stages receive said predetermined number of successively higher significant bits from each of said two data values, each of said plurality of adder logic stages being operable to perform a carry propagate addition of said received portions to generate an intermediate sum, a propagate value and a carry; and further logic stages operable to receive said intermediate sums, carries and propagate values generated from said plurality of adder logic stages and to combine said received intermediate sums, carries and propagate values to produce a sum of said two data values; wherein said control logic is operable in response to receipt of a request to add a third data value to said sum of said two data values, received before said further logic has completed said sum, to forward portions of said third data value to respective ones of said plurality of adder logic stages and to feedback said intermediate sums generated by said plurality of adder logic stages and to selectively feedback a carry generated from a preceding adder logic stage; and said plurality of adder logic stages are operable to perform a carry propagate addition of said fedback intermediate sums and carrys with respective portions of said third data value to generate a plurality of further intermediate sums, further carrys and further propagate values; and wherein said further logic stages are operable to receive said plurality of further intermediate sums, further carries and further propagate values and to combine said received further intermediate sums, carries and propagate values to produce a sum of said three data values.
Owner:ARM LTD

Processor with a computer repeat instruction

A processing engine, such as a digital signal processor, includes an execution mechanism, a repeat count register and a repeat count index register. The execution mechanism is operable for a repeat instruction to initialize the repeat count index register with the content of the repeat count register, and to modify the content of the repeat count register. The repeat instruction comprises two parts, the first of which initializes the repeat count index register and initiates repeat of a subsequent instruction, and the second part of which modifies the content of the repeat count register.
Owner:TEXAS INSTR INC

Parallel processing device and parallel processing method

A row decoding circuit (171) outputs a select signal to a row set in a row range setting unit (172) to select a select signal line (103), processing results from processing circuits (102) on this row are output to a data output line (104), and a row adder (106) adds processing results output to a data output line (104) of a column set in a column range selector (105).
Owner:NIPPON TELEGRAPH & TELEPHONE CORP

A 6-to-3 carry-save adder

A 6-to-3 carry-save binary adder is disclosed. The 6-to-3 carry-save adder includes a means for receiving six data inputs and a means for simultaneously adding the six data inputs to generate a first data output, a second data output, and a third data output. The first data output is a SUM output, the second data output is a CARRY-2 output, and the third data output is a CARRY-4 output.
Owner:IBM CORP

Dual access instruction and compound memory access instruction with compatible address fields

A processing engine 10 includes an instruction buffer 502 operable to buffer single and compound instructions pending execution. A decode mechanism is configured to decode instructions from the instruction buffer. The decode mechanism is arranged to respond to a predetermined tag in a tag field of an instruction, which predetermined tag is representative of the instruction being a compound instruction formed from separate programmed memory instructions. The decode mechanism is operable in response to the predetermined tag to decode at least first data flow control for a first programmed instruction and second data flow control for a second programmed instruction. The use of compound instructions enables effective use of the bandwidth available within the processing engine. A soft dual memory instruction can be compiled from separate first and second programmed memory instructions. A compound address field of the predetermined compound instruction can be arranged at the same bit positions as the address field for a hard compound memory instruction, that is a compound instruction which is programmed. In this case the decoding of the addresses can be started before the operation code of the instructions have been decoded. To reduce the number of bits in the compound instruction, addressing can be restricted to indirect addressing and the operation codes for at least the first instruction can be reduced in size. In this way, the compound instruction can be arranged to have the same number of bits in total as the sum of the bits of the separate programmed instructions.
Owner:TEXAS INSTR INC

Population count approximation circuit and method thereof

A circuit and method provides an estimate of a population count (popcount) of a plurality of input bit values. In one form the input bit values represent respective nodes of an integrated circuit. An approximation circuit uses an approximation input stage which receives a plurality of data inputs and has a plurality of logic circuits. Each logic circuit provides a single bit output. The approximation circuit provides monotonic accuracy. A reduction tree receives the single bit outputs of the plurality of logic circuits and provides an approximate count of how many of the plurality of data inputs are asserted. Size and speed are improved by providing the estimate as opposed to an exact value.
Owner:NXP USA INC

Stack Pointer Management

A processor (100) is provided that is a programmable digital signal processor (DSP) with variable instruction length. A user stack region (910) is used to pass variables to a subroutine and to hold values representative of a first portion of a program counter (1000). A system stack region (911) is used to hold values representative of a remaining portion of the program counter (1001) and to hold additional context information. The user stack region and the system stack region are managed independently so that software from a prior generation processor can be translated to run on processor (100).
Owner:TEXAS INSTR INC

Carry save adder and its system

A 4-to-2 carry store adder that reduces output sum and carry delays. The 4-to-2 store adder may include a lower-order full adder coupled to a higher-order full adder. The carry store adder may also include a logic unit coupled to the higher order full adder, wherein the logic unit is configured to generate a carry input to the higher order full adder, typically generated from a carry store adder at a previous stage. By generating the carry (input bit) in the current stage, rather than the previous stage, the delay of the input bit to the higher order full adder is reduced, and thus the delay of the output sum of the higher order full adder and the carry is reduced.
Owner:INT BUSINESS MASCH CORP

Vector address conflict resolution with vector population count functionality

Instructions and logic provide SIMD address conflict resolution with vector population count functionality. Some embodiments include processors with a register with a variable plurality of data fields, each of the data fields to store a variable second plurality of bits. A destination register has corresponding data fields, each of these data fields to store a count of the number of bits set to one for corresponding data fields. Responsive to decoding a vector population count instruction, execution units count the number of bits set to one for each of data fields in the register, and store the counts in corresponding data fields of the first destination register. Vector population count instructions can be used with variable sized elements and conflict masks to generate iteration counts and completion masks to be used each iteration to resolve dependencies in gather-modify-scatter SIMD operations.
Owner:INTEL CORP

Logic circuit

Circuit for selecting a second set of binary inputs according to the number of high input signals applied to a first input set. A first subcircuit has the first input set, logic generating control output signals, each control output signal represents whether the first input set has exactly a predetermined number of high input signals. Each control output signal corresponds to a different predetermined number of high input signals. A second subcircuit has a second input set, a set of control inputs for receiving control output signals from the first subcircuit, and logic including a plurality of switches including one or more pass gates. Each switching component switches to connect or isolate one of the second input set to a common output. The control inputs control the switches. The first and second subcircuits are configured such that only one switch can be switched to connect at a time.
Owner:FORTE DESIGN SYSTEMS

4:2 Carry save adder and 4:2 carry save adding method

Provided are a simplified 4:2 carry save adder (CSA) cell and a 4:2 carry save adding method. The 4:2 CSA cell is formed of an odd detector and first through sixth switches through logic optimization. The odd detector generates an XOR of the first through fourth input signals, outputs the XOR as an odd signal, generates an XOR of the first and second input signals, and outputs the XOR as a first XOR signal. The first switch outputs the third input signal as a carry output signal in response to the first XOR signal. The second switch outputs the first input signal as the carry output signal in response to an inverted first XOR signal. The third switch outputs the carry input signal as a carry signal in response to the odd signal. The fourth switch outputs the fourth input signal as the carry signal in response to an inverted odd signal. The fifth switch outputs an inverted carry input signal as a sum signal in response to the odd signal. The sixth switch outputs the carry input signal as the sum signal in response to the inverted odd signal.
Owner:SAMSUNG ELECTRONICS CO LTD

Generalized programmable counter arrays

A Generalized Programmable Counter Array (GPCA) is a reconfigurable multi-operand adder, which can be reprogrammed to sum a plurality of operands of arbitrary size. The GPCA is configured to compress the input words down to two operands using parallel counters. Resulting operands are then summed using a standard Ripple Carry Adder to produce the final result. The GPCA consists of a linear arrangement of identical compressor slices (CSlice).
Owner:ECOLE POLYTECHNIQUE FEDERALE DE LAUSANNE (EPFL)

High speed low power 4-2 compressor

A high speed low powered 4-2 compressor according to the present invention performs an XOR / XNOR operation of input data by using a single input type NAND / NOR logic circuit and a dual input type NAND / NOR logic circuit. Thus, delays to generate complementary signals are avoided. In addition, the 4-2 compressor uses a single railed multiplexer instead of a dual railed multiplexer, so that gate drive nodes and internal load capacitance are reduced. As a result, circuit area and power consumption of the 4-2 compressor are reduced.
Owner:SAMSUNG ELECTRONICS CO LTD

4-to-2 carry save adder using limited switching dynamic logic

A 4-to-2 carry save adder using limited switching dynamic logic (LSDL) to reduce power consumption while reducing the delay of outputting the sum and carry bits. The 4-to-2 carry save adder may include a first LSDL circuit configured to output a sum bit. The carry save adder may further include a second LSDL circuit configured to output a carry bit. Both the first and second LSDL circuits use a carry generated in the current stage that was previously generated in the previous stage (next lower order bit position). Since the carry is generated in the current stage and not in the previous stage, the delay in outputting the sum and carry bits is reduced and hence the performance of carry save adders is improved. Further, since LSDL circuits were used in the carry save adder, power consumption was reduced while using a small amount of area.
Owner:IBM CORP

Carry-ripple adder

A carry-ripple adder having inputs for supplying three input bits of equal significance 2n that are to be summed and two carry bits of equal significance 2n+1 that are also to be summed. A calculated sum bit of significance 2n and two calculated carry bits of equal significance 2n+1 which are higher than the significance 2n of the sum bit are provided at outputs. A final carry-ripple stage VMA may be used even after a reduction to three bits.
Owner:INFINEON TECH AG

Computing carry-in bit to most significant bit carry save adder in current stage

A 4-to-2 carry save adder with a reduction in the delay of outputting the sum and carry bits. The 4-to-2 carry save adder may include a lower order full order coupled to a higher order full adder. The carry save adder may further include a logic unit coupled to the higher order full adder where the logic unit is configured to generate a carry bit to be inputted to the higher order full adder that normally would be generated from the carry save adder located in the previous stage. By generating this carry bit (carry-in bit) in the current stage and not in the previous stage, the delay of the carry-in bit inputted to the higher order full adder is reduced thereby reducing the delay of outputting the sum and carry bits by the higher order full adder.
Owner:IBM CORP

Data bus inversion detection mechanism

A bus inversion apparatus includes exclusive-OR gates and an inversion detector. The exclusive-OR gates are coupled to an instant data bus and a last data bus. The data buses have a corresponding plurality of bits, where the exclusive-OR gates perform a bitwise comparison of the data buses, and provide an exclusive-OR bus. The states of the exclusive-OR bus indicate whether corresponding bits of the data buses are different. The inversion detector counts the number of the corresponding bits that are different, and indicates that the instant data bus should be inverted. The inversion detector has a plurality of left shift circuits, each configured to perform a logical left shift of input bits as directed by the states of shift bits, where outputs of the each of the plurality of left shift circuits indicate a number of a subgroup of the corresponding bits that are different.
Owner:VIA TECH INC

Circuit and method for correlated inputs to a population count circuit

A circuit includes a plurality of selection circuits. Each of the plurality of selection circuits has a first input, a second input, a control input, and an output. Each of the first inputs receives one of a plurality of correlated signals. Each of the second inputs receives one of a plurality of uncorrelated signals. Each of the control inputs receives a correlation mode control signal, and each of the outputs provides the one of the plurality of correlated signals or the one of the plurality of uncorrelated signals based on the correlation mode control signal. The circuit further includes a population count circuit having a plurality of data inputs coupled to receive the outputs of the plurality of selection circuits. The population count circuit provides a population count for the plurality of data inputs. The population count may be an approximate count or an accurate count.
Owner:NXP USA INC

Parity generation circuit, counter circuit, and counting method

A circuit outputs, upon receipt of data and a parity of the data, count information on the number of bits in the data represented as a base-n number (n: a natural number equal to or larger than 2) and the parity of the count information. The circuit includes a determining unit and an inverting unit. The determining unit determines that the number of bits in the data represented as a base-n number is a specific value. The inverting unit outputs, as the parity of the count information, any one of a value of the parity of the data and an inverted value of the parity depending on a result of determination by the determining unit.
Owner:FUJITSU LTD
Who we serve
  • R&D Engineer
  • R&D Manager
  • IP Professional
Why Patsnap Eureka
  • Industry Leading Data Capabilities
  • Powerful AI technology
  • Patent DNA Extraction
Social media
Patsnap Eureka Blog
Learn More
PatSnap group products