84 results about "Floating point multiplication" patented technology

Five-stage pipeline structure of a fused floating-point multiply-add unit

The invention discloses a fully pipelined design of a single-precision floating-point fused multiply-add unit, which performs multiply-add operations of the form A + B × C. The operation is carried out in five pipeline stages. In the first stage, the exponent difference is calculated and part of the multiplication is completed. In the second stage, A and B × C are aligned according to the exponent difference, effective subtraction and complementing are performed, and the rest of the multiplication is completed; at the same time, the exponent is classified into one of six states, each with its own method of calculating the normalization shift amount. In the third stage, the number of leading zeros is predicted, the sign of the final result is predicted in parallel, and the first normalization shift is performed. In the fourth stage, the second normalization shift is performed first, followed by addition and part of the rounding. In the last stage, addition and rounding are completed, the exponent is corrected, and the third normalization shift is completed in the interval of the rounding. The invention achieves high performance and high precision at low hardware cost.
Owner:TSINGHUA UNIV
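
The abstract above does not give the arithmetic details of the five stages, so the following is only a reference model of what a fused A + B × C must return: the exact sum rounded once. It is a minimal sketch in double precision (the patent targets single precision), and the function names are hypothetical. It contrasts the fused result with the unfused one, where the product is rounded before the addition.

```python
from fractions import Fraction

def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Reference model of a + b*c with a single rounding (not the patented pipeline)."""
    exact = Fraction(a) + Fraction(b) * Fraction(c)  # exact rational arithmetic
    return float(exact)                              # one rounding, to nearest

def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Ordinary a + b*c: the product is rounded, then the sum is rounded again."""
    return a + b * c

# b*c is exactly 1 - 2**-60, which rounds to 1.0 when computed on its own.
a = -1.0
b = 1.0 + 2.0**-30
c = 1.0 - 2.0**-30

print(unfused_multiply_add(a, b, c))  # the small term is lost
print(fused_multiply_add(a, b, c))    # the small term survives
```

The example inputs are chosen so that the intermediate rounding of the product cancels the entire result, which is the case a fused unit is designed to avoid.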

Single-precision matrix multiplication optimization method based on the Loongson 3A chip

Inactive | CN102214160A | Improve computing efficiency | Overcoming the problem of invalid prefetching | Complex mathematical operations | Data set | Parallel computing
The invention discloses a single-precision matrix multiplication optimization method based on the Loongson 3A chip. The method comprises the following steps: dividing each of the two single-precision source matrices into two sub-matrices, on the principle that the sub-matrices occupy no more than half of the level-1 cache and no more than half of the level-2 cache; and, in the core matrix multiplication code built from the 32-bit access instruction, the single-precision floating-point multiply-add instruction and the prefetch instruction of the Loongson 3A, using the chip's 128-bit access instruction and parallel single-precision floating-point instruction, and prefetching data with a prefetch address calculated by subtracting the operation data size CDS from the first address CACAS of the operation data set, so that the floating-point unit can run at essentially full load. The method solves the problem of invalid prefetching of address-unaligned data, and brings the execution efficiency of address-unaligned single-precision matrix multiplication close to that of address-aligned single-precision matrix multiplication. Compared with the basic linear algebra subprogram library GotoBLAS version 2-1.07, single-precision matrix multiplication optimized by this method runs more than 90 percent faster on average.
Owner:UNIV OF SCI & TECH OF CHINA
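
The cache-driven division into sub-matrices described above is a form of blocked (tiled) matrix multiplication. The sketch below shows only that general idea in plain Python; it does not model the Loongson 3A instructions or the prefetch address calculation, and the function name and the `block` parameter are assumptions made for illustration. In a real kernel the block size would be derived from the cache capacities.

```python
def blocked_matmul(a, b, block=2):
    """Multiply matrices a (n x k) and b (k x m) one block at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):
        for k0 in range(0, k, block):
            for j0 in range(0, m, block):
                # Work on one small tile so its operands can stay in cache.
                for i in range(i0, min(i0 + block, n)):
                    for kk in range(k0, min(k0 + block, k)):
                        a_ik = a[i][kk]
                        for j in range(j0, min(j0 + block, m)):
                            c[i][j] += a_ik * b[kk][j]
    return c

print(blocked_matmul([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]], block=1))
```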

Implementation method of a low-power, high-throughput floating-point multiply-accumulate unit

The invention discloses an implementation method of a low-power, high-throughput floating-point multiply-accumulate unit. The method comprises the following steps: 1, when a vector dot product is calculated, one pair of operands A and B is input in each of N cycles, and the floating-point multiplication of A and B is carried out in the first, second and third pipeline stages; 2, the product undergoes a weight transformation in the fourth pipeline stage, which increases the mantissa bit width and reduces the exponent bit width; 3, the transformed product is accumulated in the fifth pipeline stage, with one accumulation per cycle; 4, the result is converted back in the sixth and seventh pipeline stages, and the final accumulated result is output in cycle N + 6. The method can complete a vector dot product of arbitrary length N, performing one multiply-accumulate per cycle and thereby avoiding frequent accesses to the processor's registers. The operation completes in N + 6 cycles, supports both single-precision and double-precision floating-point numbers, and effectively reduces the power consumption of floating-point operations.
Owner:ZHEJIANG UNIV
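
The N + 6 cycle count follows from the stage assignment given above: a pair that enters stage 1 in cycle i reaches the accumulator (stage 5) in cycle i + 4, and the last accumulation is followed by two conversion stages. The sketch below is a simple timing model under that assumption, not the unit's datapath; the wide-mantissa format of stage 4 is not modelled, and the function name is hypothetical.

```python
def pipelined_dot(a, b):
    """Model a 7-stage multiply-accumulate pipeline; return (result, cycles)."""
    n = len(a)
    acc = 0.0
    cycle = 0
    for cycle in range(1, n + 7):        # cycles 1 .. n + 6
        k = cycle - 5                    # pair that sits in stage 5 this cycle
        if 0 <= k < n:
            acc += a[k] * b[k]           # one accumulation per cycle
        # cycles n + 5 and n + 6: stages 6 and 7 convert the sum back
    return acc, cycle

result, cycles = pipelined_dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(result, cycles)
```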

64-bit floating-point multiply accumulator and pipeline processing method for its floating-point operations

The invention discloses a 64-bit floating-point multiply accumulator and a pipeline processing method for its floating-point operations. A first exponent processing unit of the multiply accumulator calculates the exponent difference in floating-point multiply-add and floating-point multiply operations; a first sign processing unit determines the sign of the results of the floating-point multiply-add and floating-point multiply operations and determines whether effective subtraction is to be performed; a second exponent processing unit processes the exponents of the operands when only addition is performed; a second sign processing unit processes the signs of the operands when only addition is performed; and an exponent and sign selector selects either the results of the first exponent and sign processing units or the results of the second exponent and sign processing units, and evaluates the exponent difference d. If d is equal to 0, 1, 2 or −1 and effective subtraction is performed, the operation goes through a CLOSE path; otherwise it goes through a FAR path, which reduces the delay of the multiply accumulator.
Owner:LOONGSON TECH CORP
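
The path selection rule stated above can be written down directly. This is a minimal sketch of the decision only; the function and constant names are hypothetical, and how d and the effective-subtraction flag are derived from the operands is left to the exponent and sign processing units.

```python
CLOSE_PATH_DIFFS = {-1, 0, 1, 2}  # exponent differences named in the abstract

def select_path(d: int, effective_subtraction: bool) -> str:
    """Return "CLOSE" or "FAR" for exponent difference d."""
    if effective_subtraction and d in CLOSE_PATH_DIFFS:
        return "CLOSE"  # massive cancellation is possible, long normalization shift
    return "FAR"        # at most a short normalization shift is needed

print(select_path(1, True), select_path(1, False), select_path(5, True))
```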

Approximate floating-point multiplier for neural network processor and floating-point multiplication

The invention discloses an approximate floating-point multiplier for a neural network processor and a floating-point multiplication method. When the approximate floating-point multiplier multiplies the fractional parts of the operands, it takes a number of bits from the high-order end of each operand's fractional part according to the designated precision, and appends a 1 to the front and to the back of the selected bits to obtain two new fractional parts; the two new fractional parts are multiplied to obtain an approximate fractional part of the product; and zeros are appended to the low-order end of the normalized approximate fractional part so that it has the same number of bits as the fractional part of the operands, which yields the fractional part of the product. The multiplier uses approximate calculation, selecting a different number of fractional bits for each multiplication according to the precision required, which lowers the energy consumption of multiplication, increases its speed, and makes the neural network processing system more efficient.
Owner:INST OF COMPUTING TECH CHINESE ACAD OF SCI
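
The truncation scheme above can be sketched for IEEE 754 single precision as follows. This is an illustrative model under several assumptions not stated in the abstract: operands are normal, nonzero numbers, the result does not overflow or underflow, and the names `approx_mul` and `k` (the number of fraction bits kept) are hypothetical.

```python
import struct

def f32_bits(x: float) -> int:
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b))[0]

def approx_mul(x: float, y: float, k: int = 8) -> float:
    """Approximate float32 product that keeps only k high fraction bits per operand."""
    bx, by = f32_bits(x), f32_bits(y)
    sign = (bx ^ by) & 0x80000000
    exp = ((bx >> 23) & 0xFF) + ((by >> 23) & 0xFF) - 127
    fx, fy = bx & 0x7FFFFF, by & 0x7FFFFF
    # Leading 1 in front, k kept bits, trailing 1 behind: k + 2 bits in total.
    mx = (1 << (k + 1)) | ((fx >> (23 - k)) << 1) | 1
    my = (1 << (k + 1)) | ((fy >> (23 - k)) << 1) | 1
    prod = mx * my                      # value in [1, 4)
    frac_bits = 2 * (k + 1)
    if prod >> (frac_bits + 1):         # product >= 2: normalize
        exp += 1
        frac_bits += 1
    frac = prod & ((1 << frac_bits) - 1)
    # Zero-pad (or truncate) the fraction to the 23 bits of float32.
    frac = frac << (23 - frac_bits) if frac_bits <= 23 else frac >> (frac_bits - 23)
    return bits_f32(sign | (exp << 23) | frac)

print(approx_mul(1.5, 2.0), approx_mul(3.14159, -2.71828))
```

With k = 8 the multiplier array shrinks from 24 × 24 bits to 10 × 10 bits, at the price of a relative error of a few tenths of a percent in these examples.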

Inverse discrete cosine transform method and device

Inactive | CN101047849A | Guarantee the accuracy of the transformation result | Speed up transform processing | Pulse modulation television signal transmission | Digital video signal modification | Floating point multiplication | Computer science
The invention discloses an inverse discrete cosine transform method. When the inverse discrete cosine transform is applied to digital signals using the AAN algorithm, the initially input transform coefficients are amplified, corrected and rounded, so that the floating-point multiplications in the transform can be replaced by fixed-point shift-accumulate operations; the final inverse transform results are then scaled back down accordingly. The invention also discloses a corresponding device.
Owner:HUAWEI TECH CO LTD
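
Replacing a floating-point multiplication by a fixed-point shift-accumulate operation can be illustrated with a single constant. The sketch below multiplies an integer by 1/√2 ≈ 0.7071, approximated as 181/256; the constant, the 8-bit scaling and the function name are chosen for illustration and are not taken from the patent.

```python
def mul_inv_sqrt2_fixed(x: int) -> int:
    """Approximate round(x * 0.7071) using only shifts and additions."""
    # 181 = 128 + 32 + 16 + 4 + 1, so x * 181 is a sum of shifted copies of x.
    acc = (x << 7) + (x << 5) + (x << 4) + (x << 2) + x
    # Add half of the scale factor for rounding, then scale back down by 256.
    return (acc + 128) >> 8

print(mul_inv_sqrt2_fixed(1000))
```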

Data processing apparatus and method for performing floating point multiplication

A data processing apparatus and method are provided for multiplying first and second n-bit significands of first and second floating point operands to produce an n-bit result. The data processing apparatus comprises multiplier logic operable to multiply the first and second n-bit significands to produce a pair of 2n-bit vectors. Half adder logic is then arranged to produce a plurality of carry and sum bits representing a corresponding plurality of most significant bits of the pair of 2n-bit vectors. The first adder logic then performs a first sum operation in order to generate a first rounded result equivalent to the addition of the pair of 2n-bit vectors with a rounding increment injected at a first predetermined rounding position appropriate for a non-overflow condition. To achieve this, the first adder logic uses as the m most significant bits of the pair of 2n-bit vectors the corresponding m carry and sum bits, the least significant of the m carry bits being replaced with a rounding increment value prior to the first adder logic performing the first sum operation. Second adder logic is arranged to perform a second sum operation in order to generate a second rounded result equivalent to the addition of the pair of 2n-bit vectors with a rounding increment injected at a second predetermined rounding position appropriate for an overflow condition. To achieve this, the second adder logic uses as the m-1 most significant bits of the pair of 2n-bit vectors the corresponding m−1 carry and sum bits, with the least significant of the m−1 carry bits being replaced with the rounding increment value prior to the second adder logic performing the second sum operation. The required n-bit result is then derived from either the first rounded result or the second rounded result. 
The data processing apparatus takes advantage of a property of the half adder form to enable a rounding increment value to be injected prior to performance of the first and second sum operations without requiring full adders to be used to inject the rounding increment value.
Owner:ARM LTD
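
The idea of computing two rounded results in parallel, one for each possible position of the leading bit, and selecting between them can be modelled in software. The sketch below is a behavioural model only: it uses plain integer addition instead of carry-save vectors and half adders, rounds half up rather than implementing any particular IEEE rounding mode, and its names are hypothetical. Significands are n-bit integers with the leading 1 in bit n − 1.

```python
def round_product(sa: int, sb: int, n: int):
    """Return (n-bit rounded significand, exponent increment) of sa * sb."""
    p = sa * sb                                # 2n-bit product, value in [1, 4)
    # Non-overflow case: leading 1 at bit 2n-2, rounding bit at n-2.
    r0 = (p + (1 << (n - 2))) >> (n - 1)
    # Overflow case: leading 1 at bit 2n-1, rounding bit at n-1.
    r1 = (p + (1 << (n - 1))) >> n
    # Use the overflow result if the product, or the rounded r0, reaches 2.0.
    if (p >> (2 * n - 1)) or (r0 >> n):
        return r1, 1
    return r0, 0

print(round_product(0b1011, 0b1011, 4))  # 1.375 * 1.375
print(round_product(0b1111, 0b1111, 4))  # 1.875 * 1.875
```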

Fusion processing device and method for floating-point number multiplication-addition device

The invention provides a fusion processing device and a fusion processing method for a floating-point multiply-add device. The method comprises the following steps: inputting the real parts and imaginary parts of the multiplier and multiplicand of a floating-point complex number into floating-point multiplication modules M0 and M1 and performing floating-point multiplication, wherein the output results represent the products in carry and partial-sum form; inputting the products into a floating-point addition module A2 and performing floating-point addition, wherein the output results represent the sum in carry and partial-sum form; inputting these output results into floating-point addition modules A0 and A1 simultaneously; inputting externally supplied addends into the floating-point addition modules A0 and A1 and performing floating-point addition; and outputting the operation results. The device and the method are well suited to the butterfly computation of the Fourier transform; they simplify the operation steps, save hardware resources, and implement the multiply-add operation on floating-point complex numbers with fewer resources.
Owner:SANECHIPS TECH CO LTD

Physical layer multicast and multithread transmission method based on combined block triangularization

The invention discloses a physical layer multicast and multithread transmission method based on combined block triangularization. Using a block-matrix computation approach, the method iteratively constructs, through a combined triangularization decomposition algorithm, a block diagonal precoding matrix suited to a physical layer multicast wireless scenario with a single transmitting base station and two receiving users. Compared with the existing combined triangularization decomposition algorithm, the method not only reduces the number of floating-point multiplications involved when the block diagonal precoding matrix is used for data transmission, but also obtains better bit-error performance. It is suitable for systems with large-scale transmit and receive antennas and frequency-selective fading wireless channels, and can be conveniently implemented in a new-generation broadband wireless and mobile communication system using multiple-input multiple-output technology.
Owner:UNIV OF SCI & TECH OF CHINA

Floating-point complex multiplier

The floating-point complex multiplier comprises a data interface, a floating-point addition unit, a floating-point subtraction unit and four floating-point multiplication units. The multiplier is integrated into the all-digital receiver of a high-frequency ground wave radar, where it performs the floating-point complex multiplication needed for real-time digital pulse compression, improving data processing speed and precision. It offers high reliability, good universality, small size and low cost.
Owner:WUHAN UNIV
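
Four multipliers, one adder and one subtractor correspond to the textbook expansion (a + bi)(c + di) = (ac − bd) + (ad + bc)i. The sketch below shows that mapping; the function name is hypothetical, and the assignment of operations to hardware units is an assumption based on the component list above.

```python
def complex_multiply(a: float, b: float, c: float, d: float):
    """Return (real, imag) of (a + bi) * (c + di)."""
    ac, bd, ad, bc = a * c, b * d, a * d, b * c  # four multiplication units
    real = ac - bd                               # subtraction unit
    imag = ad + bc                               # addition unit
    return real, imag

print(complex_multiply(1.0, 2.0, 3.0, 4.0))
```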

Arithmetic device and arithmetic method

An FMA arithmetic unit has a timing control circuit. The timing control circuit controls bypass selectors to bypass intermediate registers when performing floating-point addition or subtraction, controls another bypass selector to bypass another intermediate register when performing floating-point multiplication, and controls still other bypass selectors to bypass a register file or other arithmetic unit result register and the operand registers when performing successive FMA arithmetic operations.
Owner:FUJITSU LTD

System and method of floating point multiply operation processing

A processor includes an integer multiplier configured to execute an integer multiply instruction to multiply significand bits of at least one floating point operand of a floating point multiply operation. The processor also includes a floating point multiplier configured to execute a special purpose floating point multiply accumulate instruction with respect to an intermediate result of the floating point multiply operation and the at least one floating point operand to generate a final floating point multiplication result.
Owner:QUALCOMM INC

Reconfigurable floating point multiply-add operation unit and method suitable for multi-precision calculation

The invention discloses a reconfigurable floating point multiply-add operation unit and method suitable for multi-precision calculation, and the method comprises the steps: dividing mantissas of floating points with different precision through employing a unified method, and obtaining a plurality of bit segments; and calling different numbers of same-type unit multipliers to achieve multiplication of a plurality of bit segments in one period and output corresponding products, and then performing shift addition operation on the products to obtain a multiply-accumulate operation result of the floating-point number. The problem of bit redundancy is avoided by adopting a unified mantissa division scheme, the hardware utilization rate is improved by adopting a unified unit multiplier, and the multiply-accumulate operation of half-precision floating-point numbers, the multiply-accumulate operation of single-precision dot product floating-point numbers and the multiply-accumulate operation of double-precision floating-point numbers can be achieved. The problems of bit redundancy, low hardware utilization rate and the like of an operation method supporting multi-precision floating point multiplication in the prior art are solved.
Owner:SOUTH UNIVERSITY OF SCIENCE AND TECHNOLOGY OF CHINA
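
Splitting mantissas into equal bit segments, multiplying the segments with identical unit multipliers and combining the partial products by shift-addition can be sketched with integers. The segment width of 8 bits and the function names are assumptions for illustration; the abstract does not state the actual division scheme.

```python
def split_segments(mantissa: int, width: int):
    """Split an integer mantissa into width-bit segments, least significant first."""
    segments = []
    while mantissa:
        segments.append(mantissa & ((1 << width) - 1))
        mantissa >>= width
    return segments or [0]

def segmented_multiply(ma: int, mb: int, width: int = 8) -> int:
    """Multiply two mantissas using only width x width unit multiplications."""
    total = 0
    for i, sa in enumerate(split_segments(ma, width)):
        for j, sb in enumerate(split_segments(mb, width)):
            total += (sa * sb) << (width * (i + j))  # shift-add of partial products
    return total

# 24-bit (single-precision sized) mantissas use 3 x 3 unit multiplications.
print(segmented_multiply(0xC90FDB, 0xADF854) == 0xC90FDB * 0xADF854)
```

The same routine handles 11-bit or 53-bit mantissas by using fewer or more segments, which is the sense in which one unit multiplier type can serve several precisions.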

Parallel processor for efficient processing of mobile multimedia

Provided is a parallel processor for supporting a floating-point operation. The parallel processor has a flexible structure for easy development of a parallel algorithm involving multimedia computing, requires low hardware cost, and consumes low power. To support floating-point operations, the parallel processor uses floating-point accumulators and a flag for floating-point multiplication. Using the parallel processor, it is possible to process a geometric transformation operation in a 3-dimensional (3D) graphics process at low cost. Also, the cost of a bus width for instructions can be minimized by a partitioned Single-Instruction Multiple-Data (SIMD) method and a method of conditionally executing instructions.
Owner:ELECTRONICS & TELECOMM RES INST

Decomposed floating point multiplication

Systems, apparatuses and methods may provide for technology that in response to an identification that one or more hardware units are to execute on a first type of data format, decomposes a first original floating point number to a plurality of first segmented floating point numbers that are to be equivalent to the first original floating point number. The technology may further in response to the identification, decompose a second original floating point number to a plurality of second segmented floating point numbers that are to be equivalent to the second original floating point number. The technology may further execute a multiplication operation on the first and second segmented floating point numbers to multiply the first segmented floating point numbers with the second segmented floating point numbers.
Owner:INTEL CORP
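
One well-known way to decompose a floating-point number into narrower pieces that sum exactly to the original is Veltkamp splitting, used here as a stand-in since the abstract does not name the decomposition it uses. The sketch splits each double into two parts of at most 26 significant bits, so every product of parts is exact in double precision. The function names are hypothetical, and overflow and underflow are not handled.

```python
from fractions import Fraction

SPLITTER = 134217729.0  # 2**27 + 1, the Veltkamp constant for 53-bit doubles

def split(x: float):
    """Return (hi, lo) with hi + lo == x and at most 26 significant bits each."""
    c = SPLITTER * x
    hi = c - (c - x)
    lo = x - hi
    return hi, lo

def partial_products(x: float, y: float):
    """Multiply segment by segment; each of the four products is exact."""
    x_hi, x_lo = split(x)
    y_hi, y_lo = split(y)
    return [x_hi * y_hi, x_hi * y_lo, x_lo * y_hi, x_lo * y_lo]

x, y = 0.1, 0.3
exact_sum = sum(Fraction(p) for p in partial_products(x, y))
print(exact_sum == Fraction(x) * Fraction(y))
```

The partial products are summed with exact rational arithmetic here only to check that no information is lost; hardware would accumulate them in its own format.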