Patents
Literature
Hiro is an intelligent assistant for R&D personnel, combined with Patent DNA, to facilitate innovative research.
Hiro

44 results about "Simd architecture" patented technology

SIMD Defined. The SIMD architecture performs a single, identical action simultaneously on multiple data pieces, including retrieving, calculating or storing information. One example is retrieving multiple files at the same time.

Structured programming control flow using a disable mask in a SIMD architecture

One embodiment of a computing system configured to manage divergent threads in a SIMD thread group includes a stack configured to store state information for processing control instructions. A parallel processing unit is configured to perform the steps of determining if one or more threads diverge during execution of a conditional control instruction. Threads that exit a program are identified as idle by a disable mask. Other threads that are disabled may be enabled once the divergent threads reach an instruction that enables the disabled threads. Use of the disable mask allows for the use of conditional return and break instructions in a multithreaded SIMD architecture.
Owner:NVIDIA CORP

Efficient de-quantization in a digital video decoding process using a dynamic quantization matrix for parallel computations

An efficient digital video (DV) decoder process that utilizes a specially constructed quantization matrix allowing an inverse quantization subprocess to perform parallel computations, e.g., using SIMD processing, to efficiently produce a matrix of DCT coefficients. The present invention utilizes a first look-up table (for 8x8 DCT) which produces a 15-valued quantization scale based on class number information and a QNO number for an 8x8 data block ("data matrix") from an input encoded digital bit stream to be decoded. The 8x8 data block is produced from a deframing and variable length decoding subprocess. An individual 8-valued segment of the 15-value output array is multiplied by an individual 8-valued segment, e.g., "a row," of the 8x8 data matrix to produce an individual row of the 8x8 matrix of DCT coefficients ("DCT matrix"). The above eight multiplications can be performed in parallel using a SIMD architecture to simultaneously generate a row of eight DCT coefficients. In this way, eight passes through the 8x8 block are used to produce the entire 8x8 DCT matrix, in one embodiment consuming only 33 instructions per 8x8 block. After each pass, the 15-valued output array is shifted by one value position for proper alignment with its associated row of the data matrix. The DCT matrix is then processed by an inverse discrete cosine transform subprocess that generates decoded display data. A second lookup table can be used for 2x4x8 DCT processing.
Owner:SONY ELECTRONICS INC +1

Processing an indirect branch instruction in a SIMD architecture

One embodiment of a computing system configured to manage divergent threads in a thread group includes a stack configured to store at least one token and a multithreaded processing unit. The multithreaded processing unit is configured to perform the steps of fetching a program instruction, determining that the program instruction is an indirect branch instruction, and processing the indirect branch instruction as a sequence of two-way branches to execute an indirect branch instruction with multiple branch addresses. Indirect branch instructions may be used to allow greater flexibility since the branch address or multiple branch addresses do not need to be determined at compile time.
Owner:NVIDIA CORP

Efficient Code Generation Using Loop Peeling for SIMD Loop Code with Multiple Misaligned Statements

An approach is provided for vectorizing misaligned references in compiled code for SIMD architectures that support only aligned loads and stores. In this framework, a loop is first simdized as if the memory unit imposes no alignment constraints. The compiler then inserts data reorganization operations to satisfy the actual alignment requirements of the hardware. Finally, the code generation algorithm generates SIMD codes based on the data reorganization graph, addressing realistic issues such as runtime alignments, unknown loop bounds, residual iteration counts, and multiple statements with arbitrary alignment combinations. Loop peeling is used to reduce the computational overhead associated with misaligned data. A loop prologue and epilogue are peeled from individual iterations in the simdized loop, and vector-splicing instructions are applied to the peeled iterations, while the steady-state loop body incurs no additional computational overhead.
Owner:INT BUSINESS MASCH CORP

Multi-standard LDPC encoder circuit base on SIMD architecture

The invention provides a multi-standard low density parity check (LDPC) encoder circuit base on a single instruction multiple data (SIMD) architecture. The LDPC encoder circuit comprises an input buffer unit, a master controller, an instruction memory, an intrinsic information memory, a posterior information memory, an external information memory, a parity check and output buffer unit and a processing unit array, wherein the processing unit array is composed of a plurality of concurrent processing units, and the processing unit adopts very large scale integrated circuits (VLSI) hardware architecture. The encoder adopts a novel two-phase message passing (TPMP) decoding algorithm, ensures that the hardware architecture is not limited by a special architecture of a block matrix, and realizes the separation of the hardware architecture and the block LDPC code check matrix architecture. The invention provides a flexible and configurable design circuit of the processing unit, effectively improves the use ratio of the hardware, reduces design area of chips, provides a dedicated and simplified SIMD instruction set which is suitable for various block LDPC codes, realizes the separation of the hardware architecture and the block LDPC code check matrix architecture, and meets the demands of multi-standard communication.
Owner:FUDAN UNIV

Dynamic voice allocation in a vector processor based audio processor

InactiveUS20060155543A1Reduced processor resourceAvoid system overagesSpeech synthesisMusic synthesisSpeech sound
A method dynamically allocating voices to processor resources in a music synthesizer or other audio processor includes utilizing processor resources to execute vector-based voice generation algorithm for sounding voices, such as executed using SIMD architecture processors or other vector processor architectures. The dynamic voice allocation process identifies a new voice to be executed in response to an event. The combined processor resources needed to be allocated for the new voice and for the currently sounding voices are determined. If the processor resources are available to meet the combined need, then processor resources are allocated to a voice generation algorithm for the new voice, and if the processor resources are not available, then voices are stolen. To steal voices, processor resources are de-allocated from at least one sounding voice or sounding voice cluster.
Owner:KORG

Fast, energy-efficient exponential computations in simd architectures

In one embodiment, a computer-implemented method includes receiving as input a value of a variable x and receiving as input a degree n of a polynomial function being used to evaluate an exponential function ex. A first expression A*(x−ln(2)*Kn(xf))+B is evaluated, by one or more computer processors in a single instruction multiple data (SIMD) architecture, as an integer and is read as a double. In the first expression, Kn(xf) is a polynomial function of the degree n, xf is a fractional part of x / ln(2), A=252 / ln(2), and B=1023*252. The result of reading the first expression as a double is returned as the value of the exponential function with respect to the variable x.
Owner:IBM CORP

Distributed stacking data storage method supporting SIMD system structure

The invention discloses a distributed stacking data storage method supporting an SIMD system structure. Stacking spaces are allocated in an internal storage in a distribution mode, scalar stacks storing scalar information are allocated in a scalar storage, and vector stacks storing vector information are allocated in a vector storage; when a program is compiled, local variables needing to be accessed by scalar units are allocated in the scalar stacks, and local variables needing to be accessed by vector units are allocated in the vector stacks; when the program is operated, the scalar information, needing to be stored, in a program switching site, is stored in the scalar stacks, and vector information, needing to be stored, in a program switching site, is stored in the scalar stacks, and when the program returns on site, the scalar information is directly read from the scalar stacks to the scalar units, and the vector information is directly read from the vector stacks to the vector units. The distributed stacking data storage method supporting the SIMD system structure has the advantages of being high in storing and accessing speed of stacking data, small in bandwidth requirement, high in system performance and low in power consumption.
Owner:NAT UNIV OF DEFENSE TECH
Who we serve
  • R&D Engineer
  • R&D Manager
  • IP Professional
Why Eureka
  • Industry Leading Data Capabilities
  • Powerful AI technology
  • Patent DNA Extraction
Social media
Try Eureka
PatSnap group products