171 results about "Very long instruction word" patented technology

Very long instruction word (VLIW) refers to instruction set architectures designed to exploit instruction-level parallelism (ILP). Whereas conventional central processing units (CPUs, processors) mostly allow programs to specify instructions to execute only in sequence, a VLIW processor allows programs to explicitly specify instructions to execute in parallel. Because the compiler, rather than the hardware, decides which operations are independent and groups them into one long instruction, this design aims to deliver higher performance without the dynamic scheduling hardware (run-time dependency checking, out-of-order issue) that superscalar designs require.
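
As a rough illustration of what "explicitly specify instructions to execute in parallel" means, the Python sketch below executes one VLIW bundle. The register names, the `Op` tuple layout, and the read-all-then-write-all rule are illustrative assumptions, not the semantics of any particular ISA. The point is that operations in one bundle do not see each other's results, which is why the compiler may only group independent operations.

```python
from typing import Callable

# One operation slot: (destination register, operation, source registers).
Op = tuple[str, Callable[..., int], tuple[str, ...]]


def execute_bundle(regs: dict[str, int], bundle: list[Op]) -> dict[str, int]:
    """Issue every operation of one VLIW bundle in the same cycle.

    All operands are read from the register state at the start of the cycle and
    all results are written afterwards, so no operation sees another's result.
    Packing only independent operations into a bundle is the compiler's job.
    """
    results = [(dest, fn(*(regs[s] for s in srcs))) for dest, fn, srcs in bundle]
    dests = [dest for dest, _ in results]
    if len(set(dests)) != len(dests):
        raise ValueError("two operations in one bundle write the same register")
    new_regs = dict(regs)
    for dest, value in results:
        new_regs[dest] = value
    return new_regs


regs = {"r1": 3, "r2": 5, "r3": 0}
bundle = [
    ("r1", lambda a: a, ("r2",)),              # slot 0: r1 <- r2
    ("r2", lambda a: a, ("r1",)),              # slot 1: r2 <- r1 (old value)
    ("r3", lambda a, b: a + b, ("r1", "r2")),  # slot 2: r3 <- r1 + r2 (old values)
]
print(execute_bundle(regs, bundle))  # {'r1': 5, 'r2': 3, 'r3': 8}
```

Slots 0 and 1 swap two registers in a single cycle, something a strictly sequential instruction stream needs a temporary register for.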

Local and global register partitioning in a vliw processor

A Very Long Instruction Word (VLIW) processor having a plurality of functional units includes a multi-ported register file that is divided into a plurality of separate register file segments, each of which is associated with one of the functional units. The register file segments are partitioned into local registers and global registers. The global registers are read and written by all functional units. The local registers are read and written only by the functional unit associated with a particular register file segment. Both local and global registers are addressed using register addresses in an address space that is defined separately for each register file segment/functional unit pair. The global registers are addressed within a selected global register range using the same register addresses for all register file segment/functional unit pairs. The local registers in a register file segment are addressed using register addresses in a local register range outside the global register range, assigned within a single register file segment/functional unit pair. Register addresses in the local register range are the same for all register file segment/functional unit pairs but address registers locally within each pair.
Owner:ORACLE INT CORP
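
A minimal sketch of the local/global addressing scheme above, assuming hypothetical sizes (96 global addresses, 32 local addresses, 4 functional units) that the abstract does not specify. Global registers are modeled as one shared array, which is how they appear to software; the hardware may well replicate them in every segment.

```python
# Hypothetical sizes for illustration; the abstract does not specify them.
GLOBAL_REGS = 96   # addresses 0..95 name global registers shared by all units
LOCAL_REGS = 32    # addresses 96..127 name registers private to each unit
NUM_UNITS = 4


class PartitionedRegisterFile:
    def __init__(self) -> None:
        self.global_regs = [0] * GLOBAL_REGS
        # One private bank per register file segment/functional unit pair.
        self.local_regs = [[0] * LOCAL_REGS for _ in range(NUM_UNITS)]

    def _locate(self, unit: int, addr: int) -> tuple[list[int], int]:
        """Map (functional unit, register address) to a bank and an index."""
        if 0 <= addr < GLOBAL_REGS:
            return self.global_regs, addr  # same register for every unit
        if GLOBAL_REGS <= addr < GLOBAL_REGS + LOCAL_REGS:
            return self.local_regs[unit], addr - GLOBAL_REGS  # this unit only
        raise IndexError(f"register address {addr} out of range")

    def read(self, unit: int, addr: int) -> int:
        bank, idx = self._locate(unit, addr)
        return bank[idx]

    def write(self, unit: int, addr: int, value: int) -> None:
        bank, idx = self._locate(unit, addr)
        bank[idx] = value


rf = PartitionedRegisterFile()
rf.write(0, 10, 42)   # global address: visible to every unit
rf.write(0, 100, 7)   # local address: private to unit 0
print(rf.read(1, 10), rf.read(0, 100), rf.read(1, 100))  # 42 7 0
```

The same address (100) names a different physical register in each unit, which lets every unit's code use identical local register numbers.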

Method and system for fast context based adaptive binary arithmetic coding

Status: Inactive | Publication: US20070040711A1 | Tags: Increasing instruction level parallelism; Reduces function call overhead; Code conversion; Character and pattern recognition; Procedure calls; Context-adaptive variable-length coding
A method for efficient and fast implementation of context-based adaptive binary arithmetic encoding in H.264/AVC video encoders is disclosed. The H.264/AVC video standard supports two entropy coding mechanisms: Context Adaptive Binary Arithmetic Coding (CABAC) and Context Adaptive Variable Length Coding (CAVLC). The entropy coding efficiency of CABAC exceeds that of CAVLC by a clear margin. The method further provides techniques that make the implementation of CABAC on digital signal processors (DSPs) and other processing devices significantly faster. In one aspect, the method raises the granularity at which binarization is decoupled from arithmetic encoding, from the bit level to the level of one or more syntax elements. The binarized data is passed to the arithmetic encoding engine in bulk, significantly reducing procedure-call overhead. In another aspect, a CABAC arithmetic encoding engine format is provided that decreases data-writing overhead and better exploits parallelism in the encoding process. This aspect is particularly advantageous for very long instruction word (VLIW) DSPs and media processors, among others. In yet another aspect, the method discloses efficient CABAC binarization schemes for syntax elements.
Owner:STREAMING NETWORKS PVT

Processing unit for efficiently determining a packet's destination in a packet-switched network

A processor for use in a router, the processor having a systolic array pipeline for processing data packets to determine to which output port of the router each data packet should be routed. In one embodiment, the systolic array pipeline includes a plurality of programmable functional units and register files arranged sequentially as stages, for processing packet contexts (which contain the packet's destination address) to perform operations, under programmatic control, that determine the router's destination port for the packet. A single stage of the systolic array may contain a register file and one or more functional units such as adders, shifters, and logical units, for performing, in one example, very long instruction word (VLIW) operations. The processor may also include an on-chip forwarding table memory for storing routing information, and a crossbar selectively connecting the stages of the systolic array with the forwarding table memory.
Owner:CISCO TECH INC

Method for implementing advanced encryption standards using a very long instruction word architecture processor

A method for implementing the Advanced Encryption Standard (AES) on a very long instruction word (VLIW) architecture processor. The method includes inputting the AES instructions into the processor; decoding and scheduling the input instructions; controlling at least one of a plurality of multiplexers, according to the decoded and scheduled instructions, to route data from a first register of the processor and/or an arithmetic logic unit to the first register and/or the arithmetic logic unit; controlling the arithmetic logic unit to perform operations; and outputting the results of the operations to the plurality of multiplexers.
Owner:ADMTEK INCORPORATED
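
The abstract does not say which AES steps the VLIW datapath accelerates. As an example of the byte-level ALU work involved, here is AES MixColumns for one column, written from the FIPS-197 definition. The four output bytes are independent of each other, so on a VLIW machine they could in principle occupy separate slots. This is a plain reference sketch, not the patented method.

```python
def xtime(b: int) -> int:
    """Multiply a byte by x (i.e. by 2) in GF(2^8) modulo the AES polynomial 0x11B."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b & 0xFF


def mix_column(col: list[int]) -> list[int]:
    """AES MixColumns applied to one 4-byte column of the state."""
    a0, a1, a2, a3 = col
    # In GF(2^8), 2*x is xtime(x) and 3*x is xtime(x) ^ x; addition is XOR.
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),
    ]


print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
# ['0x8e', '0x4d', '0xa1', '0xbc']
```

The input column shown is a commonly used MixColumns test column; each output expression depends only on the four inputs, so there is no serial chain between the four results.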

Processor having systolic array pipeline for processing data packets

A processor for use in a router, the processor having a systolic array pipeline for processing data packets to determine to which output port of the router each data packet should be routed. In one embodiment, the systolic array pipeline includes a plurality of programmable functional units and register files arranged sequentially as stages, for processing packet contexts (which contain the packet's destination address) to perform operations, under programmatic control, that determine the router's destination port for the packet. A single stage of the systolic array may contain a register file and one or more functional units such as adders, shifters, and logical units, for performing, in one example, very long instruction word (VLIW) operations. The processor may also include an on-chip forwarding table memory for storing routing information, and a crossbar selectively connecting the stages of the systolic array with the forwarding table memory.
Owner:CISCO TECH INC

Very long instruction word processor structure supporting simultaneous multithreading

The invention provides a very long instruction word processor structure supporting simultaneous multithreading. It comprises at least two parallel instruction processing pipelines, each of which includes an instruction fetch module, an instruction dispatch module, an instruction execution module, a general register file, a floating-point register file, and a control register file. The instruction fetch module obtains instruction information; the instruction dispatch module receives and dispatches the instruction information obtained by the fetch module; and the instruction execution module comprises execution units A, D, M, and F, which execute the instructions. The general register file stores the results of execution units A, M, and D, and the floating-point register file stores the results of execution units D and F. With this structure, processor resources are utilized more fully, thread access efficiency is improved, and processing speed increases.
Owner:TSINGHUA UNIV

Loop deblock filtering of block coded video in a very long instruction word processor

This invention is applicable to filtering block artifacts in macroblock- and block-oriented video compression. It computes all possible filter results speculatively and simultaneously in parallel, computes the conditions for applying the corresponding filter results simultaneously in parallel, and writes filter results to memory conditionally, depending on the computed conditions. This permits effective block filtering on a very long instruction word data processor.
Owner:TEXAS INSTR INC
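
The three steps (speculative filtering, parallel condition evaluation, conditional write) can be sketched in Python as follows. The filter itself is a simplified assumption that only borrows the shape of H.264's edge test (thresholds `alpha` and `beta` on sample differences); it is not the standard's exact deblocking filter and omits clipping and boundary-strength handling.

```python
def deblock_edge(p1: list[int], p0: list[int], q0: list[int], q1: list[int],
                 alpha: int, beta: int) -> tuple[list[int], list[int]]:
    """Filter one line of samples across a block edge without data-dependent branches.

    p1, p0 are the two samples before the edge and q0, q1 the two after it,
    one entry per position along the edge. Only p0 and q0 are modified.
    """
    n = len(p0)
    # 1. Compute every candidate filtered value unconditionally (speculation).
    cand_p0 = [(2 * p1[i] + p0[i] + q1[i] + 2) >> 2 for i in range(n)]
    cand_q0 = [(2 * q1[i] + q0[i] + p1[i] + 2) >> 2 for i in range(n)]
    # 2. Compute the apply/skip condition for every position, also unconditionally.
    apply = [
        abs(p0[i] - q0[i]) < alpha
        and abs(p1[i] - p0[i]) < beta
        and abs(q1[i] - q0[i]) < beta
        for i in range(n)
    ]
    # 3. Conditional write-back; on a VLIW DSP this is a select or predicated store.
    out_p0 = [c if a else old for c, a, old in zip(cand_p0, apply, p0)]
    out_q0 = [c if a else old for c, a, old in zip(cand_q0, apply, q0)]
    return out_p0, out_q0


print(deblock_edge([10, 10], [12, 50], [16, 52], [18, 60], alpha=10, beta=5))
# ([13, 50], [16, 52])
```

Position 0 passes the edge test and is filtered; position 1 fails the `beta` test (a real image edge), so its candidate values are computed but discarded.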

Register files for a digital signal processor operating in an interleaved multi-threaded environment

A processor device is disclosed and includes a memory and a sequencer that is responsive to the memory. The sequencer supports very long instruction word (VLIW) type instructions and at least one VLIW instruction packet uses a number of operands during execution. The processor device further includes a plurality of instruction execution units responsive to the sequencer and a plurality of register files. Each of the plurality of register files includes a plurality of registers and the plurality of register files are coupled to the plurality of instruction execution units. Further, each of the plurality of register files includes a number of data read ports and the number of data read ports of each of the plurality of register files is less than the number of operands used by the at least one VLIW instruction packet.
Owner:QUALCOMM INC

Computer processing architecture having a scalable number of processing paths and pipelines

A processing core comprising R processing pipelines, each comprising N processing paths. The R processing pipelines are synchronized to operate as a single very long instruction word (VLIW) processing core, which is configured to process R×N VLIW sub-instructions in parallel. In addition, the R pipelines can be configured to operate independently as separate pipelines. In one embodiment of the present invention, each of the R processing pipelines comprises S register files, so that the processing core comprises R×S register files. In another embodiment, each of the R processing pipelines comprises one register file for every two of its N processing paths, so that S = N/2. In yet another embodiment, a single VLIW processing instruction comprises R×N P-bit sub-instructions appended together.
Owner:ORACLE INT CORP
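
The last embodiment, in which one VLIW is R×N P-bit sub-instructions appended together, amounts to fixed-width bit packing. The sketch below assumes the first sub-instruction occupies the most significant bits and that sub-instruction i belongs to pipeline i // N, path i % N; both orderings are hypothetical.

```python
def pack_vliw(subs: list[int], p_bits: int) -> int:
    """Append P-bit sub-instructions into one wide word; subs[0] ends up most significant."""
    mask = (1 << p_bits) - 1
    word = 0
    for s in subs:
        if s & ~mask:
            raise ValueError(f"sub-instruction {s:#x} does not fit in {p_bits} bits")
        word = (word << p_bits) | s
    return word


def unpack_vliw(word: int, count: int, p_bits: int) -> list[int]:
    """Split a packed VLIW back into its `count` P-bit sub-instructions."""
    mask = (1 << p_bits) - 1
    return [(word >> (p_bits * (count - 1 - i))) & mask for i in range(count)]


def slot_of(index: int, n_paths: int) -> tuple[int, int]:
    """Hypothetical mapping of a sub-instruction index to (pipeline r, path n)."""
    return divmod(index, n_paths)


R, N, P = 2, 4, 32                     # example sizes, not from the patent
word = pack_vliw(list(range(R * N)), P)
print(word.bit_length() <= R * N * P)  # True: fits in an R*N*P-bit instruction
print(unpack_vliw(word, R * N, P))     # [0, 1, 2, 3, 4, 5, 6, 7]
```

When the R pipelines run independently, the same word could instead be split into R groups of N sub-instructions, one group per pipeline.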

Split Embedded DRAM Processor

A processing architecture includes a first CPU core portion coupled to a second embedded dynamic random access memory (DRAM) portion. These architectural components jointly implement a single processor and instruction set. Advantageously, the embedded logic on the DRAM chip implements the memory-intensive processing tasks, reducing the amount of traffic that must be bussed back and forth between the CPU core and the embedded DRAM chips. The embedded DRAM logic monitors and manipulates the instruction stream into the CPU core. The instruction set architecture, data paths, addressing, control, caching, and interfaces are designed to allow the system to operate using a standard programming model. Specialized video and graphics processing systems are also developed. In addition, an extended very long instruction word (VLIW) architecture, implemented as a primary VLIW processor coupled to an embedded DRAM VLIW extension processor, efficiently handles memory-intensive tasks. In different embodiments, standard software can be accelerated either with or without the explicit knowledge of the processor.
Owner:ROUND ROCK RES LLC

Methods and apparatus for indirect VLIW memory allocation

Techniques and a set of heuristics are described for allocating the special instruction memory in which indirect very long instruction words (VLIWs) are stored for the ManArray family of multiprocessor digital signal processors (DSPs). This approach substantially reduces the cost of pre-initializing the contents of VLIWs.
Owner:ALTERA CORP

Method of context based adaptive binary arithmetic encoding with decoupled range re-normalization and bit insertion

Status: Active | Publication: US20050001745A1 | Tags: Increases the available instruction level parallelism (ILP); Improve performance; Digital computer details; Code conversion; Coupling; Variable length
This invention increases the available instruction-level parallelism (ILP) of CABAC encoding by decoupling the renormalization loop from the bit-insertion task required to create the encoded bitstream. This makes software implementations of CABAC-based encoding significantly faster on digital signal processors that can exploit instruction-level parallelism, such as very long instruction word (VLIW) digital signal processors. In a joint hardware/software implementation, this invention reuses existing Huffman variable-length encoding hardware with minimal modifications. Decoupling these two tasks exposes previously hidden instruction-level and task-level parallelism.
Owner:TEXAS INSTR INC
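
To show what decoupling the renormalization loop from bit insertion can look like, the sketch below follows the outline of the H.264 CABAC encoder's renormalization (9-bit range, 10-bit low, outstanding bits) but omits the standard's first-bit suppression. `renorm` records one event per iteration and writes no bits; `insert_bits` resolves those events in a separate pass. A coupled reference version is included so the two can be compared. This is an illustrative reconstruction, not the patented implementation.

```python
OUTSTANDING = 2  # event marker: bit value not known until a later 0 or 1 arrives


def renorm(low: int, rng: int, events: list[int]) -> tuple[int, int]:
    """Renormalization only: append one event per iteration, write no bits."""
    while rng < 256:
        if low < 256:
            events.append(0)
        elif low >= 512:
            low -= 512
            events.append(1)
        else:
            low -= 256
            events.append(OUTSTANDING)
        rng <<= 1
        low <<= 1
    return low, rng


def insert_bits(events: list[int]) -> tuple[list[int], int]:
    """Separate pass: turn recorded events into output bits.

    Returns the bits and the number of outstanding bits still unresolved.
    """
    bits: list[int] = []
    pending = 0
    for e in events:
        if e == OUTSTANDING:
            pending += 1
        else:
            bits.append(e)
            bits.extend([1 - e] * pending)  # outstanding bits take the opposite value
            pending = 0
    return bits, pending


def renorm_coupled(low: int, rng: int, bits: list[int], outstanding: int) -> tuple[int, int, int]:
    """Reference version with bit insertion inside the loop."""
    def put_bit(b: int) -> None:
        nonlocal outstanding
        bits.append(b)
        bits.extend([1 - b] * outstanding)
        outstanding = 0

    while rng < 256:
        if low < 256:
            put_bit(0)
        elif low >= 512:
            low -= 512
            put_bit(1)
        else:
            low -= 256
            outstanding += 1
        rng <<= 1
        low <<= 1
    return low, rng, outstanding
```

Because `renorm` no longer touches the bitstream, its loop body is a short arithmetic sequence that a VLIW scheduler can overlap with other work, while `insert_bits` can run later in bulk.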

Method and apparatus for splitting packets in multithreaded VLIW processor

A method and apparatus are disclosed for allocating functional units in a multithreaded very long instruction word (VLIW) processor. The present invention combines the techniques of conventional very long instruction word architectures and conventional multithreaded architectures to reduce execution time within an individual program, as well as across a workload. The present invention uses instruction packet splitting to recover some of the efficiency lost with conventional multithreaded architectures. Instruction packet splitting allows an instruction bundle to be partially issued in one cycle, with the remainder of the bundle issued during a subsequent cycle. Rather than allocating all instructions in an instruction packet at once, the allocation hardware assigns as many instructions from each packet as will fit on the available functional units. Instructions that cannot be allocated to a functional unit are retained in a ready-to-run register. On subsequent cycles, instruction packets whose instructions have all been issued to functional units are refilled from their thread's instruction stream, while instruction packets with held instructions are retained. The functional unit allocation logic can then assign instructions from the newly loaded instruction packets as well as the instructions that were not issued from the retained packets.
Owner:LUCENT TECH INC +1
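
A single allocation cycle with instruction packet splitting can be sketched like this. The fixed thread priority (thread 0 first), the unit-type names, and the representation of each instruction by the unit type it needs are simplifying assumptions; the abstract does not describe the priority policy.

```python
def allocate(packets: list[list[str]],
             units: dict[str, int]) -> tuple[list[tuple[int, str]], list[list[str]]]:
    """One cycle of functional-unit allocation with packet splitting.

    packets[t] is thread t's pending packet, each instruction given as the unit
    type it needs. Returns the issued (thread, unit type) pairs and, per thread,
    the instructions held over for a later cycle.
    """
    free = dict(units)  # free units of each type this cycle
    issued: list[tuple[int, str]] = []
    held: list[list[str]] = []
    for thread, packet in enumerate(packets):
        remaining = []
        for unit_type in packet:
            if free.get(unit_type, 0) > 0:
                free[unit_type] -= 1
                issued.append((thread, unit_type))
            else:
                remaining.append(unit_type)  # kept in the ready-to-run register
        held.append(remaining)
    return issued, held


print(allocate([["alu", "mem"], ["alu", "alu", "mem"]], {"alu": 2, "mem": 1}))
# ([(0, 'alu'), (0, 'mem'), (1, 'alu')], [[], ['alu', 'mem']])
```

Without splitting, thread 1's whole packet would wait because it does not fit; here one of its instructions issues immediately and only the remainder is held.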

Selective vertical and horizontal dependency resolution via split-bit propagation in a mixed-architecture system having superscalar and VLIW modes

A computer system supplies instructions simultaneously to a plurality of parallel execution pipelines in either superscalar mode or very long instruction word mode with checks for vertical and horizontal dependency between instructions, the horizontal dependency checks between instructions supplied in the same machine cycle being effective in superscalar mode but disabled in very long instruction word mode.
Owner:STMICROELECTRONICS SRL
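
A sketch of the mode-dependent check: horizontal (same-cycle) dependency checking runs in superscalar mode and is skipped in VLIW mode, where the compiler is trusted to have grouped only independent instructions. The `Instr` type is hypothetical, vertical (cross-cycle) checks are not modeled, and the split-bit propagation mechanism named in the title is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instr:
    dest: Optional[str]         # register written, or None
    srcs: tuple[str, ...] = ()  # registers read


def horizontal_hazards(group: list[Instr], mode: str) -> list[tuple[int, int]]:
    """Return (i, j) pairs where instruction j in the same issue group reads or
    writes a register written by an earlier instruction i of that group."""
    if mode not in ("superscalar", "vliw"):
        raise ValueError(f"unknown mode {mode!r}")
    if mode == "vliw":
        return []  # check disabled: independence is guaranteed by the compiler
    hazards = []
    for j, later in enumerate(group):
        for i in range(j):
            written = group[i].dest
            if written is not None and (written in later.srcs or written == later.dest):
                hazards.append((i, j))
    return hazards


group = [Instr("r1", ("r2", "r3")), Instr("r4", ("r1",)), Instr("r5", ("r6",))]
print(horizontal_hazards(group, "superscalar"))  # [(0, 1)]
print(horizontal_hazards(group, "vliw"))         # []
```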

Apparatus and method for dispatching very long instruction word having variable length

An apparatus and method for dispatching a very long instruction word (VLIW) instruction of variable length are provided. The apparatus includes a packet buffer for storing one or more VLIW instructions, and a decoding unit configured to assemble the VLIW instruction to be executed currently from the VLIW instructions stored in the packet buffer and to decode predetermined bits of each sub-instruction it contains. The apparatus dispatches each sub-instruction to its corresponding functional unit (FU) based on the decoding results, position information on the sub-instructions in the packet buffer, and position information on the sub-instructions in the current VLIW instruction. Sub-instructions can thus be dispatched effectively to their FUs using simple decoding logic even when the length of the VLIW instruction is not fixed.
Owner:SAMSUNG ELECTRONICS CO LTD
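
The abstract does not give the instruction encoding, so the sketch below assumes a hypothetical one: 32-bit sub-instructions with a stop bit (bit 31) marking the last sub-instruction of a VLIW and a 3-bit functional-unit field (bits 28-30). `next_vliw` assembles a variable-length VLIW from the packet buffer and `dispatch` routes each sub-instruction by its decoded FU field.

```python
STOP_BIT = 1 << 31           # hypothetical: marks the last sub-instruction of a VLIW
FU_SHIFT, FU_MASK = 28, 0x7  # hypothetical 3-bit functional-unit field
FU_NAMES = {0: "alu0", 1: "alu1", 2: "mul", 3: "ldst", 4: "branch"}


def next_vliw(buffer: list[int], start: int) -> tuple[list[int], int]:
    """Collect sub-instructions from `start` up to and including the next stop bit."""
    end = start
    while end < len(buffer) and not (buffer[end] & STOP_BIT):
        end += 1
    if end == len(buffer):
        raise ValueError("packet buffer ends before the VLIW is complete")
    return buffer[start:end + 1], end + 1


def dispatch(vliw: list[int]) -> dict[str, int]:
    """Route each sub-instruction to the FU named by its decoded type field."""
    routed: dict[str, int] = {}
    for sub in vliw:
        fu = FU_NAMES[(sub >> FU_SHIFT) & FU_MASK]
        if fu in routed:
            raise ValueError(f"two sub-instructions target {fu}")
        routed[fu] = sub
    return routed


buffer = [(0 << 28) | 5, (2 << 28) | 7 | STOP_BIT, (3 << 28) | 9 | STOP_BIT]
vliw, nxt = next_vliw(buffer, 0)  # a 2-wide VLIW, next one starts at index 2
print(sorted(dispatch(vliw)))     # ['alu0', 'mul']
```

Only the stop bit and the FU field need decoding to find instruction boundaries and targets, which matches the abstract's emphasis on simple decoding logic.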

Clustered architecture in a VLIW processor

A Very Long Instruction Word (VLIW) processor has a clustered architecture including a plurality of independent functional units and a multi-ported register file that is divided into a plurality of separate register file segments, the register file segments being individually associated with the plurality of independent functional units. The functional units access the respective associated register file segments using read operations that are local to the functional unit/register file segment pairs. In contrast, the functional units access the register file segments using write operations that are broadcast to a plurality of register file segments. Independence between clusters is attained since the separate clustered functional unit/register file segment pairs have local (internal) bypassing that allows internal computations to proceed, but have only limited bypassing between different functional unit/register file segment pair clusters. Thus a particular functional unit/register file segment pair does not bypass to all other functional unit/register file segment pairs.
Owner:ORACLE INT CORP
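
Local reads and broadcast writes, with immediate local bypassing and slower cross-cluster propagation, can be modeled as follows. The one-cycle delay before a broadcast write reaches other clusters, and the `tick` method standing in for a clock edge, are assumptions made for illustration.

```python
class ClusteredRegisterFile:
    """Per-cluster register segments with local reads and broadcast writes."""

    def __init__(self, num_clusters: int, num_regs: int) -> None:
        self.segments = [[0] * num_regs for _ in range(num_clusters)]
        self.in_flight: list[tuple[int, int, int]] = []  # (cluster, reg, value)

    def read(self, cluster: int, reg: int) -> int:
        # Reads are local: a unit only ever reads its own segment.
        return self.segments[cluster][reg]

    def write(self, src_cluster: int, reg: int, value: int) -> None:
        # Local bypass: the writing cluster sees the result at once.
        self.segments[src_cluster][reg] = value
        # Broadcast to every other segment, assumed to land one cycle later.
        for c in range(len(self.segments)):
            if c != src_cluster:
                self.in_flight.append((c, reg, value))

    def tick(self) -> None:
        """Advance one clock: deliver all in-flight broadcast writes."""
        for c, reg, value in self.in_flight:
            self.segments[c][reg] = value
        self.in_flight.clear()


rf = ClusteredRegisterFile(num_clusters=2, num_regs=8)
rf.write(0, 3, 42)
print(rf.read(0, 3), rf.read(1, 3))  # 42 0  (cluster 1 has not seen it yet)
rf.tick()
print(rf.read(1, 3))                 # 42
```

A compiler targeting such a machine would schedule a consumer in another cluster at least one cycle after the producer, while consumers in the same cluster can use the result immediately.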

Artificial intelligent type hydraulic support electric-hydraulic control system

The invention discloses an artificial-intelligence-based electro-hydraulic control system for hydraulic supports, comprising a hardware layer, a system layer, an expert system, and a database. The system communicates over a communication bus using full-duplex signaling. The system layer comprises management-layer agents and control-layer agents, with one control-layer agent per hydraulic support. The expert system is connected to the communication bus, and the database, also connected to the bus, stores sensing parameters of the three machines. In operation, the system layer retrieves the current sensing parameters of the three machines stored in the database; the expert system then uses these parameters to form very long instruction words, which are transmitted to each hydraulic support controller over the communication bus. The control-layer agents make automatic movement decisions based on this data, and the management-layer agents coordinate the movements of the control-layer agents, achieving intelligent electro-hydraulic control of the hydraulic supports across an entire fully mechanized coal mining face.
Owner:CHINA UNIV OF MINING & TECH +1

Super-reconfigurable fabric architecture (SURFA): a multi-FPGA parallel processing architecture for COTS hybrid computing framework

A field programmable gate array includes a virtual bus interface that receives a control word from a host processor over a standard I/O bus. A configurable very long instruction word (VLIW) controller receives the control word via virtual bus interface signals mapped from the virtual bus interface. A reconfigurable communication and control fabric controls the data paths and programming modes of single instruction-multiple data (SIMD) processing element cells. The configurable VLIW controller has an interface with the reconfigurable communication and control fabric. SIMD processing element cells are controlled by the configurable VLIW controller through the reconfigurable communication and control fabric via the interface.
Owner:THE BOEING CO

Methods and apparatus for dynamic very long instruction word sub-instruction selection for execution time parallelism in an indirect very long instruction word processor

A pipelined data processing unit includes an instruction sequencer and n functional units capable of executing n operations in parallel. The instruction sequencer includes a random access memory for storing very-long-instruction-words (VLIWs) used in operations that execute two or more functional units in parallel. Each VLIW comprises a plurality of short-instruction-words (SIWs), where each SIW corresponds to a unique type of instruction associated with a unique functional unit. VLIWs are composed in the VLIW memory by loading and concatenating SIWs at each address, or entry. VLIWs are executed via the execute-VLIW (XV) instruction. These indirect VLIWs (iVLIWs) can be compressed at a VLIW memory address by a mask field in the XV1 instruction that specifies which functional units are enabled or disabled during execution of the VLIW. The mask can be changed each time the XV1 instruction is executed, effectively modifying the VLIW on every execution. The VLIW memory (VIM) can be further partitioned into separate memories, each associated with a functional decode-and-execute unit. With a second execute-VLIW instruction, XV2, each functional unit's VIM can be addressed independently, removing duplicate SIWs within that unit's VIM. This further optimizes VLIW storage, allowing smaller VLIW memories in cost-sensitive applications.
Owner:ALTERA CORP
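
A sketch of the VIM/XV idea: VLIWs are composed in place by loading SIWs into slots, and each execute call supplies an enable mask that trims the stored VLIW without rewriting it. The five slot names are an assumption loosely based on the ManArray unit mix, the SIWs are plain strings, and the single `execute` method stands in for XV1-style masked execution only (XV2's per-unit addressing is not modeled).

```python
# Hypothetical slot order; bit i of an enable mask enables SLOTS[i].
SLOTS = ("store", "load", "alu", "mau", "dsu")


class VLIWMemory:
    def __init__(self, entries: int) -> None:
        # Each VIM entry holds at most one SIW per slot (None = empty slot).
        self.vim = [dict.fromkeys(SLOTS) for _ in range(entries)]

    def load_siw(self, addr: int, slot: str, siw: str) -> None:
        """Compose a VLIW in place by loading one short instruction word into a slot."""
        if slot not in SLOTS:
            raise ValueError(f"unknown slot {slot!r}")
        self.vim[addr][slot] = siw

    def execute(self, addr: int, enable_mask: int) -> list[str]:
        """XV-style execution: return the SIWs at `addr` whose slot is enabled."""
        entry = self.vim[addr]
        return [
            entry[slot]
            for bit, slot in enumerate(SLOTS)
            if (enable_mask >> bit) & 1 and entry[slot] is not None
        ]


vim = VLIWMemory(entries=4)
vim.load_siw(0, "load", "lw r1")
vim.load_siw(0, "alu", "add r2")
vim.load_siw(0, "mau", "mul r3")
print(vim.execute(0, 0b11111))  # ['lw r1', 'add r2', 'mul r3']
print(vim.execute(0, 0b00100))  # ['add r2']
```

Changing the mask per execution lets one stored entry serve several related VLIWs, which is the storage saving the abstract describes.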

Long instruction word controlling plural independent processor operations

A data processing apparatus includes a multiplier unit that forms a product from L bits of each of two N-bit data buses, where N is greater than L. The multiplier forms an N-bit output having a first portion, which is the L most significant bits of the product, and a second portion, which is M other bits not including the L least significant bits of the product, where N is the sum of M and L. In the preferred embodiment, the M other bits are derived from other bits of the two input data buses, such as the M other bits of the first input data bus. An arithmetic logic unit performs parallel operations (addition, subtraction, Boolean functions) controlled by the same instructions. This arithmetic logic unit is divisible into a selected number of sections for performing identical operations on independent sections of its inputs. The multiplier unit may form dual products from separate parts of the input data. A single instruction controlling both the multiplier unit and the arithmetic logic unit permits addition of dual products. The dual products are temporarily stored in a data register, permitting the multiply and add operations to be pipelined. The dual products are formed in one data word and added by a rotate/mask-and-add operation in a three-input arithmetic unit.
Owner:TEXAS INSTR INC

Systems and Methods for Context Adaptive Video Data Preparation

Systems and methods for encoding and decoding video image data are disclosed. In some cases, the methods are tailored for highly parallel operation on a very long instruction word processor. Various embodiments may be implemented in relation to the H.264/MPEG-4 AVC video compression standard.
Owner:TEXAS INSTR INC

Controlling VLIW instruction operations supply to functional units using switches based on condition head field

A VLIW processor for executing a sequence of very long instruction words, each having a plurality of operations to be executed in parallel. The VLIW processor has a plurality of functional units for parallel execution of the operations specified by the VLIW, an instruction register for holding the VLIW, and a condition flag indicating the result of a comparison operation. The VLIW includes a conditional head and a plurality of slots, each slot including an operation code and any related operands. The conditional head has a plurality of conditional indicators, each corresponding uniquely to one operation and specifying the condition under which that operation is to be executed. A control circuit connected to the instruction register and the functional units delivers each operation from the instruction register to the corresponding functional unit for execution when its condition exists.
Owner:NOVATEK MICROELECTRONICS CORP
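
Conditional execution via a condition head can be sketched as follows. The three-valued indicator encoding (`ALWAYS`, `IF_TRUE`, `IF_FALSE`) is hypothetical; the abstract says only that each indicator specifies the condition for its operation. One use of such a head is placing both arms of an if/else in a single VLIW.

```python
ALWAYS, IF_TRUE, IF_FALSE = 0, 1, 2  # hypothetical condition-indicator encoding


def issue(cond_head: list[int], slots: list[str], flag: bool) -> list[tuple[int, str]]:
    """Deliver slot i to functional unit i only if its condition indicator holds."""
    if len(cond_head) != len(slots):
        raise ValueError("one condition indicator per slot is required")
    issued = []
    for fu, (cond, op) in enumerate(zip(cond_head, slots)):
        if cond == ALWAYS or (cond == IF_TRUE and flag) or (cond == IF_FALSE and not flag):
            issued.append((fu, op))
    return issued


head = [ALWAYS, IF_TRUE, IF_FALSE]
ops = ["add", "mul", "sub"]
print(issue(head, ops, flag=True))   # [(0, 'add'), (1, 'mul')]
print(issue(head, ops, flag=False))  # [(0, 'add'), (2, 'sub')]
```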

Instruction processing method for verifying basic instruction arrangement in VLIW instruction for variable length VLIW processor

An instruction processing method for checking the arrangement of basic instructions within a very long instruction word (VLIW) instruction. The method is suitable for language processing systems, such as assemblers and compilers, targeting processors that execute variable-length VLIW instructions designed on a variable-length VLIW architecture.
Owner:FUJITSU LTD

Methods and apparatus for efficient synchronous MIMD operations with IVLIW PE-TO-PE communication

A SIMD machine employing a plurality of parallel processing elements (PEs) in which communication hazards are eliminated efficiently. An indirect Very Long Instruction Word instruction memory (VIM) is employed along with execute and delimiter instructions. A masking mechanism may be employed to control which PEs have their VIMs loaded. Further, a receive model of operation is preferably employed. In one aspect, each PE controls a switch that selects the PE from which it receives. The present invention addresses a better machine organization for executing parallel algorithms, one that reduces hardware cost and complexity while retaining the best characteristics of both SIMD and MIMD machines and minimizing communication latency. It brings a level of MIMD computational autonomy to SIMD indirect Very Long Instruction Word (iVLIW) processing elements while maintaining the single thread of control used in the SIMD machine organization. Consequently, the term Synchronous-MIMD (SMIMD) is used to describe the present approach.
Owner:ALTERA CORP
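
The receive model, in which each PE's switch selects which PE it receives from, can be sketched as one synchronous exchange step. Treating PE values as plain integers and the exchange as a single list operation are simplifications; the sketch only shows why a receiver-selected source keeps two senders from contending for the same receiver.

```python
def receive_step(sent: list[int], select: list[int]) -> list[int]:
    """One synchronous exchange: PE i latches the value sent by PE select[i].

    Every PE drives its own output and each receiver picks exactly one source,
    so in this model no receiver ever has two senders competing for its input.
    """
    if len(select) != len(sent):
        raise ValueError("need one source selection per PE")
    if any(not 0 <= src < len(sent) for src in select):
        raise ValueError("source PE index out of range")
    return [sent[src] for src in select]


values = [10, 20, 30, 40]
print(receive_step(values, [1, 2, 3, 0]))  # [20, 30, 40, 10]  (rotate by one PE)
print(receive_step(values, [0, 0, 0, 0]))  # [10, 10, 10, 10]  (broadcast from PE 0)
```

Because the select pattern is set per PE, different PEs can realize different communication patterns in the same step while still following one thread of control.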

Apparatus for compressing instruction word for parallel processing vliw computer and method for the same

An apparatus and a method are provided for a parallel processing very long instruction word (VLIW) computer. The apparatus includes: an index code generation unit that sequentially generates, for each instruction word group to be executed on the VLIW computer, an index code associated with the number of no-operation (NOP) instruction words between effective instruction words; an instruction compression unit that sequentially deletes, in each instruction word group, the NOP instruction words corresponding to the index code; and an instruction word conversion unit that converts the effective instruction words to include the index code, the effective instruction words being those that correspond to the deleted NOP instruction words.
Owner:SAMSUNG ELECTRONICS CO LTD
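
One way to picture an index code: each surviving instruction carries the number of NOPs that preceded it, and the decompressor reinserts them, padding to the group width. The tuple representation and the fixed group width are assumptions; the abstract does not give the encoding.

```python
NOP = "nop"


def compress(group: list[str]) -> list[tuple[int, str]]:
    """Drop NOPs; tag each effective instruction with the count of NOPs before it."""
    out: list[tuple[int, str]] = []
    pending = 0
    for instr in group:
        if instr == NOP:
            pending += 1
        else:
            out.append((pending, instr))
            pending = 0
    return out  # trailing NOPs are implied by the fixed group width


def decompress(compressed: list[tuple[int, str]], width: int) -> list[str]:
    """Rebuild the full-width instruction group by reinserting NOPs."""
    group: list[str] = []
    for nops, instr in compressed:
        group.extend([NOP] * nops)
        group.append(instr)
    if len(group) > width:
        raise ValueError("compressed group does not fit the issue width")
    group.extend([NOP] * (width - len(group)))
    return group


original = ["add", "nop", "nop", "mul", "nop"]
packed = compress(original)
print(packed)                             # [(0, 'add'), (2, 'mul')]
print(decompress(packed, 5) == original)  # True
```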

Instruction compressing apparatus and method

An instruction compressing apparatus and method are provided for a parallel processing computer such as a very long instruction word (VLIW) computer. The instruction compressing apparatus includes a bundle code generating unit, an instruction compressing unit, and an instruction converting unit. The bundle code generating unit may generate a bundle code in response to an input of instructions to be compressed. The bundle code may indicate whether a current instruction group is terminated, and also whether the instruction group following it is a no-operation (NOP) instruction group. The instruction compressing unit may remove a NOP instruction and/or a NOP instruction group from the input instructions according to the generated bundle code. The instruction converting unit may include the generated bundle code in the remaining instructions that were not removed by the instruction compressing unit.
Owner:SAMSUNG ELECTRONICS CO LTD

Processor device and loop processing method thereof

Status: Inactive | Publication: CN102508635A | Tags: Achieving zero overhead for loops; Improve performance; Machine execution arrangements; Loop control; Loop body
The invention discloses a VLIW (Very Long Instruction Word) processor device and a loop processing method for it. The VLIW processor device comprises a loop unit, an address sending unit, and an instruction decoding unit, where the loop unit comprises a loop body data calculation module, a loop counting module, a memory module, and an instruction fetch address calculation module. The loop processing method comprises the following steps: obtaining a loop mark (LP) instruction; extracting the loop parameter carried in the LP instruction; obtaining and storing loop body data according to the address of the LP instruction and the loop parameter; taking the stored loop body data as the current loop body data; fetching and executing the instruction at the current instruction fetch address; and computing the next instruction fetch address from the current one, then determining the new current fetch address by comparing the next fetch address with the loop body data. This solves the problems that loop control in a VLIW processor cannot be realized entirely in hardware and that loop execution overhead is high, thereby greatly improving the performance of the VLIW processor.
Owner:INST OF ACOUSTICS CHINESE ACAD OF SCI
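
The zero-overhead loop behaviour can be sketched as a trace of fetch addresses. The loop unit here checks each fetch address against the end of the loop body and redirects to the loop start while iterations remain, with no branch instruction executed. The parameter names and the single-loop restriction are assumptions, and the LP instruction's encoding is not modeled.

```python
def fetch_trace(program_length: int, loop_start: int, loop_end: int, count: int) -> list[int]:
    """Fetch addresses for a program with one hardware loop over [loop_start, loop_end].

    After fetching loop_end, the loop unit redirects the next fetch address to
    loop_start while iterations remain; no branch instruction is executed.
    """
    if not (0 <= loop_start <= loop_end < program_length) or count < 1:
        raise ValueError("invalid loop parameters")
    trace: list[int] = []
    pc = 0
    remaining = count
    while pc < program_length:
        trace.append(pc)
        next_pc = pc + 1
        if pc == loop_end and remaining > 1:
            remaining -= 1
            next_pc = loop_start  # redirect in hardware, no branch fetched
        pc = next_pc
    return trace


print(fetch_trace(5, loop_start=1, loop_end=2, count=3))
# [0, 1, 2, 1, 2, 1, 2, 3, 4]
```

Every fetched address belongs to a useful instruction: the loop-back costs no extra fetch slot, which is what "zero overhead" refers to.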