50 results about How to "Lower memory access latency" patented technology

Mobile terminal real-time rendering system and method based on cloud platform

The invention discloses a cloud-platform-based real-time rendering system and method for mobile terminals. The method comprises the steps of: receiving viewpoint and interaction information sent by a mobile terminal, querying and reading model and scene files, and obtaining three-dimensional scene data; partitioning the three-dimensional scene data into model-group data according to the model-group types of the three-dimensional scene; storing the model-group data and automatically adjusting storage locations according to the differing data demands of different three-dimensional scenes; extracting the model-group data, building and managing MIC/GPU rendering tasks for the three-dimensional scene image, obtaining the rendering result data, and compressing the rendering result data before transmitting it to the mobile terminal; and deploying and managing the MIC/GPU rendering tasks with a dynamic load-balancing strategy to keep the cloud servers load-balanced. The method minimizes the computation and storage demanded of the mobile client.
Owner:SHANDONG UNIV
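
For orientation, a minimal C sketch of the dynamic load-balancing dispatch described in the abstract follows. The node structure, the pending-task counter, and the pick_least_loaded helper are illustrative assumptions, not the patent's actual scheduler.

```c
/* Minimal sketch of dynamic load-balanced dispatch of MIC/GPU rendering
 * tasks. The structures and pick_least_loaded() are hypothetical. */
#include <stdio.h>

#define NUM_NODES 4

typedef struct {
    int id;
    int pending_tasks;   /* current load of this MIC/GPU render node */
} render_node_t;

/* Choose the node with the fewest pending rendering tasks. */
static int pick_least_loaded(const render_node_t nodes[], int n)
{
    int best = 0;
    for (int i = 1; i < n; i++)
        if (nodes[i].pending_tasks < nodes[best].pending_tasks)
            best = i;
    return best;
}

int main(void)
{
    render_node_t nodes[NUM_NODES] = {{0, 3}, {1, 1}, {2, 5}, {3, 2}};

    /* Dispatch ten rendering tasks, always to the least-loaded node. */
    for (int task = 0; task < 10; task++) {
        int n = pick_least_loaded(nodes, NUM_NODES);
        nodes[n].pending_tasks++;
        printf("task %d -> node %d\n", task, n);
    }
    return 0;
}
```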

Method and system for supplier-based memory speculation in a memory subsystem of a data processing system

Inactive · US20050132147A1 · Improvement in average memory access latency; reduce apparent memory access latency · Memory addressing/allocation/relocation; concurrent instruction execution; memory controller; handling system
A data processing system includes one or more processing cores, a system memory having multiple rows of data storage, and a memory controller that controls access to the system memory and performs supplier-based memory speculation. The memory controller includes a memory speculation table that stores historical information regarding prior memory accesses. In response to a memory access request, the memory controller directs an access to a selected row in the system memory to service the memory access request. The memory controller speculatively directs that the selected row will continue to be energized following the access based upon the historical information in the memory speculation table, so that access latency of an immediately subsequent memory access is reduced.
Owner:IBM CORP
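
A minimal C sketch of the speculation idea follows: a small table of per-bank row history decides whether to keep the DRAM row energized after an access, betting that the next access hits the same row. The table layout, counter threshold, and field names are assumptions for the example, not the patent's exact mechanism.

```c
/* Open-page speculation driven by a history table: keep a row energized
 * after repeated hits to the same row. Sizes and threshold are illustrative. */
#include <stdbool.h>
#include <stdio.h>

#define TABLE_ENTRIES 256

typedef struct {
    unsigned last_row;
    unsigned hit_count;         /* consecutive accesses to the same row */
} spec_entry_t;

static spec_entry_t spec_table[TABLE_ENTRIES];

/* Returns true if the controller should leave the row open after this access. */
static bool keep_row_open(unsigned bank, unsigned row)
{
    spec_entry_t *e = &spec_table[bank % TABLE_ENTRIES];
    if (e->last_row == row) {
        if (e->hit_count < 3) e->hit_count++;
    } else {
        e->last_row = row;
        e->hit_count = 0;
    }
    /* Speculate "open page" once the same row has been hit repeatedly. */
    return e->hit_count >= 2;
}

int main(void)
{
    unsigned accesses[] = {5, 5, 5, 7, 7, 7, 7, 2};
    for (unsigned i = 0; i < sizeof accesses / sizeof accesses[0]; i++)
        printf("row %u -> keep open: %s\n", accesses[i],
               keep_row_open(0, accesses[i]) ? "yes" : "no");
    return 0;
}
```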

Control system and method for memory access

Inactive · US20100153636A1 · Reduce memory access latency; raise bandwidth · Memory systems; data buffer; priority setting
A control system for memory access includes a system memory access command buffer, a memory access command parallel processor, a DRAM command controller and a read data buffer. The system memory access command buffer stores plural system memory access commands. The memory access command parallel processor is connected to the system memory access command buffer for fetching and decoding the system memory access commands to plural DRAM access commands, storing the DRAM access commands in DRAM bank command FIFOs, and performing priority setting according to a DRAM bank priority table. The DRAM command controller is connected to the memory access command parallel processor and a DRAM for receiving the DRAM access commands, and sending control commands to the DRAM. The read data buffer is connected to the DRAM command controller and the system bus for storing the read data and rearranging a sequence of the read data.
Owner:SUNPLUS TECH CO LTD
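
The per-bank queuing and priority selection described above can be sketched in a few lines of C: decoded DRAM commands are pushed into one FIFO per bank, and the controller issues the next command from the highest-priority non-empty FIFO. FIFO depth and the priority values are illustrative assumptions.

```c
/* Per-bank command FIFOs with a bank priority table. */
#include <stdio.h>

#define BANKS      4
#define FIFO_DEPTH 8

typedef struct { int cmd[FIFO_DEPTH]; int head, tail, count; } bank_fifo_t;

static bank_fifo_t fifo[BANKS];
static int bank_priority[BANKS] = {2, 0, 3, 1};   /* higher = served first */

static int push(int bank, int cmd)
{
    bank_fifo_t *f = &fifo[bank];
    if (f->count == FIFO_DEPTH) return -1;        /* FIFO full */
    f->cmd[f->tail] = cmd;
    f->tail = (f->tail + 1) % FIFO_DEPTH;
    f->count++;
    return 0;
}

/* Pop the next command from the highest-priority bank with pending work. */
static int next_command(int *bank_out)
{
    int best = -1;
    for (int b = 0; b < BANKS; b++)
        if (fifo[b].count > 0 &&
            (best < 0 || bank_priority[b] > bank_priority[best]))
            best = b;
    if (best < 0) return -1;                      /* all FIFOs empty */
    bank_fifo_t *f = &fifo[best];
    int cmd = f->cmd[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    *bank_out = best;
    return cmd;
}

int main(void)
{
    push(0, 100); push(2, 200); push(1, 300);
    int bank, cmd;
    while ((cmd = next_command(&bank)) >= 0)
        printf("issue cmd %d from bank %d\n", cmd, bank);
    return 0;
}
```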

Memory allocation method and server

The embodiment of the present application discloses a memory allocation method and a server, which are used to reduce the performance loss caused by NC latency and to improve server performance when memory is allocated. The method comprises: the server identifies a node topology table, which records not only the connection relationships between NUMA nodes but also the connections between NUMA nodes and NCs and between NCs; based on the node topology table, the server generates a memory-access hop table for each NUMA node, which records both the number of QPI hops and the number of NC hops on the shortest path to every other NUMA node; according to each NUMA node's hop table, the server calculates the memory-access priority of every NUMA node, taking the NC hop count as a key parameter in the calculation: the fewer the NC hops, the higher the memory-access priority. When a NUMA node applies for memory, memory is allocated according to the priority table: the higher a node's priority, the earlier memory is allocated from that NUMA node.
Owner:XFUSION DIGITAL TECH CO LTD
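
A minimal C sketch of the priority idea follows: each candidate NUMA node gets a cost built from its QPI hop count and NC hop count to the requesting node, with NC hops weighted more heavily, and allocation targets the lowest-cost node with free memory. The weight, the cost formula, and the hop-table values are assumptions for the example, not the patent's actual formula.

```c
/* NUMA allocation ordered by a QPI/NC hop cost (lower cost = higher priority). */
#include <stdio.h>

#define NODES     4
#define NC_WEIGHT 10    /* assumed: one NC hop costs as much as 10 QPI hops */

typedef struct { int qpi_hops; int nc_hops; } hop_entry_t;

/* Hop table as seen from node 0 (example values). */
static hop_entry_t hops_from_0[NODES] = {
    {0, 0},   /* local node  */
    {1, 0},   /* direct QPI  */
    {2, 1},   /* via one NC  */
    {3, 2},   /* via two NCs */
};

static int access_cost(const hop_entry_t *h)
{
    return h->qpi_hops + NC_WEIGHT * h->nc_hops;
}

/* Pick the allocation target: lowest cost among nodes that still have memory.
 * free_pages[] is a stand-in for real allocator state. */
static int pick_node(const int free_pages[NODES])
{
    int best = -1;
    for (int n = 0; n < NODES; n++) {
        if (free_pages[n] == 0) continue;
        if (best < 0 ||
            access_cost(&hops_from_0[n]) < access_cost(&hops_from_0[best]))
            best = n;
    }
    return best;
}

int main(void)
{
    int free_pages[NODES] = {0, 50, 80, 120};     /* local node exhausted */
    printf("allocate from node %d\n", pick_node(free_pages));
    return 0;
}
```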

Sparse tensor canonical decomposition method based on data division and calculation distribution

The invention relates to a sparse tensor canonical decomposition method based on data partitioning and task allocation. The method comprises the following steps: first, performing multi-level partitioning and task allocation across the processing cores of a core group according to the many-core characteristics of the SW processor; performing multi-level segmentation of the sparse tensor data; designing a communication strategy for sparse tensor canonical decomposition that exploits the register-communication features of the SW26010 processor; and, targeting the common performance bottleneck of the different sparse tensor canonical decomposition methods, namely the differing requirements of the matricized tensor times Khatri-Rao product (MTTKRP) during computation (whether tensor elements must be gathered randomly), designing different MTTKRP computation schemes that exploit the characteristics of the SW processor. The method fully exploits the SW architecture and the computational requirements of sparse tensor decomposition, allowing multiple sparse tensor canonical decomposition methods to run efficiently in parallel on the SW architecture while maintaining dynamic load balance to the greatest possible extent.
Owner:BEIHANG UNIV
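
The bottleneck named in the abstract is MTTKRP. A minimal serial C reference for mode-1 MTTKRP over a COO-format sparse third-order tensor is sketched below; the parallel, register-communication version for SW26010 described above is not reproduced, and the rank, sizes, and values are illustrative.

```c
/* Mode-1 MTTKRP over a COO sparse tensor: M1(i,:) += X(i,j,k) * (B(j,:) .* C(k,:)). */
#include <stdio.h>

#define RANK 2          /* decomposition rank R (illustrative) */
#define I 3
#define J 3
#define K 3
#define NNZ 4

/* One nonzero of the sparse tensor X in coordinate (COO) format. */
typedef struct { int i, j, k; double val; } nnz_t;

static nnz_t X[NNZ] = {
    {0, 1, 2, 1.0}, {1, 0, 0, 2.0}, {2, 2, 1, 3.0}, {1, 2, 2, 4.0},
};

static void mttkrp_mode1(double M1[I][RANK],
                         double B[J][RANK], double C[K][RANK])
{
    for (int n = 0; n < NNZ; n++)
        for (int r = 0; r < RANK; r++)
            M1[X[n].i][r] += X[n].val * B[X[n].j][r] * C[X[n].k][r];
}

int main(void)
{
    double B[J][RANK] = {{1, 2}, {3, 4}, {5, 6}};
    double C[K][RANK] = {{1, 1}, {2, 2}, {3, 3}};
    double M1[I][RANK] = {{0}};

    mttkrp_mode1(M1, B, C);
    for (int i = 0; i < I; i++)
        printf("M1[%d] = (%.1f, %.1f)\n", i, M1[i][0], M1[i][1]);
    return 0;
}
```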

Correction of incorrect cache accesses

The application describes a data processor operable to process data, and comprising: a cache in which a storage location of a data item within said cache is identified by an address, said cache comprising a plurality of storage locations and said data processor comprising a cache directory operable to store a physical address indicator for each storage location comprising stored data; a hash value generator operable to generate a generated hash value from at least some of said bits of said address, said generated hash value having fewer bits than said address; a buffer operable to store a plurality of hash values relating to said plurality of storage locations within said cache; wherein in response to a request to access said data item said data processor is operable to compare said generated hash value with at least some of said plurality of hash values stored within said buffer and, in response to a match, to indicate an indicated storage location of said data item; and said data processor is operable to access one of said physical address indicators stored within said cache directory corresponding to said indicated storage location and, in response to said accessed physical address indicator not indicating said address, said data processor is operable to invalidate said indicated storage location within said cache.
Owner:ARM LTD +1
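
The verification flow above can be sketched in C: a short hash of the address picks a candidate cache location quickly, the full physical address stored in the cache directory is then checked, and on a mismatch the wrongly indicated location is invalidated. The structure sizes and the hash function are assumptions for the example.

```c
/* Hash-indicated cache lookup with directory verification and invalidation. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define LOCATIONS 8

typedef struct {
    bool     valid;
    uint32_t phys_addr;     /* full physical address indicator (directory) */
    uint8_t  hash;          /* short hash stored in the prediction buffer  */
} cache_loc_t;

static cache_loc_t cache[LOCATIONS];

/* Short hash with fewer bits than the address (here: 8 bits). */
static uint8_t hash_addr(uint32_t addr) { return (uint8_t)(addr ^ (addr >> 8)); }

/* Returns the matching location, or -1 on miss; invalidates false matches. */
static int lookup(uint32_t addr)
{
    uint8_t h = hash_addr(addr);
    for (int loc = 0; loc < LOCATIONS; loc++) {
        if (!cache[loc].valid || cache[loc].hash != h)
            continue;
        if (cache[loc].phys_addr == addr)
            return loc;                  /* hash hit confirmed by directory */
        cache[loc].valid = false;        /* incorrect access: invalidate    */
    }
    return -1;
}

int main(void)
{
    cache[3] = (cache_loc_t){true, 0x1234, hash_addr(0x1234)};
    printf("0x1234 -> loc %d\n", lookup(0x1234));   /* genuine hit */
    /* An address that collides on the short hash but differs in its full
     * physical address would invalidate location 3 on lookup instead. */
    return 0;
}
```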

AMBA interface circuit

Inactive · CN101710310A · Reduce latency and memory access latency; save resources · Electric digital data processing; embedded system; network on chip
The invention relates to an AMBA interface circuit characterized in that three FIFOs are arranged in the Master interface circuit. The Write Data FIFO and the Write Address FIFO receive the data and addresses sent by the master device; if the master device has not yet obtained the right to use the bus, the data or address is first written into the Write Data FIFO or the Write Address FIFO and is transmitted once the master device obtains the bus. The Read Data FIFO is used to send data to the master device; when the master device is busy, data sent by a slave device is stored temporarily in the Read Data FIFO, the bus is then released, and the data is transmitted once the master device can receive it. Compared with the prior art, the invention has the following advantages: first, because the FIFOs are arranged in the Master interface circuit, the operation of the master and slave devices and the transmission of data or addresses can proceed concurrently, shortening the bus waiting time and the memory access latency; second, the FIFOs save resources during the transmission of data or addresses between the master and slave devices; and third, data loss is avoided when the Master interface circuit transmits over the network on chip.
Owner:EAST CHINA INST OF OPTOELECTRONICS INTEGRATEDDEVICE
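
A minimal C sketch of the decoupling idea above: the master pushes its write data into a FIFO immediately, and a separate drain step moves buffered entries onto the bus only once bus ownership has been granted, so the master does not stall on arbitration. The FIFO depth and the bus_granted flag are illustrative.

```c
/* Write-data FIFO decoupling a master from bus arbitration. */
#include <stdbool.h>
#include <stdio.h>

#define DEPTH 4

typedef struct { unsigned data[DEPTH]; int head, tail, count; } fifo_t;

static bool fifo_push(fifo_t *f, unsigned v)
{
    if (f->count == DEPTH) return false;          /* full: master must wait */
    f->data[f->tail] = v;
    f->tail = (f->tail + 1) % DEPTH;
    f->count++;
    return true;
}

static bool fifo_pop(fifo_t *f, unsigned *v)
{
    if (f->count == 0) return false;
    *v = f->data[f->head];
    f->head = (f->head + 1) % DEPTH;
    f->count--;
    return true;
}

int main(void)
{
    fifo_t write_data_fifo = {{0}, 0, 0, 0};
    bool bus_granted = false;

    /* Master issues writes while the bus is still owned by someone else. */
    for (unsigned v = 10; v < 13; v++)
        fifo_push(&write_data_fifo, v);

    /* Once the bus is granted, the buffered writes are driven onto it. */
    bus_granted = true;
    unsigned v;
    while (bus_granted && fifo_pop(&write_data_fifo, &v))
        printf("drive 0x%x onto the bus\n", v);
    return 0;
}
```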

Streamlined convolution computing architecture design method and residual network acceleration system

The invention provides a streamlined convolution computing architecture design method and a residual network acceleration system. In this method, the hardware acceleration architecture is divided into an on-chip buffer, a convolution processing array, and a point-wise addition module. The main path of the architecture consists of three convolution processing arrays arranged in series, with two pipeline buffers inserted between them to realize an inter-layer pipeline across the three convolution layers of the main path. A fourth convolution processing array is provided to process, in parallel, the 1 * 1 convolution layers on the branches of the residual building blocks; by configuring a register in the fourth array, its working mode can be changed so that it can also compute the head convolution layer or the fully connected layer of the residual network, and when the branch of a residual building block has no convolution, the fourth array is bypassed and no convolution is executed. A point-wise addition module adds the output features of the main path of the residual building block and the output features of the branch shortcut connection element by element.
Owner:SUN YAT SEN UNIV
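
A minimal C sketch of the residual-block dataflow this architecture maps to hardware follows: three main-path convolution stages in series, an optional 1 * 1 branch convolution standing in for the fourth array, and an element-wise addition of the two outputs. The convolution stages are stubbed as simple scalings, and the sizes and function names are illustrative, not the system's hardware interfaces.

```c
/* Residual building block: three-stage main path + optional branch conv + add. */
#include <stdio.h>

#define N 4                               /* number of output feature pixels */

/* Stand-in for one convolution processing array (stubbed as a scaling). */
static void conv_stage(const float *in, float *out, float scale)
{
    for (int i = 0; i < N; i++) out[i] = in[i] * scale;
}

/* Point-wise addition module: adds main-path and branch outputs element-wise. */
static void pointwise_add(const float *a, const float *b, float *out)
{
    for (int i = 0; i < N; i++) out[i] = a[i] + b[i];
}

int main(void)
{
    float x[N] = {1, 2, 3, 4};
    float t1[N], t2[N], main_out[N], branch_out[N], y[N];
    int branch_has_conv = 1;              /* 0: identity shortcut, skip array 4 */

    conv_stage(x, t1, 2.0f);              /* conv array 1                       */
    conv_stage(t1, t2, 0.5f);             /* conv array 2 (after pipeline buf)  */
    conv_stage(t2, main_out, 3.0f);       /* conv array 3 (after pipeline buf)  */

    if (branch_has_conv)
        conv_stage(x, branch_out, 1.0f);  /* conv array 4: 1x1 branch conv      */
    else
        for (int i = 0; i < N; i++) branch_out[i] = x[i];

    pointwise_add(main_out, branch_out, y);
    for (int i = 0; i < N; i++) printf("%.1f ", y[i]);
    printf("\n");
    return 0;
}
```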

Parallel acceleration LZ77 decoding method and device

Pending · CN113890540A · Improve decompression performance; reduce access frequency · Code conversion; access frequency; computer engineering
The invention discloses a parallel-accelerated LZ77 decoding method and device. The method comprises: controlling an LZ77 decoder to read several to-be-decoded data units and combining them into different combined data pairs, where each to-be-decoded data unit is either an original character or a distance-length pair produced by LZ77 compression, and a combined data pair is a combination of an original character and a distance-length pair; and controlling the LZ77 decoder to decode and output the corresponding target data according to the types of the to-be-decoded data units in the combined data pair, processing the target data through a data-copying module, and writing the processed target data into an on-chip RAM cache to obtain the decoded data. By reading several to-be-decoded data units at once, the method addresses the excessive latency of serial copying; at the same time, the added on-chip RAM cache greatly reduces the access frequency to off-chip memory, effectively reducing access latency and bandwidth pressure and improving decompression performance.
Owner:INSPUR BEIJING ELECTRONICS INFORMATION IND
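
For reference, a minimal serial C sketch of the LZ77 decode step the method builds on follows: each input unit is either a literal byte or a (distance, length) pair that copies already-decoded bytes. The output buffer stands in for the on-chip RAM cache; the parallel pairing of units described above is not reproduced, and the unit structure is an assumption for the example.

```c
/* Serial LZ77 decoding of literal / (distance, length) units. */
#include <stdio.h>

typedef struct {
    int is_literal;
    unsigned char literal;       /* valid when is_literal != 0 */
    int distance, length;        /* valid when is_literal == 0 */
} lz77_unit_t;

/* Decode a stream of units into out[]; returns the number of bytes written. */
static int lz77_decode(const lz77_unit_t *units, int n, unsigned char *out)
{
    int pos = 0;
    for (int u = 0; u < n; u++) {
        if (units[u].is_literal) {
            out[pos++] = units[u].literal;
        } else {
            /* Byte-by-byte copy so overlapping matches (distance < length)
             * replicate correctly. */
            for (int i = 0; i < units[u].length; i++, pos++)
                out[pos] = out[pos - units[u].distance];
        }
    }
    return pos;
}

int main(void)
{
    /* "abab" encoded as: 'a', 'b', then copy 2 bytes from distance 2. */
    lz77_unit_t units[] = {
        {1, 'a', 0, 0}, {1, 'b', 0, 0}, {0, 0, 2, 2},
    };
    unsigned char out[16];
    int len = lz77_decode(units, 3, out);
    printf("%.*s\n", len, (char *)out);   /* prints "abab" */
    return 0;
}
```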