104 results about "Algorithm acceleration" patented technology

Algorithm acceleration uses code generation technology to generate fast executable code. Accelerated algorithms must comply with MATLAB® Coder™ code generation requirements and rules.

Method and system for deep learning algorithm acceleration on field-programmable gate array platform

The invention discloses a method and system for deep learning algorithm acceleration on a field-programmable gate array platform. The field-programmable gate array platform is composed of a universal processor, a field-programmable gate array and a storage module. The method comprises: according to a deep learning prediction process and a training process, a general computation part that can be operated on a field-programmable gate array platform is determined by combining a deep neural network and a convolutional neural network; a software and hardware cooperative computing way is determined based on the determined general computation part; and according to computing logic resources and the bandwidth situation of the FPGA, the number and type of IP core solidification are determined, and acceleration is carried out on the field-programmable gate array platform by using a hardware computing unit. Therefore, a hardware processing unit for deep learning algorithm acceleration is designed rapidly based on hardware resources; and compared with the general processor, the processing unit has characteristics of excellent performance and low power consumption.
Owner:SUZHOU INST FOR ADVANCED STUDY USTC

FPGA-based clustering algorithm acceleration system and design method thereof

The invention discloses an FPGA-based clustering algorithm acceleration system and a design method thereof. The method comprises the steps of obtaining the key code of each algorithm through a profiling technology; refining the key code of each algorithm and extracting the shared function logic (common operators); redesigning the code structure by using a blocking technology to increase the utilization rate of data locality and reduce the off-chip access frequency; designing an extended semantic instruction set, realizing the function logic parts corresponding to the instruction set, and finishing the key code function through operations of fetching, decoding and execution of instructions; designing an acceleration framework of an accelerator and generating an IP core; and porting an operating system to a development board, and finishing cooperative work of software and hardware in the operating system. Various clustering algorithms can be supported and the flexibility and universality of a hardware accelerator can be improved; and the code of each algorithm is reconstructed by adopting the blocking technology to reduce the off-chip access frequency so as to reduce the influence of the off-chip access bandwidth on the acceleration effect of the accelerator.
Owner:SUZHOU INST FOR ADVANCED STUDY USTC
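
The blocking idea in the clustering entry above can be illustrated in software. This is a minimal Python sketch, not the patented design: the function names `assign_blocked` and `assign_naive`, the `block_size` parameter, and the choice of a k-means style nearest-centroid step are all assumptions made for illustration. Each block of points stands in for a tile held in on-chip memory and is reused against every centroid before the next block is fetched.

```python
def squared_distance(p, c):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def assign_naive(points, centroids):
    """Reference version: nearest centroid index for every point."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
        for p in points
    ]


def assign_blocked(points, centroids, block_size=4):
    """Same result as assign_naive, but points are processed tile by tile."""
    labels = [0] * len(points)
    for start in range(0, len(points), block_size):
        block = points[start:start + block_size]  # stands in for an on-chip tile
        best = [(float("inf"), -1)] * len(block)
        for c_idx, c in enumerate(centroids):
            # The whole tile is reused for this centroid before moving on.
            for i, p in enumerate(block):
                d = squared_distance(p, c)
                if d < best[i][0]:
                    best[i] = (d, c_idx)
        for i, (_, c_idx) in enumerate(best):
            labels[start + i] = c_idx
    return labels
```

In this sketch the tile size only changes the order of memory accesses, never the labels, which is the property a blocked hardware design would also need to preserve.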

Method for accelerating RNA secondary structure prediction based on stochastic context-free grammar

The invention discloses a method for accelerating RNA secondary structure prediction based on stochastic context-free grammar (SCFG), aiming at increasing the speed of RNA secondary structure prediction by using the SCFG. The method comprises the following steps of: firstly establishing a heterogeneous computer system comprising a host computer and a reconfigurable algorithm accelerator, then transmitting a formatted CM model and an encoded RNA sequence into the reconfigurable algorithm accelerator through the host computer, and executing a non-backtrace CYK / inside algorithm calculation by a PE array of the reconfigurable algorithm accelerator, wherein task division strategies of region-dependent segmentation and layered column-dependent parallel processing are adopted in the calculation so as to realize fine-grained parallel calculation, and n PEs simultaneously calculate n data items positioned in different columns of a matrix in an SPMD way, but different calculation sequences are adopted in the calculation according to different state types. The invention realizes the application acceleration of RNA sequence secondary structure prediction based on the SCFG model and has a high acceleration ratio and low cost.
Owner:NAT UNIV OF DEFENSE TECH
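
To make the inside calculation named in the entry above concrete, here is a minimal Python sketch of the inside algorithm for a toy SCFG in Chomsky normal form. It is not the covariance-model computation or the task division used in the patent; the function name `inside_probability` and the dictionary encoding of rules are assumptions for illustration. The point it shows is that all cells of one span length depend only on shorter spans, so they could be computed in parallel.

```python
def inside_probability(seq, unary, binary, start="S"):
    """Probability that `start` derives `seq` under a CNF grammar.

    unary:  {(lhs, terminal): probability}
    binary: {(lhs, left, right): probability}
    """
    n = len(seq)
    if n == 0:
        return 0.0
    # table[i][j] maps a nonterminal to the probability of deriving seq[i..j].
    table = [[{} for _ in range(n)] for _ in range(n)]
    for i, sym in enumerate(seq):
        for (lhs, term), p in unary.items():
            if term == sym:
                table[i][i][lhs] = table[i][i].get(lhs, 0.0) + p
    for span in range(2, n + 1):
        # Cells with the same span length are independent of each other.
        for i in range(n - span + 1):
            j = i + span - 1
            cell = table[i][j]
            for k in range(i, j):
                left, right = table[i][k], table[k + 1][j]
                for (lhs, b, c), p in binary.items():
                    if b in left and c in right:
                        cell[lhs] = cell.get(lhs, 0.0) + p * left[b] * right[c]
    return table[0][n - 1].get(start, 0.0)
```

No backtrace pointers are stored, which mirrors the non-backtrace variant mentioned in the abstract: only the final score is produced.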

FPGA method achieving computation speedup and PCIE SSD storage simultaneously

The invention discloses an FPGA method achieving computation speedup and PCIE SSD storage simultaneously. An FPGA is used, an SSD controller and an algorithm accelerator are integrated in the FPGA, the FPGA is further internally provided with a DDR controller and a direct memory access (DMA) module, and the DMA module is connected with the SSD controller, the DDR controller and the algorithm accelerator respectively. According to the method, the two functions of computation speedup and SSD storage are achieved on a single PCIE device, the layout difficulty is reduced, the overall power consumption of server nodes is reduced, and the cost to an enterprise is reduced.
Owner:FASII INFORMATION TECH SHANGHAI

LSTM (Long Short-Term Memory) forward direction operation accelerator based on FPGA (Field Programmable Gate Array)

The invention discloses an LSTM (Long Short-Term Memory) forward direction operation accelerator based on an FPGA (Field Programmable Gate Array), which works in a hardware and software coordination pattern. The hardware part contains three types of accelerator designs: a single-DMA (Direct Memory Access) pattern LSTM neural network forward direction algorithm accelerator, a double-DMA pattern LSTM neural network forward direction algorithm accelerator and a sparse LSTM neural network forward direction algorithm accelerator. The accelerator is used for accelerating the LSTM network forward direction calculation part and comprises a matrix-vector multiplication module, an element-wise operation module and an activation function module. The single-DMA pattern accelerator performs well in terms of performance and energy efficiency ratio. The double-DMA pattern accelerator and the sparse network accelerator perform well in terms of energy efficiency ratio and, in addition, save more on-chip storage resources of the FPGA.
Owner:SUZHOU INST FOR ADVANCED STUDY USTC
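
The three modules listed in the LSTM entry above (matrix-vector multiplication, element-wise operations, activation functions) correspond to the three stages of a standard LSTM forward step. The following Python sketch shows that textbook step only; the function names and the dictionary layout of the weights `W`, `U` and biases `b` (keyed by gate `i`, `f`, `g`, `o`) are assumptions for illustration and say nothing about how the patented hardware organizes them.

```python
import math


def matvec(matrix, vector):
    """Matrix-vector multiplication stage."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def sigmoid(x):
    """Activation function stage (logistic)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One forward step of a standard LSTM cell.

    W[gate], U[gate] are weight matrices and b[gate] bias vectors
    for the gates "i" (input), "f" (forget), "g" (candidate), "o" (output).
    """
    pre = {}
    for gate in "ifgo":
        wx = matvec(W[gate], x)
        uh = matvec(U[gate], h_prev)
        pre[gate] = [p + q + r for p, q, r in zip(wx, uh, b[gate])]
    i = [sigmoid(v) for v in pre["i"]]
    f = [sigmoid(v) for v in pre["f"]]
    g = [math.tanh(v) for v in pre["g"]]
    o = [sigmoid(v) for v in pre["o"]]
    # Element-wise stage: update the cell state, then the hidden state.
    c = [fv * cp + iv * gv for fv, cp, iv, gv in zip(f, c_prev, i, g)]
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h, c
```

The eight matrix-vector products dominate the cost of a step, which is presumably why an accelerator would give them a dedicated module.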

A national cryptographic algorithm acceleration processing system based on an FPGA

The invention discloses a national cryptographic algorithm acceleration processing system based on an FPGA. The system is used for processing a data packet which is sent to a server and needs to be processed by a national cryptographic algorithm. The system comprises an FPGA (Field Programmable Gate Array) connected to the server through a PCIE (Peripheral Component Interconnect Express) core interface, wherein the FPGA uses a DMA read operation to transfer the data packet stored in the server to the high-capacity DDR cache of the FPGA at high speed through the PCIE core interface. The data packet is then processed by the corresponding user-defined national cryptographic algorithm IP core, the processed data packet is written to the DDR, and the processed data packet in the DDR is transferred to the server-side memory through the PCIE core interface by a DMA write operation. The acceleration processing system disclosed by the invention has good reusability and expandability, and has very good popularization and application value.
Owner:北京中科海网科技有限公司

Software and hardware cooperating design method for arithmetic acceleration

Inactive | CN101493862A | Changing the status quo of secular stagnation | Improve compatibility | Special data processing applications | Analysis data | System requirements
The invention discloses a software and hardware collaborative design method for algorithm acceleration. The method has six steps: step 1: static analysis of the algorithm and software; step 2: using software analysis tools to carry out dynamic measurement analysis of the software so as to obtain a basic data chart of software operation; step 3: designing the overall structure and functions of a multi-core hardware system by combining the system requirements, the algorithm analysis and the software measurement analysis data; step 4: using appropriate modeling tools (RML) to describe the whole system; step 5: constructing a function process abstract chart GCG (a function call chart that includes operation time parameters) on the basis of step 2 and discussing the distribution of the software in the multi-core system by using the chart GCG as the subject; and step 6: carrying out the software and hardware realization of a prototype system according to the proposal obtained from step 5 and evaluating the realization results. The method has good compatibility, meets the urgent demand for the design of a multi-core system on chip (SoC) and promotes the improvement of multi-core design tools. The method has very high utility value and a promising application prospect.
Owner:BEIHANG UNIV

Method, device and equipment for optimizing intelligent video analysis performance

The invention relates to a method, a device and equipment for optimizing the analysis performance of an intelligent video, and the method comprises the steps: (1) for the acceleration of an offline video file, carrying out a reference benchmark test on the video file and setting an optimal file slice number; slicing the video file, and issuing the slicing tasks to the GPU; calling the GPU to decode each slice file, and passing the decoding result back to the algorithm directly through a video memory address, which avoids the performance loss of a video memory to main memory copy, wherein the video analysis algorithm takes the decoded video memory address, calls the GPU for algorithm acceleration and outputs an analysis result; (2) optimizing and expanding the number of paths for real-time video stream algorithm analysis: calling the GPU to decode each path of real-time video, passing the decoding result back to the algorithm directly through a video memory address, setting double caches at the algorithm end, storing the decoded data of multiple paths, transmitting the decoded data to the algorithm for GPU batch processing, and switching the roles of the two caches after batch processing is completed, so as to minimize system delay.
Owner:武汉众智数字技术有限公司

Industrial camera

Inactive | CN108696727A | Independent processing capacity | Easy to achieve smooth expansion | Television system details | Color television details | Gate array | Graphics processing unit
The invention discloses an industrial camera. The industrial camera comprises an image sensor used for acquiring image data; a programmable gate array FPGA used for connecting the image sensor, a plurality of 10 gigabit optical modules, an HDMI display interface and a graphics processing unit GPU, and performing image processing and executing system and data management; a plurality of 10 gigabit optical module interfaces, which are connected with the image sensor through the FPGA and used for transmitting the image data acquired by the image sensor or processed by the FPGA; the graphics processing unit GPU, which is connected with the FPGA and used for performing algorithm acceleration on the image data transmitted by the FPGA; and the HDMI interface which is connected with the FPGA and used for performing image display on the image data processed by the FPGA. According to the industrial camera provided by the invention, the design of 10 gigabit optical interfaces can realize the smooth expansion of multiple interfaces, achieve the requirement of high bandwidth, and use an optical fiber to perform long-distance transmission without relaying, so that the transmission cost is reduced, the connection of multiple hosts to the camera can be met, and high-bandwidth image data can be acquired.
Owner:杭州言曼科技有限公司

Multi-level scene reconstruction and rapid segmentation method, system and device for narrow space

Active | CN112200874A | Improve scene reconstruction accuracy | Improve rebuild speed | Image enhancement | Image analysis | Pattern recognition | Color image
The invention belongs to the field of robot scene reconstruction, particularly relates to a multi-level scene reconstruction and rapid segmentation method, system and device for a narrow space, and aims to solve the problem that reconstruction precision and real-time computation cannot both be achieved in robot scene reconstruction and segmentation in a narrow space. The method comprises the following steps: acquiring a color image, a depth image, camera calibration data and robot spatial position and attitude information; converting the sensor data into a single-frame point cloud through coordinate conversion; dividing scales of the single-frame point cloud, carrying out ray tracing and probability updating to acquire a multi-level scene map after scale fusion; and performing downsampling twice and upsampling once on the scene map, performing lossless transformation by means of scales, and establishing a plurality of sub-octree maps based on a space segmentation result, thereby realizing multi-level scene reconstruction and rapid segmentation. On the premise that necessary details of the scene are not lost, dense reconstruction and algorithm acceleration are achieved, which better facilitates application in actual engineering settings.
Owner:INST OF AUTOMATION CHINESE ACAD OF SCI +2

Neural network model real-time automatic quantification method and real-time automatic quantification system

The invention discloses a neural network model real-time automatic quantification method, which is based on an embedded AI accelerator, and comprises the following steps: carrying out embedded AI neural network training at a PC end, establishing a PC end deep learning neural network, and training an input floating point network model of an embedded AI model; quantizing the floating point network model into an embedded end fixed point network model; preprocessing the data needing to be quantized, and realizing all acceleration operators of each layer of the model network in hardware; and deploying the embedded AI hardware of the embedded end and porting the neural network model to the built AI hardware platform. The invention further discloses a neural network model real-time automatic quantification system. According to the invention, algorithm acceleration is realized in hardware based on an embedded AI accelerator, the storage space occupied by a neural network model can be reduced, the operation of the neural network model can be accelerated, the computing power of embedded equipment can be improved, the operation power consumption can be reduced, and the effective deployment of embedded AI technology can be realized.
Owner:SENSLAB INC
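
The step of quantizing a floating point model into a fixed point model, described in the entry above, can be illustrated with the simplest common scheme. This Python sketch uses symmetric per-tensor 8-bit quantization, which is an assumption: the abstract does not say which quantization scheme, bit width or calibration method the patent uses. The function names `quantize_int8` and `dequantize` are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] with one shared scale factor."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # A scale of 1.0 is used for an all-zero (or empty) tensor to avoid dividing by zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the fixed point representation."""
    return [q * scale for q in quantized]
```

Each stored weight shrinks from a 32-bit float to an 8-bit integer plus one shared scale, which is the kind of storage reduction the abstract refers to; the price is a rounding error of at most half a scale step per value.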

Method for achieving quasi-Newton algorithm acceleration based on high-level synthesis of FPGA

The invention discloses a method for achieving quasi-Newton algorithm acceleration based on high-level synthesis of an FPGA. The method comprises the steps that 1, the functions of a quasi-Newton algorithm are analyzed, and the main calculation modules of the quasi-Newton algorithm are divided; 2, the high-level languages C and C++ are used to implement the modules from step 1, and the correctness of the functions of the algorithm is verified; 3, the quasi-Newton algorithm whose functions were verified in step 2 serves as an input file, a high-level synthesis tool is used to convert the high-level language code into RTL-level code, and the generated RTL code is verified; 4, the generated RTL code is built into bitstream files, and the files are downloaded to the configurable logic part of the FPGA. Starting from quasi-Newton algorithm acceleration, high-level synthesis is used to implement the quasi-Newton algorithm, quasi-Newton algorithm acceleration is achieved through the FPGA, and the FPGA development difficulty is reduced.
Owner:TIANJIN UNIV
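
For readers unfamiliar with the algorithm being accelerated in the entry above, the following Python sketch shows one common quasi-Newton method, BFGS with a backtracking line search. It is an assumption that BFGS is representative here: the abstract does not say which quasi-Newton variant the patent implements, and the patent works in C and C++ rather than Python. The function name `bfgs` and its parameters are hypothetical.

```python
def bfgs(f, grad, x0, iters=100, tol=1e-8):
    """Minimize f from x0 using BFGS updates of the inverse Hessian estimate."""
    n = len(x0)
    x = list(x0)
    # H approximates the inverse Hessian; start from the identity.
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    g = grad(x)
    for _ in range(iters):
        if max(abs(v) for v in g) < tol:
            break
        # Search direction p = -H g.
        p = [-sum(H[i][j] * g[j] for j in range(n)) for i in range(n)]
        # Backtracking line search with the Armijo sufficient-decrease test.
        t, fx = 1.0, f(x)
        slope = sum(gi * pi for gi, pi in zip(g, p))
        while t > 1e-12 and f([xi + t * pi for xi, pi in zip(x, p)]) > fx + 1e-4 * t * slope:
            t *= 0.5
        x_new = [xi + t * pi for xi, pi in zip(x, p)]
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        sy = sum(a * b for a, b in zip(s, y))
        if sy > 1e-12:  # skip the update if the curvature condition fails
            rho = 1.0 / sy
            Hy = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
            yHy = sum(a * b for a, b in zip(y, Hy))
            # H <- (I - rho s y^T) H (I - rho y s^T) + rho s s^T, expanded.
            H = [
                [
                    H[i][j]
                    - rho * (s[i] * Hy[j] + Hy[i] * s[j])
                    + (rho * rho * yHy + rho) * s[i] * s[j]
                    for j in range(n)
                ]
                for i in range(n)
            ]
        x, g = x_new, g_new
    return x
```

The matrix-vector products and rank-two update inside the loop are regular, data-parallel computations, the kind of module a high-level synthesis flow could plausibly map to FPGA logic.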

Hardware circuit design and method of data loading device for accelerating calculation of deep convolutional neural network and combining with main memory

Inactive | CN111783933A | Simplify connection complexity | Simplify space complexity | Neural architectures | Physical realisation | Computer hardware | High bandwidth
The invention relates to a hardware circuit design and method of a data loading device combined with a main memory. The hardware circuit design and method are used for deep convolutional neural network calculation acceleration. According to the device, a cache structure is specifically designed and comprises an input cache and control, wherein a macro block segmentation method is applied to input from a main memory and/or other memories, and regional data sharing and tensor data fusion and distribution are achieved; a parallel input register array for converting the data segments input into the cache; and a tensor type data loading unit that is connected with the output of the input cache and the input of the parallel input register array. The design simplifies the address decoding circuit, saves area and power consumption, and does not affect the high bandwidth of the data. The hardware device and the data processing method provided by the invention comprise a transformation method, a macro block segmentation method and an addressing method for the input data, so that the requirement of carrying out algorithm acceleration with limited hardware resources is met, and the address management complexity is reduced.
Owner:北京芯启科技有限公司

Method and system for solving complete risk link sharing group separation path pair

The invention discloses a method and system for solving a complete risk link sharing group separation path pair. The method comprises the following steps: when a trap is encountered in a network with risk sharing link groups, acquiring a risk sharing link group side conflict set T by first finding the information of a first working (main) path AP, and using the conflict set T to provide a divide-and-conquer and parallel-processing algorithm for the original problem. In the application field in which fault-tolerant protection needs to be carried out on the working route AP in software-defined network controller layer routing services, the operation time of the complete risk sharing link group separation path algorithm is far less than that of other algorithms of the same type, its speed-up ratio is 20 times higher, and it is far superior to other algorithms of the same type in solving speed. The method can adapt to all fields of current complete risk sharing link group separation routes, and has broader application prospects compared with current complete risk sharing link group separation route algorithms.
Owner:HUNAN UNIV

Virtual terrain rendering method for carrying out resource dynamic processing and caching based on GPU

The invention discloses a virtual terrain rendering method for dynamically processing and caching resources based on a GPU. The method comprises the following steps: constructing a terrain grid according to a spatial quadtree algorithm; gradually subdividing the terrain and culling against the view frustum according to the viewpoint position, and creating GPU-side caches for different terrain resources according to the resources located by the logic coordinates, with different shader programs implemented for them; starting different rendering processes for different resources, and storing the processed resources in a cache allocated on the GPU; writing a shader for terrain rendering to create drawing instructions for the nodes remaining after terrain culling, and submitting draw calls to the GPU to complete drawing. According to the method, the computing power of the GPU is fully utilized: both rendering work and the processing of rendering resources are submitted to the GPU, and therefore resource processing is greatly accelerated. Meanwhile, a whole set of GPU resource caching and access algorithms is realized, the access performance of resources is improved, the rendering performance is further improved, and real-time rendering and flexible editing of super-large-scale virtual terrains become possible.
Owner:航天远景科技(南京)有限公司
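
The viewpoint-driven quadtree subdivision mentioned in the terrain entry above can be sketched on the CPU in a few lines. This is a minimal Python illustration under stated assumptions: the function name `subdivide`, the 2D viewpoint, and the split rule (split a tile while the viewpoint is closer than twice the tile size) are hypothetical and not taken from the patent, and view frustum culling and all GPU work are omitted.

```python
import math


def subdivide(x, y, size, viewpoint, min_size, out=None):
    """Collect leaf tiles (x, y, size); tiles near the viewpoint are split further."""
    if out is None:
        out = []
    center_x, center_y = x + size / 2, y + size / 2
    dist = math.hypot(center_x - viewpoint[0], center_y - viewpoint[1])
    # Hypothetical level-of-detail rule: refine while the viewer is close
    # relative to the tile size and the minimum tile size is not yet reached.
    if size > min_size and dist < 2 * size:
        half = size / 2
        for dx in (0, half):
            for dy in (0, half):
                subdivide(x + dx, y + dy, half, viewpoint, min_size, out)
    else:
        out.append((x, y, size))
    return out
```

Calling `subdivide(0, 0, 64, (1, 1), 8)` yields small tiles around the corner where the viewpoint sits and coarse tiles far away, while the leaves still tile the whole square without gaps or overlap.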

CUDA-based S-BPF reconstruction algorithm acceleration method

The present invention discloses a CUDA-based S-BPF reconstruction algorithm acceleration method, which overcomes the problem in the prior art that conventional CT image reconstruction algorithms take a long time. The method comprises the steps of 1, reading a plurality of projections from a hard disk and calculating a constant C for the finite Hilbert inverse transformation in a CPU; 2, transmitting the plurality of projections from an internal memory to a video memory and deriving a back projection in a GPU to obtain a DBP image; 3, conducting the finite Hilbert inverse transformation on the DBP image obtained in step 2 and transmitting the obtained result from the video memory to the internal memory. According to the technical scheme of the invention, the method addresses the prior-art problem that, although GPU acceleration of the reconstruction algorithm yields an obvious speed-up, communication delay becomes the bottleneck limiting existing acceleration strategies. Experimental results show that the speed-up ratio obtained with the method is about 2 times that of existing strategies.
Owner:THE PLA INFORMATION ENG UNIV

Spectrum resource self-allocation method

The invention provides a spectrum resource self-allocation method, which comprises the following steps of: constructing a cognitive function-based game model from a Nash game model and a Stackelberg game model; constructing a cognitive function-based water-filling model by combining the cognitive function-based game model with the classical water-filling algorithm; and solving the cognitive function-based water-filling model through distributed free iteration to realize self-optimizing allocation of spectrum resources. During iteration, a user can effectively improve the convergence speed of the system by using an acceleration scheme. According to the invention, the utilization efficiency of channel resources is optimized, the channel rate of the scheme can reach the maximum value of the theoretical spectrum utilization rate, and good system performance can be obtained in different scenarios. Spectrum resources can be intelligently found and utilized, and spectrum resource utilization efficiency is improved to the maximum extent. An algorithm acceleration scheme is provided by utilizing the ability of a cognitive function to predict the result of competition, so that the operation speed can be greatly improved without reducing performance.
Owner:BEIJING JIAOTONG UNIV
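
The classical water-filling algorithm that the spectrum allocation entry above builds on can be sketched as follows. This Python example covers only the textbook single-user version, not the game model, cognitive function or acceleration scheme of the patent; the function name `water_filling`, its parameters and the use of bisection to find the water level are assumptions for illustration. Each channel with noise level n receives power max(0, mu - n), where the water level mu is chosen so the powers sum to the budget.

```python
def water_filling(noise, total_power, iters=100):
    """Allocate total_power across channels with the given noise levels."""
    # The water level lies between the quietest channel's noise and
    # the loudest channel's noise plus the whole budget.
    lo, hi = min(noise), max(noise) + total_power
    for _ in range(iters):
        mu = (lo + hi) / 2
        used = sum(max(0.0, mu - n) for n in noise)
        if used > total_power:
            hi = mu  # water level too high
        else:
            lo = mu  # water level too low or exactly right
    mu = (lo + hi) / 2
    return [max(0.0, mu - n) for n in noise]
```

For noise levels 1, 2 and 5 with a budget of 3, the water level settles at 3, so the two quiet channels receive 2 and 1 units of power and the noisy channel receives none.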

Zu-Chongzhi encryption algorithm acceleration method, system, storage medium and computer equipment

Active | CN110445601A | Improve encryption computing ability | Reduce real-time business application delay | Data stream serial/continuous modification | Computer equipment | Algorithm acceleration
The invention relates to the technical field of information security, and particularly discloses a Zu-Chongzhi encryption algorithm acceleration method. The method comprises executing the key initialization and plaintext logic operation processes of the Zu-Chongzhi (ZUC) encryption algorithm in parallel across multiple channels, based on a multi-channel cache ring mode. Based on the multi-channel cache ring mode, the ZUC algorithm is operated separately, achieving logic separation of the algorithm. The key initialization and plaintext logic operation are executed in parallel in multiple channels, which improves the data encryption operation capability and reduces real-time service application delay. The invention further discloses a Zu-Chongzhi encryption algorithm acceleration system, a storage medium and computer equipment.
Owner:BEIJING SANSEC TECH DEV