GPDSP-oriented large-scale matrix multiplication calculation method

A large-scale matrix multiplication calculation method for GPDSPs. It addresses the difficulty of exploiting GPDSP vector computation and achieves efficient large-scale matrix multiplication, close coordination, and convenient operation.

Active Publication Date: 2015-05-20
NAT UNIV OF DEFENSE TECH

AI Technical Summary

Problems solved by technology

The traditional cache-oriented block matrix multiplication method suits neither the GPDSP's non-cached vector array memory access mode nor the concurrent vector processing of its vector processing array, which makes it difficult to exploit the GPDSP's vector computing capability.



Examples


Embodiment Construction

[0036] The present invention will be further described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0037] Figure 1 is a schematic diagram of the simplified memory access structure model of the GPDSP targeted by the present invention in a specific application example. The system includes a CPU core unit and a DSP core unit. The DSP core unit includes several 64-bit vector processing array computing units, dedicated on-chip scalar memory and vector array memory, on-chip shared memory shared by the CPU core unit and the DSP core unit, and large-capacity off-chip DDR memory.

[0038] Let the number of DSP cores in the GPDSP be r, the number of vector processing array computing units per DSP core be p, the number of MAC (multiply-add) units per computing unit be q, the on-chip vector array memory capacity of the DSP core be s1 bytes, and the on-chip scalar memory capacity of the DSP core be s2 b...
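The parameters named in [0038] can be collected into a small model to see how they combine. The following is only an illustrative sketch: the class name `GpdspModel`, its methods, and the numeric values are hypothetical and do not come from the patent. It also assumes that each MAC unit completes one multiply and one add per cycle and that matrix elements are 64 bits wide, matching the 64-bit computing units mentioned in [0037].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpdspModel:
    """Hypothetical container for the parameters named in [0038]."""
    r: int   # number of DSP cores
    p: int   # vector processing array computing units per DSP core
    q: int   # MAC units per computing unit
    s1: int  # on-chip vector array memory per DSP core, in bytes

    def total_macs(self) -> int:
        # Every core has p computing units, each with q MAC units.
        return self.r * self.p * self.q

    def peak_flops_per_cycle(self) -> int:
        # Assumption: one multiply and one add per MAC unit per cycle.
        return 2 * self.total_macs()

    def elements_in_vector_memory(self, elem_bytes: int = 8) -> int:
        # Assumption: 8-byte (64-bit) elements.
        return self.s1 // elem_bytes


# Illustrative values only; they are not taken from the patent.
model = GpdspModel(r=2, p=16, q=3, s1=768 * 1024)
print(model.total_macs(), model.peak_flops_per_cycle(),
      model.elements_in_vector_memory())
```

A model like this gives an upper bound on how many matrix elements one DSP core can hold on chip at a time, which is the kind of constraint a block-size choice has to respect.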



Abstract

The invention discloses a GPDSP-oriented large-scale matrix multiplication calculation method comprising the following steps. S1: the CPU core allocates storage space on the off-chip DDR memory for matrix A, matrix B, and matrix C, and either generates initial data or transfers the data required for the calculation from other data sources. S2: the optimal block sizes MB, KB, and NB for block matrix multiplication are determined from the architectural features of the GPDSP system. S3: the CPU core logically partitions matrices A, B, and C into blocks according to the MB, KB, and NB values determined in step S2. S4: the CPU core of the GPDSP schedules a DSP core to perform the multiply-add calculation of the sub-block matrices (see the specification for the formula). S5: the calculation is complete. The method is simple in principle and convenient to operate. It makes full use of the general-purpose computing of the GPDSP's CPU core and of the strong parallel computing and high-bandwidth vector data loading capability of the DSP core vector processing array, and it significantly improves the DSP core's ratio of computation to memory access.
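The block partitioning in steps S3 and S4 can be illustrated with a generic blocked matrix multiplication. This is a minimal sketch, not the patented method: the function name `blocked_matmul` is hypothetical, the sub-block update is assumed to be the standard accumulate form (each sub-block of C gains the product of a sub-block of A and a sub-block of B), and everything runs sequentially in plain Python rather than being scheduled onto DSP cores. The block sizes are taken as inputs because the abstract does not say how S2 derives them.

```python
def blocked_matmul(a, b, mb, kb, nb):
    """Multiply a (M x K) by b (K x N), both lists of lists, block by block.

    mb, kb, nb play the role of the block sizes MB, KB, NB.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, mb):            # row blocks of A and C
        for j0 in range(0, n, nb):        # column blocks of B and C
            for k0 in range(0, k, kb):    # blocks along the shared dimension
                # Sub-block multiply-add (assumed standard form):
                # C[i0.., j0..] += A[i0.., k0..] * B[k0.., j0..]
                for i in range(i0, min(i0 + mb, m)):
                    row_c = c[i]
                    for kk in range(k0, min(k0 + kb, k)):
                        a_ik = a[i][kk]
                        row_b = b[kk]
                        for j in range(j0, min(j0 + nb, n)):
                            row_c[j] += a_ik * row_b[j]
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(blocked_matmul(a, b, mb=1, kb=2, nb=1))
```

The result does not depend on the block sizes; only the order in which data is touched changes, which is what makes the choice of MB, KB, and NB a tuning decision tied to the memory hierarchy.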

Description

Technical field

[0001] The present invention relates to general-purpose digital signal processors (General-Purpose Digital Signal Processor, GPDSP), and in particular to a large-scale matrix multiplication calculation method suitable for GPDSPs.

Background

[0002] Basic Linear Algebra Subprograms (BLAS) is one of the most commonly used core mathematical libraries for scientific computing. Vendors have released highly optimized BLAS implementations for their own processor platforms, such as IBM's ESSL, Intel's MKL, and AMD's ACML. General Matrix-Matrix Multiplication (GEMM) is the core algorithm of the BLAS library. GEMM is a typical compute-intensive and memory-intensive application that places very high demands on a processor's computing power, memory access bandwidth, and latency. Related literature research shows that GEMM calculation occupies a high-performan...


Application Information

IPC(8): G06F17/16, G06F7/523
Inventor: 刘仲, 陈书明, 万江华, 陈磊, 田希, 彭元喜, 陈虎, 扈啸, 孙永节, 陈胜刚, 孙海燕, 阳柳, 张雪萌, 马胜
Owner: NAT UNIV OF DEFENSE TECH