An adaptive matrix multiplication optimization method based on Loongson 3B

An adaptive matrix multiplication optimization method, applied to complex mathematical operations, that addresses the cumbersome programming and difficult debugging of existing approaches.

Status: Inactive
Publication Date: 2016-06-22
UNIV OF SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

Existing matrix multiplication optimization methods for the Loongson 3B hardware platform are generally cumbersome to program and difficult to debug.

Detailed Description of Embodiments

[0071] The adaptive matrix multiplication optimization method based on Loongson 3B of the present invention first divides the two source matrices (the multiplication matrix and the multiplied matrix) into sub-matrices (the unit multiplication matrix and the unit multiplied matrix, respectively). The on-chip direct cache memory accessor (DCA) of Loongson 3B prefetches a multiplication-matrix block of column length M and width K into the cache, and a memory-access state collection module is added to the DCA control code. At the same time, a multiplied-matrix block of column length K and width N is copied into the second-level cache. The direct register accessor (DRA) then prefetches, from the second-level cache, a multiplication-matrix block of column length l and width h and a multiplied-matrix block of length h and width g...
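
The two-level blocking described in this paragraph can be sketched in plain Python. This is a minimal illustration, not the patented implementation: the function name `blocked_matmul` and the loop order are assumptions, and the DCA and DRA prefetching that Loongson 3B performs in hardware is not modeled. The outer loops walk M x K and K x N blocks (the cache-level blocks), and the inner loops walk l x h and h x g tiles (the register-level blocks).

```python
def blocked_matmul(A, B, M, K, N, l, h, g):
    """Multiply A (rows x inner) by B (inner x cols) with two-level blocking.

    M, K, N are the cache-level block sizes and l, h, g the register-level
    tile sizes named in the text. The function name and loop order are
    illustrative assumptions; hardware prefetching is not modeled.
    """
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0.0] * cols for _ in range(rows)]
    # Outer level: an M x K block of A and a K x N block of B.
    for i0 in range(0, rows, M):
        i_end = min(i0 + M, rows)
        for k0 in range(0, inner, K):
            k_end = min(k0 + K, inner)
            for j0 in range(0, cols, N):
                j_end = min(j0 + N, cols)
                # Inner level: an l x h tile of A and an h x g tile of B.
                for i1 in range(i0, i_end, l):
                    for k1 in range(k0, k_end, h):
                        for j1 in range(j0, j_end, g):
                            for i in range(i1, min(i1 + l, i_end)):
                                row_a, row_c = A[i], C[i]
                                for k in range(k1, min(k1 + h, k_end)):
                                    a, row_b = row_a[k], B[k]
                                    for j in range(j1, min(j1 + g, j_end)):
                                        # Multiply-add into the accumulator.
                                        row_c[j] += a * row_b[j]
    return C
```

Block sizes that do not divide the matrix dimensions evenly are handled by clamping each block to the matrix edge, so the result matches an unblocked multiplication for any positive block parameters.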

Abstract

The invention discloses an adaptive matrix multiplication optimization method based on Loongson 3B. First, the multiplication matrix and the multiplied matrix are divided into sub-matrices such that each block is no larger than the second-level cache. The direct cache memory accessor prefetches a multiplication-matrix block of column length M and width K into the cache, while a multiplied-matrix block of column length K and width N is copied into the second-level cache. The direct register accessor then prefetches a multiplication-matrix block of column length l and width h and a multiplied-matrix block of length h and width g from the second-level cache into the registers, and multiply-add instructions perform the multiply-add operations. Information obtained through the memory-access state collection module is used to adaptively adjust the block parameters M, K, N, l, h, and g, yielding new block parameters and thereby achieving efficient adaptive optimization of matrix multiplication on the Loongson 3B platform.
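
The abstract does not state the rule used to adjust the block parameters, so the following is only a hypothetical sketch of what such a feedback step could look like. Everything in it is an assumption made for illustration: the names `BlockParams`, `footprint_bytes`, and `adjust`, the use of a miss rate as the collected memory-access state, the thresholds, the choice to adjust only K, and the reading of the cache constraint as "both blocks together fit in the second-level cache". The cache capacity is passed in as a parameter rather than taken from any Loongson 3B specification.

```python
from dataclasses import dataclass

DOUBLE_BYTES = 8  # size of one double-precision value


@dataclass
class BlockParams:
    """Cache-level block sizes M, K, N from the text."""
    M: int
    K: int
    N: int


def footprint_bytes(p):
    # Assumed footprint: one M x K block plus one K x N block of doubles.
    return (p.M * p.K + p.K * p.N) * DOUBLE_BYTES


def adjust(p, miss_rate, l2_bytes, high=0.10, low=0.02):
    """Hypothetical adjustment rule (not taken from the patent).

    Shrinks K when the observed miss rate is high, grows it when the miss
    rate is low, then halves K until the blocks fit in l2_bytes.
    """
    K = p.K
    if miss_rate > high:
        K = max(1, K // 2)
    elif miss_rate < low:
        K = K * 2
    candidate = BlockParams(p.M, K, p.N)
    # Enforce the constraint that the blocks do not exceed the cache size.
    while footprint_bytes(candidate) > l2_bytes and candidate.K > 1:
        candidate = BlockParams(p.M, candidate.K // 2, p.N)
    return candidate
```

For example, starting from `BlockParams(64, 64, 64)`, a high miss rate halves K to 32, while a low miss rate with ample cache capacity doubles it to 128. A fuller version would adjust M, N, l, h, and g in the same way.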

Description

Technical field

[0001] The invention belongs to the technical field of electrical digital data processing, and in particular relates to a method for optimizing a linear-system software package based on Loongson 3B.

Background

[0002] Loongson 3B is China's first eight-core central processing unit (CPU) with fully independent intellectual property rights. In high-performance computing, Loongson 3B needs the support of a basic linear algebra subroutine library. According to the officially released Loongson 3B processor user manual, the Loongson 3B processor adds a cache lock window, a direct register accessor (DRA), a direct cache memory accessor (DCA), and other functions to the Loongson 3A processor. Each CPU core uses a four-issue superscalar architecture and includes two 256-bit vector units and a 128x256-bit floating-point register file, which can store 512 double-precision floating-point n...
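
The register-file capacity quoted above follows from simple arithmetic: 128 registers of 256 bits each hold 128 x 256 / 64 = 512 values of 64 bits. The short sketch below reproduces that figure and adds a hypothetical helper, `tile_fits`, which checks whether a register-level tile with parameters l, h, g would fit. The helper assumes that the l x h tile, the h x g tile, and an l x g accumulator are all held in registers at once; that residency model is an assumption for illustration and is not stated in the source.

```python
REGISTERS = 128       # floating-point registers per core (from the text)
REGISTER_BITS = 256   # width of each register in bits (from the text)
DOUBLE_BITS = 64      # width of one double-precision value


def register_file_doubles(registers=REGISTERS, bits=REGISTER_BITS):
    """Number of double-precision values the register file can hold."""
    return registers * bits // DOUBLE_BITS


def tile_fits(l, h, g, capacity=None):
    """Hypothetical check: do the two input tiles and the accumulator fit?

    Assumes an l x h tile, an h x g tile, and an l x g accumulator are all
    resident in registers together (an illustrative assumption).
    """
    if capacity is None:
        capacity = register_file_doubles()
    return l * h + h * g + l * g <= capacity
```

Under this assumed model, an 8 x 8 x 8 tile needs 192 values and fits, while a 16 x 16 x 16 tile needs 768 values and does not.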

Application Information

IPC(8): G06F17/16
Inventors: 顾乃杰, 赵增, 张孝慈, 张明
Owner: UNIV OF SCI & TECH OF CHINA