A Realization Method of Vector Aggregate Load Instruction

An implementation method for vector aggregate load instructions, in the field of instruction implementation. It addresses three problems of existing approaches: the micro-operation management function of the vector buffer unit duplicates the function of the launch queue, multiple vector aggregate load instructions cannot be supported at the same time, and multiple vector aggregate load instructions therefore cannot be executed in parallel. The method reduces the number of micro-operations, reduces Perm operations, and improves program performance.

Active Publication Date: 2022-02-08
NAT UNIV OF DEFENSE TECH
7 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

After splitting in this way, three Perm operations for data conversion are added on top of the Load operation that fetches the data. This increases both the power consumption and the latency of instruction execution, which degrades performance.
In addition, the hardware design must add support for the Perm data-conversion operation.
[0005] The Chinese patent document with application number 201810668398.4 discloses a launch method and device for mixed execution of scalar and vector instructions. That method can be used to launch and manage vector aggregate load instructions, but it has the following three disadvantages: 1) The micro-operation management function of the vector buffer component duplicates the function of the launch queue, which wastes resources.
2) Because the vector buffer component contains micro-operation management logic, it consumes considerable resources. Constrained by the hardware resources of the whole chip, the vector buffer component cannot support multiple vector aggregate load instructions at the same time, so multiple vector aggregate load instructions cannot be executed in parallel and program performance is limited.
3) Because both the launch queue and the vector buffer component hold instructions to be sent to the pipeline, an additional arbitration unit is required, and ordering management between instructions becomes complicated.


Embodiment Construction

[0034] As shown in Figure 3, the implementation method of the vector aggregate load instruction in this embodiment includes the following steps:

[0035] 1) Split the vector aggregate load instruction into multiple ordinary load micro-operations according to the size of the vector elements;

[0036] 2) Send the split ordinary load micro-operations to the instruction queue;

[0037] 3) Wait in the instruction queue for the source operands of each ordinary load micro-operation to become ready; once they are ready, launch the corresponding ordinary load micro-operation, carrying its vector element number, to the storage pipeline;

[0038] 4) Execute the ordinary load micro-operation for a single element;

[0039] 5) Judge whether the execution succeeded. If it succeeded, go to the next step; otherwise, the launch queue needs to choose an opportunity to re-launch the memory access operation of that element to the pipeline to ob...
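
The visible steps can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware design: the names `LoadMicroOp`, `split_gather`, `run_gather`, and the `fail_once` parameter are hypothetical, source operands are assumed to be ready already, and a failed execution is modelled simply as putting the micro-operation back in the queue for a later launch.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class LoadMicroOp:
    """One single-element load split from a gather load (hypothetical model)."""
    element_no: int   # vector element number carried with the micro-op
    address: int      # memory address of this element


def split_gather(addresses):
    """Steps 1-2: create one ordinary load micro-op per element and queue it."""
    return deque(LoadMicroOp(i, addr) for i, addr in enumerate(addresses))


def run_gather(addresses, memory, fail_once=()):
    """Steps 3-5: launch each micro-op and re-launch any that fail.

    fail_once lists element numbers whose first execution is assumed to fail
    (for example, a cache miss); this is an illustrative stand-in only.
    """
    queue = split_gather(addresses)
    pending_failures = set(fail_once)
    result = [None] * len(addresses)
    launches = 0
    while queue:
        uop = queue.popleft()          # operands assumed ready: launch it
        launches += 1
        if uop.element_no in pending_failures:
            pending_failures.discard(uop.element_no)
            queue.append(uop)          # execution failed: re-launch later
            continue
        # Execution succeeded: the element number selects the destination slot.
        result[uop.element_no] = memory[uop.address]
    return result, launches


memory = {0x08: "c", 0x10: "a", 0x24: "d", 0x40: "b"}
values, launches = run_gather([0x40, 0x08, 0x24, 0x10], memory, fail_once={1})
print(values, launches)
```

Because every micro-operation carries its element number, a re-launched element can complete out of order without disturbing the others.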

Abstract

The present invention relates to the technical field of microprocessor design, and in particular to a method for implementing a vector aggregate load instruction. Each single-element load micro-operation is sent to the instruction queue together with its corresponding element number; after its operands are ready, the single-element load micro-operation is sent to the storage pipeline to obtain data; the obtained data is written into the corresponding element of the corresponding data cache entry; after the data of all elements of the data cache entry has been written, the result data is written from the data cache to the result bus, and execution of the vector aggregate load instruction is complete. The invention can effectively improve the execution performance of vector aggregate load instructions while making maximum use of the path of the ordinary load instruction. It is suitable for high-performance out-of-order superscalar microprocessors and has the advantages of simple implementation and high performance.
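
The way a data cache entry collects elements before the result is released can be sketched as follows. This is an illustrative model only: `DataCacheEntry`, `write_element`, and `collect` are hypothetical names, and "writing to the result bus" is represented by returning the assembled list.

```python
class DataCacheEntry:
    """Hypothetical model of one data cache entry holding a vector result."""

    def __init__(self, num_elements):
        self.data = [None] * num_elements
        self.valid = [False] * num_elements

    def write_element(self, element_no, value):
        """Write returned data into the slot named by the element number."""
        self.data[element_no] = value
        self.valid[element_no] = True

    def complete(self):
        """True once every element of the entry has been written."""
        return all(self.valid)


def collect(entry, returns):
    """Apply (element_no, value) pairs as they come back from the pipeline.

    Returns the full vector once all elements are written (standing in for
    the write to the result bus), or None if elements are still missing.
    """
    for element_no, value in returns:
        entry.write_element(element_no, value)
        if entry.complete():
            return list(entry.data)
    return None


entry = DataCacheEntry(4)
print(collect(entry, [(2, 30), (0, 10)]))   # still waiting on elements 1 and 3
print(collect(entry, [(3, 40), (1, 20)]))   # all four elements now present
```

In this model the arrival order of the elements does not matter, which is what lets single-element loads complete independently.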

Description

Technical field

[0001] The invention relates to the technical field of microprocessor design, in particular to a method for implementing a vector aggregate load instruction.

Background technique

[0002] To keep pace with the development of application programs and improve the efficiency of program execution, a variety of vector extensions have been added to mainstream instruction sets. Taking full advantage of the parallelism of vector operations can improve system performance. Among the vector extensions there is a class of vector aggregate load instructions (Gather Load, denoted GLoad), which differ greatly from ordinary load instructions. As shown in Figure 1(a), an ordinary load instruction loads the data at a contiguous range of addresses in the storage space into a register. For a vector aggregate load instruction, as shown in Figure 1(b), the address of each element of the vector is different. This instruction needs to fetch an element f...
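
The difference between the two kinds of load can be illustrated with a minimal sketch. It models memory as a Python list and is not tied to any particular instruction set; the function names `ordinary_load` and `gather_load` are hypothetical.

```python
def ordinary_load(memory, base, count):
    """Load `count` elements from one contiguous address range."""
    return [memory[base + i] for i in range(count)]


def gather_load(memory, addresses):
    """Load one element from each of several unrelated addresses."""
    return [memory[addr] for addr in addresses]


memory = list(range(100, 116))              # 16 words holding 100..115
print(ordinary_load(memory, 4, 4))          # contiguous: addresses 4, 5, 6, 7
print(gather_load(memory, [9, 0, 15, 3]))   # scattered: one address per element
```

The ordinary load needs a single base address, while the gather load needs a separate address for every element, which is why it is costlier to implement.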

Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F9/30, G06F9/38
CPC: G06F9/30094, G06F9/3867, G06F9/30007
Inventor: 郑重, 王永文, 孙彩霞, 王俊辉, 隋兵才, 倪晓强, 雷国庆, 黄立波, 郭维, 郭辉
Owner: NAT UNIV OF DEFENSE TECH