NEON vectorization conversion method for ARM (Advanced RISC Machine) binary code

A binary code conversion method, applied in the field of embedded virtual SIMD automatic parallelization, that addresses the problem of vectorization failing to reduce, or even increasing, the overall number of instructions, and achieves the effect of reducing the number of memory accesses and the time overhead.

Active Publication Date: 2016-01-13
XI AN JIAOTONG UNIV
2 Cites, 9 Cited by

AI Technical Summary

Problems solved by technology

Moreover, because the SLP algorithm works mainly on ARM instructions, only a small number of computation statements are optimized into NEON instructions. As a result, although the number of some computation instructions is reduced, data movement instructions between the NEON Q registers and the ARM R registers are added. In the results, the overall number of instructions is not reduced, and in general it may even increase.
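
The trade-off described above can be illustrated with a toy instruction count. This is only a sketch under stated assumptions, not a model taken from the patent: the function name `instruction_count_after_slp` and the parameter `transfers_per_group` (the number of R-to-Q and Q-to-R move instructions needed around each vector operation) are hypothetical.

```python
def instruction_count_after_slp(scalar_ops: int, lanes: int,
                                transfers_per_group: int) -> int:
    """Toy count of instructions after packing scalar ops into vector ops.

    scalar_ops          -- number of independent scalar computation instructions
    lanes               -- scalar ops packed into one NEON instruction (assumed)
    transfers_per_group -- register move instructions added per vector op (assumed)
    """
    groups, leftover = divmod(scalar_ops, lanes)
    # Each full group becomes one vector op plus its register transfers;
    # leftover scalar ops stay as they were.
    return groups * (1 + transfers_per_group) + leftover


# 8 scalar ops, 4 lanes, 4 assumed move instructions per vector op:
# 2 vector ops + 8 moves = 10 instructions, more than the original 8.
print(instruction_count_after_slp(8, 4, 4))
# With no transfers needed, the same code shrinks to 2 instructions.
print(instruction_count_after_slp(8, 4, 0))
```

Under these assumed numbers the vectorized version is longer than the scalar one, which is the effect the paragraph above attributes to the added data movement instructions.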

Embodiment Construction

[0077] The core of the NEON vectorization conversion method for ARM binary code of the present invention is as follows:

[0078] 1. In the ARM program targeted for optimization, some memory access instructions cannot be translated into NEON instructions because of their memory access patterns: accesses with non-linear address increments or decrements, accesses with unequal-length increments or decrements, accesses whose stride exceeds the limit of the NEON memory access instructions, and conditionally executed accesses (that is, conditional memory access instructions). Before translating ARM instructions into NEON instructions, the present invention first analyzes, in groups, the instructions that constitute the loop and the instructions that need to be translated, and strips them out to reduce the burden of the subsequent memory access pattern analysis. It then counts all memory access instructions in the target program to be optimized, and the similarities and differ...
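
The categories of untranslatable accesses listed in [0078] can be sketched as a small classifier. This is a minimal illustration under assumptions, not the patent's procedure: the `MemAccess` model, the `classify` function, the category labels, and the default limit `max_stride_elems=4` are all hypothetical, and the non-linear and unequal-length cases are merged into a single "non-constant-stride" label.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MemAccess:
    """One memory access instruction observed in a loop (hypothetical model)."""
    name: str
    addresses: List[int]       # byte addresses touched on successive iterations
    elem_size: int             # element size in bytes
    conditional: bool = False  # True for conditionally executed accesses


def classify(access: MemAccess, max_stride_elems: int = 4) -> str:
    """Return a label saying whether the access looks vectorizable.

    max_stride_elems is an assumed stride limit, expressed in elements.
    """
    if access.conditional:
        return "conditional"
    deltas = [b - a for a, b in zip(access.addresses, access.addresses[1:])]
    if not deltas:
        return "unknown"               # a single sample shows no pattern
    if len(set(deltas)) != 1:
        return "non-constant-stride"   # non-linear or unequal-length steps
    stride = abs(deltas[0])
    if stride == 0:
        return "loop-invariant"        # same address on every iteration
    if stride % access.elem_size != 0:
        return "unaligned-stride"
    if stride // access.elem_size > max_stride_elems:
        return "stride-too-large"
    return "vectorizable"


print(classify(MemAccess("load_a", [0, 4, 8, 12], elem_size=4)))
print(classify(MemAccess("load_b", [0, 4, 12, 24], elem_size=4)))
print(classify(MemAccess("load_c", [0, 32, 64], elem_size=4)))
```

Filtering accesses this way before translation keeps the later pattern analysis focused on the accesses that can actually be mapped to vector loads and stores.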

Abstract

The present invention discloses a NEON vectorization conversion method for ARM (Advanced RISC Machine) binary code. The method comprises the following steps: step 1, disassembly; step 2, control flow graph generation; step 3, loop detection; step 4, memory access analysis; step 5, instruction translation; and step 6, assembly instruction output. According to the method, after the ARM binary code is disassembled, a control flow graph is built and reaching-definition analysis is performed; the basic block containing the optimization target is located, and the memory access patterns in that basic block are analyzed. Based on the scheduling of free on-chip extension registers and core registers, part of the repeated memory access results are kept in free on-chip registers, so that the time overhead of program memory accesses is reduced by reading fast registers instead, thereby speeding up the program.
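
Steps 2 and 3 of the abstract (flow graph generation and loop detection) can be sketched as follows. The abstract does not say which loop detection algorithm is used, so this example assumes the standard dominator-based approach: an edge u -> v is a back edge, and so marks a loop, when v dominates u. The function names and the block labels in the sample graph are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def dominators(cfg: Dict[str, List[str]], entry: str) -> Dict[str, Set[str]]:
    """Iteratively compute the dominator set of every basic block.

    cfg maps each block label to its successor labels; every successor
    must also appear as a key.
    """
    nodes = set(cfg)
    preds: Dict[str, Set[str]] = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            pred_doms = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*pred_doms) if pred_doms else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def back_edges(cfg: Dict[str, List[str]], entry: str) -> List[Tuple[str, str]]:
    """Return edges (u, v) where v dominates u; each one closes a loop."""
    dom = dominators(cfg, entry)
    return sorted((u, v) for u, succs in cfg.items() for v in succs if v in dom[u])


# Hypothetical control flow graph of a simple counted loop.
cfg = {
    "entry": ["head"],
    "head": ["body", "exit"],
    "body": ["head"],
    "exit": [],
}
print(back_edges(cfg, "entry"))
```

Once the loop header and body blocks are known, the later steps (memory access analysis and instruction translation) can be restricted to those blocks.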

Description

【Technical field】

[0001] The invention belongs to the technical field of embedded virtual SIMD automatic parallelization, and in particular relates to a NEON vectorization conversion method for ARM binary code, applicable to accelerating low-level functions in fields such as image processing and matrix computation.

【Background technique】

[0002] The ARM processor has become the most popular embedded application processor because of its high performance and low power consumption. As users place increasingly stringent requirements on the execution time of ARM programs, some ARM programs that perform large amounts of data computation need to be accelerated. To accelerate an ARM program when the original ARM program cannot be obtained, the SIMD unit can be used to accelerate it at the binary code level.

[0003] At present, among the algorithms accelerated by SIMD instructions, the SLP algorithm is relatively complete, and according to the test results in rela...

Application Information

IPC(8): G06F9/38
Inventor: 梅魁志, 温哲西, 李博良, 张少愚, 刘辉, 黄雄, 高榕, 付帅, 伍健
Owner XI AN JIAOTONG UNIV