Optimized code decompiling method and system based on deep learning

A deep learning-based decompilation technology, applied in the fields of information security and software technology. It addresses the problems of hand-written decompilation rules, which are error-prone, subject to rule conflicts, and time-consuming to develop, and it achieves high accuracy and improved readability.

Pending Publication Date: 2022-07-01
INST OF INFORMATION ENG CAS

AI Technical Summary

Problems solved by technology

However, known instruction sets are very diverse, and summarizing decompilation rules for them often requires significant analyst effort (e.g., several years of development).
Worse, frequent changes in decompilation rules due to changes in...

Embodiment Construction

[0039] The present invention will be further described below through specific embodiments and accompanying drawings.

[0040] 1. The overall flow of the deep learning-based optimized code decompilation method of the present invention is shown in Figure 1 and includes the following steps:

[0041] 1.1. Data set construction: Obtain code pairs of low-level intermediate language (LIR) and high-level intermediate language (HIR) from the low-level programming language (LPL) and the high-level programming language (HPL), and perform data dependency analysis to obtain the graph-structured LIR and the sequence-structured HIR.
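As an illustration of the data dependency analysis in step 1.1, the following minimal sketch turns a list of toy three-address instructions into def-use edges, which is one simple way to obtain a graph-structured form. The instruction tuple format `(dest, op, operands)` and the function name `build_dependency_graph` are assumptions made for this example; the source does not specify the LIR format or the analysis algorithm.

```python
def build_dependency_graph(instructions):
    """Return def-use edges (producer_index, consumer_index) for a toy LIR.

    Each instruction is a tuple (dest, op, operands). This format is a
    hypothetical stand-in, not the LIR used by the invention.
    """
    last_def = {}  # register name -> index of the latest instruction defining it
    edges = []
    for idx, (dest, _op, operands) in enumerate(instructions):
        # An operand depends on the most recent instruction that defined it.
        for operand in operands:
            if operand in last_def:
                edges.append((last_def[operand], idx))
        if dest is not None:
            last_def[dest] = idx
    return edges


# Toy program computing c = (a + b) * a
lir = [
    ("r1", "load", ["a"]),
    ("r2", "load", ["b"]),
    ("r3", "add", ["r1", "r2"]),
    ("r4", "mul", ["r3", "r1"]),
    (None, "store", ["r4", "c"]),
]
edges = build_dependency_graph(lir)
print(edges)
```

Instructions become graph nodes and each edge points from the instruction that produces a value to one that consumes it. Memory dependencies and control flow are deliberately ignored here.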

[0042] 1.2. Deep model translation: Select a deep learning model suitable for code translation, use the constructed dataset to train the deep learning model, and learn the mapping rules between LIR and HIR. Using the trained deep learning model, the LIR of the LPL to be decompiled is translated into HIR.
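Before a translation model can be trained on LIR/HIR pairs, the pairs are typically converted into integer token IDs. The sketch below shows that generic preprocessing step only. It flattens the LIR into a token sequence for simplicity (the method described above uses a graph-structured LIR), and the token formats, special symbols, and helper names `build_vocab` and `encode` are assumptions for illustration, not part of the source.

```python
def build_vocab(sequences, specials=("<pad>", "<unk>", "<sos>", "<eos>")):
    """Assign an integer ID to every distinct token, special tokens first."""
    vocab = {tok: i for i, tok in enumerate(specials)}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(seq, vocab):
    """Map tokens to IDs, wrapping the sequence in start/end markers."""
    unk = vocab["<unk>"]
    return [vocab["<sos>"]] + [vocab.get(t, unk) for t in seq] + [vocab["<eos>"]]


# Hypothetical (LIR tokens, HIR tokens) training pairs
pairs = [
    (["load", "r1", "a"], ["t1", "=", "a"]),
    (["add", "r3", "r1", "r2"], ["t3", "=", "t1", "+", "t2"]),
]
src_vocab = build_vocab(src for src, _ in pairs)
tgt_vocab = build_vocab(tgt for _, tgt in pairs)
encoded = [(encode(s, src_vocab), encode(t, tgt_vocab)) for s, t in pairs]
print(encoded[0])
```

Tokens never seen during vocabulary construction map to `<unk>`, so the encoder does not fail on LIR from a new binary.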

[0043] 1.3. Generate HPL code: According to HIR, restore data flow, restore control structure, and generate HPL code of ...

Abstract

The invention relates to an optimized code decompilation method and system based on deep learning. The method comprises the following steps: obtaining a low-level intermediate language (LIR) and a high-level intermediate language (HIR) from a low-level programming language (LPL) and a high-level programming language (HPL) to serve as a training data set; training a deep learning model with the training data set so that it learns the mapping rules between the LIR and the HIR, and using the trained model to translate the LIR of the LPL to be decompiled into HIR; performing data flow recovery and control structure recovery on the HIR produced by the deep learning model to generate HPL code; and searching for active code similar to the generated HPL code with a similarity matching algorithm, then migrating the semantic information in the active code into the generated HPL code. The method converts LPL into HPL automatically and achieves high accuracy when decompiling both optimized and unoptimized binaries.
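The last step of the abstract, finding similar code and migrating its semantic information, can be sketched roughly as follows. This example uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure and treats identifier names as the semantic information to migrate. Both choices are assumptions for illustration: the source does not name the similarity matching algorithm, and the tokenizer, keyword list, and helper names are hypothetical.

```python
import re
from difflib import SequenceMatcher

KEYWORDS = {"int", "return", "for", "if", "else", "while"}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code):
    return TOKEN_RE.findall(code)


def is_identifier(tok):
    return (tok[0].isalpha() or tok[0] == "_") and tok not in KEYWORDS


def normalize(tokens):
    """Replace every identifier with a placeholder so only structure is compared."""
    return ["ID" if is_identifier(t) else t for t in tokens]


def best_match(generated, corpus):
    """Return the corpus snippet whose structure is most similar to the generated code."""
    gen = normalize(tokenize(generated))
    return max(
        corpus,
        key=lambda c: SequenceMatcher(None, gen, normalize(tokenize(c))).ratio(),
    )


def migrate_names(generated, reference):
    """Rename identifiers in the generated code using aligned names from the reference."""
    gen_toks, ref_toks = tokenize(generated), tokenize(reference)
    matcher = SequenceMatcher(None, normalize(gen_toks), normalize(ref_toks))
    renames = {}
    for a, b, size in matcher.get_matching_blocks():
        for k in range(size):
            g, r = gen_toks[a + k], ref_toks[b + k]
            if is_identifier(g) and is_identifier(r):
                renames.setdefault(g, r)  # keep the first alignment seen
    return " ".join(renames.get(t, t) for t in gen_toks)


generated = "int f1(int v1, int v2) { return v1 + v2; }"
corpus = [
    "int add(int lhs, int rhs) { return lhs + rhs; }",
    "int square(int x) { return x * x; }",
]
reference = best_match(generated, corpus)
print(migrate_names(generated, reference))
```

Comparing normalized token streams makes the match insensitive to the placeholder names a decompiler emits, which is why the names can then be copied over from the aligned positions.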

Description

Technical field

[0001] The invention belongs to the technical fields of software technology and information security, relates to software reverse engineering technology, and in particular relates to a deep learning-based optimized code decompilation method and system.

Background art

[0002] Compilation is usually the translation of a computer program written in a high-level language (such as C/C++) into a low-level language (such as assembly language or machine code) that runs on the target CPU (such as x86 or ARM). Because of the differences between the two programming languages (PLs), information that is unnecessary for execution (e.g., variable and function names) is often removed. During this process, optimization techniques also aim to minimize or maximize certain properties of the executable program, such as reducing its execution time, memory usage, or storage size. Note that the execution result of the optimized target program should be the same as that of the unoptimized original program....

Application Information

IPC(8): G06F8/53, G06F8/73, G06F8/74, G06F16/33, G06N3/08, G06N3/04
CPC: G06F8/53, G06F8/73, G06F8/74, G06F16/3344, G06N3/08, G06N3/044
Inventor: 梁瑞刚, 曹颖, 陈恺
Owner: INST OF INFORMATION ENG CAS