Library function identification and detection method and system based on convolutional autoencoder

A library function identification and detection technique based on a convolutional autoencoder (CAE). It addresses three problems with existing approaches: low accuracy, the failure of graph-structure-based identification, and large version spans between the functions being compared. The technique achieves high accuracy with low time and space complexity.

Active Publication Date: 2022-04-08
SHANDONG UNIV

AI Technical Summary

Problems solved by technology

However, library function identification has some particular characteristics. First, library functions have very few call relationships, so methods that rely on the control flow graph (CFG) for identification achieve low accuracy.
Second, function similarity analysis may have to cover a large version span. The internal structure of a function can change greatly across versions, which causes identification methods based on graph structure to fail.
Third, methods that rely on function syntax for recognition introduce expert knowledge. This inevitably brings human bias and makes the results of the method itself unstable.
For these reasons, the related methods are not well suited to library function matching.



Examples


Embodiment 1

[0047] This embodiment discloses a library function identification and detection method based on a convolutional autoencoder, and the specific steps are:

[0048] Step 1: Randomly collect software files of various versions and types from open platforms to form a dataset.

[0049] Step 2: Divide the collected dataset of binary files into a training set, a validation set, and a test set:

[0050] In the present invention, the dataset is split into three disjoint subsets: a training set, a validation set, and a test set. These are used for training, validation, and testing respectively, so that the generalization ability of the trained model can be evaluated on unseen binaries. During training, the validation set is used to select some of the hyperparameters.
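A minimal sketch of the three-way split follows. Several choices in it are assumptions, not details from the patent. It splits at the level of whole binary files, so that functions from one binary never appear in two subsets. It also uses illustrative 80/10/10 ratios and a fixed seed. The helper name `split_binaries` is hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_binaries(
    binaries: Sequence[str],
    train_frac: float = 0.8,  # illustrative ratio (assumption)
    val_frac: float = 0.1,    # illustrative ratio (assumption)
    seed: int = 0,            # fixed seed for a reproducible split (assumption)
) -> Tuple[List[str], List[str], List[str]]:
    """Split binary file names into disjoint train/validation/test lists.

    Hypothetical helper: splitting per binary file (not per function) is an
    assumption meant to keep near-identical functions out of two subsets.
    """
    if not 0 < train_frac < 1 or not 0 <= val_frac < 1 - train_frac:
        raise ValueError("fractions must leave a non-empty test share")
    items = list(dict.fromkeys(binaries))  # drop duplicate names, keep order
    random.Random(seed).shuffle(items)     # shuffle with a private RNG
    n_train = int(len(items) * train_frac)
    n_val = int(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]         # remainder goes to the test set
    return train, val, test


names = [f"bin_{i}" for i in range(20)]
train, val, test = split_binaries(names)
```

With 20 binaries, this yields 16 training, 2 validation, and 2 test files, and no name appears in more than one list.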

[0051] Step 3: Count consecutive pairs of opcodes (i.e., opcode bi-grams) to construct a co-occurrence matrix for each function, which is used as input to train the CAE model.
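The following sketch shows one way to build the opcode bi-gram co-occurrence matrix. Its inputs are assumptions. Each instruction is taken to be a disassembled text line such as `mov eax, 1`, and the opcode is taken to be its first token, which is a simplification for prefixed instructions such as `rep movsb`. The opcode vocabulary is a fixed name-to-index mapping. Bi-grams involving unknown opcodes are skipped. The helper names `extract_opcodes` and `cooccurrence_matrix` are hypothetical, and the patent's own vocabulary and any normalization are not shown in this excerpt.

```python
from typing import Dict, List, Sequence


def extract_opcodes(instructions: Sequence[str]) -> List[str]:
    """Keep only the mnemonic (first token) of each disassembled instruction."""
    return [line.split()[0].lower() for line in instructions if line.strip()]


def cooccurrence_matrix(opcodes: Sequence[str], vocab: Dict[str, int]) -> List[List[int]]:
    """Count opcode bi-grams: m[i][j] += 1 each time opcode i is directly followed by opcode j."""
    n = len(vocab)
    m = [[0] * n for _ in range(n)]
    for a, b in zip(opcodes, opcodes[1:]):
        if a in vocab and b in vocab:  # skip pairs with out-of-vocabulary opcodes
            m[vocab[a]][vocab[b]] += 1
    return m


func = ["push rbp", "mov rbp, rsp", "mov eax, 0", "pop rbp", "ret"]
vocab = {"push": 0, "mov": 1, "pop": 2, "ret": 3}
ops = extract_opcodes(func)
m = cooccurrence_matrix(ops, vocab)
```

A function with k instructions contributes at most k - 1 bi-gram counts. The matrix size depends only on the vocabulary, so every function yields a fixed-size input for a convolutional model.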

[0052] 3.1...

Embodiment 2

[0084] The purpose of this embodiment is to provide a computing device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the program, the processor implements the steps of the above method.

Embodiment 3

[0086] The purpose of this embodiment is to provide a computer-readable storage medium.

[0087] A computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the steps of the above-mentioned method are executed.



Abstract

The present disclosure proposes a library function identification and detection method and system based on a convolutional autoencoder. The method proceeds as follows:

- Extract the instruction sequence corresponding to each binary function in each piece of software to be tested, then extract the operation codes (opcodes) from that sequence.
- Count consecutive opcode sequences to construct a co-occurrence matrix for each function.
- Use the co-occurrence matrices as input to train the convolutional autoencoder model, which extracts bottleneck features.
- Use the trained model to encode the library functions.
- Perform similarity analysis on the encoded library functions and identify the library function with the highest similarity coefficient as the final match.

The method is broadly applicable, can directly label newly introduced matching objects, and obtains good results.
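The matching step can be sketched as a nearest-neighbour search over the encoded functions. The abstract does not name the similarity coefficient, so cosine similarity here is an assumed stand-in. The embeddings are assumed to be plain float vectors taken from the CAE bottleneck. The names `cosine_similarity` and `best_match`, and the toy library below, are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def best_match(query: Sequence[float], library: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return (name, score) of the library embedding most similar to the query."""
    if not library:
        raise ValueError("library of encoded functions is empty")
    return max(
        ((name, cosine_similarity(query, emb)) for name, emb in library.items()),
        key=lambda pair: pair[1],
    )


# Toy bottleneck embeddings (made up for illustration, not real CAE output).
library = {
    "memcpy": [1.0, 0.0, 0.0],
    "strlen": [0.0, 1.0, 0.0],
    "malloc": [0.0, 0.0, 1.0],
}
name, score = best_match([0.9, 0.1, 0.0], library)
```

Here the query is closest to `memcpy`, with a score of about 0.994. A real system would likely also apply a minimum-score threshold before accepting a match. That threshold is an assumption; the abstract only states that the highest-scoring function is taken.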

Description

Technical field

[0001] The present disclosure belongs to the field of computer technology, and in particular relates to a library function identification and detection method and system based on a convolutional autoencoder.

Background art

[0002] The statements in this section merely provide background information related to the present disclosure and do not necessarily constitute prior art.

[0003] Binary code analysis, also known as binary analysis, is the practice of analyzing a program's raw binary files to recover its internal design and implementation. For code analysts, binary code contains a large amount of retrievable information. This includes code (instructions, basic blocks, and functions), structure (control flow and data flow), and data (global variables and stack variables). Binary analysis also gives a fundamental picture of a program's behavior, because computers directly execute binaries (executables) rather than source code. In situation...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F21/56, G06F17/15, G06F17/16, G06F8/52, G06F8/53, G06N3/04, G06N3/08
CPC: G06F21/563, G06F21/565, G06F17/15, G06F17/16, G06F8/52, G06F8/53, G06N3/08, G06N3/045
Inventors: 王风宇, 刘学谦, 孔健
Owner: SHANDONG UNIV