A Program Source Code Encoding Method Based on Code Attribute Tensor

A source code encoding method, applied in the field of program source code encoding based on the code attribute tensor. It addresses three problems: existing encodings of program source code cannot fully reflect the semantic characteristics of the program, semantic information of the code is lost, and false positive and false negative rates are high.

Active Publication Date: 2020-09-29
INST OF SOFTWARE - CHINESE ACAD OF SCI
Cites: 7 | Cited by: 0
AI Technical Summary

Problems solved by technology

However, program source code contains more semantic structure than natural language, such as control dependencies and data dependencies, so treating it directly as plain text loses code semantic information.
This loss of semantic information prevents the machine learning model from fully learning the characteristics of the source code, which lowers prediction accuracy and leads to high false positive and false negative rates in defect or vulnerability detection.
[0004] To solve the problem that existing encodings of program source code cannot fully reflect the semantic characteristics of the program, the present invention proposes the concept of a code attribute tensor and an algorithm for encoding program source code into a code attribute tensor.

Examples

Embodiment Construction

[0047] The present invention is further described below with reference to the accompanying drawings.

[0048] This embodiment is a program source code encoding method based on the code attribute tensor. The overall process is shown in Figure 1 and mainly includes the following steps:

[0049] 1) Generate a code attribute graph for the program source code. The process is shown in Figure 2 and is described as follows:

[0050] 1a) Generate an AST for the program source code. Assign each node a code attribute, whose value is the code the node represents; a type attribute, whose value is the statement type of that code; and an order attribute, which reflects the ordered structure of the tree. Go to 1b).

[0051] 1b) Generate a CFG for the program source code, label each edge with its jump condition, and go to 1c).

[005...
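
Steps 1a) and 1b) can be illustrated with a minimal sketch. The patent text here does not name a target language or tool, so this example assumes Python source and uses Python's standard `ast` module as a stand-in. The function names `annotate_ast` and `build_cfg`, the record layout, and the `"True"`/`"False"` edge labels are hypothetical choices made for illustration only, and the CFG builder handles only straight-line statements and `if`/`else`.

```python
import ast


def annotate_ast(source):
    """Step 1a sketch: one record per AST node with code, type and order."""
    records = []

    def visit(node, order, parent):
        node_id = len(records)
        records.append({
            "id": node_id,
            "parent": parent,
            # Assumption: the Python AST class name stands in for "statement type".
            "type": type(node).__name__,
            # Source text covered by the node; empty if it has no location info.
            "code": ast.get_source_segment(source, node) or "",
            # Position among siblings, preserving the ordered tree structure.
            "order": order,
        })
        for i, child in enumerate(ast.iter_child_nodes(node)):
            visit(child, i, node_id)

    visit(ast.parse(source), 0, None)
    return records


def build_cfg(source):
    """Step 1b sketch: CFG whose edges carry a jump-condition label."""
    nodes = []  # index is the CFG node id, value is the source text
    edges = []  # (src, dst, label); label is "" for unconditional edges

    def add_node(text):
        nodes.append(text)
        return len(nodes) - 1

    def build_block(stmts, preds):
        # preds holds (node_id, label) pairs that flow into the next statement.
        for stmt in stmts:
            if isinstance(stmt, ast.If):
                cond = add_node(ast.get_source_segment(source, stmt.test) or "")
                edges.extend((p, cond, label) for p, label in preds)
                then_exits = build_block(stmt.body, [(cond, "True")])
                else_exits = build_block(stmt.orelse, [(cond, "False")])
                preds = then_exits + else_exits
            else:
                n = add_node(ast.get_source_segment(source, stmt) or "")
                edges.extend((p, n, label) for p, label in preds)
                preds = [(n, "")]
        return preds

    entry = add_node("ENTRY")
    exits = build_block(ast.parse(source).body, [(entry, "")])
    exit_node = add_node("EXIT")
    edges.extend((p, exit_node, label) for p, label in exits)
    return nodes, edges


source = "x = 1\nif x > 0:\n    y = x + 2\nz = 3\n"
records = annotate_ast(source)
cfg_nodes, cfg_edges = build_cfg(source)
```

In this sketch the `order` attribute is simply the index of a node among its siblings, and a missing `else` branch is represented by a `"False"` edge from the condition node straight to the statement that follows the `if`.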

Abstract

The invention relates to a program source code encoding method based on a code attribute tensor. The method comprises the following steps: generating a code attribute graph for the program source code; creating and initializing a symbol table, a node table and a code attribute tensor; encoding the data types of the AST nodes and the operation relationships between AST nodes into the code attribute tensor; encoding the parent-child relationships between AST nodes into the code attribute tensor; and encoding the adjacency relationships between CFG nodes into the code attribute tensor, then outputting the final code attribute tensor. The invention proposes the concept of a code attribute tensor and an algorithm for encoding program source code into a code attribute tensor, in order to solve the problem that existing encodings of program source code cannot fully reflect the semantic characteristics of the program. The method encodes program source code into tensor-form data without losing semantic information, so that the data can serve as input to a machine learning model and support subsequent static program analysis.
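
The encoding steps in the abstract can be sketched as follows. The visible text does not give the tensor's actual layout, so everything about the layout here is an assumption made for illustration: a three-channel tensor of shape (3, N, N) over N nodes, with type ids on the diagonal of channel 0, AST parent-child edges in channel 1, and CFG adjacency in channel 2. The function name `encode_tensor` and its arguments are hypothetical, and operation relationships between nodes are not modelled.

```python
def encode_tensor(node_types, ast_edges, cfg_edges):
    """Illustrative encoding under an assumed (3, N, N) layout, not the patent's."""
    # Symbol table: maps each distinct type name to a positive integer id.
    symbol_table = {}
    for t in node_types:
        symbol_table.setdefault(t, len(symbol_table) + 1)

    # Node table: here simply the position of each node in node_types.
    n = len(node_types)

    # Initialize the tensor with zeros (nested lists keep the sketch stdlib-only).
    tensor = [[[0] * n for _ in range(n)] for _ in range(3)]

    # Channel 0 (assumed): type id of node i stored at position (i, i).
    for i, t in enumerate(node_types):
        tensor[0][i][i] = symbol_table[t]

    # Channel 1 (assumed): AST parent-child relationships.
    for parent, child in ast_edges:
        tensor[1][parent][child] = 1

    # Channel 2 (assumed): CFG adjacency relationships.
    for src, dst in cfg_edges:
        tensor[2][src][dst] = 1

    return symbol_table, tensor


symbol_table, tensor = encode_tensor(
    node_types=["Assign", "If", "Assign"],
    ast_edges=[(1, 2)],          # node 2 is a child of node 1 in the AST
    cfg_edges=[(0, 1), (1, 2)],  # control flows 0 -> 1 -> 2
)
```

A fixed-shape tensor of this kind can be fed to a machine learning model directly, which is the use the abstract describes; a real implementation would follow the layout defined in the full patent description.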

Description

Technical field

[0001] The invention belongs to the technical field of computers and relates to a program source code encoding method based on code attribute tensor.

Background technique

[0002] With the development of the computer industry, computer software has become an indispensable part of life, and computer systems are widely used in various industries, including medical, education, military, political and new retail fields. With the rapid development and popularization of computer systems, how to improve software quality and ensure the credibility of its behavior has become an important issue of common concern in both academia and industry. The static analysis technology of programs is a common program analysis technology, which does not need to run the program itself, but only checks the correctness of the program by analyzing or checking the source program's syntax, structure, process, interface and other static information. The method is widely used due to its ea...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F8/30
CPC: G06F8/30
Inventor: 段旭, 吴敬征, 武延军, 罗天悦, 杨牧天, 倪琛
Owner: INST OF SOFTWARE - CHINESE ACAD OF SCI