Looking for breakthrough ideas for innovation challenges? Try Patsnap Eureka!

Code classification method based on neural network linguistic model

A language model, code classification technology, applied in the field of code classification based on neural network language model, can solve problems such as dimensional disaster

Active Publication Date: 2017-09-29
UNIV OF ELECTRONICS SCI & TECH OF CHINA
View PDF4 Cites 15 Cited by
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

[0003] Based on the above technical problems, the present invention provides a code classification method based on a neural network language model, aiming to solve the technical problem of dimension disaster caused by symbolic representation methods during code classification

Method used

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
View more

Image

Smart Image Click on the blue labels to locate them in the text.
Viewing Examples
Smart Image
  • Code classification method based on neural network linguistic model
  • Code classification method based on neural network linguistic model
  • Code classification method based on neural network linguistic model

Examples

Experimental program
Comparison scheme
Effect test

Embodiment Construction

[0050] All the features disclosed in this specification, except mutually exclusive features and / or steps, can be combined in any way.

[0051] The present invention will be described in detail below in conjunction with the accompanying drawings.

[0052] A code classification method based on a neural network language model, comprising the following steps:

[0053] Step 1: Use the tool pycparser to convert the code into an AST tree (Abstract Syntax Tree).

[0054] Step 2: Initialize AST tree node c i The vector vec(c i ), that is, assign a random value vector to each node in the AST tree; the node c i Central African leaf node p k The vector is vec(p k ) 1 , the non-leaf node p k child node t x The vector of vec(t x ), where vec(p k ) 1 ∈vec(c i ), vec(t x )∈vec(c i ), where i represents the sequence number of the node, k represents the sequence number of the non-leaf node, and x represents the sequence number of the child node.

[0055] Step 3: In order to make ...

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
Login to View More

PUM

No PUM Login to View More

Abstract

The invention belongs to the field of the software engineering, and discloses a code classification method based on a neural network linguistic model. The method comprises the following steps: firstly converting a code to an AST tree, initializing a vector of a node ci of the AST tree, and to obtain a reconstitution vector of a non-leaf node pk by using a vector of a child node tx; updating the vector of the node ci by using an AST-Node2Vec model, if the circulation condition is not satisfied, continuously circulating; if the circulation condition is satisfied, outputting the AST tree with the updated node vector and the reconstitution vector of the updated non-leaf node; and using the AST tree with the updated node vector and the reconstitution vector of the updated non-leaf node as the input of a convolution neutral network based on the tree, and completing the code classification by using the convolution neutral network based on the tree. The method is used for classifying the codes so that the problem of dimension curse can be effectively avoided, the semantically similarity can be displayed, and the codes can be classified better according to the function.

Description

technical field [0001] The invention relates to a code classification method, in particular to a code classification method based on a neural network language model, which can classify codes according to functions. Background technique [0002] Hindle et al. used statistical methods to compare programming languages ​​with natural languages ​​and found that they had very similar statistical properties. These features are very difficult for humans to capture, but they demonstrate that learning-based methods can be applied to the field of code analysis. Code analysis methods based on machine learning have been studied for a long time, relying on a large number of artificial features when solving problems such as code error detection and code duplication analysis. For a specific problem, these features require a large amount of labeled data. Moreover, the data representation of this method is a one hot representation, that is, an N-dimensional vector is used to encode the N wo...

Claims

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
Login to View More

Application Information

Patent Timeline
no application Login to View More
Patent Type & Authority Applications(China)
IPC IPC(8): G06F11/36G06F17/30G06N3/08
CPCG06F11/3608G06F16/35G06N3/08
Inventor 屈鸿杨林川涂强张书州王淼颜志鹏王一鸣
Owner UNIV OF ELECTRONICS SCI & TECH OF CHINA
Who we serve
  • R&D Engineer
  • R&D Manager
  • IP Professional
Why Patsnap Eureka
  • Industry Leading Data Capabilities
  • Powerful AI technology
  • Patent DNA Extraction
Social media
Patsnap Eureka Blog
Learn More
PatSnap group products