Source code author identification method based on deep belief network

A source code author identification technology based on a deep belief network, applied in the field of Web mining and information extraction. It aims to achieve strong robustness, improved efficiency, and broad application prospects.

Active Publication Date: 2018-06-01
BEIJING INSTITUTE OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

[0005] At present, few studies apply deep learning to source code author identification.



Examples


Embodiment

[0054] This embodiment describes the process of applying the source code author identification method based on a deep belief network according to the present invention, as shown in Figure 1.

[0055] As Figure 1 shows, the specific steps are as follows:

[0056] Step 1): The source code data acquisition module constructs the source code data set and preprocesses the source code data;

[0057] Crawl source code from a source code website and save it to the local computer. The source code website can be, for example, GitHub, at https://github.com/;

[0058] Preprocess the collected source code to obtain each source code author and the collection of source code files written by that author;
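
The preprocessing result in [0058], a mapping from each author to the files that author wrote, can be sketched as follows. This is a minimal illustration and not the patent's implementation: it assumes the crawled code has already been saved locally under a hypothetical layout `root/<author>/...`, and the function name `group_files_by_author` and the extension list `SOURCE_SUFFIXES` are invented for this example.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical set of extensions treated as source code; adjust as needed.
SOURCE_SUFFIXES = {".java", ".py", ".c", ".cpp"}


def group_files_by_author(root):
    """Map each author directory under `root` to a sorted list of source files.

    Assumes the layout root/<author>/... (an assumption of this sketch).
    """
    root = Path(root)
    files_by_author = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        parts = path.relative_to(root).parts
        if len(parts) < 2:
            # A file directly under root has no author directory; skip it.
            continue
        # The first path component below root is taken as the author name.
        files_by_author[parts[0]].append(path)
    return dict(files_by_author)
```

Each key of the returned dictionary then serves as a class label, and each file under it as one sample for that label.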

[0059] Step 2): For each source code file, the source code feature extraction module extracts source code features using a method based on the continuous n-gram code segment model;

[0060] A code segment is a string of fields in source code separated by blanks,...
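
A continuous n-gram over code segments can be sketched as follows. The definition in [0060] is cut off in this excerpt, so the sketch assumes only what is visible: a code segment is taken to be a run of non-blank characters delimited by whitespace. The function names and the default `n=2` are illustrative choices, not values from the patent.

```python
from collections import Counter


def code_segments(source):
    """Split source text into code segments, assumed here to be
    whitespace-delimited runs of characters."""
    return source.split()


def continuous_ngrams(segments, n):
    """Return every run of n consecutive code segments as a tuple."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(segments[i:i + n]) for i in range(len(segments) - n + 1)]


def ngram_counts(source, n=2):
    """Count how often each continuous n-gram occurs in one source file."""
    return Counter(continuous_ngrams(code_segments(source), n))
```

For example, `continuous_ngrams(code_segments("int x = 0;"), 2)` yields the three pairs `("int", "x")`, `("x", "=")`, and `("=", "0;")`. Counts of this kind are one plausible way to turn a file into features for a classifier.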


Abstract

The invention discloses a source code author identification method based on a deep belief network, and belongs to the field of Web mining and information extraction. The method includes the following steps: constructing a source code data set and preprocessing the source code data; extracting source code features based on a continuous n-gram code segment model; training a deep belief network model on training source code file samples; and using the trained deep belief network model to identify the author of a source code file and output the author identification result for that file. The method converts the source code author identification problem into a classification problem and identifies the author of the source code through the deep belief network, so that the performance and efficiency of author identification are improved. The method has broad application prospects in fields such as information retrieval, information security, and computer forensics.
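
A deep belief network is commonly built by stacking restricted Boltzmann machines (RBMs) and pretraining them one layer at a time. The sketch below shows only that unsupervised stage, using one-step contrastive divergence (CD-1) in pure Python. It is an illustration under stated assumptions, not the patent's model: the layer sizes, learning rate, epoch count, binary input encoding, and all names (`RBM`, `pretrain_stack`) are hypothetical, and the supervised output layer that would map the top-level features to author labels is not shown.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class RBM:
    """A tiny Bernoulli restricted Boltzmann machine trained with CD-1."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_vis = [0.0] * n_visible
        self.b_hid = [0.0] * n_hidden

    def hidden_probs(self, v):
        return [sigmoid(self.b_hid[j] + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.b_hid))]

    def visible_probs(self, h):
        return [sigmoid(self.b_vis[i] + sum(h[j] * self.w[i][j] for j in range(len(h))))
                for i in range(len(self.b_vis))]

    def train(self, data, epochs=20, lr=0.1):
        for _ in range(epochs):
            for v0 in data:
                # Positive phase: hidden activations driven by the data.
                h0 = self.hidden_probs(v0)
                h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
                # Negative phase: one reconstruction step.
                v1 = self.visible_probs(h0_sample)
                h1 = self.hidden_probs(v1)
                for i in range(len(v0)):
                    for j in range(len(h0)):
                        self.w[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
                for i in range(len(v0)):
                    self.b_vis[i] += lr * (v0[i] - v1[i])
                for j in range(len(h0)):
                    self.b_hid[j] += lr * (h0[j] - h1[j])


def pretrain_stack(data, layer_sizes, seed=0):
    """Greedy layer-wise pretraining: each RBM is trained on the
    hidden activations produced by the layer below it."""
    rng = random.Random(seed)
    layers, inputs = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(len(inputs[0]), n_hidden, rng)
        rbm.train(inputs)
        layers.append(rbm)
        inputs = [rbm.hidden_probs(v) for v in inputs]
    return layers, inputs
```

Calling `pretrain_stack(vectors, [3, 2])` on a list of equal-length 0/1 feature vectors returns the trained layers together with the top-layer feature vector for each input. A practical system would more likely rely on a numerical library than on nested Python lists.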

Description

Technical field

[0001] The invention relates to a source code author identification method based on a deep belief network, which belongs to the field of Web mining and information extraction.

Background technique

[0002] Existing source code author identification methods mainly include ranking methods, statistical analysis methods, shallow-structure machine learning classification methods, and similarity measurement methods.

[0003] Source code author identification based on ranking methods includes ranking methods based on information retrieval and ranking methods based on author portraits. The core idea of the ranking method based on information retrieval is to use information retrieval techniques to identify the author of the source code. First, convert the source code into a string sequence of items such as operators and keywords; second, convert the string sequence into an n-gram sequence; then, build an index for all source code; finally, retrieve the source code whose a...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F8/74, G06N3/08, G06N99/00
CPC: G06F8/74, G06N3/084, G06N20/00
Inventor: 张春霞, 王森, 武嘉玉, 王树良, 牛振东, 张佳籴, 黄达友, 张沛炎
Owner: BEIJING INSTITUTE OF TECHNOLOGY