
Clone code detection method capable of automatically constructing and utilizing pseudo-clone corpus

An automatic corpus construction and clone code detection technology, applied in the fields of program analysis and machine learning. It addresses the lack of clone detection algorithms that combine high accuracy with good time and space efficiency, and the general difficulty of code clone detection, achieving improved efficiency, low cost, and a simple, fast process.

Active Publication Date: 2020-02-28
TIANJIN UNIV
7 Cites, 6 Cited by

AI Technical Summary

Problems solved by technology

There is currently a lack of detection algorithms that offer both high accuracy and good time and space efficiency, and code clone detection remains a relatively difficult research topic.


Embodiment Construction

[0019] The technical solution of the present invention will be described in detail below in conjunction with the accompanying drawings and embodiments.

[0020] The clone detection method of the present invention works at method granularity: it detects pairs of similar methods in Java projects and judges them to be clone pairs. First, a principled procedure determines the most basic code units, so that these units effectively represent the lexical and syntactic information of the source code that is useful for clone detection. The basic code units are then embedded as vectors, and a supervised classifier detects clone pairs. The invention also proposes a new way of training a neural code clone detection model on a pseudo-training corpus, which allows a large-scale pseudo-training dataset to be constructed automatically at zero cost.
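The embed-then-classify idea in [0020] can be illustrated with a minimal sketch. It is not the patented model: mean pooling stands in for the trained encoder, and a fixed L2-distance threshold stands in for the supervised classifier. The function names, the toy embedding table, and the threshold value are all assumptions made for illustration.

```python
import math

def method_vector(units, embedding):
    """Average the vectors of a method's code units (mean pooling, an assumption)."""
    dim = len(next(iter(embedding.values())))
    total = [0.0] * dim
    known = 0
    for unit in units:
        vec = embedding.get(unit)
        if vec is None:
            continue  # skip units missing from the dictionary
        known += 1
        for i, x in enumerate(vec):
            total[i] += x
    if known == 0:
        return total
    return [x / known for x in total]

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def is_clone_pair(units_a, units_b, embedding, threshold=0.5):
    """Hypothetical decision rule: a clone pair if the method vectors are close."""
    vec_a = method_vector(units_a, embedding)
    vec_b = method_vector(units_b, embedding)
    return l2_distance(vec_a, vec_b) <= threshold

# Toy {code unit: vector} dictionary, invented for this example.
toy_embedding = {"get": [1.0, 0.0], "Name": [0.0, 1.0], "set": [-1.0, 0.0]}
same = is_clone_pair(["get", "Name"], ["get", "Name"], toy_embedding)
different = is_clone_pair(["get", "Name"], ["set", "Name"], toy_embedding)
```

In a trained system the threshold rule would be replaced by a learned classifier over the two method representations; the sketch only shows how code units, vectors, and a pairwise decision fit together.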

[0021] Figure 1 shows a flow chart of a clone code detect...


Abstract

The invention discloses a clone code detection method capable of automatically constructing and utilizing a pseudo-clone corpus. The method comprises the following steps: step 1, automatically constructing a pseudo-training corpus; step 2, determining the most effective code unit for the clone detection task, namely segmenting the source code into the basic code units required by clone detection using the BPE method; step 3, splicing tokens according to the BPE code units determined in step 2 and performing word embedding, that is, training a word embedding model on all methods in the corpus, each represented as a statement sequence of token characters, and generating a {code unit: vector} dictionary; step 4, building and training a simple and effective BiLSTM classification model for code clone detection, then classifying two methods with an L2-norm algorithm to judge whether they form a clone pair. The invention provides a clone code detection tool and achieves better detection of Type 1, Type 2 and Type 3 clones.
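Step 2 relies on BPE (byte pair encoding) to split tokens into basic code units. The sketch below shows the generic BPE procedure (repeatedly merging the most frequent adjacent symbol pair), not the patent's specific configuration; the function names, the tie-breaking rule, and the sample identifiers are assumptions made for illustration.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of tokens."""
    vocab = Counter(tuple(w) for w in words)  # each token starts as single characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        # Most frequent pair; ties broken lexicographically so runs are deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges

def segment(word, merges):
    """Split a token into code units by applying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)

# Hypothetical identifiers taken from Java-style method bodies.
sample_tokens = ["getName", "getValue", "setName", "setValue"]
sample_merges = learn_bpe(sample_tokens, 10)
units = segment("getName", sample_merges)
```

The resulting units are what a word embedding model would then be trained on in step 3; the number of merges controls how coarse the units are.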

Description

Technical field

[0001] The invention relates to the fields of program analysis and machine learning, and in particular to a program code clone detection method.

Background

[0002] The prior art involved in the present invention is as follows:

[0003] (1) Clone code detection: Clone code refers to code fragments with similar text or similar functions in software projects. Studies have shown that clone code makes up about 7% to 23% of a system, and in some cases as much as 50%. According to the degree of code similarity, code clones can be divided into four types. Type 1 clone: after excluding differences in formatting such as newlines, blanks, and tabs, as well as differences in comments, the two code fragments are exactly the same. Type 2 clone: two code fragments meet the definition of a Type 1 clone except for differences in constant values and in identifiers such as variable names and function names, then these two c...
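The Type 1 and Type 2 definitions can be made concrete with a small rule-based check. This is only an illustration of the definitions, not the detection method of the invention: the regular expressions, the abbreviated keyword set, and the function names are assumptions, and the sketch ignores edge cases such as comment markers inside string literals.

```python
import re

# Abbreviated, hypothetical subset of Java keywords that must not be abstracted away.
KEYWORDS = {"int", "return", "if", "else", "for", "while", "void", "public", "static"}

def tokens(code):
    """Strip comments and layout, then split into identifiers, numbers, and symbols."""
    code = re.sub(r"/\*.*?\*/", " ", code, flags=re.DOTALL)  # block comments
    code = re.sub(r"//[^\n]*", " ", code)                    # line comments
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", code)

def is_type1_clone(a, b):
    """Type 1: identical once whitespace and comments are ignored."""
    return tokens(a) == tokens(b)

def abstract(token):
    """Map identifiers to ID and numeric constants to LIT; keep keywords and symbols."""
    if token.isdigit():
        return "LIT"
    if re.fullmatch(r"[A-Za-z_]\w*", token) and token not in KEYWORDS:
        return "ID"
    return token

def is_type2_clone(a, b):
    """Type 2: identical after also abstracting identifiers and constant values."""
    return [abstract(t) for t in tokens(a)] == [abstract(t) for t in tokens(b)]

original = "int add(int a, int b) {\n  // sum\n  return a + b + 1;\n}"
reformatted = "int add(int a,int b){ return a+b+1; /* sum */ }"
renamed = "int sum(int x, int y) { return x + y + 2; }"
```

Here `original` and `reformatted` differ only in layout and comments, while `renamed` also changes identifiers and a constant. The abstraction is deliberately coarse: it does not check that renaming is consistent across the fragment.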


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F8/75
CPC: G06F8/751, Y02D10/00
Inventor: 桑炜, 王赞
Owner: TIANJIN UNIV