Pre-training enhanced code clone detection method

A detection method using pre-training technology, applied in neural learning methods, biological neural network models, instruments, etc.; it addresses the time-consuming data-labeling problem and improves accuracy.

Pending Publication Date: 2022-01-28
TIANJIN UNIV
Cites: 0; Cited by: 2

AI Technical Summary

Problems solved by technology

However, training deep neural networks directly on labeled datasets requires a large amount of high-quality labeled data. Labeling is not only time-consuming but also requires many professionals, and detecting code clones across different languages in particular requires professionals who have mastered each of those languages.

Embodiment Construction

[0032] The present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention, but not to limit the present invention.

[0033] The present invention provides a pre-training enhanced code clone detection method; see Figure 2. Specifically:

[0034] 1. Data collection:

[0035] This example uses the manually constructed BigCloneBench dataset [3] to evaluate the effectiveness of the present invention. This dataset is a widely used benchmark for Java code clone detection. It contains 8,654,345 annotated true clone pairs, of which 8,219,320 (95.00%) are Type-IV clone pairs, as well as 279,032 non-clone pairs. One hundred Type-IV clone pairs and 100 non-clone pairs were randomly selected to construct a data set, and the remaining Type-IV clone pairs and non-clone pairs were used for testing to si...
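The split described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each pair is a hypothetical `(fragment_a, fragment_b, label)` tuple with label 1 for a Type-IV clone and 0 for a non-clone, and it assumes the 200 randomly drawn pairs serve as the small labeled set for fine-tuning (the sentence in the source is truncated). The names `Pair` and `few_shot_split` are illustrative.

```python
import random
from typing import List, Tuple

# Hypothetical pair representation: (fragment_a, fragment_b, label)
# label 1 = Type-IV clone pair, label 0 = non-clone pair
Pair = Tuple[str, str, int]


def few_shot_split(pairs: List[Pair], n_per_class: int = 100,
                   seed: int = 0) -> Tuple[List[Pair], List[Pair]]:
    """Randomly draw n_per_class pairs of each label for the small labeled
    set; every remaining pair goes to the test set."""
    rng = random.Random(seed)
    clones = [p for p in pairs if p[2] == 1]
    non_clones = [p for p in pairs if p[2] == 0]
    rng.shuffle(clones)
    rng.shuffle(non_clones)
    train = clones[:n_per_class] + non_clones[:n_per_class]
    test = clones[n_per_class:] + non_clones[n_per_class:]
    rng.shuffle(train)  # mix the two classes in the labeled set
    return train, test


if __name__ == "__main__":
    # Toy data standing in for BigCloneBench pairs.
    toy = [(f"a{i}", f"b{i}", i % 2) for i in range(1000)]
    train, test = few_shot_split(toy, n_per_class=100)
    print(len(train), len(test))  # 200 800
```

Fixing the seed makes the split reproducible, which matters when comparing runs trained on only 200 labeled pairs.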

Abstract

The invention discloses a pre-training enhanced code clone detection method comprising the following steps: (1) training word vectors with a subword-enrichment method, which also gives better representations of out-of-vocabulary words; (2) capturing the semantic information of function fragments through a function-name prediction pre-training task, which reduces dependence on labeled datasets; (3) fine-tuning the clone detection model with a small amount of labeled data to achieve better performance; and (4) achieving better classification by learning the semantics of code snippets.
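The abstract does not spell out how steps (1) and (2) are implemented. The sketch below assumes step (1) follows a fastText-style subword model, where a word's vector is the average of hashed character n-gram vectors so that unseen identifiers still receive a vector, and that step (2) builds training examples by masking a method's name and using the name's subtokens as the prediction target. The function names, the n-gram range 3 to 6, CRC32 hashing (chosen for convenience; fastText itself uses a different hash), and the `<FUNC>` mask token are all illustrative assumptions, not details from the patent.

```python
import re
import zlib
from typing import List, Tuple

# --- Step (1): subword-enriched word vectors (fastText-style assumption) ---

def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> List[str]:
    """Return the character n-grams of '<word>' (boundary markers added)."""
    token = f"<{word}>"
    return [token[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(token) - n + 1)]


def subword_vector(word: str, table: List[List[float]], dim: int) -> List[float]:
    """Average the hashed n-gram rows; works for out-of-vocabulary words too."""
    grams = char_ngrams(word)
    vec = [0.0] * dim
    if not grams:  # e.g. the empty string has no n-grams
        return vec
    for g in grams:
        row = table[zlib.crc32(g.encode("utf-8")) % len(table)]
        for k in range(dim):
            vec[k] += row[k]
    return [v / len(grams) for v in vec]


# --- Step (2): function-name prediction samples (hypothetical format) ---

def split_identifier(name: str) -> List[str]:
    """Split a camelCase / snake_case identifier into lowercase subtokens."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return [p.lower() for p in parts]


def name_prediction_sample(method_src: str) -> Tuple[str, List[str]]:
    """Mask the method name; the subtokens of the name become the target.
    Simplification: the first `identifier(` in the snippet is taken as the
    declared name (annotations with arguments would break this)."""
    m = re.search(r"\b([A-Za-z_]\w*)\s*\(", method_src)
    if m is None:
        raise ValueError("no method declaration found")
    name = m.group(1)
    # Mask every occurrence (e.g. recursive calls) so the answer does not leak.
    masked = re.sub(rf"\b{re.escape(name)}\b", "<FUNC>", method_src)
    return masked, split_identifier(name)
```

For example, `name_prediction_sample("public int sumArray(int[] a) { ... }")` yields the masked body and the target `["sum", "array"]`; a pre-training model that learns to predict such names from bodies does not need any clone labels.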

Description

Technical Field

[0001] The present invention relates to the field of code clone detection, in particular to the detection of Type-IV code clones.

Background

[0002] Code clones are code fragments that are similar in statement composition or semantics; they are common in software projects, especially large projects with many participants. Code clones arise for many reasons, chiefly developers' efforts to work more efficiently: copying and pasting existing code fragments and then adding, removing, or reordering statements, or using development frameworks, design patterns, and so on [1]. Code clone detection is an important task in software engineering, and Type-IV code clones, which are semantically similar but syntactically very different, are particularly difficult to detect.

[0003] The problem of code clone detection has been extensively studied. One of the ...
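As a toy illustration of the Type-IV (semantic) clones described above (our own example, not taken from the patent; Python is used for brevity, whereas BigCloneBench is Java), the two functions below compute the same result while sharing almost no syntax. This is what makes such pairs hard for detectors that compare tokens or syntax trees.

```python
from typing import Iterable


def sum_of_squares_loop(values: Iterable[int]) -> int:
    """Imperative version: explicit accumulator and loop."""
    total = 0
    for v in values:
        total += v * v
    return total


def sum_of_squares_functional(values: Iterable[int]) -> int:
    """Functional version: same behavior, different statements and tokens."""
    return sum(map(lambda x: x ** 2, values))
```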

Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F8/75; G06K9/62; G06N3/04; G06N3/08
CPC: G06F8/751; G06N3/08; G06N3/044; G06F18/241
Inventors: Liu Shuang (刘爽), Leng Linshan (冷林珊), Tian Chenglin (田承霖)
Owner: TIANJIN UNIV