C language-oriented source code clone detection method

A source code clone detection technique for the C language. It addresses problems of existing approaches such as a high false positive rate, a low recall rate, and difficulty in detecting Type-3 code clones.

Active Publication Date: 2022-03-15
UNIV OF ELECTRONICS SCI & TECH OF CHINA
3 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

Text-based methods do not need to convert the source code; their accuracy is high but their recall is low. Token-based methods detect faster and do not depend on the development language, but have difficulty detecting Type-3 code clones. AST-based and PDG-based methods must first convert the source code into an AST or PDG and then compare those structures, which makes both relatively expensive. Metric-based methods have a high false positive rate, because different code segments may produce the same metric values.

Examples

Embodiment Construction

[0056] To make the purpose, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to the embodiments and the accompanying drawings.

[0057] The present invention is a similarity-based clone detection method for C program source code. It decides whether two code fragments are clones by computing the similarity between the code fragments under test, and it handles the detection of all three clone types (Type1, Type2, and Type3) better than prior approaches. Other existing code clone detection tools do not provide this capability.
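The similarity test described in [0057] can be sketched as follows. This is a minimal illustration, not the patented procedure: the abstract names the LCS algorithm, which is assumed here to mean longest common subsequence over code lines, while the similarity formula (LCS length divided by the longer sequence length), the 0.7 threshold, and the names `lcs_length`, `similarity`, and `is_clone` are all assumptions made for the example.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)          # DP row for the previous element of a
    for x in a:
        curr = [0]                     # column 0: empty prefix of b
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed measure: LCS length over the longer sequence length."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


def is_clone(a, b, threshold=0.7):
    """Report a clone when similarity reaches the (assumed) threshold."""
    return similarity(a, b) >= threshold


if __name__ == "__main__":
    original = ["int s = 0;", "for (i = 0; i < n; i++)", "s += a[i];", "return s;"]
    # A Type3-style variant: one statement was added.
    modified = ["int s = 0;", "for (i = 0; i < n; i++)", "s += a[i];",
                "log_sum(s);", "return s;"]
    print(similarity(original, modified), is_clone(original, modified))
```

Because a subsequence tolerates gaps, an added or removed statement lowers the score only slightly, which is why an LCS-style comparison suits Type3 clones.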

[0058] The C language-oriented source code clone detection method of the present invention is implemented in the following steps:

[0059] Step S1: Set up the syntax rule definition file and the C function extraction file, which are used to parse the source program and to extract its C functions.

[0060] In...

Abstract

The invention discloses a C language-oriented source code clone detection method. The technical solution is as follows. A context-free grammar is used to define the grammar of the C language and to parse the source program, generating its parse tree. The whole parse tree is then transformed, and the transformed parse tree is restored to source code in text form. The resulting text-form source code is additionally formatted and normalized. Clone detection is performed on the extracted C functions with the LCS algorithm, yielding the clone function detection result for the current function under detection. During clone detection, only functions whose code sequence length falls within the clone function length range allowed for the current function under detection are used as its clone comparison objects. The invention can detect Type3 clones and keeps the amount of detection computation under control to a certain extent.
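Two ideas from the abstract, normalizing the text-form code and restricting comparison to functions within an allowed length range, can be sketched as follows. This is an illustration under stated assumptions only. The source does not give the normalization rules or the length range, so the `ID`/`NUM` placeholders, the partial keyword set, the regular-expression tokenizer (standing in for the grammar-based transformation), and the window derived from an assumed similarity of LCS length over the longer sequence length are all hypothetical.

```python
import math
import re

# Partial keyword list, for illustration only (not the full C keyword set).
C_KEYWORDS = {"int", "char", "float", "double", "void", "if", "else",
              "for", "while", "do", "return", "struct", "break", "continue"}

# Matches identifiers/keywords or runs of decimal digits.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+")


def normalize_line(line):
    """Map identifiers to ID and decimal integers to NUM; collapse whitespace."""
    def repl(match):
        tok = match.group(0)
        if tok[0].isdigit():
            return "NUM"
        return tok if tok in C_KEYWORDS else "ID"
    return " ".join(TOKEN.sub(repl, line).split())


def length_window(n, min_similarity):
    """Lengths m for which min(n, m) / max(n, m) >= min_similarity.

    Since an LCS is never longer than the shorter sequence, candidates
    outside this window cannot reach the assumed similarity threshold.
    """
    return math.ceil(n * min_similarity), math.floor(n / min_similarity)


def filter_candidates(target, functions, min_similarity):
    """Names of functions whose line count lies inside the window for target."""
    low, high = length_window(len(target), min_similarity)
    return [name for name, body in functions.items()
            if low <= len(body) <= high]


if __name__ == "__main__":
    print(normalize_line("int total = count + 10;"))
    print(length_window(10, 0.5))
```

After normalization, two lines that differ only in identifier names or integer constants become identical, so a Type2 difference does not reduce a line-based score. The length window discards comparison pairs that could never reach the threshold before any LCS computation is run.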

Description

Technical field

[0001] The invention belongs to the technical field of clone code detection, and in particular relates to C language-oriented source code clone detection.

Background technique

[0002] Current research on clone code detection generally divides clone code into three categories:

[0003] (1) Type1: the cloned code is exactly the same except for whitespace and comments;

[0004] (2) Type2: the cloned code has the same syntax, with identifiers, constants, and types modified;

[0005] (3) Type3: on the basis of Type2, statements are further modified, for example by adding, removing, or changing statements, to produce the cloned code.

[0006] The early approach to code clone detection was very intuitive: the code is treated directly as plain text (a string), and the similarity of the code is judged from the similarity of the text. The representative technique is Baker's Dup, which, like a general web page similarity detection tool,...

Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F8/75
CPC: G06F8/751
Inventor: 桂盛霖, 徐参语, 陈一凡
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA