C language-oriented source code clone detection method

A source code clone detection method for C programs, addressing the low recall rate and high false positive rate of existing detection approaches.

Active Publication Date: 2019-09-06
UNIV OF ELECTRONICS SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

The text-based method does not need to convert the source code; its accuracy is high but its recall rate is low. The token-based method detects quickly and does not depend on the development language, but has difficulty detecting Type-3 code clones. AST-based and PDG-based methods must first convert the source code into an AST or PDG and then compare the results, which makes both methods relatively expensive. Because different code segments may yield the same metric values, the metric-based method has a high false positive rate.




Embodiment Construction

[0056] In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail below in conjunction with the embodiments and the accompanying drawings.

[0057] The present invention is a similarity-based clone detection method for C program source code. It determines whether two code fragments constitute a clone by calculating the similarity between the code fragments under test, and it can detect all three clone types: Type1, Type2 and Type3. Other existing code clone detection tools do not currently offer this capability.
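The similarity test described in [0057] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the excerpt does not give the similarity formula or threshold, so the measure used here (LCS length divided by the length of the longer sequence), the threshold value 0.7, and the names `lcs_length`, `similarity` and `is_clone` are all assumptions. Each function is represented as a list of already-normalized code lines.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b.

    Standard dynamic programming, keeping only the previous row.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[len(b)]


def similarity(a, b):
    """Assumed measure: LCS length over the longer sequence's length."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


def is_clone(a, b, threshold=0.7):
    """threshold=0.7 is a hypothetical value, not taken from the patent."""
    return similarity(a, b) >= threshold


# A function body and a Type3-style variant with one inserted statement.
original = ["int s = 0;", "for (i = 0; i < n; i++)", "s += a[i];", "return s;"]
modified = ["int s = 0;", "for (i = 0; i < n; i++)", "s += a[i];",
            "log(s);", "return s;"]

print(similarity(original, modified), is_clone(original, modified))
```

Because a subsequence tolerates gaps, an inserted or deleted statement lowers the score only slightly, which is what lets an LCS-style measure tolerate Type3 edits.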

[0058] The specific implementation steps of the C language-oriented source code clone detection method of the present invention are as follows:

[0059] Step S1: Set up the grammar rule definition file and the C function extraction file to realize the analysis of the source program and the C function extraction proces...


Abstract

The invention discloses a C language-oriented source code clone detection method. In this technical scheme, a context-free grammar is used to define the C language grammar and to parse the source program, generating a parse tree of the source program; the whole parse tree is then transformed, and the transformed parse tree is restored into source code in text form. The resulting text-form source code is format-normalized, and clone detection is performed on the extracted C functions with an LCS (longest common subsequence) algorithm to obtain the clone detection result for the function currently under test. During clone detection, only functions whose code sequence length falls within the allowable length range for a clone are used as comparison objects for the function currently under test. The method can detect Type 3 clones while keeping the amount of detection computation under control to a certain extent.
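The length-range restriction in the abstract can be illustrated with a short sketch. The excerpt does not state how the allowable range is computed, so the derivation below is an assumption: if similarity is taken to be LCS length divided by the longer sequence's length, then since the LCS can never exceed the shorter length, a candidate of length m can only reach a threshold t against a function of length n when t·n ≤ m ≤ n/t. The names `allowed_length_range` and `candidates` are hypothetical.

```python
import math


def allowed_length_range(n, threshold):
    """Candidate lengths m that could still reach the similarity threshold.

    Assumes similarity = LCS / max(n, m). Because LCS <= min(n, m), a clone
    requires min(n, m) / max(n, m) >= threshold, which gives
    threshold * n <= m <= n / threshold.
    """
    low = math.ceil(threshold * n)
    high = math.floor(n / threshold)
    return low, high


def candidates(target_len, corpus_lengths, threshold):
    """Names of functions whose length makes them worth an LCS comparison."""
    low, high = allowed_length_range(target_len, threshold)
    return [name for name, m in corpus_lengths.items() if low <= m <= high]


# Hypothetical corpus: function name -> code sequence length.
corpus = {"f_short": 5, "f_near": 9, "f_equal": 10, "f_long": 12, "f_huge": 20}

print(allowed_length_range(10, 0.8))
print(candidates(10, corpus, 0.8))
```

Functions outside the range are skipped without running the quadratic-time LCS comparison, which is how such a filter limits the amount of computation.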

Description

Technical field

[0001] The invention belongs to the technical field of clone code detection, and specifically relates to C language-oriented source code clone detection.

Background technique

[0002] Current research on clone code detection generally divides clone code into three categories:

[0003] (1) Type1: clone code that is identical except for whitespace and comments;

[0004] (2) Type2: clone code with the same syntax but modified identifiers, constants, and types;

[0005] (3) Type3: clone code produced by further modifying statements on the basis of Type2, such as adding, removing, or modifying statements.

[0006] The early approach to code clone detection was very intuitive: treat the code directly as plain text (a string) and judge the similarity of the code from the similarity of the text. The representative technique is Baker's Dup, which works like a general web page similarity detection tool and judges code clones by com...
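The three clone types listed in the Background can be made concrete with a small normalization sketch. This is an illustration under stated assumptions, not the patent's method: the patent transforms a parse tree built from a grammar, whereas this sketch substitutes a regex tokenizer, which ignores string literals and the preprocessor. The names `type1_tokens` and `type2_tokens` and the placeholder tokens `TYPE`, `ID` and `NUM` are hypothetical.

```python
import re

# Simplified keyword sets (assumption: not the full C keyword list).
TYPE_KEYWORDS = {"int", "long", "short", "char", "float", "double",
                 "unsigned", "signed", "void"}
OTHER_KEYWORDS = {"return", "if", "else", "for", "while", "do", "break",
                  "continue", "switch", "case", "default", "sizeof"}

# Identifiers/keywords, numeric constants, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|\S")


def strip_comments(code):
    """Remove /* ... */ and // comments (comment markers in strings not handled)."""
    code = re.sub(r"/\*.*?\*/", " ", code, flags=re.S)
    return re.sub(r"//[^\n]*", " ", code)


def type1_tokens(code):
    """Token list that ignores whitespace and comments."""
    return TOKEN_RE.findall(strip_comments(code))


def type2_tokens(code):
    """Additionally abstract type names, identifiers, and numeric constants."""
    out = []
    for tok in type1_tokens(code):
        if tok in TYPE_KEYWORDS:
            out.append("TYPE")
        elif tok in OTHER_KEYWORDS:
            out.append(tok)
        elif re.fullmatch(r"[A-Za-z_]\w*", tok):
            out.append("ID")
        elif tok[0].isdigit():
            out.append("NUM")
        else:
            out.append(tok)
    return out


original = "int sum(int a, int b) { return a + b; }"
reformatted = "int  sum(int a, int b) {\n  /* add */\n  return a + b;  // result\n}"
renamed = "long total(long x, long y) { return x + y; }"
edited = "int sum(int a, int b) { int r = a + b; return r; }"

print(type1_tokens(original) == type1_tokens(reformatted))  # Type1 pair
print(type2_tokens(original) == type2_tokens(renamed))      # Type2 pair
print(type2_tokens(original) == type2_tokens(edited))       # Type3 pair differs
```

After normalization, Type1 and Type2 pairs compare equal as exact token sequences, while the Type3 pair still differs and therefore needs an approximate comparison rather than an equality test.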


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F8/75
CPC: G06F8/751
Inventor: 桂盛霖, 徐参语, 陈一凡
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA