A suffix tree-based code file cloning detection method

The method belongs to the field of suffix tree-based code file clone detection. It addresses the problem that the code sources a piece of software copies from cannot be known in advance, and it achieves high detection efficiency.

Active Publication Date: 2017-07-28
苏州棱镜七彩信息科技有限公司
3 Cites, 10 Cited by
AI Technical Summary

Problems solved by technology

[0007] Current research on code clone detection is based on comparing two code segments. In practice, it is impossible to know in advance which code sources a piece of software copies from, so the code must be matched against a large body of code, which places higher demands on detection efficiency.
Moreover, software that uses open source code often imports the entire open source project directly, so matching on code content takes more time.

Embodiment Construction

[0043] Specific implementations of the present invention are described in further detail below with reference to the accompanying drawings and embodiments. The following examples illustrate the present invention and are not intended to limit its scope.

[0044] As shown in Figure 1 and Figure 2, the suffix tree-based code file clone detection method constructs a suffix tree over the project file fingerprints and detects code file clones in linear time. It adopts the following steps:

[0045] Step 1: Construct the fingerprint database of open source projects.

[0046] Code fingerprints are constructed at the granularity of code files. The number of fingerprints is controllable and the storage space they occupy is limited, so they can be stored directly on the main server. The backend that implements the present invention mainly performs two functions, that is, establishing a fingerprint library and detect...
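
A minimal sketch of a file-level fingerprint may help here. It follows the abstract's description (lexical filtering of the code file, then an MD5 hash of what remains), but the filtering rules are assumptions: the excerpt does not say which tokens are filtered, so this version simply drops C-style comments and all whitespace. The function names `strip_comments_and_whitespace` and `file_fingerprint` are hypothetical.

```python
import hashlib
import re


def strip_comments_and_whitespace(source: str) -> str:
    """Crude lexical filter (an assumption, not the patent's exact rules).

    Removes /* ... */ and // ... comments, then all whitespace. It does not
    understand string literals, so a "//" inside a string is also cut.
    """
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    source = re.sub(r"//[^\n]*", "", source)
    return re.sub(r"\s+", "", source)


def file_fingerprint(source: str) -> str:
    """Return the MD5 hex digest of the filtered file content."""
    normalized = strip_comments_and_whitespace(source)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()
```

With this filter, two copies of a file that differ only in comments or layout produce the same 32-character fingerprint, while a change to the code itself produces a different one. One fingerprint per file is what keeps the fingerprint count small compared with segment-level schemes.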

Abstract

The invention relates to a suffix tree-based code file clone detection method that builds suffix trees over project files and detects code file clones in linear time. The LP detection scheme and algorithm works at the granularity of the content of computer software source code files: each code file undergoes lexical analysis and filtering, and its fingerprint value is obtained with an MD5 hash. The fingerprints are collected into a fingerprint database, which is stored in a MySQL database and indexed by the id of the open source project each fingerprint belongs to. Nodes marked as clone results in the suffix tree can be extracted directly and stored in a clone result data table. Cloned code files can therefore be detected in linear time, which is more efficient than detecting directly on fingerprint values and makes mass detection possible.
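
The idea of marking shared nodes can be sketched as follows. This is an illustration under stated assumptions, not the patented algorithm: it uses a naive suffix trie, which takes quadratic time and space to build, whereas a linear-time build needs a compressed suffix tree that the excerpt does not detail. It also assumes each project is an ordered list of file fingerprints, and the names `Node`, `insert_suffixes`, `marked_nodes` and the owner id `"target"` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict = field(default_factory=dict)  # fingerprint -> Node
    owners: set = field(default_factory=set)      # ids of sequences passing through


def insert_suffixes(root: Node, fingerprints: list, owner: str) -> None:
    """Insert every suffix of one fingerprint sequence (naive, O(n^2))."""
    for start in range(len(fingerprints)):
        node = root
        for fp in fingerprints[start:]:
            node = node.children.setdefault(fp, Node())
            node.owners.add(owner)


def marked_nodes(root: Node, target_id: str) -> dict:
    """Map each fingerprint run shared by the target and at least one
    library project to the set of library project ids containing it."""
    results = {}
    stack = [((), root)]
    while stack:
        path, node = stack.pop()
        for fp, child in node.children.items():
            others = child.owners - {target_id}
            if target_id in child.owners and others:
                results[path + (fp,)] = others
                # A child's owners are a subset of its parent's, so only
                # marked nodes can have marked descendants.
                stack.append((path + (fp,), child))
    return results
```

In this sketch, library projects are inserted under their project ids and the software under test is inserted under `"target"`. Every key of length one in the result of `marked_nodes` is a cloned file, and longer keys are runs of consecutive cloned files. The returned mapping plays the role of the rows that would go into a clone result table.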

Description

Technical field

[0001] The invention relates to a detection method, in particular to a code file clone detection method based on a suffix tree.

Background

[0002] Since the birth of the software industry, and with the rapid increase in the number of computer users, the industry has developed rapidly and has penetrated all aspects of people's work and life. Much software source code is openly available on the Internet, and searching the Internet for the code they need has become a fast and effective way for developers to work. Because many software functions are common, reusing code, either copied and pasted directly or with simple modification, has become common practice in software development. With the rapid development of open source code, millions of software project sources can be found on related sites, such as Google Code Search, GitHub, Snippir, SourceForge, etc. Today, open source code has played an important role in software ...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F9/44
CPC: G06F8/751
Inventor: 罗峋饶飞
Owner: 苏州棱镜七彩信息科技有限公司