
Method for carrying out combined detection on source code file cloning adjacency lists

A detection method for source code, applied in the field of source code processing. It addresses the problem that the code sources copied by a piece of software cannot be known in advance, and it makes mass detection practical.

Inactive Publication Date: 2017-08-18
苏州棱镜七彩信息科技有限公司

AI Technical Summary

Problems solved by technology

[0007] Current research on code clone detection is based on comparing two code segments. In practice, it is impossible to know in advance which sources a piece of software copies from, so the code must be matched against a large body of existing code, which places higher demands on detection efficiency.
However, open source code is often reused by importing an entire open source project directly, and matching the code content then takes more time.



Examples


Embodiment Construction

[0049] Specific implementations of the present invention are described in further detail below with reference to the accompanying drawings and embodiments. The following examples illustrate the present invention and are not intended to limit its scope.

[0050] As shown in Figures 1 to 3, the source code file clone adjacency list merging detection method builds a distributed index over project file fingerprints and applies an adjacency list merging detection algorithm, achieving code file clone detection within O(nm) time complexity. Its implementation can be roughly divided into the following three steps:

[0051] Step 1, data preprocessing.

[0052] Before source code files are converted into index data, they go through several processing stages, mainly filtering for the relevant code files and extracting tokens.
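
The preprocessing stage described above can be sketched roughly as follows. This is a minimal illustration rather than the patent's actual implementation: the extension list `SOURCE_EXTENSIONS`, the regular expression `TOKEN_RE`, and the helper names are all assumptions introduced here, since the source only states that code files are filtered and tokens extracted.

```python
import os
import re

# Hypothetical set of extensions treated as source code; the source does not list them.
SOURCE_EXTENSIONS = {".c", ".h", ".cpp", ".java", ".py"}

# Rough tokenizer (an assumption): identifiers, integers, or single non-space symbols.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def iter_source_files(root):
    """Walk `root` and yield paths whose extension looks like source code."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() in SOURCE_EXTENSIONS:
                yield os.path.join(dirpath, name)


def tokenize_line(line):
    """Split one line of code into tokens, discarding whitespace."""
    return TOKEN_RE.findall(line)


def normalized_lines(text):
    """Return the non-empty lines of `text` with tokens joined by single spaces."""
    result = []
    for line in text.splitlines():
        tokens = tokenize_line(line)
        if tokens:
            result.append(" ".join(tokens))
    return result
```

Normalizing each line to a token sequence before hashing means that differences in whitespace alone do not change the resulting fingerprint. A real tokenizer would also need to handle comments and string literals, which this sketch ignores.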

[0053] First, it is necessary to traverse the directory where t...


Abstract

The invention relates to a source code file clone adjacency list merging detection method. For each engineering project, fingerprint chunks are computed with MD5 on a per-file basis, scanning the code line by line with a fixed number of lines as the granularity, and a fingerprint library is built from them. The fingerprint library is stored in a MySQL database, and the detection algorithm uses the ID of the open source project containing a fingerprint and the hash value of the chunk as indexes, so that clone classes 0 to 3 can be detected. In other words, the invention discloses a scheme and algorithm for adjacency list merging detection of source code file clones based on distributed indexes. The algorithm can detect cloned code files with O(nm) time complexity and O(nm) space complexity, enabling mass detection.
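
The fingerprinting and lookup idea in the abstract can be illustrated with a small sketch. Several things here are assumptions and not taken from the source: the window size `CHUNK_LINES`, the function names, the use of an in-memory dictionary in place of the MySQL fingerprint table, and the final step that merges adjacent hits into line ranges, which is only one simple reading of "adjacency list merging".

```python
import hashlib
from collections import defaultdict

CHUNK_LINES = 3  # hypothetical window size; the source only says "fixed"


def chunk_fingerprints(lines, size=CHUNK_LINES):
    """Return (start_line, md5_hex) for every window of `size` consecutive lines."""
    prints = []
    for start in range(max(len(lines) - size + 1, 0)):
        window = "\n".join(lines[start:start + size])
        prints.append((start, hashlib.md5(window.encode("utf-8")).hexdigest()))
    return prints


def build_index(projects, size=CHUNK_LINES):
    """Map chunk hash -> list of (project_id, file_name, start_line).

    `projects` is {project_id: {file_name: [line, ...]}}. The dict returned
    here stands in for the MySQL fingerprint library (an assumption).
    """
    index = defaultdict(list)
    for project_id, files in projects.items():
        for file_name, lines in files.items():
            for start, digest in chunk_fingerprints(lines, size):
                index[digest].append((project_id, file_name, start))
    return index


def detect_clones(lines, index, size=CHUNK_LINES):
    """Look up each chunk of the query file and merge adjacent hits into ranges.

    Returns {(project_id, file_name): [(query_start, query_end_exclusive), ...]}.
    """
    hits = defaultdict(list)
    for start, digest in chunk_fingerprints(lines, size):
        for project_id, file_name, _src_start in index.get(digest, ()):
            hits[(project_id, file_name)].append(start)

    merged = {}
    for key, starts in hits.items():
        ranges = []
        for start in sorted(set(starts)):
            end = start + size
            if ranges and start <= ranges[-1][1]:
                # Overlapping or touching window: extend the previous range.
                ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
            else:
                ranges.append((start, end))
        merged[key] = ranges
    return merged
```

For example, indexing a five-line file and querying with a file that embeds four of those lines reports a single merged range:

```python
index = build_index({"p1": {"a.c": ["a", "b", "c", "d", "e"]}})
print(detect_clones(["x", "a", "b", "c", "d", "y"], index))
# {('p1', 'a.c'): [(1, 5)]}
```

Because each query chunk is resolved by a hash lookup instead of a pairwise file comparison, the query cost depends on the number of chunks and the number of index hits, which is the property that makes matching against a large corpus feasible.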

Description

Technical field

[0001] The invention relates to a source code processing method, in particular to a source code file clone adjacency list merging detection method.

Background

[0002] Since its birth, the software industry has developed rapidly alongside the fast-growing number of computer users, and it has penetrated all aspects of people's work and life. Much software source code is openly available on the Internet, and searching the Internet for the code they need has become a fast and effective way of working for developers. Because many software functions are common, reusing code through direct copy and paste or after simple modification has become common practice in software development. With the rapid development of open source code, the source code of millions of software projects can be found on sites such as Google Code Search, GitHub, Snippir, and SourceForge. Today, open source code has played an import...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F9/44, G06F11/36
CPC: G06F8/751, G06F11/3604
Inventor: 罗峋, 饶飞
Owner: 苏州棱镜七彩信息科技有限公司