Source code clone identification method and system

A source code clone identification method and system in the field of software code clone identification. The technology addresses two problems: open source components are difficult to analyze at the code level, and satisfactory detection efficiency is hard to achieve. It delivers high recognition accuracy and improved detection efficiency.

Status: Inactive
Publication date: 2021-04-23
SECZONE TECH CO LTD
Cites: 4; Cited by: 2

AI Technical Summary

Problems solved by technology

Many SCA (software composition analysis) tools already support the analysis of open source components. Most of them, however, identify a project's open source components from the project's feature files, that is, from the open source components the project declares it uses. Code-based open source component analysis is rare. The main reason is that analyzing components against a massive body of open source code is very difficult, and satisfactory detection efficiency is hard to achieve.

Embodiment Construction

[0044] To describe the technical content, structural features, objectives, and effects of the present invention in detail, the invention is explained below with reference to the embodiments and the accompanying drawings.

[0045] To identify and detect open source components in software development projects quickly and accurately, this embodiment discloses a source code clone identification method. As shown in Figure 1, the method includes the following steps:

[0046] S1: Collect source code files and store them by category to generate a source code library. This embodiment does not limit how the source code files are collected. For example, they can be collected from completed software projects, or obtained in real time from open source communities using a source code collection tool such as a crawler.
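A minimal Python sketch of step S1 follows. It assumes that files are categorized by programming language and that the language is inferred from the file extension. The extension table and the functions `categorize`, `collect_from_directory`, and `build_source_library` are hypothetical illustrations, because the embodiment leaves both the collection method and the categorization scheme open.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical extension-to-category table; the patent does not specify
# how collected files are categorized, so this mapping is an assumption.
EXTENSION_CATEGORIES = {
    ".c": "c", ".h": "c",
    ".cpp": "cpp", ".cc": "cpp", ".hpp": "cpp",
    ".java": "java",
    ".py": "python",
    ".js": "javascript",
    ".go": "go",
}


def categorize(files):
    """Group (path, source_text) pairs into {category: {path: source_text}}."""
    library = defaultdict(dict)
    for path, text in files:
        category = EXTENSION_CATEGORIES.get(Path(path).suffix.lower(), "other")
        library[category][path] = text
    return dict(library)


def collect_from_directory(root):
    """Yield (path, text) for every regular file under root, e.g. a crawled mirror."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            yield str(path), path.read_text(encoding="utf-8", errors="ignore")


def build_source_library(root):
    """Step S1 sketch: collect the files under root and store them by category."""
    return categorize(collect_from_directory(root))
```

For a local mirror of crawled projects, for example, `build_source_library("mirror/")` would return a dictionary keyed by category, such as `"c"` or `"java"`. Each value maps file paths to source text.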

[0047] S2: Using a dimensionality reduction algorithm to analyze and pr...
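The description is truncated before it names the dimensionality reduction algorithm used in step S2. The sketch below uses SimHash purely as an illustrative stand-in. SimHash reduces a file's tokens to a fixed 64-bit fingerprint, written here as a '0'/'1' character string, so similar files tend to produce fingerprints that differ in only a few bit positions. SimHash, the tokenizer regex, and the BLAKE2b token hash are all assumptions, not the patent's stated method.

```python
import hashlib
import re

FINGERPRINT_BITS = 64  # fixed fingerprint length (assumption)


def tokenize(source):
    """Split source text into identifier, number, and single-symbol tokens."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", source)


def token_hash(token):
    """Stable 64-bit hash of a token (built-in hash() is salted per process)."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def simhash(source):
    """Reduce a source file to a 64-character '0'/'1' fingerprint string.

    Illustrative stand-in for the unnamed dimensionality reduction algorithm.
    """
    weights = [0] * FINGERPRINT_BITS
    for token in tokenize(source):
        h = token_hash(token)
        for i in range(FINGERPRINT_BITS):
            # Each token votes +1 or -1 on bit i according to its own hash bit.
            weights[i] += 1 if (h >> i) & 1 else -1
    # Bit i is set when its net vote is positive; emit most significant bit first.
    return "".join("1" if w > 0 else "0" for w in reversed(weights))
```

A stable digest is used instead of Python's per-process randomized `hash()` so that fingerprints computed once for the library and stored still match fingerprints computed later for a target file. Because the tokenizer ignores whitespace, reformatting a file does not change its fingerprint.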

Abstract

The invention discloses a source code clone identification method and system. The method comprises the following steps. First, source code files are collected to generate a source code library. Second, each source code file in the library is processed with a dimensionality reduction algorithm, producing a plurality of first character strings. Third, the target code file is processed with the same dimensionality reduction algorithm to produce a second character string. Next, the second character string is matched against each of the first character strings to obtain the N first character strings with the highest matching degree. Finally, the target code file is compared with the code in the source code files corresponding to those top N first character strings. With this method, the N most similar open source code files can be screened out of a massive source code library quickly, and code-level comparison then needs to be performed only on those N files, one by one. Cloned code can therefore be identified and detected at the code level with high recognition accuracy, and detection efficiency is effectively improved.
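The screening-then-comparison flow described in the abstract can be sketched as follows. The sketch assumes that the first and second character strings are equal-length bit-string fingerprints, for instance from a SimHash-style reduction. Matching degree is taken to be Hamming distance (lower means a closer match), and the final code-level comparison uses `difflib.SequenceMatcher` over lines. Both measures, the `threshold` parameter, and all function names are hypothetical, because the abstract does not specify them.

```python
import difflib
import heapq


def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return sum(x != y for x, y in zip(a, b))


def top_n_candidates(target_fp, library_fps, n):
    """Return the n library paths whose first strings best match target_fp.

    Ties on distance are broken by path so that results are deterministic.
    """
    return heapq.nsmallest(n, library_fps, key=lambda p: (hamming(target_fp, library_fps[p]), p))


def compare_code(target_src, candidate_src):
    """Line-level similarity in [0, 1]; difflib is a stand-in comparator."""
    a = target_src.splitlines()
    b = candidate_src.splitlines()
    return difflib.SequenceMatcher(None, a, b).ratio()


def identify_clones(target_src, target_fp, library_src, library_fps, n=3, threshold=0.8):
    """Screen the whole library by fingerprint, then compare code for the top n only."""
    results = []
    for path in top_n_candidates(target_fp, library_fps, n):
        score = compare_code(target_src, library_src[path])
        if score >= threshold:
            results.append((path, round(score, 3)))
    # Most similar candidates first.
    return sorted(results, key=lambda r: -r[1])
```

Only the `n` best-matching files reach `compare_code`. This mirrors the abstract's efficiency argument: the cheap fingerprint comparison runs against the whole library, and the more expensive code comparison runs on just N candidates.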

Description

Technical field

[0001] The invention relates to the technical field of software code clone identification, and in particular to a source code clone identification method and system.

Background art

[0002] To speed up development and reduce cost, software engineers often use open source software, cloning it into the current project either directly or after simple modification. Introducing open source components, however, inevitably introduces some vulnerabilities as well. After a project's development is completed, its open source content therefore needs to be analyzed. That content includes both the open source components the project uses and any open source code copied into it. At present, there are many SCA tools (software composition analysis tools) that can already support the analysis of open source components, but most of ...

Application Information

IPC(8): G06F8/70
CPC: G06F8/70
Inventors: 汪杰, 万振华, 王颉, 董燕, 李华
Owner: SECZONE TECH CO LTD