Method for converting source codes into numeric identifiers and comparison against data sets

A technique for converting code into identifiers, applied in the field of code processing and data technology, that simplifies database searches and reduces the storage space the data occupies.

Pending Publication Date: 2020-03-31
SNYK SWEDEN AB

AI Technical Summary

Problems solved by technology

[0002] In state-of-the-art solutions, indexing and storage are a major challenge: the large amount of free and open source code available inflates the database footprint and degrades search performance.
Furthermore, analyzing source code is a sensitive task, and storing a large amount of duplicated free and open source code in the database may conflict with the respective licensing conditions.


Examples

Embodiment Construction

[0014] Embodiments described herein allow source code to be compared against datasets without requiring a copy of the original source code for those datasets. These embodiments also allow row ID generation and database searching to be carried out at different locations. Thus, the individual performing the database search does not need access to the code being compared, only to the generated row IDs. This logical isolation allows database searches to be done without access to either the code under analysis or the original source code it is compared against.
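The separation described in [0014] can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names `row_ids` and `search` are hypothetical, SHA-256 stands in for whichever hash the embodiments use, and hashing each whitespace-stripped line is a simplification of the actual ID generation.

```python
import hashlib


def row_ids(source):
    """Hash each non-empty line of `source` into a hex identifier.

    Assumption: one ID per whitespace-stripped line, using SHA-256.
    """
    ids = set()
    for line in source.splitlines():
        stripped = line.strip()
        if stripped:
            ids.add(hashlib.sha256(stripped.encode("utf-8")).hexdigest())
    return ids


def search(index_ids, query_ids):
    """Run at the search site: sees only IDs, never source text."""
    return index_ids & query_ids


# Location 1: the dataset owner publishes IDs only.
dataset_index = row_ids("int main() {\n  return 0;\n}\n")

# Location 2: the code owner generates IDs locally and sends only those.
query = row_ids("void f() {}\nreturn 0;\n")

# Location 3: the searcher intersects the two ID sets.
matches = search(dataset_index, query)
```

In this sketch the party calling `search` handles nothing but hash values, which mirrors the logical isolation the paragraph describes.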

[0015] Additionally, because searches are performed against a fixed-length balanced index, the embodiments described herein reduce the storage footprint by saving only code snippet IDs (not source code) from large datasets. They also reduce processing time and give faster search responses, a result of the uniform data distribution that cryptographic hashing algorithms provide.
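A rough sketch of the idea in [0015], under stated assumptions: the 8-byte ID width (`ID_BYTES`), the use of SHA-256, and the sorted list searched with `bisect` are illustrative stand-ins, not the index structure the embodiments specify.

```python
import bisect
import hashlib

ID_BYTES = 8  # assumed ID width; the source does not state one


def snippet_id(snippet):
    """Map a snippet to a fixed-length numeric ID (truncated SHA-256)."""
    digest = hashlib.sha256(snippet.encode("utf-8")).digest()
    return int.from_bytes(digest[:ID_BYTES], "big")


def build_index(snippets):
    """Store only the sorted, de-duplicated IDs, not the snippet text."""
    return sorted({snippet_id(s) for s in snippets})


def contains(index, snippet):
    """Binary-search the sorted ID list for the snippet's ID."""
    target = snippet_id(snippet)
    pos = bisect.bisect_left(index, target)
    return pos < len(index) and index[pos] == target


index = build_index(["a=b+c", "return x", "a=b+c"])
found = contains(index, "a=b+c")
missing = contains(index, "a=b-c")
```

Every entry occupies the same number of bytes regardless of snippet length, and because hash outputs are spread roughly uniformly over the ID range, a sorted or tree-based index over them stays evenly filled.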

[0016] The term "large dataset" is ...

Abstract

The disclosure relates to a method for converting source code into numeric identifiers and comparing them against data sets. Specifically, it relates to a system and method for identifying a characteristic of an input code by converting the input code into simplified code and using the simplified code to generate snippets that can be compared to code in a database. Preferably, code is simplified by at least one of: unifying capitalization, removing characters, and replacing at least one of a character and a keyword with an identifier.
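The three simplification steps named in the abstract might look as follows in Python. This is a sketch only: the `KEYWORD_IDS` table and its `K1`/`K2`/`K3` identifiers are hypothetical, and treating whitespace as the characters to remove is an assumption, since the abstract does not say which characters are dropped.

```python
import re

# Hypothetical keyword-to-identifier table; the source does not list one.
KEYWORD_IDS = {"if": "K1", "else": "K2", "return": "K3"}


def simplify(code):
    """Apply the three simplifications named in the abstract."""
    # 1. Unify capitalization.
    lowered = code.lower()
    # 2. Replace whole-word keywords with identifiers.
    replaced = re.sub(
        r"[a-z_][a-z0-9_]*",
        lambda m: KEYWORD_IDS.get(m.group(0), m.group(0)),
        lowered,
    )
    # 3. Remove characters (here: all whitespace, as an assumption).
    return re.sub(r"\s+", "", replaced)


first = simplify("IF (x) Return Y;")
second = simplify("if(x)return y;")
```

Both inputs reduce to the same simplified string, so snippets derived from them would compare as equal despite differences in case and spacing.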

Description

Technical field

[0001] The present disclosure generally relates to code manipulation and data techniques. The growing popularity of free and open source code, together with growing concerns about license compliance, has led to the need to build databases from public free and open source code, with the goal of identifying the inclusion of free and open source code in a given source code file.

Background

[0002] In prior art solutions, indexing and storage become a major challenge: the large amount of free and open source code available inflates the database footprint and degrades search performance. Furthermore, analyzing source code is a sensitive task, and storing a large amount of duplicated free and open source code in the database may conflict with the respective licensing conditions.

Summary of the invention

[0003] The following simplified summary is provided to give a basic understanding of some aspects of various inventive embodiments described in th...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/38, G06F8/41, G06F16/31
CPC: G06F16/381, G06F16/31, G06F8/42, G06F21/562, G06F8/751, G06F21/12, G06F8/72
Inventor: J·克希亚
Owner: SNYK SWEDEN AB