
Mining method and system of core code elements in software document

A technique for mining core code elements in software documents. It addresses poor retrieval performance, reduced accuracy, and the failure to distinguish strong from weak (or true from false) document-code associations. The stated benefits are broad applicability and improved quality.

Status: Inactive; publication date: 2018-04-20
PEKING UNIV
2 cites, 8 cited by

AI Technical Summary

Problems solved by technology

If a keyword does not appear in the software document, or appears only rarely, retrieval quality suffers.
In addition, software documents make heavy use of synonyms, near-synonyms, and abbreviations, which further limits the effectiveness of information retrieval methods.
[0008] (2) The second type of method currently focuses on identifying the code unit to which a code element mentioned in the document belongs; it does not distinguish or judge how strong, or how genuine, these associations are.
Noisy associations greatly reduce the accuracy of tracing the relationship between software documentation and software code.
[0009] The prior art offers no method that can both measure the relationship between software documents and software code and mine the core code elements of documents.
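The retrieval weakness described above can be illustrated with a minimal sketch. It assumes the simplest possible scoring (raw term frequency) and uses a made-up class name and made-up documents; `split_identifier` and `tf_score` are hypothetical helpers, not part of the patent.

```python
import re
from collections import Counter

def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]

def tf_score(query_terms, document):
    """Sum the raw term frequencies of the query terms in the document."""
    counts = Counter(re.findall(r"[a-z0-9]+", document.lower()))
    return sum(counts[t] for t in query_terms)

# Hypothetical code element used as the query.
query = split_identifier("IndexWriter")

# One document names the element literally; the other describes it with synonyms.
doc_literal = "Create an index writer, then call the writer to add documents."
doc_synonym = "Create the component that persists the inverted file, then add documents."

print(tf_score(query, doc_literal))
print(tf_score(query, doc_synonym))
```

The literal document scores 3 and the synonym document scores 0, even though both describe the same element. This is the failure mode that keyword-based retrieval runs into when documents use synonyms or abbreviations.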

Detailed Description of the Embodiments

[0056] In order to make the above-mentioned features and advantages of the present invention more comprehensible, the following specific embodiments are described in detail in conjunction with the accompanying drawings.

[0057] The present invention provides a system for mining core code elements in software documents. As shown in Figure 1, the system includes a software document and software code preprocessing module, a feature extraction module, and a classification learning algorithm module.

[0058] The software document and software code preprocessing module preprocesses the collected software documents and source code, extracts the data required by the subsequent steps, and produces a set of candidate document-code associations.
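As a rough illustration of what a set of candidate document-code associations might look like, the sketch below pairs a document with every known code element whose simple name appears in its text. The matching rule, the function name `candidate_links`, and the sample element names are all assumptions made for illustration; the source does not specify how candidates are found.

```python
import re

def candidate_links(doc_id, text, code_elements):
    """Return sorted (doc_id, element) pairs for known code elements named in the text.

    Matching on the simple name (the part after the last dot) is an assumed,
    deliberately naive rule.
    """
    tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text))
    return sorted((doc_id, e) for e in code_elements if e.split(".")[-1] in tokens)

# Hypothetical code elements extracted from the parsed source code.
elements = {
    "org.example.IndexWriter",
    "org.example.IndexReader",
    "org.example.Query",
}

links = candidate_links("doc-1", "Create an IndexWriter, then run a Query.", elements)
print(links)
```

Every pair produced this way is only a candidate. Deciding which candidates are core and which are noise is the job of the later stages.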

[0059] Using the data produced by the preprocessing module, the feature extraction module extracts 8 types of relevant features from the documents and code for data modeling.
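The 8 feature types are not listed in this excerpt, so the following sketch uses three stand-in features chosen purely for illustration: mention count, presence in the document title, and relative position of the first mention. The function name `feature_vector` is hypothetical.

```python
import re

IDENT = r"[A-Za-z_][A-Za-z0-9_]*"

def feature_vector(title, body, element_name):
    """Build a small feature vector for one candidate code element.

    Features (illustrative assumptions, not the patent's 8 feature types):
      [0] number of mentions in the body
      [1] 1.0 if the element is named in the title, else 0.0
      [2] relative position of the first mention (0.0 = start, 1.0 = absent)
    """
    words = re.findall(IDENT, body)
    positions = [i for i, w in enumerate(words) if w == element_name]
    in_title = 1.0 if element_name in re.findall(IDENT, title) else 0.0
    first_pos = positions[0] / len(words) if positions else 1.0
    return [float(len(positions)), in_title, first_pos]

title = "Using IndexWriter"
body = "IndexWriter adds documents. Close the IndexWriter when done."
print(feature_vector(title, body, "IndexWriter"))
print(feature_vector(title, body, "Query"))
```

Each candidate code element in a document gets one such vector, which is the form a classifier can consume.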

[0060...


Abstract

The invention relates to a method and system for mining core code elements in a software document. The method includes: 1) collecting the software documents and source code of the software projects to be processed, and parsing them to obtain a set of candidate document-code relations; 2) extracting text and code features from the software documents and software code involved in the candidate document-code relations, and organizing the features into feature vectors for the candidate code elements in the software documents; and 3) using the feature vectors of training data labeled with core code elements to obtain a classification learning model by optimizing model parameters, and identifying the core code elements in a software document with that model. The system includes a software document and software code preprocessing module, a feature extraction module, and a classification learning algorithm module. With the method and system, relations between software documents and software code can be tracked and measured, and core relations can be distinguished from noise relations.
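Step 3 can be sketched as follows. The abstract does not name the classifier, so logistic regression trained by batch gradient descent is used here as an assumed stand-in; the two features (mention count, named in title) and the training data are invented for illustration.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train(X, y, lr=0.1, epochs=5000):
    """Fit logistic regression weights by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def is_core(w, b, x):
    """Classify a candidate code element as core (True) or noise (False)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) > 0.5

# Hypothetical labeled data: [mention count, named in title]; label 1 = core element.
X = [[3.0, 1.0], [4.0, 1.0], [1.0, 0.0], [0.0, 0.0]]
y = [1, 1, 0, 0]

w, b = train(X, y)
print([is_core(w, b, x) for x in X])
```

On this toy data the trained model separates the two labeled groups. A real system would substitute the patent's feature vectors and whichever classification algorithm the full description specifies.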

Description

Technical field

[0001] The invention relates to the field of software document-code association tracing, and in particular to a method and system for mining core code elements in software documents.

Background

[0002] Software traceability research has long received wide attention. Researchers try to establish traceability links between software documents and software code by mining the latent associations between the code and various natural language documents. This matters for program comprehension, requirements engineering, software maintenance, and many other activities.

[0003] Existing methods for mining associations between software documents and software code fall into two categories:

[0004] 1. Methods based on information retrieval. The basic idea of this method is to use the software code as a query condition to perform information retrieval in candidate software documents, and establish a r...

Application Information

IPC(8): G06F8/73
CPC: G06F8/73
Inventors: 邹艳珍 (Zou Yanzhen), 曹英魁 (Cao Yingkui), 谢冰 (Xie Bing)
Owner: PEKING UNIV