The invention discloses a
software defect prediction method based on
data imbalance, which comprises the following steps: taking various bug reports with
software metric values, from projects with known bug distributions, as an
original data set for prediction; performing imbalance
processing on the text matrix in the
original data set by adopting an RSMOTE imbalance
processing strategy to obtain a balanced
data set; modeling the balanced
data set by using naive Bayes, multinomial naive Bayes, k-nearest neighbors, a
support vector machine, a classification tree and
AdaBoost to find the classifier with the best prediction performance; and extracting the
software metric values of a new project whose bug locations are unknown, inputting the software metric values into the classifier for prediction, outputting prediction information about whether each
program segment has a bug or not, and recording and storing the prediction information. According to the method, the RSMOTE imbalance
processing strategy is adopted to perform imbalance processing on the text matrix in the
original data set, so that the generation of minority-class samples is more flexible, and a broader and more reasonable set of samples can be generated.
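
The balancing step described above can be illustrated with a minimal sketch. The abstract does not specify how RSMOTE works internally, so the code below uses plain SMOTE-style interpolation between minority-class neighbors as a stand-in. It is not the claimed RSMOTE strategy. The function name `smote_like_oversample`, its parameters, the binary-label assumption, and the toy metric values are all hypothetical and chosen only for illustration.

```python
import math
import random


def smote_like_oversample(X, y, minority_label, k=3, seed=0):
    """Append synthetic minority rows until both classes have equal counts.

    Assumption: binary labels. This is generic SMOTE-style interpolation,
    used here as a stand-in; it is NOT the RSMOTE strategy of the source.
    """
    rng = random.Random(seed)
    minority = [row for row, label in zip(X, y) if label == minority_label]
    n_needed = (len(y) - len(minority)) - len(minority)
    X_bal, y_bal = [list(row) for row in X], list(y)
    if len(minority) < 2 or n_needed <= 0:
        return X_bal, y_bal  # nothing to interpolate, or already balanced

    for _ in range(n_needed):
        base = rng.choice(minority)
        # The k nearest minority-class neighbours of `base` (excluding itself).
        neighbours = sorted(
            (m for m in minority if m is not base),
            key=lambda m: math.dist(base, m),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment base -> other
        X_bal.append([b + gap * (o - b) for b, o in zip(base, other)])
        y_bal.append(minority_label)
    return X_bal, y_bal


# Toy data (hypothetical): each row is [lines_of_code, cyclomatic_complexity];
# label 1 marks a buggy program segment, label 0 a clean one.
X = [[10, 2], [12, 3], [11, 2], [13, 4], [9, 1], [14, 3],
     [40, 9], [42, 11], [45, 10]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

X_bal, y_bal = smote_like_oversample(X, y, minority_label=1, k=2)
print(y_bal.count(0), y_bal.count(1))
```

The balanced output would then be passed to each candidate classifier, and the one with the best validation performance kept for predicting on the new project.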