The invention discloses a method for classifying unbalanced data based on mixed sampling and machine learning. The method comprises: step 1, generating a training set; step 2, for the minority-class sample set P in the training set, copying P to generate P', combining P and P' into PP', applying the SMOTE algorithm to PP' to generate a synthetic sample set S, and combining P, P' and S into PP'S; step 3, for the majority-class sample set N in the training set, performing random undersampling without replacement to obtain t subsets Ni (i = 1, ..., t); step 4, repeating step 2 t times to obtain t different sets PP'Si, and combining each Ni with the corresponding PP'Si into a new training set, thereby obtaining t training subsets; step 5, training on the t training subsets to generate t classifiers Hi; and step 6, integrating the t classifiers Hi to obtain a final classifier H, and using the classifier H to classify the unbalanced data set. The method increases the attention given to minority-class samples while avoiding excessive loss of majority-class information; it reduces the possibility of over-fitting and over-generalization; and it provides a good training effect, low susceptibility to overfitting, and a high training speed.
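
Steps 2 to 6 can be illustrated with a minimal sketch in plain Python. It is only an illustration under stated assumptions, not the claimed implementation. The abstract does not specify the base learner, the size of each Ni, the number of synthetic samples in S, or the integration rule, so the sketch assumes a nearest-centroid rule as a stand-in classifier, a caller-chosen size n_major for each Ni, enough synthetic samples to balance each subset, and a majority vote. It also draws each Ni independently and skips the copies in P' when searching for SMOTE neighbours. The names smote, train, fit_ensemble, n_major and k are hypothetical.

```python
import math
import random

def smote(samples, n_new, k, rng):
    """Minimal SMOTE: interpolate between a sample and one of its k nearest neighbours."""
    distinct = list(dict.fromkeys(samples))  # drop exact copies, keep order
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        # Assumption: identical points (the copies in P') are not used as neighbours.
        candidates = [s for s in distinct if s != base]
        if not candidates:
            synthetic.append(base)
            continue
        neighbours = sorted(candidates, key=lambda s: math.dist(base, s))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

def centroid(rows):
    return tuple(sum(col) / len(rows) for col in zip(*rows))

def train(pos, neg):
    """Stand-in base learner Hi (assumption): nearest-centroid rule, 1 = minority class."""
    c_pos, c_neg = centroid(pos), centroid(neg)
    return lambda x: 1 if math.dist(x, c_pos) <= math.dist(x, c_neg) else 0

def fit_ensemble(P, N, t, n_major, k=3, seed=0):
    rng = random.Random(seed)
    PP = list(P) + list(P)                 # step 2: P together with its copy P'
    n_major = min(n_major, len(N))
    classifiers = []
    for _ in range(t):
        # Step 3: Ni sampled without replacement (each Ni drawn independently here).
        N_i = rng.sample(N, n_major)
        # Step 4: a fresh Si per round; size chosen to balance the subset (assumption).
        S_i = smote(PP, max(n_major - len(PP), 0), k, rng)
        # Step 5: train Hi on Ni combined with PP'Si.
        classifiers.append(train(PP + S_i, N_i))

    def H(x):
        # Step 6: integrate the t classifiers by majority vote (assumption).
        votes = sum(h(x) for h in classifiers)
        return 1 if 2 * votes > len(classifiers) else 0
    return H

# Toy data: 10 minority samples near (3, 3), 200 majority samples near (0, 0).
data_rng = random.Random(1)
P = [(data_rng.gauss(3, 0.5), data_rng.gauss(3, 0.5)) for _ in range(10)]
N = [(data_rng.gauss(0, 0.5), data_rng.gauss(0, 0.5)) for _ in range(200)]
H = fit_ensemble(P, N, t=5, n_major=40)
print(H((3.0, 3.0)), H((0.0, 0.0)))
```

In this sketch each of the t training subsets holds 40 majority samples and 40 minority samples (10 originals, 10 copies and 20 synthetic samples), so every Hi sees a balanced subset while the ensemble as a whole draws on a larger share of N than any single subset.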