Extremely unbalanced data classification method based on EasyEnsemble algorithm and SMOTE algorithm

A data balancing and classification technique, applied in computing, computer parts, character and pattern recognition, etc., that addresses problems such as data imbalance and improves reliability.

Status: Inactive; Publication Date: 2018-09-28
BEIJING JIAOTONG UNIV
Cites: 0; Cited by: 15

AI Technical Summary

Problems solved by technology

The EasyEnsemble algorithm addresses the problem of data imbalance and reduces the loss of majority class sample information caused by undersampling.
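
The claim about reduced information loss can be illustrated with a small sketch. It assumes a hypothetical majority class of 1000 samples and compares how much of it a single undersampled subset sees against the union of several independently drawn subsets, which is the idea behind EasyEnsemble. The function name, subset size, and subset count are illustrative assumptions, not values from the patent.

```python
import random

def undersample_subsets(majority_ids, subset_size, n_subsets, seed=0):
    """Draw n_subsets independent random undersamples of the majority class.

    Each subset is drawn without replacement; different subsets may overlap.
    """
    rng = random.Random(seed)
    return [rng.sample(majority_ids, subset_size) for _ in range(n_subsets)]

# Hypothetical majority class: 1000 sample indices (assumed for illustration).
majority_ids = list(range(1000))
subsets = undersample_subsets(majority_ids, subset_size=50, n_subsets=10)

# Fraction of majority samples seen by one undersample vs. by the whole ensemble.
single_coverage = len(set(subsets[0])) / len(majority_ids)
ensemble_coverage = len(set().union(*subsets)) / len(majority_ids)

print(f"single subset covers {single_coverage:.1%} of the majority class")
print(f"ten subsets together cover {ensemble_coverage:.1%}")
```

A single undersample discards most majority samples, while an ensemble trained on many such subsets sees a much larger share of them collectively.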

Embodiment Construction

[0040] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals denote the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary only for explaining the present invention and should not be construed as limiting the present invention.

[0041] Those skilled in the art will understand that unless otherwise stated, the singular forms "a", "an", "said" and "the" used herein may also include plural forms. It should be further understood that the word "comprising" used in the description of the present invention refers to the presence of said features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understoo...

Abstract

The invention provides a method for classifying extremely unbalanced data based on the EasyEnsemble algorithm and the SMOTE algorithm. The method comprises: constructing a plurality of minority class subsets with the SMOTE algorithm, which increases the number of minority class samples; randomly undersampling the majority class, and combining the majority class subsets with the minority class subsets to obtain a plurality of training subsets with a fixed sample proportion; applying noise reduction to each training subset; training an AdaBoost classifier on each noise-reduced training subset; and integrating all AdaBoost classifiers to obtain the final classifier. The invention addresses the shortage of minority class samples, and combining SMOTE with random undersampling changes the unbalanced state of the samples. Noise reduction improves the reliability of the new data set and smooths the classification boundary, and the ensemble method reduces the loss of majority class information, so that the performance of the classifier is improved.
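
The data-construction steps of the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it covers only SMOTE-style interpolation, random undersampling, and the assembly of fixed-ratio training subsets. The noise reduction step and the AdaBoost training are left out because the text available here does not specify them. All function names, parameter values (`k`, `n_synthetic`, `ratio`, `n_subsets`), and the toy data are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def smote(minority, n_new, k=3, rng=random):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to nb
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic

def build_training_subsets(majority, minority, n_subsets=4,
                           n_synthetic=20, ratio=1.0, seed=0):
    """Build n_subsets training sets with a fixed majority:minority ratio.

    Each set combines the original minority samples, freshly generated
    synthetic minority samples, and a random undersample of the majority.
    Labels: 0 = majority, 1 = minority.
    """
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_subsets):
        min_subset = list(minority) + smote(minority, n_synthetic, rng=rng)
        n_major = min(len(majority), int(ratio * len(min_subset)))
        maj_subset = rng.sample(majority, n_major)
        data = [(x, 0) for x in maj_subset] + [(x, 1) for x in min_subset]
        rng.shuffle(data)
        subsets.append(data)
    return subsets

# Toy data (assumed): 1000 majority points and 10 minority points in 2-D.
gen = random.Random(1)
majority = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(1000)]
minority = [(gen.gauss(3, 0.5), gen.gauss(3, 0.5)) for _ in range(10)]

subsets = build_training_subsets(majority, minority)
for i, data in enumerate(subsets):
    n_min = sum(label for _, label in data)
    print(f"subset {i}: {len(data) - n_min} majority, {n_min} minority")
```

In the method described above, each subset would then be passed through noise reduction and used to train one AdaBoost classifier, with the classifiers combined into the final ensemble.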

Description

Technical field

[0001] The invention relates to the technical field of binary classification of unbalanced data, in particular to a method for binary classification of unbalanced data based on the EasyEnsemble algorithm and the SMOTE algorithm.

Background technique

[0002] Data imbalance means that in a sample data set, the number of samples of a certain class is much less than the number of samples of other classes. In practical situations, such data sets often appear, such as: fraud detection, fault diagnosis, medical diagnosis of rare diseases, churn prediction, etc. At present, most algorithm models are proposed under the premise of data balance, so when applied to unbalanced data processing, the performance of the algorithm model will be greatly reduced. When the data is extremely unbalanced, there may even be failures, that is, the correct rate is almost zero.

Contents of the invention

[0003] Analysis of technological innovation:

[0004] EasyEnsemble algorithm...
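
The failure mode described in paragraph [0002] can be made concrete with a small sketch. It assumes that the "correct rate" refers to the rate at which minority class samples are correctly identified (recall); that reading is an interpretation, not something the text states. The data set sizes and the always-majority classifier are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of all samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of true `positive` samples that were predicted as positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)

# Hypothetical data set: 990 majority samples (label 0), 10 minority samples (label 1).
y_true = [0] * 990 + [1] * 10
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 1000

overall = accuracy(y_true, y_pred)
minority_recall = recall(y_true, y_pred)
print(f"overall accuracy: {overall:.2f}, minority recall: {minority_recall:.2f}")
```

Overall accuracy stays high while no minority sample is detected, which is why imbalance-aware sampling and evaluation matter for this kind of data.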

Application Information

IPC(8): G06K9/62
CPC: G06F18/2148, G06F18/24
Inventors: 秦雅娟, 林小榕, 张宏科
Owner: BEIJING JIAOTONG UNIV