
High-dimensional imbalanced data classification method based on SVM

A data classification method using SVM-BRFE technology, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc. It addresses three problems: traditional sampling methods become meaningless on high-dimensional data, their effect is not obvious, and they cannot change the classifier's preference for the majority class.

Status: Inactive
Publication Date: 2018-01-09
HARBIN INST OF TECH SHENZHEN GRADUATE SCHOOL
2 Cites, 24 Cited by

AI Technical Summary

Problems solved by technology

However, in high-dimensional imbalanced data, the high dimensionality means that traditional sampling methods cannot change the classifier's emphasis on the majority class, which renders them meaningless.
The experimental study in [21] shows that although the SMOTE method can increase the classifier's attention to the minority class in low-dimensional data, the effect is not obvious in high-dimensional data.
The main reason is that the minority-class samples generated by SMOTE introduce correlation between samples, rather than correlation between features, into the new sample space, so the generated minority class cannot restore the class distribution of the original sample space well.
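
The interpolation step behind SMOTE can be sketched as follows. This is a minimal illustration of the classic idea only (a synthetic point is placed on the segment between a minority sample and one of its nearest minority neighbours); the function name `smote_sample`, the parameter `k`, and the toy points are assumptions made for this example and do not come from the patent.

```python
import math
import random

def smote_sample(minority, k=3, rng=None):
    """Create one synthetic point by interpolating between a randomly chosen
    minority sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    base = rng.choice(minority)
    # Nearest neighbours of `base` among the other minority samples.
    others = [p for p in minority if p is not base]
    neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
    mate = rng.choice(neighbours)
    gap = rng.random()  # interpolation factor in [0, 1)
    return [b + gap * (m - b) for b, m in zip(base, mate)]

# Toy minority class (hypothetical data): the corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = [smote_sample(minority, k=2, rng=random.Random(i)) for i in range(5)]
```

Because every synthetic point is a convex combination of two existing samples, the new points are tied to the samples they were built from, which is consistent with the sample-level correlation described above.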




Embodiment Construction

[0029] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0030] By analyzing the SVM-RFE feature selection process, the present invention finds that the imbalance problem can be taken into account during iterative feature selection by improving the feature evaluation scheme of the wrapper-style feature selection process. The SVM's automatic boundary partitioning is therefore used to resample the sample points in the Hilbert space so as to improve the F1 value of the support vector machine model, and the SVM's feature weight vector w at that point is used as the feature evaluation criterion. The two are then combined: feature selection is performed on the high-dimensional imbalanced data while the imbalance problem is taken into account, which addresses the high-dimensional problem. The time complexity of this algorithm is O(d^2), where d is the total number of features, and the main process is as fol...
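
The role of the weight vector w as a feature evaluation criterion can be illustrated with a plain SVM-RFE-style loop. The sketch below is not SVM-BRFE: the boundary resampling step is omitted because the source text is truncated before the procedure is given. The trainer (full-batch subgradient descent on the hinge loss), the helper names `train_linear_svm`, `f1_minority`, and `rfe_by_svm_weights`, the hyperparameters, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularised hinge loss.
    Labels y must be in {-1, +1}. Returns (w, b)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        active = y * (X @ w + b) < 1          # samples violating the margin
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def f1_minority(y_true, y_pred):
    """F1 score with the minority class (+1) treated as positive."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == -1))
    fn = np.sum((y_pred == -1) & (y_true == 1))
    return 0.0 if tp == 0 else float(2 * tp / (2 * tp + fp + fn))

def rfe_by_svm_weights(X, y):
    """Each round: train, record F1, drop the feature with the smallest |w_j|."""
    remaining = list(range(X.shape[1]))
    history = []
    while remaining:
        w, b = train_linear_svm(X[:, remaining], y)
        pred = np.where(X[:, remaining] @ w + b >= 0, 1, -1)
        history.append((list(remaining), f1_minority(y, pred)))
        remaining.pop(int(np.argmin(np.abs(w))))
    return history

# Hypothetical imbalanced data: 180 majority, 20 minority, 5 features,
# of which only feature 0 carries class information.
rng = np.random.default_rng(0)
y = np.array([-1] * 180 + [1] * 20)
X = rng.normal(size=(200, 5))
X[y == 1, 0] += 4.0
history = rfe_by_svm_weights(X, y)
```

Each retraining costs time proportional to the number of remaining features, and there are d rounds, so a loop of this shape is quadratic in d, in line with the O(d^2) figure quoted above. Tracking F1 per round is what lets a method of this kind keep the feature subset that serves the minority class best.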



Abstract

The invention proposes a high-dimensional imbalanced data classification method based on SVM. The method has two parts. The first part is feature selection. An SVM-BRFE algorithm carries out boundary resampling to find the optimal feature weights, which are then used to measure feature importance, select features, and update the training set; this process is repeated. In the end, the features most conducive to improving the F1 value are retained and the other features are removed. The subsequent training process is then carried out with as little feature redundancy and as few irrelevant feature combinations as possible, and with the dimension as low as possible. This reduces both the influence of the high-dimensional problem on the imbalance problem and the constraints on the SMOTE oversampling algorithm. The second part is data sampling. An improved SMOTE algorithm, namely the PBKS algorithm, is used. Minority-class samples on the boundary automatically partitioned by the SVM are used as distance constraints in the Hilbert space Dxij<H>, replacing the original constraints. A grid method is used to find the approximate preimage. The method provided by the invention can complete the classification task for high-dimensional imbalanced data stably and effectively, and can obtain a considerable effect.
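
A distance "in the Hilbert space" can be computed from kernel values alone, without the explicit feature map, using the standard identity ||phi(x) - phi(z)||^2 = k(x, x) - 2 k(x, z) + k(z, z). The sketch below shows that identity only; it does not reproduce the PBKS constraints or the grid search for the preimage. The choice of an RBF kernel, the value of `gamma`, and the function names are assumptions, since the abstract does not specify them.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel; gamma is an assumed, illustrative value."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def hilbert_distance(x, z, kernel=rbf_kernel):
    """Distance between phi(x) and phi(z) in the kernel-induced feature space,
    computed via k(x, x) - 2 k(x, z) + k(z, z)."""
    sq = kernel(x, x) - 2.0 * kernel(x, z) + kernel(z, z)
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors

d_same = hilbert_distance([0.0, 0.0], [0.0, 0.0])
d_far = hilbert_distance([0.0, 0.0], [3.0, 4.0])
```

With an RBF kernel, k(x, x) = 1 for every x, so this distance is bounded above by sqrt(2) however far apart the inputs are. That is one reason a constraint stated in feature space behaves differently from a Euclidean neighbour constraint in input space.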

Description

Technical field

[0001] The invention belongs to the technical field of data classification, and in particular relates to a method for classifying imbalanced samples.

Background art

[0002] In the classification tasks of data mining, current classification methods for high-dimensional imbalanced data solve either the high-dimensional problem or the imbalance problem first and then solve the other. They do not consider the new problems that high dimensionality introduces for the classification of imbalanced data, or the effect of imbalance on the classification of high-dimensional data. The classification of imbalanced data is mainly approached at two levels: sampling at the data level and classification at the algorithm level.

[0003] Sampling at the data level is one of the important means of solving the imbalanced data distribution in the sample space. Through methods such as undersampling, resampling, and mixed sampling, the s...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62, G06F17/30
CPC: G06F18/213
Inventor: 张春慨
Owner: HARBIN INST OF TECH SHENZHEN GRADUATE SCHOOL