Unbalanced feature selection method based on global minimum redundancy

A feature selection method based on global minimum redundancy, applied in the fields of instruments, character and pattern recognition, and computer components. It addresses class imbalance and low classification accuracy by preserving rich information, reducing redundant relations among features, and improving the classification effect.

Inactive Publication Date: 2021-09-07
SOUTHWEST JIAOTONG UNIV

AI Technical Summary

Problems solved by technology

Although the GRM algorithm has proven to be a concise and effective feature selection framework, it does not account for the class imbalance that is common in real data.
As a result, the selected feature subset tends to be biased towards the majority class and to ignore the minority class, which leads to low classification accuracy on minority-class samples when the algorithm is applied to unbalanced data.

Examples

Embodiment Construction

[0047] With reference to Figure 1, the specific implementation steps are as follows:

[0048] Input: unbalanced dataset X and trade-off parameter λ ∈ [0,1], where X = {x_1, x_2, ..., x_n} and x_i = [x_1(i), x_2(i), ..., x_d(i)]. f_i denotes the i-th feature, and the class label of sample x_i is y_i ∈ {0,1}. For convenience of expression, the samples belonging to class c ∈ {0,1} are expressed as ..., the number of samples of class c is n_c, the mean vector of class c is μ_c ∈ R^{d×1}, and μ ∈ R^{d×1} is the overall mean vector.

[0049] Step 1: Initialize the projection vectors w ∈ R^{d×1} and β ∈ R^{d×1}, and set the penalty coefficient μ > 0 of the quadratic term in the augmented Lagrange multiplier method and the step coefficient ρ > 1 of the augmented Lagrange method;

[0050] Step 2: From the input data set X, calculate the inter-class distance vector s ∈ R^{d×1} and the improved within-class scatter matrix form S ...
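
The quantities named in the input description and in Step 2 can be illustrated with a short NumPy sketch. It is only a sketch under stated assumptions: the function names are hypothetical, the inter-class distance vector is assumed to be the per-feature absolute difference of the two class means, and the scatter matrix shown is the general LDA within-class form. The improved (minority-emphasizing) form used by the method is truncated in this excerpt and is not reproduced here.

```python
import numpy as np

def class_statistics(X, y):
    """Return per-class sample counts n_c, class means mu_c, and the overall mean mu."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    counts = {c: int(np.sum(y == c)) for c in (0, 1)}
    means = {c: X[y == c].mean(axis=0) for c in (0, 1)}
    return counts, means, X.mean(axis=0)

def interclass_distance(means):
    """Assumed definition: s_j = |mu_1[j] - mu_0[j]| for every feature j."""
    return np.abs(means[1] - means[0])

def within_class_scatter(X, y, means):
    """General LDA within-class scatter: sum over classes of (x_i - mu_c)(x_i - mu_c)^T."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = X.shape[1]
    S_w = np.zeros((d, d))
    for c in (0, 1):
        centered = X[y == c] - means[c]
        S_w += centered.T @ centered
    return S_w

# Toy unbalanced data: 4 majority-class (0) samples, 2 minority-class (1) samples, d = 2.
X = np.array([[0.0, 1.0], [1.0, 1.0], [0.0, 3.0], [1.0, 3.0],
              [4.0, 2.0], [6.0, 2.0]])
y = np.array([0, 0, 0, 0, 1, 1])

counts, means, mu = class_statistics(X, y)
s = interclass_distance(means)
S_w = within_class_scatter(X, y, means)
```

On this toy data only the first feature separates the classes, so it is the only feature with a non-zero entry in s. Note also that the majority class contributes twice as many terms to the plain LDA scatter sum as the minority class, which is the kind of bias the improved form is meant to counter.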

Abstract

The invention provides an unbalanced feature selection method based on global minimum redundancy, comprising the following steps. First, according to the characteristics of unbalanced data, a regularized form SIR of the intra-class divergence (within-class scatter) matrix is established on the basis of the general LDA divergence matrix; this form emphasizes the minority class and improves the redundancy measure. Second, the regularized form SIR and the absolute inter-class distance vector s are introduced into the GRM model as the intra-class divergence matrix and the input feature score vector respectively, which yields the objective function of the GRM-DFS algorithm. Finally, the optimization problem is solved with the augmented Lagrangian multiplier method and a segmented (piecewise) root-finding method. The method effectively finds a globally minimum-redundancy feature subset that favors the minority class in unbalanced data. For a subsequent classification algorithm, the GRM-DFS algorithm helps to avoid overfitting and to improve performance, which in turn improves the efficiency of knowledge discovery.
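
The solver outlined in the abstract can be sketched as follows. This is a minimal illustration, not the patented GRM-DFS procedure: the exact objective is not visible in this excerpt, so the sketch assumes a GRM-style problem, minimize w^T A w − λ s^T w subject to w ≥ 0 and sum(w) = 1, where A stands in for the redundancy (scatter) matrix and s for the feature score vector. The names `grm_alm` and `project_to_simplex` are hypothetical, and a standard sort-based simplex projection is used in place of the segmented root-finding step.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {b : b >= 0, sum(b) = 1} (sort-based)."""
    u = np.sort(v)[::-1]                      # entries in descending order
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    idx = np.nonzero(u - css / k > 0)[0][-1]  # last index that stays positive
    theta = css[idx] / (idx + 1.0)            # shift that makes the result sum to 1
    return np.maximum(v - theta, 0.0)

def grm_alm(A, s, lam, mu=1.0, rho=1.1, n_iter=100):
    """Augmented Lagrangian sketch for the ASSUMED objective
    min_w  w^T A w - lam * s^T w   s.t.  w = beta,  beta on the simplex.
    A is assumed symmetric positive semi-definite; mu > 0 is the penalty
    coefficient and rho > 1 the step coefficient, as in Step 1."""
    d = s.size
    beta = np.full(d, 1.0 / d)   # start from uniform feature weights
    eta = np.zeros(d)            # Lagrange multipliers for w = beta
    identity = np.eye(d)
    for _ in range(n_iter):
        # w-step: closed-form minimizer of the augmented Lagrangian in w
        w = np.linalg.solve(2.0 * A + mu * identity, lam * s + mu * beta - eta)
        # beta-step: projection of the shifted w onto the simplex
        beta = project_to_simplex(w + eta / mu)
        # multiplier and penalty updates
        eta = eta + mu * (w - beta)
        mu *= rho
    return beta

# Toy problem: two uncorrelated features, only the first carries a score.
A = np.eye(2)
s = np.array([1.0, 0.0])
beta = grm_alm(A, s, lam=1.0)
```

For this toy input the constrained minimizer can be worked out by hand as (0.75, 0.25), and the iterates should approach it. Features would then be ranked by the entries of `beta`; how the actual method maps the solution to a selected subset is not stated in this excerpt.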

Description

Technical field

[0001] The invention relates to the field of granular computing and knowledge discovery in artificial intelligence, and in particular to an unbalanced feature selection method based on global minimum redundancy.

Background technique

[0002] In machine learning, class imbalance is regarded as a key problem. It usually arises when the minority class in a data set has far fewer samples than the majority class, yet the minority class is the more important of the two. In recent years, the class imbalance problem has attracted growing attention from researchers working with real data sets from fields such as medical diagnosis, intrusion detection, and credit rating, and it is regarded as one of the top ten problems in data mining. In general, most classification algorithms assume that samples are distributed in a relatively balanced way across all classes. When the cl...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06K9/62
CPC: G06F18/2115, G06F18/2411, G06F18/22, G06F18/2414
Inventor: 陈红梅, 黄书豪, 杨晓玲, 李天瑞, 罗川
Owner: SOUTHWEST JIAOTONG UNIV