Unsupervised feature selecting method based on conditional mutual information and K-means

A feature selection method using conditional mutual information, applied in computer parts, character and pattern recognition, instruments, etc. It addresses problems such as reduced classification accuracy, data imbalance, and inapplicability, with the effect of reducing redundancy, eliminating randomness, and increasing relevance.

Status: Inactive; Publication Date: 2017-03-15

NANJING UNIV OF INFORMATION SCI & TECH

0 Cites, 19 Cited by


Problems solved by technology

[0005] Most existing feature selection methods aim to improve classification accuracy without fully considering the distribution of the data samples. They generally pursue good learning performance on large classes and tend to ignore the learning performance on small classes.

To address data imbalance at the data level, the positive samples of the training set can be resampled before training so that positive and negative samples are balanced, and learning then proceeds on the balanced set (Exploratory under-sampling for class-imbalance learning. Liu X Y, Wu J, Zhou Z H). However, this approach cannot make use of all the data, which reduces classification accuracy.

At the algorithm level, a traditional feature selection algorithm can be modified according to the characteristics of the imbalanced class distribution, so that the algorithm adapts to samples with an imbalanced class distribution (New algorithm for feature selection in imbalanced problems: IM-IG. You Mingyu, Chen Yan, Li Guozheng). However, this method is limited to two-class imbalance problems and is not suitable for multi-class imbalance problems.

[0006] For filter-based feature selection, many supervised methods have been proposed, such as using mutual information to evaluate candidate features and selecting the top-ranked features as the input of a neural network classifier (Using mutual information for selecting features in supervised neural net learning. R. Battiti). However, this method ignores the redundancy between features, so many redundant features are selected, which hinders the performance of subsequent classifiers.

This method is also only suitable for data with class label information and is not suitable for unsupervised feature selection.
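The redundancy problem described above can be illustrated with a small sketch. It is not the cited method itself: it ranks discrete features by their empirical mutual information with the class label and keeps the top ones. The function names `mutual_information` and `rank_by_mi` and the toy data are assumptions made for illustration.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical I(X;Y) in bits for two equal-length sequences of discrete values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts.
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )

def rank_by_mi(columns, labels, top):
    """Return the indices of the `top` features with the largest I(feature; label)."""
    scores = [mutual_information(col, labels) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return order[:top], scores

# Toy data (hypothetical): f1 is an exact copy of f0, f2 is unrelated to the label.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # f0: determines the label
    [0, 0, 0, 0, 1, 1, 1, 1],  # f1: duplicate of f0, hence redundant
    [0, 1, 0, 1, 0, 1, 0, 1],  # f2: independent of the label
]
selected, scores = rank_by_mi(columns, labels, top=2)
```

In this toy case the ranking keeps both `f0` and its duplicate `f1`, because each feature is scored against the label in isolation. It also needs `labels`, which an unlabeled data set does not provide.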

[0007] In the field of unsupervised feature selection, many methods designed for text have been proposed, but they cannot be directly applied to numerical data.

Some unsupervised feature selection methods do apply to numerical data. For example, an unsupervised filter feature selection algorithm for classification features is based on a one-pass clustering algorithm, uses the importance of each feature in different clusters as the criterion, and finally selects a feature subset according to how the importance changes (Research on unsupervised feature selection method for classification features. Wang Lianxi, Jiang Shengyi). Because this method uses only one clustering algorithm to partition the data, the clustering result is subject to randomness, and the accuracy of feature selection cannot be guaranteed.


Embodiment Construction

[0038] The implementation of the technical solution is described in further detail below with reference to the accompanying drawings:

[0039] The unsupervised feature selection method based on conditional mutual information and K-means of the present invention is described in further detail with reference to the flow chart and an implementation example.

[0040] In this implementation example, conditional mutual information and the K-means algorithm are used to select features from an unlabeled data set. As shown in Figure 1, the method includes the following steps:

[0041] Step 10: perform K-means clustering multiple times on the unlabeled data set, with different K values and different cluster centers, and obtain each clustering result;

[0042] Step 101: the maximum number of clusters MAX and the minimum number of clusters MIN for the K-means algorithm are predetermined in the input stage, and before each clustering, a number is randomly selected in the range of [...
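Step 10 can be sketched roughly as follows. This is a minimal illustration, not the patented procedure: the source text is truncated before it states the sampling range, so drawing K uniformly from the inclusive range between MIN and MAX is an assumption, as are the function names, the plain Lloyd iteration, and the toy points.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, rng, max_iter=100):
    """Plain Lloyd's K-means; returns one cluster label per point."""
    centers = rng.sample(points, k)  # random initial cluster centers
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def repeated_kmeans(points, k_min, k_max, runs, seed=0):
    """Run K-means `runs` times, each with a random K and random initial centers.

    ASSUMPTION: K is drawn uniformly from [k_min, k_max] inclusive; the source
    only says that MIN and MAX are fixed in advance and K is chosen at random.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(runs):
        k = rng.randint(k_min, k_max)
        results.append((k, kmeans(points, k, rng)))
    return results

# Hypothetical unlabeled data: three loose groups in the plane.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1),
          (5.2, 4.9), (4.9, 5.0), (9.8, 0.1), (10.0, 0.3)]
clusterings = repeated_kmeans(points, k_min=2, k_max=4, runs=5)
```

Each entry of `clusterings` pairs the K used in that run with the resulting cluster labels, so later steps can score features against several different partitions instead of relying on a single, possibly unlucky, clustering.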


Abstract

The invention provides an unsupervised feature selection method based on conditional mutual information and K-means. The unlabeled data are clustered multiple times using K-means algorithms with different initial conditions. On the basis of each clustering, the modularization metric value of each feature and the conditional mutual information among the features are considered together, and relevance and independence indices among the features are used to select a feature subset with high relevance and low redundancy. The feature subsets obtained from the different K-means clusterings are then aggregated into a final feature subset. The method is effective for unlabeled, imbalanced data sets, and the resulting feature subsets have high relevance and low redundancy.
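For readers unfamiliar with the quantity, conditional mutual information for discrete variables can be estimated from counts as in the sketch below. The excerpt does not give the patent's exact formula or say which variables are conditioned on, so using cluster labels as the target and another feature as the conditioning variable is an assumption made for illustration, as are the function name and the toy data.

```python
from collections import Counter
from math import log2

def conditional_mutual_information(xs, ys, zs):
    """Empirical I(X;Y|Z) in bits for equal-length sequences of discrete values."""
    n = len(xs)
    pz = Counter(zs)
    pxz = Counter(zip(xs, zs))
    pyz = Counter(zip(ys, zs))
    pxyz = Counter(zip(xs, ys, zs))
    # p(x,y,z) * log2( p(z) p(x,y,z) / (p(x,z) p(y,z)) ), written with raw counts.
    return sum(
        (c / n) * log2(c * pz[z] / (pxz[(x, z)] * pyz[(y, z)]))
        for (x, y, z), c in pxyz.items()
    )

# Hypothetical cluster labels from one K-means run, plus three discrete features.
cluster = [0, 0, 0, 0, 1, 1, 1, 1]
f_a = [0, 0, 0, 0, 1, 1, 1, 1]  # follows the clusters
f_b = [0, 0, 0, 0, 1, 1, 1, 1]  # duplicate of f_a
f_c = [0, 1, 0, 1, 0, 1, 0, 1]  # unrelated to the clusters

# f_a adds nothing about the clusters once f_b is known (redundant pair) ...
redundant = conditional_mutual_information(f_a, cluster, f_b)
# ... but still carries its full information when only f_c is known.
informative = conditional_mutual_information(f_a, cluster, f_c)
```

A value near zero, as for `redundant`, signals that the first feature is redundant given the conditioning feature, while a large value, as for `informative`, signals that it contributes information the conditioning feature lacks.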

Description

Technical field

[0001] The invention belongs to the problem of feature selection in the field of machine learning, and specifically relates to a method for unsupervised feature selection on an unlabeled data set using conditional mutual information and a K-means algorithm.

Background

[0002] In practical applications of machine learning, the number of features is often large, some features may be irrelevant, and features may depend on one another. The more features there are, the longer it takes to analyze them and train the model. A large number of features also easily leads to the "curse of dimensionality" and makes the model more complex, with consequences such as a decline in the model's generalization ability. Feature selection is therefore particularly important.

[0003] Feature selection, also known as feature subset selection or attribute selection, refers to selecting a feature subset from all features so that the constructed model performs better....


Application Information


IPC(8): G06K9/62

CPC: G06F18/23213

Inventor: 马廷淮, 邵文晔, 曹杰, 薛羽

Owner: NANJING UNIV OF INFORMATION SCI & TECH
