Semi-supervised dimensionality reduction method for high dimensional data clustering

Keywords: high-dimensional data, semi-supervised technology. Field: data processing. Problems addressed: the curse of dimensionality, the complexity of real-world data, and the unsuitability of existing dimensionality reduction methods for cluster analysis. Effects: improved discriminative ability, good interpretability, and simple yet effective cluster analysis.

Status: Inactive. Publication date: 2012-04-11
ZHEJIANG UNIV
Cites: 0 | Cited by: 18

AI Technical Summary

Problems solved by technology

[0009] These traditional clustering methods have successfully solved the clustering problem for low-dimensional data, but because real-world data are complex, they often fail on high-dimensional data.
Traditional clustering methods fail because they face two main problems when clustering high-dimensional data sets: (1) high-dimensional data sets contain a large number of irrelevant attributes, so the probability that clusters exist across all dimensions is almost zero; (2) the curse of dimensionality brought by high dimensions renders some clustering algorithms almost useless in practice.
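One well-known manifestation of the second problem is distance concentration. The sketch below is an illustration only, not part of the patent: it draws random points uniformly in the unit cube and measures how much the farthest and nearest points from a query differ. As the dimension grows, this relative contrast shrinks, so distance-based clustering loses its ability to tell points apart. The function name `relative_contrast` and the sample sizes are arbitrary choices.

```python
import math
import random


def relative_contrast(num_points: int, dim: int, seed: int = 0) -> float:
    """Return (d_max - d_min) / d_min, where d_max and d_min are the largest and
    smallest Euclidean distances from one random query point to `num_points`
    random points, all drawn uniformly from the unit cube [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(num_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    # The contrast shrinks as the dimension grows: the nearest and farthest
    # points become almost equally far from the query.
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:4d}  relative contrast={relative_contrast(200, dim):.3f}")
```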
[0014] PCA is a traditional, classic unsupervised dimensionality reduction method that has been widely used in many applications. It can effectively find the main features of the data, but it cannot effectively extract the category features of the data. LDA is a supervised dimensionality reduction method; although it performs well, it requires a large amount of labeled training data, so it is suitable only as a dimensionality reduction method for classification, not for cluster analysis. NMF, as a basic dimensionality reduction framework, produces reduced data with good interpretability and has become a research hotspot, but cluster analysis on NMF-reduced data gives unsatisfactory results: its discriminative ability remains low and leaves room for improvement.
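For reference, the NMF framework mentioned above factorizes a nonnegative feature matrix X (features as rows, samples as columns) as X ≈ U V^T, and the rows of V serve as the low-dimensional representation of the samples. The pure-Python sketch below uses the standard Lee-Seung multiplicative updates for the squared Frobenius loss. It illustrates plain, unsupervised NMF only, not the patented method; the helper names and the `eps` guard against division by zero are assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(x, u, v):
    """Squared Frobenius norm of X - U V^T."""
    approx = matmul(u, transpose(v))
    return sum((p - q) ** 2 for xr, ar in zip(x, approx) for p, q in zip(xr, ar))


def nmf(x, k, iters=200, seed=0, eps=1e-9):
    """Factorize a nonnegative m x n matrix X as X ~= U V^T,
    with U of shape m x k and V of shape n x k (one row per sample)."""
    rng = random.Random(seed)
    m, n = len(x), len(x[0])
    u = [[rng.random() for _ in range(k)] for _ in range(m)]
    v = [[rng.random() for _ in range(k)] for _ in range(n)]
    for _ in range(iters):
        # U <- U * (X V) / (U V^T V), element-wise
        xv = matmul(x, v)
        uvtv = matmul(u, matmul(transpose(v), v))
        u = [[u[i][j] * xv[i][j] / (uvtv[i][j] + eps) for j in range(k)] for i in range(m)]
        # V <- V * (X^T U) / (V U^T U), element-wise
        xtu = matmul(transpose(x), u)
        vutu = matmul(v, matmul(transpose(u), u))
        v = [[v[i][j] * xtu[i][j] / (vutu[i][j] + eps) for j in range(k)] for i in range(n)]
    return u, v


if __name__ == "__main__":
    x = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0], [4.0, 1.0, 0.0], [2.0, 2.0, 2.0]]
    u0, v0 = nmf(x, k=2, iters=0)    # random initialization only
    u, v = nmf(x, k=2, iters=300)
    print(frobenius_error(x, u0, v0), frobenius_error(x, u, v))
```

Because every update multiplies by a ratio of nonnegative terms, U and V stay nonnegative, which is the source of NMF's parts-based interpretability.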

Examples

Embodiment Construction

[0038] To describe the present invention more specifically, its dimensionality reduction method is described in detail below with reference to the drawings and specific embodiments.

[0039] As shown in Figure 1, the semi-supervised dimensionality reduction method for high-dimensional data clustering includes the following steps:

[0040] (1) Construct the sample feature matrix.

[0041] This embodiment uses the Yale face data set as an example; its statistics are shown in Table 1.

[0042] Table 1: Yale face dataset statistics

[0043]
Data set | Face images | Face categories | Features per image
Yale     | 165         | 15              | 1024

[0044] The Yale face data set contains 165 face images of 15 people with different appearances, 11 images per person. ...
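The 1,024 features per image in Table 1 are consistent with 32 × 32 grayscale images (32 × 32 = 1024). That image size, the row-major flattening, and the rescaling of 8-bit pixels to [0, 1] are assumptions, because the excerpt does not say how the feature vectors are formed. The sketch stacks each flattened image as one column, giving a 1024 × 165 nonnegative sample feature matrix, and uses randomly generated pixel grids (`synthetic_yale_like`, hypothetical) as a stand-in for the real Yale images.

```python
import random

IMAGE_SIDE = 32          # assumption: 32 x 32 pixels -> 1024 features
NUM_PEOPLE = 15          # from Table 1
IMAGES_PER_PERSON = 11   # 15 x 11 = 165 images


def flatten(image):
    """Row-major flattening of a 2-D list of pixel intensities."""
    return [pixel for row in image for pixel in row]


def build_feature_matrix(images):
    """Stack flattened images as columns, giving a (pixels x samples) matrix.
    Pixels are rescaled from 0..255 to [0, 1] so the matrix stays nonnegative."""
    columns = [[p / 255.0 for p in flatten(img)] for img in images]
    return [list(row) for row in zip(*columns)]  # columns -> rows


def synthetic_yale_like(seed=0):
    """Hypothetical stand-in for the Yale data: random 8-bit pixel grids
    with a person label for each image."""
    rng = random.Random(seed)
    images, labels = [], []
    for person in range(NUM_PEOPLE):
        for _ in range(IMAGES_PER_PERSON):
            images.append(
                [[rng.randrange(256) for _ in range(IMAGE_SIDE)] for _ in range(IMAGE_SIDE)]
            )
            labels.append(person)
    return images, labels


if __name__ == "__main__":
    images, labels = synthetic_yale_like()
    x = build_feature_matrix(images)
    print(len(x), "features x", len(x[0]), "samples")
```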

Abstract

The invention discloses a semi-supervised dimensionality reduction method for high-dimensional data clustering. The method comprises the following steps: (1) constructing a sample feature matrix; (2) constructing a constraint matrix; (3) constructing a set of iterative equations and iteratively outputting a transition matrix; and (4) obtaining the sample feature matrix after dimensionality reduction. The invention adds part of the known class information as a constraint while decomposing the sample feature matrix and, following the idea of concept factorization, uses the coefficient matrix obtained from the decomposition as the low-dimensional representation of the high-dimensional sample feature matrix. When this low-dimensional matrix is used for cluster analysis, the analysis becomes simple and effective. The reduced data also have good interpretability, and compared with prior-art dimensionality reduction methods, the disclosed method further improves the discriminative ability of cluster analysis.
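The abstract does not give the iterative equations. The sketch below assumes a commonly used label-constraint formulation of concept factorization: X ≈ X W V^T with V = A Z, where the constraint matrix A gives all labeled samples of one class a shared one-hot row and gives each unlabeled sample its own identity column. Samples that share a label are therefore forced onto the same row of the coefficient matrix V. It is an illustrative reconstruction under those assumptions, not the patented equations. The helper names, the convention that labeled samples come first, the multiplicative updates, the `eps` guard, and the argmax cluster assignment are all hypothetical choices.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def constraint_matrix(labels, n, num_classes):
    """Build A with shape n x (c + n - l).
    Assumption: the first l = len(labels) samples are the labeled ones."""
    l = len(labels)
    a = [[0.0] * (num_classes + n - l) for _ in range(n)]
    for i, c in enumerate(labels):
        a[i][c] = 1.0                    # one-hot row for a labeled sample
    for i in range(l, n):
        a[i][num_classes + i - l] = 1.0  # identity column per unlabeled sample
    return a


def reconstruction_error(x, w, v):
    """Squared Frobenius norm of X - X W V^T."""
    approx = matmul(x, matmul(w, transpose(v)))
    return sum((p - q) ** 2 for xr, ar in zip(x, approx) for p, q in zip(xr, ar))


def constrained_cf(x, labels, num_classes, k, iters=200, seed=0, eps=1e-9):
    """Hypothetical sketch of constrained concept factorization X ~= X W (A Z)^T.
    x is m x n and nonnegative, one sample per column.
    Returns (W, V), where V = A Z (n x k) is the low-dimensional representation."""
    rng = random.Random(seed)
    n = len(x[0])
    a = constraint_matrix(labels, n, num_classes)
    at = transpose(a)
    ata = matmul(at, a)             # A^T A
    kmat = matmul(transpose(x), x)  # K = X^T X, nonnegative because X is
    w = [[rng.random() for _ in range(k)] for _ in range(n)]
    z = [[rng.random() for _ in range(k)] for _ in range(len(at))]
    for _ in range(iters):
        # W <- W * (K A Z) / (K W (A Z)^T (A Z)), element-wise
        az = matmul(a, z)
        num = matmul(kmat, az)
        den = matmul(kmat, matmul(w, matmul(transpose(az), az)))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
        # Z <- Z * (A^T K W) / (A^T A Z W^T K W), element-wise
        kw = matmul(kmat, w)
        num = matmul(at, kw)
        den = matmul(ata, matmul(z, matmul(transpose(w), kw)))
        z = [[z[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(len(z))]
    return w, matmul(a, z)


def cluster_labels(v):
    """Assign each sample to the column of V with its largest coefficient."""
    return [max(range(len(row)), key=row.__getitem__) for row in v]


if __name__ == "__main__":
    x = [[1.0, 0.9, 0.1, 0.0, 0.8, 0.2],
         [0.8, 1.0, 0.0, 0.1, 0.9, 0.1],
         [0.1, 0.0, 1.0, 0.9, 0.2, 0.8],
         [0.0, 0.2, 0.9, 1.0, 0.1, 0.9]]
    w, v = constrained_cf(x, labels=[0, 0, 1, 1], num_classes=2, k=2)
    print(cluster_labels(v))
```

Because V = A Z, the known labels are built into the factorization instead of being used only after it. This is one way the abstract's idea of adding partial class information as a constraint during decomposition can be realized.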

Description

Technical Field
[0001] The invention belongs to the technical field of data processing and relates in particular to a semi-supervised dimensionality reduction method for high-dimensional data clustering.
Background
[0002] Clustering is a common multivariate statistical analysis method in machine learning and data mining. It partitions a large number of samples into reasonable groups according to their characteristics, with no model to refer to or follow; that is, it is performed without prior knowledge. As an effective means of data analysis, clustering is now widely used in many fields. In business, cluster analysis is used to discover different customer groups and to characterize each group by its purchase patterns. In biology, it is used to classify animals, plants, and genes in order to understand the inherent structure of populations; i...

Application Information

IPC (8th edition): G06F17/30
Inventors: 刘海风 (Liu Haifeng), 杨政 (Yang Zheng), 吴朝晖 (Wu Zhaohui)
Owner: ZHEJIANG UNIV