Ultra-high-dimensional data dimension reduction algorithm based on information entropy

An information-entropy-based technique for high-dimensional data, applied to dimensionality reduction of ultra-high-dimensional data, that reduces the number of features and addresses the problem of long running time.

Status: Inactive
Publication date: 2017-02-15
UNIV OF ELECTRONICS SCI & TECH OF CHINA
Cites: 0; cited by: 16

AI Technical Summary

Problems solved by technology

However, experiments with the partitioned processing method show that its running time is too long to meet application requirements. On this basis, information entropy is introduced: feature screening is performed first, which greatly reduces the number of features, and dimensionality reduction is performed afterwards. The overall process is shown in Figure 2 and the specific algorithm in Figure 3. The running time of the whole process is reduced several-fold, and the dimensionality reduction results retain most of the principal components, which still meets the application requirements.

Examples

Embodiment Construction

[0022] Specific embodiments of the present invention will be described below in conjunction with the accompanying drawings, so that those skilled in the art can better understand the present invention. It should be noted that in the following description, when detailed descriptions of known functions and designs may dilute the main content of the present invention, these descriptions will be omitted here.

[0023] Figure 1 shows the flow of dimension reduction processing of ultra-high-dimensional data based on information entropy in the present invention. In this example, as shown in Figure 2, the original data is used as input. If the original data is already a matrix composed of attributes and records, the step of converting it to a matrix can be omitted.

[0024] The step after generating the matrix is to calculate the information entropy H(i) of each attribute and compare it with the threshold et (the value of et depends on the specific application). The attributes greater than...
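The screening step in [0024] can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation. The text above is truncated, so which side of the threshold is retained is not stated here; the sketch assumes that attributes with H(i) greater than et are kept. The histogram-based entropy estimate, the `bins` parameter, and the function names `attribute_entropy` and `screen_by_entropy` are likewise hypothetical.

```python
import numpy as np


def attribute_entropy(column, bins=10):
    """Shannon entropy (in bits) of one attribute.

    Assumption: the distribution is estimated with an equal-width
    histogram; the source does not say how H(i) is estimated.
    """
    counts, _ = np.histogram(column, bins=bins)
    probs = counts[counts > 0] / counts.sum()
    return float(-(probs * np.log2(probs)).sum())


def screen_by_entropy(X, et, bins=10):
    """Compute H(i) for every column of X and compare it with et.

    Assumption: columns with H(i) > et are retained. X is a matrix with
    records as rows and attributes as columns.
    Returns (indices of retained attributes, array of all entropies).
    """
    H = np.array([attribute_entropy(X[:, i], bins) for i in range(X.shape[1])])
    keep = np.flatnonzero(H > et)
    return keep, H


if __name__ == "__main__":
    X = np.column_stack([
        np.arange(1000, dtype=float),   # spread evenly over its range
        np.zeros(1000),                 # constant attribute, entropy 0
        np.tile([0.0, 1.0], 500),       # two equally likely values
    ])
    keep, H = screen_by_entropy(X, et=0.5)
    print("entropies:", H)
    print("retained attribute indices:", keep)
    X_screened = X[:, keep]             # input to the later PCA step
```

Screening this way only needs one attribute in memory at a time, which fits the memory constraint the document describes. The value of et has to be chosen per application, as the text notes.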

Abstract

The invention provides an ultra-high-dimensional data dimension reduction algorithm based on information entropy. It belongs to the field of high-dimensional data preprocessing and addresses a problem that arises when the conventional PCA algorithm is applied in practice: when the data dimensionality (number of features) is high enough, the feature values cannot all be read into memory at once for analysis and calculation. A partitioning processing method that does not depend on a cloud platform or a distributed computing platform was tried experimentally, but it is too time-consuming to meet practical application requirements. On this basis, the idea of information entropy is introduced to improve the PCA algorithm so that it can handle dimension reduction of ultra-high-dimensional data. Experimental results show that, with the same proportion of the original data information retained, the running time of the improved algorithm is reduced by a factor of 60 compared with the partitioning algorithm.
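To make the phrase "the same proportion of the original data information" concrete, the sketch below shows one common reading: keep the smallest number of principal components whose cumulative explained variance reaches a target proportion. This reading, the `proportion` parameter, and the function name `pca_retain` are assumptions for illustration; the source does not specify how the retained proportion is measured.

```python
import numpy as np


def pca_retain(X, proportion=0.95):
    """PCA via SVD, keeping the fewest components that reach `proportion`.

    Assumption: "information retained" is measured as the cumulative
    explained-variance ratio. X has records as rows and attributes as
    columns and must not be constant in every column.
    Returns (projected data, component matrix, retained ratio).
    """
    Xc = X - X.mean(axis=0)                      # center each attribute
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2                                 # proportional to component variances
    ratio = np.cumsum(var) / var.sum()
    # First index whose cumulative ratio reaches the target, as a count.
    k = min(int(np.searchsorted(ratio, proportion)) + 1, len(s))
    components = Vt[:k]
    return Xc @ components.T, components, float(ratio[k - 1])


if __name__ == "__main__":
    t = np.linspace(-1.0, 1.0, 50)
    X = np.column_stack([t, 2.0 * t, 0.01 * np.sin(7.0 * t)])
    Z, components, retained = pca_retain(X, proportion=0.95)
    print("reduced shape:", Z.shape)
    print("retained proportion:", retained)
```

In the pipeline the document describes, a step like this would run on the attributes that survive entropy screening rather than on the full matrix, which is where the reported time saving would come from.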

Description

Technical field

[0001] The invention belongs to the field of high-dimensional data preprocessing and, more specifically, relates to a dimensionality reduction algorithm for ultra-high-dimensional data improved with information entropy.

Background

[0002] With the rapid development of information science and technology, the representation of information is becoming more and more comprehensive, data is becoming easier to obtain, and the data objects of concern are becoming more and more complex. Industry has an urgent demand for data analysis and processing techniques, especially for high-dimensional data. Dealing with high-dimensional data directly faces the following difficulties: the curse of dimensionality, empty space, ill-posedness, and algorithm failure. Aiming at the problem that the data feature dimension is too high, the memory is limited, and cannot be read into the memory for analysis and calcul...

Application Information

IPC(8): G06F17/30
CPC: G06F16/283
Inventors: 何兴高, 李蝉娟, 张效藩
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA