High-dimensional vector data visualization method and system based on double-layer anchor point graph projection optimization

An anchor-point and vector technique, applied in the field of big data visualization, that addresses problems such as incorrect visual layout structure, inability to support large-scale data visualization, and poor parallelism.

Active Publication Date: 2019-08-30
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

In 2016, Tang Jian et al. proposed the LargeVis algorithm in the article "Visualizing Large-scale and High-dimensional Data", published at the International World Wide Web Conference. LargeVis uses a data-modeling idea similar to t-SNE: a Student t-distribution in the low-dimensional space is fitted to a Gaussian distribution in the high-dimensional space. It adopts a different optimization method, however: only the neighbor relationships between each data point and its nearest neighbors are preserved in the low-dimensional space, and these relationships are represented by an approximate nearest-neighbor graph.
However, none of these three algorithms can be implemented directly on the GPU.
The reason is that their algorithmic logic is complex and their parallelism is poor, so they cannot support the visualization of large-scale data.
In addition, neither BH-t-SNE nor LargeVis retains the global layout structure of the data well, so they often produce incorrect visual layouts that lead people to misread the structural information in the data.
UMAP can efficiently project data and preserve global information only when the data satisfies the assumption of being uniformly distributed on a low-dimensional manifold.
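
The modeling idea shared by t-SNE and LargeVis, as described above, can be illustrated with a small sketch. It is not taken from the patent: the function names, the single fixed bandwidth `sigma`, and the dense all-pairs computation are assumptions made for brevity (t-SNE tunes a bandwidth per point, and LargeVis restricts the computation to a nearest-neighbor graph).

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def gaussian_affinities(points, sigma=1.0):
    """Row-normalized Gaussian similarities between high-dimensional points.

    Assumption: one shared bandwidth `sigma` for every point.
    """
    n = len(points)
    rows = []
    for i in range(n):
        weights = [
            0.0 if i == j
            else math.exp(-sq_dist(points[i], points[j]) / (2 * sigma ** 2))
            for j in range(n)
        ]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows

def student_t_similarities(layout):
    """Student t (one degree of freedom) similarities between low-dimensional
    points, normalized over all pairs."""
    n = len(layout)
    weights = [
        [0.0 if i == j else 1.0 / (1.0 + sq_dist(layout[i], layout[j]))
         for j in range(n)]
        for i in range(n)
    ]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]
```

A projection method in this family then moves the low-dimensional points so that the second set of similarities matches the first as closely as possible. The heavy tail of the Student t kernel lets moderately distant points sit far apart in the layout.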

Examples

Embodiment Construction

[0064] The present invention will be further described in detail below with reference to the accompanying drawings and embodiments. It should be pointed out that the embodiments described below are intended to facilitate the understanding of the present invention and do not have any limiting effect on it.

[0065] As shown in Figure 1, the high-dimensional vector data visualization method based on double-layer anchor point graph projection optimization proceeds as follows:

[0066] (1) Perform K-means clustering on the original high-dimensional data set D and use the resulting cluster centers as the anchor point set A. In practical applications, the number of clusters k_c is set to a default value of 1000, and each anchor point is assigned a unique number in the range 0-999. For large-scale data (data sets with more than 5 million data points), clustering is not performed on the full set; instead, a subset of no more than 1 million points is sampled, and the number of cl...
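
Step (1) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function and parameter names (`select_anchors`, `large_threshold`, `max_sample`, `iters`, `seed`) are hypothetical, the K-means loop is a basic Lloyd iteration with random initialization, and because the text above is cut off before stating the cluster count used for sampled data, the sketch simply keeps k_c unchanged in that case.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    """Index of the center closest to `point`."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))

def select_anchors(data, k_c=1000, large_threshold=5_000_000,
                   max_sample=1_000_000, iters=10, seed=0):
    """Return K-means cluster centers to be used as the anchor set A.

    The list index of each returned center serves as its anchor number
    (0 .. k_c - 1).
    """
    rng = random.Random(seed)
    # Large data sets: cluster a sampled subset instead of the full set.
    if len(data) > large_threshold:
        data = rng.sample(data, min(max_sample, len(data)))
    k_c = min(k_c, len(data))
    # Assumption: initialize the centers from randomly chosen data points.
    centers = [list(p) for p in rng.sample(data, k_c)]
    for _ in range(iters):
        groups = [[] for _ in range(k_c)]
        for p in data:
            groups[nearest(p, centers)].append(p)
        for i, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(group) for col in zip(*group)]
    return centers
```

At the scales mentioned in the text, the assignment loop would need a vectorized or GPU implementation; the pure-Python version is only meant to make the control flow explicit.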

Abstract

The invention discloses a high-dimensional vector data visualization method and system based on double-layer anchor point graph projection optimization. The method comprises the following steps: (1) performing K-means clustering on an original high-dimensional vector data set D and using the resulting cluster centers as an anchor point set A; (2) building an inverted index of the high-dimensional vector data according to the anchor point set A; (3) building an approximate k_D-nearest-neighbor graph G_D of the data set D by using the inverted index; (4) for each point d in the data set D, finding the c anchor points closest to d by brute-force search and adding these c anchor points to the neighbor graph G_D as neighbors of d; (5) constructing a k_A-nearest-neighbor graph G_A on the anchor point set A by brute-force search; and (6) based on the high-dimensional data structure information represented by G_D and G_A, generating a low-dimensional visual projection by using a double-layer projection optimization algorithm. With this method and system, both the global macrostructure and the local microstructure of the high-dimensional space can be preserved, yielding a high-quality layout.
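
Steps (2) to (5) of the abstract can be sketched as below. The sketch rests on several assumptions that the text does not confirm: the names `build_graphs`, `k_nearest`, and `probes` are hypothetical, the inverted index is taken to map each anchor number to the data points whose nearest anchor it is, and approximate neighbors are found by scanning only the index lists of the `probes` anchors nearest to each point. Step (6), the double-layer projection optimization, is not covered because its details are not given here.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_nearest(query, candidates, points, k):
    """Brute force: the k indices in `candidates` whose points are closest."""
    return sorted(candidates, key=lambda i: sq_dist(query, points[i]))[:k]

def build_graphs(data, anchors, k_d=2, k_a=2, c=1, probes=2):
    """Build the inverted index, the data graph G_D and the anchor graph G_A."""
    anchor_ids = range(len(anchors))

    # Step 2: inverted index, anchor number -> data points assigned to it.
    inverted = [[] for _ in anchor_ids]
    for i, d in enumerate(data):
        inverted[k_nearest(d, anchor_ids, anchors, 1)[0]].append(i)

    g_d = {}
    for i, d in enumerate(data):
        # Step 3 (assumed strategy): search only the index lists of the
        # `probes` nearest anchors for approximate data-point neighbors.
        near_anchors = k_nearest(d, anchor_ids, anchors, probes)
        candidates = [j for a in near_anchors for j in inverted[a] if j != i]
        point_neighbors = k_nearest(d, candidates, data, k_d)
        # Step 4: brute-force the c closest anchors and add them as neighbors.
        anchor_neighbors = k_nearest(d, anchor_ids, anchors, c)
        g_d[i] = {"points": point_neighbors, "anchors": anchor_neighbors}

    # Step 5: brute-force k_A-nearest-neighbor graph over the anchors.
    g_a = {
        a: k_nearest(anchors[a], [b for b in anchor_ids if b != a], anchors, k_a)
        for a in anchor_ids
    }
    return inverted, g_d, g_a
```

Brute force is affordable in steps (4) and (5) because the anchor set is small (1000 anchors by default) compared with the data set, while step (3) avoids an all-pairs comparison over D by restricting the search to a few index lists.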

Description

Technical field

[0001] The invention relates to the field of big data visualization, and in particular to a high-dimensional vector data visualization method and system based on double-layer anchor point graph projection optimization.

Background art

[0002] In the era of big data, the data generated by information systems such as the Internet has grown exponentially. Because of the unprecedented scale of the data and its extremely fast update speed, mining the laws and patterns contained in big data has exceeded what human effort alone can achieve. In particular, many machine learning and data mining algorithms represent discrete multi-modal data as continuous real-valued vectors in a high-dimensional space, which are even harder for humans to understand directly. High-dimensional vector data visualization is a key technology for solving this problem and is the main subject of the present invention. High-dimensional vector data visualization techn...

Application Information

IPC(8): G06F16/22, G06F16/26, G06F16/28
CPC: G06F16/2228, G06F16/26, G06F16/285
Inventor: 付聪, 张永辉, 蔡登
Owner: ZHEJIANG UNIV