High-dimensional data clustering method, electronic device and system

A high-dimensional data clustering technology, applied in the Internet field, that addresses the low accuracy of clustering results and the large amount of calculation, improving accuracy while keeping the amount of calculation under control.

Status: Inactive; Publication Date: 2018-12-14
北京国信杰云科技有限公司
Cites: 0; Cited by: 2

AI Technical Summary

Problems solved by technology

[0005] However, when the K-means algorithm is used to cluster high-dimensional data, the accuracy of the clustering results is low and the amount of calculation is very large.

Examples

Embodiment Construction

[0024] To make the purpose, technical solutions and advantages of the present invention clearer, a preferred processing flow is provided below in conjunction with the accompanying drawings of the embodiments, and the technical solutions of the present invention are described clearly and completely. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments, without creative effort, fall within the protection scope of the present invention.

[0025] In the prior-art K-means algorithm, the cluster centers are set randomly, so the accuracy of the clustering results is not high, and the amount of calculation for high-dimensional data is also very large. The present invention sets a preset reference cluster by calculating the similarity betw...

Abstract

The invention provides a high-dimensional data clustering method, electronic device and system. The method comprises the following steps: obtaining, from a plurality of clients, the high-dimensional data to be clustered after dimension reduction and encryption; computing the corresponding base clusterings from that data; calculating the similarity between any two base clusterings in the base clustering set; initializing a preset reference clustering; and using an iterative algorithm to obtain candidate reference clusterings successively, based on the preset reference clustering and the tag sets corresponding to the base clusterings. The preset reference clustering is updated iteratively, according to whether the preset or the candidate reference clustering has the higher average similarity with the other clusterings in the base clustering set, until it no longer changes; the preset reference clustering is then used as the final clustering result. The invention improves the accuracy of high-dimensional data clustering results and controls the amount of calculation.
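
The selection of a reference clustering by average similarity can be illustrated with a minimal Python sketch. It is not the patented procedure: the abstract does not state which similarity measure is used or how candidates are generated, so this sketch assumes the Rand index as the similarity and simply tries each base clustering as a candidate. The names `rand_index`, `average_similarity` and `select_reference` are hypothetical.

```python
from itertools import combinations

def rand_index(a, b):
    """Assumed similarity: fraction of point pairs on which two labelings agree."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)

def average_similarity(candidate, base_clusterings):
    """Mean similarity of `candidate` to the other base clusterings."""
    others = [c for c in base_clusterings if c is not candidate]
    if not others:
        return 0.0
    return sum(rand_index(candidate, c) for c in others) / len(others)

def select_reference(base_clusterings):
    """Update the reference until no candidate has a higher average similarity."""
    reference = base_clusterings[0]  # initialize the preset reference clustering
    changed = True
    while changed:
        changed = False
        for candidate in base_clusterings:
            # Assumption: every base clustering is tried as a candidate.
            if (average_similarity(candidate, base_clusterings)
                    > average_similarity(reference, base_clusterings)):
                reference = candidate
                changed = True
    return reference

# Three base clusterings of four points; the last two describe the same partition.
base = [[0, 1, 0, 1], [0, 0, 1, 1], [1, 1, 0, 0]]
print(select_reference(base))  # [0, 0, 1, 1]
```

The loop terminates because the reference is replaced only when the average similarity strictly increases. The Rand index is used here because it ignores how cluster labels are numbered, so `[0, 0, 1, 1]` and `[1, 1, 0, 0]` count as identical.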

Description

Technical field

[0001] The present invention relates to the technical field of the Internet, and more specifically to a high-dimensional data clustering method, electronic device and system.

Background technique

[0002] With the rapid development of Internet technology, a very large amount of data is generated in real time every day, including mobile communication data, electricity consumption data, network transactions and real-time monitoring information. The explosive growth in the volume of information makes data clustering one of the most important issues in the field of modern information management.

[0003] The K-means algorithm is the most classic distance-based clustering algorithm. It uses distance as the evaluation index of similarity: the closer two objects are, the greater the similarity between them.

[0004] At present, when clustering data based on the K-means algorithm, the data to be classified is formed ...
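
As a minimal illustration of the distance-based K-means described in [0003], the following Python sketch uses Euclidean distance as the similarity index and randomly chosen initial centers. It shows the textbook algorithm only, not the method of the invention; the function names, the `seed` parameter and the sample points are assumptions made for this example.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance: a smaller distance means greater similarity."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=100, seed=0):
    """Textbook K-means with randomly chosen initial cluster centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centers

# Two well-separated groups of 2-D points (illustrative data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, k=2)
print(labels)
```

Each assignment step costs on the order of (number of points) × k × (number of dimensions) distance terms, which is why the amount of calculation grows quickly for high-dimensional data.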

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06F21/60, G06K9/62
CPC: G06F21/602, G06F18/23213
Inventor: 党鹏珍
Owner: 北京国信杰云科技有限公司