Bisecting K-means algorithm based on density dividing principle

A bisecting K-means clustering technique based on a density division criterion. It addresses the degradation of clustering quality caused by poorly chosen parameters, improves clustering accuracy, and overcomes the inability of partition clustering to remove noise points.

Status: Inactive. Publication date: 2017-07-04
JIANGSU UNIV
Cites: 0. Cited by: 4

AI Technical Summary

Problems solved by technology

DBSCAN is a representative density-based clustering algorithm. It can find clusters of arbitrary shape in a spatial database, and it can identify and remove noise points. However, it is very sensitive to its parameters (the neighborhood radius and the point-count threshold): if they are set improperly, clustering quality decreases.
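The parameter sensitivity can be illustrated with a small sketch. It is not taken from the patent: the data are hypothetical, and it uses a one-dimensional simplification of DBSCAN in which a cluster is a run of core points whose consecutive gaps do not exceed the radius. The names `eps` (neighborhood radius) and `min_pts` (point-count threshold, counting the point itself) are assumptions chosen for illustration.

```python
def dbscan_1d_cluster_count(xs, eps, min_pts):
    """Count DBSCAN clusters for 1-D data (simplified, illustration only)."""
    xs = sorted(xs)
    # A core point has at least min_pts points (itself included) within eps.
    core = [x for x in xs if sum(abs(x - y) <= eps for y in xs) >= min_pts]
    count = 0
    prev = None
    for x in core:
        # A gap larger than eps between consecutive core points starts a new cluster.
        if prev is None or x - prev > eps:
            count += 1
        prev = x
    return count


# Hypothetical data: two tight groups plus one isolated point at 2.6.
data = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3, 2.6]
counts = {eps: dbscan_1d_cluster_count(data, eps, min_pts=3) for eps in (0.05, 0.5, 3.0)}
print(counts)
```

With these values, a radius that is too small (0.05) finds no clusters at all, a moderate radius (0.5) finds the two groups and leaves the isolated point as noise, and a radius that is too large (3.0) merges everything into one cluster.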

Embodiment Construction

[0032] The present invention will be further described below in conjunction with the accompanying drawings.

[0033] Figure 1 shows two clusters of different shapes. Under the traditional "SSE minimization division criterion", cluster B must be selected for further division. However, it is clearly cluster A, not cluster B, that needs to be divided further; selecting cluster B reduces clustering accuracy. This is the problem noted above: the "SSE minimization division criterion" is insensitive to cluster shape.
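A minimal sketch of this failure mode follows. The points are hypothetical and are not the data behind Figure 1, and the sketch assumes the common variant of the SSE criterion that splits the cluster with the largest sum of squared errors. Cluster A consists of two tight, well-separated groups; cluster B is a single evenly spread grid.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def sse(points):
    """Sum of squared Euclidean distances from each point to the centroid."""
    center = centroid(points)
    return sum(sum((a - b) ** 2 for a, b in zip(p, center)) for p in points)


# Hypothetical cluster A: two tight groups around x = 0 and x = 2.
cluster_a = [(x + dx, dy) for x in (0.0, 2.0) for dx in (0.0, 0.1) for dy in (0.0, 0.1)]
# Hypothetical cluster B: one 5 x 5 grid with unit spacing.
cluster_b = [(float(x), 10.0 + y) for x in range(5) for y in range(5)]

sse_by_name = {"A": sse(cluster_a), "B": sse(cluster_b)}
chosen = max(sse_by_name, key=sse_by_name.get)
print(sse_by_name, "-> split", chosen)
```

Here the SSE of B (100) is far larger than that of A (about 8.04), so the largest-SSE rule selects B even though A is the cluster made of two distinct groups.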

[0034] As shown in Figure 2, the bisecting K-means algorithm based on the density division criterion disclosed in this embodiment of the invention includes the following steps:

[0035] 1) Initialize the point threshold M and the variation threshold δ, where M represents the number of points that should be included in the core point neighborhood at leas...


Abstract

The invention relates to a bisecting K-means algorithm based on a density division criterion. The bisecting K-means algorithm first divides the data set into two clusters. The DBSCAN algorithm is then used to obtain, for each cluster, the number of sub-clusters into which it can be divided. The cluster with the largest number of sub-clusters is divided further by the bisecting K-means algorithm, and clustering is completed through repeated iteration. The algorithm overcomes the insensitivity to cluster shape of the SSE minimization division criterion, selects the cluster to be divided on a sounder basis, and effectively improves clustering accuracy. Because the sub-cluster counts of all clusters are compared under the same neighborhood radius and point-count threshold, it also avoids the loss of clustering accuracy caused by improperly set values of these parameters. Finally, it overcomes the inability of partition clustering to eliminate noise points, and it has high practical significance.
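The loop described in the abstract can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the patented procedure: the visible text does not give the initialization or the stopping rule, so the sketch seeds 2-means with two mutually distant points and stops at a target count `k` or when no cluster contains at least two sub-clusters. The names `eps`, `min_pts`, `k`, and all function names are hypothetical.

```python
import math


def centroid(points):
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def two_means(points, iters=20):
    """Split points in two with Lloyd iterations (assumed deterministic seeding)."""
    center = centroid(points)
    c0 = max(points, key=lambda p: math.dist(p, center))
    c1 = max(points, key=lambda p: math.dist(p, c0))
    centers = [c0, c1]
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            nearer_first = math.dist(p, centers[0]) <= math.dist(p, centers[1])
            groups[0 if nearer_first else 1].append(p)
        if not groups[0] or not groups[1]:
            break
        new_centers = [centroid(g) for g in groups]
        if new_centers == centers:
            break
        centers = new_centers
    return [g for g in groups if g]


def count_subclusters(points, eps, min_pts):
    """Number of DBSCAN clusters: connected groups of core points."""
    n = len(points)
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbors]
    seen = [False] * n
    count = 0
    for i in range(n):
        if seen[i] or not core[i]:
            continue
        count += 1
        seen[i] = True
        stack = [i]
        while stack:
            u = stack.pop()
            if not core[u]:
                continue  # border points join a cluster but do not expand it
            for v in neighbors[u]:
                if not seen[v]:
                    seen[v] = True
                    stack.append(v)
    return count


def density_bisecting_kmeans(points, k, eps, min_pts):
    clusters = two_means(points)
    while len(clusters) < k:
        # Same eps and min_pts for every cluster, so the counts are comparable.
        counts = [count_subclusters(c, eps, min_pts) for c in clusters]
        best = max(range(len(clusters)), key=counts.__getitem__)
        if counts[best] < 2:
            break  # assumed stop: no cluster looks divisible
        parts = two_means(clusters[best])
        if len(parts) < 2:
            break
        clusters[best:best + 1] = parts
    return clusters


# Hypothetical data: three tight groups of four points each.
blobs = [[(cx + dx, dy) for dx in (0.0, 0.1) for dy in (0.0, 0.1)]
         for cx in (0.0, 5.0, 20.0)]
points = [p for blob in blobs for p in blob]
clusters = density_bisecting_kmeans(points, k=3, eps=0.5, min_pts=3)
print([len(c) for c in clusters])
```

In this example the first split separates the far group from the other two, DBSCAN then reports two sub-clusters in the merged cluster and one in the other, and the merged cluster is the one divided next.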

Description

Technical field

[0001] The invention relates to the technical field of data clustering, in particular to a bisecting K-means clustering method based on a density division criterion.

Background technique

[0002] Clustering, also known as cluster analysis, assigns data to different clusters so that differences among data within a cluster are as small as possible and differences between clusters are as large as possible. Clustering is an unsupervised learning method that is widely used in data mining, recommendation systems, and other fields. Clustering methods can be roughly divided into partition clustering, hierarchical clustering, density clustering, grid clustering, and model clustering.

[0003] Among them, partition clustering is an easy-to-understand clustering method and the most common one. The well-known k-means algorithm is a typical example. The k-means algorithm is widely used because it is easy to understand and has low t...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06K9/62
CPC: G06F18/23213
Inventor: 马汉达, 戴季国, 薛艳飞
Owner: JIANGSU UNIV