Clustering method, system and medium for automatically confirming number of clusters based on coefficient of variation

A clustering method based on the coefficient of variation, applied in fields such as instruments, computing, and character and pattern recognition. It addresses the problems of having to set the number of clusters manually and of poorly chosen initial centroids.

Active Publication Date: 2021-04-09
UNIV OF JINAN

AI Technical Summary

Problems solved by technology

[0004] To overcome the deficiencies of the prior art, the present invention provides a clustering method, system and medium for automatically confirming the number of clusters based on the coefficient of variation. It addresses two defects of the traditional k-means++ clustering algorithm: the number of clusters must be set manually, and the initial centroids may be poorly selected. The invention uses the coefficient of variation and a density index to improve the partition-based k-means++ clustering algorithm, so that the number of clusters need not be set manually while the accuracy of the clustering results is preserved.
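For orientation, the coefficient of variation in its ordinary statistical sense is the standard deviation divided by the mean. The sketch below shows only that textbook definition; the function name is hypothetical, and the intra-cluster and inter-cluster coefficients used by the invention are not defined in this excerpt and may be computed differently.

```python
import statistics

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (textbook definition).

    Assumption: this is the general statistical quantity, not necessarily the
    exact intra-cluster or inter-cluster coefficient used by the invention.
    """
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.pstdev(values) / mean

# Values that sit close to their mean give a smaller coefficient than values
# that are spread out around the same mean.
tight = coefficient_of_variation([9.0, 10.0, 11.0])
loose = coefficient_of_variation([2.0, 10.0, 18.0])
```

Because the quantity is scale-free, it can be compared across groups whose values differ in magnitude, which is presumably why it is attractive as a merge criterion.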


Embodiment Construction

[0059] It should be pointed out that the following detailed description is exemplary and intended to provide further explanation to the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

[0060] It should be noted that the terminology used here is only for describing specific implementations and is not intended to limit the exemplary implementations according to the present application. As used herein, unless the context clearly dictates otherwise, the singular is intended to include the plural. It should also be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, means, components and/or combinations thereof.

[0061] As shown in Figure 1, the clustering method that automatically confirms the number of clusters base...


Abstract

The invention discloses a clustering method, system and medium for automatically confirming the number of clusters based on the coefficient of variation. The method calculates the density value of each data point in the data set, calculates a density index from the density values, and selects the data point with the largest density index as the first cluster center. It then calculates the shortest distance between each data point and the existing cluster centers, derives from that distance the probability of each data point being selected as a cluster center, and pre-selects the next cluster center by the roulette method. This repeats until a set number of cluster centers has been selected, after which k-means clustering is performed from the selected initial cluster centers to generate the corresponding number of clusters. The method next calculates the average intra-cluster coefficient of variation and the minimum inter-cluster coefficient of variation, takes the difference between the two, and compares the difference with a set value. If the difference is less than the set value, the two clusters with the smallest inter-cluster coefficient of variation are merged; this is repeated until the difference is greater than or equal to the set value, at which point the clustering result is output.
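The center-selection steps of the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the density index is stood in for by a neighbour count within a hypothetical `radius`, the selection probability is taken proportional to the squared shortest distance as in standard k-means++, and all function names are invented for the example.

```python
import math
import random

def first_center_by_density(points, radius):
    """Return the index of the point with the most neighbours within `radius`.

    Assumption: the neighbour count stands in for the density index, whose
    exact formula is not given in this excerpt.
    """
    def neighbours(i):
        return sum(1 for q in points if math.dist(points[i], q) <= radius)
    return max(range(len(points)), key=neighbours)

def roulette_pick(points, centers, rng):
    """Pick the next center index with probability proportional to D(x)^2.

    Assumption: squared distance weighting, as in standard k-means++.
    """
    # Squared shortest distance from each point to any already chosen center.
    d2 = [min(math.dist(p, points[c]) for c in centers) ** 2 for p in points]
    total = sum(d2)
    if total == 0:
        raise ValueError("every point coincides with a chosen center")
    # Roulette wheel: walk the cumulative weights until the spin is covered.
    spin = rng.random() * total
    running = 0.0
    for i, weight in enumerate(d2):
        running += weight
        if weight > 0 and running >= spin:
            return i
    # Guard against floating-point rounding at the end of the wheel.
    return max(range(len(points)), key=d2.__getitem__)

def initial_centers(points, k, radius, seed=0):
    """Choose k initial center indices: densest point first, then roulette."""
    rng = random.Random(seed)
    centers = [first_center_by_density(points, radius)]
    while len(centers) < k:
        centers.append(roulette_pick(points, centers, rng))
    return centers

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
chosen = initial_centers(pts, k=3, radius=1.0)
```

The returned indices would then seed an ordinary k-means run. The later merge loop is omitted here because the excerpt does not define the intra-cluster and inter-cluster coefficients of variation.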

Description

Technical field

[0001] The invention relates to a clustering method, system and medium for automatically confirming the number of clusters based on the coefficient of variation.

Background art

[0002] With the rapid development of information technology, many sectors, such as commerce, enterprises, scientific research institutions and government departments, have accumulated massive amounts of data stored in different forms. These data often contain useful information that is difficult to obtain by relying only on database query and retrieval mechanisms and statistical methods, so data mining technology has also developed rapidly. Cluster analysis is an important research field in data mining and has been widely used in many applications, including pattern recognition, data analysis, image processing, and market research.

[0003] Clustering analysis technology is an unsupervised learning method, i...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06K9/62
CPC: G06F18/23213
Inventor: 刘腾腾, 曲守宁, 张坤, 杜韬, 王凯, 郭庆北, 朱连江, 王钦
Owner: UNIV OF JINAN