Fast global K-means clustering method accelerated using OpenCL

A K-means clustering technique, applied in the field of data processing, that addresses problems such as code that cannot be ported, a limited range of application, and the inability to accelerate in parallel. It achieves the effects of overcoming poor portability, saving storage space, and increasing the load it can handle.

Active Publication Date: 2020-04-14
XIDIAN UNIV

AI Technical Summary

Problems solved by technology

The disadvantage of the prior Spark-based method is that its parallelized K-means code cannot be ported to the various hardware devices that support OpenCL (Open Computing Language), so its portability is poor. It is suitable only for distributed systems and cannot be accelerated in parallel on GPUs, FPGAs, MICs, and other hardware devices, which greatly limits its scope of application.
Another prior method still has the disadvantage that it must store a large amount of intermediate results when accelerating the algorithm on the heterogeneous device. When the input image data is too large, it exceeds the storage capacity of the heterogeneous device, so the method cannot process larger image data.



Examples


Embodiment Construction

[0034] The present invention is further described below with reference to the accompanying drawings.

[0035] The invention is implemented on hardware devices that support OpenCL (Open Computing Language) and uses the fast global K-means clustering algorithm.

[0036] The implementation steps of the present invention are described further with reference to Figure 1.

[0037] Step 1: read in the data set and the total number of clusters.

[0038] Read in the data set, which is stored as a two-dimensional matrix. Each row of the matrix is one data point, and each column is one attribute of the data.

[0039] Read in the total number of clusters.

[0040] Step 2: transpose the two-dimensional matrix of the data set.

[0041] Copy the two-dimensional matrix of the data set to the global memory of the hardware device.

[0042] Each thread is responsible for one data point in the two-dimensional matrix of the data set and calculates the index of its data point in the mat...
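
The per-thread transposition of step 2 can be illustrated on the host side. The text is cut off before it gives the index formula, so the following is only a minimal sketch under stated assumptions: the matrix is a flat row-major buffer, and the element at row r, column c of an n x d matrix moves from offset r*d + c to offset c*n + r. The function name `transpose_flat` and the loop variable `gid` (a stand-in for a thread's global ID on the device) are hypothetical and do not come from the patent.

```python
def transpose_flat(data, n_points, n_attrs):
    """Transpose a row-major n_points x n_attrs matrix stored as a flat list."""
    if len(data) != n_points * n_attrs:
        raise ValueError("buffer size does not match the given shape")
    out = [0.0] * len(data)
    # Each iteration of the outer loop plays the role of one thread;
    # gid is a hypothetical stand-in for that thread's global ID.
    for gid in range(n_points):
        for attr in range(n_attrs):
            src = gid * n_attrs + attr    # offset in the original layout (assumed)
            dst = attr * n_points + gid   # offset in the transposed layout (assumed)
            out[dst] = data[src]
    return out


# Three data points with two attributes each, stored row by row.
points = [1.0, 2.0,
          3.0, 4.0,
          5.0, 6.0]
transposed = transpose_flat(points, n_points=3, n_attrs=2)
# transposed holds attribute 0 of every point first, then attribute 1.
```

In the transposed layout, threads with consecutive IDs that read the same attribute touch consecutive addresses. That is a common reason for choosing such a layout on GPUs, although the visible text does not state its motivation.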


Abstract

The invention discloses a fast global K-means clustering method accelerated with OpenCL. The steps are: (1) read in the data set and the total number of clusters; (2) transpose the data on the OpenCL hardware device; (3) calculate the centroid as the first initial cluster center; (4) cluster using the K-means algorithm; (5) select a new initial cluster center on the OpenCL hardware device; (6) determine whether the current total number of cluster centers is less than or equal to the total number of clusters; if so, return to step (4), otherwise go to step (7); (7) output the clustering result. The invention can process massive clustering data in real time on any hardware device that supports OpenCL (Open Computing Language).
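
Steps (3) to (7) of the abstract can be sketched sequentially in plain Python. This is a minimal illustration, not the patented OpenCL implementation. The abstract does not say how step (5) chooses the new center, so the sketch assumes the selection rule of the published fast global K-means algorithm: pick the data point that guarantees the largest reduction in clustering error. All function names here are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def kmeans(points, centers, max_iter=100):
    """Plain K-means iterations starting from the given centers."""
    centers = [list(c) for c in centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old)  # an empty cluster keeps its center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


def fast_global_kmeans(points, total_clusters):
    n = len(points)
    # Step (3): the centroid of the whole data set is the first center.
    centers = [[sum(col) / n for col in zip(*points)]]
    while True:
        centers, labels = kmeans(points, centers)  # step (4)
        # Step (6). The abstract tests the count after step (5); testing
        # before it gives the same result and skips one unused selection.
        if len(centers) >= total_clusters:
            return centers, labels                 # step (7)
        # Step (5), assumed rule: gain of candidate i is the summed error
        # reduction over all points that would be closer to point i.
        nearest = [sq_dist(p, centers[lab]) for p, lab in zip(points, labels)]
        gains = [sum(max(nearest[j] - sq_dist(points[i], points[j]), 0.0)
                     for j in range(n))
                 for i in range(n)]
        best = max(range(n), key=gains.__getitem__)
        centers.append(list(points[best]))


data = [[0.0, 0.0], [0.0, 2.0], [2.0, 0.0],
        [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
        [30.0, 0.0], [30.0, 1.0], [31.0, 0.0]]
final_centers, final_labels = fast_global_kmeans(data, total_clusters=3)
```

The gain computation compares every point with every other point, which is the part that step (5) of the abstract places on the OpenCL device. In this sketch it is an ordinary nested loop.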

Description

Technical field

[0001] The invention belongs to the technical field of data processing and further relates to a fast global K-means clustering method, within the field of data mining, that is accelerated on OpenCL (Open Computing Language) hardware devices. The invention provides a parallel-accelerated fast global K-means clustering method and can process massive data in real time on hardware devices that support OpenCL.

Background

[0002] The fast global K-means algorithm obtains the initial cluster centers with a deterministic method that does not depend on any initial parameter values, rather than by random search. This addresses the poor stability of the clustering results of the traditional K-means clustering method, which is based on local search. It also reduces running time by optimizing how global K-means computes each newly added cluster center.

[0003] TCL Group Co., Ltd. disclosed a patent document "A me...


Application Information

Patent Type & Authority Patents (China)
IPC IPC(8): G06K9/62, G06K9/00
CPC G06V10/94, G06F18/23213
Inventor 朱虎明, 钱新宇, 焦李成, 王坤, 缑水平, 田小林, 张小华, 马晶晶, 马文萍
Owner XIDIAN UNIV