Data stream clustering method based on density and extended grid

A data stream clustering technology, applied in the fields of electrical digital data processing, special data processing applications, and character and pattern recognition. It addresses problems such as manually set clustering parameters, improper selection of initial centroids, and large memory consumption, with the stated effects of a reliable design principle, assured clustering accuracy, and high mining efficiency.

Inactive Publication Date: 2017-10-20
UNIV OF JINAN
0 Cites, 4 Cited by

AI Technical Summary

Problems solved by technology

[0006] The purpose of the present invention is to overcome the defects of traditional data stream clustering algorithms, such as manually set clustering parameters, improper selection of initial centroids, large memory consumption, and low clustering efficiency. By improving the density grid clustering algorithm for data streams, the invention ensures the accuracy of the clustering results and offers a good clustering effect and high mining efficiency.


Embodiment Construction

[0036] The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments. The following embodiments are explanations of the present invention, but the present invention is not limited to the following embodiments.

[0037] This embodiment presents a data stream clustering method based on density and extended grid. First, the data stream accumulated over a period of time is processed at the online layer to obtain an initial clustering result. The grid density is then recalculated according to the update information of the data stream, and the clustering results are continuously updated. Assume that the data set has $d$-dimensional attributes $(S_1, S_2, \ldots, S_d)$, and that $S = S_1 \times S_2 \times \cdots \times S_d$ is the $d$-dimensional data space. $x = (x_1, x_2, \ldots, x_d)$ denotes a data point in the data space $S$ at time $t$. Each dimension $S_i$ ($1 \le i \le d$) of the data space is divided into $p$ parts, equa...
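The partitioning described in [0037] can be illustrated with a minimal sketch. The source text is cut off after "divided into p parts", so the equal-width split used here is an assumption, as are the function names `cell_index` and `grid_counts` and the per-dimension bounds `lows` and `highs`, which do not appear in the patent. The sketch maps each d-dimensional point to the index tuple of its grid cell and counts points per cell as a simple stand-in for grid density.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple


def cell_index(x: Sequence[float], lows: Sequence[float],
               highs: Sequence[float], p: int) -> Tuple[int, ...]:
    """Return the grid cell of point x as a tuple of per-dimension indices.

    Assumption (not from the patent text): each dimension [low, high]
    is split into p equal-width intervals numbered 0 .. p-1.
    """
    idx = []
    for xi, lo, hi in zip(x, lows, highs):
        width = (hi - lo) / p
        k = int((xi - lo) / width)
        # Clamp so that xi == high (or slightly out of range) stays in a valid cell.
        idx.append(min(max(k, 0), p - 1))
    return tuple(idx)


def grid_counts(points: Iterable[Sequence[float]], lows: Sequence[float],
                highs: Sequence[float], p: int) -> Dict[Tuple[int, ...], int]:
    """Count how many points fall into each non-empty grid cell."""
    return Counter(cell_index(x, lows, highs, p) for x in points)


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.2, 0.2), (0.9, 0.9)]
    print(grid_counts(pts, lows=(0.0, 0.0), highs=(1.0, 1.0), p=4))
```

Only non-empty cells are stored, which keeps memory proportional to the number of occupied cells rather than to p^d.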

Abstract

The invention relates to a data stream clustering method based on density and an extended grid. Using the Spark parallel computing platform, a traditional data stream clustering algorithm is analyzed and improved, and a data stream clustering algorithm based on density and an extended grid is proposed. The algorithm removes the need to set clustering parameters manually and can find clusters of arbitrary shape. Its basic steps are as follows: (1) the local density of each sampling point and its distances to the other sampling points are used to determine the number of cluster centers in a grid, so that cluster centers are determined automatically and an improperly chosen initial centroid cannot distort the clustering result; (2) data points outside the grid are clustered by expanding the grid, so that clusters in the grid are extended and clustering accuracy is ensured; (3) neighbor density estimation and grid boundaries are introduced to merge grids, which reduces memory consumption; and (4) an attenuation factor is used to update the grid density in real time, reflecting the evolution of the spatial data stream.
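Steps (1) and (4) can be sketched in a few lines. The abstract gives no formulas, so everything specific here is an assumption: the center selection follows the general density-peak idea (count neighbors within a cutoff `d_c`, then measure the distance to the nearest denser point), and the density update uses a simple exponential decay. The names `density_peak_centers` and `decayed_density`, the thresholds `rho_min` and `delta_min`, and the default `decay` value are hypothetical and not taken from the patent.

```python
import math
from typing import List, Sequence


def density_peak_centers(points: Sequence[Sequence[float]], d_c: float,
                         rho_min: int, delta_min: float) -> List[int]:
    """Return indices of points that look like cluster centers.

    Assumed rule: a center has many neighbors within d_c (high rho) and
    lies far from any denser point (high delta).
    """
    n = len(points)
    dist = [[math.dist(a, b) for b in points] for a in points]
    # Local density: number of other points closer than the cutoff d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        # "Denser" points; ties in rho are broken by lower index so that
        # exactly one point ends up with no denser point.
        denser = [dist[i][j] for j in range(n)
                  if rho[j] > rho[i] or (rho[j] == rho[i] and j < i)]
        # The densest point gets the largest distance to any other point.
        delta.append(min(denser) if denser else max(dist[i]))
    return [i for i in range(n) if rho[i] >= rho_min and delta[i] >= delta_min]


def decayed_density(prev_density: float, t_last: float, t_now: float,
                    decay: float = 0.998, arrivals: int = 1) -> float:
    """Decay a cell's old density over the elapsed time, then add new arrivals.

    Assumed form: D(t_now) = D(t_last) * decay ** (t_now - t_last) + arrivals.
    """
    return prev_density * decay ** (t_now - t_last) + arrivals


if __name__ == "__main__":
    pts = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
    print(density_peak_centers(pts, d_c=0.5, rho_min=2, delta_min=1.0))
    print(decayed_density(4.0, t_last=0, t_now=2, decay=0.5))
```

Because the number of centers falls out of the two thresholds, no cluster count has to be supplied up front, which is the property the abstract emphasizes. The pairwise distance matrix is quadratic in the number of points, so a sketch like this only suits the sampling points inside a single grid, not the whole stream.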

Description

Technical field

[0001] The invention belongs to the technical field of data mining, and in particular relates to a data stream clustering method based on density and extended grid.

Background technique

[0002] With the rapid development of hardware technology, network communication technology, sensing devices, and information technologies, large amounts of real-time dynamic data, called streaming data, are emerging in application fields such as social networks, sensor networks, e-commerce, network monitoring, meteorological and environmental monitoring, and financial and retail enterprises.

[0003] Unlike traditional data types, streaming data is characterized by large data volume, infinite or unknown length, fast dynamic change, unstable rate, and high cost of accessing historical data. Traditional data mining algorithms have great difficulty processing streaming data, which has led to the emergence of clustering algorithms designed for data streams....

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30, G06K9/62
CPC: G06F16/2465, G06F18/23213
Inventor: 杜韬, 华峥, 牟国栋, 曲守宁, 张坤, 朱连江, 王钦
Owner: UNIV OF JINAN