Mixed data stream clustering method based on merging and pruning

A data stream clustering technique applied in the fields of cluster analysis and data stream mining. It addresses two problems: maintaining micro-clusters is costly in time and space, and existing methods focus only on numerical data or only on categorical data, which restricts the application of data stream clustering.

Inactive Publication Date: 2021-04-20
ZHEJIANG GONGSHANG UNIVERSITY

AI Technical Summary

Problems solved by technology

In some pruning algorithms that have been proposed, the pruning time depends on the values of the input parameters, resulting in high computational cost and memory usage.
[0005] To sum up, there is an urgent need for an efficient mixed data stream clustering method that reduces the computational cost and memory cost of mixed data stream clustering. Current data stream clustering has the following problems:
(1) Most traditional data stream clustering studies focus only on numerical data or only on categorical data, not both, which restricts the application of data stream clustering.
(2) Maintaining micro-clusters in the online stage of data stream clustering costs a lot of time and space.


Embodiment Construction

[0043] Specific embodiments of the present invention will be described in detail below in conjunction with the accompanying drawings. It should be understood that the specific embodiments described here are only used to illustrate and explain the present invention, and are not intended to limit the present invention.

[0044] As shown in Figure 1, the mixed data stream clustering method based on merging and pruning uses an importance measurement criterion to convert categorical attribute values into numerical attributes, normalizes the data, and then applies principal component analysis to reduce the dimensionality of the data. The method employs an online/offline two-stage processing framework. In the online stage, a new micro-cluster feature vector is used as the data structure that stores the summary information of the data stream, and the summary information required in the offline stage is dynamically maintained thr...
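
The preprocessing steps in [0044] can be sketched as follows. This is a minimal illustration, not the patented procedure: the excerpt does not define the importance measurement criterion, so the sketch assumes a simple stand-in (each category is replaced by its relative frequency). The function names `encode_categorical`, `min_max_normalize`, `pca_reduce`, and `preprocess` are hypothetical, and min-max scaling is an assumed choice of normalization.

```python
import numpy as np

def encode_categorical(column):
    """Replace each category by its relative frequency.

    ASSUMPTION: a stand-in for the importance criterion,
    which the excerpt does not specify.
    """
    values, counts = np.unique(column, return_counts=True)
    freq = dict(zip(values, counts / len(column)))
    return np.array([freq[v] for v in column], dtype=float)

def min_max_normalize(X):
    """Scale every column to [0, 1]; constant columns map to 0."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span

def pca_reduce(X, n_components):
    """Project centered data onto its top principal directions (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def preprocess(numeric, categorical, n_components):
    """Encode categorical columns, normalize, then reduce dimensionality."""
    encoded = np.column_stack([encode_categorical(c) for c in categorical.T])
    X = np.hstack([numeric, encoded])
    return pca_reduce(min_max_normalize(X), n_components)

if __name__ == "__main__":
    numeric = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
    categorical = np.array([["a"], ["a"], ["b"], ["a"]])
    print(preprocess(numeric, categorical, n_components=2))
```

In a real stream the category frequencies and the PCA basis would have to be estimated on a window or updated incrementally; the batch form here only shows the order of the three steps.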

Abstract

The invention discloses a mixed data stream clustering method based on merging and pruning. The method converts categorical attribute values into numerical attributes using an importance measurement criterion, normalizes the data, and then reduces the dimensionality of the data using principal component analysis. It adopts an online/offline two-stage processing framework. In the online stage, a new micro-cluster feature vector is used as the data structure that stores the summary information of the data stream; the summary information required in the offline stage is dynamically maintained by a micro-cluster merging algorithm and a micro-cluster pruning algorithm, so that the evolution of the data stream is accurately reflected. In the offline stage, a density peak clustering method treats the micro-clusters as virtual objects and clusters them to obtain the final clustering result.
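
The two-stage framework in the abstract can be illustrated with a rough sketch. Everything specific here is an assumption, because the abstract does not give the layout of the feature vector or the merging and pruning criteria: the hypothetical `MicroCluster` stores a count, a linear sum, and a last-update time; `merge_close` merges micro-clusters whose centers lie within `radius`; `prune_stale` drops micro-clusters not updated within `max_age`. The offline step follows the general density peak idea (local density, distance to the nearest denser point) applied to micro-cluster centers.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class MicroCluster:
    n: float           # number of absorbed points (ASSUMED field)
    ls: np.ndarray     # linear sum of absorbed points (ASSUMED field)
    last_seen: float   # time of the most recent update (ASSUMED field)

    @property
    def center(self):
        return self.ls / self.n

def insert(clusters, x, t, radius):
    """Online step: absorb x into the nearest micro-cluster or open a new one."""
    if clusters:
        d = [np.linalg.norm(mc.center - x) for mc in clusters]
        i = int(np.argmin(d))
        if d[i] <= radius:
            mc = clusters[i]
            mc.n += 1
            mc.ls = mc.ls + x
            mc.last_seen = t
            return
    clusters.append(MicroCluster(1.0, x.astype(float).copy(), t))

def merge_close(clusters, radius):
    """Merge micro-clusters whose centers are within radius (assumed criterion)."""
    merged = []
    for mc in clusters:
        for kept in merged:
            if np.linalg.norm(kept.center - mc.center) <= radius:
                kept.n += mc.n
                kept.ls = kept.ls + mc.ls
                kept.last_seen = max(kept.last_seen, mc.last_seen)
                break
        else:
            merged.append(mc)
    return merged

def prune_stale(clusters, now, max_age):
    """Drop micro-clusters not updated within max_age (assumed criterion)."""
    return [mc for mc in clusters if now - mc.last_seen <= max_age]

def density_peaks(centers, weights, dc, k):
    """Offline step: density peak clustering over micro-cluster centers."""
    dist = np.linalg.norm(centers[:, None] - centers[None, :], axis=2)
    rho = ((dist < dc) * weights[None, :]).sum(axis=1)   # weighted local density
    order = np.argsort(-rho, kind="stable")              # densest first
    delta = np.full(len(centers), np.inf)                # densest point keeps inf
    parent = np.full(len(centers), -1)
    for rank, i in enumerate(order[1:], start=1):
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]           # nearest denser point
        delta[i], parent[i] = dist[i, j], j
    peaks = np.argsort(-(rho * delta), kind="stable")[:k]
    labels = np.full(len(centers), -1)
    labels[peaks] = np.arange(len(peaks))
    for i in order:                                      # parents are labeled first
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels

if __name__ == "__main__":
    stream = [(0, 0), (0.2, 0), (10, 10), (10.2, 10), (0, 0.2), (10, 10.2)]
    clusters = []
    for t, p in enumerate(stream):
        insert(clusters, np.array(p, dtype=float), t, radius=1.0)
    clusters = prune_stale(merge_close(clusters, radius=1.0), now=5, max_age=10)
    centers = np.array([mc.center for mc in clusters])
    weights = np.array([mc.n for mc in clusters])
    print(density_peaks(centers, weights, dc=1.0, k=2))
```

Because the summaries are additive, merging two micro-clusters only requires adding their counts and linear sums, which is what keeps the online stage cheap relative to storing raw points.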

Description

Technical field

[0001] The invention relates to the technical fields of data stream mining and cluster analysis, in particular to a clustering method for mixed data streams based on merging and pruning.

Background

[0002] Today, people generate a variety of data streams through their use of the Internet. Data streams are often unbounded, continuous, and fast-arriving, and they exhibit concept drift. These characteristics make data stream mining very challenging. In practical applications, the data to be analyzed is often unlabeled, and obtaining class labels for a data stream is very expensive. Data stream clustering, as an unsupervised learning approach, has therefore attracted extensive attention from researchers and has become a research hotspot.

[0003] So far, most studies on data stream clustering have focused only on numerical data or only on categorical data, but not both. Mixed data streams a...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F16/35
Inventor: 王幸达, 庄毅, 黄智浩
Owner: ZHEJIANG GONGSHANG UNIVERSITY