
Blended data clustering method based on density searching and rapid partitioning

A mixed-data clustering technique, applied in the field of data clustering, that addresses three problems: distances between data of mixed types cannot be computed directly and effectively, there is no way to judge whether a distance calculation method is reasonable, and clustering accuracy is unstable.

Active Publication Date: 2015-05-13
ZHEJIANG UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0005] The invention addresses several problems in how existing clustering methods handle mixed-attribute data: (1) distances between data of mixed types cannot be computed directly and effectively; (2) there is no way to determine whether a distance calculation method is reasonable, and no corresponding evaluation method exists; (3) traditional density-based clustering has high computational complexity and unstable accuracy.



Examples


Embodiment 1

[0075] This embodiment takes catalog marketing as its subject. The mixed data to be clustered is customer information: the set of all customer records forms the data set to be clustered. Each customer record includes numerical attributes such as age, income, and time spent online, as well as categorical attributes such as gender, zodiac sign, and type of goods purchased. The hybrid data clustering method based on density search and fast division of this embodiment is used to cluster all customer records. Based on the clustering results, specific products are then recommended to the different categories of users, and marketing strategies such as "items bought by similar customers" are issued regularly.
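To make the idea of a mixed customer record concrete, here is a minimal sketch of one record type and one possible distance between two records. The names `Customer` and `mixed_distance`, the pre-scaling of numerical attributes to [0, 1], and the way the two parts are weighted are all assumptions for illustration; the excerpt does not give the patent's own distance formula, and this is not it.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Customer:
    """Hypothetical mixed record; both tuples are assumed to be non-empty."""
    numeric: tuple      # (age, income, online_hours), each pre-scaled to [0, 1]
    categorical: tuple  # (gender, zodiac_sign, product_category)


def mixed_distance(a: Customer, b: Customer) -> float:
    """Illustrative distance for mixed data, not the patent's formula."""
    p, q = len(a.numeric), len(a.categorical)
    # Numerical part: root-mean-square difference, in [0, 1] for scaled inputs.
    num = sqrt(sum((x - y) ** 2 for x, y in zip(a.numeric, b.numeric)) / p)
    # Categorical part: fraction of attributes whose values differ.
    cat = sum(x != y for x, y in zip(a.categorical, b.categorical)) / q
    # Weight each part by its share of the attributes (an assumption).
    return (p * num + q * cat) / (p + q)


alice = Customer((0.2, 0.5, 0.1), ("F", "Leo", "books"))
bob = Customer((0.2, 0.5, 0.1), ("M", "Leo", "books"))
print(mixed_distance(alice, bob))
```

Keeping the numerical and categorical parts separate until the last line makes it easy to swap in a different weighting, for example one that favours whichever attribute type dominates the data set.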

[0076] The hybrid data clustering method based on density search and fast division in this embodiment, as shown in Figure 1, includes:

[0077] S1: Determine the domin...

Embodiment 2

[0152] The clustering method of this embodiment was run on the following experimental platform: a PC running Windows 7, with Microsoft Visual C++ 2010 as the integrated development environment. The hardware is an Intel Core i5 CPU at 2.6 GHz with 4 GB of memory.

[0153] To verify the performance of the new algorithm PSO-PD_HDC (that is, the hybrid attribute data clustering algorithm based on density search and fast partition), five real data sets are used, all taken from the UCI Machine Learning Repository. Their details are shown in Table 3.

[0154] Table 3

[0155]

[0156] The clustering method of this embodiment (PSO-PD_HDC) and the IWKM, SBAC, K-prototypes, and KL-FCM-GM algorithms are each used to cluster the above data sets.
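The excerpt does not say which measure is used to compare the algorithms. As an illustration only, one common choice when the data sets carry class labels (as UCI data sets typically do) is clustering purity: the fraction of items that belong to the majority class of their cluster. The function name `purity` is hypothetical.

```python
from collections import Counter


def purity(predicted, truth):
    """Fraction of items that share the majority true label of their cluster."""
    per_cluster = {}
    for cluster_id, label in zip(predicted, truth):
        per_cluster.setdefault(cluster_id, Counter())[label] += 1
    # In each cluster, only the items with the most frequent label count as correct.
    correct = sum(max(counts.values()) for counts in per_cluster.values())
    return correct / len(predicted)


print(purity([0, 0, 1, 1, 1], ["a", "a", "b", "b", "a"]))
```

Purity rewards many small clusters, so it is only a fair comparison when all algorithms produce the same number of clusters.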

[0157] Among them, the parameters in the experiment are set as α1=α2=1.8, the inertia wei...
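The name PSO-PD_HDC and the mention of an inertia weight suggest particle swarm optimization, although the excerpt does not spell this out. Assuming that α1 and α2 are the two acceleration coefficients of a standard PSO, a single update of one particle (here a one-dimensional candidate value such as a clustering radius) could look like the sketch below. The function name `pso_step` is hypothetical, and the inertia weight is left as a required argument because its value is cut off in the text.

```python
import random


def pso_step(position, velocity, personal_best, global_best,
             inertia, alpha1=1.8, alpha2=1.8, rng=random):
    """One standard PSO update for a scalar particle (illustrative sketch)."""
    r1, r2 = rng.random(), rng.random()
    new_velocity = (inertia * velocity
                    + alpha1 * r1 * (personal_best - position)  # pull toward own best
                    + alpha2 * r2 * (global_best - position))   # pull toward swarm best
    return position + new_velocity, new_velocity


# The inertia value 0.7 is a placeholder, not a value from the source.
print(pso_step(0.0, 0.0, 1.0, 1.0, inertia=0.7, rng=random.Random(0)))
```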


Abstract

The invention discloses a blended (mixed-attribute) data clustering method based on density searching and rapid partitioning. The method comprises the following steps: determining the dominance type of the mixed data in a mixed-attribute data set; calculating the distance between any two mixed data items in the data set according to that dominance type; and, based on a density search algorithm and these pairwise distances, optimizing the clustering radius within a preset range of values and taking the clustering result that corresponds to the optimal radius as the final clustering result. Dominance analysis identifies the specific type of the mixed data so that a different distance calculation can be applied to each type. This lets the attribute dimensions that dominate the data carry appropriate weight in the overall data information, so that distances are computed accurately. The clustering algorithm based on density search and fast partitioning is both fast and accurate.
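The last step, choosing a clustering radius from a preset range and keeping the result for the best radius, can be sketched as follows. This is one plausible reading under stated assumptions, not the patented procedure: the density-peak style clustering, the fixed cluster count `k`, the plain loop over candidate radii, and the `separation_score` quality measure are all illustrative choices, and every function name is hypothetical.

```python
def density_peak_labels(dist, radius, k):
    """Cluster n items from an n x n distance matrix (density-peak style)."""
    n = len(dist)
    # Local density: how many other items lie closer than the radius.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < radius)
           for i in range(n)]
    # Visit items from densest to sparsest (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n   # distance to the nearest earlier-ranked (denser) item
    parent = [-1] * n   # index of that item
    for pos, i in enumerate(order):
        if pos == 0:
            delta[i] = max(dist[i])
        else:
            parent[i] = min(order[:pos], key=lambda h: dist[i][h])
            delta[i] = dist[i][parent[i]]
    # Centres: the densest item plus the k - 1 others with largest rho * delta.
    rest = sorted(order[1:], key=lambda i: (-rho[i] * delta[i], i))
    centres = [order[0]] + rest[:k - 1]
    labels = [-1] * n
    for c, i in enumerate(centres):
        labels[i] = c
    # Fast partition: each remaining item joins the cluster of its parent.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels


def separation_score(dist, labels):
    """Mean within-cluster distance over mean between-cluster distance."""
    intra, inter = [], []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            (intra if labels[i] == labels[j] else inter).append(dist[i][j])
    if not intra or not inter or sum(inter) == 0:
        return float("inf")
    return (sum(intra) / len(intra)) / (sum(inter) / len(inter))


def best_radius(dist, radii, k):
    """Try each candidate radius and keep the one with the lowest score."""
    best = None
    for radius in radii:
        labels = density_peak_labels(dist, radius, k)
        score = separation_score(dist, labels)
        if best is None or score < best[0]:
            best = (score, radius, labels)
    return best[1], best[2]


points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
dist = [[abs(a - b) for b in points] for a in points]
print(best_radius(dist, [0.05, 0.15, 0.5], k=2))
```

Because the clustering only reads a distance matrix, any distance defined for mixed-attribute records can be plugged in without changing the search over radii.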

Description

Technical field

[0001] The invention relates to the technical field of data clustering, in particular to a hybrid data clustering method based on density search and fast division.

Background

[0002] With the continuous development of communication technology and hardware, data mining has broad application prospects in real-time monitoring systems, meteorological satellite remote sensing, network traffic monitoring, and similar areas. Because such data arrives rapidly and continuously and keeps growing, traditional clustering algorithms are not applicable to it, and the data places new requirements on clustering algorithms: (1) no need to assume the number of natural clusters; (2) the ability to find clusters of any shape; (3) the ability to deal with outliers. Moreover, most real-world data is mixed-attribute data, which includes both numerical and categorical attributes. How to effectively mine v...


Application Information

IPC(8): G06F17/30
CPC: G06F16/35
Inventor 陈晋音, 何辉豪, 杨东勇, 陈军敢, 卢瑾, 顾东袁, 张健
Owner ZHEJIANG UNIV OF TECH