Mixed attribute feature large data set clustering algorithm based on CSA

A CSA-based clustering algorithm for large data sets with mixed attribute features, applied in the field of big-data clustering. It addresses the problems of sensitivity to prototype initialization, slow convergence, and premature convergence.

Status: Inactive
Publication date: 2017-12-26
九次方大数据信息集团有限公司
Cites: 0; cited by: 2

AI Technical Summary

Problems solved by technology

Similar to the k-prototype algorithm, this new algorithm is also sensitive to prototype initialization and easily falls into local extrema. A clustering method based on the genetic algorithm (GA) was therefore proposed. Although this method converges to the global optimum with higher probability, its convergence is slow and it is prone to premature convergence.


Examples

Embodiment Construction

[0069] To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. The components of the embodiments generally described and illustrated in the figures herein may be arranged and designed in a variety of different configurations. Accordingly, the following detailed description of the embodiments provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without cre...

Abstract

The invention discloses a CSA-based clustering algorithm for large data sets with mixed attribute features. The algorithm comprises the following steps: S1, an antibody population is initialized; S2, a clone operation is carried out; S3, an immune gene operation is carried out; S4, a clonal selection operation is carried out; S5, a clonal death operation is carried out; S6, a single-step operator iteration is carried out; and S7, antibody encoding is carried out, and steps S2 to S6 are repeated until the clustering prototypes converge to an optimal solution. The clonal selection algorithm (CSA) is thus used for cluster analysis of large data sets. Clustering performance is evaluated on a large data set, and the experimental results show that the method can effectively find the clustering structure in the data. When cluster analysis is carried out on a large amount of data with mixed numerical and categorical features, the CSA-based algorithm converges quickly, does not depend on the choice of initial prototypes, and converges to the global optimal solution with probability 1.
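The steps listed in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the excerpt does not give the operators in detail, so the concrete choices here are assumptions made for illustration. An antibody is taken to be a list of k prototypes, the cost is a k-prototype style measure (squared Euclidean distance on numeric fields plus a hypothetical weight `gamma` times the number of categorical mismatches), mutation swaps a prototype for a random record, and clonal death replaces the worst antibody with a fresh random one. All function and parameter names are hypothetical.

```python
import random

def dissimilarity(x, proto, n_num, gamma):
    """Squared Euclidean distance on numeric fields plus gamma * categorical mismatches."""
    num = sum((a - b) ** 2 for a, b in zip(x[:n_num], proto[:n_num]))
    cat = sum(a != b for a, b in zip(x[n_num:], proto[n_num:]))
    return num + gamma * cat

def cost(data, protos, n_num, gamma):
    """Sum over records of the distance to the nearest prototype (lower is better)."""
    return sum(min(dissimilarity(x, p, n_num, gamma) for p in protos) for x in data)

def one_step(data, protos, n_num, gamma):
    """S6 (assumed form): one assign-and-update pass, mean for numeric, mode for categorical."""
    groups = [[] for _ in protos]
    for x in data:
        j = min(range(len(protos)),
                key=lambda i: dissimilarity(x, protos[i], n_num, gamma))
        groups[j].append(x)
    updated = []
    for p, g in zip(protos, groups):
        if not g:
            updated.append(p)  # an empty cluster keeps its old prototype
            continue
        cols = list(zip(*g))
        nums = [sum(c) / len(c) for c in cols[:n_num]]
        # sorted() makes tie-breaking between equally frequent categories deterministic
        cats = [max(sorted(set(c)), key=c.count) for c in cols[n_num:]]
        updated.append(tuple(nums + cats))
    return updated

def mutate(data, protos, rate, rng):
    """S3 (assumed form): replace each prototype by a random record with probability rate."""
    return [rng.choice(data) if rng.random() < rate else p for p in protos]

def csa_cluster(data, k, n_num, gamma=1.0, pop=6, clones=4, rate=0.3, iters=30, seed=0):
    rng = random.Random(seed)
    # S1: initialise the antibody population; each antibody holds k prototypes
    population = [rng.sample(data, k) for _ in range(pop)]
    for _ in range(iters):
        next_population = []
        for antibody in population:
            # S2 + S3: clone the antibody and mutate every clone
            candidates = [mutate(data, antibody, rate, rng) for _ in range(clones)]
            # S4: clonal selection keeps the best of the parent and its clones
            best = min(candidates + [antibody],
                       key=lambda a: cost(data, a, n_num, gamma))
            # S6: refine the survivor with a single update step
            next_population.append(one_step(data, best, n_num, gamma))
        # S5: clonal death replaces the worst antibody with a fresh random one
        next_population.sort(key=lambda a: cost(data, a, n_num, gamma))
        next_population[-1] = rng.sample(data, k)
        population = next_population
    return min(population, key=lambda a: cost(data, a, n_num, gamma))

if __name__ == "__main__":
    records = [(0.0, 0.1, "a"), (0.2, 0.0, "a"), (0.1, 0.2, "a"),
               (5.0, 5.1, "b"), (5.2, 4.9, "b"), (4.9, 5.0, "b")]
    for prototype in csa_cluster(records, k=2, n_num=2):
        print(prototype)
```

Keeping the parent in the selection pool and never discarding the best antibody means the best cost in the population cannot increase from one generation to the next, while the mutation and death steps keep injecting new starting points.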

Description

Technical field

[0001] The present invention relates to a large data set clustering algorithm, in particular to a CSA-based clustering algorithm for large data sets with mixed attribute features.

Background

[0002] In data mining, one often has to analyze large amounts of data with both numerical and categorical features. However, most existing classification algorithms can handle only numerical feature data or only categorical feature data, and cannot analyze data with the two kinds of attributes mixed.

[0003] Dividing a sample set into classes is a basic operation in data mining and is widely used in many tasks, such as classification (unsupervised), aggregation, division or dissection, etc. Clustering is a very popular approximate partition method. It divides a group of samples into several categories, so that under a certain standard, the samples in the same category are close to each other, and the differences between samples ...
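To make the mixed-attribute problem concrete, the sketch below shows one common way to compare two records that contain both kinds of fields: squared Euclidean distance on the numeric fields plus a weighted count of categorical mismatches, as used by k-prototype style methods. The excerpt does not state which measure the invention uses, so the function name, the index arguments, and the weight `gamma` are assumptions for illustration only.

```python
def mixed_dissimilarity(x, y, numeric_idx, categorical_idx, gamma=1.0):
    """Distance between two mixed-attribute records (assumed k-prototype style measure)."""
    # Numeric fields: squared Euclidean distance
    numeric_part = sum((x[i] - y[i]) ** 2 for i in numeric_idx)
    # Categorical fields: simple matching, one unit per differing field
    categorical_part = sum(1 for i in categorical_idx if x[i] != y[i])
    # gamma balances the influence of the two kinds of attributes
    return numeric_part + gamma * categorical_part

if __name__ == "__main__":
    a = (1.0, 2.0, "red", "small")
    b = (1.5, 2.0, "blue", "small")
    print(mixed_dissimilarity(a, b, numeric_idx=(0, 1), categorical_idx=(2, 3), gamma=0.5))
```

A larger `gamma` makes categorical disagreement count for more relative to numeric distance, so the weight usually has to be tuned to the scale of the numeric fields.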

Application Information

IPC(8): G06N3/12, G06K9/62
CPC: G06N3/126, G06F18/23
Inventors: 张汉青, 陶长连, 郑建全
Owner: 九次方大数据信息集团有限公司