Big data clustering method based on decomposition and composition

A clustering method for big data, applicable to database models, relational databases, electronic digital data processing and related areas. It addresses the large volume and high dimensionality of big data and the resulting difficulty of mining its internal model.

Active Publication Date: 2014-09-24
广东唯审信息科技有限公司
6 Cites, 9 Cited by

AI Technical Summary

Problems solved by technology

Big data is large in volume and high in dimensionality, which leaves traditional data analysis methods ill-equipped to handle it. Noise attributes and noisy sample points make it even harder to mine the internal model of big data.

Examples

Embodiment Construction

[0049] A decomposition-and-composition clustering method for big data. First, the big data set is split horizontally and vertically. Next, the category labels of each data subset are obtained. Finally, the category labels of the entire data set are obtained with a combination clustering method. The specific implementation steps are as follows:

[0050] 1) Horizontal segmentation. Split the big data horizontally by random sampling: randomly draw 10% of the samples to obtain a data subset D_i, and repeat this sampling with replacement r = 100 times, so that the 100 data subsets together make up the full set D.
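As a rough illustration of the horizontal split, the sketch below draws r = 100 random row subsets, each holding 10% of the samples. The function name `horizontal_split`, the use of NumPy, and the reading of "with replacement" as "every draw starts again from the full data set, so subsets may overlap" are assumptions, not details given in the patent text.

```python
import numpy as np

def horizontal_split(n_samples, frac=0.10, r=100, seed=0):
    """Return r arrays of row indices, each a random draw of frac * n_samples rows.

    Hypothetical helper: rows are unique inside one subset, but each draw is
    made from the full data set, so different subsets may share rows.
    """
    rng = np.random.default_rng(seed)
    size = max(1, int(round(frac * n_samples)))
    return [rng.choice(n_samples, size=size, replace=False) for _ in range(r)]

# Example: 1000 samples -> 100 subsets of 100 row indices each.
row_subsets = horizontal_split(n_samples=1000)
```

Working with index arrays rather than copies of the data keeps memory use low; the rows of a subset can be materialised later with `X[row_subsets[i]]`.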

[0051] 2) Vertical segmentation. Split each data subset D_i vertically by random sampling: randomly draw 10% of the attributes to obtain a data subset D_ij, and repeat this sampling with replacement c = 100 times, so that the 100 data subsets D_ij together make up D_i.
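The vertical split can be sketched the same way, this time over attribute (column) indices. The helper `vertical_split`, the toy matrix `X`, and the stand-in row subset `rows` are hypothetical and only show how one block D_ij could be cut out of the data.

```python
import numpy as np

def vertical_split(n_features, frac=0.10, c=100, seed=0):
    """Return c sorted arrays of column indices, each holding frac * n_features attributes.

    Hypothetical helper: columns are unique inside one subset, while
    different subsets may overlap.
    """
    rng = np.random.default_rng(seed)
    size = max(1, int(round(frac * n_features)))
    return [np.sort(rng.choice(n_features, size=size, replace=False)) for _ in range(c)]

# Toy data: 200 samples with 50 attributes (assumed sizes, for illustration only).
X = np.random.default_rng(1).normal(size=(200, 50))
rows = np.arange(20)                      # stand-in for the rows of one subset D_i
col_subsets = vertical_split(X.shape[1])  # 100 column subsets of 5 attributes each
D_ij = X[np.ix_(rows, col_subsets[0])]    # one block: rows of D_i, columns of subset j
```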

[0052] 3) Obtain category labels for subsets of data. Use K-means for...
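The paragraph above is truncated in this copy, so the exact K-means settings are not known. As a minimal sketch only, the code below runs a plain Lloyd-style K-means on one data block and returns its category labels. The function name `kmeans_labels`, the number of clusters `k`, the iteration cap, and the random initialisation are all assumptions.

```python
import numpy as np

def kmeans_labels(X, k, n_iter=50, seed=0):
    """Cluster the rows of X into k groups with basic Lloyd iterations; return labels."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct rows chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center, shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:          # leave an empty cluster's center where it is
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Tiny one-dimensional example with two obvious groups.
block = np.array([[0.0], [0.1], [10.0], [10.1]])
block_labels = kmeans_labels(block, k=2)
```

Label values are arbitrary identifiers, so two runs may swap which group is called 0 and which is called 1; only the grouping itself is meaningful.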

Abstract

The invention discloses a big data clustering method based on decomposition and composition. The method includes the steps of transversely segmenting a data set to obtain a plurality of transverse data subsets, longitudinally segmenting each transverse data subset to obtain a plurality of longitudinal data subsets, obtaining classification tags of the data subsets produced by the transverse and longitudinal segmentation with a basic clustering algorithm, compositing and clustering the classification tags of the longitudinal data subsets, and then compositing and clustering the classification tags of the transverse data subsets to obtain a complete classification tag for the data set. The method converts the big data clustering problem into a composition clustering problem, and it is efficient, robust, and parallelizable. It is suitable for big data clustering, particularly in file classification, customer segmentation, information retrieval, and similar fields.
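The abstract does not say which composition clustering algorithm is used. Purely as an illustration, the sketch below merges several label vectors for the same samples through a co-association matrix (the fraction of base clusterings that put two samples in the same group) and then links samples whose co-association exceeds a threshold. This is a common consensus-clustering technique swapped in as an assumption, not the patent's method; the names `co_association`, `consensus_labels`, and `threshold` are hypothetical, and the sketch only covers the case where every base clustering labels the same rows.

```python
import numpy as np

def co_association(label_sets):
    """Fraction of base clusterings in which each pair of samples shares a label."""
    L = np.asarray(label_sets)                 # shape (n_clusterings, n_samples)
    same = L[:, :, None] == L[:, None, :]      # shape (n_clusterings, n, n)
    return same.mean(axis=0)

def consensus_labels(label_sets, threshold=0.5):
    """Group samples that are linked by co-association above the threshold."""
    C = co_association(label_sets)
    n = C.shape[0]
    labels = -np.ones(n, dtype=int)            # -1 marks a sample not yet assigned
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:                           # depth-first walk over linked samples
            i = stack.pop()
            for j in np.flatnonzero((C[i] > threshold) & (labels == -1)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels

# Three base clusterings of four samples; label values differ but groupings mostly agree.
base = [[0, 0, 1, 1],
        [1, 1, 0, 0],
        [0, 0, 0, 1]]
final = consensus_labels(base)
```

Because the co-association matrix compares labels only within each base clustering, it is unaffected by the arbitrary numbering of clusters from one base clustering to the next.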

Description

Technical field

[0001] The invention belongs to the field of data mining and relates to a clustering method based on data partitioning, in particular to a combination clustering method for big data.

Background

[0002] Big data has brought unprecedented impact and challenges. Its characteristics are Volume (massive scale), Velocity (high speed), Variety (diversity), and Veracity (authenticity). How to mine the potentially valuable information contained in big data has become a hot issue in industry and academia. Big data is large in volume and high in dimensionality, which leaves traditional data analysis methods ill-equipped to handle it. Noise attributes and noisy sample points make it even harder to mine the internal model of big data.

Contents of the invention

[0003] In view of the massive high-dimensional problems in big data clustering, the purpose of the present invention is to pro...

Application Information

IPC(8): G06F17/30
CPC: G06F16/2219, G06F16/285
Inventor: 吴俊杰, 伍之昂, 曹杰
Owner: 广东唯审信息科技有限公司