
Differential distinguishability k prototype clustering method based on MapReduce

A clustering method combined with differential privacy techniques, applied in special-purpose data processing, instruments, data mining, etc. It addresses problems such as the lack of a theoretical basis, the difficulty of proving the level of privacy protection, and the inability of traditional data processing models to meet the demands of big data computing. It guarantees the security and utility of the data while improving data processing efficiency.

Active Publication Date: 2019-12-27
BEIHANG UNIV +1
Cites: 5; Cited by: 7

AI Technical Summary

Problems solved by technology

However, the above privacy protection methods have two problems: (1) they depend on background assumptions, that is, they assume a particular attack model or particular background knowledge held by the attacker; (2) they lack a rigorous theoretical foundation, which makes it very difficult to prove the level of privacy protection they provide.
[0005] In summary, in big data analysis, hybrid data clustering methods are likely to cause privacy leaks, and traditional data processing models cannot meet the needs of big data computing.

Examples

Embodiment Construction

[0033] The MapReduce-based differential discriminability k-prototype clustering method proposed by the present invention must solve two problems: first, how to apply differential discriminability to big data clustering in order to protect data privacy; second, how to deploy the differential discriminability k-prototype clustering method on a big data platform.

[0034] The specific implementation of the present invention is set forth below in two parts:

[0035] 1. MapReduce framework

[0036] The MapReduce framework of the Hadoop big data platform is an open-source implementation that adopts a Master/Slave (M/S) architecture and is built on top of a distributed file system. Computing jobs in the MapReduce framework have the following characteristics: a job can be divided into multiple sub-tasks that are relatively independent of one another, impose no constraints on each other, and can be completed in parallel. After the sub...
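The split-into-independent-sub-tasks model described above can be illustrated with a toy, single-process simulation. This is a minimal sketch, not Hadoop's API: `map_reduce`, `mapper`, `reducer`, and `n_splits` are hypothetical names, and a local thread pool stands in for distributed Map tasks running on Slave nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

R = TypeVar("R")  # input record type
V = TypeVar("V")  # intermediate value type
O = TypeVar("O")  # reduced output type


def map_reduce(
    records: List[R],
    mapper: Callable[[R], Iterable[Tuple[Hashable, V]]],
    reducer: Callable[[Hashable, List[V]], O],
    n_splits: int = 4,
) -> Dict[Hashable, O]:
    """Toy, single-machine simulation of the split / map / shuffle / reduce flow."""
    # Split the input into independent chunks, one per simulated Map task.
    size = max(1, -(-len(records) // n_splits))  # ceiling division
    splits = [records[i:i + size] for i in range(0, len(records), size)]

    def run_map(split: List[R]) -> List[Tuple[Hashable, V]]:
        # A Map task reads only its own split, so tasks need no coordination.
        return [pair for record in split for pair in mapper(record)]

    # Run the Map tasks concurrently; pool.map returns results in split order.
    with ThreadPoolExecutor(max_workers=n_splits) as pool:
        mapped = list(pool.map(run_map, splits))

    # Shuffle: group intermediate values by key across all Map outputs.
    groups: Dict[Hashable, List[V]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce: each key is combined independently of every other key.
    return {key: reducer(key, values) for key, values in groups.items()}


# Usage: count records per categorical value with two simulated Map tasks.
data = [("red", 1.0), ("blue", 2.0), ("red", 3.0)]
counts = map_reduce(data, lambda rec: [(rec[0], 1)], lambda key, ones: sum(ones), n_splits=2)
print(counts)  # {'red': 2, 'blue': 1}
```

Because no Map task reads another task's split, and no reducer reads another key's values, both phases parallelize without locks; only the shuffle needs to see all intermediate pairs.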

Abstract

The invention discloses a MapReduce-based differential distinguishability k-prototype clustering method comprising the following steps: step 1, preprocess the input data set D; step 2, configure the tasks of the MapReduce framework; step 3, determine a local center point set Q in each Map task; step 4, determine the number of clusters k from the local center point set Q; step 5, set the parameters of the differential distinguishability mechanism; step 6, assign each data record of the data set D1 to its corresponding cluster; step 7, compute the cluster center points for the new round; step 8, compare the center points of the two consecutive rounds of clustering; and step 9, partition the data set D1 according to the final cluster center points. The method gives practitioners of big data mining a simple way to set the privacy parameters, improves data processing efficiency, and ensures the security and utility of the data.
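Steps 5, 7, and 8 can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure, since the formulas are not given in this excerpt. It assumes that "differential distinguishability" corresponds to differential identifiability in the sense of Lee and Clifton, where a bound ρ on the probability of re-identifying an individual among m possible individuals is turned into a Laplace privacy parameter ε = ln((m − 1)ρ / (1 − ρ)). It also assumes numeric attributes are scaled to [0, 1] and categorical domains are known in advance. The names `epsilon_from_rho`, `noisy_centers`, `centers_converged`, and the tolerance `tol` are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Sequence, Tuple

# A mixed record: (numeric attributes scaled to [0, 1], categorical attributes).
Record = Tuple[Tuple[float, ...], Tuple[str, ...]]


def epsilon_from_rho(rho: float, m: int) -> float:
    """Assumed differential-identifiability calibration: eps = ln((m - 1) rho / (1 - rho))."""
    if m < 2 or not (1.0 / m < rho < 1.0):
        raise ValueError("need m >= 2 and 1/m < rho < 1")
    return math.log((m - 1) * rho / (1 - rho))


def laplace(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_centers(
    clusters: List[List[Record]],
    n_numeric: int,
    domains: Sequence[Sequence[str]],
    epsilon: float,
    rng: random.Random,
) -> List[Record]:
    """Step 7 sketch: one round of Laplace-noised cluster centers."""
    # Adding or removing one record changes one cluster's count by 1, each of its
    # d numeric sums by at most 1, and one value count per categorical attribute by 1,
    # so the L1 sensitivity of everything released here is 1 + d + c.
    scale = (1 + n_numeric + len(domains)) / epsilon
    centers: List[Record] = []
    for members in clusters:
        count = max(1.0, len(members) + laplace(scale, rng))
        numeric = tuple(
            min(1.0, max(0.0, (sum(r[0][j] for r in members) + laplace(scale, rng)) / count))
            for j in range(n_numeric)
        )
        categorical = []
        for a, domain in enumerate(domains):
            freq = Counter(r[1][a] for r in members)
            # Noisy mode: perturb every domain value's count, not just observed ones.
            categorical.append(max(domain, key=lambda v: freq[v] + laplace(scale, rng)))
        centers.append((numeric, tuple(categorical)))
    return centers


def centers_converged(old: List[Record], new: List[Record], tol: float = 1e-3) -> bool:
    """Step 8 sketch (assumed criterion): numeric shift below tol, categories unchanged."""
    return all(
        max((abs(a - b) for a, b in zip(o[0], n[0])), default=0.0) < tol and o[1] == n[1]
        for o, n in zip(old, new)
    )


# Usage with hypothetical parameters: rho bound over 100 individuals.
rng = random.Random(0)
eps = epsilon_from_rho(rho=0.6, m=100)
clusters = [[((0.2,), ("a",)), ((0.4,), ("a",))], [((0.9,), ("b",))]]
centers = noisy_centers(clusters, n_numeric=1, domains=[("a", "b")], epsilon=eps, rng=rng)
print(centers, centers_converged(centers, centers))
```

The noise here covers a single iteration only; how the privacy budget is shared across rounds is not stated in this excerpt and is deliberately left out of the sketch. Clamping the count and the numeric coordinates is post-processing, so it does not weaken the per-round guarantee.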

Description

Technical Field

[0001] The invention relates to a MapReduce-based differential discriminability k-prototype clustering method and belongs to the technical field of cyberspace security.

Background

[0002] Data mining is an efficient, in-depth data analysis technology for the big data era. It draws on many techniques from fields such as machine learning, databases, and statistics, and has quickly become a research hotspot across many industries. As an important branch of data mining, cluster analysis is widely used in a variety of scenarios. Different clustering algorithms can be designed according to the characteristics of the data set and the specific analysis task. By the type of data they process, clustering algorithms fall into three categories: numerical data clustering algorithms, categorical data clustering algorithms, and hybrid (mixed) data clustering algorithms. Most clus...
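Hybrid data is the case k-prototype clustering targets: numeric attributes need a geometric distance, categorical attributes need a matching measure, and the two must be combined. As a hedged illustration, the sketch below uses Huang's standard k-prototypes dissimilarity (squared Euclidean distance plus γ times the number of categorical mismatches); the measure actually used by this patent is not given in this excerpt, and `gamma`, `mixed_dissimilarity`, and `nearest_prototype` are assumed names.

```python
from typing import Sequence, Tuple

# A mixed record: (numeric attributes, categorical attributes).
Record = Tuple[Tuple[float, ...], Tuple[str, ...]]


def mixed_dissimilarity(x: Record, q: Record, gamma: float = 1.0) -> float:
    """Huang-style k-prototypes dissimilarity between a record and a prototype."""
    numeric = sum((a - b) ** 2 for a, b in zip(x[0], q[0]))  # squared Euclidean part
    mismatches = sum(1 for a, b in zip(x[1], q[1]) if a != b)  # simple matching part
    return numeric + gamma * mismatches


def nearest_prototype(x: Record, prototypes: Sequence[Record], gamma: float = 1.0) -> int:
    """Index of the prototype with the smallest mixed dissimilarity to x."""
    return min(range(len(prototypes)), key=lambda i: mixed_dissimilarity(x, prototypes[i], gamma))


x = ((0.5, 1.0), ("red", "yes"))
q = ((0.0, 0.5), ("red", "no"))
print(mixed_dissimilarity(x, q, gamma=0.5))  # 1.0
print(nearest_prototype(x, [q, x]))  # 1
```

The weight γ balances the two parts: a large γ lets categorical mismatches dominate cluster assignment, while a small γ makes the method behave much like k-means on the numeric attributes alone.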

Application Information

Patent Type & Authority: Application (China)
IPC(8): G06F21/62; G06F16/2458
CPC: G06F21/6245; G06F2216/03; G06F16/2465
Inventors: 尚涛 (Shang Tao), 赵铮 (Zhao Zheng), 姜亚彤 (Jiang Yatong), 张锋 (Zhang Feng), 杨英 (Yang Ying), 刘建伟 (Liu Jianwei)
Owner BEIHANG UNIV