Privacy protection clustering method for big data analysis and computer storage medium

A privacy-preserving clustering technology, applied in the field of privacy-preserving clustering methods and computer storage media. It addresses excessive random noise, privacy leakage in big data cluster mining, and the resulting loss of clustering quality, and it achieves high clustering availability and good clustering quality.

Status: Inactive
Publication Date: 2019-10-15
NANJING UNIV OF POSTS & TELECOMM
3 Cites, 16 Cited by

AI Technical Summary

Problems solved by technology

[0004] Purpose of the invention: The technical problem to be solved by the present invention is to provide a privacy-preserving clustering method and computer storage medium for big data analysis. Traditional privacy budget allocation easily leads to excessive random noise, which degrades the quality of clustering results. The invention improves the privacy budget allocation of the differential privacy-preserving clustering algorithm by proposing a differential privacy budget allocation method. This improves the availability of clustering results under the same degree of privacy protection and addresses the privacy leakage problem in big data cluster mining.



Examples


Embodiment Construction

[0031] The flow chart of the method in this embodiment is shown in Figure 1. The specific steps are as follows:

[0032] Step 1: Take the existing Image.csv data set, which comes from the clustering data sets of the School of Computer Science, University of Eastern Finland (http://cs.joensuu.fi/sipu/datasets/). Denote the data set as D. The number of records N is 34112 and the data dimension d is 3; that is, each record has 3 attributes. The total privacy budget ε controls the degree of privacy protection: the smaller ε is, the greater the added noise and the higher the degree of privacy protection. Here the total privacy budget ε is set to 0.8 and the number of clusters k is 3. Each record can be regarded as a sample point in d-dimensional space. Normalize each dimension of the data set D to [0,1].
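The inverse relationship between ε and the noise can be illustrated with the Laplace mechanism, where the noise scale is the query sensitivity divided by ε. The following is a minimal sketch; the function name `laplace_scale` and the sensitivity value of 1.0 are assumptions chosen for illustration and do not come from the patent text.

```python
def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Scale b of the Laplace noise for a query with the given sensitivity.

    Laplace mechanism: b = sensitivity / epsilon, so a smaller epsilon
    gives a larger b (more noise, stronger privacy protection).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


# Hypothetical sensitivity of 1.0 (for example, a counting query).
for eps in (0.1, 0.8, 2.0):
    print(f"epsilon={eps}: Laplace scale b={laplace_scale(1.0, eps):.3f}")
```

With ε = 0.8, as used in this embodiment, a sensitivity-1 query would receive Laplace noise of scale 1.25 if the whole budget were spent on it.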

[0033] Data normalization scales each dimension of the data to [0,1] and is performed by the following formula:

[0034]
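The formula itself is not reproduced in this excerpt. As a minimal sketch, assuming the standard min-max rule x' = (x - min) / (max - min) applied to each dimension separately, the normalization could look like the code below. The function name `min_max_normalize` and the handling of constant columns (mapped to 0.0) are assumptions for illustration.

```python
def min_max_normalize(rows):
    """Scale each column of `rows` (a list of equal-length lists) to [0, 1].

    Assumes standard min-max normalization per dimension. A column whose
    values are all equal is mapped to 0.0 to avoid division by zero.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        normalized.append([
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, lows, highs)
        ])
    return normalized


# Toy data with d = 3 attributes per record (values are made up).
sample = [[0, 10, 5], [5, 20, 5], [10, 30, 5]]
print(min_max_normalize(sample))
```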

...


Abstract

The invention discloses a privacy protection clustering method for big data analysis and a computer storage medium. The method comprises the following steps: normalizing the data and selecting center points; calculating a minimum privacy budget and allocating a privacy budget sequence; assigning each sample point to its nearest center point; generating Laplace noise; adding the noise to the parameters used when updating the center points; and iterating until the difference between the sums of squared errors of two adjacent iterations is smaller than a threshold or the maximum number of iterations is reached. The method protects sensitive information in the data set by adding noise that obeys the Laplace distribution to the intermediate parameters during execution of the clustering algorithm. This addresses the leakage of sensitive information during execution of the clustering algorithm, improves the privacy budget allocation of the differential privacy protection clustering algorithm, improves the availability of clustering results under the same degree of privacy protection, and addresses the privacy leakage problem in big data cluster mining.
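The loop described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: the budget sequence is passed in as `budgets` (the usage example splits ε uniformly, which is a placeholder and not the allocation the patent proposes), the per-iteration sensitivity is taken as d + 1 for data normalized to [0,1], and the initial centers are drawn from the data for simplicity. All function names are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def sq_dist(p, c):
    return sum((x - y) ** 2 for x, y in zip(p, c))


def dp_kmeans(points, k, budgets, threshold=1e-3, seed=0):
    """Noisy k-means: Laplace noise on per-cluster counts and sums.

    `budgets` holds one privacy budget per iteration, so len(budgets)
    is the maximum number of iterations (assumed input, see lead-in).
    """
    rng = random.Random(seed)
    d = len(points[0])
    # Simplification: picking raw records as initial centers is not
    # itself privacy-preserving; the patent's selection rule is not
    # given in this excerpt.
    centers = [list(p) for p in rng.sample(points, k)]
    prev_sse = None
    for eps_t in budgets:
        # Assign every sample point to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                  for p in points]
        # Assumed sensitivity: one record changes a count by 1 and a
        # sum vector by at most d when the data lie in [0, 1]^d.
        scale = (d + 1) / eps_t
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            noisy_count = len(members) + laplace_noise(scale, rng)
            if noisy_count < 1.0:
                continue  # keep the previous center for this cluster
            noisy_sums = [sum(p[i] for p in members) + laplace_noise(scale, rng)
                          for i in range(d)]
            # Clamp to the normalized range (post-processing only).
            centers[j] = [min(1.0, max(0.0, s / noisy_count))
                          for s in noisy_sums]
        # Stop when the sum of squared errors changes by less than threshold.
        sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
        if prev_sse is not None and abs(prev_sse - sse) < threshold:
            break
        prev_sse = sse
    return centers


# Usage on synthetic data in [0, 1]^3 with a uniform budget split.
data_rng = random.Random(1)
data = [[data_rng.random() for _ in range(3)] for _ in range(300)]
total_epsilon, max_iters = 0.8, 5
uniform_budgets = [total_epsilon / max_iters] * max_iters
centers = dp_kmeans(data, k=3, budgets=uniform_budgets)
print(centers)
```

Under sequential composition the per-iteration budgets sum to the total ε. Because a small per-iteration budget means a large noise scale, how the sequence is allocated directly affects clustering quality, which is the aspect the invention targets.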

Description

Technical field

[0001] The invention relates to a privacy protection clustering method and a computer storage medium, in particular to a privacy protection clustering method and a computer storage medium for big data analysis.

Background technique

[0002] Data mining is currently receiving more and more attention. Using machine learning algorithms to mine and analyze massive data can yield a great deal of valuable new knowledge and new patterns. As a commonly used method in the field of data mining, cluster analysis is widely applied in scenarios such as data preprocessing, target group classification, pattern recognition, and image segmentation. K-means is the simplest, most effective, and most widely used algorithm in big data cluster analysis. However, when the centroids are updated during execution of the algorithm, the number of samples in each cluster and the sum of each attribute must be calculated. These operations can leak sensitive information in the data set...


Application Information

IPC(8): G06K9/62, G06F21/62
CPC: G06F21/6245, G06F18/23213
Inventor: 徐小龙, 范泽轩, 孙雁飞
Owner: NANJING UNIV OF POSTS & TELECOMM