
Data desensitization device based on distributed cluster

This distributed-cluster data desensitization technology, applied in the field of data desensitization devices, addresses heavy CPU time consumption, slow extraction and loading, and limited memory. It improves data extraction performance and achieves high-speed data extraction and high-speed loading.

Pending Publication Date: 2020-06-16
SHANGHAI SNC NET INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

[0007] Existing solution 1 leads to the following problem: JDBC can extract and load data, but when a single table holds on the order of 100 million rows, extraction and loading become very slow, and queries may even time out, so the desensitization task cannot be completed.
[0008] Existing solution 2 leads to the following problem: because the CPU and memory of a single machine are limited, a database with thousands of tables and a total volume above the TB level may cause memory overflow and overwhelm the CPU.
[0009] Most current production environments desensitize an entire database or an entire collection. Desensitizing massive data from multiple tables at the same time requires fetching that data and running the desensitization algorithm over it. Some desensitization algorithms are complex and consume considerable CPU resources, so the work done by the desensitization server is CPU-intensive. Because CPU resources are limited, starting an unbounded number of threads causes frequent CPU context switching, which consumes a lot of CPU time. The CPU may also fail to keep up, so data backs up in the pipeline (the pipeline uses a memory-based queue, ArrayBlockingQueue), resulting in JVM memory overflow.
The CPU of a single server is limited. For a 16-core CPU, for example, about 16 × 2 threads is appropriate, but for data volumes above the TB level the CPU and memory of a single server are clearly insufficient.
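The constraint described in [0009] can be illustrated with a minimal Java sketch. It is not the patented implementation: the class name `BoundedPipelineSketch`, the `mask` rule, and the sentinel-based shutdown are all assumptions made for illustration. It sizes a fixed thread pool at two threads per core, as the text suggests, and uses a bounded `ArrayBlockingQueue` so that the producer blocks when the workers fall behind instead of letting rows accumulate in memory.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedPipelineSketch {
    // Rule of thumb quoted in the text: about two threads per core (16 cores -> 32 threads).
    static int workerCount(int cores) {
        return cores * 2;
    }

    // Hypothetical desensitization rule: keep the first character, mask the rest.
    static String mask(String value) {
        if (value.isEmpty()) {
            return value;
        }
        return value.charAt(0) + "*".repeat(value.length() - 1);
    }

    // Pushes `rows` values through a bounded queue and returns how many were masked.
    static int run(int rows, int queueCapacity, int workers) {
        BlockingQueue<String> pipeline = new ArrayBlockingQueue<>(queueCapacity);
        AtomicInteger processed = new AtomicInteger();
        // Unique sentinel object; compared by identity so no real row can match it.
        final String poison = new String("END");
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        String row = pipeline.take();
                        if (row == poison) {
                            return;
                        }
                        mask(row);
                        processed.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        try {
            // put() blocks while the queue is full, which caps memory use.
            for (int i = 0; i < rows; i++) {
                pipeline.put("row-" + i);
            }
            // One sentinel per worker so every worker terminates.
            for (int i = 0; i < workers; i++) {
                pipeline.put(poison);
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while feeding the pipeline", e);
        }
        return processed.get();
    }
}
```

With a fixed pool, the number of runnable threads stays bounded regardless of how many tables are queued, which avoids the context-switching overhead the text describes. The queue capacity is a tuning parameter that this sketch leaves to the caller.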



Embodiment Construction

[0022] The present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0023] Figure 1 is a schematic diagram of desensitization performed by the distributed cluster-based data desensitization device in an embodiment of the present invention.

[0024] Referring to Figure 1, the distributed cluster-based data desensitization device of this embodiment includes a master server, a thread master scheduler, and a plurality of slave servers. Each slave server is provided with a thread scheduler that allocates that slave server's threads. The master server slices each source data table in the database that needs to be desensitized and puts the table slices into the fragmentation queue of that source data table. The master server allocates a defined thread pipeline to each source data table through the thread master scheduler; the allocation defines the number of thread pipeline groups and the number of threads. The thread master scheduler schedules threads through the slave servers' thread schedulers to pull data from the fragmentation ...
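The slicing step can be pictured with a short Java sketch. The source does not say how a table is sliced, so splitting by contiguous row ranges is an assumption here, as are the names `TableShardingSketch`, `Shard`, `slice`, and `drain`. Worker threads in a single process stand in for the slave servers' threads, and the actual extraction, desensitization, and loading are omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

class TableShardingSketch {
    // One slice of a source table: rows with index in [start, end).
    static final class Shard {
        final String table;
        final long start;
        final long end;

        Shard(String table, long start, long end) {
            this.table = table;
            this.start = start;
            this.end = end;
        }
    }

    // Splits a table of `rowCount` rows into contiguous slices of at most `shardSize` rows.
    static List<Shard> slice(String table, long rowCount, long shardSize) {
        if (shardSize <= 0) {
            throw new IllegalArgumentException("shardSize must be positive");
        }
        List<Shard> shards = new ArrayList<>();
        for (long start = 0; start < rowCount; start += shardSize) {
            shards.add(new Shard(table, start, Math.min(start + shardSize, rowCount)));
        }
        return shards;
    }

    // Worker threads pull slices until the queue is empty; returns the total rows covered.
    static long drain(Queue<Shard> queue, int workers) {
        AtomicLong rows = new AtomicLong();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> {
                Shard shard;
                while ((shard = queue.poll()) != null) {
                    // A real worker would extract, desensitize, and load this range here.
                    rows.addAndGet(shard.end - shard.start);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return rows.get();
    }
}
```

For example, `slice("orders", 1050, 100)` yields eleven slices, the last covering rows 1000 to 1049, and draining a queue of those slices with any number of workers covers all 1050 rows exactly once, because each slice is polled by only one thread.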


Abstract

The invention discloses a data desensitization device based on a distributed cluster. The device comprises a master server, a thread master scheduler, and a plurality of slave servers. Each slave server is provided with a thread scheduler that allocates that slave server's threads. The master server slices each source data table in the database that needs to be desensitized and puts the table slices into the fragmentation queue of that source data table. The master server allocates a defined thread pipeline to each source data table through the thread master scheduler, and the thread master scheduler schedules threads through the slave servers' thread schedulers to pull data from the fragmentation queue, desensitize it, and load it into a target data table. According to the invention, the thread master scheduler coordinates the slave servers' thread schedulers, so that threads are allocated dynamically and loading performance is improved; the distributed cluster arrangement of master and slave servers scales well; and slicing the table data achieves high-speed data extraction.

Description

Technical field

[0001] The invention relates to a data desensitization device, in particular to a distributed cluster-based data desensitization device.

Background art

[0002] Data desensitization refers to transforming certain sensitive information according to desensitization rules in order to reliably protect sensitive private data. This allows masked real-world datasets to be used safely in development, testing, and other non-production environments, as well as in outsourced environments. A large amount of sensitive information in relational databases needs to be desensitized.

[0003] There are two existing desensitization methods:

[0004] Solution 1: Use plain JDBC to perform desensitization.

[0005] Solution 2: Use a single machine to desensitize the data of multiple tables.

[0006] The existing desensitization methods have the following problems:

[0007] Existing solution 1 will lead to the following problems: JDBC can extract and load data, bu...


Application Information

IPC(8): G06F21/62, G06F9/48, G06F9/50
CPC: G06F21/6245, G06F9/4881, G06F9/5027
Inventor: 程永新, 宋辉, 郭振宇
Owner: SHANGHAI SNC NET INFORMATION TECH CO LTD