A Data Block Routing Method Combining Fingerprint Sampling and Data Fragmentation Reduction

A data block and fingerprint technology, applied in the fields of electrical digital data processing, special data processing applications, and digital data information retrieval. It addresses low system throughput, poor data recovery performance, and the lack of consideration for data fragmentation, and achieves improved throughput, avoidance of computation and memory bottlenecks, and good data recovery performance.

Active Publication Date: 2019-02-26
CHONGQING UNIV
6 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

However, the system throughput achieved by the current stateful data routing algorithm is low. When faced with terabytes of backup data, using the Bloom filter to find duplicate data blocks takes hours. In addition, these two routing algorithms do not consider the data fragmentation on each data node server, which results in poor data recovery performance for the system.



Embodiment Construction

[0028] The subjects involved in the present invention include a client and a data node server. The client is the receiver of the backup data stream, and the data node server is used to store all the data blocks of the backup data stream.

[0029] Figure 1 is a schematic diagram of the structure of the distributed data deduplication system. The distributed deduplication system includes a client 100 and a data node server 200. The client 100 contains a fingerprint processing module 110, and the data node server 200 contains a Bloom filter search module 210, a fragment search module 220, and a deduplication module 230. The data node server 200 maintains a Bloom filter and a data fingerprint index table. Fingerprint processing module 110: uses a variable-length chunking algorithm to divide the backup data stream into data blocks with a certain average length (such as 4 KB), and uses a hash algorithm (such as SHA-1) to p...
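The chunking and fingerprinting step described for module 110 can be illustrated with a minimal sketch. The text above does not name the variable-length algorithm, so the Gear-style rolling hash, the function names `chunk_stream` and `fingerprint`, and the `min_size` and `max_size` bounds are all assumptions made for illustration; only the 4 KB average and the SHA-1 hash come from the description.

```python
import hashlib

# Assumption: a Gear-style rolling hash stands in for the unspecified
# variable-length chunking algorithm. The table maps each byte value to a
# fixed pseudo-random 32-bit number derived from SHA-1, so it is reproducible.
GEAR = [int.from_bytes(hashlib.sha1(bytes([b])).digest()[:4], "big")
        for b in range(256)]


def chunk_stream(data: bytes, avg_size: int = 4096,
                 min_size: int = 1024, max_size: int = 16384) -> list:
    """Split data into variable-length blocks at content-defined boundaries.

    avg_size must be a power of two; a boundary is declared when the low
    bits of the rolling hash are all zero, which happens on average once
    every avg_size bytes after the min_size threshold is passed.
    """
    mask = avg_size - 1
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        if length >= max_size or (length >= min_size and (h & mask) == 0):
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0  # restart the rolling hash for the next block
    if start < len(data):
        chunks.append(data[start:])  # trailing block may be shorter than min_size
    return chunks


def fingerprint(block: bytes) -> str:
    """Return the SHA-1 fingerprint of a data block as a hex string."""
    return hashlib.sha1(block).hexdigest()
```

Because boundaries depend on content rather than on fixed offsets, inserting a few bytes near the start of a stream shifts only the nearby blocks, which is why deduplication systems generally prefer this family of chunkers over fixed-size splitting.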

Abstract

The present invention proposes a data block routing method for a distributed deduplication system that combines fingerprint sampling and data fragmentation reduction. The method samples data block fingerprints, uses a Bloom filter to quickly look up duplicates of the sampled fingerprints, and estimates the data duplication rate. The duplication rate, data fragmentation, and storage space usage are then used to determine the routing node. Compared with existing routing methods, the method can improve the deduplication rate of the whole system, the system throughput, and the speed of data recovery.
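The routing idea in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: the abstract does not give the sampling rule, the fragmentation metric, or the way the three factors are combined, so the modulo-based `sample` function, the `fragmentation` and `used_fraction` fields, and the product-form score in `choose_node` are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


class BloomFilter:
    """A small Bloom filter over string fingerprints."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting SHA-1 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha1(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


@dataclass
class Node:
    name: str
    used_fraction: float   # hypothetical: share of storage in use, 0..1
    fragmentation: float   # hypothetical: fragmentation measure, 0..1
    bloom: BloomFilter = field(default_factory=BloomFilter)


def sample(fingerprints, rate: int = 4):
    """Keep roughly one fingerprint in `rate` (assumed modulo sampling)."""
    return [fp for fp in fingerprints if int(fp, 16) % rate == 0]


def estimate_dup_rate(node: Node, sampled) -> float:
    """Fraction of sampled fingerprints the node's Bloom filter reports as seen."""
    if not sampled:
        return 0.0
    return sum(fp in node.bloom for fp in sampled) / len(sampled)


def choose_node(nodes, fingerprints, rate: int = 4) -> Node:
    """Pick a node using a hypothetical score that favors a high estimated
    duplication rate, low fragmentation, and low storage usage."""
    sampled = sample(fingerprints, rate)

    def score(node: Node):
        s = (estimate_dup_rate(node, sampled)
             * (1 - node.fragmentation) * (1 - node.used_fraction))
        return (s, -node.used_fraction)  # ties go to the emptier node

    return max(nodes, key=score)
```

Querying only the sampled fingerprints keeps the number of Bloom filter lookups per routing decision small, which is the throughput benefit the abstract points to. The estimate can be inflated by Bloom filter false positives, so a real system would size the filter to its expected fingerprint count.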

Description

Technical field

[0001] The invention belongs to the technical field of computer information storage, and in particular relates to a data block routing method that combines fingerprint sampling and data fragmentation reduction in a distributed deduplication system.

Background

[0002] With the advent of the big data era, the explosive growth of data volume places ever higher demands on data backup performance. When an existing single-server deduplication backup system handles massive data, the amount of data it can store is limited, its scalability is poor, its throughput is low, and its overall performance is relatively poor. A distributed deduplication backup system can effectively solve these problems of a single-server backup system.

[0003] In a distributed deduplication system, data routing is the key to global deduplication. Data routing mainly solves the problem of how data blocks are routed to each ...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): H04L29/08, G06F16/174
CPC: H04L67/1095, G06F16/1752, H04L67/63
Inventor: 谭玉娟, 王奏鸣, 晏志超
Owner: CHONGQING UNIV