Replicated data deleting method based on file content types

A deduplication technology based on file content type, applicable to redundancy-based error detection in computing, digital data processing, and special-purpose data processing applications, among others. It addresses the use of a single chunking strategy and the inability to optimize for file content type, thereby improving overall performance.

Status: Inactive. Publication date: 2010-05-12

HUAZHONG UNIV OF SCI & TECH

Cites: 0; cited by: 155

AI Technical Summary

Problems solved by technology

[0006] The present invention provides a deduplication method based on file content type, which solves the problem that existing deduplication methods use a single chunking strategy and cannot be optimized according to file content type.

Method used

Backup files are classified by content type and an optimal block boundary feature value is computed in advance for each type; at backup time, each file's content type is identified and the matching boundary feature is used to chunk it before digital fingerprint calculation and duplicate data block judgment.

Embodiment Construction

[0045] The present invention will be further described below in conjunction with the accompanying drawings.

[0046] As shown in Figure 1, the present invention performs the block boundary feature calculation step in advance, followed in sequence by the content type identification step, the file chunking step, the digital fingerprint calculation step, the duplicate data block judgment step, and the end step.

[0047] An example of a complete flow for a content-type-based deduplication approach is given below:

[0048] The block boundary feature calculation step is performed in advance and comprises the following sub-steps:

[0049] A. Generate a sample file collection in the storage pool: from the backup system, extract the set of backup files produced by the backup run of September 30, 2009 (14,427 files in total) as the sample file collection, and place them in the storage pool;

[0050] B. Classification of sample files: Extract the metadata of each sample fil...
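Because the paragraph above is truncated, the exact metadata used for classification is not visible here. The sketch below assumes the MIME type guessed from each file name (via Python's standard `mimetypes` module) as the classification key, with a hypothetical `FALLBACK_TYPE` class for files whose type cannot be guessed.

```python
import mimetypes
from collections import defaultdict

FALLBACK_TYPE = "application/octet-stream"  # hypothetical class for unknown files


def classify_samples(paths):
    """Group sample file paths by content type (assumed: MIME type from the name)."""
    groups = defaultdict(list)
    for path in paths:
        mime, _ = mimetypes.guess_type(path)
        groups[mime or FALLBACK_TYPE].append(path)
    return dict(groups)
```

Each resulting group is the per-type sample set from which a boundary feature value could then be derived.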

Abstract

The invention discloses a deduplication method based on file content type. It belongs to the field of data deduplication for computer data backup, is applicable to disk-based backup systems, and addresses the problems that existing deduplication methods use a single chunking strategy and cannot be optimized according to file content type. The method performs a block boundary feature calculation step in advance, and then carries out the following steps in sequence: content type identification, file chunking, digital fingerprint calculation, duplicate data block judgment, and end. The method classifies backup files by content type and computes the optimal block boundary feature value for each content type; when backup files are processed, a content type identification step is added and the block boundary feature is selected according to the identification result. This improves the overall effectiveness of deduplication when processing complex backup files.
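The abstract states that an optimal block boundary feature value is computed for each content type, but this excerpt does not give the optimization criterion. The sketch below assumes one: for each type's sample files it tries a range of candidate boundary features (modeled, as an assumption, as the number of zero low-order bits of a rolling hash) and keeps the value that minimizes an estimated storage cost. The hypothetical constants `ENTRY_COST` and `REF_COST` charge bytes for each stored chunk's index entry and each chunk reference; without such overhead terms the search would tend to favor the smallest chunks.

```python
import hashlib

WINDOW, BASE, MOD = 48, 257, (1 << 61) - 1  # rolling-hash parameters (assumed)
ENTRY_COST = 40  # hypothetical index bytes per unique stored chunk
REF_COST = 20    # hypothetical recipe bytes per chunk reference (SHA-1 size)


def cdc_chunks(data, bits):
    """Content-defined chunking: cut where the low `bits` hash bits are zero."""
    mask, drop = (1 << bits) - 1, pow(BASE, WINDOW, MOD)
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = (h * BASE + b) % MOD
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * drop) % MOD  # remove the outgoing byte
        if i + 1 - start >= WINDOW and (h & mask) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def storage_cost(samples, bits):
    """Estimated bytes needed to deduplicate `samples` with boundary feature `bits`."""
    seen, cost = set(), 0
    for data in samples:
        for c in cdc_chunks(data, bits):
            fp = hashlib.sha1(c).digest()
            if fp not in seen:        # unique chunk: data plus index entry
                seen.add(fp)
                cost += len(c) + ENTRY_COST
            cost += REF_COST          # every chunk is referenced in a recipe
    return cost


def optimal_boundary_features(samples_by_type, candidates=range(8, 15)):
    """Pick, per content type, the candidate boundary feature with minimum cost."""
    return {
        ctype: min(candidates, key=lambda bits: storage_cost(samples, bits))
        for ctype, samples in samples_by_type.items()
    }
```

The returned mapping plays the role of the per-type table consulted at backup time; its values depend entirely on the sample data and the assumed cost constants.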

Description

Technical field

[0001] The invention belongs to the field of data deduplication for computer data backup, and relates in particular to a deduplication method based on file content type (Content Type) that is suitable for disk-based backup systems.

Background art

[0002] Since the beginning of the 21st century, as the information age has accelerated, data has grown explosively: users' storage capacity is increasingly strained, data management is increasingly difficult, and storage expenditure keeps rising. To address these problems, data deduplication technology was proposed. It effectively eliminates the duplicate data in users' routine backups and greatly reduces the volume of backup data, thereby saving storage capacity and reducing the difficulty of data management. Many storage vendors have launched backup systems or software based on data deduplication, such as EMC's Avamar Data Store bac...

Application Information

Patent type & authority: Application (China)

IPC(8): G06F17/30; G06F11/14

Inventors: Zhou Jingli (周敬利), Qin Leihua (秦磊华), Zeng Dong (曾东), Nie Xuejun (聂雪军), Liu Ke (刘科), Zhu Jianfeng (朱建峰)

Owner: HUAZHONG UNIV OF SCI & TECH
