A Data Deduplication Method for Massive Image Data

A technique for deduplicating picture data, applied in the field of data storage research. It addresses the poor deduplication results that existing methods achieve on picture files, handles the problem of comparing picture quality, reduces the amount of stored data, and improves the deduplication rate.

Active Publication Date: 2018-05-08
JINAN UNIVERSITY

AI Technical Summary

Problems solved by technology

[0003] Data deduplication technology can effectively reduce redundant data in a storage system, but existing techniques deduplicate image files poorly. The main reason is that common image files are stored in compressed form, and the compression process changes the binary stream of the data, so that originally redundant data no longer appears redundant.


Examples

Embodiment 1

[0043] Referring to Figure 1, the data deduplication process in this embodiment includes the following steps:

[0044] 1. File filtering.

[0045] (1-1) Read a file in the backup stream, and judge whether the file is a picture file according to the file extension.

[0046] (1-2) If it is not a picture file, deduplicate the file according to the general process. The general data deduplication process is: split the binary stream of the file into blocks, calculate a hash fingerprint for each data block, and look up the fingerprint in the fingerprint database to judge whether the data block is redundant. If the data block is redundant, it is deleted; if it is a unique block, it is stored in the system and its fingerprint is added to the fingerprint database.

[0047] (1-3) If it is a picture file, execute step (1-4).

[0048] (1-4) Calculate the size of the image file. If the file is less than 5KB, treat the file as a whole as a data block, calculate its hash fing...
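
The file-filtering steps (1-1) to (1-4), as far as they are given above, can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation. The extension list, the 4096-byte block size, the choice of SHA-1, and all function names are assumptions made for the example; only the extension check, the general block/fingerprint/lookup process, and the 5 KB threshold come from the text. The handling of larger picture files is not shown because the source text is cut off at that point.

```python
import hashlib
import os

# Assumed values for illustration; the source only fixes the 5 KB threshold.
PICTURE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}
SMALL_FILE_LIMIT = 5 * 1024   # step (1-4): files smaller than 5 KB
BLOCK_SIZE = 4096             # hypothetical fixed block size


def is_picture_file(name):
    """Step (1-1): classify a file by its extension only."""
    return os.path.splitext(name)[1].lower() in PICTURE_EXTENSIONS


def store_block(block, fingerprints, storage):
    """Return True if the block was unique and stored, False if redundant."""
    fp = hashlib.sha1(block).hexdigest()
    if fp in fingerprints:
        return False              # redundant block: nothing is stored
    fingerprints.add(fp)          # unique block: record the fingerprint
    storage[fp] = block           # and keep the data
    return True


def dedup_generic(data, fingerprints, storage):
    """Step (1-2): fixed-size blocking of the binary stream.

    Returns the number of unique blocks that were stored.
    """
    unique = 0
    for offset in range(0, len(data), BLOCK_SIZE):
        if store_block(data[offset:offset + BLOCK_SIZE], fingerprints, storage):
            unique += 1
    return unique


def filter_file(name, data, fingerprints, storage):
    """Route one file from the backup stream and report the path taken."""
    if not is_picture_file(name):
        dedup_generic(data, fingerprints, storage)
        return "generic"
    if len(data) < SMALL_FILE_LIMIT:
        store_block(data, fingerprints, storage)   # whole file as one block
        return "whole-file"
    return "picture-pipeline"     # larger pictures: later steps, not shown
```

Keeping the fingerprint database as an in-memory set keeps the sketch short; a real system would likely need a persistent index.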

Abstract

The invention provides a data deduplication method for massive picture data and relates to the field of computer storage. The method comprises the following steps: recognizing the file type according to the file name extension, and reading the bitmap matrix data of a picture file into memory; partitioning the bitmap matrix according to a partition algorithm based on picture size; judging whether each data block is a unique block or a redundant block, using either the traditional unique-block judging method or a similarity judging method; deleting the data block if it is redundant, and otherwise selecting a picture compression algorithm according to the type of the original picture file, compressing the data block, and storing it in the system. The technical scheme is suitable for deduplicating massive picture data and can greatly reduce the volume of data actually stored.
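
To make the pipeline in the abstract concrete, here is a small Python sketch of partitioning a bitmap matrix into blocks and storing only the unique ones. It rests on several assumptions that are not in the source: the bitmap is a list of equal-length rows of 8-bit values, the partition is a fixed square tile (the patent's size-based partition algorithm is not described here), image dimensions are multiples of the tile size, SHA-1 exact matching stands in for the unique-block test (the similarity judging method is not covered), and `zlib` stands in for the picture compression algorithm. All names are hypothetical.

```python
import hashlib
import zlib

TILE = 2  # hypothetical tile edge length in pixels


def partition(bitmap, tile=TILE):
    """Yield tile x tile blocks of the bitmap as bytes, row-major within a block.

    Assumes the bitmap height and width are multiples of `tile`.
    """
    height, width = len(bitmap), len(bitmap[0])
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            rows = [bytes(row[left:left + tile]) for row in bitmap[top:top + tile]]
            yield b"".join(rows)


def dedup_bitmap(bitmap, fingerprints, storage):
    """Store each unique block in compressed form.

    Returns (total_blocks, unique_blocks_stored).
    """
    total = unique = 0
    for block in partition(bitmap):
        total += 1
        fp = hashlib.sha1(block).hexdigest()
        if fp in fingerprints:
            continue                        # redundant block: skip it
        fingerprints.add(fp)
        storage[fp] = zlib.compress(block)  # compress only after deduplication
        unique += 1
    return total, unique


if __name__ == "__main__":
    picture = [
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    print(dedup_bitmap(picture, set(), {}))
```

The point of the ordering is that fingerprints are computed on uncompressed bitmap blocks, so identical picture regions still match, and compression is applied only to the blocks that are kept.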

Description

Technical field

[0001] The invention relates to the field of data storage research, in particular to a method for deduplicating massive image data.

Background

[0002] With the development of computer networks, information is increasingly presented mainly as pictures, with text as a supplement. Social networking sites, shopping sites, and similar services now hold large amounts of picture data, and this data is growing explosively across the network. How to store and manage it effectively has become a hot topic in the storage field.

[0003] Data deduplication technology can effectively reduce redundant data in a storage system, but existing techniques deduplicate image files poorly. The main reason is that common image files are stored in compressed form, and the compression process changes the binary stream of the data, so that the originally redundant data i...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30
CPC: G06F16/1744, G06F16/1748
Inventors: 邓玉辉, 谢恒翔
Owner: JINAN UNIVERSITY