A data deduplication method

A data deduplication technology, applied in the fields of electrical digital data processing, digital data information retrieval, and special data processing applications. It addresses the problem of a low duplicate detection rate: it improves the detection rate while reducing the amount of data to be analyzed and the storage space occupied.

Active Publication Date: 2019-07-23
GEOVIS CO LTD

AI Technical Summary

Problems solved by technology

The existing duplicate detection method is too simple, and its detection rate is low.




Embodiment Construction

[0024] The present invention is described in detail below with reference to the accompanying drawings and specific embodiments. The illustrative embodiments and their descriptions serve only to explain the invention and do not limit it.

[0025] The system to which the method of the present invention is applied includes an interface server and a plurality of database servers. The interface server manages the storage of data files, and the database servers actually store the data. To store massive amounts of data, the preferred solution of the present invention uses 256 database servers. This configuration targets large-scale data storage systems. A small business that wants to reduce costs can merge several servers into one, thereby reducing the number of database servers.
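The relationship between the 256 database servers and the last-byte classification described in the abstract can be illustrated with a small routing sketch. This is not the patent's implementation: the function name `server_for_block` and the rule for merging servers (contiguous ranges of last-byte values sharing one server) are assumptions made for illustration, since the text only states that servers can be merged.

```python
NUM_CLASSES = 256  # one class per possible value of a block's last byte


def server_for_block(block: bytes, num_servers: int = NUM_CLASSES) -> int:
    """Return the index of the database server responsible for `block`.

    Hypothetical helper: the class of a block is its last byte. With 256
    servers each class has its own server; with fewer servers, contiguous
    ranges of classes are merged onto one server (an assumed scheme).
    """
    if not block:
        raise ValueError("block must not be empty")
    if not 1 <= num_servers <= NUM_CLASSES:
        raise ValueError("num_servers must be between 1 and 256")
    return block[-1] * num_servers // NUM_CLASSES


# Large deployment: the last byte selects the server directly.
print(server_for_block(b"abc\xff"))       # server 255
# Small deployment with 4 merged servers: 64 classes per server.
print(server_for_block(b"abc\xff", 4))    # server 3
```

Because the server is derived from the block content alone, the same block is always routed to the same server, which is what allows each server to detect duplicates locally.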

[0026] On the basis of above-mentioned system structure, the basi...


Abstract

The invention discloses a data deduplication method. Data blocks are classified by their last byte, and a database server is assigned to process and store each class of data block. The interface server sets a minimum data block length. For a data file to be deduplicated, if the file length is smaller than the minimum length, the file is sent directly to the database server corresponding to its data block; otherwise, the file is segmented into blocks using different trailing bytes. From the six segmentation modes that yield the most blocks, the interface server selects the two with the largest amount of repeated data and instructs the corresponding database servers to store the data. For a repeated data block, the database server stores only a pointer to the identical block already stored; for a non-repeated data block, it stores the whole block together with its hash value.
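The steps in the abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the claimed method: the names `split_on_trailing_byte`, `candidate_tails` and `BlockStore` are hypothetical, SHA-256 is an assumed choice of hash (the text does not name one), a "pointer" is modeled as a stored digest, and the final step of choosing two of the six candidate modes by repeated data volume is left out because it depends on querying the database servers.

```python
import hashlib


def split_on_trailing_byte(data: bytes, tail: int, min_len: int) -> list[bytes]:
    """Cut `data` after each occurrence of `tail`, keeping blocks >= min_len."""
    blocks, start = [], 0
    for i, byte in enumerate(data):
        if byte == tail and i + 1 - start >= min_len:
            blocks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        blocks.append(data[start:])  # leftover bytes after the last cut
    return blocks


def candidate_tails(data: bytes, min_len: int, top: int = 6) -> list[int]:
    """Return the `top` trailing bytes whose segmentation yields the most blocks."""
    counts = {t: len(split_on_trailing_byte(data, t, min_len)) for t in set(data)}
    return sorted(counts, key=lambda t: (-counts[t], t))[:top]


class BlockStore:
    """Stands in for one database server (assumed in-memory layout)."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}  # hash -> full block, stored once
        self.pointers: list[str] = []       # one entry per block received

    def put(self, block: bytes) -> bool:
        """Store `block`; return True if it duplicated an existing block."""
        digest = hashlib.sha256(block).hexdigest()
        duplicate = digest in self.blocks
        if not duplicate:
            self.blocks[digest] = block     # new block: keep data and hash
        self.pointers.append(digest)        # duplicates add only a pointer
        return duplicate


data = b"abXcdXabXcdX"
store = BlockStore()
for block in split_on_trailing_byte(data, ord("X"), min_len=2):
    store.put(block)
print(len(store.pointers), "blocks received,", len(store.blocks), "stored in full")
```

In this toy input, cutting on the byte `X` produces four blocks, of which only two are distinct, so the store keeps two full blocks and four pointers.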

Description

[0001] 【Technical field】

[0002] The invention belongs to the field of computers and databases, and in particular relates to a data deduplication method.

[0003] 【Background art】

[0004] In recent years, the concept of big data has emerged in order to process large amounts of information. Big data refers to a collection of data that cannot be captured, managed and processed by conventional software tools within an affordable time frame; it is an information asset characterized by high growth rates and diversity.

[0005] Because of the sheer volume of data, it is difficult for people to analyze it unaided. However, against the backdrop of technological innovation represented by cloud computing, data that was once difficult to collect and use has become easy to use. Through continuous innovation across all industries, big data will gradually create more value for humanity.

[0006] However, although there are more and more computers used for big data analys...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/215, G06F16/2455
CPC: G06F16/215, G06F16/24556
Inventor: 王焰辉, 李振钊, 曾刚
Owner: GEOVIS CO LTD