The invention discloses an extendible duplicate data detection method in the technical field of computer storage. It solves the problem that existing duplicate data detection methods cannot extend their storage capacity efficiently, and therefore cannot meet current needs as storage demand grows and deduplication systems have to be upgraded and updated. The extendible duplicate
data detection method comprises the following steps: chunking of the data, fingerprint extraction, lookup in the Bloom filters, lookup in the fingerprint subset table, checking for a Bloom filter that is not yet full, marking of new fingerprints, checking the number of Bloom filters, and extension of the Bloom filter array. In the invention, the
Bloom filter array is used to look up fingerprint data, which quickly narrows the search range, improves lookup efficiency, and achieves detection of duplicate data. The extendible duplicate data detection method offers high scalability and query performance, supports element location, keeps the false positive rate under control, and effectively reduces memory overhead. The Bloom filter array is composed of a series of identically configured Bloom filters, so once the false positive rate epsilon' and the preset maximum total number of fingerprints to be indexed, nmax, are given, the required number of Bloom filters and the number of hash functions can be computed.
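
The steps listed above can be illustrated with a minimal Python sketch. It is not the patented implementation. The abstract does not give the sizing formulas, so the sketch assumes the textbook Bloom filter formulas, a union bound that splits epsilon' evenly across the filters, fixed-size chunking, SHA-1 fingerprints, and an in-memory set standing in for each fingerprint subset table. All names (`bloom_parameters`, `plan_array`, `BloomFilterArray`, `detect_duplicates`, `capacity_per_filter`) are hypothetical.

```python
import hashlib
import math


def bloom_parameters(capacity, fp_rate):
    """Textbook sizing of one Bloom filter: (bit count m, hash count k)."""
    m = math.ceil(-capacity * math.log(fp_rate) / (math.log(2) ** 2))
    k = max(1, round(m / capacity * math.log(2)))
    return m, k


def plan_array(n_max, overall_fp_rate, capacity_per_filter):
    """Assumed planning rule: g filters cover n_max fingerprints, and the
    overall rate is split evenly (union bound: overall <= g * per-filter)."""
    g = math.ceil(n_max / capacity_per_filter)
    per_filter_rate = overall_fp_rate / g
    _, k = bloom_parameters(capacity_per_filter, per_filter_rate)
    return g, per_filter_rate, k


class BloomFilter:
    def __init__(self, capacity, fp_rate):
        self.capacity = capacity
        self.m, self.k = bloom_parameters(capacity, fp_rate)
        self.bits = bytearray((self.m + 7) // 8)
        self.count = 0

    def _positions(self, fp):
        # Double hashing: derive k bit positions from two 64-bit values.
        digest = hashlib.sha256(fp).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, fp):
        for p in self._positions(fp):
            self.bits[p >> 3] |= 1 << (p & 7)
        self.count += 1

    def __contains__(self, fp):
        return all(self.bits[p >> 3] & (1 << (p & 7))
                   for p in self._positions(fp))

    def is_full(self):
        return self.count >= self.capacity


class BloomFilterArray:
    """Identically configured filters, each paired with a fingerprint subset."""

    def __init__(self, capacity_per_filter, fp_rate):
        self.capacity = capacity_per_filter
        self.fp_rate = fp_rate
        self.filters = [BloomFilter(capacity_per_filter, fp_rate)]
        self.subsets = [set()]  # stand-in for the fingerprint subset tables

    def is_duplicate(self, fp):
        # A Bloom filter hit only narrows the search; the subset confirms it.
        return any(fp in bf and fp in subset
                   for bf, subset in zip(self.filters, self.subsets))

    def insert(self, fp):
        # Extend the array when the last filter has reached its capacity.
        if self.filters[-1].is_full():
            self.filters.append(BloomFilter(self.capacity, self.fp_rate))
            self.subsets.append(set())
        self.filters[-1].add(fp)
        self.subsets[-1].add(fp)


def detect_duplicates(data, array, chunk_size=4096):
    """Chunk the data, fingerprint each chunk, and flag duplicate chunks."""
    flags = []
    for offset in range(0, len(data), chunk_size):
        fp = hashlib.sha1(data[offset:offset + chunk_size]).digest()
        duplicate = array.is_duplicate(fp)
        if not duplicate:
            array.insert(fp)  # mark the new fingerprint
        flags.append(duplicate)
    return flags
```

For example, `detect_duplicates(data, BloomFilterArray(capacity_per_filter=1000, fp_rate=0.001))` returns one boolean per chunk. In this sketch the array grows by one filter each time the newest filter reaches its capacity, so earlier filters never need to be rebuilt.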