Data counting and deduplication method, system, server and storage medium

A data counting and data storage technology, applied in the field of big data, that addresses the low accuracy of existing deduplication, with the effects of improving accuracy, reducing the probability of mistakenly discarding data (false positives), and improving the efficiency of duplicate checking.

Active Publication Date: 2022-03-22
WUHAN DOUYU NETWORK TECH CO LTD
5 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0004] In view of this, embodiments of the present invention provide a data counting and deduplication method, system, server, and storage medium to solve the problem that existing deduplication methods have a low accuracy rate.

Examples

Embodiment 1

[0028] Referring to Figure 1, a schematic flowchart of a data counting and deduplication method provided by an embodiment of the present invention, the method includes the following steps:

[0029] S101. After a deduplication call request is received from the client, the dubbo component performs load balancing to assign a server that carries out the deduplication processing.

[0030] The client provides a local service for the user and can request a deduplication service from the server. Here, the client may be a deduplication request program on the client computer that is capable of invoking a deduplication component on the server side. After receiving the request, the server verifies the legitimacy of the request and then assigns a server through the load balancing of the dubbo component. The dubbo component is a distributed service framework that provides transparent remote procedure call (RPC) service invocation and has soft load balancing and fault tolerance mechanisms. S...
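The flow in S101 (check the request, then let a load balancer choose a server) can be sketched as follows. This is only an illustration under stated assumptions: it does not use or reproduce dubbo's API, the field names in `REQUIRED_FIELDS` and the weighted random strategy are hypothetical, and the source does not say how legitimacy is verified.

```python
import random

# Hypothetical field names; the source only says the request is checked for legitimacy.
REQUIRED_FIELDS = ("database", "partition", "level")


def is_legitimate(request: dict) -> bool:
    """Accept a request only if all assumed fields exist and the level is a positive integer."""
    if any(field not in request for field in REQUIRED_FIELDS):
        return False
    level = request["level"]
    return isinstance(level, int) and level >= 1


def pick_server(weights: dict, rng: random.Random) -> str:
    """Weighted random choice among server names (an assumed strategy, not dubbo's code)."""
    if not weights:
        raise ValueError("no servers available")
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def dispatch(request: dict, weights: dict, rng: random.Random) -> str:
    """Validate the request first, then assign it to one server."""
    if not is_legitimate(request):
        raise ValueError("illegitimate deduplication request")
    return pick_server(weights, rng)
```

A caller would pass something like `dispatch({"database": "db", "partition": "p1", "level": 1}, {"a": 1, "b": 3}, random.Random())`; in a real deployment the framework itself performs this selection.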

Embodiment 2

[0043] On the basis of Figure 1 and in combination with Figure 2, step S102, that is, creating a deduplication service data storage unit, is described in detail as follows:

[0044] Figure 2 is a flowchart of step S102 provided by an embodiment of the present invention. It includes steps S1021, S1022, S1023, and S1024; the numbering of these steps does not imply an order of execution.

[0045] In step S1021, by parsing the request parameters, the database name, partition data, and deduplication level can be obtained.

[0046] Before anything is stored in redis, the redis storage component must be queried to determine whether the data has already been stored, so as to avoid storing the same data repeatedly and wasting memory. Specifically, the database name and partition data content are obtained from the request parameters and compared against the data in the redis storage component by traversal, so that entries that already exist can be ruled out in step S1022.

[0047] When there is no corresponding database name and partition data, creat...
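The check-then-create behaviour of step S102 can be sketched as below. This is a minimal sketch, not the patented implementation: `BitmapStore` is an in-memory stand-in for the redis storage component, and the key layout `dedup:<database>:<partition>:<level>` and the default bitmap size are assumptions. Creating one bitmap per deduplication level follows the abstract.

```python
class BitmapStore:
    """In-memory stand-in for the redis storage component (assumption)."""

    def __init__(self) -> None:
        self._bitmaps = {}

    def exists(self, key: str) -> bool:
        return key in self._bitmaps

    def create(self, key: str, size_bits: int) -> None:
        # One zero-initialised byte per 8 bits, rounded up.
        self._bitmaps[key] = bytearray((size_bits + 7) // 8)


def ensure_storage_units(store: BitmapStore, database: str, partition: str,
                         level: int, size_bits: int = 1024) -> list:
    """Create one bitmap per deduplication level, skipping those already stored.

    Returns the keys that were newly created.
    """
    created = []
    for i in range(1, level + 1):
        key = f"dedup:{database}:{partition}:{i}"  # hypothetical key layout
        if not store.exists(key):
            store.create(key, size_bits)
            created.append(key)
    return created
```

Calling `ensure_storage_units` twice with the same database name and partition creates the bitmaps only once, which is the repeated-storage problem paragraph [0046] describes.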

Embodiment 3

[0052] On the basis of Figure 1 and in combination with Figure 3, the process of creating the deduplication calculation unit in step S103 is described in detail as follows:

[0053] After the application request parameters are parsed, the configured deduplication level parameter is obtained in step S103. Specifically, deduplication counting is performed with the Bloom Filter algorithm in steps S301 and S302. For example, when the deduplication level is 1, the hash values of a group of data to be deduplicated are calculated, the corresponding redis storage unit bitmap is located according to the hash values, and the bitmap is queried. If the data does not exist, the bits whose value is 0 at the corresponding bitmap positions are set to 1, the data is added to the storage unit for deduplication results, and the deduplication result is returned. Each query returns its result according to this process; if any bit returns a value of 0, it indicates that the query data does not...
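A level-1 deduplication count along the lines of [0053] can be sketched as follows. This is a minimal sketch under assumptions: the bitmap is a local `bytearray` rather than a redis bitmap, the hash functions are derived from SHA-256 with an index prefix, and the class name, bitmap size, and number of hash functions are illustrative choices the source does not specify.

```python
import hashlib


class BloomBitmap:
    """Bloom filter over a local bitmap (stand-in for a redis bitmap)."""

    def __init__(self, size_bits: int = 1 << 16, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str) -> list:
        # Derive num_hashes bit positions by prefixing the item with the hash index.
        result = []
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            result.append(int.from_bytes(digest[:8], "big") % self.size_bits)
        return result

    def add_if_new(self, item: str) -> bool:
        """Return True and set the bits if any bit is 0; return False if all bits are 1."""
        is_new = False
        for pos in self._positions(item):
            byte, mask = pos // 8, 1 << (pos % 8)
            if not self.bits[byte] & mask:
                is_new = True
                self.bits[byte] |= mask
        return is_new


def count_distinct(items, bloom: BloomBitmap) -> int:
    """Count the items the filter judges to be first occurrences."""
    return sum(1 for item in items if bloom.add_if_new(item))
```

Because an item is treated as a duplicate only when every one of its bits is already 1, a genuinely new item can occasionally be rejected when its bits were set by other items; this is the mistaken discarding that the higher deduplication levels are meant to reduce.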

Abstract

The invention discloses a data counting and deduplication method, system, server and storage medium, which are suitable for data deduplication in big data. The method provided by the present invention includes: receiving a call request and using the dubbo component to perform load balancing; parsing the request and creating a corresponding number of redis data storage bitmaps on the server according to the preset deduplication level parameter in the request; and obtaining the deduplication content parameter and the deduplication level parameter in the request and calculating the deduplication result through the Bloom Filter algorithm. When the deduplication level is greater than 1 and the deduplication result returns a value of 0, a new group of hash functions is calculated and deduplication is performed through the Bloom Filter algorithm again. In the present invention, the dubbo component is used for load balancing and, according to the preset deduplication level, counting deduplication of the corresponding level is performed through the Bloom Filter algorithm, which not only ensures that the data is processed efficiently and quickly, but also greatly reduces the probability of mistakenly discarding data and improves deduplication accuracy.
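The multi-level behaviour can be sketched as below, under one explicitly assumed reading of the abstract: a result of 0 at one level means "judged as already present", and in that case the item is checked again against the next level's bitmap using a different group of hash functions. An item is then treated as a duplicate only if every level agrees. The class name, the SHA-256 derived hash groups, and the sizes are hypothetical; the source does not define them.

```python
import hashlib


class LeveledDedup:
    """One bitmap and one hash group per deduplication level (assumed design)."""

    def __init__(self, levels: int, size_bits: int = 1 << 16, num_hashes: int = 3) -> None:
        if levels < 1:
            raise ValueError("levels must be at least 1")
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bitmaps = [bytearray((size_bits + 7) // 8) for _ in range(levels)]

    def _positions(self, item: str, level: int) -> list:
        # The level number is mixed into the hash input, so each level
        # effectively uses its own group of hash functions.
        result = []
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{level}:{i}:{item}".encode("utf-8")).digest()
            result.append(int.from_bytes(digest[:8], "big") % self.size_bits)
        return result

    def add_if_new(self, item: str) -> bool:
        """Return False only if all bits at every level were already set."""
        is_new = False
        for level, bitmap in enumerate(self.bitmaps, start=1):
            for pos in self._positions(item, level):
                byte, mask = pos // 8, 1 << (pos % 8)
                if not bitmap[byte] & mask:
                    is_new = True
                    bitmap[byte] |= mask
        return is_new
```

With this reading, a new item is wrongly discarded only when it collides in all levels at once, which is less likely than a collision in a single bitmap, at the cost of one extra bitmap and one extra group of hash computations per level.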

Description

Technical field

[0001] The invention relates to the field of big data, in particular to a data counting and deduplication method, system, server and storage medium.

Background technique

[0002] With the popularity of the Internet, network data has grown exponentially, and the huge amount of data is a major test for deduplication technology. For counting data such as user visits, user comments, and user speeches, traditional simple group counting is clearly difficult to apply to tens or hundreds of millions of records.

[0003] At present, the Bloom Filter algorithm is often used for counting and deduplication of such huge data sets, using multiple hash functions and bitmap storage to deduplicate the data. However, this method can mistakenly discard data, which lowers the deduplication accuracy and makes it difficult to guarantee reliable results.

Contents of the invention

[0004] In view of this, embodiments of the present inve...
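The mistaken discarding mentioned in [0003] corresponds to the false positives of a Bloom filter. As a standard approximation that is not taken from the patent text, for a bitmap of m bits, k hash functions, and n inserted items (all three symbols are introduced here for illustration), the probability p that a new item is wrongly judged as already present is roughly:

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k}
```

For a fixed m and k, p grows as n grows, which is why accuracy degrades at the scale of tens or hundreds of millions of records described in [0002].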

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F9/54, G06F9/50, G06F16/2453, G06F16/27
CPC: G06F9/5083, G06F9/547
Inventor: 王毅, 张文明, 陈少杰
Owner: WUHAN DOUYU NETWORK TECH CO LTD