Structured data deduplication method and device based on MapDB, equipment and medium
A structured data and serialization technology, applied in the field of data processing, that addresses problems such as data integration failures, data integration job exceptions, and system downtime, and enables data integration to complete.
Examples
Embodiment 1
[0041] This embodiment provides a method for deduplicating structured data based on MapDB. The data to be processed is cached on disk in a MapDB database, and memory mapping is used to resolve a memory-mapped address directly to the corresponding address on disk. This preserves read speed while removing the memory capacity limit, and avoids data integration failures when the data volume exceeds available memory. In addition, the data to be processed is cached using a secondary storage mechanism: following the principle of memory-mapped files, the linear space on disk is mapped through a two-level index, and an operating system temporary file serves as the physical storage medium for that two-level index. This not only makes data access and processing more efficient, but can also trigger the recycling mechanism to process temporary files when the program is abnorm...
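The idea of caching records in a memory-mapped temporary file and addressing them by pointer can be sketched as follows. This is a minimal illustration and not the patent's implementation: it uses Python's standard mmap and tempfile modules as a stand-in for MapDB, and the class name MappedSlotFile, the fixed slot layout, and the sizes are all hypothetical. The cleanup in close() only approximates the temporary-file recycling the paragraph mentions.

```python
import mmap
import os
import tempfile

SLOT_SIZE = 64   # hypothetical fixed slot size in bytes (2-byte length + payload)
CAPACITY = 1024  # hypothetical number of slots in the backing file


class MappedSlotFile:
    """Fixed-size record slots stored in a memory-mapped temporary file.

    A stand-in for a disk-backed store: a pointer is translated to a
    byte offset, so a record is reached without loading the whole file.
    """

    def __init__(self, capacity=CAPACITY, slot_size=SLOT_SIZE):
        self.capacity = capacity
        self.slot_size = slot_size
        fd, self.path = tempfile.mkstemp(prefix="dedup-", suffix=".bin")
        self._file = os.fdopen(fd, "r+b")
        # Reserve the linear space on disk before mapping it.
        self._file.truncate(capacity * slot_size)
        self._map = mmap.mmap(self._file.fileno(), capacity * slot_size)

    def _offset(self, pointer):
        if not 0 <= pointer < self.capacity:
            raise IndexError("pointer outside the mapped region")
        return pointer * self.slot_size

    def write(self, pointer, payload):
        if len(payload) > self.slot_size - 2:
            raise ValueError("payload does not fit in one slot")
        start = self._offset(pointer)
        # Store a 2-byte length prefix, then the payload bytes.
        self._map[start:start + 2] = len(payload).to_bytes(2, "big")
        self._map[start + 2:start + 2 + len(payload)] = payload

    def read(self, pointer):
        start = self._offset(pointer)
        length = int.from_bytes(self._map[start:start + 2], "big")
        return bytes(self._map[start + 2:start + 2 + length])

    def close(self):
        # Release the mapping, then delete the temporary file.
        self._map.close()
        self._file.close()
        if os.path.exists(self.path):
            os.remove(self.path)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit and when an exception propagates.
        self.close()
        return False
```

Used as a context manager (with MappedSlotFile() as store: ...), the temporary file is removed both on normal exit and when an exception propagates out of the block. A hard process kill would still leave the file behind, which is one reason a real system needs a separate recycling step.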
Embodiment 2
[0075] This embodiment corresponds to the MapDB-based structured data deduplication method of Embodiment 1 and discloses a MapDB-based structured data deduplication device, which is the virtual device structure of Embodiment 1. As shown in Figure 4, the device includes:
[0076] A data acquisition module 410, configured to acquire messages and deduplication conditions of the messages;
[0077] A data deduplication module 420, configured to generate a first Key value according to the deduplication condition and use it to traverse the primary index. If the first Key value is found in the Key-Value records stored in the primary index, the message corresponding to the first Key value is deduplicated against the acquired message; otherwise, the global pointer is incremented, the incremented global pointer is used as the first Value associated with the first Key value, and the first Key value and the associated first Value are stored in the primary ind...
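The lookup-or-insert logic of the deduplication module can be sketched as below. This is an illustration under stated assumptions, not the claimed implementation: plain in-memory dictionaries stand in for the MapDB-backed indexes, the names make_key, Deduplicator, and offer are hypothetical, the deduplication condition is assumed to be a list of field names, and hashing those fields with SHA-256 is just one possible way to derive the Key. Because the source text is truncated, the secondary index (pointer to message) and the keep-the-first-message policy are also assumptions.

```python
import hashlib
import json


def make_key(message, dedup_fields):
    """Derive a Key from the fields named in the deduplication condition."""
    selected = [message.get(field) for field in dedup_fields]
    encoded = json.dumps(selected, default=str).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class Deduplicator:
    """Two-level lookup: Key -> pointer, then pointer -> message."""

    def __init__(self, dedup_fields):
        self.dedup_fields = list(dedup_fields)
        self.primary_index = {}    # Key -> Value (the global pointer)
        self.secondary_index = {}  # pointer -> message (assumed layout)
        self.global_pointer = 0

    def offer(self, message):
        """Return True if the message is new, False if it is a duplicate."""
        key = make_key(message, self.dedup_fields)
        if key in self.primary_index:
            # Key already recorded: treat the message as a duplicate.
            # Keeping the first-seen message is an assumption here.
            return False
        # New Key: increment the global pointer and record both levels.
        self.global_pointer += 1
        self.primary_index[key] = self.global_pointer
        self.secondary_index[self.global_pointer] = message
        return True

    def unique_messages(self):
        """Return the retained messages in insertion (pointer) order."""
        return [self.secondary_index[p] for p in sorted(self.secondary_index)]
```

For example, with Deduplicator(["id", "name"]), two messages that share the same id and name but differ in another field produce the same Key, so only the first is retained and the global pointer advances once.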
Embodiment 3
[0080] Figure 5 is a schematic structural diagram of an electronic device provided by Embodiment 3 of the present invention. As shown in Figure 5, the electronic device includes a processor 510, a memory 520, an input device 530, and an output device 540. The electronic device may have one or more processors 510; Figure 5 takes one processor 510 as an example. The processor 510, memory 520, input device 530, and output device 540 may be connected by a bus or by other means; Figure 5 takes connection via a bus as an example.
[0081] The memory 520, as a computer-readable storage medium, can be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the MapDB-based structured data deduplication method in the embodiment of the present invention (for example, the data acquisition module 410 and the data deduplication module 420 in the MapDB-based structured data deduplicati...