Adjacent sorting repetition-reducing method based on Map-Reduce and segmentation
A word-segmentation and adjacent-sorting technology, applied in fields such as special data processing applications, instruments, and electrical digital data processing. It addresses the inability of existing approaches to process massive amounts of information efficiently, and it improves the efficiency of deduplication.
Embodiment Construction
[0038] The present invention is further described below with reference to the drawings and specific embodiments.
[0039] In data deduplication, the data set to be deduplicated is called a record set, and each record in the record set contains multiple fields. The general procedure is to compare the records in pairs and use their similarity to judge whether they are duplicates. An implementation of a deduplication method has three levels. The top level is the deduplication framework. The middle level is the record comparison, which determines whether two records are the same. The similarity between records in turn depends on the matching of fields between the records, which forms the lowest level. Every pairwise similarity comparison involves all three levels. This method focuses on two parts: the de-duplication method framework and the fiel...
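The three levels described above can be illustrated with a minimal Python sketch. It is not the patented method: the framework level here is a naive all-pairs comparison rather than adjacent sorting on Map-Reduce, and the function names, the use of `difflib.SequenceMatcher` for field matching, the unweighted average over fields, and the 0.85 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def field_similarity(a: str, b: str) -> float:
    """Lowest level: match two field values (assumed measure: difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_similarity(r1: dict, r2: dict) -> float:
    """Middle level: combine field matches into one record-level score.

    Assumption: an unweighted average over the fields both records share.
    """
    shared = r1.keys() & r2.keys()
    if not shared:
        return 0.0
    total = sum(field_similarity(str(r1[f]), str(r2[f])) for f in shared)
    return total / len(shared)


def find_duplicates(records: list, threshold: float = 0.85) -> list:
    """Top level (framework): decide which record pairs get compared.

    Assumption: every pair is compared, which costs O(n^2) comparisons.
    """
    return [
        (i, j)
        for (i, r1), (j, r2) in combinations(enumerate(records), 2)
        if record_similarity(r1, r2) >= threshold
    ]


# Hypothetical record set: the first two records differ only in spacing and case.
records = [
    {"name": "Zhang Wei", "city": "Beijing"},
    {"name": "Zhang  Wei", "city": "beijing"},
    {"name": "Li Na", "city": "Shanghai"},
]
duplicate_pairs = find_duplicates(records)
print(duplicate_pairs)
```

Keeping the levels as separate functions means the framework can be swapped (for example, to compare only neighboring records after sorting) without touching the record or field comparison.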