Adjacent sorting repetition-reducing method based on Map-Reduce and segmentation
A technique based on word segmentation and adjacency sorting, applied in fields such as instruments, computing, and electrical digital data processing, that addresses problems such as the inability to process massive amounts of information efficiently.
Embodiment Construction
[0038] The present invention is further described below in conjunction with the accompanying drawings and specific embodiments.
[0039] In a data deduplication method, the data set to be deduplicated is called a record set, and each record in the record set contains multiple fields. The general approach is to compare records pair by pair and use the similarity of each pair to decide whether the records are duplicates. An implementation has three levels: the top level is the deduplication framework, the middle level judges whether two records are the same, and the bottom level matches fields between records, on which the record similarity depends. Every pairwise similarity comparison involves all three levels. This method focuses on two of them, the deduplication framework and the fiel...
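The three levels described in [0039] can be illustrated with a minimal Python sketch. It is not the patented method: it uses a naive all-pairs framework rather than adjacent sorting or Map-Reduce, and the function names, the use of `difflib.SequenceMatcher` for field matching, the equal field weights, and the 0.85 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def field_similarity(a: str, b: str) -> float:
    """Bottom level: similarity of two field values, in [0, 1].

    SequenceMatcher is an assumed stand-in for the field-matching step.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_similarity(r1: dict, r2: dict, fields: list) -> float:
    """Middle level: average of the field similarities (equal weights assumed)."""
    return sum(field_similarity(r1[f], r2[f]) for f in fields) / len(fields)


def find_duplicates(records: list, fields: list, threshold: float = 0.85) -> list:
    """Top level (framework): compare every pair of records.

    Returns index pairs whose similarity reaches the assumed threshold.
    """
    return [
        (i, j)
        for (i, r1), (j, r2) in combinations(enumerate(records), 2)
        if record_similarity(r1, r2, fields) >= threshold
    ]


# Hypothetical record set with two fields per record.
records = [
    {"name": "Zhang Wei", "city": "Beijing"},
    {"name": "Zhang  Wei", "city": "beijing"},
    {"name": "Li Na", "city": "Shanghai"},
]
pairs = find_duplicates(records, ["name", "city"])
```

The all-pairs loop makes a number of comparisons that grows quadratically with the size of the record set, which is the cost a framework-level improvement would aim to reduce.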