Similar web page duplicate-removing system based on parallel programming mode
The system applies a parallel programming mode to web technology, in the field of instruments, calculation, and electrical digital data processing. It addresses low accuracy and misjudgment in similar web page detection, improving efficiency and avoiding judgment deviation.
Embodiment Construction
[0030] The present invention is described in further detail below with reference to the accompanying drawings and examples.
[0031] Generally speaking, detecting and deduplicating similar web pages involves the following steps: (1) first, extract features of the web page; (2) then encode or quantize the features so that they can be computed quickly; (3) then encode ...; (4) finally, if large-scale computation is required, use a high-performance algorithm on a high-performance computing platform to meet the high-speed requirements.
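Steps (1) and (2) can be illustrated with a minimal sketch. The text here does not say which features or which encoding the invention uses, so the sketch assumes word-frequency features and a SimHash-style 64-bit fingerprint, a widely used technique for near-duplicate detection that is swapped in purely for illustration. The names `extract_features`, `simhash`, and `hamming` are hypothetical.

```python
import hashlib
import re
from collections import Counter


def extract_features(text):
    """Step (1), assumed form: lowercase word tokens weighted by frequency."""
    return Counter(re.findall(r"\w+", text.lower()))


def simhash(features, bits=64):
    """Step (2), assumed form: quantize weighted features into one fingerprint.

    Each token votes +weight or -weight on every bit position, depending on
    the corresponding bit of the token's own hash; the sign of each total
    becomes one bit of the fingerprint.
    """
    totals = [0] * bits
    for token, weight in features.items():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[:8], "big")  # first 64 bits
        for i in range(bits):
            totals[i] += weight if (token_hash >> i) & 1 else -weight
    return sum(1 << i for i in range(bits) if totals[i] > 0)


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

With a fingerprint of this kind, two pages are treated as similar when the Hamming distance between their fingerprints falls below a chosen threshold. The threshold is a tuning parameter and is not specified in the text above.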
[0032] As shown in Figure 1, the similar web page deduplication system based on the parallel programming mode provided by the present invention includes a web page content preprocessing module 100, a web page feature vector extraction module 200, a web page feature fingerprint calculation module 300, a web page fingerprint online deduplication module 400, and a web page fingerprint distributed batch deduplication module 500.
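The way the five modules might hand data to one another can be sketched as follows. This is an illustration only: the internals of each module are not given in this passage, so every function body below is an assumption, and all function and class names are hypothetical. In particular, the fingerprint here is an exact hash of the normalized word counts, a stand-in that only catches pages identical after normalization; a real system would use a similarity-preserving fingerprint. The batch step merely imitates distribution by partitioning fingerprints so that each partition could be processed independently.

```python
import hashlib
import re
from collections import Counter, defaultdict


def preprocess(html):
    """Module 100 (assumed behavior): strip tags, collapse whitespace, lowercase."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", text).strip().lower()


def extract_feature_vector(text):
    """Module 200 (assumed behavior): word-count feature vector."""
    return Counter(text.split())


def compute_fingerprint(vector):
    """Module 300 (placeholder): exact hash of the canonicalized vector."""
    canonical = " ".join(f"{term}:{count}" for term, count in sorted(vector.items()))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


class OnlineDeduplicator:
    """Module 400 (assumed behavior): check each fingerprint as it arrives."""

    def __init__(self):
        self.seen = set()

    def is_duplicate(self, fingerprint):
        if fingerprint in self.seen:
            return True
        self.seen.add(fingerprint)
        return False


def batch_deduplicate(fingerprints, partitions=4):
    """Module 500 (assumed behavior): partition, dedupe per partition, merge."""
    buckets = defaultdict(set)
    for fingerprint in fingerprints:
        # Equal fingerprints always land in the same partition.
        buckets[int(fingerprint, 16) % partitions].add(fingerprint)
    return set().union(*buckets.values())


def page_fingerprint(html):
    """Chain modules 100 -> 200 -> 300 for a single page."""
    return compute_fingerprint(extract_feature_vector(preprocess(html)))
```

A crawler would call `page_fingerprint` on each fetched page and pass the result to `OnlineDeduplicator.is_duplicate`, while `batch_deduplicate` would run periodically over the stored fingerprints.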
...