Similar web page deduplication system based on a parallel programming mode
A system combining a parallel programming mode with web page processing, in the field of computing and electrical digital data processing. It addresses misjudgment and low accuracy in identifying similar web pages, and aims to avoid judgment bias, improve efficiency, and trade space for time.
Detailed Description of the Embodiments
[0030] The present invention will be described in further detail below in conjunction with the accompanying drawings and examples.
[0031] Generally speaking, detecting and deduplicating similar web pages involves the following steps: (1) extract features from each web page; (2) encode or quantize those features so they can be computed on quickly; (3) use the encodings to measure the similarity between web pages; (4) finally, when the computation is large-scale, run a high-performance algorithm on a high-performance computing platform to meet the speed requirements.
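The four generic steps above can be illustrated with a minimal sketch. The patent text here does not name a feature type, encoding, or similarity measure, so everything below is an assumption: word tokens as features, a 64-bit SimHash (built on MD5) as the encoding, a Hamming-distance threshold of 3 as the similarity test, and a thread pool standing in for a high-performance computing platform.

```python
import hashlib
import re
from concurrent.futures import ThreadPoolExecutor

FINGERPRINT_BITS = 64  # assumed fingerprint width, not specified in the source


def extract_features(text: str) -> list[str]:
    """Step (1): extract features; here, lowercase word tokens (an assumption)."""
    return re.findall(r"\w+", text.lower())


def _hash64(token: str) -> int:
    # Stable 64-bit token hash taken from the first 8 bytes of an MD5 digest.
    return int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")


def simhash(features: list[str]) -> int:
    """Step (2): encode the features as one 64-bit SimHash fingerprint."""
    weights = [0] * FINGERPRINT_BITS
    for token in features:
        h = _hash64(token)
        for bit in range(FINGERPRINT_BITS):
            # Each token votes +1 or -1 on every bit position.
            weights[bit] += 1 if (h >> bit) & 1 else -1
    fingerprint = 0
    for bit, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << bit
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_similar(fp_a: int, fp_b: int, threshold: int = 3) -> bool:
    """Step (3): treat pages as similar if fingerprints differ in <= threshold bits."""
    return hamming_distance(fp_a, fp_b) <= threshold


def fingerprint_page(text: str) -> int:
    return simhash(extract_features(text))


def fingerprint_all(pages: list[str], workers: int = 4) -> list[int]:
    """Step (4): fingerprint many pages concurrently.

    Threads share the GIL, so for CPU-bound hashing a ProcessPoolExecutor or a
    cluster framework would be used in practice; threads keep the sketch simple.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fingerprint_page, pages))
```

SimHash is used here only because it is a common choice that fits step (2); it should not be read as the method claimed by the patent.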
[0032] As shown in Figure 1, the similar web page deduplication system based on the parallel programming mode provided by the present invention includes a webpage content preprocessing module 100, a webpage feature vector extraction module 200, a webpage feature fingerprint calculation module 300, a webpage fingerprint online deduplication module 400, and a webpage fingerprint distribution module. ...
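One way to picture how the listed modules hand data to each other is the hypothetical sketch below. Only the module names and numbers come from the text; every function name, the tag-stripping preprocessing, the term-frequency feature vector, the weighted SimHash, the linear-scan deduplicator, and the top-bits sharding rule are illustrative assumptions.

```python
import hashlib
import re
from collections import Counter

BITS = 64  # assumed fingerprint width


def preprocess(html: str) -> str:
    """Module 100 (hypothetical): strip tags, collapse whitespace, lowercase."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", text).strip().lower()


def extract_feature_vector(text: str) -> Counter:
    """Module 200 (hypothetical): term-frequency vector over word tokens."""
    return Counter(re.findall(r"\w+", text))


def compute_fingerprint(vector: Counter) -> int:
    """Module 300 (hypothetical): SimHash weighted by term frequency."""
    acc = [0] * BITS
    for term, weight in vector.items():
        h = int.from_bytes(hashlib.md5(term.encode("utf-8")).digest()[:8], "big")
        for bit in range(BITS):
            acc[bit] += weight if (h >> bit) & 1 else -weight
    return sum(1 << bit for bit in range(BITS) if acc[bit] > 0)


class OnlineDeduplicator:
    """Module 400 (hypothetical): reject pages near an already-seen fingerprint.

    A linear scan keeps the sketch short; a real system would index fingerprints.
    """

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.seen: list[int] = []

    def add(self, fingerprint: int) -> bool:
        """Store and return True for a new page; return False for a near-duplicate."""
        for old in self.seen:
            if bin(old ^ fingerprint).count("1") <= self.threshold:
                return False
        self.seen.append(fingerprint)
        return True


def shard_for(fingerprint: int, num_shards: int) -> int:
    """Distribution module (hypothetical): route a fingerprint by its top 16 bits.

    Near-duplicates that differ in these top bits land on different shards, so
    practical designs usually index several bit blocks instead of one.
    """
    return (fingerprint >> (BITS - 16)) % num_shards


if __name__ == "__main__":
    dedup = OnlineDeduplicator()
    for page in ["<p>Hello world</p>", "<div>hello   WORLD</div>"]:
        fp = compute_fingerprint(extract_feature_vector(preprocess(page)))
        print(dedup.add(fp))  # True for the first page, False for the second
```

Both sample pages normalize to the same text, so the second is reported as a duplicate; the per-module split mirrors the module list above but not any specific implementation in the patent.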