Chinese web page text deduplication system and method
A text deduplication technology for Chinese web pages. It addresses the problems of wasted user time, wasted search engine resources, and reduced retrieval efficiency, with the effects of avoiding wasted storage space, ensuring uniqueness, and improving retrieval accuracy and retrieval efficiency.
Embodiment Construction
[0041] For a more specific understanding of the technical content, features, and effects of the present invention, the details are described below in conjunction with the illustrated embodiment:
[0042] As shown in Figure 1, the Chinese web page text deduplication system of the present invention mainly comprises two parts, an index server and a retrieval server, wherein:
[0043] The index server is used to calculate the digital signature of Chinese web pages. It further includes a web page text preprocessing module, a combined feature sentence extraction module, and a digital signature calculation module. The web page text preprocessing module normalizes the web page text to be checked that is sent by the retrieval server; the combined feature sentence extraction module extracts the combined feature sentence from the text processed by the web page text preprocessing module; the digital signature calculation module is used ...
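The three index server modules described in [0043] can be pictured as a small pipeline. The following is only a minimal sketch under stated assumptions, not the patented method: the excerpt does not say how text is normalized, how feature sentences are chosen, or which signature algorithm is used. Here normalization is assumed to be Unicode NFKC folding plus whitespace removal, the combined feature sentence is assumed to be the concatenation of the longest sentences, and MD5 stands in for the unspecified digital signature. All function names and the `count` parameter are hypothetical.

```python
import hashlib
import re
import unicodedata

# Assumption: sentences end at these punctuation marks or at a newline.
SENTENCE_DELIMITERS = re.compile(r"[。!?;\n]+")


def preprocess(text: str) -> str:
    """Normalize page text (assumed scheme, not taken from the source).

    NFKC folds full-width forms such as '！' into '!', and the regex then
    removes spaces and tabs while keeping newlines as sentence breaks.
    """
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"[ \t\r\f\v]+", "", text)


def extract_feature_sentences(text: str, count: int = 3) -> list[str]:
    """Pick the `count` longest sentences (a hypothetical selection rule)."""
    sentences = [s for s in SENTENCE_DELIMITERS.split(text) if s]
    # sorted() is stable, so equally long sentences keep document order.
    return sorted(sentences, key=len, reverse=True)[:count]


def digital_signature(sentences: list[str]) -> str:
    """Hash the concatenated feature sentences; MD5 is a stand-in choice."""
    combined = "".join(sentences)
    return hashlib.md5(combined.encode("utf-8")).hexdigest()


def page_signature(text: str, count: int = 3) -> str:
    """Run the three stages in order: normalize, extract, sign."""
    return digital_signature(extract_feature_sentences(preprocess(text), count))


if __name__ == "__main__":
    page_a = "今天天气很好。我们去公园散步！你觉得怎么样？"
    # Same content, but with extra spaces and half-width punctuation.
    page_b = "今天天气很好。 我们去公园散步!\u3000你觉得怎么样?"
    print(page_signature(page_a) == page_signature(page_b))
```

Under these assumptions, two pages that differ only in spacing or punctuation width produce the same signature, so a component holding previously seen signatures could flag the second page as a duplicate by a simple equality check.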