Document similarity calculation method and whole-network retrieval and tracking method for similar documents
A document similarity calculation technology, applied in unstructured text data retrieval, computation, text database indexing, and related fields, that addresses shortcomings of existing approaches and improves retrieval efficiency.
Embodiment Construction
[0033] Figure 1 is a system architecture diagram of the document similarity calculation method in this embodiment. In this embodiment, the document similarity calculation method includes:
[0034] (1) Data preparation - ETL
[0035] Media data from the whole network is collected in real time, and interfering information is removed by the "ETL data cleaning system". As the data is cleaned, each news manuscript is structured and decomposed into its smallest units, yielding a set of segmented words. This is called the data atomization process.
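The cleaning and atomization step described above can be sketched roughly as follows. This is a minimal illustration and not the patent's "ETL data cleaning system": the `AtomizedDoc` structure, the function names, and the regex tokenizer (which emits runs of Latin letters and digits, or single CJK characters) are all assumptions made for the example. A production pipeline would presumably use a dedicated Chinese word segmenter instead.

```python
import html
import re
from dataclasses import dataclass
from typing import List

TAG_RE = re.compile(r"<[^>]+>")      # leftover HTML tags
SPACE_RE = re.compile(r"\s+")        # any run of whitespace
# Sentence = text up to and including one terminal punctuation mark.
SENT_RE = re.compile(r"[^。！？!?.]+[。！？!?.]?")
# Stand-in tokenizer (assumption): Latin/digit runs or single CJK characters.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+|[\u4e00-\u9fff]")


@dataclass
class AtomizedDoc:
    """Hypothetical structured form of one cleaned news manuscript."""
    title: str
    sentences: List[str]
    tokens: List[str]


def clean(raw: str) -> str:
    """Remove markup, decode HTML entities, and collapse whitespace."""
    text = TAG_RE.sub(" ", raw)
    text = html.unescape(text)
    return SPACE_RE.sub(" ", text).strip()


def atomize(title: str, raw_body: str) -> AtomizedDoc:
    """Clean a manuscript and decompose it into sentences and tokens."""
    body = clean(raw_body)
    sentences = [s.strip() for s in SENT_RE.findall(body) if s.strip()]
    tokens = [t.lower() for t in TOKEN_RE.findall(body)]
    return AtomizedDoc(clean(title), sentences, tokens)


doc = atomize("News", "<p>Hello world. Second&nbsp;one!</p>")
print(doc.sentences, doc.tokens)
```

Keeping the sentences alongside the token set is a design choice of this sketch; the source only states that a set of segmented words is produced.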
[0036] (2) Infrastructure construction - ElasticSearch full-text index + Chinese word segmentation
[0037] The ElasticSearch search engine serves as the basic component of the whole system, and the subsequent algorithms are built on ElasticSearch (ES). ElasticSearch is a distributed, multi-user full-text search engine based on Lucene. The scalability of its distributed storage can effectively solve the storage problem of mass...
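As a rough illustration of a full-text index with Chinese-aware analysis, the sketch below builds the JSON bodies that could be sent to ElasticSearch when creating an index and running a `match` query. The field names, shard counts, and helper function names are hypothetical; the source does not specify them. The default analyzer here is ElasticSearch's built-in `cjk` analyzer, chosen only as an assumption. The source does not name the Chinese word segmentation component, and a plugin analyzer could be passed instead.

```python
import json


def build_index_body(analyzer: str = "cjk", shards: int = 3, replicas: int = 1) -> dict:
    """Request body for index creation (hypothetical field layout)."""
    return {
        "settings": {
            "number_of_shards": shards,      # spreads documents across nodes
            "number_of_replicas": replicas,  # extra copies of each shard
        },
        "mappings": {
            "properties": {
                # Analyzed text fields: tokenized by the chosen analyzer.
                "title": {"type": "text", "analyzer": analyzer},
                "content": {"type": "text", "analyzer": analyzer},
                # Exact-value fields: not tokenized.
                "source": {"type": "keyword"},
                "published_at": {"type": "date"},
            }
        },
    }


def build_match_query(text: str, size: int = 10) -> dict:
    """Request body for a full-text match query against the content field."""
    return {"size": size, "query": {"match": {"content": text}}}


print(json.dumps(build_index_body(), indent=2))
print(json.dumps(build_match_query("文档相似度"), ensure_ascii=False))
```

The sketch only constructs request bodies and does not contact a cluster; sending them (for example as the body of an index-creation request) is left to whichever client the system uses.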