Rapid text clustering method on large corpus
A text clustering technology for large corpora, applied in the field of relational databases, that addresses problems such as large document collections, unsatisfactory convergence speed, and sparseness.
Embodiment Construction
[0038] For convenience, the fast text clustering method with index optimization is referred to below as IGSDMM.
[0039] Two datasets are used as examples to illustrate the advantages of the present invention over existing clustering algorithms. The datasets are described as follows:
[0040] NG20. The dataset contains 18,846 documents from 20 mainstream Western newsgroups. It is a classic benchmark for evaluating text clustering algorithms. The average length of documents in NG20 is 137.85, and the average number of words is 91.
[0041] Tweets. The dataset consists of 2,472 tweets associated with 89 queries. The relationship between tweets and queries is annotated by humans. The average length of tweets is 8.56, and the average number of words is 7.
[0042] Normalized mutual information (NMI) is widely used to measure the quality of clustering results. NMI measures the statistical information shared between the random variables representing clus...
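As an illustration of how an NMI score can be computed from ground-truth classes and predicted cluster labels, the following is a minimal sketch. The source text is truncated before it gives the exact formula, so the normalization used here (mutual information divided by the geometric mean of the two entropies) is an assumption; other normalizations exist. The function name `nmi` and its parameters are hypothetical and not taken from the patent.

```python
import math
from collections import Counter


def nmi(labels_true, labels_pred):
    """Normalized mutual information between two labelings.

    Assumption: normalizes by sqrt(H(classes) * H(clusters)); the patent
    text does not state which normalization it uses.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n == 0:
        raise ValueError("label sequences must be non-empty")

    class_counts = Counter(labels_true)      # documents per ground-truth class
    cluster_counts = Counter(labels_pred)    # documents per predicted cluster
    joint_counts = Counter(zip(labels_true, labels_pred))

    # Mutual information I(C; K) in nats, summed over non-empty cells only.
    mutual_info = 0.0
    for (c, k), n_ck in joint_counts.items():
        ratio = n * n_ck / (class_counts[c] * cluster_counts[k])
        mutual_info += (n_ck / n) * math.log(ratio)

    def entropy(counts):
        return -sum((m / n) * math.log(m / n) for m in counts.values())

    h_true = entropy(class_counts)
    h_pred = entropy(cluster_counts)
    if h_true == 0.0 or h_pred == 0.0:
        # Degenerate case (a single class or a single cluster); by the
        # convention chosen here, score 1.0 only if both are trivial.
        return 1.0 if h_true == h_pred else 0.0
    return mutual_info / math.sqrt(h_true * h_pred)


if __name__ == "__main__":
    # Toy labels for illustration only; not data from NG20 or Tweets.
    truth = [0, 0, 1, 1, 2, 2]
    predicted = [1, 1, 0, 0, 2, 2]
    print(nmi(truth, predicted))
```

Under this definition the score lies between 0 and 1. It is invariant to renaming of cluster ids, so a clustering that reproduces the ground-truth partition scores 1 regardless of how its clusters are numbered.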