Text data clustering method, device and equipment based on non-parametric VMF mixture model

A mixture-model and text-data technique, applied to text database clustering/classification, unstructured text data retrieval, and electrical digital data processing. It addresses parameter estimation that is difficult to converge, a convergence state that is difficult to determine, and the lack of an analytical solution.

Active Publication Date: 2020-09-01

HUAQIAO UNIVERSITY

Cites: 3. Cited by: 0.

Problems solved by technology

The discrete nature of the Pitman-Yor process model cannot be represented intuitively.

[0006] 3. The Gibbs sampling algorithm used to estimate the model parameters yields no analytical solution, does not converge easily, and makes the convergence state difficult to determine.

In prior art 2, the concentration parameter is estimated with an asymptotic approximation, and this estimate cannot deal effectively with high-dimensional data.
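The source does not say which asymptotic approximation prior art 2 uses. Purely as an illustration of what such an estimate looks like, the sketch below computes one widely used closed-form approximation for the concentration parameter of a von Mises-Fisher distribution, kappa ≈ r(D - r^2) / (1 - r^2), where r is the length of the mean of the unit vectors and D is the dimension. The function name `estimate_kappa` is hypothetical and is not taken from the patent.

```python
import math


def estimate_kappa(unit_vectors):
    """Approximate the vMF concentration from unit-length vectors.

    Assumption: this closed-form approximation is only an example of an
    asymptotic estimate; the patent does not identify the one it criticizes.
    """
    n = len(unit_vectors)
    dim = len(unit_vectors[0])
    # Component-wise mean of the unit vectors.
    mean = [sum(v[d] for v in unit_vectors) / n for d in range(dim)]
    # Mean resultant length r in [0, 1]; values near 1 mean tight clustering.
    r = math.sqrt(sum(m * m for m in mean))
    if r >= 1.0:
        raise ValueError("all vectors coincide; concentration is unbounded")
    return r * (dim - r * r) / (1.0 - r * r)


if __name__ == "__main__":
    # Two orthogonal directions in 2-D give a small concentration value.
    print(estimate_kappa([[1.0, 0.0], [0.0, 1.0]]))
```

The estimate depends on the data only through r and D, which is why it is cheap to compute.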

Embodiment Construction

[0098] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0099] Referring to Figure 1, the first embodiment of the present invention provides a text data clustering method based on a non-parametric VMF mixture model. The method can be performed by a text data clustering device based on the non-parametric VMF mixture model (hereinafter referred to as the clustering device), and includes at least:

[0100] S101. Acquire a text data set to be clustered; wherein, the text data set includes a plurality of texts, and each text is expressed as...
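According to the abstract, step S101 expresses each text as a D-dimensional feature vector using TF-IDF weighting. The following is a minimal sketch of that representation, not the patent's implementation. It assumes whitespace tokenization, a smoothed IDF, and L2 normalization to unit length (VMF distributions are defined on the unit sphere). The function name `tfidf_unit_vectors` is hypothetical.

```python
import math
from collections import Counter


def tfidf_unit_vectors(docs):
    """Return (vocabulary, vectors) with one unit-length TF-IDF vector per text."""
    tokenized = [doc.lower().split() for doc in docs]  # assumed tokenizer
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of texts in which each term appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency (an assumed variant).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec))
        # Project onto the unit sphere; an empty text stays all-zero.
        vectors.append([v / norm for v in vec] if norm > 0 else vec)
    return vocab, vectors


if __name__ == "__main__":
    vocab, vecs = tfidf_unit_vectors(
        ["stock market falls", "market rally lifts stock", "team wins final match"]
    )
    print(len(vocab), [round(sum(v * v for v in vec), 6) for vec in vecs])
```

Here D is the vocabulary size, so every text maps to a point on the (D-1)-dimensional unit sphere.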

Abstract

The invention discloses a text data clustering method, device and equipment based on a non-parametric VMF mixture model. The method comprises: S101, obtaining a text data set to be clustered, wherein the text data set comprises a plurality of texts and each text is expressed as a D-dimensional text feature vector using term frequency-inverse document frequency (TF-IDF) normalization; S102, modeling each text with a non-parametric VMF mixture model based on a Pitman-Yor process; S103, estimating the model parameters of the non-parametric VMF mixture model with a variational Bayesian inference algorithm; S104, judging from the estimated model parameters whether the non-parametric VMF mixture model has converged, and if not, returning to step S103, and if so, executing step S105; and S105, determining the category of each text from the posterior probability of its indicator variable, so that the texts are clustered by category. The invention ensures that the algorithm converges and allows the convergence state to be detected effectively.
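The control flow of steps S103 to S105 (estimate, test for convergence, assign each text by posterior probability) can be sketched as follows. This is only an illustration of the loop structure. It deliberately swaps the patent's variational Bayesian inference over a Pitman-Yor VMF mixture for plain EM on a finite VMF mixture with one fixed, shared concentration `kappa`, so that the example stays short. All names and default values (`cluster_fixed_kappa_vmf`, `init_means`, `tol`) are hypothetical.

```python
import math


def _unit(v):
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v] if norm > 0 else list(v)


def cluster_fixed_kappa_vmf(X, init_means, kappa=20.0, tol=1e-6, max_iter=100):
    """EM for a finite VMF mixture with fixed shared kappa (a simplification)."""
    n, K, dim = len(X), len(init_means), len(X[0])
    means = [_unit(m) for m in init_means]
    weights = [1.0 / K] * K
    previous = -math.inf
    resp = []
    for _ in range(max_iter):
        # Stand-in for S103: posterior responsibility of each component.
        # With a shared kappa the VMF normalizer cancels across components.
        resp, objective = [], 0.0
        for x in X:
            logs = [
                math.log(weights[k])
                + kappa * sum(a * b for a, b in zip(means[k], x))
                for k in range(K)
            ]
            top = max(logs)
            log_norm = top + math.log(sum(math.exp(v - top) for v in logs))
            objective += log_norm
            resp.append([math.exp(v - log_norm) for v in logs])
        # S104: stop once the objective no longer improves by more than tol.
        if objective - previous < tol:
            break
        previous = objective
        # Update mixing weights and mean directions from the responsibilities.
        for k in range(K):
            mass = sum(r[k] for r in resp)
            weights[k] = max(mass / n, 1e-12)  # guard against log(0)
            direction = [
                sum(r[k] * x[d] for r, x in zip(resp, X)) for d in range(dim)
            ]
            if any(direction):
                means[k] = _unit(direction)
    # S105: assign each text to the component with the highest posterior.
    labels = [max(range(K), key=lambda k: r[k]) for r in resp]
    return labels, means, weights


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.98, 0.199], [0.0, 1.0], [0.199, 0.98]]
    labels, _, _ = cluster_fixed_kappa_vmf(X, init_means=[X[0], X[2]])
    print(labels)
```

In the patent's method the number of clusters is not fixed in advance, because the Pitman-Yor process prior lets it grow with the data. The finite K used here is a simplification of this sketch.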

Description

Technical field

[0001] The invention relates to the field of text mining, and in particular to a text data clustering method, device and equipment based on a non-parametric VMF mixture model.

Background

[0002] With the rapid development of the Internet and the widespread use of news documents, text data clustering, one of the most useful tasks in text mining, has received increasing attention in recent years.

[0003] In prior art 1, Zhong Wenliang et al. proposed a method for clustering unbalanced text data based on the Pitman-Yor process. In this method, each text is represented by a term frequency (TF) vector, each attribute of the vector represents the frequency with which a specific term appears in the document, and all terms in each category obey the same multinomial distribution. The method uses the Polya urn model to build a Pitman-Yor process model based on the multinomial distribution, and uses the Gibbs sampling ...
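The Polya urn construction of the Pitman-Yor process mentioned for prior art 1 can be illustrated with its predictive rule. With n items already assigned, an existing cluster of size n_k receives the next item with probability (n_k - d) / (n + theta), and a new cluster is opened with probability (theta + K d) / (n + theta), where K is the current number of clusters. The sketch below assumes a discount d in [0, 1) and, for simplicity, a strictly positive concentration theta. The function and parameter names are hypothetical.

```python
import random


def pitman_yor_probs(counts, discount, concentration):
    """Predictive probabilities: one per existing cluster, then a new cluster."""
    if not 0.0 <= discount < 1.0 or concentration <= 0.0:
        raise ValueError("need 0 <= discount < 1 and concentration > 0")
    n = sum(counts)
    k = len(counts)
    denom = n + concentration
    probs = [(c - discount) / denom for c in counts]
    probs.append((concentration + k * discount) / denom)
    return probs


def sample_partition(n_items, discount, concentration, seed=0):
    """Draw cluster sizes for n_items by repeatedly applying the urn rule."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_items):
        probs = pitman_yor_probs(counts, discount, concentration)
        choice = rng.choices(range(len(probs)), weights=probs)[0]
        if choice == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[choice] += 1  # join an existing cluster
    return counts


if __name__ == "__main__":
    print(pitman_yor_probs([3, 1], discount=0.5, concentration=1.0))
    print(sample_partition(50, discount=0.5, concentration=1.0))
```

A larger discount shifts probability mass toward opening new clusters, which produces many small clusters alongside a few large ones. That behavior suits the unbalanced text data described above.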

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06F16/35, G06F40/216, G06K9/62

CPC: G06F16/35, G06F40/216, G06F18/24155

Inventors: 范文涛, 侯文娟

Owner: HUAQIAO UNIVERSITY
