Short text clustering analysis method, device and terminal device

A short text cluster analysis technology, applied in the field of text analysis, that addresses the low accuracy of existing methods and improves both the efficiency and the accuracy of clustering.

Active Publication Date: 2019-02-01
HEBEI UNIV OF ENG
6 Cites, 14 Cited by

AI Technical Summary

Problems solved by technology

[0005] In view of this, embodiments of the present invention provide a short text clustering analysis method, device, and terminal device to solve the problem that traditional topic clustering methods in the prior art have low accuracy when clustering the emotional topics of short texts.

Examples

Embodiment 1

[0051] Referring to Figure 1, a schematic flow diagram of an embodiment of a short text clustering analysis method is provided, described in detail as follows:

[0052] Step S101, obtaining a short text data set to be clustered, and performing preprocessing on the short text data set to obtain an initial word set including at least three parts of speech.

[0053] Short texts express emotional information through words of multiple parts of speech. When analyzing short texts, it is necessary to split the short text data set into word sets that include multiple parts of speech and to remove words that have little impact on emotional information, such as low-frequency words. Specifically, this embodiment can divide the short text into individual words using a word segmentation algorithm, and can apply stemming and filter out stop words and words with low document frequency using a word filtering method. The purpose of this step is to reduce the dimensionality of the data set and remove noise, t...
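The preprocessing in step S101 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the stop word list, the whitespace tokenizer, and the `min_doc_freq` threshold are assumptions made for illustration, and stemming and part-of-speech tagging are left out. Chinese short texts would need a dedicated word segmenter in place of `tokenize`.

```python
from collections import Counter

# Hypothetical stop word list; a real system would use a language-specific one.
STOP_WORDS = {"the", "a", "is", "and", "of", "to", "it", "this"}


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,!?;:\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def preprocess(short_texts, min_doc_freq=2):
    """Return one filtered token list per short text."""
    tokenized = [tokenize(text) for text in short_texts]
    # Document frequency: the number of texts in which each word appears.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    return [
        [w for w in tokens if w not in STOP_WORDS and doc_freq[w] >= min_doc_freq]
        for tokens in tokenized
    ]


texts = [
    "The battery life is great!",
    "Great screen, terrible battery.",
    "This screen is great",
]
word_sets = preprocess(texts)
```

Dropping words that occur in fewer than `min_doc_freq` texts shrinks the vocabulary, which is one simple way to get the dimensionality reduction and denoising that the paragraph describes.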

Embodiment 2

[0118] Corresponding to the short text clustering analysis method described in the first embodiment above, Figure 5 shows the structural block diagram of the short text clustering analysis device in Embodiment 2 of the present invention. For ease of description, only the parts related to this embodiment are shown.

[0119] The device includes: a preprocessing module 110, a feature extraction module 120, a knowledge pair determination module 130, and a topic clustering module 140.

[0120] The preprocessing module 110 is used to obtain the short text data set to be clustered, and perform preprocessing on the short text data set to obtain an initial word set including at least three parts of speech.

[0121] The feature extraction module 120 is used to perform feature extraction on the initial word set to obtain a feature word set including a topic feature word set and a topic associated word set.

[0122] The knowledge pair determination module 130 is used to determine a p...

Embodiment 3

[0130] Figure 6 is a schematic diagram of the terminal device 100 provided in Embodiment 3 of the present invention. As shown in Figure 6, the terminal device 100 of this embodiment includes a processor 150, a memory 160, and a computer program 161 stored in the memory 160 and executable on the processor 150, such as a program for the short text clustering analysis method. When the processor 150 executes the computer program 161, it implements the steps of the short text clustering analysis method embodiments described above, for example steps S101 to S104 shown in Figure 1. Alternatively, when the processor 150 executes the computer program 161, it implements the functions of the modules/units in the device embodiment described above, for example the functions of modules 110 to 140 shown in Figure 5.
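One way to picture a program organized around modules 110 to 140 is a pipeline of four pluggable stages. The sketch below is hypothetical: the class name `ShortTextClusteringProgram`, the stage signatures, and the toy lambda stages are all assumptions chosen only to make the data flow runnable, and none of them reproduces the patent's algorithms.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Tokens = List[str]
Pair = Tuple[str, str]


@dataclass
class ShortTextClusteringProgram:
    """Hypothetical four-stage layout mirroring modules 110 to 140."""

    preprocess: Callable[[List[str]], List[Tokens]]                    # module 110
    extract_features: Callable[[List[Tokens]], Tuple[Tokens, Tokens]]  # module 120
    determine_pairs: Callable[[Tokens, Tokens], List[Pair]]            # module 130
    cluster_topics: Callable[[List[Pair]], Dict[str, List[Pair]]]      # module 140

    def run(self, short_texts: List[str]) -> Dict[str, List[Pair]]:
        # Each stage consumes the output of the previous one.
        word_sets = self.preprocess(short_texts)
        feature_words, related_words = self.extract_features(word_sets)
        pairs = self.determine_pairs(feature_words, related_words)
        return self.cluster_topics(pairs)


# Toy stand-in stages, used only to show how data moves through the pipeline.
program = ShortTextClusteringProgram(
    preprocess=lambda texts: [t.lower().split() for t in texts],
    extract_features=lambda sets: (
        sorted({s[0] for s in sets if s}),
        sorted({s[-1] for s in sets if s}),
    ),
    determine_pairs=lambda feats, rels: list(zip(feats, rels)),
    cluster_topics=lambda pairs: {"topic_0": pairs},
)
result = program.run(["battery great", "screen terrible"])
```

Keeping each stage behind a plain callable means a stage can be swapped or tested in isolation, which matches the idea of dividing the computer program into separate modules/units.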

[0131] Exemplarily, the computer program 161 can be divided into one or more modules / units, and the one or more modules / units are stored in the memor...

Abstract

The invention is applicable to the technical field of text analysis and provides a short text clustering analysis method, device, and terminal device. The method comprises the following steps: acquiring a short text data set to be clustered, and preprocessing the short text data set to obtain an initial word set including at least three parts of speech; performing feature extraction on the initial word set to obtain a feature word set including a topic feature word set and a topic associated word set; determining a preset number of topic feature words and topic associated words according to the relevance between them, the topic feature words and topic associated words corresponding one to one to form knowledge pairs; and inputting the preset number of knowledge pairs into an LDA (latent Dirichlet allocation) model for clustering to determine the emotional topics of the short text data set. The invention optimizes the text analysis algorithm, clusters the emotional topics of short texts more accurately, and improves the efficiency of short text clustering.
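The knowledge pair step can be sketched in Python under explicit assumptions. The source does not say here how relevance is computed, so this sketch substitutes a simple document co-occurrence count and a greedy one-to-one matching. The function name `build_knowledge_pairs` and its parameters are hypothetical, and the final LDA clustering step is not shown.

```python
from collections import Counter
from itertools import product


def build_knowledge_pairs(docs, feature_words, related_words, num_pairs):
    """Greedily pair each feature word with at most one related word.

    Relevance is approximated by document co-occurrence counts. This scoring
    rule is an assumption of the sketch, not taken from the source.
    """
    cooc = Counter()
    for tokens in docs:
        present = set(tokens)
        for f, r in product(feature_words, related_words):
            if f in present and r in present:
                cooc[(f, r)] += 1

    pairs, used_f, used_r = [], set(), set()
    # Highest-scoring candidates first; ties are broken alphabetically.
    for (f, r), _score in sorted(cooc.items(), key=lambda kv: (-kv[1], kv[0])):
        if f in used_f or r in used_r:
            continue  # keep the correspondence one to one
        pairs.append((f, r))
        used_f.add(f)
        used_r.add(r)
        if len(pairs) == num_pairs:
            break
    return pairs


docs = [
    ["battery", "great"],
    ["battery", "terrible"],
    ["battery", "great", "screen"],
    ["screen", "terrible"],
    ["screen", "terrible"],
]
pairs = build_knowledge_pairs(docs, ["battery", "screen"], ["great", "terrible"], 2)
```

In this toy data the sketch selects ("battery", "great") and ("screen", "terrible"). Pairs selected this way would then be passed to an LDA implementation in place of individual words.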

Description

Technical field

[0001] The invention belongs to the technical field of text analysis, and in particular relates to a short text clustering analysis method, device, and terminal device.

Background

[0002] With the spread of the Internet, online platforms such as Weibo, forums, and blogs have produced a large number of short texts with subjective emotions, and these short texts carry a large amount of user information and data. Because short texts are semantically sparse and high-dimensional, effective short text clustering algorithms are urgently needed to cluster and analyze this information, so as to improve the practical effectiveness of short text clustering, sentiment analysis, and semantic analysis in the field of Internet public opinion.

[0003] In recent years, experts and scholars at home and abroad have carried out in-depth research on short text clustering algorithms, and proposed many...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35, G06F17/27
CPC: G06F40/289
Inventor 吴迪, 杨瑞欣, 生龙, 马建飞, 黄竹韵, 张梦甜, 孙雷
Owner HEBEI UNIV OF ENG