Short text clustering method and terminal device

A short text clustering technology, applied in the field of information processing, that addresses clustering results that vary between runs, clustering that is difficult to converge, and wrong results, and that achieves stable clustering results.

Active Publication Date: 2019-05-21
HEBEI UNIV OF ENG
11 cites, 2 cited by

AI Technical Summary

Problems solved by technology

[0005] In view of this, the embodiments of the present invention provide a short text clustering method and a terminal device. They address a prior-art problem: clustering depends on the choice of initial cluster centers and on the initial partition. As a result, the clustering result may differ from the true distribution of the samples in the data set, may be wrong, or may be hard to make converge.

Examples

Experimental program
Comparison scheme
Effect test

Embodiment Construction

[0049] In the following description, specific details such as particular system structures and techniques are presented for illustration rather than limitation, in order to provide a thorough understanding of the embodiments of the present invention. However, it will be apparent to those skilled in the art that the invention may be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.

[0050] To illustrate the technical solutions of the present invention, specific embodiments are described below.

[0051] An embodiment of the present invention provides a short text clustering method. As shown in Figure 1, the method includes the following steps:

[0052] Step 101: preprocess the short text set to obtain all the texts in the short text set.

[0053] Opti...
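The details of Step 101 are truncated in this extract. The following is therefore only a hypothetical sketch of what preprocessing a short text set might involve: lowercasing, tokenizing on word characters, removing stop words, and discarding texts that end up empty. The stop-word list and the function name `preprocess` are assumptions and are not taken from the patent. Chinese short texts would normally need a word segmenter (for example jieba's `lcut`) in place of the regular-expression tokenizer used here.

```python
import re

# Hypothetical stop-word list (assumption); a real system would load a curated list.
STOP_WORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}


def preprocess(short_texts):
    """Turn a short text set into a list of token lists (hypothetical Step 101).

    Assumed steps: lowercase, split into runs of word characters,
    drop stop words, and drop texts that become empty.
    """
    result = []
    for raw in short_texts:
        tokens = [t for t in re.findall(r"\w+", raw.lower()) if t not in STOP_WORDS]
        if tokens:  # a text with no remaining tokens carries no features
            result.append(tokens)
    return result


if __name__ == "__main__":
    print(preprocess(["The phone is great", "Battery of the phone", "the a"]))
```

On the three sample texts, the last one consists only of stop words and is discarded.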

Abstract

The invention is applicable to the technical field of information processing and provides a short text clustering method and a terminal device. The method comprises the following steps: preprocessing a short text set to obtain all the texts in the set; calculating, from these texts, the similarity between each text and every other text; determining the cluster centers of all the texts according to these similarities; and clustering all the texts according to the cluster centers. The embodiments of the invention solve a prior-art problem: clustering depends on the selection of the initial cluster centers and on the initial partition. This can cause the clustering result to differ from the real distribution of the data set samples, produce wrong results, or make the clustering very difficult to converge.
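The abstract does not say how similarity is measured or how cluster centers are derived from it. The sketch below shows one possible shape of the pipeline under stated assumptions. It uses cosine similarity over bag-of-words counts. Each text gets a "density", defined as the number of other texts whose similarity is at least a threshold. Centers are chosen greedily in order of decreasing density, and a candidate is skipped if it is already similar to a chosen center. This is a density-style heuristic swapped in for illustration, not the patent's method. The function names and the `threshold` parameter are hypothetical. The point is that no random initialization is involved, so repeated runs give identical results.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two token lists, using bag-of-words counts."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def similarity_matrix(texts):
    """Similarity between every text and every other text."""
    return [[cosine(x, y) for y in texts] for x in texts]


def choose_centers(sim, threshold):
    """Hypothetical deterministic center selection (assumption, not the patent's rule)."""
    n = len(sim)
    # density[i]: how many other texts are at least `threshold`-similar to text i
    density = [sum(1 for j in range(n) if j != i and sim[i][j] >= threshold) for i in range(n)]
    order = sorted(range(n), key=lambda i: (-density[i], i))  # ties broken by index
    centers = []
    for i in order:
        # accept i only if it is not already covered by an existing center
        if all(sim[i][c] < threshold for c in centers):
            centers.append(i)
    return centers


def cluster(texts, threshold=0.3):
    """Return (center indices, label per text); each label is its center's index."""
    sim = similarity_matrix(texts)
    centers = choose_centers(sim, threshold)
    labels = [max(centers, key=lambda c: sim[i][c]) for i in range(len(texts))]
    return centers, labels


if __name__ == "__main__":
    texts = [["battery", "drains", "fast"], ["battery", "drains", "quickly"],
             ["battery", "life", "short"], ["screen", "cracked"],
             ["screen", "cracked", "easily"]]
    print(cluster(texts, 0.3))
```

With the five sample texts and a threshold of 0.3, texts 0 and 3 become centers. The three battery complaints are grouped under text 0, and the two screen complaints under text 3.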

Description

Technical field

[0001] The invention belongs to the technical field of information processing, and in particular relates to a short text clustering method and a terminal device.

Background

[0002] With the rapid development of the mobile Internet and information technology, the number of short texts such as comments and microblog posts has grown explosively. Short texts contain little data and their features are sparse, so effective short text representation methods are needed to improve short text clustering and hotspot discovery.

[0003] The clustering algorithm usually used for short text clustering is the K-means algorithm. K-means takes the number of clusters K as a parameter and divides n samples into K mutually disjoint clusters, so that samples in the same cluster are highly similar and samples in different clusters are less similar. The commonly used similarity measure is the Euclidean distance between sam...
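To make the dependence on initial centers concrete, here is a minimal sketch of standard K-means (Lloyd's algorithm) with Euclidean distance. The initial centers are passed in explicitly so their effect is visible. The four 2-D points are toy data chosen for illustration. Short texts would first have to be turned into vectors, which this sketch does not do.

```python
import math


def kmeans(points, init_centers, max_iter=100):
    """Plain K-means with Euclidean distance; K = len(init_centers).

    Returns (labels, centers). The caller supplies the initial centers.
    """
    centers = [tuple(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels, centers


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (4, 0), (4, 1)]
    print(kmeans(pts, [(0, 0), (4, 0)]))  # left/right split
    print(kmeans(pts, [(0, 0), (0, 1)]))  # bottom/top split
```

Both runs use the same data and K = 2, yet they converge to different partitions. The first groups the points into left and right pairs (each point lies 0.5 from its center). The second stops in a worse bottom/top split (each point lies 2 from its center). This is the initialization sensitivity that the background section attributes to K-means.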

Application Information

Patent type & authority: Applications (China)
IPC(8): G06F17/27; G06K9/62
Inventors: 吴迪, 杨瑞欣, 生龙, 马建飞, 黄竹韵, 张梦甜, 孙雷
Owner: HEBEI UNIV OF ENG