
Text clustering intelligent evaluation method based on hybrid clustering

A text clustering method and technology, applied in the field of text clustering, that addresses the problems of redundant feature words, slow running speed, and strong sensitivity to the quality of the document set.

Active Publication Date: 2021-08-20
SOUTH CHINA AGRI UNIV
Cites: 7; Cited by: 3

AI Technical Summary

Problems solved by technology

[0003] At present, text clustering in the prior art mainly applies the K-means algorithm directly to the original text. This produces a large number of redundant feature words, runs slowly, clusters poorly, and is strongly affected by the quality of the document set.



Examples


Embodiment 1

[0064] As shown in Figure 1, this embodiment provides a method for intelligent evaluation of text clustering based on hybrid clustering, including the following steps:

[0065] S1: Preprocess the original text set X (word segmentation, stop-word removal, etc.) to obtain all feature words D in the original text set;
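Step S1 can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the regular-expression tokenizer stands in for a real word segmenter, and `STOP_WORDS` and `preprocess` are hypothetical names introduced here.

```python
import re

# Hypothetical stop-word list; the source does not specify one.
STOP_WORDS = {"the", "a", "of", "and", "is", "in", "to"}

def preprocess(texts):
    """Tokenize each text, drop stop words, and collect the feature words D."""
    docs = []
    for text in texts:
        # Assumption: lowercase alphanumeric runs stand in for word segmentation.
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        docs.append([t for t in tokens if t not in STOP_WORDS])
    # D is the set of all distinct remaining words, sorted for reproducibility.
    features = sorted({t for doc in docs for t in doc})
    return docs, features

X = ["The cat sat in the hat", "A dog and a cat"]
docs, D = preprocess(X)
print(docs)
print(D)
```

For Chinese text, the tokenizer line would be replaced by a dedicated word segmenter; the rest of the sketch is unchanged.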

[0066] S2: Perform the first feature selection: according to the set ratios, delete from D the feature words whose document frequency (DF) is particularly high or particularly low, obtaining the roughly selected feature subset D′. Reducing feature redundancy in this way lowers the feature dimension and improves clustering accuracy. In this example, the maximum DF is set to 0.15, and the minimum DF is set to 0.0002.
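The rough selection in S2 might look like the sketch below. It assumes that DF is measured as the fraction of documents containing a word and that both thresholds are inclusive; the source states only the two threshold values, so both points are assumptions, and `df_filter` is a hypothetical helper name.

```python
from collections import Counter

def df_filter(docs, max_df=0.15, min_df=0.0002):
    """Keep words whose document frequency lies within [min_df, max_df]."""
    n = len(docs)
    # Count each word once per document it appears in.
    df = Counter(t for doc in docs for t in set(doc))
    return sorted(t for t, c in df.items() if min_df <= c / n <= max_df)

# Toy corpus: "common" is in every document, "pair" in two, each "w<i>" in one.
docs = [["common", f"w{i}"] for i in range(10)]
docs[0].append("pair")
docs[1].append("pair")

D_prime = df_filter(docs)
print(D_prime)
```

On this toy corpus "common" (DF 1.0) and "pair" (DF 0.2) exceed the 0.15 ceiling and are dropped, while each single-document word (DF 0.1) is kept.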

[0067] S3: Use the TF-IDF method to calculate the corresponding weights of all texts in the original text set X, and express all the texts in the original text set X as vec...
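A TF-IDF weighting of the kind S3 refers to can be sketched as below. The excerpt does not give the exact formula, so this uses the textbook form, term frequency times log(N / DF), as an assumption; `tfidf_vectors` is a hypothetical helper name.

```python
import math
from collections import Counter

def tfidf_vectors(docs, vocab):
    """Return one TF-IDF vector per document, with columns ordered as vocab."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    # Assumed textbook IDF: log(N / DF); words absent from the corpus get weight 0.
    idf = {t: math.log(n / df[t]) for t in vocab if df[t] > 0}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        vectors.append([counts[t] / total * idf.get(t, 0.0) for t in vocab])
    return vectors

docs = [["cat", "hat"], ["cat", "dog"]]
vocab = ["cat", "dog", "hat"]
vectors = tfidf_vectors(docs, vocab)
print(vectors)
```

A word that occurs in every document, such as "cat" here, receives weight 0 under this formula, which is why very high-DF words contribute little to clustering.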

Embodiment 2

[0185] This embodiment provides a computer-readable storage medium. The storage medium can be a storage medium such as ROM, RAM, a magnetic disk, or an optical disk. The storage medium stores one or more programs; when the programs are executed by a processor, the hybrid clustering-based text clustering intelligent evaluation method of Embodiment 1 is implemented.

Embodiment 3

[0187] This embodiment provides a computing device, which may be a desktop computer, a notebook computer, a smart phone, a PDA handheld terminal, a tablet computer, or another terminal device with a display function. The computing device includes a processor and a memory; the memory stores one or more programs, and when the processor executes the programs stored in the memory, the hybrid clustering-based text clustering intelligent evaluation method of Embodiment 1 is implemented.


Abstract

The invention discloses a text clustering intelligent evaluation method based on hybrid clustering. The method comprises the following steps: preprocessing a text set to obtain all feature words in the original text set; deleting high-frequency and low-frequency feature words in the text set to obtain a pre-selected feature subset; calculating the corresponding weights of all texts in the original text set with the TF-IDF method; binary-coding each feature word in the feature subset and generating a matrix for the pre-selected feature words; setting a fitness (adaptive value) function, re-selecting features from the feature subset, and selecting the globally optimal individual with the best fitness value; decoding the globally optimal individual to obtain the final feature subset T; representing the texts as TF-IDF vectors over T, normalizing and standardizing them, clustering the texts with a hybrid clustering method, and selecting the final clustering result; and generating a word cloud for each cluster in the clustering result. The method offers a good clustering effect and a small amount of calculation.
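The binary coding and decoding of feature words mentioned in the abstract can be illustrated with a bit mask over the pre-selected subset. This is only a sketch of the encoding idea: the fitness function and the search procedure are not given in this excerpt and are deliberately left out, and `random_individual` and `decode` are hypothetical names.

```python
import random

def random_individual(n_features, rng):
    """One candidate solution: bit i says whether feature i is kept."""
    return [rng.randint(0, 1) for _ in range(n_features)]

def decode(individual, features):
    """Map a bit mask back to the feature words it selects."""
    return [f for bit, f in zip(individual, features) if bit == 1]

features = ["cluster", "feature", "text", "weight"]

# A randomly initialized individual, seeded for reproducibility.
candidate = random_individual(len(features), random.Random(0))

# Suppose the search returned this mask as the globally optimal individual.
best = [1, 0, 1, 1]
T = decode(best, features)
print(T)
```

Representing each candidate subset as a fixed-length bit list keeps the search space uniform, so any optimizer that manipulates bit strings can be plugged in without changing the decoding step.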

Description

Technical field

[0001] The invention relates to the technical field of text clustering, in particular to an intelligent evaluation method for text clustering based on hybrid clustering.

Background technique

[0002] With the rapid development of big data and Internet technology, text information on the Internet is flooding people's field of vision. However, there is a huge amount of text data with complex semantics on the Internet, which makes it difficult to find a lot of useful information, and some poor-quality texts are not eliminated. In the face of massive amounts of information, how to dig out the precise text information that users need from these data, and how to eliminate poor-quality texts have become very important research contents with broad application prospects.

[0003] At present, text clustering in the prior art mainly uses the K-means algorithm to directly cluster the original text, which will cause a large number of redundant feature words, and the runn...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35, G06F40/216, G06K9/62, G06N3/00
CPC: G06F16/355, G06F40/216, G06N3/006, G06F18/23213
Inventor: 李康顺, 雷逸舒, 郑明坤, 张海信, 魏航, 唐威, 钱冠如
Owner: SOUTH CHINA AGRI UNIV