Method and device for text clustering and electronic device

A text clustering technology, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc. It addresses the problem that existing clustering methods depend on word segmentation, which affects clustering speed, precision and recall, and it achieves high precision and recall, fast speed and simple steps.

Status: Inactive
Publication Date: 2017-06-13
HUBEI UNIV OF ARTS & SCI

AI Technical Summary

Problems solved by technology

For languages such as Chinese and Uyghur, clustering usually depends on word segmentation, so the accuracy and speed of the segmentation step carry over and ultimately affect the speed, precision and recall of clustering.


Detailed Description of the Embodiments

[0046] The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. The components of the embodiments of the invention generally described and illustrated in the figures herein may be arranged and designed in a variety of different configurations.

[0047] Accordingly, the following detailed description of the embodiments of the invention provided in the accompanying drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the invention. All other embodiments obtained by those skilled in the art on the basis of the embodiments of the present invention, without creative effort, fall within the scope of protection of the present invention.

[0048] It should be noted that like numer...

Abstract

The invention relates to a method and device for text clustering and an electronic device. The method for text clustering comprises the steps of: combining a plurality of original document sets of different themes into a document union set; arranging the documents in the document union set in ascending order to obtain an ascending-order document union set; sequentially calculating the similarity between the first document in the ascending-order document union set and each document after the first document; if the similarity is greater than or equal to a first threshold, assigning that document to the same class as the first document; if the similarity is smaller than the first threshold, marking that document as a non-classified document; and then repeating the similarity calculation and classification on the non-classified documents, each time comparing the first non-classified document with all non-classified documents after it. The method avoids operations such as word segmentation and feature extraction, so its steps are simple and its accuracy is high; it is language-independent and suitable for text clustering in various languages. In addition, the clustering speed and precision can be flexibly adjusted to meet different practical requirements.
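The steps in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the abstract names neither the similarity measure nor the sort key, so the sketch assumes a character-level ratio from Python's `difflib.SequenceMatcher` (which needs no word segmentation) and assumes that "ascending order" means ascending document length. The names `similarity`, `cluster` and `threshold` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; an assumed stand-in measure."""
    return SequenceMatcher(None, a, b).ratio()


def cluster(document_sets, threshold=0.6, sim=similarity):
    """Greedy single-pass clustering following the steps in the abstract."""
    # Step 1: merge the original document sets into one union set.
    union = [doc for docs in document_sets for doc in docs]
    # Step 2: sort ascending (assumption: by document length).
    pending = sorted(union, key=len)

    clusters = []
    while pending:
        # Step 3: compare the first document with every document after it.
        seed, rest = pending[0], pending[1:]
        group, unclassified = [seed], []
        for doc in rest:
            if sim(seed, doc) >= threshold:
                group.append(doc)          # same class as the first document
            else:
                unclassified.append(doc)   # marked as non-classified
        clusters.append(group)
        # Step 4: repeat on the documents that are still non-classified.
        pending = unclassified
    return clusters


if __name__ == "__main__":
    sets = [["aaaa", "aaaaa"], ["zzzz", "zzzzzz"]]
    for group in cluster(sets, threshold=0.6):
        print(group)
```

In this sketch, raising `threshold` produces more, tighter clusters, while lowering it produces fewer, looser ones; this is one plausible reading of how the first threshold lets the precision of the result be adjusted.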

Description

Technical field

[0001] The present invention relates to the technical field of text mining, in particular to a text clustering method, device and electronic equipment.

Background technique

[0002] As the name implies, clustering is the process of dividing an entire data set into several groups according to certain characteristics and rules. Elements within each group are highly similar in those characteristics, while elements in different groups differ more strongly in them. Each resulting group is a class, often referred to as a "cluster". Currently, text clustering methods include partition clustering, hierarchical clustering, density-based clustering, semantic-based clustering, and clustering based on various model theories.

[0003] Most of the above clustering methods need the support of word segmentation or feature items, so feature selection or dimensionality reduction is an important research topic. For languages such as Chinese and Uyghur,...

Application Information

IPC(8): G06F17/30, G06K9/62
CPC: G06F16/355, G06F18/22
Inventor 谷琼王贤明宁彬王毅丁函曹文平吴钊华丽胡春阳屈俊峰
Owner HUBEI UNIV OF ARTS & SCI