Feature quantization method for variable-granularity text clustering

A feature quantization technique for text clustering, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses problems such as insufficient sensitivity to language phenomena, errors in clustering results, and difficulty in capturing the common features of documents. Its intended effects are improved semantic sensitivity, a higher clustering F value, and a reduced impact on clustering efficiency.

Inactive Publication Date: 2009-05-20
HARBIN INST OF TECH
Cites: 0; Cited by: 13

AI Technical Summary

Problems solved by technology

[0005] The word frequency VSM modeling method commonly used by many text clustering methods is in the case of large-grained clustering, because it is not sensi...

Examples

Embodiment Construction

[0030] The present invention is further described below with reference to Figures 1 to 3 and a specific embodiment:

[0031] The method of the present embodiment is realized through the following steps:

[0032] 1. Concept expansion of document keywords (as shown in Figure 1). Using HowNet, the keyword set of a document is expanded into another set of concept words with higher semantic coverage. For example, if any of "flowers", "orchids", "rhododendrons", "camellias", "roses", "daffodils", "chrysanthemums", "petunias", "phlox", or "rushes" occurs in the document, it can be regarded as a kind of flower and is therefore mapped to the word "flower".
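The mapping in the example above can be sketched as a simple lookup. This is a minimal illustration, not the patented implementation: the dictionary `CONCEPT_OF` and the function `expand_keywords` are hypothetical names, and the small hand-written dictionary only stands in for a real HowNet query.

```python
# Hypothetical stand-in for a HowNet lookup: keyword -> broader concept word.
# The entries come from the flower example in the text; a real system would
# query HowNet instead of using a hand-written dictionary.
CONCEPT_OF = {
    "orchids": "flower",
    "camellias": "flower",
    "roses": "flower",
    "daffodils": "flower",
    "chrysanthemums": "flower",
}


def expand_keywords(keywords, concept_of=CONCEPT_OF):
    """Replace each keyword with its broader concept when one is known.

    Keywords without an entry are kept unchanged, so no keyword is dropped.
    """
    return [concept_of.get(word, word) for word in keywords]


doc_keywords = ["roses", "garden", "orchids"]
print(expand_keywords(doc_keywords))  # ['flower', 'garden', 'flower']
```

Two different keywords ("roses" and "orchids") end up as the same concept word, which is what lets documents that use different specific terms share a feature.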

[0033] On the surface, the semantic expansion of words maps words from a set with a large number of elements to another set with a small number of elements, which loses some of the individual information carried by each word. Through expansion, however, it captures the similar or related features between words refl...

Abstract

The invention provides a feature quantization method for variable-granularity text clustering, realized through the following steps: first, concept expansion of document keywords, in which a knowledge network (HowNet) is used to expand the keyword set of a document into a set of concept words with higher semantic coverage; second, feature representation and similarity calculation, in which the similarity between words is understood as the degree of overlap of their common features, and the similarity between documents in text clustering is likewise judged by examining the number of features they share; and third, combining the variable-granularity feature quantization technique with a specific clustering algorithm to achieve variable-granularity clustering. The method overcomes the poor clustering quality that existing document clustering systems show under variable-granularity clustering because of inappropriate feature quantization.
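The idea of judging document similarity by shared features can be illustrated with a set-overlap ratio. The abstract does not give the exact formula, so the Jaccard ratio used here is an assumption chosen for illustration, and `concept_overlap` is a hypothetical name.

```python
def concept_overlap(doc_a, doc_b):
    """Share of concept words common to both documents (Jaccard ratio).

    Assumption: the source only says similarity reflects the number of
    common features; the Jaccard ratio is one simple way to express that.
    """
    set_a, set_b = set(doc_a), set(doc_b)
    if not set_a and not set_b:
        return 0.0  # two empty documents share no features
    return len(set_a & set_b) / len(set_a | set_b)


# Concept-word sets of two documents after keyword expansion (made-up data).
doc_1 = ["flower", "garden", "soil"]
doc_2 = ["flower", "garden", "market"]
print(concept_overlap(doc_1, doc_2))  # 2 shared / 4 distinct = 0.5
```

A pairwise score of this kind could then be passed to whichever clustering algorithm is chosen in the third step.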

Description

(1) Technical field

[0001] The invention relates to a feature quantization technology of variable granularity text clustering.

(2) Background technology

[0002] In recent years, a surprising number of text documents have become readily available from a variety of sources. There is thus growing interest in developing technologies that can help users navigate, organize, and summarize this textual information efficiently. High-quality text clustering techniques play an important role in achieving this goal. By organizing a large amount of information into a small number of meaningful clusters, people can observe the data from a macro perspective. This technology provides navigation and browsing mechanisms that greatly improve retrieval performance.

[0003] Text clustering for Internet applications has become a technology that is rising and quickly recognized by the market. For example, the core text clustering technology (http://vivisimo.com/) used in VIVISIMO, a clusteri...

Application Information

IPC(8): G06F17/30
Inventor: 刘远超, 刘铭, 王晓龙
Owner: HARBIN INST OF TECH