Document clustering method based on distribution-convergence model

A document clustering technology applied in the cross-disciplinary field of data mining and knowledge systems. It clarifies the knowledge context, improves computing efficiency, and reduces time overhead.

Inactive Publication Date: 2016-02-17
YANCHENG INST OF TECH
3 Cites, 6 Cited by

AI Technical Summary

Problems solved by technology

[0005] The technical problem to be solved by the present invention is to provide a document clustering method based on the distribution-convergence model. The method uses the distribution-convergence model to construct a co-occurrence matrix, which addresses the case where the limited memory of a single computing node makes it difficult to store and process a large matrix, so that clustering either fails or runs with low efficiency.


Embodiment Construction

[0020] The implementation of the present invention is described in detail below. The embodiments are exemplary; they serve only to explain the present invention and do not limit it.

[0021] The document clustering method based on the distribution-convergence model of the present invention comprises the following steps:

[0022] Step 1. Construction method of co-occurrence matrix based on distribution-convergence model

[0023] The present invention first proposes a co-occurrence matrix construction method based on the distribution-convergence model: the distribution-convergence model counts the pairwise co-occurrence frequency of knowledge attributes, and the counts are combined with a hash map to construct the co-occurrence matrix. This addresses problems such as the inability to cluster, or reduced clustering efficiency, that arise when the limited memory of a single computing node must store and process a large matrix.
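The pairwise counting described in [0023] can be illustrated with a minimal sketch. The source text is truncated before it gives the concrete procedure, so everything here is an assumption made for illustration: the function names (`count_partition`, `converge`, `build_cooccurrence`), the round-robin partitioning, and the use of Python's `Counter` as the hash map. Each partition stands in for a computing node that counts attribute pairs locally; the convergence step then merges the partial hash maps into one sparse co-occurrence matrix.

```python
from collections import Counter
from itertools import combinations


def count_partition(partition):
    """Distribution phase (assumed form): count attribute pairs in one partition.

    `partition` is a list of documents; each document is a list of knowledge
    attributes. The result is a hash map keyed by a sorted attribute pair.
    """
    local = Counter()
    for doc_attributes in partition:
        attrs = sorted(set(doc_attributes))
        for a in attrs:
            local[(a, a)] += 1  # diagonal entry: documents containing `a`
        for a, b in combinations(attrs, 2):
            local[(a, b)] += 1  # off-diagonal entry: `a` and `b` co-occur
    return local


def converge(partials):
    """Convergence phase (assumed form): merge the partial hash maps."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts key by key
    return total


def build_cooccurrence(docs, n_partitions=2):
    """Split the documents round-robin, count each part, then merge."""
    partitions = [docs[i::n_partitions] for i in range(n_partitions)]
    return converge(count_partition(p) for p in partitions)


# Hypothetical toy corpus: each document is reduced to its attribute list.
docs = [
    ["clustering", "hash map", "matrix"],
    ["clustering", "matrix"],
    ["hash map", "matrix"],
]
cooc = build_cooccurrence(docs)
```

Because only pairs that actually occur are stored, the hash map never materializes the full attribute-by-attribute matrix. In a real deployment the partitions would be counted on separate nodes rather than in one process.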

[0024] (1) Co...


Abstract

The invention discloses a document clustering method based on a distribution-convergence model. The method first provides a way to build a co-occurrence matrix based on the distribution-convergence model: the co-occurrence frequency of knowledge attributes is counted with the distribution-convergence model, and the co-occurrence matrix is built with the help of a hash map. Second, a nearness degree matrix is built by combining the co-occurrence matrix with a nearness degree coefficient. Third, the nearness degree matrix is standardized. Finally, the nearness degree matrix is clustered with a sum-of-squares method, which yields efficient fine-grained clustering of knowledge. The method can be applied to document clustering in a network document knowledge base, where it achieves relatively good clustering accuracy and computing efficiency, supports fine-grained document clustering, and reduces time cost.
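The last three steps of the abstract can be sketched as follows. The abstract names neither the nearness degree coefficient nor the standardization, so this sketch makes three labeled assumptions: the Ochiai coefficient serves as the nearness degree coefficient, standardization is a column-wise z-score, and the sum-of-squares method is read as Ward-style agglomerative clustering (merge the pair of clusters whose union least increases the within-cluster sum of squares). The function names and the toy matrix are hypothetical.

```python
import math


def nearness_matrix(cooc):
    """Assumed Ochiai coefficient: s_ij = c_ij / sqrt(c_ii * c_jj)."""
    n = len(cooc)
    return [[cooc[i][j] / math.sqrt(cooc[i][i] * cooc[j][j]) for j in range(n)]
            for i in range(n)]


def standardize(matrix):
    """Assumed standardization: z-score each column (std of 0 becomes 1)."""
    n = len(matrix)
    cols = list(zip(*matrix))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return [[(row[j] - means[j]) / stds[j] for j in range(len(row))]
            for row in matrix]


def ward_clusters(rows, k):
    """Merge clusters until k remain, always choosing the pair with the
    smallest increase in within-cluster sum of squares (Ward criterion)."""
    clusters = [([i], list(r)) for i, r in enumerate(rows)]  # (members, centroid)
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * d2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        size = len(ma) + len(mb)
        centroid = [(len(ma) * x + len(mb) * y) / size for x, y in zip(ca, cb)]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)]
        clusters.append((ma + mb, centroid))
    return sorted(sorted(members) for members, _ in clusters)


# Hypothetical 4-attribute co-occurrence matrix with two obvious groups.
cooc = [
    [10, 8, 1, 0],
    [8, 10, 0, 1],
    [1, 0, 10, 9],
    [0, 1, 9, 10],
]
groups = ward_clusters(standardize(nearness_matrix(cooc)), k=2)
```

On this toy matrix the attributes with indices 0 and 1 end up in one cluster and those with indices 2 and 3 in the other. The naive pair search is cubic in the number of attributes and is meant only to make the merge criterion visible, not to reflect the efficiency the invention claims.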

Description

Technical field

[0001] The invention relates to a knowledge clustering method, in particular to a document clustering method based on a distribution-convergence model, and belongs to the cross-disciplinary application field of data mining and knowledge systems.

Background

[0002] Generally, a network document knowledge base system classifies stored documents mainly by subject category and does not classify them at a finer granularity by knowledge field within a subject. This coarse-grained classification makes learners prone to cognitive disorientation and knowledge overload while retrieving and reading documents. Integrating and counting knowledge objects by their attributes through document clustering can classify documents in more detail, clarify the knowledge context for learners, improve the efficiency of document research, and also reveal the potential of knowledge development laws...

Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/35
Inventors: 李益娟, 李永萍, 徐小龙, 徐友武
Owner: YANCHENG INST OF TECH