Document clustering method based on distribution-convergence model

A document clustering technology, applied in the cross-disciplinary application field of data mining and knowledge systems, that achieves the effects of clarifying the knowledge context, improving computing efficiency, and reducing time overhead.

Inactive Publication Date: 2016-02-17

YANCHENG INST OF TECH

Cites: 3; Cited by: 6


AI Technical Summary

Problems solved by technology

[0005] The technical problem to be solved by the present invention is to provide a document clustering method based on the distribution-convergence model that uses this model to construct the co-occurrence matrix, thereby overcoming the difficulties that arise when a single computing node, with its limited memory, must store and process a large matrix, such as the inability to cluster or low clustering efficiency.


Embodiment Construction

[0020] Embodiments of the present invention are described in detail below. The embodiments are exemplary; they serve only to explain the present invention and are not to be construed as limiting it.

[0021] The document clustering method based on the distribution-convergence model of the present invention comprises the following steps:

[0022] Step 1. Method for constructing the co-occurrence matrix based on the distribution-convergence model

[0023] The present invention first proposes a method for constructing a co-occurrence matrix based on the distribution-convergence model: the model is used to count how often knowledge attributes co-occur in pairs, and the counts are combined with a hash map to build the co-occurrence matrix. This avoids the problems caused when a single computing node, with its limited memory, must store and process a large matrix, such as the inability to cluster or reduced clustering efficiency.
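The pairwise counting in [0023] can be illustrated with a small in-process sketch. It assumes that "distribution-convergence" corresponds to a MapReduce-style map/reduce split and that each document is represented by a list of its knowledge attributes (for example, keywords). The function names (`map_phase`, `shuffle`, `reduce_phase`, `build_cooccurrence`, `lookup`) and the sample documents are hypothetical. Only pairs that actually co-occur are kept in a hash map, which is one way to avoid materializing the full matrix on a single node; each attribute's own frequency is also stored under the key `(a, a)` so that a frequency-normalized coefficient could be computed later. In a real deployment the map and reduce calls would run on separate nodes rather than in one process.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(attributes):
    """'Distribution' step (sketch): for one document, emit ((a, a), 1) for each
    attribute and ((a, b), 1) for every unordered pair of distinct attributes."""
    unique = sorted(set(attributes))
    for a in unique:
        yield (a, a), 1
    for a, b in combinations(unique, 2):  # a < b because `unique` is sorted
        yield (a, b), 1


def shuffle(mapped_pairs):
    """Group emitted values by key, as a MapReduce-style runtime would between phases."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """'Convergence' step (sketch): sum the partial counts for each key."""
    return {key: sum(values) for key, values in groups.items()}


def build_cooccurrence(documents):
    """Return a sparse co-occurrence matrix as a hash map {(a, b): count} with a <= b."""
    mapped = (kv for attrs in documents.values() for kv in map_phase(attrs))
    return reduce_phase(shuffle(mapped))


def lookup(cooc, a, b):
    """Symmetric access; pairs that never co-occurred are absent and cost no memory."""
    return cooc.get((a, b) if a <= b else (b, a), 0)


if __name__ == "__main__":
    docs = {  # hypothetical documents -> knowledge attributes
        "d1": ["clustering", "hashing", "mapreduce"],
        "d2": ["clustering", "mapreduce"],
        "d3": ["clustering", "cooccurrence"],
    }
    cooc = build_cooccurrence(docs)
    print(lookup(cooc, "mapreduce", "clustering"))   # 2
    print(lookup(cooc, "clustering", "clustering"))  # 3
```

Storing the matrix as a sparse dictionary keyed by ordered attribute pairs halves the storage of a symmetric matrix and skips zero entries entirely, which matters when most attribute pairs never co-occur.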

[0024] (1) Co...


Abstract

The invention discloses a document clustering method based on a distribution-convergence model. The method comprises the following steps: first, a co-occurrence matrix construction method based on the distribution-convergence model is proposed, in which the model counts the co-occurrence frequency of knowledge attributes and the co-occurrence matrix is built with the help of a hash map; second, a nearness degree matrix is built from the co-occurrence matrix and a nearness degree coefficient; third, the nearness degree matrix is standardized; finally, the nearness degree matrix is clustered with the sum of squares method, achieving efficient fine-grained clustering of knowledge. The method can be applied to document clustering in a network document knowledge base, where it achieves satisfactory clustering accuracy and computational efficiency, realizes fine-grained document clustering, and reduces time cost.
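The remaining steps of the abstract (nearness degree matrix, standardization, sum of squares clustering) can be sketched as follows. The text shown here does not specify the nearness coefficient or the standardization, so this sketch assumes the Ochiai coefficient E_ij = c_ij / sqrt(c_ii * c_jj), column-wise z-score standardization, and reads "sum of squares method" as Ward's minimum-variance agglomerative method, which at each step merges the two clusters whose union increases the total within-cluster sum of squares the least. The input is a small dense co-occurrence matrix with each attribute's own frequency on the diagonal; the function names, labels, and numbers are hypothetical.

```python
import math


def nearness_matrix(C):
    """Nearness degree matrix, ASSUMING the Ochiai coefficient
    E[i][j] = C[i][j] / sqrt(C[i][i] * C[j][j]); C[i][i] is attribute i's frequency."""
    n = len(C)
    return [[C[i][j] / math.sqrt(C[i][i] * C[j][j]) if C[i][i] and C[j][j] else 0.0
             for j in range(n)] for i in range(n)]


def standardize(M):
    """Column-wise z-score standardization (an assumed choice)."""
    rows, cols = len(M), len(M[0])
    out = [[0.0] * cols for _ in range(rows)]
    for j in range(cols):
        col = [M[i][j] for i in range(rows)]
        mean = sum(col) / rows
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / rows)
        for i in range(rows):
            out[i][j] = (M[i][j] - mean) / sd if sd > 0 else 0.0
    return out


def centroid(X, members):
    """Mean vector of the rows of X listed in `members`."""
    return [sum(X[m][t] for m in members) / len(members) for t in range(len(X[0]))]


def ward_cost(X, a, b):
    """Increase in within-cluster sum of squares if clusters a and b are merged:
    |a||b| / (|a| + |b|) * ||centroid(a) - centroid(b)||^2."""
    ca, cb = centroid(X, a), centroid(X, b)
    dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * dist2


def ward_clusters(X, k):
    """Agglomerative Ward clustering of the rows of X until k clusters remain."""
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: ward_cost(X, clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


if __name__ == "__main__":
    labels = ["clustering", "co-occurrence", "retrieval", "indexing"]  # hypothetical
    C = [[5, 4, 0, 1],
         [4, 5, 1, 0],
         [0, 1, 6, 5],
         [1, 0, 5, 6]]
    groups = ward_clusters(standardize(nearness_matrix(C)), k=2)
    print([[labels[i] for i in g] for g in groups])
    # [['clustering', 'co-occurrence'], ['retrieval', 'indexing']]
```

Each attribute's row of the standardized nearness matrix serves as its feature vector, so attributes with similar co-occurrence profiles end up close together. The exhaustive pair search is only for illustration; a larger knowledge base would need an optimized hierarchical clustering routine.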

Description

Technical Field

[0001] The invention relates to a knowledge clustering method, in particular to a document clustering method based on a distribution-convergence model, and belongs to the cross-disciplinary application field of data mining and knowledge systems.

Background Art

[0002] Network document knowledge base systems generally classify stored documents by subject category only, without a finer-grained classification by knowledge field within each subject. Because this classification is coarse-grained, learners are prone to cognitive disorientation and knowledge overload when retrieving and reading documents. Integrating and counting knowledge objects according to their attributes through document clustering can not only classify documents in finer detail, clarify the knowledge context for learners, and improve the efficiency of document research, but also reveal the potential laws of knowledge development...

Claims


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06F17/30

CPC: G06F16/35

Inventors: 李益娟, 李永萍, 徐小龙, 徐友武 (Li Yijuan, Li Yongping, Xu Xiaolong, Xu Youwu)

Owner: YANCHENG INST OF TECH
