Attribute graph literature clustering method based on graph convolutional neural network

Technology area: convolutional neural networks and document clustering; applied in neural learning methods, biological neural network models, still-image data clustering/classification, etc.

Pending Publication Date: 2021-07-23
BEIJING UNIV OF TECH
Cites: 0; cited by: 5

AI Technical Summary

Problems solved by technology

[0006] To address the problems in the prior art described above, the present invention proposes an attribute graph document clustering method based on a graph convolutional neural network. The method addresses the failure of existing document clustering approaches to exploit the citation relations between documents. It can handle the unbalanced cluster structures found in real graph data, learns node features that are well suited to the clustering task, estimates the number of clusters in the graph data from those node features, and thereby achieves parameter-free attribute graph clustering.
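The excerpt does not describe how the cluster number is estimated from node features. As a rough stand-in only, the sketch below uses a classical heuristic, not the patent's deep estimation model: run k-means (with k-means++ seeding) for each candidate k and keep the k whose partition has the highest mean silhouette coefficient. The function names, the candidate range k_min..k_max and the fixed seed are illustrative assumptions.

```python
import numpy as np

# Stand-in for data-driven cluster-number estimation: silhouette-based model
# selection over k-means runs. This is NOT the patent's deep estimation model.

def _assign(Z, centroids):
    """Index of the nearest centroid (squared Euclidean) for every row of Z."""
    d2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def _kmeans_pp_init(Z, k, rng):
    """k-means++ seeding: each new centroid is drawn with probability ~ D(x)^2."""
    centroids = [Z[rng.integers(len(Z))]]
    for _ in range(1, k):
        d2 = ((Z[:, None, :] - np.array(centroids)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        total = d2.sum()
        probs = d2 / total if total > 0 else None  # uniform if all points coincide
        centroids.append(Z[rng.choice(len(Z), p=probs)])
    return np.array(centroids, dtype=float)

def kmeans(Z, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = _kmeans_pp_init(Z, k, rng)
    for _ in range(n_iter):
        labels = _assign(Z, centroids)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        new = np.array([Z[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
                        for c in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return _assign(Z, centroids), centroids

def silhouette(Z, labels):
    """Mean silhouette coefficient with Euclidean distance (singletons score 0)."""
    D = np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    scores = []
    for i in range(len(Z)):
        same = labels == labels[i]
        if same.sum() <= 1:
            scores.append(0.0)
            continue
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance to own cluster, excluding i
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])  # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return float(np.mean(scores))

def estimate_num_clusters(Z, k_min=2, k_max=10, seed=0):
    """Return the candidate k whose k-means partition has the best silhouette."""
    best_k, best_score = None, -np.inf
    for k in range(k_min, min(k_max, len(Z) - 1) + 1):
        labels, _ = kmeans(Z, k, seed=seed)
        if len(np.unique(labels)) < 2:
            continue  # silhouette is undefined for a single cluster
        score = silhouette(Z, labels)
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```

In a pipeline of the kind described, Z would be the learned node features, and the final step would be `kmeans(Z, estimate_num_clusters(Z))`. The silhouette criterion was chosen here only because it needs no labels; it is quadratic in the number of nodes, so it suits small illustrations rather than a graph the size of Pubmed.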

Embodiment Construction

[0053] This embodiment uses the Cora, Citeseer and Pubmed datasets as examples to verify the effectiveness of the method. First, document attribute graph data are constructed from these three databases. A document attribute graph can be expressed as $G=(A,X)$, where $A$ is the adjacency matrix: if there is a citation relationship between documents $v_i$ and $v_j$, then $A_{ij}=1$; otherwise $A_{ij}=0$. $X$ is the document attribute matrix, whose $i$-th row vector $x_i$ contains a description of the content of document $v_i$. $X$ is constructed as follows: (1) remove function words from each document, namely adverbs, prepositions, conjunctions, auxiliary words, etc.; (2) remove words whose frequency is less than 10; (3) build a word-vector feature for each document from the remaining vocabulary: if the $j$-th word appears in document $v_i$, then $x_{ij}=1$; otherwise $x_{ij}=0$. The parameters of the constructed document attribute graph are as fo...
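A minimal sketch of the adjacency matrix and of steps (1) to (3), assuming pre-tokenized documents, an explicit list of citation pairs, and a caller-supplied function-word set (the text names the word classes but gives no concrete list). Two readings are assumptions: citations are treated as undirected, following "a citation relationship between $v_i$ and $v_j$", and "frequency" in step (2) is taken to mean document frequency. The name `build_attribute_graph` is hypothetical.

```python
import numpy as np
from collections import Counter

def build_attribute_graph(docs, citations, function_words, min_freq=10):
    """Construct G = (A, X) for a citation corpus.

    docs:           list of token lists; docs[i] is document v_i
    citations:      iterable of (i, j) pairs; v_i cites v_j (treated as undirected)
    function_words: hypothetical caller-supplied set of adverbs, prepositions, etc.
    min_freq:       words occurring in fewer than this many documents are dropped
                    (document frequency is an ASSUMED reading of "frequency")
    """
    n = len(docs)
    # Adjacency: A_ij = A_ji = 1 if either document cites the other.
    A = np.zeros((n, n), dtype=np.int8)
    for i, j in citations:
        if i != j:  # ignore self-citations
            A[i, j] = A[j, i] = 1

    # Step (1): remove function words.
    cleaned = [[w for w in doc if w not in function_words] for doc in docs]

    # Step (2): keep words whose document frequency is at least min_freq.
    doc_freq = Counter(w for doc in cleaned for w in set(doc))
    vocab = sorted(w for w, c in doc_freq.items() if c >= min_freq)
    column = {w: j for j, w in enumerate(vocab)}

    # Step (3): binary bag-of-words, x_ij = 1 iff word j occurs in v_i.
    X = np.zeros((n, len(vocab)), dtype=np.int8)
    for i, doc in enumerate(cleaned):
        for w in doc:
            if w in column:
                X[i, column[w]] = 1
    return A, X, vocab
```

For example, with documents `[["graph", "the", "cluster"], ["graph", "network"], ["cluster", "graph", "of"]]`, citations `[(0, 1), (2, 0)]`, function words `{"the", "of"}` and `min_freq=2`, the vocabulary is `["cluster", "graph"]` and the rows of X are `[1, 1]`, `[0, 1]` and `[1, 1]`.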

Abstract

The invention discloses an attribute graph document clustering method based on a graph convolutional neural network, and belongs to the field of graph data mining. Specifically: a cross-layer linked graph convolutional neural network learns the features of the document attribute graph; a deep clustering estimation model estimates the optimal number of clusters from the node features; the two steps are executed alternately to complete training; the trained model is then used to obtain the features of all document attribute graph nodes to be clustered, together with the estimated number of clusters; finally, taking the features and the estimated number of clusters as input, k-means clustering produces the clustering result for the document attribute graph. When the cross-layer linked graph convolutional neural network is trained, a self-separation regularization term based on pairwise node similarity is used, which makes the features of nodes in the same cluster similar and the features of nodes in different clusters far apart, effectively improving graph clustering performance. The cluster estimation module performs data-driven estimation of the number of clusters, which makes the whole system better suited to real, unlabeled data.
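The abstract defines neither the cross-layer links nor the exact self-separation term, so the sketch below is only an illustration under labeled assumptions. Each layer applies the standard GCN propagation rule $H^{(l+1)} = \mathrm{ReLU}(\hat{A} H^{(l)} W^{(l)})$ with $\hat{A} = \tilde{D}^{-1/2}(A+I)\tilde{D}^{-1/2}$; "cross-layer linked" is read as concatenating every layer's output into the final node feature; and `pairwise_separation_penalty` is a hypothetical contrastive-style stand-in that uses graph links as a proxy for "same cluster". Only forward values are computed; training and the alternation with cluster estimation are omitted.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetric GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def cross_layer_gcn(A, X, weights):
    """Forward pass of stacked GCN layers whose outputs are concatenated.

    Concatenating all layer outputs is an ASSUMED reading of "cross-layer
    linked"; the patent excerpt does not give the actual wiring.
    """
    A_norm = normalize_adjacency(A)
    H, outputs = X, []
    for W in weights:
        H = np.maximum(A_norm @ H @ W, 0.0)  # ReLU(A_norm H W)
        outputs.append(H)
    return np.concatenate(outputs, axis=1)

def pairwise_separation_penalty(Z, A, eps=1e-12):
    """HYPOTHETICAL stand-in for the self-separation regularization term.

    Pulls linked node pairs toward cosine similarity 1 and pushes
    unlinked pairs toward similarity <= 0.
    """
    Zn = Z / (np.linalg.norm(Z, axis=1, keepdims=True) + eps)
    S = Zn @ Zn.T  # pairwise cosine similarity
    off_diag = ~np.eye(A.shape[0], dtype=bool)
    linked = (A > 0) & off_diag
    unlinked = (A == 0) & off_diag
    pull = (1.0 - S[linked]).mean() if linked.any() else 0.0
    push = np.maximum(S[unlinked], 0.0).mean() if unlinked.any() else 0.0
    return pull + push
```

Because the ReLU keeps features non-negative, cosine similarities here lie in [0, 1], so the push term can only drive unlinked pairs toward orthogonality. The real regularizer may differ in form and in how it identifies same-cluster pairs.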

Description

Technical field

[0001] The invention belongs to the field of graph data mining, and in particular relates to an attribute graph document clustering method based on a graph convolutional neural network.

Background art

[0002] Attribute graph clustering is a fundamental task in graph data mining. Its purpose is to divide the nodes of a graph into mutually disjoint clusters according to node attributes and graph structure information. Compared with traditional graph clustering methods that use only graph structure information, attribute graph clustering is better suited to scenarios in which nodes carry rich content information. It has a wide range of practical applications, including community detection, protein functional module detection, and fraud detection in financial networks.

[0003] A large number of graph clustering methods based on deep models have been proposed. Compared with shallow graph clustering methods, deep methods are better...

Application Information

Patent type & authority: Application (China)
IPC(8): G06F16/55, G06K9/62, G06N3/04, G06N3/08
CPC: G06F16/55, G06N3/08, G06N3/045, G06F18/22, G06F18/213, G06F18/23213
Inventors: 冀俊忠 (Ji Junzhong), 梁烨 (Liang Ye), 雷名龙 (Lei Minglong)
Owner BEIJING UNIV OF TECH