A Community Discovery Algorithm Based on Hadoop Platform

A community discovery algorithm based on the Hadoop platform, applied in the field of community discovery. It addresses the problem that existing community discovery algorithms are not adapted to distributed systems, and achieves improved data processing capability, improved mining and analysis capability, and high real-time performance.

Status: Inactive; Publication Date: 2019-05-31
HEBEI UNIVERSITY OF SCIENCE AND TECHNOLOGY

AI Technical Summary

Problems solved by technology

Many community discovery algorithms exist, but none is adapted to run on a distributed system. Running community discovery on a distributed system would improve its real-time performance.



Examples


Embodiment 1

[0029] Zachary's Karate Club is a social network composed of the members of a karate club at a university in the United States. In the early 1970s, Wayne Zachary spent two years observing the relationships among the club's members and recorded them as a network. A conflict arose between the supervisor and the principal over whether to raise the club's fees, and as a result the club split into two smaller clubs, one centered on the principal and the other on the supervisor.

[0030] The specific calculation process of the community discovery algorithm based on the Hadoop platform is as follows:

[0031] Step one:

[0032] Read in the network data and construct the undirected graph G of Zachary's Karate Club. Members are represented by the nodes n of G, and the relationships between members by the edges m. Data sharding is completed on a computing cluster configured with the Hadoop environment, and the sharded data serve as the Mapper input. Table 1-1...
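Step one can be illustrated with a minimal Python sketch, written under stated assumptions rather than taken from the patent. The function names `build_undirected_graph` and `shard_edges`, the round-robin sharding rule, and the toy edge list are all hypothetical; the toy edges are not the karate club data. On a real cluster Hadoop splits the input files itself, so the round-robin split here is only a stand-in for that step.

```python
from collections import defaultdict


def build_undirected_graph(edges):
    """Return an adjacency map {node: set of neighbors} for an undirected graph."""
    adjacency = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # assumption: self-loops carry no relationship and are skipped
        adjacency[u].add(v)
        adjacency[v].add(u)  # undirected: record the edge in both directions
    return dict(adjacency)


def shard_edges(edges, num_splits):
    """Distribute edges round-robin into num_splits shards (hypothetical stand-in
    for Hadoop's own input splitting); each shard would feed one Mapper."""
    shards = [[] for _ in range(num_splits)]
    for index, edge in enumerate(edges):
        shards[index % num_splits].append(edge)
    return shards


# Hypothetical toy edge list, NOT the data of Table 1-1.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
graph = build_undirected_graph(edges)
shards = shard_edges(edges, 2)
```

Keeping the graph as an edge list until after sharding means each Mapper only needs its own shard, which is what allows the later steps to run in parallel.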

Embodiment 2

[0061] The community in this embodiment is an extremely sparse network community, as shown in Figure 2.

[0062] The specific calculation process of the community discovery algorithm based on the Hadoop platform is as follows:

[0063] Step one:

[0064] Read in the network data and construct the undirected graph G of the community. Members are represented by the nodes n of G, and the relationships between members by the edges m. Data sharding is completed on a computing cluster configured with the Hadoop environment, and the sharded data serve as the Mapper input. Table 2-1 shows the Mapper input.

[0065] Table 2-1 Mapper input data for step one

[0066]

[0067]

[0068] As can be seen from Table 2-1, the system contains a total of 19 nodes n and 21 edges m. Calculate the number D of nodes associated with each node, and store each node n together with its D in the data structure Hashtable; see Table 2-2. It can be see...
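The counting of D described above can be sketched in plain Python in a MapReduce style. This is an illustration under assumptions, not the patent's implementation: `degree_mapper`, `degree_reducer`, and `count_associated_nodes` are hypothetical names, a Python `dict` stands in for the Hashtable, each edge is assumed to be listed exactly once, and the toy edges are not the data of Table 2-1.

```python
from collections import defaultdict
from itertools import chain


def degree_mapper(edge):
    """Map phase: each edge contributes one associated node to both endpoints."""
    u, v = edge
    yield (u, 1)
    yield (v, 1)


def degree_reducer(node, counts):
    """Reduce phase: sum the contributions to obtain D for one node."""
    return node, sum(counts)


def count_associated_nodes(edges):
    """Simulate map, shuffle, and reduce in memory; returns {node: D}."""
    grouped = defaultdict(list)  # shuffle: group emitted values by node key
    for node, one in chain.from_iterable(degree_mapper(e) for e in edges):
        grouped[node].append(one)
    # A dict plays the role of the Hashtable that holds node n and its D.
    return dict(degree_reducer(node, counts) for node, counts in grouped.items())


# Hypothetical toy edges, assumed to contain no duplicates.
toy_edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
degree_table = count_associated_nodes(toy_edges)
```

A simple sanity check follows from every edge having two endpoints: the D values must sum to twice the number of edges. For a network with 21 edges, each listed once, the D values would therefore sum to 42.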


Abstract

The invention discloses a community discovery algorithm based on the Hadoop platform. The algorithm is implemented under the MapReduce framework and includes the following steps: (1) social network data are read and an undirected graph G is constructed, and the nodes n and the number of associated nodes of each are obtained through data sharding; (2) the closeness between every two nodes is calculated, and the processed data are written to a file; (3) the data, in which each row holds an edge and its closeness, are partitioned to form node sets, and the result is stored in the data structure Hashtable as pairs of node and node-set serial number; and (4) nodes not yet included in any node set are found from the data of step (3), and each is added to the node set of a node associated with it. The algorithm markedly improves data processing capability: its operation scale can reach hundreds of millions of operations, which improves its ability to mine and analyze large-scale social networks and gives it high real-time performance. The whole network is analyzed and searched, and mining spreads outward gradually from the core of each community, which improves the efficiency and accuracy of the algorithm.
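Step (4) of the abstract can be sketched as follows. This is a minimal illustration under assumptions: the abstract only states that a leftover node joins the node set of a node associated with it, so the majority vote among neighbors, the tie-break toward the smallest serial number, and the name `absorb_unassigned_nodes` are all hypothetical choices. The adjacency map and the partial assignment are toy inputs, not results from the patent.

```python
from collections import Counter


def absorb_unassigned_nodes(adjacency, node_set_of):
    """Assign each node missing from node_set_of to a node set of one of its neighbors.

    adjacency:   {node: set of neighbors}
    node_set_of: {node: node-set serial number} for nodes placed in step (3)
    """
    assigned = dict(node_set_of)
    pending = [node for node in adjacency if node not in assigned]
    progress = True
    while pending and progress:
        progress = False
        still_pending = []
        for node in pending:
            votes = Counter(
                assigned[nb] for nb in adjacency[node] if nb in assigned
            )
            if votes:
                # Assumption: take the set most common among neighbors;
                # break ties by the smallest serial number.
                assigned[node] = min(votes, key=lambda s: (-votes[s], s))
                progress = True
            else:
                still_pending.append(node)  # no assigned neighbor yet; retry later
        pending = still_pending
    return assigned


# Hypothetical toy inputs: nodes 4 and 7 were left out of step (3); node 8 is isolated.
toy_adjacency = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6},
    5: {4, 6}, 6: {4, 5, 7}, 7: {6}, 8: set(),
}
partial_sets = {1: 0, 2: 0, 3: 0, 5: 1, 6: 1}
completed = absorb_unassigned_nodes(toy_adjacency, partial_sets)
```

In this toy run, node 4 has one neighbor in set 0 and two in set 1, so it joins set 1, and node 7 follows its only neighbor into set 1. The isolated node 8 has no associated node and stays unassigned.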

Description

Technical field

[0001] The invention relates to a community discovery algorithm, in particular to a community discovery algorithm based on a Hadoop platform.

Background technique

[0002] A community is a subgraph composed of nodes of the same type and the connections between these nodes in the network. Automatically searching for or discovering communities in a network has important practical value. For example, communities in social networks represent real social groups with common interests or similar backgrounds; communities in citation networks represent related papers in the same direction; communities in the World Wide Web are groups of websites discussing related topics; a community in a biochemical network or an electronic circuit network is a certain type of functional unit; a community in a music forum can be a number of topics initiated by users with similar preferences, or a user base composed of users with similar interests. Revealing and discovering the co...


Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06Q50/00
CPC: G06Q50/01
Inventor: 张妍
Owner: HEBEI UNIVERSITY OF SCIENCE AND TECHNOLOGY