Looking for breakthrough ideas for innovation challenges? Try Patsnap Eureka!

A graph data storage and query method for large-scale social networks

A social network and data storage technology, applied in the direction of memory address/allocation/relocation, etc., can solve the problem that the server program cannot proxy data requests from different maps, cannot effectively support graph computing, and the graph database is difficult to access and update graph data, etc. question

Active Publication Date: 2017-11-14
INST OF INFORMATION ENG CHINESE ACAD OF SCI +1
View PDF3 Cites 0 Cited by
  • Summary
  • Abstract
  • Description
  • Claims
  • Application Information

AI Technical Summary

Problems solved by technology

The data division of Redis is completed by the client, and the server program cannot proxy the request of different map data
[0009] From the above research work, it can be seen that the data access requirements of graph computing and the characteristics of graph data pose challenges to the storage of graph data. Traditional data storage methods such as file systems or distributed file systems and graph databases with Key-Value storage models are difficult. Efficient support for graph data access and update, and thus cannot effectively support graph calculations

Method used

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
View more

Image

Smart Image Click on the blue labels to locate them in the text.
Viewing Examples
Smart Image
  • A graph data storage and query method for large-scale social networks
  • A graph data storage and query method for large-scale social networks
  • A graph data storage and query method for large-scale social networks

Examples

Experimental program
Comparison scheme
Effect test

example 1

[0063] Example 1 data access instance

[0064] The present invention has tested the writing performance of the present invention and comparative system Redis and Neo4j under different concurrency, and the writing performance has taken the average value under ten data sets, wherein in the test image 3 Among them, the present invention is represented by NYNN.

[0065] Depend on image 3 It can be seen that the local writing performance of the present invention is obviously better than Redis and Neo4j, and the design of metadata, the data transmission format and the mechanism of data writing affect the writing performance. First of all, the division of graph data in the present invention adopts continuous equal-width vertex intervals, and the metadata can be cached locally through the first data write request, and then the metadata cached locally can be used for data addressing, which reduces the addressing IO overhead. When writing data, write it to the local memory-mapped f...

example 2

[0074] Instance 2 data update instance

[0075] This paper tests the data updating performance of the present invention and comparative systems Redis and Neo4j, and the performance takes the average value under ten data sets.

[0076] Such as Figure 8 As shown, for the incremental update of data, the present invention adopts a chained distribution method, which can eliminate the problem of frequent data movement caused by data insertion and deletion. However, Redis and Neo4j adopt the method of sequential allocation. As the data is updated, the data needs to be moved and relocated. During this process, new query processing cannot be performed.

example 3

[0077] Example 3 Data Remote Access Example

[0078] This paper tests the performance difference of the present invention with and without prefetch mechanism. The present invention provides multiple data prefetch mechanisms, which is convenient for the upper layer application to select the best data prefetch mode based on its own application background. This experiment uses BFS prefetching. The algorithm used in the experiment is to count the total number of edges in the graph data set. The idea of ​​the algorithm is: divide the vertices of the graph into n slices, and each slice specifies a thread for processing. All threads Parallel access to graph data in the present invention. The thread puts the vertices to be visited in the access request into the queue, then processes the vertices at the head of the queue, visits the neighborhood of the vertices, and counts the number of vertices and edges. If the end point of the current edge is located in the shard to which the thre...

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
Login to View More

PUM

No PUM Login to View More

Abstract

The invention discloses a large-scale social network-oriented graph data storage and query method. The data storage manager of the invention stores the received graph data in a Key-Value manner, uses the vertex ID of the graph data as the Key, and uses the vertex neighbor The field is Value; data storage for each vertex neighborhood: store multiple edges connected to the vertex neighborhood in a fixed-size memory block in order with time stamps, and form a doubly linked list, and attribute information of the vertex and index information are stored in a data structure. When the data storage manager receives an access request to visit vertex v, the data storage manager transmits the vertex v and its k-order neighbors to the requester; the requester caches the returned data locally, and the next query, first check The local cache, if there is no queried vertex, sends the access request to the data storage manager. The invention can satisfy dynamic update, and is suitable for processing sparse data scenarios and random access.

Description

technical field [0001] The invention relates to a large-scale social network-oriented graph data storage and query method, which belongs to the field of software technology. Background technique [0002] At present, the mainstream method of graph data storage is to preprocess the graph data into records of edges and vertices, and store them in large files in the distributed file system in the form of sequential data sets. When accessing graph data, the large files storing the graph data are accessed in a sequential scan. This organization method cannot provide effective data storage and access performance for multi-round iterative graph computing applications. In order to improve the access performance of graph data, the memory management technology of graph data has become an important trend, such as Trinity, Giraph, etc. [0003] Neo4j is a graph database using the Key-Value storage model. The most basic storage units are vertices and edges. When the breadth traverses BFS...

Claims

the structure of the environmentally friendly knitted fabric provided by the present invention; figure 2 Flow chart of the yarn wrapping machine for environmentally friendly knitted fabrics and storage devices; image 3 Is the parameter map of the yarn covering machine
Login to View More

Application Information

Patent Timeline
no application Login to View More
Patent Type & Authority Patents(China)
IPC IPC(8): G06F12/06
Inventor 周薇包秀国马宏远程工冉攀峰刘春阳王卿韩冀中庞琳李雄贺敏刘玮
Owner INST OF INFORMATION ENG CHINESE ACAD OF SCI
Who we serve
  • R&D Engineer
  • R&D Manager
  • IP Professional
Why Patsnap Eureka
  • Industry Leading Data Capabilities
  • Powerful AI technology
  • Patent DNA Extraction
Social media
Patsnap Eureka Blog
Learn More
PatSnap group products