
Large-scale graph data query method in distributed environment based on Datalog

A Datalog-based query method for large-scale graph data in a distributed environment. It addresses two problems: writing graph data processing scripts is cumbersome and inefficient for users, and query performance on large-scale data struggles to meet application requirements. The method achieves inter-rule optimization, intra-rule optimization, and operation function optimization.

Inactive Publication Date: 2015-03-04
PEKING UNIV
Cites: 2. Cited by: 0.

AI Technical Summary

Problems solved by technology

[0007] The present invention uses the existing, relatively mature MapReduce distributed computing framework to query large-scale data. It targets two problems: query performance on large-scale data under the existing framework struggles to meet application requirements, and writing graph data processing scripts is cumbersome and inefficient for users. To address them, it proposes a Datalog-based method for large-scale data query in a MapReduce distributed environment.
[0008] Existing large-scale data management systems require users to have strong professional knowledge, and query performance on large-scale data under the existing MapReduce framework struggles to meet application requirements. To solve these problems, the present invention proposes a large-scale data query method in a distributed environment based on Datalog, the steps of which include:

Examples


Embodiment Construction

[0033] The specific implementation steps and detailed methods are described below.

[0034] This implementation is carried out on the Hadoop platform and mainly considers the design of a descriptive query language, the construction of an execution plan, and the optimization of query execution. The overall design architecture of the invention is given first, with an explanation of what each part of the framework is responsible for; the design and implementation of the modules unique to the invention are then explained in detail.
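To make the idea of a descriptive query language concrete, the following is a minimal sketch of how a single Datalog rule might be parsed into a small syntax tree. It is an illustration only, not the parser described in the patent: the `Atom` and `Rule` classes, the function names, and the `reach`/`edge` predicates are all hypothetical, and the sketch handles only plain atoms with variable or constant arguments.

```python
import re
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Atom:
    """A predicate applied to arguments, e.g. edge(X, Y). Hypothetical type."""
    predicate: str
    args: Tuple[str, ...]


@dataclass(frozen=True)
class Rule:
    """head :- body1, body2, ...  A fact is a rule with an empty body."""
    head: Atom
    body: Tuple[Atom, ...]


# Matches one atom such as "edge(X, Y)" with optional surrounding whitespace.
ATOM_RE = re.compile(r"\s*([A-Za-z_]\w*)\s*\(([^()]*)\)\s*")
# Finds every atom inside a rule body, so commas between atoms need no special handling.
BODY_ATOM_RE = re.compile(r"[A-Za-z_]\w*\s*\([^()]*\)")


def parse_atom(text: str) -> Atom:
    match = ATOM_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not an atom: {text!r}")
    args = tuple(arg.strip() for arg in match.group(2).split(","))
    return Atom(match.group(1), args)


def parse_rule(text: str) -> Rule:
    text = text.strip()
    if not text.endswith("."):
        raise ValueError("a rule must end with '.'")
    head_text, separator, body_text = text[:-1].partition(":-")
    head = parse_atom(head_text)
    if not separator:
        return Rule(head, ())  # a fact: no body
    body = tuple(parse_atom(a) for a in BODY_ATOM_RE.findall(body_text))
    return Rule(head, body)


def is_recursive(rule: Rule) -> bool:
    """A rule is directly recursive when its head predicate appears in its body."""
    return any(atom.predicate == rule.head.predicate for atom in rule.body)


rule = parse_rule("reach(X, Y) :- reach(X, Z), edge(Z, Y).")
print(rule.head.predicate, [a.predicate for a in rule.body], is_recursive(rule))
```

Flagging recursive rules at parse time is useful in a sketch like this because recursive rules need iterated evaluation, while non-recursive rules can be evaluated in a single pass.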

[0035] The method of the present invention requires efficient management of large graph data on Hadoop, provides a descriptive query language for end users, and optimizes the execution of descriptive queries as far as possible. To meet these requirements, the present invention proposes the system framework shown in Figure 1. As can be seen from Figure 1, this met...


Abstract

The invention discloses a large-scale graph data query method in a distributed environment based on Datalog. The method comprises the following steps: 1) parsing a large-scale graph query instruction, expressed as a set of Datalog rules input by a user, and producing a corresponding syntax tree; 2) constructing, from the syntax tree, an execution plan in which each Datalog rule is one unit, and constructing a corresponding Map execution function and Reduce execution function for each Datalog rule; and 3) applying inter-rule optimization, intra-rule optimization, and operation function optimization, using equivalence rules and statistical data, to improve the efficiency of the large-scale graph query execution plan. The method reduces the cost for end users of writing graph query scripts: it provides an extended recursive Datalog query language, so that users can express large-scale graph queries in a simple descriptive language. The invention also provides a method for constructing a MapReduce execution plan for recursive Datalog queries, so that Datalog graph queries can be executed under a MapReduce framework.
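As an illustration of step 2, the sketch below evaluates one recursive Datalog rule, `reach(X, Y) :- reach(X, Z), edge(Z, Y).`, with a map function and a reduce function, and iterates until no new facts appear. It is a single-process simulation under stated assumptions, not the patent's execution plan and not Hadoop code: the predicate names, the function names, and the choice to feed only newly derived tuples into each round (semi-naive evaluation) are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Pair = Tuple[str, str]
Tagged = Tuple[str, str]  # (source relation name, remaining column)


def map_join(relation: str, tup: Pair) -> Iterator[Tuple[str, Tagged]]:
    """Map side of the join: key every tuple by the shared variable Z."""
    if relation == "reach":      # reach(X, Z): key on the second column
        x, z = tup
        yield z, ("reach", x)
    elif relation == "edge":     # edge(Z, Y): key on the first column
        z, y = tup
        yield z, ("edge", y)


def reduce_join(key: str, values: List[Tagged]) -> Iterator[Pair]:
    """Reduce side: pair every reach source with every edge target for one Z."""
    sources = [v for tag, v in values if tag == "reach"]
    targets = [v for tag, v in values if tag == "edge"]
    for x in sources:
        for y in targets:
            yield (x, y)         # derived fact reach(X, Y)


def run_job(inputs: Iterable[Tuple[str, Pair]]) -> Set[Pair]:
    """Simulate one MapReduce job: map, group by key (shuffle), then reduce."""
    groups: Dict[str, List[Tagged]] = defaultdict(list)
    for relation, tup in inputs:
        for key, value in map_join(relation, tup):
            groups[key].append(value)
    derived: Set[Pair] = set()
    for key, values in groups.items():
        derived.update(reduce_join(key, values))
    return derived


def transitive_closure(edges: Set[Pair]) -> Set[Pair]:
    """Base rule reach(X, Y) :- edge(X, Y), then iterate the recursive rule."""
    reach = set(edges)
    delta = set(edges)           # facts that were new in the previous round
    while delta:
        inputs = [("reach", t) for t in delta] + [("edge", t) for t in edges]
        delta = run_job(inputs) - reach
        reach |= delta
    return reach


edges = {("a", "b"), ("b", "c"), ("c", "d")}
print(sorted(transitive_closure(edges)))
```

Each loop iteration corresponds to one simulated job. The loop stops when a round derives nothing new, which is the fixpoint condition for a recursive rule and also guarantees termination on cyclic graphs.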

Description

Technical field

[0001] The invention relates to querying large-scale data in a distributed environment, in particular to a Datalog-based method for querying large-scale data in a distributed environment, and belongs to the field of information technology.

Background

[0002] Graphs are used more and more widely in modern society. The rapid development of technologies in fields such as social networks, bioinformatics, and traffic navigation has produced large-scale graph data. Managing this data effectively faces many challenges. First, the traditional single-machine computing model struggles to support the management of large-scale data: the storage capacity of a single machine is limited, so it is difficult to load the entire data set into memory, and its processing capability is insufficient to effectively support complex operations on large graph data; secondly, the ...


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F17/30
Inventor: 高军, 周家帅, 王腾蛟, 杨冬青, 唐世渭
Owner: PEKING UNIV