Associated data compressing method friendly to query

A query-friendly compression method for associated (linked) data, applied in the field of big data. It addresses the reduced query efficiency and aggravated performance problems caused by large data sets, and improves the compression ratio.

Active Publication Date: 2017-05-24
WUHAN UNIV OF SCI & TECH +1

AI Technical Summary

Problems solved by technology

Although more and more storage media are available for increasingly large linked data sets, large data sets not only lower query efficiency but also aggravate performance problems in other common processes, such as RDF publishing and exchange.


Examples

Embodiment Construction

[0047] The technical solution of the present invention will be described in detail below in conjunction with the drawings and embodiments.

[0048] The technical solution provided by the present invention is an associated data set compression algorithm based on a relational matrix, specifically comprising the following steps:

[0049] 1. Define the in-memory model of a triple, consisting of three data fields: subject S, predicate P, and object O;

[0050] 2. Read the associated data in N-Triples format and parse it into a set of triples;

[0051] The detailed process is as follows:

[0052] 2.1. Filter out empty lines and lines starting with "#";

[0053] 2.2. Read each line of data and split the string on spaces;

[0054] 2.3. Map the split fields to the subject, predicate, and object to construct a triple;

[0055] 3. Build a dictionary and assign IDs to the triples;

[0056] The detailed process is as follows:

[0057] 3.1. Flatten the triples obtained in the previous s...
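Steps 1 to 3 above can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the names `Triple`, `parse_ntriples`, and `build_dictionary` are hypothetical, the split is capped at three fields so that literals containing spaces stay intact, and because the source text is cut off after step 3.1, the ID assignment (one shared dictionary, IDs in sorted term order) is an assumption.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Triple(NamedTuple):
    """Step 1: in-memory model with subject, predicate, and object fields."""
    subject: str
    predicate: str
    obj: str


def parse_ntriples(lines: Iterable[str]) -> List[Triple]:
    """Step 2: turn N-Triples lines into a list of Triple values."""
    triples = []
    for raw in lines:
        line = raw.strip()
        # Step 2.1: skip empty lines and comment lines.
        if not line or line.startswith("#"):
            continue
        # Drop the statement terminator "." that ends an N-Triples line.
        if line.endswith("."):
            line = line[:-1].rstrip()
        # Step 2.2: split on whitespace, at most three fields
        # (assumption: keeps literals with spaces in one piece).
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # malformed line, ignored in this sketch
        # Step 2.3: map the fields to subject, predicate, object.
        triples.append(Triple(*parts))
    return triples


def build_dictionary(
    triples: Iterable[Triple],
) -> Tuple[List[str], List[Tuple[int, int, int]]]:
    """Step 3 (assumed form): flatten terms, assign IDs, encode triples."""
    triples = list(triples)
    # Flatten all terms into one set, then number them in sorted order.
    terms = sorted({term for triple in triples for term in triple})
    term_to_id = {term: i for i, term in enumerate(terms)}
    encoded = [
        (term_to_id[s], term_to_id[p], term_to_id[o]) for s, p, o in triples
    ]
    return terms, encoded
```

The returned `terms` list doubles as the reverse mapping: `terms[i]` gives back the term whose ID is `i`.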


Abstract

The invention relates to a query-friendly compression method for associated data. The method comprises the following steps: defining a relation mining rule and mining the latent association relations among triples; defining a compressed query memory model consisting of a subject vector, a predicate vector, and an object matrix; defining a serialization format for the compressed query memory model, with serialization and deserialization implemented using three auxiliary symbols; defining how SPARQL queries are executed on the compressed query memory model, where subjects and predicates are looked up by binary search and objects by linear traversal; and defining a scheme that addresses slow queries caused by an overly large object matrix by dividing a large data block into several smaller data blocks. Compared with most existing compression schemes, a data set processed by this method has a higher compression ratio and can be queried with SPARQL directly in its compressed state.
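The lookup strategy named in the abstract (binary search for subjects and predicates, linear traversal for objects) can be illustrated with a small sketch. The abstract does not give the exact layout of the subject vector, predicate vector, and object matrix, so the structure below is an assumption: a sorted subject list, one sorted predicate list per subject, and one object row per (subject, predicate) pair. The class name `CompressedStore` and its methods are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


class CompressedStore:
    """Assumed layout: sorted subjects, sorted predicates per subject,
    and a row of objects for each (subject, predicate) pair."""

    def __init__(self, triples):
        grouped = defaultdict(lambda: defaultdict(list))
        for s, p, o in triples:
            grouped[s][p].append(o)
        self.subjects = sorted(grouped)   # subject vector
        self.predicates = []              # predicates[i] belongs to subjects[i]
        self.objects = []                 # objects[i][j] is the object row
        for s in self.subjects:
            preds = sorted(grouped[s])
            self.predicates.append(preds)
            self.objects.append([grouped[s][p] for p in preds])

    @staticmethod
    def _find(sorted_seq, key):
        """Binary search; returns the index of key, or -1 if absent."""
        i = bisect_left(sorted_seq, key)
        return i if i < len(sorted_seq) and sorted_seq[i] == key else -1

    def objects_of(self, s, p):
        """Resolve the pattern (s, p, ?o)."""
        i = self._find(self.subjects, s)
        if i < 0:
            return []
        j = self._find(self.predicates[i], p)
        if j < 0:
            return []
        return list(self.objects[i][j])

    def contains(self, s, p, o):
        """Resolve the pattern (s, p, o) with a linear scan over objects."""
        for candidate in self.objects_of(s, p):
            if candidate == o:
                return True
        return False
```

For example, with dictionary-encoded triples `[(1, 10, 100), (1, 10, 101), (2, 10, 100)]`, `objects_of(1, 10)` walks two binary searches and then returns the row `[100, 101]`. The linear object scan is what the abstract's block-splitting step is meant to keep short when rows grow large.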

Description

Technical field

[0001] The invention relates to the field of big data and is used for the storage, transmission, and querying of massive RDF, LOD, and knowledge-graph data. In particular, it relates to a query-friendly compression method for associated (linked) data.

Background art

[0002] Many compression schemes for associated data exist, but most of them are not query-friendly. The generally accepted scheme is HDT, which achieves a high compression ratio but requires the data to be decompressed before it can be queried. Inspired by HDT, many derived compression techniques have been proposed, such as HDT FoQ, WaterFowl, and HDT++. These techniques share a common trait: a high compression ratio but poor support for queries.

[0003] There are also some query-friendly schemes, such as the BitMat method. This compression scheme uses a three-dimensional matrix to express triple relations...


Application Information

IPC(8): G06F17/30
CPC: G06F16/33, G06F16/374
Inventor: 顾进广, 彭燊, 黄智生, 符海东, 梅琨
Owner WUHAN UNIV OF SCI & TECH