
Gray-code-based distributed data layout method and query method

The method applies distributed data storage and Gray coding in the database field. It addresses the uneven development of computer components, the overly scattered physical storage of multi-attribute continuous data, and the complexity of query and analysis tasks, and it improves the sequential access rate.

Inactive Publication Date: 2013-01-23
EAST CHINA NORMAL UNIVERSITY
0 Cites, 27 Cited by

AI Technical Summary

Problems solved by technology

Current business intelligence data analysis (such as credit card customer churn analysis and insurance fraud analysis) usually relies on online analytical processing (OLAP), data mining, enterprise performance management, predictive analysis, text mining and other advanced query and analysis technologies to provide decision support. These query and analysis tasks are extremely complex.
Massive data storage, combined with complex query analysis and processing, therefore poses great challenges to existing database systems.
[0005] 2) Imbalance in the development of computer components.
[0009] The present invention overcomes the following defects of the prior art: a single index cannot support multi-attribute queries, the physical storage of multi-attribute continuous data is too scattered, and distributed storage scales poorly. It proposes a distributed data layout method and a query method based on Gray coding. The layout method is efficient and highly scalable: it preserves the locality of multi-dimensional data through Gray coding, and it builds on the master/slave structure of a distributed file system to achieve highly scalable data storage.



Examples


Embodiment 1

[0048] Example 1: Overview

[0049] The data table is projected, as required, into a set of projections, where each projection consists of a set of attributes. Without loss of generality, we consider one such projection in detail and assume that it consists of a group of attributes.

[0050] To fit currently popular distributed file system architectures (such as HDFS), our data layout also adopts a master-slave architecture, as shown in Figure 1. The master (Namenode) stores a content-aware index expressed in Gray code, so that a query request can be routed directly to a specific data block according to the index, which makes queries efficient. The slaves (i.e., Datanodes) store the fragments of the split data table in index order in a distributed manner. Each fragment has two parts: a fragment header and payload data. For the fragment header, the statistical information of t...

Embodiment 2

[0051] Example 2: Content-Aware Bitmap Indexing

[0052] The content-aware bitmap index proposed in this invention has a dual role: first, it indicates the content of multidimensional data records (i.e., it acts as a bitmap index); second, it indicates their location. The index is created from the content of the records. The following describes how to create an index based on record content.

[0053] Encoding of numeric attribute values:

[0054] The system supports various numeric attributes, including continuous numeric attributes with known ranges, segmented numeric attributes with known ranges, and numeric attributes with unknown ranges.

[0055] For continuous numeric attributes with a known value range, the value range of the attribute is evenly divided into several segments. The number of segments is a power of 2, and its size is determined by the system (will be introduced when introducing the multi-attribute data c...
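
The segment encoding described in [0055] can be sketched as follows. This is only an illustrative sketch, not the patent's implementation: the function names `gray_encode`, `segment_index` and `index_code`, the closed range `[lo, hi]`, and the rule that a value equal to `hi` falls into the last segment are all assumptions made for the example. The Gray code used is the standard reflected binary Gray code.

```python
def gray_encode(n: int) -> int:
    """Standard reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def segment_index(value: float, lo: float, hi: float, bits: int) -> int:
    """Index of the equal-width segment of [lo, hi] that contains value.

    The range is split into 2**bits segments (a power of 2, as in [0055]).
    Assumption: value == hi is assigned to the last segment.
    """
    if not lo <= value <= hi:
        raise ValueError("value lies outside the known range")
    segments = 1 << bits
    idx = int((value - lo) / (hi - lo) * segments)
    return min(idx, segments - 1)


def index_code(value: float, lo: float, hi: float, bits: int) -> str:
    """Gray-coded segment number of value, as a fixed-width bit string."""
    return format(gray_encode(segment_index(value, lo, hi, bits)), f"0{bits}b")


if __name__ == "__main__":
    # Hypothetical attribute with range [0, 100] split into 4 segments.
    for v in (0, 30, 60, 100):
        print(v, index_code(v, 0, 100, 2))
```

Because consecutive Gray codes differ in exactly one bit, values in neighbouring segments receive index codes that differ in a single bit position, which is the locality property the layout relies on.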

Embodiment 3

[0066] Example 3: Index Construction

[0067] This embodiment illustrates the effective storage of data in the present invention, that is, the implementation of content-aware indexing with minimal additional storage overhead.

[0068] Popular distributed file systems (such as HDFS and GFS) all have a master/slave architecture, because this architecture effectively simplifies the design of a distributed file system. The master stores metadata (i.e., file name, file storage location, file size, backup, etc.), while the slaves store payload data (i.e., the data that actually needs to be stored). Based on this master/slave architecture, the system proposes a two-layer index: one layer indexes projection content and is called the projection-level index; the other layer indexes fragment content and is called the fragment-level index.

[0069] As ...


Abstract

The invention belongs to the technical field of databases and discloses a Gray-code-based distributed data layout method. The method comprises the steps of: dividing the range of each attribute into a plurality of equal portions; encoding the portions in Gray code order; marking each attribute value of a tuple comprising a plurality of attributes with the Gray code of the portion that contains it, namely the index code of the attribute value; and forming an index key value of the tuple by mixing the index codes of all attribute values in the tuple. The tuples are laid out in Gray code order, and the layout is deployed on a distributed system: the content-aware bitmap index is implemented on the master of the system, with the content awareness stored in file names, while the physical storage of the data and the statistical index of the data are implemented on the slaves. The invention also discloses a query method for a database built with this method. The resulting data layout supports data processing such as exact-match search, range search, multi-dimensional search, multi-attribute search and aggregate analysis, and the method achieves high disk access efficiency.
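
The key construction in the abstract can be illustrated with a short sketch. The abstract does not say how the per-attribute index codes are "mixed", so the bit interleaving used here is an assumption, as is ordering tuples by the rank of the mixed key in the Gray sequence. All function names are hypothetical, and each tuple is represented by the segment numbers of its attribute values.

```python
def gray_encode(n: int) -> int:
    """Standard reflected binary Gray code."""
    return n ^ (n >> 1)


def gray_decode(g: int) -> int:
    """Inverse of gray_encode: the rank of g in the Gray sequence."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def interleave(codes: list[int], bits: int) -> int:
    """Mix codes by taking one bit from each attribute in turn, MSB first.

    Assumption: the patent only says the codes are mixed; interleaving is
    one common way to do this.
    """
    key = 0
    for b in range(bits - 1, -1, -1):
        for c in codes:
            key = (key << 1) | ((c >> b) & 1)
    return key


def index_key(segments: tuple[int, ...], bits: int) -> int:
    """Index key of a tuple given the segment number of each attribute."""
    return interleave([gray_encode(s) for s in segments], bits)


def layout(tuples: list[tuple[int, ...]], bits: int) -> list[tuple[int, ...]]:
    """Order tuples by the position of their key in the Gray sequence."""
    return sorted(tuples, key=lambda t: gray_decode(index_key(t, bits)))


if __name__ == "__main__":
    # Two attributes, one bit each: a 2 x 2 grid of segments.
    print(layout([(1, 0), (1, 1), (0, 1), (0, 0)], bits=1))
```

Under these assumptions, neighbouring tuples in the layout have keys that differ in exactly one bit, so consecutive storage positions hold tuples from adjacent cells of the multi-attribute grid.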

Description

Technical field

[0001] The invention belongs to the technical field of databases, and in particular relates to an efficient distributed data layout method and a query method.

Background

[0002] In the 1970s, the successful development of IBM's System R and of Ingres at the University of California, Berkeley proved the superiority of relational database systems for processing commercial data. In the 1980s, the vigorous development of transaction processing (OLTP) database systems derived from this model, such as IBM DB2, Sybase SQL Server, Oracle Database, and INFORMIX-SQL, led to the full commercialization of database systems and created a huge market. In the 1990s, the data warehouse system proposed by W. H. Inmon to integrate historical data and realize business intelligence services such as business planning and decision support through online analysis (OLAP) and data mining methods opened up a new chapter for the application of d...


Application Information

IPC(8): G06F17/30
Inventors: Zhou Minqi (周敏奇), Zhou Aoying (周傲英)
Owner: EAST CHINA NORMAL UNIVERSITY