Index building method for distributed memory columnar database

An index building technology for in-memory columnar databases, applied in the database field, that addresses the large memory consumption of building the Groupkey index and improves import performance and parallelism.

Active Publication Date: 2016-08-10
UNIV OF ELECTRONICS SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

[0006] The problem to be solved by the present invention is the large memory consumption incurred when building the Groupkey index of a distributed memory columnar database.



Examples


Embodiment

[0039] Figure 1 is a schematic structural diagram of the distributed memory columnar database involved in this embodiment. The database includes a domain controller DC (Domain Controller), an index server IS (Index Server), a data import management module IM (Import Manager), an in-memory database engine MDE (In Memory Database Engine), a storage node CS (Column Store), a data import system DIS (Data Import System), a data server DS and a read component RA (Replication Agent). The domain controller DC is responsible for issuing data import tasks to the data server DS. The storage node CS is the service node that stores data in the in-memory database engine; it stores data and provides query functions to the upper layer, and includes a row table storage node rowtableCS and at least one index storage node columnCS. The data server DS is a data import module in the data import system DIS, responsible for importing source data into the m...



Abstract

The invention discloses an index building method for a distributed memory columnar database. The method comprises the following steps: segmenting a single column of data into at least two data fragments; computing, in parallel, the intermediate data of the columnar compression index and the intermediate data of the row table vector for each data fragment, wherein the columnar compression index includes a dictionary vector, an index vector and a position vector; and sequentially storing and updating the intermediate data of each fragment, in order, to form the final columnar compression index and row table vector. With this method, the computing nodes that import a very large table do not require an excessively large memory, so the hardware cost can be reduced.
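The three steps in the abstract (segment, compute per-fragment intermediate data in parallel, merge sequentially) can be illustrated with a minimal Python sketch. This is not the patented implementation: the function names, the choice of a value-to-row-id map as the "intermediate data", and the use of a thread pool are all assumptions made for illustration. The index vector is assumed here to hold the starting offset of each dictionary value's row ids inside the position vector.

```python
from concurrent.futures import ThreadPoolExecutor


def split_column(column, n_fragments):
    """Segment one column into (start_rowid, fragment) pairs (hypothetical helper)."""
    size = max(1, -(-len(column) // n_fragments))  # ceiling division
    return [(start, column[start:start + size])
            for start in range(0, len(column), size)]


def fragment_intermediate(start, fragment):
    """Assumed intermediate data: value -> global row ids seen in this fragment."""
    groups = {}
    for offset, value in enumerate(fragment):
        groups.setdefault(value, []).append(start + offset)
    return groups


def build_index(column, n_fragments=2):
    fragments = split_column(column, n_fragments)

    # Step 2: per-fragment intermediate data, computed concurrently.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda f: fragment_intermediate(*f), fragments))

    # Step 3: merge the intermediate data sequentially, in fragment order,
    # so row ids for each value stay ascending.
    merged = {}
    for groups in partials:
        for value, rowids in groups.items():
            merged.setdefault(value, []).extend(rowids)

    dictionary = sorted(merged)           # distinct values, sorted
    index, position = [], []
    for value in dictionary:
        index.append(len(position))       # assumed: start offset into position
        position.extend(merged[value])    # row ids grouped by dictionary value

    rowtable = [0] * len(column)          # per row: subscript into dictionary
    for code, value in enumerate(dictionary):
        for rowid in merged[value]:
            rowtable[rowid] = code
    return dictionary, index, position, rowtable
```

In this sketch each worker only ever holds one fragment's intermediate data, which is the property the abstract relies on to keep per-node memory small. A thread pool is used only to keep the example short; in CPython a process pool or separate nodes would be needed for real CPU parallelism.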

Description

Technical field

[0001] The invention relates to the technical field of databases, in particular to an index building method for a distributed memory columnar database.

Background

[0002] The Groupkey index, that is, the columnar compression index, is a data organization method used in distributed memory columnar databases. It compresses the content of each column with dictionary compression, uses an index vector to index the number of rows corresponding to each value in the dictionary vector, and uses a position vector to store the row numbers (rowid) corresponding to the dictionary vector. In addition, a row table (rowtable) vector maintains the row relationship by storing, for each element, the subscript of its value in the dictionary vector. The traditional method of building a Groupkey index for a distributed memory columnar database is as follows: read data from the data source into memory; sort and de-duplicate the data ...
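The four vectors described in the background can be made concrete with a small single-node Python sketch. The layout is an interpretation rather than the patent's definition: the index vector is assumed to store, for each dictionary entry, the starting offset of its row ids in the position vector (a prefix sum of per-value row counts), and the names `build_groupkey` and `rows_for` are hypothetical.

```python
from bisect import bisect_left


def build_groupkey(column):
    """Build dictionary, index, position and row table vectors for one column."""
    dictionary = sorted(set(column))                 # sorted, de-duplicated values

    # Row table: for every row, the subscript of its value in the dictionary.
    rowtable = [bisect_left(dictionary, v) for v in column]

    # Count rows per dictionary entry, then prefix-sum into start offsets.
    counts = [0] * len(dictionary)
    for code in rowtable:
        counts[code] += 1
    index, offset = [], 0
    for c in counts:
        index.append(offset)                         # assumed meaning of "index"
        offset += c

    # Position: row ids grouped by dictionary entry, ascending within a group.
    position = [0] * len(column)
    cursor = list(index)
    for rowid, code in enumerate(rowtable):
        position[cursor[code]] = rowid
        cursor[code] += 1
    return dictionary, index, position, rowtable


def rows_for(value, dictionary, index, position):
    """Return the row ids holding `value`, or an empty list if it is absent."""
    code = bisect_left(dictionary, value)
    if code == len(dictionary) or dictionary[code] != value:
        return []
    end = index[code + 1] if code + 1 < len(index) else len(position)
    return position[index[code]:end]
```

For the column `["b", "a", "c", "a", "b"]` this yields the dictionary `["a", "b", "c"]`, and `rows_for("a", ...)` returns rows 1 and 3. Note that the whole column is held in memory at once here, which is the cost the background section attributes to the traditional approach.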


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/30
CPC: G06F16/2272
Inventor: 段翰聪, 闵革勇, 钟红霞, 王瑾, 李林, 郑松, 张博
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA