Index building method for distributed memory columnar database

An index-building technique for in-memory columnar databases, applied in the database field, that addresses the large memory consumption of building a Groupkey index and improves import performance and parallelism.

Active Publication Date: 2019-01-29
UNIV OF ELECTRONICS SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

[0006] The problem to be solved by the present invention is the large memory consumption of building a Groupkey index for a distributed in-memory columnar database.

Embodiment

[0039] Figure 1 is a schematic structural diagram of the distributed in-memory columnar database involved in this embodiment. The database includes a domain controller DC (Domain Controller), an index server IS (Index Server), a data import management module IM (Import Manager), an in-memory database engine MDE (In Memory Database Engine), a storage node CS (Column Store), a data import system DIS (Data Import System), a data server DS, and a read component RA (Replication Agent). The domain controller DC is responsible for issuing data import tasks to the data server DS. The storage node CS is the service node that stores data in the in-memory database engine; it stores data and provides query functions to the upper layer, and it includes a row table storage node rowtableCS and at least one index storage node columnCS. The data server DS is a data import module in the data import system DIS, responsible for importing source data into the m...

Abstract

The invention discloses a method for establishing an index of a distributed in-memory columnar database, comprising: dividing single-column data into at least two data fragments; computing in parallel, for each data fragment, the intermediate data of the columnar compression index and the intermediate data of the row table vector, where the columnar compression index includes a dictionary vector, an index vector and a position vector; and storing and updating, in order, the intermediate data of the columnar compression index and the intermediate data of the row table vector of each data fragment to form the columnar compression index and the row table vector. With this method, importing a huge table does not require a large amount of memory on the computing nodes, which can save hardware costs.
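The split, parallel-compute and ordered-merge flow described above can be sketched in Python. This is only an illustration under stated assumptions, not the patented procedure: the abstract does not say what the intermediate data contain, so the sketch assumes each fragment yields a mapping from value to global row ids. It also assumes the index vector stores start offsets into the position vector, and it uses threads in one process in place of distributed nodes, so it shows the data flow rather than the memory saving. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_fragments(column, n):
    """Split a column into up to n contiguous fragments, each tagged with its first row id."""
    size = max(1, -(-len(column) // n))  # ceiling division, at least 1
    return [(start, column[start:start + size])
            for start in range(0, len(column), size)]


def fragment_intermediate(fragment):
    """Assumed intermediate data for one fragment: value -> ascending global row ids."""
    start, values = fragment
    groups = {}
    for offset, value in enumerate(values):
        groups.setdefault(value, []).append(start + offset)
    return groups


def merge_intermediates(intermediates, n_rows):
    """Fold the per-fragment results, in fragment order, into the final vectors."""
    merged = {}
    for groups in intermediates:  # fragment order keeps row ids ascending
        for value, rowids in groups.items():
            merged.setdefault(value, []).extend(rowids)

    dictionary = sorted(merged)   # dictionary vector: sorted distinct values
    index = [0]                   # assumed layout: start offsets into position
    position = []                 # row ids grouped by dictionary entry
    rowtable = [0] * n_rows       # per row: subscript of its value in dictionary
    for code, value in enumerate(dictionary):
        for rowid in merged[value]:
            rowtable[rowid] = code
        position.extend(merged[value])
        index.append(len(position))
    return dictionary, index, position, rowtable


def build_parallel(column, n_fragments=2):
    """Build the four vectors for one column from n_fragments >= 1 fragments."""
    fragments = split_fragments(column, n_fragments)
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in input order, which the merge relies on.
        intermediates = list(pool.map(fragment_intermediate, fragments))
    return merge_intermediates(intermediates, len(column))
```

Because the merge walks the fragments in their original order, the row ids under each dictionary value stay ascending without a final sort.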

Description

Technical field

[0001] The invention relates to the technical field of databases, and in particular to an index establishment method for a distributed in-memory columnar database.

Background

[0002] The Groupkey index, that is, the columnar compression index, is a data organization method in a distributed in-memory columnar database. It compresses the content of each column with dictionary compression, uses an index vector to index the number of rows corresponding to each value in the dictionary vector, and uses a position vector to store the row numbers (rowid) corresponding to the entries of the dictionary vector. In addition, a row table (rowtable) vector maintains the row relationship by storing, for each element value, its subscript in the dictionary vector. The traditional method of establishing a Groupkey index for a distributed in-memory columnar database is as follows: read data from the data source into memory; sort and de-duplicate the data ...
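To make the four vectors concrete, here is a minimal single-node Python sketch of a Groupkey index for one column, following the traditional approach of reading the whole column into memory and then sorting and de-duplicating it. The source is ambiguous about the exact layout of the index vector; this sketch assumes it holds, for each dictionary entry, the start offset of that entry's row ids in the position vector, plus a final end offset. The function name is hypothetical.

```python
from bisect import bisect_left


def build_groupkey_index(column):
    """Return (dictionary, index, position, rowtable) for one in-memory column."""
    # Dictionary vector: the sorted, de-duplicated column values.
    dictionary = sorted(set(column))

    # Row table vector: for each row, the subscript of its value in the dictionary.
    rowtable = [bisect_left(dictionary, value) for value in column]

    # Count how many rows carry each dictionary entry.
    counts = [0] * len(dictionary)
    for code in rowtable:
        counts[code] += 1

    # Index vector (assumed layout): running start offsets into position.
    index = [0] * (len(dictionary) + 1)
    for code, count in enumerate(counts):
        index[code + 1] = index[code] + count

    # Position vector: row ids grouped by dictionary entry, ascending within a group.
    position = [0] * len(column)
    cursor = index[:-1]  # slicing copies, so index itself is not modified
    for rowid, code in enumerate(rowtable):
        position[cursor[code]] = rowid
        cursor[code] += 1

    return dictionary, index, position, rowtable
```

Under this layout, the rows holding `dictionary[k]` are `position[index[k]:index[k + 1]]`, and the value of row `r` is `dictionary[rowtable[r]]`. Every step here needs the full column in memory at once, which is the cost the invention sets out to reduce.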

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/22
CPC: G06F16/2272
Inventor: 段翰聪, 闵革勇, 钟红霞, 王瑾, 李林, 郑松, 张博
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA