A distributed data storage structure, data storage method, and data query method

A distributed data storage structure technology, applied in database indexing, structured data retrieval, special data processing applications, and similar fields, which addresses the problems of index range expansion and reduced index accuracy.

Active Publication Date: 2019-06-04
UNIV OF ELECTRONICS SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

When data values are relatively large and many values fall into the same index interval, the index range expands and the index accuracy decreases.
Taking the value 998 as an example, there are theoretically 2^8 values in the same index range as 998, so the accuracy is very low.
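
To make the 2^8 figure concrete, here is a minimal sketch. It assumes, purely for illustration, that the index key is obtained by discarding the low 8 bits of the value; the excerpt does not say how the prior-art index is actually built, and the names `LOW_BITS`, `index_key`, and `index_interval` are hypothetical.

```python
# Assumption (not from the patent text): the index key keeps only the
# high bits of a value, dropping the low 8 bits.
LOW_BITS = 8


def index_key(value: int) -> int:
    """Return the coarse index key for a value."""
    return value >> LOW_BITS


def index_interval(value: int) -> range:
    """Return every value that shares an index key with `value`."""
    start = index_key(value) << LOW_BITS
    return range(start, start + (1 << LOW_BITS))


interval = index_interval(998)
# 998 shares its key with all values from 768 to 1023, i.e. 2**8 candidates.
print(index_key(998), interval.start, interval.stop - 1, len(interval))
```

Under this assumption, a lookup for 998 can only narrow the search to 256 candidate values, which is the loss of accuracy described above.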


Examples

Embodiment 1

[0083] A distributed data storage structure, as shown in Figure 1, includes:

[0084] Master control node: used to establish the mapping relationship between each data storage unit Block and the physical machine where it is located, collect statistics on the global load, and generate the ID of each data storage unit Block;

[0085] Data import manager: used to buffer external data, sort the external data by data value, generate the index Groupkey and the data, then generate data storage unit Blocks to store the index Groupkey and the data, and finally import the data storage unit Blocks into the storage nodes;

[0086] Storage node: stores the data storage unit Blocks and provides the query function to the querier;

[0087] The storage node includes a sub-metadata manager, a data storage unit Block, and a data reader module;

[0088] Sub-metadata manager: used to maintain the mapping from the column uniquely determined by the database name, table name, and column name inside the storage node to the data sto...
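
The roles listed above can be sketched as plain data structures. This is a rough illustration under stated assumptions, not the patented design: the class and function names (`Block`, `StorageNode`, `MasterNode`, `import_column`) are hypothetical, load is approximated as the number of Blocks per host, and Groupkey generation is omitted because the excerpt does not define it.

```python
from collections import Counter
from dataclasses import dataclass, field
from itertools import count
from typing import Iterator

Column = tuple[str, str, str]  # (database name, table name, column name)


@dataclass
class Block:
    block_id: int
    column: Column
    values: list[int]  # data values, already sorted by the import manager


@dataclass
class StorageNode:
    host: str
    # Sub-metadata manager: column -> IDs of the Blocks held on this node.
    sub_metadata: dict[Column, list[int]] = field(default_factory=dict)
    blocks: dict[int, Block] = field(default_factory=dict)

    def store(self, block: Block) -> None:
        self.blocks[block.block_id] = block
        self.sub_metadata.setdefault(block.column, []).append(block.block_id)


@dataclass
class MasterNode:
    block_to_host: dict[int, str] = field(default_factory=dict)
    _ids: Iterator[int] = field(default_factory=lambda: count(1), repr=False)

    def new_block_id(self) -> int:
        return next(self._ids)

    def register(self, block_id: int, host: str) -> None:
        self.block_to_host[block_id] = host

    def load(self) -> dict[str, int]:
        # Assumed load metric: number of Blocks per physical machine.
        return dict(Counter(self.block_to_host.values()))


def import_column(master: MasterNode, node: StorageNode, column: Column,
                  raw_values: list[int], rows_per_block: int) -> list[int]:
    """Data import manager: sort values, cut them into Blocks, ship them."""
    ordered = sorted(raw_values)
    block_ids = []
    for start in range(0, len(ordered), rows_per_block):
        block = Block(master.new_block_id(), column,
                      ordered[start:start + rows_per_block])
        node.store(block)
        master.register(block.block_id, node.host)
        block_ids.append(block.block_id)
    return block_ids
```

For example, importing five values with two rows per Block into a single node yields three Blocks, all registered with the master under that node's host.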

Embodiment 2

[0123] A data storage method based on the distributed data storage structure.

[0124] S1. The master control node determines the number of rows and columns of the table in each data storage unit Block according to the width and number of columns of each table, as well as the IP of the storage node to which the Block is to be sent, and informs the data import manager of this information;

[0125] S2. The data import manager sorts the data read from the external data source, generates a data dictionary, and determines the compressed bit width or byte width of the index vector index, the position vector position, and the row table vector rowtable;

[0126] S3. The data import manager generates compressed index vector index, position vector position, and row table vector rowtable according to the compressed bit width or byte width, and collects metadata at the same time. According to the internal design of the data storage unit Block, fill in the header information and the above data body...
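
Steps S2 and S3 resemble dictionary encoding of a column. The sketch below is one plausible reading, not the patent's layout: it assumes `index` holds one dictionary code per row, `rowtable` lists row IDs grouped by code, and `position` gives the start offset of each code's group in `rowtable`. The function name `encode_column` and these semantics are assumptions.

```python
from bisect import bisect_left


def encode_column(values: list[int]):
    """Dictionary-encode one column (assumed semantics, see lead-in)."""
    # Data dictionary: sorted distinct values.
    dictionary = sorted(set(values))
    # Bits needed to store codes 0 .. len(dictionary) - 1 (at least 1).
    bit_width = max(1, (len(dictionary) - 1).bit_length())
    # index vector: dictionary code for each row.
    index = [bisect_left(dictionary, v) for v in values]
    # rowtable vector: row IDs ordered by code (stable, so ascending per code).
    rowtable = sorted(range(len(values)), key=lambda row: index[row])
    # position vector: prefix sums of per-code row counts.
    position = [0] * (len(dictionary) + 1)
    for code in index:
        position[code + 1] += 1
    for code in range(len(dictionary)):
        position[code + 1] += position[code]
    return dictionary, bit_width, index, position, rowtable


dictionary, bit_width, index, position, rowtable = encode_column(
    [30, 10, 30, 20, 10])
print(dictionary, bit_width, index, position, rowtable)
```

With three distinct values the codes fit in 2 bits, so each entry of `index` could be packed into 2 bits instead of a full integer; the packing itself is not shown here.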

Embodiment 3

[0132] As shown in Figure 7, a data query method based on the distributed data storage structure proceeds as follows.

[0133] The process of finding the data storage unit Block whose database ID is db_id, table name is table_name, and column name is col_name is:

[0134] D1. Access the master control node and send it a message containing the database column information of the data to be accessed and the range of the corresponding column.

[0135] The query methods are as follows:

[0136] Query rowid by value: query the matching rowids by a given value range or fixed value. Query value by rowid: query the value corresponding to each rowid in a given rowid set. For example, the query statement structure is as follows: Query rowid by value: , where the range query or equality query condition is, for example: name = "Zhang San";

[0137] Query value by rowid: , where the range query or equality query condition is, for example: rowid>10000,

[0138] D2. The master c...
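
The two query directions described in [0136] and [0137] can be illustrated on a single dictionary-encoded column. This is a sketch under assumptions: the vectors below are hypothetical sample data, their semantics (`index` as per-row codes, `rowtable` as row IDs grouped by code, `position` as group offsets) are assumed rather than taken from the patent, and the function names are invented for illustration. Routing through the master control node is not modeled.

```python
from bisect import bisect_left, bisect_right

# Hypothetical encoded column for the row values [30, 10, 30, 20, 10].
dictionary = [10, 20, 30]   # sorted distinct values
index = [2, 0, 2, 1, 0]     # dictionary code of each row
position = [0, 2, 3, 5]     # start offset of each code's group in rowtable
rowtable = [1, 4, 3, 0, 2]  # row IDs grouped by dictionary code


def rowids_by_value(low: int, high: int) -> list[int]:
    """Query rowid by value: rows whose value v satisfies low <= v <= high."""
    first = bisect_left(dictionary, low)
    last = bisect_right(dictionary, high)
    return sorted(rowtable[position[first]:position[last]])


def values_by_rowid(rowids: list[int]) -> dict[int, int]:
    """Query value by rowid: decode the value stored at each row ID."""
    return {row: dictionary[index[row]] for row in rowids}


print(rowids_by_value(10, 20))   # range query
print(rowids_by_value(30, 30))   # fixed-value (equality) query
print(values_by_rowid([0, 3]))
```

An equality query is expressed here as a range whose bounds coincide, so both condition types in the text share one code path.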


Abstract

The invention discloses a distributed data storage structure, a data storage method, and a data query method. The modules involved are as follows. Master control node: used to establish the mapping relationship between each data storage unit Block and the physical machine where it is located, to collect statistics on the global load, and to generate the ID of each data storage unit Block. Data import manager: caches external data, generates data storage unit Blocks, and imports the data storage unit Blocks into the storage nodes. Storage node: stores the data storage unit Blocks and provides the query function to the querier.

Description

Technical field

[0001] The invention relates to the field of data storage and computing, and in particular to a distributed data storage structure, a data storage method, and a data query method.

Background technique

[0002] Traditional row-based databases store data by row. Row storage is generally used in relational databases, and its advantage lies in processing OLTP-type workloads. Columnar databases, by contrast, store data by column, and each column is stored separately. When a query accesses certain columns, only the columns involved in the query need to be read, which greatly reduces the data transmission volume of the system. Moreover, because the data types within a column are consistent and the data characteristics are similar, the data is easy to compress and the compression rate is improved. Row-type databases are good at random read and update operations, while column-type databases are better at querying large amounts of data. The row-column hybrid storage takes in...
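
The difference in data volume between the two layouts can be seen with a toy table. This is only an illustrative sketch: the table contents, column names, and the "cells touched" cost measure are hypothetical and stand in for real I/O cost.

```python
# Hypothetical three-row table, stored row by row.
rows = [
    {"id": 1, "name": "Zhang San", "age": 30},
    {"id": 2, "name": "Li Si", "age": 25},
    {"id": 3, "name": "Wang Wu", "age": 41},
]

# The same table stored column by column.
columns = {key: [row[key] for row in rows] for key in rows[0]}


def cells_touched_row_store(table: list[dict]) -> int:
    """A row store reads whole rows, whichever columns the query wants."""
    return sum(len(row) for row in table)


def cells_touched_column_store(table: dict, wanted: list[str]) -> int:
    """A column store reads only the columns named in the query."""
    return sum(len(table[name]) for name in wanted)


print(cells_touched_row_store(rows))                 # all 9 cells
print(cells_touched_column_store(columns, ["age"]))  # only the 3 age cells
```

The values in one column also share a type, which is why column-wise storage tends to compress well.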


Application Information

Patent Type & Authority: Patents (China)
IPC: IPC(8): G06F16/22
CPC: G06F16/221, G06F16/2237, G06F16/2272
Inventor: 段翰聪, 闵革勇, 张建, 钟红霞, 詹文翰
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA