A distributed index method and system for efficiently querying streaming data based on LSM

A distributed indexing technique for streaming data that addresses wasted space and the inability to add indexes later, while maintaining consistency and improving efficiency.

Active Publication Date: 2022-08-05
WUHAN UNIV OF TECH

AI Technical Summary

Problems solved by technology

Because indexes cannot be added later, indexes are created for all non-primary-key columns at the outset, which wastes space.

Embodiment Construction

[0021] To help those of ordinary skill in the art understand and implement the present invention, it is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the embodiments described herein are only used to illustrate and explain the present invention, not to limit it.

[0022] Referring to Figure 1, the distributed index method for efficiently querying stream data based on LSM provided by the present invention comprises the following steps:

[0023] Step 1: Batch update the data;

[0024] The data in memory is incremental data, and the data on disk is baseline data. When the amount of data in memory reaches a certain threshold, the incremental data is merged into the disk to generate new baseline data, and the intervals are divided;
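
The batch update in Step 1 can be illustrated with a minimal Python sketch. It assumes a key-count threshold and keeps the baseline as a sorted in-memory list standing in for disk. The class name `LsmStore`, the `threshold` parameter, and the method names are hypothetical and do not come from the patent, and the interval division is left out.

```python
from bisect import bisect_left


class LsmStore:
    """Toy LSM-style store (hypothetical names, not the patent's implementation)."""

    def __init__(self, threshold=4):
        self.threshold = threshold  # assumed flush threshold, counted in keys
        self.memtable = {}          # incremental data held in memory
        self.baseline = []          # baseline data: sorted list of (key, value)

    def put(self, key, value):
        """Write to memory; merge into the baseline once the threshold is reached."""
        self.memtable[key] = value
        if len(self.memtable) >= self.threshold:
            self.merge()

    def merge(self):
        """Merge incremental data into the baseline to produce a new sorted baseline."""
        merged = dict(self.baseline)
        merged.update(self.memtable)  # newer incremental values overwrite older ones
        self.baseline = sorted(merged.items())
        self.memtable = {}

    def get(self, key):
        """Check incremental data first, then binary-search the baseline."""
        if key in self.memtable:
            return self.memtable[key]
        i = bisect_left(self.baseline, (key,))
        if i < len(self.baseline) and self.baseline[i][0] == key:
            return self.baseline[i][1]
        return None
```

Reads consult the incremental data before the baseline, so a value written after the last merge is visible immediately even though the baseline has not yet been rewritten.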

[0025] In this embodiment, the data is updated in batches based on the LSM-Tree method, whic...

Abstract

The invention discloses a distributed index method and system for efficiently querying stream data based on LSM. First, the data is updated in batches; then the sampled data blocks are sorted and divided into intervals; next, an improved R-tree generation algorithm is used to establish a local index for each data partition; finally, the main node allocates the data in each data partition, together with its local index, to the child nodes for storage, establishes a global index from the root node of each local index, and stores the global index directly on the main node. In establishing the local index, the invention adopts an improved R-tree and switches to dynamically inserting data. The global index is used frequently and holds a small amount of data, so it is stored on the main node, which greatly improves query efficiency.
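
The partitioning and global-index steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented algorithm: each local index is represented only by the bounding rectangle of its root (the improved R-tree itself is not implemented), partitions are cut on the x coordinate, and all names (`split_points`, `assign`, `MainNode`, and so on) are hypothetical.

```python
from bisect import bisect_right


def split_points(sample, num_partitions):
    """Sort a sample and pick boundaries that cut it into roughly equal intervals."""
    ordered = sorted(sample)
    step = len(ordered) / num_partitions
    return [ordered[int(i * step)] for i in range(1, num_partitions)]


def assign(points, bounds):
    """Place each (x, y) point in the partition whose x-interval contains it."""
    parts = [[] for _ in range(len(bounds) + 1)]
    for p in points:
        parts[bisect_right(bounds, p[0])].append(p)
    return parts


def root_rectangle(points):
    """Bounding rectangle (xmin, ymin, xmax, ymax); stands in for a local index root."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def intersects(a, b):
    """True if two rectangles overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class MainNode:
    """Holds the global index; the partitions list stands in for remote child nodes."""

    def __init__(self, partitions):
        self.children = partitions
        self.global_index = [
            (root_rectangle(pts), node_id)
            for node_id, pts in enumerate(partitions)
            if pts
        ]

    def route(self, rect):
        """Return ids of the child nodes whose root rectangle intersects the query."""
        return [nid for box, nid in self.global_index if intersects(box, rect)]

    def query(self, rect):
        """Search only the routed child nodes for points inside the query rectangle."""
        hits = []
        for nid in self.route(rect):
            for x, y in self.children[nid]:
                if rect[0] <= x <= rect[2] and rect[1] <= y <= rect[3]:
                    hits.append((x, y))
        return sorted(hits)
```

Because the global index holds one small entry per child node, the main node can discard most partitions before any child node is contacted, which is the efficiency argument the abstract makes.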

Description

Technical field

[0001] The invention belongs to the technical field of stream data and relates to a distributed index method and system, in particular to a distributed index method and system based on the log-structured merge tree (LSM).

Technical background

[0002] Streaming data differs from the large-scale offline data processed by the Hadoop platform: it is real-time data that is continuous and unbounded. There are roughly two ways to establish a distributed index. One is to use the MapReduce parallel computing framework: a corresponding map table is established for the data to be queried, and the client queries through the map table. Distributed databases such as NoSQL need to coordinate with each other to achieve atomicity in distributed transactions. If the index is not updated promptly after the system writes data, the data and the index become inconsistent. So this method is only suitable for ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/13, G06F16/182, G06F16/22, G06F16/23, G06F3/06
CPC: G06F16/134, G06F16/182, G06F16/2246, G06F16/2365, G06F3/061, G06F3/0643, G06F3/067
Inventor: 邹承明, 冯丹
Owner: WUHAN UNIV OF TECH