Distributed massive data processing method and device

A distributed massive data processing technology, applied in digital data processing, special data processing applications, and other database retrieval. It addresses problems such as high indexing cost, unbalanced global indexes, and reduced indexing and query throughput, and it improves both storage efficiency and retrieval efficiency.

Active Publication Date: 2022-04-01

GUANGZHOU UNIVERSITY +1

Cites: 4; Cited by: 0

Problems solved by technology

[0004] Distributed B+ trees currently have almost no production-grade industrial applications. Word-aligned compressed bitmap indexes are mainly used in stand-alone and parallel environments. Some distributed indexes take a heterogeneous approach, mixing B+ tree and bitmap structures; this leads to an unbalanced global index, which reduces the system's indexing and query throughput. Other distributed bitmap indexes are not compressed, so when the data volume is very large the index can be several times larger than the data itself, which incurs a huge indexing cost.
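The size claim above can be checked with simple arithmetic. The sketch below assumes a plain equality-encoded bitmap index (one bitmap of one bit per row for each distinct column value); the row count, value width, and cardinality are illustrative numbers chosen for this example, not figures from the patent.

```python
def uncompressed_bitmap_index_bytes(num_rows: int, cardinality: int) -> int:
    """Bytes used by an uncompressed equality-encoded bitmap index.

    One bitmap per distinct value, one bit per row in each bitmap.
    """
    total_bits = num_rows * cardinality
    return (total_bits + 7) // 8  # round up to whole bytes


def column_data_bytes(num_rows: int, bytes_per_value: int) -> int:
    """Bytes used by the raw column values themselves."""
    return num_rows * bytes_per_value


# Illustrative assumptions: 1,000,000 rows of 4-byte values, 256 distinct values.
rows = 1_000_000
value_size = 4
distinct = 256

index_size = uncompressed_bitmap_index_bytes(rows, distinct)
data_size = column_data_bytes(rows, value_size)
ratio = index_size / data_size

print(f"data:  {data_size} bytes")
print(f"index: {index_size} bytes")
print(f"index is {ratio:.1f}x the data size")
```

Under these assumptions the index is eight times the size of the column it indexes, and the ratio grows linearly with the number of distinct values, which is why uncompressed bitmap indexes become costly at scale.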

Examples

Embodiment 1

[0049] The method in this embodiment is applicable to the client.

[0050] Figure 1 is a schematic flow chart of the distributed massive data processing method provided by the embodiment of the present invention, and Figure 2 is a diagram of the relationships between the physical devices in that method. As shown in Figures 1 and 2, the present invention provides a distributed massive data processing method comprising steps 101 to 106.

[0051] In step 101, client A sends an index request to metadata server B.

[0052] Client A reads the address and port of metadata server B from its locally stored configuration file and sends an index request to metadata server B to request data indexing. Metadata server B reads the address and port of coordination server C from its locally stored configuration file for coordination server C, and sends a data ind...

Embodiment 2

[0085] The method in this embodiment is applicable to the server side. In this embodiment, the server side includes metadata server B and a first data index server.

[0086] Figure 6 is a schematic flow chart of the distributed massive data processing method provided by the embodiment of the present invention. As shown in Figure 6, the present invention provides a distributed massive data processing method comprising steps 201 to 204.

[0087] In step 201, metadata server B receives an index request sent by client A. The index request includes the number of records in the original data to be indexed, the number of columns in each record, and the data type size of each column.

[0088] In step 202, metadata server B sends the available server list to client A. The available server list includes one or more first data index server identifiers, and the identifiers are in one-to-one correspondence with the first data index servers.
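The two messages exchanged in steps 201 and 202 can be pictured as small data structures. This is a minimal sketch, not the patent's wire format: the class and field names, the `estimated_bytes` helper, and the choice to enforce one-to-one correspondence by rejecting duplicate identifiers are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IndexRequest:
    """Index request sent by client A in step 201 (field names are hypothetical)."""
    record_count: int             # number of records in the original data
    column_count: int             # number of columns in each record
    column_type_sizes: List[int]  # data type size of each column, in bytes

    def __post_init__(self) -> None:
        if len(self.column_type_sizes) != self.column_count:
            raise ValueError("need exactly one type size per column")

    def estimated_bytes(self) -> int:
        # Hypothetical helper: raw size implied by the three request fields.
        # The source does not say how the metadata server uses them.
        return self.record_count * sum(self.column_type_sizes)


@dataclass
class AvailableServerList:
    """Reply sent by metadata server B in step 202 (field names are hypothetical)."""
    server_ids: List[str]  # one identifier per first data index server

    def __post_init__(self) -> None:
        if not self.server_ids:
            raise ValueError("the list must contain at least one server identifier")
        if len(set(self.server_ids)) != len(self.server_ids):
            raise ValueError("identifiers must be unique (one per server)")


request = IndexRequest(record_count=1000, column_count=3, column_type_sizes=[4, 8, 2])
servers = AvailableServerList(server_ids=["idx-1", "idx-2"])
print(request.estimated_bytes(), servers.server_ids)
```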

[0089]...

Embodiment 3

[0123] This embodiment is an apparatus embodiment corresponding to Embodiment 1, and is used to execute the method in Embodiment 1.

[0124] Figure 8 is a schematic structural diagram of the distributed massive data processing device provided in Embodiment 3 of the present invention. As shown in Figure 8, this embodiment provides a distributed massive data processing device comprising an index request module 301, an available server list receiving module 302, a splitting module 303, a storage location acquisition module 304, a block data sending module 305, and an index completion information receiving module 306.

[0125] The index request module 301 is configured to send an index request to the metadata server.

[0126] The available server list receiving module 302 is configured to receive the available server list returned by the metadata server; the available server list includes one or more first data index server identifiers, and the first data index server identifier is ...

Abstract

The present invention provides a distributed massive data processing method and device. The method comprises: sending an index request to a metadata server; receiving a list of available servers returned by the metadata server; splitting the original data to be indexed into block data and assigning a number to each block; using the block information of the block data as the key, mapping the storage location of the block data onto the list of available server identifiers; sending the block data to the second data index server, which stores it; and receiving the index completion information returned by the metadata server. With this method and device, multiple second data index servers can store the received block data at the same time, which greatly improves storage efficiency, and the stored original data can be searched by means of the index completion information, which greatly improves retrieval efficiency.
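The splitting and mapping steps in the abstract can be sketched as follows. The excerpt does not state the mapping function, so this example assumes a stable hash of the block key taken modulo the length of the server list; the function names, the key format, and the server identifiers are hypothetical.

```python
import hashlib
from typing import Dict, List, Sequence, Tuple


def split_into_blocks(records: Sequence, block_size: int) -> List[Tuple[int, list]]:
    """Split records into fixed-size blocks and number them from 0."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    starts = range(0, len(records), block_size)
    return [(number, list(records[start:start + block_size]))
            for number, start in enumerate(starts)]


def pick_server(block_key: str, server_ids: Sequence[str]) -> str:
    """Map a block key onto one server identifier.

    Assumption: SHA-256 of the key, modulo the list length. The patent
    excerpt only says the block information is used as the key.
    """
    if not server_ids:
        raise ValueError("server_ids must not be empty")
    digest = hashlib.sha256(block_key.encode("utf-8")).digest()
    return server_ids[int.from_bytes(digest[:8], "big") % len(server_ids)]


def plan_placement(records: Sequence, block_size: int,
                   server_ids: Sequence[str]) -> Dict[str, List[int]]:
    """Return, for each server identifier, the block numbers it should store."""
    placement: Dict[str, List[int]] = {sid: [] for sid in server_ids}
    for number, block in split_into_blocks(records, block_size):
        key = f"block-{number}:{len(block)}"  # hypothetical block information
        placement[pick_server(key, server_ids)].append(number)
    return placement


blocks = split_into_blocks(list(range(10)), block_size=4)
placement = plan_placement(list(range(10)), 4, ["idx-1", "idx-2", "idx-3"])
print(blocks)
print(placement)
```

Because the mapping depends only on the block key and the server list, each block can be dispatched independently, which is consistent with the abstract's point that multiple data index servers can store blocks at the same time.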

Description

Technical field

[0001] The invention relates to the technical field of distributed storage, in particular to a distributed massive data processing method and device.

Background

[0002] The B+ tree is a multi-level index structure that is widely used for indexing in file systems because of its fast query speed. In a distributed environment, existing B+ trees and their variants must perform a left or right rotation on the structure each time the index is edited in order to keep the index balanced. When the data volume is large, these operations cause data to move back and forth between nodes, incurring a huge network communication cost.

[0003] Building on the compression algorithms, bitmap index structures, and distributed computing of the prior art, a method is sought to solve the above problems. Among them, the compression algorithm is...

Application Information

Patent Type & Authority: Patents (China)

IPC(8): G06F16/13, G06F16/172, G06F16/182

CPC: G06F16/2255, G06F16/2272, G06F16/27, G06F16/907

Inventor: 王锋, 刘应波, 邓辉

Owner: GUANGZHOU UNIVERSITY
