Distributed massive data processing method and device

A distributed massive data processing technology, applied in digital data processing, special data processing applications, other database retrieval, and related fields. It addresses problems such as high indexing cost, an unbalanced global index, and reduced indexing and query throughput, with the effect of improving storage efficiency and retrieval efficiency.

Active Publication Date: 2022-04-01
GUANGZHOU UNIVERSITY +1
4 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0004] Distributed B+ trees currently have almost no product-level industrial applications. Compressed word-aligned bitmap indexes are mainly used in stand-alone and parallel environments. Some distributed indexes take a heterogeneous approach, mixing B+ trees in some places and bitmaps in others; this structure leaves the global index unbalanced, which reduces the system's indexing and query throughput. Other distributed bitmap indexes are not compressed, so when the amount of data is very large the index can be several times larger than the data itself, which incurs a huge indexing cost.
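The size problem with uncompressed bitmap indexes can be illustrated with a minimal sketch. It assumes a plain run-length encoding, which is simpler than the word-aligned compression mentioned above and is not the patent's scheme; the function names and the sample column are hypothetical.

```python
from collections import defaultdict


def build_bitmap_index(column):
    """Return {value: list of 0/1 bits}, with one bit per row for each distinct value."""
    index = defaultdict(lambda: [0] * len(column))
    for row, value in enumerate(column):
        index[value][row] = 1
    return dict(index)


def run_length_encode(bits):
    """Collapse a bit list into [bit, run_length] pairs (illustrative, not word-aligned)."""
    runs = []
    for bit in bits:
        if runs and runs[-1][0] == bit:
            runs[-1][1] += 1
        else:
            runs.append([bit, 1])
    return runs


# Hypothetical column with three distinct values stored in sorted order.
column = ["red"] * 1000 + ["green"] * 1000 + ["blue"] * 1000
index = build_bitmap_index(column)

# Uncompressed: one bit per row for every distinct value.
raw_bits = sum(len(bits) for bits in index.values())

# Compressed: long runs of identical bits shrink to a handful of pairs.
runs = {value: run_length_encode(bits) for value, bits in index.items()}
total_runs = sum(len(pairs) for pairs in runs.values())

print(raw_bits, total_runs)
```

The uncompressed index grows with rows times distinct values, which is why it can exceed the size of the data it indexes. How well compression helps depends on how the values are clustered; the sorted sample column here is a best case.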



Examples


Embodiment 1

[0049] The method in this embodiment is applicable to the client.

[0050] Figure 1 is a schematic flow chart of the distributed massive data processing method provided by the embodiment of the present invention, and Figure 2 is a diagram of the relationships between the physical devices involved in that method. As shown in Figures 1 and 2, the present invention provides a distributed massive data processing method comprising steps 101 to 106.

[0051] In step 101, client A sends an index request to metadata server B.

[0052] Client A reads the address and port of metadata server B from its locally stored configuration file and sends an index request to metadata server B to request index data. Metadata server B reads the address and port of coordination server C from its locally stored configuration file for coordination server C, and sends a data ind...
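Reading a server endpoint from a local configuration file might look like the following minimal sketch. The INI layout, the section and key names, and the sample address are all assumptions made for illustration; the source does not specify the configuration format.

```python
import configparser

# Hypothetical configuration text; a real client would read this from a local file.
SAMPLE_CONFIG = """
[metadata_server]
address = 192.0.2.10
port = 9000
"""


def load_endpoint(config_text, section):
    """Return (address, port) for the named section of an INI-style configuration."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser[section]["address"], parser.getint(section, "port")


endpoint = load_endpoint(SAMPLE_CONFIG, "metadata_server")
print(endpoint)
```

The same helper could serve metadata server B when it looks up coordination server C, given a section for that server in its own configuration.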

Embodiment 2

[0085] The method in this embodiment is applicable to the server side, which here comprises metadata server B and a first data index server.

[0086] Figure 6 is a schematic flowchart of the distributed massive data processing method provided by the embodiment of the present invention. As shown in Figure 6, the present invention provides a distributed massive data processing method comprising steps 201 to 204.

[0087] In step 201, metadata server B receives an index request sent by client A; the index request includes the number of records included in the original data to be indexed, the number of columns included in each record, and the data type size of each column.

[0088] In step 202, metadata server B sends the available server list to client A. The list includes one or more first data index server identifiers, each of which corresponds one-to-one to a first data index server.
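Steps 201 and 202 can be sketched as a pair of simple data structures. This is an illustration under assumptions, not the patent's wire format: the class name, field names, identifiers, and addresses are hypothetical, and the rule the metadata server uses to decide which servers are available is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexRequest:
    """Fields named in step 201: record count, column count, per-column type size."""
    record_count: int
    column_count: int
    column_sizes: tuple  # size in bytes of each column's data type (assumed unit)

    def __post_init__(self):
        if len(self.column_sizes) != self.column_count:
            raise ValueError("one data type size is required per column")

    def raw_size_bytes(self):
        """Estimated size of the original data described by the request."""
        return self.record_count * sum(self.column_sizes)


def available_server_list(registry):
    """Return server identifiers, checking that they map to servers one-to-one."""
    addresses = list(registry.values())
    if len(set(addresses)) != len(addresses):
        raise ValueError("identifiers must map to servers one-to-one")
    return sorted(registry)


# Hypothetical request: one million records with three columns of 4, 8 and 8 bytes.
request = IndexRequest(record_count=1_000_000, column_count=3, column_sizes=(4, 8, 8))
raw_size = request.raw_size_bytes()

# Hypothetical registry held by the metadata server: identifier -> server address.
registry = {"idx-1": "192.0.2.21:9100", "idx-2": "192.0.2.22:9100"}
servers = available_server_list(registry)

print(raw_size, servers)
```

Carrying the record count and column sizes in the request lets the receiver estimate the data volume before any data is transferred, which is one plausible reason for including them.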

[0089]...

Embodiment 3

[0123] This embodiment is an apparatus embodiment corresponding to Embodiment 1, and is used to execute the method in Embodiment 1.

[0124] Figure 8 is a schematic structural diagram of the distributed massive data processing device provided in Embodiment 3 of the present invention. As shown in Figure 8, this embodiment provides a distributed massive data processing device comprising an index request module 301, an available server list receiving module 302, a split module 303, a storage location acquisition module 304, a block data sending module 305, and an index completion information receiving module 306.

[0125] The index request module 301 is configured to send an index request to the metadata server.

[0126] The available server list receiving module 302 is configured to receive the available server list returned by the metadata server; the available server list includes one or more first data index server identifiers, and the first data index server identifier is ...


Abstract

The present invention provides a distributed massive data processing method and device. The method comprises: sending an index request to a metadata server; receiving a list of available servers returned by the metadata server; splitting the original data to be indexed to obtain block data, and assigning a numeric identifier to the block data; using the block information of the block data as the key, mapping the storage location of the block data onto the list of available server identifiers; sending the block data to the second data index server, which stores it; and receiving the index completion information returned by the metadata server. With this method and device, multiple second data index servers can store the received block data at the same time, which greatly improves storage efficiency, and the stored original data can be searched through the index completion information, which greatly improves retrieval efficiency.
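The split, number, and map steps can be sketched as follows. The text shown here does not state which mapping function is used, so hashing the block number and taking it modulo the server count is an assumption; the function names, block size, and server identifiers are hypothetical.

```python
import hashlib


def split_into_blocks(records, block_size):
    """Split records into consecutive blocks and pair each block with a numeric identifier."""
    return [
        (number, records[start:start + block_size])
        for number, start in enumerate(range(0, len(records), block_size))
    ]


def locate_server(block_number, server_ids):
    """Map a block number (the key) onto one identifier from the available server list.

    Assumed scheme: SHA-256 of the block number, reduced modulo the list length.
    """
    digest = hashlib.sha256(str(block_number).encode("utf-8")).digest()
    return server_ids[int.from_bytes(digest[:8], "big") % len(server_ids)]


# Hypothetical original data and available server list.
records = list(range(10))
server_ids = ["idx-1", "idx-2", "idx-3"]

blocks = split_into_blocks(records, block_size=4)
placement = {number: locate_server(number, server_ids) for number, _ in blocks}

for number, block in blocks:
    print(number, block, placement[number])
```

Because the mapping depends only on the block number and the server list, any party holding both can recompute where a block is stored without a lookup table. Blocks assigned to different servers can then be sent and stored in parallel.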

Description

Technical field

[0001] The invention relates to the technical field of distributed storage, in particular to a distributed massive data processing method and device.

Background

[0002] The B+ tree is a multi-level index structure that is widely used for indexing in file systems because of its fast query speed. In a distributed environment, existing B+ trees and their variants must keep the index structure balanced: every time the index is edited, a left or right rotation has to be performed on the structure. When the amount of data is large, these operations cause data to move back and forth between nodes, bringing huge network communication costs.

[0003] Based on the compression algorithm, bitmap index structure and distributed computing in the prior art, it is expected to seek a method to solve the above problems. Among them, the compression algorithm is...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/13, G06F16/172, G06F16/182
CPC: G06F16/2255, G06F16/2272, G06F16/27, G06F16/907
Inventor: 王锋, 刘应波, 邓辉
Owner: GUANGZHOU UNIVERSITY