Hive data warehouse-based column-level data lineage processing system and method

A data warehouse and processing system technology, applied in the field of data column-level lineage processing systems. It addresses problems such as poor compatibility and data lineage analysis whose granularity only reaches the table level, and it overcomes strong coupling while achieving fine-grained, highly accurate results.

Active Publication Date: 2020-10-23
BEIJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

The existing solution has poor compatibility: different Hadoop versions must be adapted to different analysis solutions, and its data lineage analysis granularity only reaches the table level.
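The difference between table-level and column-level granularity can be illustrated with a minimal sketch. The tables, columns, and helper names below (`orders`, `fx`, `report`, `ColumnEdge`, `to_table_level`, `upstream_columns`) are hypothetical and do not come from the patent; they only show what information is lost when lineage is recorded per table.

```python
from typing import NamedTuple


class ColumnEdge(NamedTuple):
    """One column-level dependency: a source column feeds a target column."""
    src_table: str
    src_column: str
    dst_table: str
    dst_column: str


# Hypothetical lineage of a statement that fills `report` from `orders` and `fx`,
# where report.total is computed from orders.amount and fx.rate.
column_edges = [
    ColumnEdge("orders", "id", "report", "id"),
    ColumnEdge("orders", "amount", "report", "total"),
    ColumnEdge("fx", "rate", "report", "total"),
]


def to_table_level(edges):
    """Collapse column-level edges into (source table, target table) pairs."""
    return sorted({(e.src_table, e.dst_table) for e in edges})


def upstream_columns(edges, table, column):
    """Return the source columns that a given target column depends on."""
    return sorted(
        (e.src_table, e.src_column)
        for e in edges
        if e.dst_table == table and e.dst_column == column
    )


table_edges = to_table_level(column_edges)
total_sources = upstream_columns(column_edges, "report", "total")
```

The table-level view only says that `report` depends on `orders` and `fx`. The column-level view can additionally answer which source columns feed `report.total`, which is the kind of question table-level lineage cannot resolve.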



Embodiment Construction

[0052] Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary and are intended to explain the present invention and should not be construed as limiting the present invention.

[0053] The following describes the data column-level lineage processing system and method based on the Hive data warehouse according to the embodiments of the present invention with reference to the accompanying drawings.

[0054] Firstly, a data column-level lineage processing system based on a Hive data warehouse proposed according to an embodiment of the present invention will be described with reference to the accompanying drawings.

[0055] Figure 1 is a structural framework diagram of a data column-level lineage process...



Abstract

The invention discloses a column-level data lineage processing system and method based on a Hive data warehouse. The system comprises: an SQL preprocessing module, which preprocesses the SQL information input by a user; an SQL analysis module, which parses the preprocessed SQL information into a specific Hive execution plan; a data lineage analysis module, which derives the corresponding data lineage dependencies from the specific Hive execution plan combined with Hive execution context information; and a data storage module, which stores the data lineage dependencies in a database in the form of upstream and downstream data dependencies. While keeping the coupling between the data lineage function and the Hive data warehouse low, the system achieves fine granularity and high accuracy in the data lineage analysis results.
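The four-module flow described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: a toy regular expression stands in for the Hive execution plan, a plain dictionary of table schemas stands in for the Hive execution context, and SQLite stands in for the unspecified database. All function, table, and column names are hypothetical.

```python
import re
import sqlite3


def preprocess(sql_text):
    """Preprocessing stage: strip '--' comments, collapse whitespace, split statements."""
    no_comments = re.sub(r"--[^\n]*", "", sql_text)
    statements = [" ".join(s.split()) for s in no_comments.split(";")]
    return [s for s in statements if s]


# Assumption: only single-source "INSERT ... SELECT ... FROM t" statements are handled.
_INSERT_SELECT = re.compile(
    r"INSERT\s+(?:INTO|OVERWRITE)\s+(?:TABLE\s+)?(\w+)\s+SELECT\s+(.+?)\s+FROM\s+(\w+)",
    re.IGNORECASE,
)


def parse(statement):
    """Analysis stage (toy stand-in for an execution plan)."""
    m = _INSERT_SELECT.match(statement)
    if m is None:
        raise ValueError(f"unsupported statement: {statement}")
    target, select_list, source = m.groups()
    # Naive comma split: expressions containing commas are not supported here.
    exprs = [e.strip() for e in select_list.split(",")]
    return {"target": target, "exprs": exprs, "source": source}


def analyze(plan, schemas):
    """Lineage stage: pair each select expression with a target column by position."""
    src_cols = schemas[plan["source"]]
    dst_cols = schemas[plan["target"]]
    edges = []
    for dst_col, expr in zip(dst_cols, plan["exprs"]):
        expr = re.sub(r"\s+AS\s+\w+$", "", expr, flags=re.IGNORECASE)
        for token in dict.fromkeys(re.findall(r"\w+", expr)):
            if token in src_cols:
                edges.append((plan["source"], token, plan["target"], dst_col))
    return edges


def store(conn, edges):
    """Storage stage: persist edges as upstream/downstream rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS column_lineage "
        "(up_table TEXT, up_column TEXT, down_table TEXT, down_column TEXT)"
    )
    conn.executemany("INSERT INTO column_lineage VALUES (?, ?, ?, ?)", edges)
    conn.commit()


sql = """
-- daily revenue rollup
INSERT INTO revenue SELECT order_id, price * quantity AS total FROM orders;
"""
schemas = {
    "orders": ["order_id", "price", "quantity"],
    "revenue": ["order_id", "total"],
}
conn = sqlite3.connect(":memory:")
for stmt in preprocess(sql):
    store(conn, analyze(parse(stmt), schemas))
rows = conn.execute(
    "SELECT * FROM column_lineage ORDER BY down_column, up_column"
).fetchall()
```

Keeping the four stages as separate functions mirrors the low-coupling goal stated in the abstract: in this sketch, the `parse` stand-in could be swapped for a real plan-based analyzer without touching preprocessing or storage.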

Description

Technical field

[0001] The invention relates to the technical field of data processing, in particular to a data column-level lineage processing system and method based on a Hive data warehouse.

Background

[0002] In big data settings, the data warehouse stores all business data of the entire enterprise. Current big data technologies are based on the Hadoop ecosystem, and data warehouses are generally built on Hive. Hive is a data warehouse tool based on Hadoop. Hive itself does not provide data storage; it uses the HDFS component of Hadoop for distributed storage of data. The main function of Hive is to map structured data in HDFS to two-dimensional database tables and to provide SQL query capability. Hive converts the user's SQL statement into a MapReduce job through an interpreter and submits it to the Hadoop cluster. Hadoop monitors the job execution process and returns the job execution result to the ...


Application Information

IPC(8): G06F16/242, G06F16/22, G06F8/41
CPC: G06F16/2433, G06F16/2246, G06F8/427
Inventors: 鄂海红, 宋美娜, 谭泽华
Owner: BEIJING UNIV OF POSTS & TELECOMM