Method for processing facts in parallel based on the Hadoop platform

A parallel fact-processing technology, applied in the fields of electrical digital data processing, special data processing applications, and instruments, that addresses the low efficiency of surrogate key lookup and thereby improves fact-processing efficiency.

Inactive. Publication Date: 2015-08-26
DONGHUA UNIV

AI Technical Summary

Problems solved by technology

Under the Hadoop platform, however, surrogate key lookup is not very efficient.


Examples


Embodiment Construction

[0018] In order to make the present invention more comprehensible, preferred embodiments are described in detail as follows.

[0019] The technical scheme of the invention provides a method for processing facts in parallel based on the Hadoop platform. Because the volume of fact data is very large and fact processing consists mainly of looking up dimension keys, the method speeds up fact processing by focusing on dimension key lookup for the fact table: it uses multi-way parallel lookup to improve fact-processing efficiency, and it applies different lookup methods to different types of dimension tables. The specific steps are:
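The multi-way lookup described above can be illustrated with a minimal Python sketch. It is not the patented Hadoop implementation and rests on stated assumptions: each dimension table fits in memory as a dict from business key to surrogate key, each "path" is modeled as one thread that resolves a single dimension's keys for a whole batch of fact rows, unknown keys map to -1, and the names `lookup_dimension`, `process_facts`, and the `_sk` column suffix are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical in-memory dimension table: business key -> surrogate key.
DimensionMap = Dict[str, int]


def lookup_dimension(facts: List[dict], column: str, dim: DimensionMap) -> List[int]:
    """Resolve one dimension's surrogate keys for every fact row (-1 if not found)."""
    return [dim.get(row[column], -1) for row in facts]


def process_facts(facts: List[dict], dims: Dict[str, DimensionMap]) -> List[dict]:
    """Run one lookup path per dimension in parallel, then assemble output rows.

    `dims` maps a fact column name (holding a business key) to its dimension table.
    """
    with ThreadPoolExecutor(max_workers=len(dims) or 1) as pool:
        futures = {
            col: pool.submit(lookup_dimension, facts, col, dim)
            for col, dim in dims.items()
        }
        keys = {col: fut.result() for col, fut in futures.items()}

    out = []
    for i, row in enumerate(facts):
        # Keep the measures; replace each business-key column with its surrogate key.
        new_row = {k: v for k, v in row.items() if k not in dims}
        for col in dims:
            new_row[col + "_sk"] = keys[col][i]
        out.append(new_row)
    return out
```

For example, `process_facts([{"product": "P1", "store": "S9", "amount": 10.0}], {"product": {"P1": 101}, "store": {"S9": 7}})` returns `[{"amount": 10.0, "product_sk": 101, "store_sk": 7}]`. Because of CPython's global interpreter lock, the threads here show the structure of the parallel paths rather than a CPU speedup; on Hadoop the paths would presumably run as separate tasks.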

[0020] Step 1. Store the slowly changing dimension dataset CacheDims in the local cache;

[0021] Step 2. Initialize the dimension dataset Dims = ∅, obtain the slowly changing dimension dataset CacheDims from the local cache, and go to Step 3;
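Steps 1 and 2 can be illustrated with a minimal Python sketch. It assumes CacheDims is a JSON-serializable mapping from dimension table name to rows, and that the "local cache" is simply a file on the node's local disk (on Hadoop this role is typically played by the distributed cache, which copies files to each task node before tasks run). The function names `store_cache_dims` and `start_step2`, the JSON format, and the choice of dimension names as the elements of Dims are all hypothetical.

```python
import json
from typing import Dict, List, Set, Tuple

# Hypothetical shape of CacheDims: dimension table name -> list of row dicts.
CacheDimsT = Dict[str, List[dict]]


def store_cache_dims(cache_dims: CacheDimsT, path: str) -> None:
    """Step 1 (sketch): persist CacheDims to a file on the local disk."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cache_dims, f)


def start_step2(path: str) -> Tuple[Set[str], CacheDimsT]:
    """Step 2 (sketch): Dims starts empty; CacheDims is read back from the local cache."""
    dims: Set[str] = set()  # Dims = ∅ (element type is an assumption)
    with open(path, encoding="utf-8") as f:
        cache_dims = json.load(f)
    return dims, cache_dims
```

Reading CacheDims once per task from local disk, rather than querying a remote store for every fact row, is the usual motivation for caching slowly changing dimensions locally.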

[0022] Step 3. If the grad...


Abstract

The volume of fact data is huge, and fact processing consists mainly of looking up dimension keys. To speed up fact processing, the method provided by the present invention therefore looks up dimension keys for the fact table using multi-way parallel lookup to improve fact-processing efficiency, and considers different lookup methods for different dimension tables. For different data volumes, the invention provides corresponding parallel processing methods, so that the data can be processed along multiple parallel paths; for the two different types of slowly changing dimension tables, it considers a separate lookup method for each. The result is parallel fact processing on the Hadoop platform with improved fact-processing efficiency.
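The abstract does not name the two types of slowly changing dimension tables. Assuming they are the common Type 1 (history overwritten, one row per business key) and Type 2 (history kept as dated versions) tables, the two lookup methods might look like the following Python sketch; the index layouts and the function names `lookup_type1` and `lookup_type2` are hypothetical.

```python
import bisect
from datetime import date
from typing import Dict, List, Optional, Tuple

# Type 1 SCD (assumed): one current row per business key, so a plain map suffices.
Type1Index = Dict[str, int]
# Type 2 SCD (assumed): several versions per business key, each stored as
# (valid_from, surrogate_key) and kept sorted by valid_from.
Type2Index = Dict[str, List[Tuple[date, int]]]


def lookup_type1(index: Type1Index, business_key: str) -> Optional[int]:
    """Direct lookup: the single current surrogate key, or None if absent."""
    return index.get(business_key)


def lookup_type2(index: Type2Index, business_key: str, fact_date: date) -> Optional[int]:
    """Return the surrogate key of the version in effect on fact_date, or None."""
    versions = index.get(business_key)
    if not versions:
        return None
    starts = [start for start, _ in versions]
    # Rightmost version whose valid_from <= fact_date.
    pos = bisect.bisect_right(starts, fact_date) - 1
    return versions[pos][1] if pos >= 0 else None
```

Rebuilding `starts` on every call keeps the sketch short; a production index would store each key's sorted start dates once.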

Description

Technical field

[0001] The invention relates to a method for processing facts in parallel within a distributed ETL process based on the Hadoop platform.

Background art

[0002] In the data warehouse field, the extract, transform, and load (Extract-Transform-Load, ETL) process is mainly responsible for collecting data from different data sources, transforming and cleaning the collected datasets according to user-defined business rules and requirements, and finally loading them into the data warehouse according to the structure of the target data warehouse. Today, traditional ETL technology faces new challenges from the information explosion. For example, it is quite common for an enterprise to collect hundreds of gigabytes of data every day for processing and analysis. Such a huge volume of data makes the traditional ETL method extremely time-consuming, while the time window users allow for processing the data is relatively short.

[0003] Therefore, i...


Application Information

Patent type & authority: Applications (China)
IPC(8): G06F17/30
CPC: G06F16/254
Inventors: 李继云, 孙莉, 解书亮, 何刚, 丁祥武, 乐嘉锦, 施巍
Owner: DONGHUA UNIV