A method and device for processing ETL data based on Map/Reduce

A data processing technology, applied in the computer field, that addresses problems such as low data processing efficiency and thereby improves data processing efficiency.

Active Publication Date: 2020-04-07
KINGDEE SOFTWARE(CHINA) CO LTD
Cites: 2; Cited by: 0

AI Technical Summary

Problems solved by technology

[0003] In existing ETL data processing, more than one data source application is often connected to the ETL system, and different data source applications need to record different data. Existing ETL data processing solutions therefore have to formulate a separate task process for each data source application, which results in low data processing efficiency.

Detailed description of embodiments

[0053] The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the protection scope of the present invention.

[0054] Referring to Figure 1, a schematic flowchart of a Map/Reduce-based ETL data processing method disclosed by an embodiment of the present invention is shown.

[0055] As can be seen from Figure 1, the method includes:

[0056] S11: Obtain the original data uploaded by each data source application on the current day.

[0057] Optionally, the original data of each data source application is first uploaded to different files of HDFS (Hadoop Distributed Fil...
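[0057] states that each data source application's original data is first uploaded to different HDFS files. A minimal sketch of how one day's inputs could then be located, assuming a hypothetical `/etl/raw/<app>/<YYYY-MM-DD>` directory layout; the root path, the layout, the application names, and the `daily_input_dirs` helper are all illustrative and not taken from the patent.

```python
from datetime import date
from pathlib import PurePosixPath

# Hypothetical HDFS root for raw uploads; the patent does not name one.
RAW_ROOT = PurePosixPath("/etl/raw")


def daily_input_dirs(apps: list[str], day: date,
                     root: PurePosixPath = RAW_ROOT) -> dict[str, str]:
    """Return, per data source application, the directory holding its uploads for `day`.

    Assumes a hypothetical <root>/<app>/<YYYY-MM-DD> layout, so each
    application's original data stays in its own files (step S11).
    """
    return {app: str(root / app / day.isoformat()) for app in apps}


if __name__ == "__main__":
    for app, path in daily_input_dirs(["crm", "erp"], date(2020, 4, 7)).items():
        print(app, path)  # e.g. "crm /etl/raw/crm/2020-04-07"
```

Passing all of these directories as inputs to a single job is one way to read every application's data for the day in one pass; the available excerpt does not say how the inputs are actually selected.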

Abstract

The invention discloses a Map/Reduce-based ETL data processing method and device. The method includes: extracting the common data of each data source application; distinguishing the differential data of each data source application; filtering and cleaning the common data and the differential data of each data source application using the parallel data processing capability of Map/Reduce; generating data in a predefined format; and dumping the data. Compared with conventional methods, the method satisfies each application's personalized requirements for user behavior collection, converts data into the predefined format to meet the demands of data analysis, cleans dirty data at high speed, and improves data processing efficiency.
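The abstract's pipeline (extract common data, distinguish differential data, filter and clean in parallel, emit a predefined format) can be illustrated with a single-process simulation of the Map/Reduce model. This is a minimal sketch under stated assumptions, not the patented implementation: raw records are hypothetical JSON lines carrying an `app` field; the common fields in `COMMON_FIELDS` and the per-application differential fields in `DIFF_FIELDS` are invented for illustration; "dirty" is assumed to mean unparseable JSON or a missing common field; and the predefined format is assumed to be one tab-separated line per record. On a real cluster the map and reduce functions would run as parallel tasks; here they run sequentially.

```python
import json
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical common data: fields every data source application records.
COMMON_FIELDS = ("user_id", "event", "ts")
# Hypothetical differential data: fields only specific applications record.
DIFF_FIELDS = {
    "crm": ("customer_id",),
    "erp": ("order_id", "amount"),
}


def map_record(line: str) -> Iterator[tuple[str, dict]]:
    """Map: parse one raw line, drop dirty records, split common and differential data."""
    try:
        rec = json.loads(line)
    except json.JSONDecodeError:
        return  # dirty record: not valid JSON
    app = rec.get("app") if isinstance(rec, dict) else None
    if not isinstance(app, str) or not app:
        return  # dirty record: no source application
    if any(rec.get(f) in (None, "") for f in COMMON_FIELDS):
        return  # dirty record: a common field is missing or empty
    common = {f: rec[f] for f in COMMON_FIELDS}
    diff = {f: rec.get(f, "") for f in DIFF_FIELDS.get(app, ())}
    # Key by application so each reduce call receives one application's records.
    yield app, {"common": common, "diff": diff}


def reduce_app(app: str, values: Iterable[dict]) -> Iterator[str]:
    """Reduce: format each record as one tab-separated line (the assumed predefined format)."""
    for v in values:
        common = [str(v["common"][f]) for f in COMMON_FIELDS]
        diff = ";".join(f"{k}={v['diff'][k]}" for k in sorted(v["diff"]))
        yield "\t".join([app, *common, diff])


def run_job(lines: Iterable[str]) -> list[str]:
    """Sequential stand-in for a Map/Reduce job: map, shuffle (group by key), reduce."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for line in lines:
        for key, value in map_record(line):
            groups[key].append(value)
    output: list[str] = []
    for key in sorted(groups):
        output.extend(reduce_app(key, groups[key]))
    return output


if __name__ == "__main__":
    raw = [
        '{"app": "crm", "user_id": "u1", "event": "login", "ts": "2020-04-07T08:00:00", "customer_id": "c9"}',
        '{"app": "erp", "user_id": "u2", "event": "order", "ts": "2020-04-07T09:00:00", "order_id": "o1", "amount": 12.5}',
        "not json",                          # dropped as dirty
        '{"app": "erp", "event": "order"}',  # dropped: missing user_id and ts
    ]
    for row in run_job(raw):
        print(row)
```

Keying the intermediate pairs by application is one possible design: the common fields share a single layout while each application's differential fields travel alongside them, so one job can serve every application instead of one task process per application. The abstract does not specify the key, the field lists, or the output format.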

Description

Technical field

[0001] This application relates to the computer field and, more specifically, to a method and device for processing ETL data based on Map/Reduce.

Background art

[0002] As an important part of building a data warehouse, ETL is the process of extracting, transforming, and loading data from a source to a destination. Specifically, the user extracts the required information from the data source and, after data cleaning, loads the data into the data warehouse according to a predefined data format.

[0003] In existing ETL data processing, more than one data source application is often connected to the ETL system, and different data source applications need to record different data. Existing ETL data processing solutions therefore have to formulate a separate task process for each data source application, which results in low data processing efficiency.

Contents of the invention

[0004] In view o...
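[0002] describes the conventional ETL flow: extract the required information from the source, clean it, and load it into the data warehouse in a predefined format. A minimal single-source sketch of that flow, assuming CSV text as the source and an in-memory SQLite table `events` as a stand-in for the data warehouse; the columns, cleaning rules, and function names are hypothetical and do not come from the patent. Per [0003], a conventional setup would repeat a task process like this for every data source application.

```python
import csv
import io
import sqlite3

# Hypothetical predefined warehouse format: (user_id, event, ts), all required.
SCHEMA = ("CREATE TABLE IF NOT EXISTS events "
          "(user_id TEXT NOT NULL, event TEXT NOT NULL, ts TEXT NOT NULL)")
COLUMNS = ("user_id", "event", "ts")


def extract(source_csv: str) -> list[dict]:
    """Extract: read the required information out of the source (CSV text here)."""
    return list(csv.DictReader(io.StringIO(source_csv)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: clean the data by trimming whitespace and dropping rows with empty values."""
    cleaned = []
    for row in rows:
        values = tuple((row.get(col) or "").strip() for col in COLUMNS)
        if all(values):
            cleaned.append(values)
    return cleaned


def load(conn: sqlite3.Connection, rows: list[tuple]) -> int:
    """Load: write the cleaned rows into the warehouse table and return how many were loaded."""
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO events (user_id, event, ts) VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    source = "user_id,event,ts\nu1, login ,2020-04-07T08:00:00\n,logout,2020-04-07T09:00:00\n"
    conn = sqlite3.connect(":memory:")
    print(load(conn, transform(extract(source))))  # 1: the row without user_id is dropped
```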

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/25
CPC: G06F16/254
Inventors: 张洋 (Zhang Yang), 胡博 (Hu Bo)
Owner: KINGDEE SOFTWARE(CHINA) CO LTD