Method and device for running an ETL (Extract-Transform-Load) process union component on the Flink framework
The method and device apply to running an ETL process union component on the Flink framework. They address the problem of inefficient data union operations and improve efficiency by avoiding data serialization and deserialization.
Examples
Embodiment 1
[0046] Embodiment 1 of the present invention provides a method for running an ETL process union component on the Flink framework.
[0047] The method for running an ETL process union component on the Flink framework includes:
[0048] Traversing the directed acyclic graph (DAG) of the ETL process and identifying one or more nodes with the Splitting attribute, where the Splitting-attribute nodes include the data source node, nodes with the FLINK_MESSAGE_SHARED_NODE attribute, and one or more nodes that need to be converted into a Flink operator;
[0049] Following the node order of the ETL process DAG and starting from the data source node, for each pair of adjacent Splitting-attribute nodes, generating an ETL process subset made up of the one or more ETL nodes and the connecting lines between those two nodes, the subset being used in a Flink operator; and constructing the corresponding chain of Flink API operators between the two adjacent Splitting-attribute nodes;
[0050] E...
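The traversal described in [0048] and [0049] can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the patented implementation: the function names `topological_order` and `subsets_between_splitting_nodes`, the adjacency-dictionary representation of the DAG, and the example node names are all hypothetical, and the set of Splitting-attribute nodes is passed in rather than derived from node attributes.

```python
from collections import deque


def topological_order(edges):
    """Return the DAG nodes in topological order (Kahn's algorithm)."""
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)
    indegree = {n: 0 for n in nodes}
    for targets in edges.values():
        for t in targets:
            indegree[t] += 1
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for t in edges.get(node, []):
            indegree[t] -= 1
            if indegree[t] == 0:
                queue.append(t)
    return order


def subsets_between_splitting_nodes(edges, splitting):
    """For each Splitting node, collect the ETL nodes and connecting lines
    reachable from it, stopping at (not passing through) the next
    Splitting nodes. `splitting` is an assumed, precomputed set."""
    result = {}
    for start in topological_order(edges):
        if start not in splitting:
            continue
        nodes, lines, ends = [], [], []
        seen = {start}
        stack = [start]
        while stack:
            node = stack.pop()
            for target in edges.get(node, []):
                lines.append((node, target))
                if target in splitting:
                    # Adjacent Splitting node: the subset ends here.
                    if target not in ends:
                        ends.append(target)
                elif target not in seen:
                    seen.add(target)
                    nodes.append(target)
                    stack.append(target)
        result[start] = {"nodes": nodes, "lines": lines, "ends": ends}
    return result


# Hypothetical linear flow: source -> filter -> map -> sort -> sink
edges = {"source": ["filter"], "filter": ["map"], "map": ["sort"], "sort": ["sink"]}
subsets = subsets_between_splitting_nodes(edges, splitting={"source", "sort"})
```

In this hypothetical flow, the subset that starts at `source` contains `filter` and `map` and ends at `sort`. Each such subset corresponds to the group of ETL nodes that [0049] places inside one Flink operator.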
Embodiment 2
[0115] Embodiment 2 of the present invention provides a method for running an ETL process union component on the Flink framework. Compared with Embodiment 1, Embodiment 2 uses a more practical scenario to show the implementation process when the Flink union operator is used.
[0116] In the ETL flowchart, the subset downstream of the union component contains a sort component. Because the sort component is a node with the FLINK_REDUCE_NODE attribute, the sort node is a Flink operator node, and the union component must be converted to Flink's union operator.
[0117] The column information of the multiple data sources feeding the union component is not necessarily identical: the number of columns and the column types may differ. The ETL configuration of the union component specifies the output reference columns, taking the column information of one data source as the benchmark, and the...
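As a rough illustration of aligning a row to the reference columns mentioned in [0117], the following Python sketch projects a row from another data source onto a benchmark column list. The source text is cut off before it states the actual matching rule, so the rule used here (match by column name, fill missing columns with `None`, cast values to the reference type) is purely an assumption, as are the function name `align_row` and the example columns.

```python
def align_row(row, reference_columns):
    """Project `row` (a dict) onto the reference columns.

    `reference_columns` is a list of (name, cast) pairs taken from the
    benchmark data source. Assumed rule, not from the source text:
    match by name, drop extra columns, fill missing ones with None,
    and cast present values to the reference type.
    """
    aligned = {}
    for name, cast in reference_columns:
        value = row.get(name)
        aligned[name] = None if value is None else cast(value)
    return aligned


# Hypothetical benchmark columns and a row from a second data source
reference = [("id", int), ("name", str), ("amount", float)]
other_row = {"id": "7", "name": "a", "extra": 1}
aligned = align_row(other_row, reference)
```

Here the string `"7"` is cast to the integer type of the benchmark `id` column, the `extra` column is dropped, and the missing `amount` column is filled with `None`.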
Embodiment 3
[0139] Embodiment 3 of the present invention provides a method for running an ETL process union component on the Flink framework. Compared with Embodiment 1, Embodiment 3 uses a more practical scenario to show the implementation process when the Flink union operator is not used.
[0140] In the ETL flowchart, none of the components downstream of the union component needs to be converted into a Flink operator, so the union component does not need to be converted to the union operator provided by the Flink framework.
[0141] The column information of the multiple data sources feeding the ETL union component is not necessarily identical: the number of columns and the column types may differ. The ETL configuration of the union component specifies the output reference columns, using the column information of one data source as the benchmark, and the data of the other data sources are matched an...
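The difference between Embodiments 2 and 3 comes down to one check: whether any component downstream of the union component must become a Flink operator. A minimal Python sketch of that check follows. It assumes that the FLINK_REDUCE_NODE attribute is the only marker of such a node (the text names it only for the sort component) and that "downstream" means every node reachable from the union component. The function name, graph representation, and node names are hypothetical.

```python
from collections import deque

FLINK_REDUCE_NODE = "FLINK_REDUCE_NODE"


def union_needs_flink_operator(edges, attrs, union_node):
    """Return True if some node downstream of `union_node` carries the
    FLINK_REDUCE_NODE attribute, i.e. the union component would have to
    be converted to Flink's union operator (assumed criterion)."""
    seen = set()
    queue = deque(edges.get(union_node, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        if FLINK_REDUCE_NODE in attrs.get(node, set()):
            return True
        queue.extend(edges.get(node, []))
    return False


# Embodiment 2 style flow: a sort component sits downstream of the union
with_sort = {"union": ["filter"], "filter": ["sort"], "sort": ["sink"]}
convert = union_needs_flink_operator(with_sort, {"sort": {FLINK_REDUCE_NODE}}, "union")

# Embodiment 3 style flow: nothing downstream needs a Flink operator
without_sort = {"union": ["filter"], "filter": ["sink"]}
keep = union_needs_flink_operator(without_sort, {}, "union")
```

In the first flow the check returns `True`, matching the conversion described in [0116]; in the second it returns `False`, matching [0140].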