Method for creating ETL scripts from relational database to Hive
A relational database technology, applied in the database field, that addresses problems such as the high time cost and low efficiency of ETL scripts and thereby improves efficiency.
Examples
Embodiment 1
[0023] The method of the present invention for creating ETL scripts from a relational database to Hive, as shown in Figure 1, includes the following steps:
[0024] S101. Identify feature classifications of data in a relational database.
[0025] Data in relational tables falls into two types according to whether it is updated after insertion. Log-type data is never updated or deleted once it has been inserted into the table. Business-type data may be updated or deleted after insertion. The two types are moved to Hive by different ETL methods. For log-type data, the daily ETL task is to migrate the data generated on the previous day to the corresponding Hive partition. For business-type data, the daily task is to migrate all data that changed on the previous day (that is, newly inserted data and updated data) to Hive, and merge a...
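The difference between the two daily tasks can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names, the `dt` partition column, the `_changes` staging-table suffix, and the use of a time column to select the previous day's rows are all assumptions made for illustration. The merge step for business-type data is only flagged, not generated, because the source text describing it is truncated.

```python
from datetime import date, timedelta


def previous_day_window(run_date):
    """Return the half-open [start, end) window covering the day before run_date."""
    start = run_date - timedelta(days=1)
    return start, run_date


def build_extract_plan(table, data_type, time_column, run_date):
    """Build a daily extract plan for one table.

    data_type is "log" (rows never change after insert) or "business"
    (rows may be updated or deleted). time_column is a hypothetical column
    used to select the previous day's rows.
    """
    start, end = previous_day_window(run_date)
    source_sql = (
        f"SELECT * FROM {table} "
        f"WHERE {time_column} >= '{start.isoformat()}' "
        f"AND {time_column} < '{end.isoformat()}'"
    )
    if data_type == "log":
        # Log-type data goes straight into the Hive partition for that day.
        target = f"{table} PARTITION (dt='{start.isoformat()}')"
        return {"source_sql": source_sql, "hive_target": target, "needs_merge": False}
    if data_type == "business":
        # Business-type data lands in an assumed staging table of changes;
        # a later merge step (not shown) combines it with existing Hive data.
        target = f"{table}_changes"
        return {"source_sql": source_sql, "hive_target": target, "needs_merge": True}
    raise ValueError(f"unknown data type: {data_type!r}")


if __name__ == "__main__":
    print(build_extract_plan("orders_log", "log", "created_at", date(2024, 1, 2)))
    print(build_extract_plan("orders", "business", "updated_at", date(2024, 1, 2)))
```

The half-open window avoids counting rows stamped exactly at midnight twice on consecutive runs.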
Embodiment 2
[0034] The database script creation method of this embodiment, as shown in Figure 1, includes the following steps:
[0035] S101. Identify feature classifications of data in a relational database.
[0036] By write frequency, the data in relational tables can be divided into static data and dynamic data. Static data is updated slowly; supplier data, for example, is rarely updated once entered. Dynamic data grows as traffic in the business system increases. The two types therefore call for different Hive ETL methods: for static data, Hive must always hold the latest full data set, whereas for dynamic data only the latest hot data needs to be kept.
[0037] Based on the differing characteristics of dynamic and static data described above, and on the statistical information in the relational database's dynamic management system tables, the classification ch...
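As a rough illustration of classifying tables from write statistics, the Python sketch below labels a table static or dynamic and maps each label to the retention behaviour described in [0036]. The source text giving the actual classification rule is truncated, so the `TableStats` fields, the write-ratio criterion, and the 1% threshold are assumptions chosen for illustration only, not the rule used by the invention.

```python
from dataclasses import dataclass


@dataclass
class TableStats:
    """Hypothetical per-table statistics read from the database's system tables."""
    table: str
    row_count: int
    daily_writes: int  # inserts + updates + deletes per day (assumed metric)


def classify_by_write_frequency(stats, max_static_write_ratio=0.01):
    """Label a table 'static' or 'dynamic' from its daily write ratio.

    The ratio criterion and its default threshold are illustrative assumptions.
    """
    ratio = stats.daily_writes / max(stats.row_count, 1)
    return "static" if ratio <= max_static_write_ratio else "dynamic"


def hive_retention_strategy(classification):
    """Map a classification to the retention behaviour described in [0036]."""
    if classification == "static":
        return "keep latest full data set"
    if classification == "dynamic":
        return "keep latest hot data only"
    raise ValueError(f"unknown classification: {classification!r}")


if __name__ == "__main__":
    for s in (TableStats("supplier", 10_000, 5), TableStats("orders", 10_000, 2_000)):
        label = classify_by_write_frequency(s)
        print(s.table, label, hive_retention_strategy(label))
```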
Embodiment 3
[0040] The method of this embodiment for creating ETL scripts from a relational database to Hive is shown in Figure 1. It differs from Embodiment 1 in that it adds identification of the time field used for incremental fetching.
[0041] After the feature classification of a data table in the relational database has been identified, the time range of each ETL operation must also be specified, so identifying the table's time field for incremental fetching is a necessary step. Specifically, in S101 the time fields of the relational database table are obtained, and each is checked against two conditions: whether the field is updated whenever the data changes, and, if so, whether the field can use an index (that is, it is an indexed column or the first column of a composite index). If both conditions are met, the field is designated as the time field for incremental fetching, and the time range for fetching is generated.
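The two-condition check can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the `Column` record, its flags, and the representation of each index as a tuple of column names in declared order are hypothetical, and how the "updated on change" property is detected in a real database is not covered here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Column:
    """Hypothetical column metadata gathered in step S101."""
    name: str
    is_time_type: bool
    updates_on_change: bool  # True if the value is refreshed whenever the row changes


def is_index_usable(column_name: str, indexes: Sequence[Tuple[str, ...]]) -> bool:
    """True if the column is a single-column index or leads a composite index."""
    return any(idx and idx[0] == column_name for idx in indexes)


def pick_incremental_time_field(
    columns: Sequence[Column], indexes: Sequence[Tuple[str, ...]]
) -> Optional[str]:
    """Return the first time field meeting both conditions, or None."""
    for col in columns:
        if not col.is_time_type:
            continue
        if col.updates_on_change and is_index_usable(col.name, indexes):
            return col.name
    return None


if __name__ == "__main__":
    cols = [
        Column("id", False, False),
        Column("created_at", True, False),
        Column("updated_at", True, True),
    ]
    idxs = [("id",), ("status", "created_at"), ("updated_at", "status")]
    print(pick_incremental_time_field(cols, idxs))
```

Only the leading column of a composite index is accepted because a range filter on a non-leading column generally cannot use that index on its own.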
[0042] The remaining steps of t...