Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

ETL incremental load strategies

3 posters

Go down

ETL incremental load strategies Empty ETL incremental load strategies

Post  juanvg1972 Mon Dec 28, 2015 7:25 am

Hi,

I would like to know which is the best strategy in a ETL incremental load.
I have a CDC system that capture changes that I read, validate, transform and load in the target DB.
These changes belong to a period, all the rows that I treat correspond to a month (for example in Dec I treat November rows, and in January I will treat Decemeber rows).

For example in December, I am treating November rows and for the loading I have to optines:

1) In every ETL incremental load delete all rows of Nov in the target DB and insert the new rows.
- delete from targerdwb where sales_date >= 01/11/2015 and sales_date <= 30/11/2015
- insert CDC rows

2) In every incremental load I detect the rows that are news and the rows have to update.
In order to detect inserts or updates I have to make a join between rows to load and rows in the target DB.
Once detected inserts or updates (flag) I wil execute the SQL statements

I suppose the strategy depends in the volumen of the target DB and the volumen of rows in the CDC capture.
More or less the rows in the CDC are 20% at rhe beggining of the period and 5% at the end.

I will do performance tests to know the faster way of do the load, but I would like to know if there is any other thing
that will be usefull to take in consideration or if there are any other way to do the incremenrtal load.

Any advices about ETL incremental load will be usefull

Thanks in advance




juanvg1972

Posts : 25
Join date : 2015-05-05

Back to top Go down

ETL incremental load strategies Empty Re: ETL incremental load strategies

Post  ngalemmo Mon Dec 28, 2015 2:23 pm

Not sure I understand option #1. If your source is truly incremental, you will only have new or changed data, not the complete month. If it is incremental, #2 is the usual approach.

Assuming the source does contain the complete month, which performs better depends on your DBMS. In most cases option 2 performs better if the source is incremental.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

ETL incremental load strategies Empty ETL incremental load strategies

Post  juanvg1972 Mon Dec 28, 2015 2:32 pm

Hi ngalemmo, thank for your answer,

My option 1 is an error, that was my older strategy, when I did'nt have a CDC system.

One more question:

If I have DW detail fact table (transactions) and DW aggregatted fact table, I am updating data of one month (November f. e.), that is my aproach:

CDC records (transactions) -> join with DW detail records of 1 month, and I flag updates and inserts in CDC records -> execute inserts and updates in DW detail table-> from DW detail table I make an aggregated of the month records and I load this records in DW aggregated table (previosly I delete the records of this month in aggregated table)

is this the usual aproach??, any advice to better performance?, best practices??

Thanks,




juanvg1972

Posts : 25
Join date : 2015-05-05

Back to top Go down

ETL incremental load strategies Empty Re: ETL incremental load strategies

Post  nick_white Wed Dec 30, 2015 1:36 pm

Unless you have accumulating snapshot facts then you shouldn't normally be updating facts as they are a record of what happened at a point in time - so if you are just processing November's data then you shouldn't have any updates as this would be the 1st time you've processed November's facts and there shouldn't be changes to facts from previous months.

How you deal with aggregations is probably best judged on the performance of various options. If most of your existing aggregate records will change then it's probably quicker/easier to just drop them all and re-create them; if only a few aggregates will change (and you can easily identify them) then it's probably quicker just to either drop those and re-insert them or update them

nick_white

Posts : 364
Join date : 2014-01-06
Location : London

Back to top Go down

ETL incremental load strategies Empty Re: ETL incremental load strategies

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum