Historical and Incremental Loading of Dimension and Fact tables

View previous topic View next topic Go down

Historical and Incremental Loading of Dimension and Fact tables

Post  amir2 on Thu Oct 21, 2010 9:40 am

Hello

I have been reading three of the Kimball books and playing with the MDWT_AdventureWorks_SSIS sample. I have learnt a lot but I have also become overwhelmed as I was new to DW/BI up until a few weeks ago. Also, the books are very detailed and there is a huge amount of information to digest.

I am now trying to start the ETL part of our project (we've completed the Dimensional Modelling). I have already hit a problem on my first dimension table. Doing the historical load was very simple using SSIS. But, we are now trying to find a SSIS sample that shows how to do "incremental load" into an already populated dimension table. Some questions:

- Is there a SSIS sample package that someone point me at
- Is "incremental load" the right term to use for this
- Is this covered in the Kimball books - I don't have the ETL book by the way, but I have the others

We haven't even considered our fact tables yet as we are focusing on the dimension table first. However, I am sure the same issues apply for fact tables so any advice or sample packages for fact tables would be greatly appreciated.

For your information, our DW will initially be loaded with 1 year of historical data and never be required to maintain more that 5 years of data.

Thanks in advance!

amir2

Posts: 29
Join date: 2010-07-29

View user profile

Back to top Go down

Re: Historical and Incremental Loading of Dimension and Fact tables

Post  amir2 on Mon Oct 25, 2010 11:39 am

This article seems to do the trick

http://www.ssistalk.com/2007/03/09/ssis-using-a-checksum-to-determine-if-a-row-has-changed/

amir2

Posts: 29
Join date: 2010-07-29

View user profile

Back to top Go down

Great article. Is this then current best practice?

Post  NbleSavage on Thu Dec 02, 2010 5:49 am

amir2 wrote:This article seems to do the trick

http://www.ssistalk.com/2007/03/09/ssis-using-a-checksum-to-determine-if-a-row-has-changed/

Great blog post - thanks for the link, Amir. Is this approach (the use of checksums - or even better hashsums) to determine record changes then considered to be the current best practice for running incremental ETL loads?

Peace.

- Savage

NbleSavage

Posts: 24
Join date: 2010-11-28

View user profile

Back to top Go down

Any examples / citations?

Post  NbleSavage on Mon Dec 06, 2010 5:58 am

ttt for any practical incremental ETL examples along the lines of the fabulous Kimball SQL 2005 AdvWorks example depicting the historical load.

NbleSavage

Posts: 24
Join date: 2010-11-28

View user profile

Back to top Go down

Re: Historical and Incremental Loading of Dimension and Fact tables

Post  ngalemmo on Mon Dec 06, 2010 11:12 am

Be wary of checksum algorithms. Not all are appropriate for reliable change detection.

I like to use a 32 bit CRC. It is an established algorithm that is very sensitive to minor changes and transpositions in a string and produces a 32 bit hash, which as a false positive risk of 1 in 4 billion... well within acceptable limits. A 16 bit algorithm has 1 in 65K chance of a false positive, which is way too high.

Some suggest using an MD5 hash, which is 128 or 160? bits. I think it is overkill for simple change detection.

ngalemmo

Posts: 2621
Join date: 2009-05-15
Location: Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Historical and Incremental Loading of Dimension and Fact tables

Post  NbleSavage on Mon Dec 06, 2010 7:57 pm

^^Great post^^

Thanks!!

NbleSavage

Posts: 24
Join date: 2010-11-28

View user profile

Back to top Go down

View previous topic View next topic Back to top

- Similar topics

Permissions in this forum:
You cannot reply to topics in this forum