Fact load with SCD2- join on current row or between dates

View previous topic View next topic Go down

Fact load with SCD2- join on current row or between dates

Post  gettingthere2 on Thu Jun 30, 2011 8:11 am

Hi - still learning so bear with pls.....

I've inherited a DW in which a sales fact table is truncated and reloaded every day.

Previsouly i've loaded accumulating fact tables with SCD2 Dimensions by just joining to the dimension record which has "currentrow=1". Based on a nightly load this means that the fact is joined to the appropriate dimension record with an accuracy of 1 day. This assumes the ETL suceeds every night and the fact table never gets corrupted/truncated etc.

The way I've just come across for loading a snapshot seems to be to join on the fact event date being between the effective dates of the dimension record.

What i want to know is

a) are these both valid methods?
b) if so which method is more appropriate to which scenerios?
c) what are the pros and cons of each method?


As a start I'm thinking that the currentrow method is fragile in that it requires frequent regular loading to work. Conversely the between dates method relies on having an event date(ID) in the fact table and then there may be multiple dates to choose between.

gettingthere2

Posts : 5
Join date : 2011-06-30

View user profile

Back to top Go down

Re: Fact load with SCD2- join on current row or between dates

Post  gettingthere2 on Mon Jul 04, 2011 4:24 am

gettingthere2 wrote:Hi - still learning so bear with pls.....

I've inherited a DW in which a sales fact table is truncated and reloaded every day.

Previsouly i've loaded accumulating fact tables with SCD2 Dimensions by just joining to the dimension record which has "currentrow=1". Based on a nightly load this means that the fact is joined to the appropriate dimension record with an accuracy of 1 day. This assumes the ETL suceeds every night and the fact table never gets corrupted/truncated etc.

The way I've just come across for loading a snapshot seems to be to join on the fact event date being between the effective dates of the dimension record.

What i want to know is

a) are these both valid methods?
b) if so which method is more appropriate to which scenerios?
c) what are the pros and cons of each method?


As a start I'm thinking that the currentrow method is fragile in that it requires frequent regular loading to work. Conversely the between dates method relies on having an event date(ID) in the fact table and then there may be multiple dates to choose between.

Bump. Cam anyone help?

Thanks

gettingthere2

Posts : 5
Join date : 2011-06-30

View user profile

Back to top Go down

Re: Fact load with SCD2- join on current row or between dates

Post  VHF on Tue Jul 05, 2011 10:25 am

Truncating and reloading the fact table daily is not a bad practice for a small-to-medium DW. It eliminates a lot complexity in CDC (chagne data capture) and ETL and ensures that the DW reflects the latest changes in the source system.

As far as which SCD2 dimension records to which to join when you have an accumulating snapshot with multiple dates, I think you need to pick which date you want to represent the "as of" date for that record. Could be the first date in the accumulating snapshot (ex: order received) or could be the most recent date that is populated (ex: order shipped). But you're going to have to determine some rule for when in the life of an order you want to "lock in" your view of the customer so to speak.

If you always use the current customer record when reloading your fact table, that is not different than SCD1 (which is simpler if there is no requirement for SCD2 tracking of customer (or other dimension) attributes.)

VHF

Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US

View user profile

Back to top Go down

Re: Fact load with SCD2- join on current row or between dates

Post  ngalemmo on Tue Jul 05, 2011 4:03 pm

You could load either way, but the question is, what is the business expecting?

Why is customer a type 2 dimension if the fact table is being loaded the way they are? Was there ever an expectation of customer or sales history? I would question why sales are being loaded the way they are. A combination of a type 2 customer dimension and a fact table that is reloaded daily leads me to think either something wasn't designed right or expectations are not being met. Probably both.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Fact load with SCD2- join on current row or between dates

Post  gettingthere2 on Wed Jul 06, 2011 6:11 am

Thanks for your replies. Reading them has brought some clarity to my mind.

It seems that it depends on whether the business wants the date the fact record was created (more or less the same as the record creation date in the OLTP system) or another date which relates to the fact.

It is actually the Staff dimension that is an SCD2 not the customer. The ability to track staff sales performance as individuals move around is sought, hence the SCD2.

Why is customer a type 2 dimension if the fact table is being loaded the way they are? Was there ever an expectation of customer or sales history? I would question why sales are being loaded the way they are. A combination of a type 2 customer dimension and a fact table that is reloaded daily leads me to think either something wasn't designed right or expectations are not being met. Probably both.

I don't think there is anything wrong with the current arrangement. The Staff SCD2 allows sales history by staff member to be reported, whilst the truncating and reloading of the fact table ensures there is no need to deal with the CDC (ie changes of status and other sale attributes). Joining on (historic) fact and effective dates allows this.

Am i missing something?

If you always use the current customer record when reloading your fact table, that is not different than SCD1 (which is simpler if there is no requirement for SCD2 tracking of customer (or other dimension) attributes.)
Maybe I was not being clear - previsouly on another project I was loading the fact incrementally and dealing with the CDC and had it joined to various SCD2s.

gettingthere2

Posts : 5
Join date : 2011-06-30

View user profile

Back to top Go down

Re: Fact load with SCD2- join on current row or between dates

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum