Surrogate key regeneration

View previous topic View next topic Go down

Surrogate key regeneration

Post  pschmidt on Wed Jul 28, 2010 9:59 pm

I am currently engaged with an organization as a technical lead with their BI deployment. I have become involved with the information delivery team in architectural discussions on "best practice" design of a data mart. What I have uncovered in terms of conformed dimensions and fact data is based on a concept I have never come across before. Let me give you an example:

1. The data mart is implementing the use of surrogate keys. Great, so for cost centre 100 it is assigned a surrogate key of 1.
2. There is no insert/update strategy in place. So, whenever data is loaded the entire mart including dimensions and facts is truncated. For example, if I load July actuals I need to reload all history.
3. What this means is all surrogate keys are regenerating! So, cost centre 100 now has a surrogate key of 2 and all facts are reloaded to reflect this change!

Based on my experience this is a flawed approach in terms of design and implementation. Any thoughts would be greatly appreciated.

Thanks, Paul

pschmidt

Posts : 1
Join date : 2010-07-28

View user profile

Back to top Go down

Re: Surrogate key regeneration

Post  BoxesAndLines on Thu Jul 29, 2010 8:47 am

If you reload the entire data warehouse then the approach is not flawed. SK's are purely for joining. The thing that maybe flawed is the approach to reload everything. Over time, the tables will grow incrementally larger causing longer and longer load times.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Surrogate key regeneration

Post  hang on Fri Jul 30, 2010 7:47 pm

In your case, every incremental ETL load is treated as initial load. Normally you should design your ETL to cater for incremental load on SCD dimension and fact tables, and the initial load is just part of it with unlimited time span.

It's a a lot easier to treat everything as initial load and forget about finding any changes in the source, and it will self correct the data in DW if there are a lot of on-going corrections in source data. It might be OK for small DW as it may only take couple of hours to complete the entire rebuild overnight. But for most DW project, this approach is flawed for the following reasons.

1. You may not have the complete source history to build your DW.

2. Your load window does not allow the time for the entire rebuild.

3. If the load failed in the night, then the whole DW becomes unavailable unless there is an automated backup/restore and database switching mechanism.

In my experience I do frequently rebuild the DW during development to get all SK's aligned, but once in production, it's very risky process that requires quite a few days contingency planning and preparation.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Surrogate key regeneration

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top


 
Permissions in this forum:
You cannot reply to topics in this forum