Dimension Table - Primary Key Question SCD2
3 posters
Page 1 of 1
Dimension Table - Primary Key Question SCD2
Hi There,
I am designing a dimension for the first time and have a question regarding primary keys with SCD2. Do all dimension tables simply use the surrogate key as the primary key? The reason why I ask is that I want to prevent data being loaded twice a day with the chance of having multiple SCD2 records with the same day stamp which could impact / complicate reporting. Should this restriction logic be handled solely at the ETL level with no logical prevention at the database level?
Thanks.
Michael
I am designing a dimension for the first time and have a question regarding primary keys with SCD2. Do all dimension tables simply use the surrogate key as the primary key? The reason why I ask is that I want to prevent data being loaded twice a day with the chance of having multiple SCD2 records with the same day stamp which could impact / complicate reporting. Should this restriction logic be handled solely at the ETL level with no logical prevention at the database level?
Thanks.
Michael
mick- Posts : 3
Join date : 2014-07-22
Re: Dimension Table - Primary Key Question SCD2
Yes, all dimensions have surrogate keys. Your ETL should be robust enough to identify a file that's already been loaded.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Reloading of data
Another quick question. If data needs to be reloaded for any particular day, what is the best way to do this? given SCD2 changes.
mick- Posts : 3
Join date : 2014-07-22
Re: Dimension Table - Primary Key Question SCD2
Probably not a quick answer - it depends on why you need to reload and what the current state of your system is.
Cleanest solution would be to restore your DW to a previously 'known good' state and re-load your data. But that pre-supposes
a) You have backups and your backup/restore process is known to work. You hear frequent stories of companies that think they are taking backups only to find out they aren't when they need them; or are taking backups but find out they can't restore from them - or it will take so long to restore that you'll never catch up you r backlog of unprocessed transactions
b) you have the source data available and in a format that you can re-build the DW from. This is the point when you understand the reason for not deleting anything from your staging area - when you realise you can't re-create history from your source system
The other approach is to reload your data on top of your DW as it currently stands. There is no generic solution for how to do this - it depends on what the state of your DW is (i.e. why you need to re-load data), how many tables are not SCD1 (so you can't just overwrite them), etc. This approach is almost certainly going to require a custom set of ETL routines.
Whatever solution you select, you need to have built any required routines and tested them during your development phase. Trying to hack some custom ETL together in a Prod environment to reload data while your users are shouting at you to get their data sorted is not a good place to be in!
Cleanest solution would be to restore your DW to a previously 'known good' state and re-load your data. But that pre-supposes
a) You have backups and your backup/restore process is known to work. You hear frequent stories of companies that think they are taking backups only to find out they aren't when they need them; or are taking backups but find out they can't restore from them - or it will take so long to restore that you'll never catch up you r backlog of unprocessed transactions
b) you have the source data available and in a format that you can re-build the DW from. This is the point when you understand the reason for not deleting anything from your staging area - when you realise you can't re-create history from your source system
The other approach is to reload your data on top of your DW as it currently stands. There is no generic solution for how to do this - it depends on what the state of your DW is (i.e. why you need to re-load data), how many tables are not SCD1 (so you can't just overwrite them), etc. This approach is almost certainly going to require a custom set of ETL routines.
Whatever solution you select, you need to have built any required routines and tested them during your development phase. Trying to hack some custom ETL together in a Prod environment to reload data while your users are shouting at you to get their data sorted is not a good place to be in!
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Re: Dimension Table - Primary Key Question SCD2
Thanks for your response Nick. The reason why I asked is that we have the occasion where queries may need to be changed from source system which could impact on fact calculations. I guess the easiest thing to do would be to package them up and deploy them to run the following day knowing that a point in time for reports, there were some logical changes.
Interesting note about storing archives for staging. At the moment the ETL job that I have written simply copies a days data to a staging table before moving it over to the final fact / dimension tables. I could also pump the results back to daily text files if I needed to re-import history.
Interesting note about storing archives for staging. At the moment the ETL job that I have written simply copies a days data to a staging table before moving it over to the final fact / dimension tables. I could also pump the results back to daily text files if I needed to re-import history.
mick- Posts : 3
Join date : 2014-07-22
Similar topics
» [SOLVED] SCD2 Dimension and Fact Table
» Dimension design question
» Monthly Periodic Snapshot Table using SCD Type1 Dimension tables - Design Question.
» SCD2 : question of date ...
» SCD2 Type Change Question
» Dimension design question
» Monthly Periodic Snapshot Table using SCD Type1 Dimension tables - Design Question.
» SCD2 : question of date ...
» SCD2 Type Change Question
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum