Late Arriving Dimension Data

View previous topic View next topic Go down

Late Arriving Dimension Data

Post  GregDC on Fri Jan 09, 2015 3:29 pm

I have a Warehouse with a grain of monthly. Data from the HR department is reported on as of the last working day of the month. However we are being told that are counts by status are off. It would seem that on the month after the report is issued that HR provides records with status changes that are backdated to the prior month. So when HR looks at the counts for the July Status report they are correct according to the data in the dimension/fact tables at the time the report was run, but HR did not get all the field reports until several days (or even weeks) into the next month. They know this happens so they change their systems and backdate the change. When the ETL picks up the change status record there is confusion. Do I change the Status with the backdate (hence making the end-of-month report non-duplicatable, but in agreement with the current HR system) or do I make the change and reflect the change in the month the data was reported (hence disagreeing with the HR system)?

Thanks for helping me think through this problem,

GregDC

GregDC

Posts : 17
Join date : 2015-01-09

View user profile

Back to top Go down

Re: Late Arriving Dimension Data

Post  ngalemmo on Fri Jan 09, 2015 3:37 pm

Yes.  

Why not do both? It depends on what the requirement is.  If the business can't decide then providing both covers all possibilities.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Late Arriving Dimension Data

Post  GregDC on Sat Jan 10, 2015 9:50 am

Thank you for the suggestion. The overall business requires reports to be repeatable (July reports always show the same July results), the HR group however insists that the "correct" answer for the July report needs to agree with their system in mid-August. So I have been trying to think how to implement you "both/and" suggestion.

I will admit that I am not seeing how to do this without some big changes to the dimension's attributes. Can you enlighten me a little about what you are thinking?

Thanks

GregDC

Posts : 17
Join date : 2015-01-09

View user profile

Back to top Go down

Re: Late Arriving Dimension Data

Post  ngalemmo on Sat Jan 10, 2015 3:06 pm

Implement a type 2 dimension. Facts will be tied to the dimension at the point in time the fact was added. If a user wants the current dimension row, you do a self join on the dimension, based on the natural key, to locate the current version of the row.

If you want to avoid the self join there are two ways to handle it. You can have a single dimension table with a type 2 key and a persistent (i.e. type 1) key and store both keys in the fact. If they want current, you query the dimension using the persistent key and filter on the current flag. The other way is to have two dimension tables, one type 1 and the other type 2, again with both keys in the fact table.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Late Arriving Dimension Data

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum