Storing Date Keys in dimension tables versus fact tables

View previous topic View next topic Go down

Storing Date Keys in dimension tables versus fact tables

Post  rsbaxter on Tue Mar 08, 2011 4:49 pm

I have a question about the decision to store date keys in fact tables, and dimension tables. It seems to me that it is sometimes more effective to store date keys in the dimension table instead of the fact table, particularly if that date could be used as an attribute that could describe the rows in many different fact tables. The alternative as I understand it would be to duplicate the date key in multiple fact tables.

The differences in design would be a star schema (fact table) versus a snowflake (stored in the dimension). Are there any other design considerations that could influence this decision? Is the concept of a conformed date dimension the reason a designer would always store the date in the fact table? Are there situations where the date key should be stored in the dimension instead of the fact table?


If you need more detail about the scenario at hand; we are building a data warehouse for an insurance platform. The effective date of an insurance policy is an important date that could be used against many or most of the fact tables. Should we store the effective date in each of the fact tables? Or store it in a single row in the Insurance Policy dimension?

rsbaxter

Posts : 1
Join date : 2011-03-08

View user profile

Back to top Go down

Re: Storing Date Keys in dimension tables versus fact tables

Post  ngalemmo on Tue Mar 08, 2011 7:38 pm

If you have a policy dimension it makes sense to store the date in the dimension. I would NOT store a FK to the date dimension in the policy dimension.

As it exists in the policy dimension, it would be an attribute of the policy, not a context for the fact.

If you have facts that require this date as a context for the fact, with all the attributes the date dimension provides, then it should be a FK of the fact.

So, for example, a fact table recording premium payments would not have this date as a dimension, as the policy start date is not a context for that event. But it may appear as a dimension in a fact table tracking the lifecycle of a policy.

If you decide not to have a policy dimension (which, by the way, is something you may wish to consider... ) then the date would only appear as an FK in fact tables where it is appropriate.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Storing Date Keys in dimension tables versus fact tables

Post  BoxesAndLines on Tue Mar 08, 2011 10:19 pm

I generally follow Nick's advice on this. Although, I have been known to use the date dim skey in a dimension under special circumstances.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Storing Date Keys in dimension tables versus fact tables

Post  Jeff Smith on Wed Mar 09, 2011 12:23 pm

I tend to snow flake the date dimension on other dimensions. In many instances, I need to select the dimension key based on a date in the fact being between dates on the dimension table. The fact data in my processes tend to land in a staging table before it gets loaded to the final fact table. As I load to the staging table, I replace all of the dates in the fact data with the date dimension key. So, when I load fact data into the final fact table, it's easy to set the date key in the fact data between the date keys in the dimension table.

If the only reason for the dates in the dimension table are for assigning the correct dimension key when loading the fact table, then it isn't being used as a snow flake so the negatives of the snowflake aren't applicable.

Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

Re: Storing Date Keys in dimension tables versus fact tables

Post  hang on Wed Mar 09, 2011 7:20 pm

I would say, it depends. For any date attribute in dimension, I would leave it as its original date(time) format initially and only snowflake them if I need other attributes from date dimension. Most of the cases I end up not doing anything.

To me, snowflaking date attributes in dimension has no difference than snowflaking other types of attributes in dimensions. I would only snowflake dimension with strong reasons. It really comes down to two major goals of dimensional modelling, performance and ease of use, and in this regard, whether the snowflake makes the use of dimension a lot easier and faster.

I have come across a situation that a dimension contains more than ten date keys because of some rules. However when I use these date attributes, I only need their actual calendar dates for comparison, working out the durations. I ended up joining to date dimension ten times in order to get the actual date values. This is a perfect example of confusion between guidelines and rules by misusing date dimension.

Kimball never said the date attributes must be snowflaked by date dimension. However he did say that if you want to leverage rich set of attributes in date dimension then use date dimension as an outrigger, but always remember that extensive use of
outrigger dimensions can compromise usability and performance
. Here is his article: http://www.rkimball.com/html/designtipsPDF/KimballDT61HandlingAll.pdf

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Storing Date Keys in dimension tables versus fact tables

Post  Jeff Smith on Fri Mar 11, 2011 10:49 am

The date dimension in my opinion is unlike other dimensions. You don't generate surrogate keys the way you would for other dimension tables. Its a small dimension. And has a ton of use. Joins

The drag of snow flaking a date dimension onto other dimensions is having to write the join and not so much about the performance of the query. It violates the ease of use, not the performance issue.

But it all depends on how the data is used. If the dates are used alot, then it should be modeled in a way to reflect it's high frequency use. If it's used once in a blue moon and significant space can be saved by snow flaking, then model accordingly.

I tend to put start and end date keys on dimension tables because when I assign the dimension key to the fact, I have to set a date in the fact between the Start and End dates on the dimension. And since I replace all of the dates in the source of the fact early on in the etl process, it's an easier join and faster process if everything is done via date surrogates.


Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

Re: Storing Date Keys in dimension tables versus fact tables

Post  ngalemmo on Fri Mar 11, 2011 9:33 pm

The drag of snow flaking a date dimension onto other dimensions is having to write the join and not so much about the performance of the query. It violates the ease of use, not the performance issue.

I disagree. Any decent BI tool hides underlying schema complexity. Ease of use is not the issue. On the performance side, a star schema has a strict relationship between facts and dimensions that many databases take advantage of in the form of a 'star join'. When you introduce snowflakes, the star join model falls apart and optimizers go to the tried and true relational join approach where tables are combined in a binary fashion. What the optimizer decides to do can cause significant differences in query times.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Storing Date Keys in dimension tables versus fact tables

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum