Too many dates on fact - Is there such a thing as Junk Date dimension

View previous topic View next topic Go down

Too many dates on fact - Is there such a thing as Junk Date dimension

Post  Ramtin on Sun Apr 03, 2011 1:49 am

The Source system holds large number of dates for each Project. Appart from the Last two the rest are candidate for role playing on the fact.
However I am wondering if I should Group them and pull them out of the fact and create a Kind of Date Junk Dimension for the lack of a better term.
Or is that a crazy idea and large number of role plays on date dimension in an Accumulative Snapshot Fact is normal and Should I leave it alone?
here is some of dates:
AUTHSD_DATE
PLAN_STR_DATE
PLAN_FIN_DATE
SCHED_STR_DATE
SCHED_FIN_DATE
ACT_STR_DATE
EARLY_STR_DATE
LATE_STR_DATE
ACT_FIN_DATE
EARLY_FIN_DATE
LATE_FIN_DATE
RAISED_DATE
CLOSED_DATE
FINALISED_DATE
WARRANTY_FIN_DATE
LAST_MOD_DATE
DIS_MOD_DATE

Ramtin

Posts : 12
Join date : 2011-03-10

View user profile

Back to top Go down

Re: Too many dates on fact - Is there such a thing as Junk Date dimension

Post  ngalemmo on Sun Apr 03, 2011 11:22 pm

I don't know if it is too many... sometimes is just is what it is.

A junk dimension doesn't really buy you anything unless some of the dates just need to be dates (i.e. don't need all the dimensional attributes from the date dimension). Which may be the case with at least a few of them...
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Too many dates on fact - Is there such a thing as Junk Date dimension

Post  Jeff Smith on Mon Apr 04, 2011 9:03 am

If the dates aren't used very often then why not? I would make sure that the combination of the dates don't blow up the junk dimension. But I would make it a snow flack and build the junk date dimension with the date dimension keys.

Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

I'd stay away from that.

Post  Galen Boyer on Mon Apr 04, 2011 1:01 pm

The way I understand junk dimensions are because the number of dimensions you have in your entire model is getting large, and you want to try to get the number lower. Then, what you do is find smaller dimensions and start combining them into a "junk" dimension. But, instead, the reason you are thinking of a junk dimension is to lower the number of dimensional keys on the fact, which is a different motivation.

A reason I might do what you are doing is if the number of distinct combinations of the dates is so small that it truly makes it worth it because the resultant size of your fact/dimensional tables lowers. If not, you are just transferring size from the fact to the dimension.

Galen Boyer

Posts : 7
Join date : 2011-03-04

View user profile

Back to top Go down

Re: Too many dates on fact - Is there such a thing as Junk Date dimension

Post  BoxesAndLines on Mon Apr 04, 2011 2:25 pm

If it is an accumulating snapshot then I would add all the dates to the fact table (as FK's).
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Too many dates on fact - Is there such a thing as Junk Date dimension

Post  hang on Mon Apr 04, 2011 4:18 pm

In general, the date attribute cannot be junked as they are likely of high cardinality. However the intention of junk dimension IS to minimise the number of dimensions in a fact table so that a single join would make a set of dimension attributes available at small expense of data redundancy in junk dimension, but with big saving on the fact. So the key deterministic point is the cardinality of the relevant attributes.

Joining too many times with date dimensions, and only to find that a single calendar date value is needed, is annoying. If you can turn it into an accumulating snapshot, as B&L said, then you may be able to remove some end dates and let ETL work out some lags between some key milestones which could add more values than extra date keys.

Otherwise having some natural dates in fact as degenerate dimensions may be still allowable if these dates are simply used as factors for certain measures. Maybe I am playing evils advocate to some dimensional modeling purest, but I am simply trying to be more pragmatic. Having said that however, if the primary use of any date is to leverage set of attributes in date dimension I would use date key without doubt.



hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Just my opinion

Post  Jeff Smith on Mon Apr 04, 2011 5:11 pm

A lot of it depends on the overall size of the fact table, the number of date fields that are empty, the number of date fields that are usually exactly the same, and how often the dates are queried.

The overall objective is to organize the data so the retrieval is fast and the design is simple. If data is used rarely, then making it a little harder to write in a query is acceptable, particularly if the design improves the performance for 90% of the other queries that hit the table.

If 1 date field is regularly queried and the other date fields are rarely queried, then sticking all but the one date in a junk dimension can greatly shrink the over size of the fact table and query performance should be greatly improved.

An alternative design, particularly if the date fields are unevenly populated (lots of nulls), is to create a dimension for what the dates are for and create a row for every date. Keeps the narrow and many of the date fields would be null, then it could reduce the size of the fact table.

A lot depends on the over size of the fact table, how it will be used, and the frequency of nulls ion the date fields.

Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

Too many dates on fact - Is there such a thing as Junk Date dimension

Post  Ramtin on Tue Apr 05, 2011 10:42 pm

Thank You for all your inputs and assistance.
It turns out Hang was right the cardinality is high so a Junk Dimension does not make alot of sense.
I have decided since Accumulating snapshot is about key milestones, use the milestone dates and throw the rest back on the project dimension so I ended up with following
AUTHSD_DATE
PLAN_STR_DATE
PLAN_FIN_DATE
SCHED_STR_DATE
SCHED_FIN_DATE
ACT_STR_DATE
ACT_FIN_DATE
CLOSED_DATE
FINALISED_DATE
WARRANTY_FIN_DATE

If you have any more thooughts I would welcome them.


Ramtin

Posts : 12
Join date : 2011-03-10

View user profile

Back to top Go down

Re: Too many dates on fact - Is there such a thing as Junk Date dimension

Post  hang on Wed Apr 06, 2011 2:05 am

So itís an accumulating snapshot as B&L suggested. You need to cater for the dates that have not been fulfilled at some point of time. The best practice is to turn the dates into date keys and point all unfilled dates to an non-existence record in date dimension. Bear in mind with this type of accumulating snapshot fact, you need to frequently revisit the fact records and update relevant date keys as they become available.

However the date keys could be hard to calculate measures (lags) without joining to date dimension. You may be better off calculating any necessary lags in ETL and store them as measures in the fact so that you only need to use date keys to leverage date dimension attributes for analytic purpose.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Too many dates on fact - Is there such a thing as Junk Date dimension

Post  Ramtin on Wed Apr 13, 2011 3:10 am

Thanks for the tip Hang. I have included days from from inception for each state on the fact so I should be able to quickly calculate any lag between various stat/ milestone with out having to join to date dimension for all the role plays.

Ramtin

Posts : 12
Join date : 2011-03-10

View user profile

Back to top Go down

Re: Too many dates on fact - Is there such a thing as Junk Date dimension

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum