Null value for dates

View previous topic View next topic Go down

Null value for dates

Post  Jeff Smith on Mon Dec 20, 2010 12:08 pm

All of my dimension tables have a value -1 that means there was no match to then dimension table. I try to put in some value such as "Unknown" or "Not available".

Any suggestions for the -1 row in a Date dimension? Should I use some future date such as 12/31/9999? And if I use 12/31/9999, should I continue to use -1 as the default value? I tend to use the Date Dimension Key when caluculating the number of days between 2 dates on a fact, such as the number of days between the Date of Service and the Paid Date.

Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

Re: Null value for dates

Post  gvarga on Mon Dec 20, 2010 12:40 pm

I would suggest also the „Unknown” dimension row instead of the future dummy date!

Try to conciliate with your business partners what to do: if 1 or 2 dates are unknown, how to calculate the days between the Service date and the Paid date ( For instance it could be 0 ?)

gvarga

Posts : 43
Join date : 2010-12-15

View user profile

Back to top Go down

Re: Null value for dates

Post  RossGoodman on Mon Dec 20, 2010 1:44 pm

Another concept that can help is to have a pair of dates, a default low date 1-Jan-1000 and a default high date 31-Dec-9999.

If an "open date" was missing you can use the default low date, if a "close date" is missing you can use the default high date.

This should of course be paired with an audit dimension to record the fact that you have used default dates in place of "real" dates.

Ross

RossGoodman

Posts : 1
Join date : 2009-02-03
Location : Scotland

View user profile http://www.RossGoodman.com

Back to top Go down

If the date is unknown...

Post  rademola on Tue Dec 21, 2010 6:12 am

I normally use an integer representation of dates in the format yyyymmdd (earliest date for me is 19000101 = Jan 1, 1900 and 99991231 = Dec 31, 9999).So for a business transaction say "Service End Date", I insert 99991231 as the date_key while an event that happened in the past that I dont know will have a date_key of 19000101. Calculating intervals still works although I would question why we expect to calcuate say "duration of contract" if the contract itself is still active..., just saying. I would also be interested in what your users think from the user experience point of view.

Ross' suggestion of inserting an audit record for "unreal dates" is very sensible too.

rademola

rademola

Posts : 9
Join date : 2010-12-21

View user profile

Back to top Go down

Re: Null value for dates

Post  ngalemmo on Tue Dec 21, 2010 7:36 pm

Why would you calculate intervals using the key?

While it is common to use a known key for a date dimension (such as YYYYMMDD) it should always be treated as a surrogate (i.e. meaningless) from the point of view of an application. Intervals should always be based on attributes of the date. This avoids problems when it is necessary to generate unknown or error keys when dealing with unknown or invalid dates from source data.

The only 'smart' use of a known date key value is for defining partitions in the physical database. Such use is transparent to the end-user application.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Null value for dates

Post  rademola on Wed Dec 22, 2010 6:24 am

Yes date intervals should only be calculated from date fields. Thanks for pointing that out ngalemmo...

And in a telco I worked at in the past, our call detail records table was partitioned as you explained.


rademola

Posts : 9
Join date : 2010-12-21

View user profile

Back to top Go down

Re: Null value for dates

Post  Jeff Smith on Wed Dec 22, 2010 3:25 pm

Thanks for the advice about how to define the surrogates for the date dimension. I'm fine with the way it's done. If you want to use YYYYMMDD, then by all means use it. I prefer to do it a different way. It works perfectly. Yeah , it might be a little easier when creating partitions but not much - not really.

In any case, getting back to the original question - if you use yyyymmmdd for the dimension key, what would you use for an unknown date?

Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

Re: Null value for dates

Post  Guest on Thu Dec 23, 2010 10:53 am

On our dummy row for the calendar I have a default date of 1/1/1900. It could equally have been some far future date - the aim was to have something visible to users that would be immediately apparent as not right. However, I have been considering changing this to be a null date. My rationale for this is that a user running a query and filtering on date (e.g. < 31.12.2009) but not displaying the date on the report would be picking this rows up although they may not belong in that time frame. By using a null default date this situation will not occur, and it also better represents the state of the data. For some snapshots, the event date will not yet be known. I work for a mail order firm, so there is a delay between order capture and parcel despatch. Having a despatch date of null (rather than 1900) means undespatched parcels are not erroneously picked up on a query - it also means that the users would be able to actively select undespatched parcels by specifying a null despatch date (although for reference we do have a parcel despatch status attribute!).

I need to run this past a sample of business users first to make sure they understand the changes, but for me a null date seems the most appropriate option.

Guest
Guest


Back to top Go down

Re: Null value for dates

Post  ngalemmo on Thu Dec 23, 2010 12:51 pm

Jeff Smith wrote:In any case, getting back to the original question - if you use yyyymmmdd for the dimension key, what would you use for an unknown date?

I use standard practice of assigning a surrogate value. So unknown and invalid date instances are assigned low integer values. To handle invalid dates, I usually have multiple natural keys in the date dimension defined as VARCHAR and containing strings of the date in the various formats being received from different sources. Date key assignment from a source does not require actually interpreting the source data... improper date values do not cause a problem. The simply wind up as inferred rows in the date dimension with a low key value for cleanup at a later time.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Null value for dates

Post  Jeff Smith on Thu Dec 23, 2010 1:55 pm

ngalemmo, The source system only lists Termination Dates for members that have terminated. For members that have not terminated, there is no date. In systems that you've designed, what does the user see in the Termination Date when they have not yet termed? Would leave it as a NULL or would you plug in some distant date such as 12/31/2999?

I'm not referring to bad dates. I'm referring to dates that are purposely left NULL in the source system.

Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

Re: Null value for dates

Post  hang on Thu Dec 23, 2010 5:49 pm

Jeff Smith wrote:Yeah , it might be a little easier when creating partitions but not much - not really.

Table partitioning is important feature to data warehousing which speeds up the performance on very large table dramatically.

An effective technique, namely Sliding Window Partitioning, is commonly used in ETL process when you need to drop portion of the fact data based on a time series and load fresh source. However automating partition management is not a trivial task and having purely meaningless date key can only make the process far more complicated.

Jeff Smith wrote:Yeah In any case, getting back to the original question - if you use yyyymmmdd for the dimension key, what would you use for an unknown date?
You use whatever integer value that you would use for meaningless surrogate date key, e.g. 0, -1 or -999, as it is just an integer anyway. Some people call yyyymmdd integer date key as smart surrogate key, which I think makes perfect sense. But remember, date dimension is the ONLY exception that allows to use smart surrogate key.

Jeff Smith wrote:Would leave it as a NULL or would you plug in some distant date such as 12/31/2999
12/31/2999 or 12/31/9999, it does not matter, as long as it is in far future and not a normal business date. As ngalemmo said, it is other attributes or natural keys that give the meaning for termination, not the date key. If you like, you may even have NULL in some descriptive attributes in your date dimension, although not recommended. But you must not have NULL date key value in the fact table. as it is RI violation which is not allowed in both dimensional and relational modeling. In SQL server SSAS, you can't even process your cube if you violate this fundamental principle.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Null value for dates

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum