SQL Server 2008 Date data type as dimension key

View previous topic View next topic Go down

SQL Server 2008 Date data type as dimension key

Post  VHF on Fri Feb 12, 2010 3:42 pm

We are getting ready to migrate from Microsoft SQL Server 2005 to 2008. Currently we use an integer (in YYYYMMDD format) as the SK on our date dimension table. I am very tempted to start using the new 3-byte Date data type (which stores only the date with no time information) as the SK for the date dimension.

Anyone have first-hand experience with the new SQL 2008 Date data type? Have you bumped into any tools that don't support it properly?

VHF

Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US

View user profile

Back to top Go down

SQL Server 2008 Date data type as dimension key

Post  DataWhisperer on Thu Mar 25, 2010 4:04 pm

I think that this is a good idea because in 90% of our queries this would eliminate the need to join to the date dimension. Sometimes a date is just a date. We are modeling processes which are date intensive and so many of our fact tables require many (6 to 10) joins to the date dimension. Too many joins can slow down a query no matter how efficient they are. Besides I think that using the date datatype as the SK would offer the best of both worlds. You could join to the date dimension if you need to, but only when you need to. We could even use our existing date dimension as is because we already have a unique constraint on the DateValue field and it is already the date data type.

To me the real issue would be how to deal with pseudo nulls. With ints we can use -1 as unknown and 0 as not applicable (or whatever), but with the date data type I'm not sure what we would use. We could use real nulls but I hate the idea. We could use pseudos such as 1/1/1900 but those too often end up creating classic date duration miscalculations such as this: "Hey why is that order 110 years overdue?". I think I'm liking the real nulls better.

I'd really like to know what folks think of this.

DataWhisperer

Posts : 1
Join date : 2010-03-25

View user profile

Back to top Go down

"surrogate key is an artificial or synthetic key"

Post  Alan Musnikow on Mon Jun 28, 2010 12:17 am

The Kimball Group Reader: Relentlessly Practical Tools for Data Warehousing and Business Intelligence states on page 285:
A surrogate key is an artificial or synthetic key that is used as a substitute for a natural key.
If "SK" means "surrogate key", a Date data type cannot be an SK of a date dimension since a date is a date and not a surrogate for a date. Nevertheless, a Date data type could be a primary key for a date dimension table and a foreign key to that table from fact tables.
avatar
Alan Musnikow

Posts : 6
Join date : 2010-06-17
Location : Lexington, Massachusetts, U.S.

View user profile http://musnikow.com/

Back to top Go down

Performance on Joins using the 3 Byte Date Data Type

Post  Jeff Smith on Wed Jul 21, 2010 1:16 pm

I would take into consideration the performance of joining on the 3 byte date field. Integers tend to join quickly and a Date Dimension tends to be short so joins to a date dimension should be really fast. And if you had to select for all rows for a particular year, would it be faster to put a where statement on the Year field in the Date Dimension and then do the join or is it quicker for SQL Server 2008 to convert the 3 Byte Date field into a Year value and then apply the join? If you had an index on a Date Field in the fact table and you wanted all of the rows for 2008, would it know to grab the rows from 1/1/2008 to 12/31/2008 or would it have to convert all of the dates to the Year and then look at every row, effectively ignoring the indexing?

Try it. Do a select with the filter on YEAR(Datefield) = 2008 vs where Datefield between 1/1/2008 and 12/31/2009. Also try it with the date field joining to the date dimension, ect.

Just for giggles, if you are experimenting, try making the date dimension an indexed view and join on the fact table with an integer. I would be interested to see if there is any performance improvement, which I doubt because the date dimension is so small but it would still be iunterested.

By the way, if you're using SQL 2008, shouldn't all of your queries be going against the Cubes?

Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

DateKey in DimDate

Post  itcouple on Wed Oct 13, 2010 9:21 am

Hi

Today my partner asked me why we need DateKey in DimDate and I thought because integer is smaller than datetime but now we have date which is smaller than integer, and I was given link to this topic which seems to be very relevant.

One of the comments is "A surrogate key is an artificial or synthetic key that is used as a substitute for a natural key."

In my opinion Date shouldn't follow this rule and my reasoning is that although natural keys are substituted with surrogate keys because they may be occasionally 'unstable' and several other reason Date is an expeption because it is and always will be fixed. 1st of Jan 2010 will always be 1st of Jan 2010.

I think Date as PK might also simplify certain processes but also it is important to have backward compatibility (old approach) so I doubt this approach will suddenly flood all systems.

Regards
Emil

itcouple

Posts : 45
Join date : 2010-10-13

View user profile

Back to top Go down

Re: SQL Server 2008 Date data type as dimension key

Post  Jeff Smith on Wed Oct 13, 2010 10:28 am

What value would you use in your fact/date dimension table to represent a null date?

I think date dimensions are unlike any other type of dimension. I think dates are frequently used to create partitions so using a pure surrogate key in the date dimension can make it harder to define the partitions.

My date dimension uses a pseudo surrogate. It's an integer with the earliest date in the dimension set to 1, the next oldest set to 2, etc. Some prefer using an integer representation of the date, for example 20101013. I think my approach allows me to substract date dimension values from each other to calculate the number of days. But that's just me.

Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

Re: SQL Server 2008 Date data type as dimension key

Post  ngalemmo on Wed Oct 13, 2010 12:08 pm

The main reason for using a surrogate key is to allow for invalid or unexpected values. Using a date data type as a key does not provide a means to reference invalid dates. Physical size of the data field itself (date versus number) is insignificant and is not a reasonable argument to endorse one approach over the other. Neither is 'eliminating joins', in that, if the database properly implements a star join strategy (many of them do, including SS 2008) the greater number of dimensions help, rather than hinder, performance.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: SQL Server 2008 Date data type as dimension key

Post  arowshan on Mon Oct 15, 2012 8:30 pm

I understand what the idea is behind using SK's for dimension tables. However, normally a date dimension has a SK, a column of type date/datetime and a character column for the name. One thing I have a hard time understanding is in the case of a date dimension is that even when using a SK you still need to populate the column of type date with a fake date. For example if you have SK of -1 for Unknown, you are still forced to have say 1/1/1900 for the column of type date in the date dimension table. In other words, even when using a SK, you are still forced to store fake dates for invalid dates. The only differece is that instead of putting that fake date in your fact table you are putting it in your dimension table unless you don't even bother with a date/datetime column in the date dimension and only have a character field for the name.

Am I losing my mind of I have a point here


arowshan

Posts : 23
Join date : 2011-10-18
Location : Vancouver, Canada

View user profile

Back to top Go down

Re: SQL Server 2008 Date data type as dimension key

Post  TheNJDevil on Tue Oct 16, 2012 9:38 am

my DateDim table has date_key (int)(not null), date_type (varchar(10))(not null), full_date (datetime)(null), YYYYMMDD (char(8))(null), and a bunch of other date breakdowns. I've pre-populated my date dim with everything within our companies valid date range as well as 100 years into the future. Since full_date is nullable, when I come across date not found, I populate a new record with the next negative SK, and fill in what I can. My blank date (-1, 'INVALID', null, ' ',.....). Hope this makes sense.

TheNJDevil

Posts : 68
Join date : 2011-03-01

View user profile

Back to top Go down

Re: SQL Server 2008 Date data type as dimension key

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum