SQL Server 2008 Date data type as dimension key

View previous topic View next topic Go down

SQL Server 2008 Date data type as dimension key

Post  VHF on Fri Feb 12, 2010 11:42 am

We are getting ready to migrate from Microsoft SQL Server 2005 to 2008. Currently we use an integer (in YYYYMMDD format) as the SK on our date dimension table. I am very tempted to start using the new 3-byte Date data type (which stores only the date with no time information) as the SK for the date dimension.

Anyone have first-hand experience with the new SQL 2008 Date data type? Have you bumped into any tools that don't support it properly?

VHF

Posts: 220
Join date: 2009-04-28
Location: Wisconsin, US

View user profile

Back to top Go down

SQL Server 2008 Date data type as dimension key

Post  DataWhisperer on Thu Mar 25, 2010 1:04 pm

I think that this is a good idea because in 90% of our queries this would eliminate the need to join to the date dimension. Sometimes a date is just a date. We are modeling processes which are date intensive and so many of our fact tables require many (6 to 10) joins to the date dimension. Too many joins can slow down a query no matter how efficient they are. Besides I think that using the date datatype as the SK would offer the best of both worlds. You could join to the date dimension if you need to, but only when you need to. We could even use our existing date dimension as is because we already have a unique constraint on the DateValue field and it is already the date data type.

To me the real issue would be how to deal with pseudo nulls. With ints we can use -1 as unknown and 0 as not applicable (or whatever), but with the date data type I'm not sure what we would use. We could use real nulls but I hate the idea. We could use pseudos such as 1/1/1900 but those too often end up creating classic date duration miscalculations such as this: "Hey why is that order 110 years overdue?". I think I'm liking the real nulls better.

I'd really like to know what folks think of this.

DataWhisperer

Posts: 1
Join date: 2010-03-25

View user profile

Back to top Go down

"surrogate key is an artificial or synthetic key"

Post  Alan Musnikow on Sun Jun 27, 2010 9:17 pm

The Kimball Group Reader: Relentlessly Practical Tools for Data Warehousing and Business Intelligence states on page 285:
A surrogate key is an artificial or synthetic key that is used as a substitute for a natural key.

If "SK" means "surrogate key", a Date data type cannot be an SK of a date dimension since a date is a date and not a surrogate for a date. Nevertheless, a Date data type could be a primary key for a date dimension table and a foreign key to that table from fact tables.

Alan Musnikow

Posts: 6
Join date: 2010-06-17
Location: Lexington, Massachusetts, U.S.

View user profile http://musnikow.com/

Back to top Go down

Performance on Joins using the 3 Byte Date Data Type

Post  Jeff Smith on Wed Jul 21, 2010 10:16 am

I would take into consideration the performance of joining on the 3 byte date field. Integers tend to join quickly and a Date Dimension tends to be short so joins to a date dimension should be really fast. And if you had to select for all rows for a particular year, would it be faster to put a where statement on the Year field in the Date Dimension and then do the join or is it quicker for SQL Server 2008 to convert the 3 Byte Date field into a Year value and then apply the join? If you had an index on a Date Field in the fact table and you wanted all of the rows for 2008, would it know to grab the rows from 1/1/2008 to 12/31/2008 or would it have to convert all of the dates to the Year and then look at every row, effectively ignoring the indexing?

Try it. Do a select with the filter on YEAR(Datefield) = 2008 vs where Datefield between 1/1/2008 and 12/31/2009. Also try it with the date field joining to the date dimension, ect.

Just for giggles, if you are experimenting, try making the date dimension an indexed view and join on the fact table with an integer. I would be interested to see if there is any performance improvement, which I doubt because the date dimension is so small but it would still be iunterested.

By the way, if you're using SQL 2008, shouldn't all of your queries be going against the Cubes?

Jeff Smith

Posts: 311
Join date: 2009-02-03

View user profile

Back to top Go down

DateKey in DimDate

Post  itcouple on Wed Oct 13, 2010 6:21 am

Hi

Today my partner asked me why we need DateKey in DimDate and I thought because integer is smaller than datetime but now we have date which is smaller than integer, and I was given link to this topic which seems to be very relevant.

One of the comments is "A surrogate key is an artificial or synthetic key that is used as a substitute for a natural key."

In my opinion Date shouldn't follow this rule and my reasoning is that although natural keys are substituted with surrogate keys because they may be occasionally 'unstable' and several other reason Date is an expeption because it is and always will be fixed. 1st of Jan 2010 will always be 1st of Jan 2010.

I think Date as PK might also simplify certain processes but also it is important to have backward compatibility (old approach) so I doubt this approach will suddenly flood all systems.

Regards
Emil

itcouple

Posts: 28
Join date: 2010-10-13

View user profile

Back to top Go down

Re: SQL Server 2008 Date data type as dimension key

Post  Jeff Smith on Wed Oct 13, 2010 7:28 am

What value would you use in your fact/date dimension table to represent a null date?

I think date dimensions are unlike any other type of dimension. I think dates are frequently used to create partitions so using a pure surrogate key in the date dimension can make it harder to define the partitions.

My date dimension uses a pseudo surrogate. It's an integer with the earliest date in the dimension set to 1, the next oldest set to 2, etc. Some prefer using an integer representation of the date, for example 20101013. I think my approach allows me to substract date dimension values from each other to calculate the number of days. But that's just me.

Jeff Smith

Posts: 311
Join date: 2009-02-03

View user profile

Back to top Go down

Re: SQL Server 2008 Date data type as dimension key

Post  ngalemmo on Wed Oct 13, 2010 9:08 am

The main reason for using a surrogate key is to allow for invalid or unexpected values. Using a date data type as a key does not provide a means to reference invalid dates. Physical size of the data field itself (date versus number) is insignificant and is not a reasonable argument to endorse one approach over the other. Neither is 'eliminating joins', in that, if the database properly implements a star join strategy (many of them do, including SS 2008) the greater number of dimensions help, rather than hinder, performance.

ngalemmo

Posts: 1732
Join date: 2009-05-15
Location: Los Angeles

View user profile http://aginity.com

Back to top Go down

View previous topic View next topic Back to top

- Similar topics

Permissions in this forum:
You cannot reply to topics in this forum