best way for dimension table lookups, if natural key is multi-column (string)

View previous topic View next topic Go down

best way for dimension table lookups, if natural key is multi-column (string)

Post  JimBeam on Tue Sep 20, 2011 10:55 am

I have a Property table in a Real Estate Database with a compound primary key { company code (varchar), PropertyID (int), Business Entity (int) }. When I am populating the Property dimension table in the DWH, I don't want to have to lookup every single field to decide if the Record already exists, so I thought of mapping (hashing) the 3-column-natural key to a single value field but I am afraid, that collissions may sometime occur and that the data will get invalidated.

A second thought of mine was to not hash but string-concatenate the 3-column key to a single field (varchar). That would guarantee the uniqueness but I wonder how efficient that would be.

So, how would you solve this problem? Any smarter suggestions?


JimBeam

Posts : 2
Join date : 2011-09-20

View user profile

Back to top Go down

Re: best way for dimension table lookups, if natural key is multi-column (string)

Post  ngalemmo on Tue Sep 20, 2011 1:40 pm

Concatenated strings work well. A hash value as a unique key is never a good idea.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: best way for dimension table lookups, if natural key is multi-column (string)

Post  buckleyc on Wed Sep 21, 2011 9:09 am

I don't know why you wouldn't want to join on each field individually. If you concatinate the fields into one, you have to convert the integers to varchar's and what if you have companycode 001, propertyid 12, and BusinessEntity 1 as well as companycode 001, propertyid 1, and businessentity 21? The concatinated string is the same for both.

Better to join on each field. There's no good reason not to.

buckleyc

Posts : 7
Join date : 2011-09-19

View user profile

Back to top Go down

Re: best way for dimension table lookups, if natural key is multi-column (string)

Post  ngalemmo on Wed Sep 21, 2011 1:23 pm

The reason to do so is to make the natural key source independent.

One extreme example in integrating two disparate order systems: In one case the items were identified by a SKU, while in the other system there were 7 different attributes that identified an item. By abstracting the natural key into a string in the staging table, you can use the same process to update the dimension and load facts regardless of the data source.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: best way for dimension table lookups, if natural key is multi-column (string)

Post  buckleyc on Wed Sep 21, 2011 3:13 pm

Then how do you know "001121" is the same as "0011121" in my example below?
If your dimenional table has column1, column2, and column3 in it, what's the harm in joining on the three keys? Why concatinate? I agree if system 1 stores the SKU in one field and position 1-4 means x and position 5-8 means y in system 2, then concatination makes sense. I disagree with making a blanket assumption that concatinating multiple field business keys is the best way to go.

buckleyc

Posts : 7
Join date : 2011-09-19

View user profile

Back to top Go down

Re: best way for dimension table lookups, if natural key is multi-column (string)

Post  VHF on Wed Sep 21, 2011 5:36 pm

You should always use a delimiter when building a concatinated key from variable-length fields.

Using the example of companycode 001, propertyid 12, and BusinessEntity 1 with a pipe character for the delimiter, the concatinated key would be 001|12|1. Gives a single unique string key with no ambiguity.

VHF

Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US

View user profile

Back to top Go down

Re: best way for dimension table lookups, if natural key is multi-column (string)

Post  BoxesAndLines on Thu Sep 22, 2011 9:24 am

I'm not sure you actually have a problem. Like others have said, just do a 3 column lookup. More than likely, you'll want this data in the warehouse as distinct columns. A properly created index will address any performance issues.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: best way for dimension table lookups, if natural key is multi-column (string)

Post  VHF on Thu Sep 22, 2011 10:21 am

Either approach is fine, and neither one breaks any Kimball rules! :-) My guess is that the lookup on one string field plus two integer fields would be slightly slower than a lookup on a single (slightly longer) string field, but I wouldn't expect it to be a huge difference, and becasue this is happening during ETL and not during user queries you shouldn't be too concerned. Use whichever approach you like better.

Are you planning to index the natural key fields? As B&L indicated, an index including the natural key field(s) should you give you swift performance for both loading the dimension and resolving the SK when loading facts.

(With some database engines you may need to explicetly add the SK to your index as well to make it a "covering index" for resolving the SK while loading facts; others will do this automatically if the SK is the PK of the dimension (as it should be!) and there is a clustered primary key on it (which there should be!) Look at before-and-after execution plans as you tune you database. What you want is a single Index Seek whenever possible.)


Last edited by VHF on Thu Sep 22, 2011 10:27 am; edited 2 times in total (Reason for editing : clarification)

VHF

Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US

View user profile

Back to top Go down

Re: best way for dimension table lookups, if natural key is multi-column (string)

Post  hang on Thu Sep 22, 2011 4:57 pm

Like B&L and buckleyc, generally I would just take the simple approach leaving the NK as is in the dimension no matter how many columns the key is made of. Converting the NK to other format may unnecessarily complicate the ETL process which is already ridden with other heavy lifting tasks. If the dimension size is far less than 100000, it may not be even worth considering indexing at all.

I know SK can make the dimension resilient to the change of NK definition in the source system in terms of its RI with fact tables, but I am not quite sure about how transforming NK can make it independent of source system.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: best way for dimension table lookups, if natural key is multi-column (string)

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum