Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

Using Hash keys instead of Numeric sequence keys in Dimensional Model?

3 posters

Go down

Using Hash keys instead of Numeric sequence keys in Dimensional Model? Empty Using Hash keys instead of Numeric sequence keys in Dimensional Model?

Post  sbendayan Wed May 28, 2014 9:13 am

Greetings,

I have been a proponent of the standard Dimensional Models for years and understand very well the reason for using surrogate keys that are implemented via numeric sequences. However, lately I have been getting into the Data Vault methodology which uses Hash Keys instead of numeric sequences and was wondering if anyone has used these in Dimensional Models? The benefits are apparent:

1. The hash keys are computed using the natural key value, so you can derive the values instead of looking them up.
2. They automatically provide an even distribution of values across partitions, which helps with parallel processing.
3. They will never "get out of synch" the way that Numeric sequences do.

The only disadvantage that I can think of is that these Hash Keys are 32-byte values instead of a 4 or 8 byte integer which supposedly will slow down joins dramatically.

Does anyone out there have any real world experience with this, or any experiments/performance tests that have been run on this?

Thanks,

SB

sbendayan

Posts : 6
Join date : 2013-12-20

Back to top Go down

Using Hash keys instead of Numeric sequence keys in Dimensional Model? Empty Re:Using Hash keys instead of Numeric sequence keys in Dimensional Model?

Post  hkandpal Wed May 28, 2014 10:32 am

Hi ,

you may have duplicate since the hash function will return the deterministic value, which means the same natural key you will get the same hash value. This will crete problem when you are doing SCD 2 for a dimension.

thanks

hkandpal

Posts : 113
Join date : 2010-08-16

Back to top Go down

Using Hash keys instead of Numeric sequence keys in Dimensional Model? Empty Re:Using Hash keys instead of Numeric sequence keys in Dimensional Model?

Post  sbendayan Wed May 28, 2014 12:27 pm

For the Type 2 SCD problem, you could just hash the Load Date along with the Natural key of the dimension, I think.

sbendayan

Posts : 6
Join date : 2013-12-20

Back to top Go down

Using Hash keys instead of Numeric sequence keys in Dimensional Model? Empty Re: Using Hash keys instead of Numeric sequence keys in Dimensional Model?

Post  ngalemmo Wed May 28, 2014 1:53 pm

A 32 byte hash is a huge key. You won't get collisions, but performance will suffer. It really depends on where and when you want to take a performance hit: when you load the data or when users query the data.

The idea behind assigning surrogate keys using integer or bigint sequences is to take the hit one time when loading the data, rather than every time users query the data. 'Out of sync' sequences is a non-issue with proper backup and restore processes.

It's your choice.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Using Hash keys instead of Numeric sequence keys in Dimensional Model? Empty Re: Using Hash keys instead of Numeric sequence keys in Dimensional Model?

Post  sbendayan Thu May 29, 2014 1:49 pm

Yes, I would tend to agree....the size would be the one huge downside to using these keys.....what I was wondering is if anyone has tried it and how much does it really slow down the queries?

sbendayan

Posts : 6
Join date : 2013-12-20

Back to top Go down

Using Hash keys instead of Numeric sequence keys in Dimensional Model? Empty Re: Using Hash keys instead of Numeric sequence keys in Dimensional Model?

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum