Using Hash keys instead of Numeric sequence keys in Dimensional Model?

View previous topic View next topic Go down

Using Hash keys instead of Numeric sequence keys in Dimensional Model?

Post  sbendayan on Wed May 28, 2014 9:13 am

Greetings,

I have been a proponent of the standard Dimensional Models for years and understand very well the reason for using surrogate keys that are implemented via numeric sequences. However, lately I have been getting into the Data Vault methodology which uses Hash Keys instead of numeric sequences and was wondering if anyone has used these in Dimensional Models? The benefits are apparent:

1. The hash keys are computed using the natural key value, so you can derive the values instead of looking them up.
2. They automatically provide an even distribution of values across partitions, which helps with parallel processing.
3. They will never "get out of synch" the way that Numeric sequences do.

The only disadvantage that I can think of is that these Hash Keys are 32-byte values instead of a 4 or 8 byte integer which supposedly will slow down joins dramatically.

Does anyone out there have any real world experience with this, or any experiments/performance tests that have been run on this?

Thanks,

SB

sbendayan

Posts : 6
Join date : 2013-12-20

View user profile

Back to top Go down

Re:Using Hash keys instead of Numeric sequence keys in Dimensional Model?

Post  hkandpal on Wed May 28, 2014 10:32 am

Hi ,

you may have duplicate since the hash function will return the deterministic value, which means the same natural key you will get the same hash value. This will crete problem when you are doing SCD 2 for a dimension.

thanks

hkandpal

Posts : 113
Join date : 2010-08-16

View user profile

Back to top Go down

Re:Using Hash keys instead of Numeric sequence keys in Dimensional Model?

Post  sbendayan on Wed May 28, 2014 12:27 pm

For the Type 2 SCD problem, you could just hash the Load Date along with the Natural key of the dimension, I think.

sbendayan

Posts : 6
Join date : 2013-12-20

View user profile

Back to top Go down

Re: Using Hash keys instead of Numeric sequence keys in Dimensional Model?

Post  ngalemmo on Wed May 28, 2014 1:53 pm

A 32 byte hash is a huge key. You won't get collisions, but performance will suffer. It really depends on where and when you want to take a performance hit: when you load the data or when users query the data.

The idea behind assigning surrogate keys using integer or bigint sequences is to take the hit one time when loading the data, rather than every time users query the data. 'Out of sync' sequences is a non-issue with proper backup and restore processes.

It's your choice.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Using Hash keys instead of Numeric sequence keys in Dimensional Model?

Post  sbendayan on Thu May 29, 2014 1:49 pm

Yes, I would tend to agree....the size would be the one huge downside to using these keys.....what I was wondering is if anyone has tried it and how much does it really slow down the queries?

sbendayan

Posts : 6
Join date : 2013-12-20

View user profile

Back to top Go down

Re: Using Hash keys instead of Numeric sequence keys in Dimensional Model?

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum