Non-durable natural keys

View previous topic View next topic Go down

Non-durable natural keys

Post  indignatz on Fri Mar 13, 2009 11:45 am

Hi all,
I'm having an issue with an employee dimension whereby the typical natural key (employee id) is not durable. That being the case, and not being able to use the name as a natural key, I'm a bit stuck as to what I should use for a business key in this dimension.

One thought I had is to create my own business key and use a mapping table to map it back to the various employee ids associated with an entity, but I'm definitely open to other suggestions.

Thanks,
Mark

indignatz

Posts : 4
Join date : 2009-03-13

View user profile

Back to top Go down

Add an SK

Post  Todd McDermid on Fri Mar 13, 2009 1:12 pm

I had exactly the same issue with a product table in our system. The solution happens to be simple - if your OLTP system and admins will allow it! Have them add a meaningless, "invisible" record number/identity/SK to the table. They just need to add a column where they can guarantee that a value will always be assigned when a row is "created", will never be altered, and the value will never be repeated (i.e. an identity type column). I had to refer to that column as "invisible" because I had to reassure them that NOONE was required or desired to use it or see it - except me when doing ETL.
avatar
Todd McDermid

Posts : 11
Join date : 2009-02-04
Location : Nanaimo, BC

View user profile http://toddmcdermid.blogspot.com

Back to top Go down

Re: Non-durable natural keys

Post  BrianJarrett on Fri Mar 13, 2009 1:45 pm

Todd has a great idea, provided you can get them to agree to it. I think that would make things very simple for you on the ETL side.

If you can't convince them to do that, then can you identify some sort of composite key from other attributes on the record?

Just out of curiosity, what's causing your natural key to be so unpredictable? Is it rolling over or is it just not unique across the enterprise, something like that?
avatar
BrianJarrett

Posts : 61
Join date : 2009-02-03
Age : 43
Location : St. Louis, MO

View user profile

Back to top Go down

Re: Non-durable natural keys

Post  indignatz on Fri Mar 13, 2009 2:47 pm

Unfortunately the source system is SAP so I don't know that I'll be able to get your solution implemented Todd. And, if I'm understanding you correctly, it wouldn't work anyways as there are multiple rows, with timestamps, in the source system for any given id to track attribute changes. Add to that, that the ids are not updated when changed, but new rows are created with the new id. To further confuse the situation, effective timestamps for each id overlap in the source.

Brian: My understanding is that certain employee ids were retired at some point in the past, but they will still appear in the historical subset of data we'll be loading. Unfortunately, there's not enough data in the source table to create a composite key of any kind.

I realize that the etl to manage this would likely be horribly complex, but is there any merit to my idea of creating my own key and using a mapping table?

Thanks,
Mark

indignatz

Posts : 4
Join date : 2009-03-13

View user profile

Back to top Go down

Re: Non-durable natural keys

Post  BrianJarrett on Fri Mar 13, 2009 11:03 pm

I'm curious how you plan to perform the mapping. I would imagine you'd need to use some sort of matching criteria to map multiple source keys back to a single custom-generated key in your mapping table. Couldn't you just use that same logic to find the record you need to update in the warehouse and then perform your inserts/updates?

I've never had to create my own keys so I might not understand this process entirely. Feel free to elaborate and/or correct me if I'm wrong.
avatar
BrianJarrett

Posts : 61
Join date : 2009-02-03
Age : 43
Location : St. Louis, MO

View user profile

Back to top Go down

Re: Non-durable natural keys

Post  BoxesAndLines on Sat Mar 14, 2009 11:36 am

+1 on Brian's comments. You either have enough data to create a unique key or you don't. Here's an idea for you.
The problem with today's ERP applications today is that when they are sold, they are sold as solving the problem managing all of your employees, vendors, customers, etc. The reality is that these applications solve these problems only if the company improves their processes for managing these entities. Every client I have consulted at with a CRM application (Siebel for example) still have redundant contact information stored in their database because their is little to no control over the entry of new information into the application. It's likely that this situation exists in your source SAP application too.

I would invest in a data quality software that performs name and address cleansing. Use this software to standardize all the employee names and addresses. These will become your matching attributes. You can even store the latest SAP identifier as a lookup key. Build out party data structure or if your going to a dimensional models, an employee dimension and address dimension. Don't store any employee information in the address dimension. I do like to store the address dimension PK's in my employee dimension, but I also hook my facts directly to the address dimension.

The downside with this approach is data quality software is expensive. If you don't already have it, it may be a hard sell to buy it. The reality is though, all DW's need this software. The data quality on addresses and names coming from the OLTP applications is horrible. And, it's getting worse. In fact, I know many companies who have gotten rid of their application data modelers and rely solely on DBA's today. The upside to this development is the data warehouse is in need of modeling expertise more than ever. As Dr. Kimball said at a conference a few years back, "It's a good business to be in".

The upside to this approach is you've started a grass roots master data management solution. Once the business recognizes that you have standard, cleansed addresses, integrated with KPI's, they'll beat a path to your door.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Non-durable natural keys

Post  ngalemmo on Tue May 19, 2009 2:25 pm

In the case of SAP, PERNR is the stable employee identifier... and if you have old employees that were loaded into SAP, they MUST have a PERNR.

But, if you mean SAP/BW as your source, and these old employees were simply 'dumped' into BW from whence they came (a very common practice), then you need to abstract your natural key.

As a rule, I always use a string for the natural key field and build a value based on source system and business key. So, in this case, for SAP employees, I would build a string such as 'SAP|99999' (where 99999 is the PERNR) and for the old employees, use 'OLD|xxxxxxx' (where xxxxxx is the old identifier from the old system). I would also add an update key column, which would be set to the natural key when the row is first loaded.

All fact joins would be based on the natural key, while all dimension updates would be controlled by the update key. Now, if the old employee is rehired, he/she would be assigned a PERNR and a new row would go into the dimension table. If you have some way of finding out that this is a rehire and what the old employee ID was, you would update the update key of the old employee row with the correct SAP natural key. Future dimension updates would then update the old row with current SAP information, effectively integrating the old employee history with current SAP activity.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Non-durable natural keys

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum