Dealing with System upgrades/supplier renumbering.

View previous topic View next topic Go down

Dealing with System upgrades/supplier renumbering.

Post  KevinGurney on Tue Nov 24, 2009 8:14 am

I'm very new to the Data Warehousing arena from an OLTP background and have just had the sudden realization (as I'm sure many do) that it's not simply a case of OLTP with dimensions and surrogate keys

My problem is such:
1. We have a system that has been through three upgrades in the past couple of years.
2. For each upgrade, the suppliers have been renumbered.
For example, what was 1000 under system 1 can now be 3948 in system 2 and 3333 in system 3

The crux here, is the business keys have effectively changed so SCD are tricky (at least with my current understanding).

I believe some kind of mapping table would be useful, however I cannot see how to get that to point multiple records to a single surrogate key is possible.

Also, retaining history of the previously used supplier numbers is beyond me at present.



How do I go about dealing with this? Any thoughts and/or pointers to existing articles greatly appreciated.


p.s. My toolset is limited to SQL Server/SSIS 2005

Thanks,
Kevin.

KevinGurney

Posts : 1
Join date : 2009-11-24

View user profile

Back to top Go down

Re: Dealing with System upgrades/supplier renumbering.

Post  ngalemmo on Tue Nov 24, 2009 12:13 pm

Probably the most important distinction between desiging an OLTP system and a data warehouse model is the role of primary keys. In an OLTP model, a primary key is all important and identifies a specific member (customer, product, etc...) whereas in a dimensional DW model, the purpose of a primary key is to identify a row. The PK-FK relationship in a dimensional model simply defines the relationship between facts and a collection of attribute values residing in a dimension table's row. It is the attribute values, not the key, that provide the basis of integration of data.

When I design a dimension table that needs to receive data from multiple systems, I include 3 sets of keys: a single surrogate primary key, a single natural key and one or more update keys, one for each active source system that provides attributes to the dimension. The natural and update keys are strings and are set to a concatenation of a source system identifier and whatever business key values appropriate for the source. Using a source identifier ensures unique values.

When loading facts, you do the normal lookup based on the natural key to get the associated primary key. If the same customer appears in different systems, there will be multiple rows for that customer, but each row will have a unique natural key so that facts from a particular source will point to rows for that customer from the same source.

The update keys are set to the appropriate natural key values based on a cross-reference to the different sources. Update keys are non-unique. The dimensional update from each source would used the approriate update key to locate and update rows. The various rows for that customer would assume the attribute values of the designated source regardless of where the customer row originated. One advantage of this approach is it is tollerant of errors in the cross reference. If a cross-reference changes, it is simply a matter of correcting the update key value. Subsequent dimension updates would then reset the attribute values of the errant row. There is no need to refresh the PK-FK relationship.

When you have multiple live sources for a dimension, you are still faced with developing the business rules as to which attributes are maintained from which source. You could, potentially, have different versions of the same attribute depending on the source. This is a challenge that needs to be worked out with the business. There is no getting around that.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum