Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

A dozen systems with many dimension codes

3 posters

Go down

A dozen systems with many dimension codes Empty A dozen systems with many dimension codes

Post  Al Wood Mon Jan 10, 2011 9:55 am

Hi,

I have more than a dozen systems to integrate, and each has it's own version of dimension codes. The source systems need to use dummy codes, and multiple codes for one entity. This for operational reasons.
So, to conform a dimension I have a mapping table for each source system. These are to be maintained by Data Stewards. Each lookup table contains the same surrogate key values as the master table. The SSIS lookups use these mapping tables, not the Master DW dimension table.

The problem I have is:
When a lookup value is not found, do I load new natural keys into the Source System Lookup table, or the Master DW Dimension table, or both? For the data to be available I need to load at least the Master DW Dimension table, then I must de-duplicate when the same entity comes through from a different source system. But what if a different entity comes through from a different source with a code that has ALREADY been loaded from the first source system? That would cause a problem.

I think the best I can do is to load the lookup (mapping) tables with the natural key as a placeholder, and insert a generated unique value in the master, e.g. "STAFFMEMBER_A.Lien_FROM_SYSTEM_X". Then when/if we find out that its the same person in many systems, we deduplicate the master, and map the new entity for each system where it occurs. Can anyone think of a better way?

Al Wood

Al Wood

Posts : 46
Join date : 2010-12-08

Back to top Go down

A dozen systems with many dimension codes Empty Re: A dozen systems with many dimension codes

Post  ngalemmo Mon Jan 10, 2011 12:01 pm

Make the natural key to the dimension a source identifier and the code, ensuring unique values. Store a row in the dimension for each natural key. Relate facts to the row based on source and code. Resolve cross reference in the dimension using 'standard' attributes.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

A dozen systems with many dimension codes Empty Re: A dozen systems with many dimension codes

Post  Al Wood Mon Jan 10, 2011 1:54 pm

Thanks for the response.

Store a row in the dimension for each natural key

Doesn't this mean that a staff member will have multiple surrogate keys in the dimension table?

Is that a good way to design the DW?

Al Wood

Al Wood

Posts : 46
Join date : 2010-12-08

Back to top Go down

A dozen systems with many dimension codes Empty Re: A dozen systems with many dimension codes

Post  ngalemmo Tue Jan 11, 2011 1:03 pm

If staff member is the dimension and it is coming in from different sources with different identifiers, then yes. One row for each unique source/identifier.

Include columns to hold the common identifier and the common attributes. Rows that map to the same staff member would hold the same values in the common attributes. Use the common attributes in queries to consolidate information for the same staff member.

Since you tie facts from a source based on the identifier in that source, you ensure a correct and stable assignment. Changes to the cross reference only affect the attributes in the dimension, not the relationship between facts and the dimension.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

A dozen systems with many dimension codes Empty Re: A dozen systems with many dimension codes

Post  Al Wood Thu Jan 13, 2011 8:20 am

Hi,

Thanks again. That's very denormalised; quite a shock for someone coming from a 3NF database world!
I'm working through example scenarios with this design and I'm finding that it's a huge simplification and easy to load if I use SSIS with "readsert" lookups.

Al Wood

Al Wood

Posts : 46
Join date : 2010-12-08

Back to top Go down

A dozen systems with many dimension codes Empty But what if..

Post  Al Wood Fri Jan 14, 2011 2:41 pm

Hi,

I'm wondering what you would do if one of the source systems needed to use two fields to uniquely identify the staff member, but most of them used just one field. Would you add a field to the dimension table?

I'm also wondering if this design can be used for the Location dimension? Customers? Products?

Al Wood

Al Wood

Posts : 46
Join date : 2010-12-08

Back to top Go down

A dozen systems with many dimension codes Empty Re: A dozen systems with many dimension codes

Post  BoxesAndLines Fri Jan 14, 2011 4:20 pm

You need the natural key for lookups during the ETL process. If that's 1,2,3, or 4 columns, you'll need them all. Alternatively, you can manage the deduplication in the ETL process and present the consolidated dimension. Depending on the volume of duplicates, this can offer some performance advantages in reporting.
BoxesAndLines
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

Back to top Go down

A dozen systems with many dimension codes Empty Re: A dozen systems with many dimension codes

Post  ngalemmo Sun Jan 16, 2011 2:22 am

Al Wood wrote:Hi,

I'm wondering what you would do if one of the source systems needed to use two fields to uniquely identify the staff member, but most of them used just one field. Would you add a field to the dimension table?

I'm also wondering if this design can be used for the Location dimension? Customers? Products?

Al Wood

As a rule, I abstract my natural keys. When I design a dimension table I always have a varchar column for holding the natural key in addition to columns for attributes. The natural key column contains a delimited concatenation of whatever source values make up the key. The concatenation is done during staging, since the process that extracts to staging is always source specific, so it can contain specific code to build the natural key values. The code that loads from staging can be source agnostic since it only has to deal with the abstract natural keys to lookup surrogate keys.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

A dozen systems with many dimension codes Empty Re: A dozen systems with many dimension codes

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum