Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

Dimension Crosswalk?

+2
hkandpal
SnowShine429
6 posters

Go down

Dimension Crosswalk? Empty Dimension Crosswalk?

Post  SnowShine429 Tue Sep 30, 2014 9:55 pm

hi all - we are a "best of breed" type of shop and we have 4 systems - one for each of the following applications
1. ERP
2. Payroll
3. CRM
4. Time entry

We are building a data mart on top of these systems and we are having so many issues when consolidating data from these systems.

Just to give a simple example, our employees are not named exactly the same in our time entry application and payroll application - so when we pull data from both these systems and "match" by employee name, more than 20% of the employee don't match and this has become a nightmare for us. My questions are:

1. what should we do now that we have a messy situation? I did some research and read about something called "crosswalk" but couldn't find details on how to do it.
2. what could we have done to avoid this situation? Use consistent employee ids across these systems (not sure how though as each system has its own unique ids)?

Would master data management take care of these type of issues? any thoughts would be highly appreciated!

SnowShine429

Posts : 36
Join date : 2013-02-16

Back to top Go down

Dimension Crosswalk? Empty Re:Dimension Crosswalk?

Post  hkandpal Wed Oct 01, 2014 10:26 am

Hi,

were you able to figure why the employee don't match, is it becuase of the same name spelled differantely in both the system. If it is than the best is to update/correct the master data where ever required.
THe best is to have a consistent id for all the system, but If Payroll and Time entry are from a different vendor then some times it may not be possible. One way you could do is to create a crossreference file where you map both the id and store it.

thanks

hkandpal

Posts : 113
Join date : 2010-08-16

Back to top Go down

Dimension Crosswalk? Empty Re: Dimension Crosswalk?

Post  SnowShine429 Wed Oct 01, 2014 11:58 am

thank you. can you please explain what you mean by master data and how i can create one?

Also, is it ok to link on names then if we don't have common ids?

SnowShine429

Posts : 36
Join date : 2013-02-16

Back to top Go down

Dimension Crosswalk? Empty Re: Dimension Crosswalk?

Post  hkandpal Wed Oct 01, 2014 1:23 pm

Hi,

by "master data and how i can create one" data meant the reference data or the table where the names are stored, you could update if you conclude that it is wrongly created in either Payroll or Time entry systems.
No linking on names will not work as you sometimes you could have more than one person with the same First, last name.

thanks

Himanshu

hkandpal

Posts : 113
Join date : 2010-08-16

Back to top Go down

Dimension Crosswalk? Empty Re: Dimension Crosswalk?

Post  TheNJDevil Wed Oct 01, 2014 2:06 pm

I had a problem along those lines with names used between different systems. First off, point out those name differences in the systems and work with the people who enter the data to try to be more consistent. Second, I noticed the last names were usually correct, but first names were different. I created a table of names and the possible aliases those names could go by. For example, Robert -> Rob, Robert -> Bob, Robert -> Bobby, Robert -> Robby, ect...

With that table, I was able to catch 94% of the name differences during the ETL process, spitting out the last 6% into an error file for manual review. Once reviewed, names were either connected in my name_alias table or corrected in one of the systems.

Yes, I know it is not perfect, but it works for us as we know we will go back and sync up the names between systems as time permits.

TheNJDevil

Posts : 68
Join date : 2011-03-01

Back to top Go down

Dimension Crosswalk? Empty Re: Dimension Crosswalk?

Post  ngalemmo Wed Oct 01, 2014 3:02 pm

The relationship between facts and dimensions should be based upon the source system identifiers (business keys).  If I have multiple sources for the same entity from different systems that use different business keys, I would create a single dimension table with different rows for each business key.  Facts would be associated with the dimension row of the entity from the same source (key authority) and business key.

So, using employee as an example, if the different systems use different identifiers (some may be the same (same authority), some may not) I would have multiple rows for the same employee in the dimension table, one for each authority.

When you query, queries are based on attributes, not keys.  So the challenge becomes establishing common attributes that can be used to consolidate the information across different source systems.  This challenge is much simpler than trying to consolidate the keys, as trying to do that may result in changing keys which is can be problematic and difficult to implement.  The goal is to keep the keys stable and let the attributes do the work.

The dimension table itself would contain both raw source attributes, so you an tie a row back to the original source, as well as a collection of common attributes.  These common attributes could be the result of cleansing/standardization algorithms, MDM system feed, cross references and other techniques.  None of these processes are absolute or simple, which is why you don't want to mess with the keys.  As these processes change, it becomes a matter of updating those common attributes in the dimension table and data quality improves over time.  Users would query using the common attributes.

For example, if you are using an MDM system, the MDM system generates its own key representing what it believes to be the same entity. It would produce both a cross reference (business key and MDM key) as well as a common attribute feed based on the MDM key. Have the MDM key as an attribute on the dimension table. Use the cross reference to maintain the MDM key attribute and use the MDM key attribute as the update key when loading attributes from the MDM system so it updates common attributes on all rows that share the same MDM key.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Dimension Crosswalk? Empty Re: Dimension Crosswalk?

Post  nick_white Fri Oct 03, 2014 7:13 am

I would agree with the previous comments, regarding MDM, when applied to complex combinations of source systems that you (your company) have little control over. However, for the specific example of Employee that you gave I think you should be taking a set back and looking at some fundamental issues with how your company has set up its systems.
Every company I have ever worked for, or with, has at least one (and often several) unique identifiers for employees: Employee ID, username (for single sign on to all systems) etc. If your company has set up Payroll and Time Entry systems that don't consistently and uniquely identify employees then those systems need to be fixed.
Trying to use a DW to resolve issues with poor source system design is likely to prove a very painful process, is probably not going to be successful and is putting a sticking plaster over the problem rather than addressing the fundamental, underlying, issues.

Creating an MDM system is likely to cost 10s/100s of thousands of pounds/dollars and is still never going to give you a 100% accurate match between employee records in different systems. Giving every employee a unique employee number when you onboard them into your company and then using that in every system that holds employee data is likely to be much cheaper and will be 100% accurate in uniquely and correctly identifying employees

nick_white

Posts : 364
Join date : 2014-01-06
Location : London

Back to top Go down

Dimension Crosswalk? Empty Re: Dimension Crosswalk?

Post  BoxesAndLines Fri Oct 03, 2014 8:00 am

ngalemmo wrote:...
So, using employee as an example, if the different systems use different identifiers (some may be the same (same authority), some may not) I would have multiple rows for the same employee in the dimension table, one for each authority...

I've never quite understood this approach. Let's say there are 3 ERP systems. When I create a fact row and do a lookup on the Employee dimension for John Doe, what do I do? I have 3 John Doe's. ETL can't reliably link them as the user stated. They are basically distinct employees. This also bloats the dimension by tracking history for each source as well as complicating what should be simple dimension queries like, how many employees do I have in the great state of Texas.
BoxesAndLines
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

Back to top Go down

Dimension Crosswalk? Empty Re: Dimension Crosswalk?

Post  nick_white Fri Oct 03, 2014 8:22 am

I think the approach works as long as the Entity you aren't de-duplicating is only used for analysing other entities and is not itself being analysed e.g.

"How many sales were made by employees resident in Texas?" works fine with this approach (of not de-duplicating your Employee records) - though it does assume that all records for an employee correctly identify their state of residence. But, as you say, asking "How many employees are resident in Texas?" doesn't work unless you uniquely identify your employees

nick_white

Posts : 364
Join date : 2014-01-06
Location : London

Back to top Go down

Dimension Crosswalk? Empty Re: Dimension Crosswalk?

Post  ngalemmo Sat Oct 04, 2014 2:55 am

BoxesAndLines wrote:
I've never quite understood this approach.  Let's say there are 3 ERP systems.  When I create a fact row and do a lookup on the Employee dimension for John Doe, what do I do?  I have 3 John Doe's.  ETL can't reliably link them as the user stated.  They are basically distinct employees.  This also bloats the dimension by tracking history for each source as well as complicating what should be simple dimension queries like, how many employees do I have in the great state of Texas.

The fact that ETL cannot reliably link them is the reason for the approach. Let's assume there are two systems, system A has employee #123 and the system B has employee #ABC which are actually the same person. (This can be employees, customers, products, locations…). Because they cannot be reliably linked, you cannot store them in one dimension row since the linkage may be incorrect.

Instead, there are two rows, one for #123 and the other for #ABC. Among the common attributes would be an employee ID that is a derived value based on some matching algorithm (a cross reference, MDM system or whatever). If it is determined that these are in fact the same employee, both rows would be assigned the same derived employee ID along with common attribute values such as a standardized name.

To get your count of employees in Texas from the dimension you would count distinct derived employee ID's (you would never count names as two different people can have the same name).

This approach works well particularly when it is difficult to associate them, because you can change the association by changing attribute values rather than keys. If you used a single row and discover some time later the association is incorrect, you would need to somehow rekey any referencing fact to the correct employee. This can be very difficult to do, if not impossible, as you would need to know which employee the fact originally referred to. Whereas if you kept them in separate rows from the beginning, you would always know which system the fact came from and what natural key appeared in the source.


ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Dimension Crosswalk? Empty Re: Dimension Crosswalk?

Post  BoxesAndLines Thu Oct 09, 2014 9:53 pm

So you migrate the "party id" as well as the dimension SK to the fact tables?
BoxesAndLines
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

Back to top Go down

Dimension Crosswalk? Empty Re: Dimension Crosswalk?

Post  ngalemmo Fri Oct 10, 2014 2:40 am

If by 'party id' you mean the derived key that represents an entity, no. It stays in the dimension as the common identifier, it does not go in the fact table. Rows that have been matched would contain the same common identifier and other common attributes.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Dimension Crosswalk? Empty Re: Dimension Crosswalk?

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum