Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

Linking or Merging Similar dimensions from Different Source Systems

+2
Jeff Smith
meb97me
6 posters

Go down

Linking or Merging Similar dimensions from Different Source Systems Empty Linking or Merging Similar dimensions from Different Source Systems

Post  meb97me Thu Feb 03, 2011 12:48 pm

We've got a situation with the DW we're building where we have 2 seperate systems that we are wanting to bring the data together for cross analysis

1) The first system is the main operational system ie a contact management application that records various events between employees & customers
2) The second is a requisitioning/PO system

We have an employee dimension built from the first system however we are now facing the problem of whether to try and integrate the second systems Users in this employee dimension.

The problems we have identified are
1) Names etc are not consistant between the systems
2) completely different attributes are recorded in each system
2) There is no accurate method to match up users between the 2 systems, the first system logins use AD accounts but the second system uses manually generated logins which sometimes match up to their AD user names and sometimes don't

We're envisaging nightmares of each system trying to overwrite each others chnanges to the dimensional data because they aren't the same and then not alway being able to match the 2 up together

Would it be better to have a seperate User dimension for the second system and then have a bridging table to link (where possible) between the employees dimension from the first system and users dimension table of the second that way hopefully gleeing some useful information from the second system relating back to the first. Ie comparing susccesfull outcomes for each employee for the month against what they spent on clients that kind of thing.

hopefully that made some sort of sense

Any input very much appreciated


Last edited by meb97me on Thu Feb 10, 2011 12:50 pm; edited 1 time in total

meb97me

Posts : 34
Join date : 2010-07-28

Back to top Go down

Linking or Merging Similar dimensions from Different Source Systems Empty Re: Linking or Merging Similar dimensions from Different Source Systems

Post  Jeff Smith Thu Feb 03, 2011 1:00 pm

You can create an employee dimension and load it with data from both systems. The dimension can contain attributes from both systems or you can create an employee attribute dimension populated from the 2 systems. But what ever is done, you won't be able to link the employee from the 1 system to the other because no such cross walk exists. But, if you put the employees from the 2 systems into the same dimension and a crosswalk system is created, then it's just a matter of adding a fe more columns to the employee dimension.

The lowest level of the employee dimension becomes Source System/employee. Later on, if the cross walk gets created, you can have the Source System/Employee roll up to the Employee.

Jeff Smith

Posts : 471
Join date : 2009-02-03

Back to top Go down

Linking or Merging Similar dimensions from Different Source Systems Empty Re: Linking or Merging Similar dimensions from Different Source Systems

Post  VHF Thu Feb 03, 2011 1:29 pm

meb97me wrote:Would it be better to have a seperate User dimension for the second system and then have a bridging table to link (where possible) between the employees dimension from the first system and users dimension table of the second that way hopefully gleeing some useful information from the second system relating back to the first. Ie comparing susccesfull outcomes for each employee for the month against what they spent on clients that kind of thing.

It might be better in this case to have a separate User dimension for the second system, but a bridge table is not the right answer. (A bridge table is to connect a fact to a dimension, not to connect two dimensions.) What you need to do is conform the dimensions. Conformed dimensions are those that share at least one common attribute, which thus allows aggregated reporting across fact tables. To report across your fact tables at the employee level you need some sort of employee ID attribute that is common to the two dimensions.

Would it be possible to add a secondary identifier to one of the source systems that could reliably link employees across systems? Otherwise, you need to tackle it during ETL. Would matching on AD name/userID handle the bulk of the cases with the rest cleaned up manually?

Rather than a bridge table in DW, you might need a mapping table in the "kitchen" that would map users between the two systems. The mapping table would be used during loading of the dimensions to populate a common employee id attribute in each dimension.


Last edited by VHF on Thu Feb 03, 2011 4:39 pm; edited 2 times in total (Reason for editing : clarification; fix typo)

VHF

Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US

Back to top Go down

Linking or Merging Similar dimensions from Different Source Systems Empty Re: Linking or Merging Similar dimensions from Diffierent Source Systems

Post  MJGascoyne Thu Feb 03, 2011 5:23 pm

Your options vary depending on the tools at your disposal , the willingness of your user community to become involved in the management of the data and the scale of your dimensions.

Assuming manageable data sizes, I would definitely retain the data in a single conformed dimension table, from a maintainability perspective it keeps your environment tidier and consistent - i.e. employee is handled and sourced from a single location. As suggested above the true key to the data in this table becomes Source_System_Id/ Native_Employee_Id (obviously it is accompanied by a surrogate key).

In terms of merging the data, the mapping table is a good solution. I would imagine the challenge here is the ongoing maintenance of that table - in which case you will need to find an owner who will stay on top of the data (and then figure out some way of giving them controlled access). In addition, if you have the time you may be able to provide routines to assist in this maintenance activity (such as making newly arriving members obvious).

An alternative approach to merging the data could be introducing some form of cleansed parent level within the dimension table itself. For example:
SK || Native_EmployeeId || Employee_Name || Source_System_Id || Cleansed_Parent_id
1 || 101 || Bob Smith || 1 || 3 -- Id of DWH generated cleansed level
2 || J87 || SmitB || 2 || 3
3 || 3 || Bob Smith || 0 -- i.e. Data Warehouse generated || NULL

Again this has implciations for maintainability (i.e. someone needs to look after the data) and also determining how best to allow people to manage the data (i.e. providing a GUI).

MJGascoyne

Posts : 2
Join date : 2011-02-03

Back to top Go down

Linking or Merging Similar dimensions from Different Source Systems Empty Re: Linking or Merging Similar dimensions from Different Source Systems

Post  meb97me Fri Feb 04, 2011 10:14 am

thanks for the replys gents

After some more thought, digesting the above proposed susggestions and further discussion we think the way ahead for us is to use the current Employee dimension we have and add some additional attributes required from the 2nd PO system.

We will then attempt to match up the 2 systems employee records using their logins. However for this to work we are going to ask the business to align the logins for the 2nd PO system to match that of the persons AD account.(this hopefully shouldn't be a problem, and from a technical point of view wont cause any problems with data mapped to that user as a seperateid userid is used in the tables to mark data assigned to a particular user)

That way if the persons already exisits as a user of the first Contact Management system we will be able to add the additional attribute field valus to their record from teh 2nd system.

A slight issue we will have is that we will have a lot of users who are set up in the 2nd system but not the first. If thats the case then we will create a new employee dimension record for them based using what data we have from the 2nd system. However if they subsequently get set up as a user of the Contact System we will use that as the primary source for maintaining any fields which are common between the 2 system ie FirstName, Last Name, Department etc

We're also going to need to store both natural keys in the record So we should have something like this

EmployeeSKContactSystemIDPOSystemIDUsername
112345 mjackson-- This person is only present in the Contact System
2 ABCDEmmouse-- This person is only present in the PO System
367890FGHIJdduck-- This person is present in both systems and identified by a common username

meb97me

Posts : 34
Join date : 2010-07-28

Back to top Go down

Linking or Merging Similar dimensions from Different Source Systems Empty Re: Linking or Merging Similar dimensions from Different Source Systems

Post  Sideout72 Thu Feb 10, 2011 12:41 pm

I had a similar problem (in fact, multiple cases of the problem, where multiple systems needed have conformed data). As an example, we had 3 source systems, each of them had lists of Carriers (kind of like vendor lists), where wanted to see a single view of all business done with a Carrier.

In the end, I built a mapping application that allowed designated end users to "map" the Carriers together. It used two or three mapping tables to store the mapping information (2 if the mapping itself needed to be SCD type 1, 3 if the mapping needed to be type 2). To make a long story short (as I could get windy) - the ETL would detect if a mapping was already in place between the source system Carrier and the list of conformed Carriers. If it already existed, there was nothing to do. If it didn't exist, it would try some fuzzy logic to determine if the carrier already existed, and map the source system carrier to the conformed carrier. If it couldn't find a match, it would (a) create a new conformed carrier, (b) create the source system carrier and map it to the new conformed carrier, (c) send an email to an appropriate user group (I ran the data governance committee, so I was given all the power to dole out responsible for data integrity tasks) notifying them that a new carrier was create that couldn't be mapped properly, and they would need to use the mapping application.

The mapping application was used by end users to "fix" any of the mappings the ETL tool created. For instance, if two carriers were supposed to be conformed, the end user would take both source carriers and map them to the same conformed carrier. The next time the ETL ran, the data from both source systems for that carrier would appear to be from the same carrier in the data warehouse.

This was a simplified explanation. In fact, the type 2 SCM (slowly changing mapping) method I used was quite original, fast, and allowed lots of control about how entities mapped to each other, rolled up (grouped in parent child relationships), and changed over time. The mapping application itself took about 5 days (one person) to build quickly from scratch using VB, and 1 day (one person) every time there was a new dimension that needed to be added to the process. The ETL process around the conforming of the dimensions took about 2 extra days per dimension (one person) that needed to be conformed using this methodology. In the grand scheme of things, it was a huge win for minimal effort.

If you happen to be interested in more detail, feel free either post here or send me a private message, and I can send on additional information about the solution.

Good luck!

Sideout72

Posts : 4
Join date : 2011-02-09

Back to top Go down

Linking or Merging Similar dimensions from Different Source Systems Empty Re: Linking or Merging Similar dimensions from Different Source Systems

Post  meb97me Thu Feb 10, 2011 12:55 pm

Sideout72 that's sounds like a great solution! we had considered a mapping application however we're having problems at present trying to get the business to take responsibility for data governance. No one seems willing to step up and take responsibility. (anyway thats a side issue!)

I'd certainly be interested in hearing more details as i'm sure will many other so feel free "to get windy" as it were

meb97me

Posts : 34
Join date : 2010-07-28

Back to top Go down

Linking or Merging Similar dimensions from Different Source Systems Empty An even better way!

Post  Al Wood Fri Feb 11, 2011 11:08 am

I think there is an even better way!

http://forum.kimballgroup.com/t908-a-dozen-systems-with-many-dimension-codes

This method allows facts to be loaded when there is no dimension row, but without the need to use the "UNKNOWN" dimension record.
Basically, new dimension entities, e.g. new staff are just loaded into the dimension with their natural keys, so you can report off the fact tables and see a name, even if they are not conformed exactly yet. Yet at the same time there is no risk that two different people will get reported as one if they accidentally have the same natural key in different systems.

Al Wood

Posts : 46
Join date : 2010-12-08

Back to top Go down

Linking or Merging Similar dimensions from Different Source Systems Empty Re: Linking or Merging Similar dimensions from Different Source Systems

Post  meb97me Fri Feb 11, 2011 11:58 am

i dont think that approach would work because as i understand it i would have a dimnsion table looking like this



EmployeeSKEmployeeKeyEmployeeNameDomain Account
1SOURCEA:EMPA John Smith jsmith
2SOURCEA:EMP99 J Smithjsmith

So if i wanted to produce a staff list there would be duplicates for John Smith even if they were the same person

meb97me

Posts : 34
Join date : 2010-07-28

Back to top Go down

Linking or Merging Similar dimensions from Different Source Systems Empty Re: Linking or Merging Similar dimensions from Different Source Systems

Post  Al Wood Fri Feb 11, 2011 12:41 pm

EmployeeSK Source_System SourceNaturalKey Domain Account Reporting Name (updated by Data Steward)
1SOURCE_A John Smith jsmith Mr John Smith
2 SOURCE_B J Smith jsmith Mr John Smith

Then some more people join but the SOURCE_A dimension update can't retreive a full name:
EmployeeSK Source_SystemSourceNaturalKey Domain Account Reporting Name (updated by Data Steward)
1SOURCE_A John Smith jsmith Mr John Smith
2 SOURCE_B J Smith jsmith Mr John Smith
3 SOURCE_A N Smith nsmith N Smith
4 SOURCE_A J Smith jmsmith J Smith

A few days later a data steward gets around to it:
EmployeeSKSource_System SourceNaturalKey Domain Account Reporting Name (updated by Data Steward)
1SOURCE_A John Smith jsmith Mr John Smith
2 SOURCE_B J Smith jsmith Mr John Smith
3 SOURCE_A N Smith nsmith Mr Nathan Smith
4 SOURCE_A J Smith jmsmith Mrs Jane Mary Smith

Just make sure all your reports use the [Reporting Name ... ] field. And a staff list uses DISTINCT [Reporting Name ... ]

Al Wood

Posts : 46
Join date : 2010-12-08

Back to top Go down

Linking or Merging Similar dimensions from Different Source Systems Empty Re: Linking or Merging Similar dimensions from Different Source Systems

Post  VHF Fri Feb 11, 2011 12:51 pm

meb97me wrote:So if i wanted to produce a staff list there would be duplicates for John Smith even if they were the same person

Not if you aggregate on Domain Account (or Reporting Name or some other attribute that uniquely indentifies an employee accross the enterprise). You need to do a GROUP BY in the SQL to ensure no duplicates.

One advantage to having a separate dimension record for each source system is if there was a mistake it can be corrected later in the dimension without having to update any fact records. The previous post alludes to this with the example of data cleanup.

I think the multiple record approach is most advantages when dealing with very large dimensions and especially if MDM is involved. For example, in a system with millions of customer records an MDM system might figure out later on that what it "thought" were two people are really the same person. No need to update any fact records--just update the dimension records as needed.

When dealing with a more modest dimension I think both approaches are valid. I'm facing this issue with my (modest) customer dimension (a few thousand records) and right now I'm leaning towrds one record per customer with a natural key for each source system.

VHF

Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US

Back to top Go down

Linking or Merging Similar dimensions from Different Source Systems Empty Re: Linking or Merging Similar dimensions from Different Source Systems

Post  meb97me Fri Feb 11, 2011 2:55 pm

Rather than reporting using SQL directly out of the data warehouse the duplicate person become apparent once you get to the cube (SSAS in my case) as the key of the dimension is based on the surrogate key so if you drag the person attribute into a pivot in excel or view the dimension you will see multiple versions of the same person.

Maybe i'm getting hung up on this point and its not really an issue but I can see that if we are going to be promoting self service for the BI project and users drag a list of employees out for their department say, they are going to complain as soon as they see multiple versions of the same person.

meb97me

Posts : 34
Join date : 2010-07-28

Back to top Go down

Linking or Merging Similar dimensions from Different Source Systems Empty Re: Linking or Merging Similar dimensions from Different Source Systems

Post  VHF Fri Feb 11, 2011 3:34 pm

You are taking the right approach in considering what users are going to see with self-service BI. When weighing two alternative designs, one of the most important points to consider is how each option will affect the final output. After all, the underlying purpose of DW/BI is to make information available to the users in a meaningful way!



VHF

Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US

Back to top Go down

Linking or Merging Similar dimensions from Different Source Systems Empty Re: Linking or Merging Similar dimensions from Different Source Systems

Post  meb97me Fri Feb 11, 2011 3:54 pm

cheers VHF i'm glad someone can confirm that i'm actually thinking about things in the right way (well sort of :))

However I've just thought that really we're going to be in the same situation with any of the dimensions which have Type 2 change attributes and therefore multiple versions of themselves over time.

i guess this maybe more about the dimension design in the cube software by creating attributes & appropriate keys??

meb97me

Posts : 34
Join date : 2010-07-28

Back to top Go down

Linking or Merging Similar dimensions from Different Source Systems Empty Re: Linking or Merging Similar dimensions from Different Source Systems

Post  Al Wood Mon Feb 14, 2011 5:38 am

Hi,

I've only built a few cubes, but I feel sure there is a way to build a cube so a chosen attribute e.g. [Full unique name] is used as the key by the cube. I would say that the surrogate key values from the fact/dim structure are not useful information to display in the cube anyway.

Al Wood

Al Wood

Posts : 46
Join date : 2010-12-08

Back to top Go down

Linking or Merging Similar dimensions from Different Source Systems Empty Re: Linking or Merging Similar dimensions from Different Source Systems

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum