Conforming Dimensions - Standardising, De-duplicating and Suvivorship

View previous topic View next topic Go down

Conforming Dimensions - Standardising, De-duplicating and Suvivorship

Post  johnryan on Wed Sep 23, 2009 11:02 am

Hi,

I'm currently reading the DW ETL toolkit, which seems to have some excellent ideas. However, as it doesn't come with any downloads (e.g example SSIS packages) - I'm struggling to understand a few things. If anyone can answer the following it would be most appreciated:

1)2 sources for a dimenion interests me. Am I right in understanding that the issues here are essentially that we could have an attribute (e.g. customer location for the Customer Dim) in both data sources. Therefore, we use surviorship here to favour one data source (for that attribute only) over the other data source?

2)The Kimball surviviroship support meta data (pg 159) - is this something that someone must populate manually, or can it be populated automatically during the ETL?

If anyone does know of any examples of this on the web - that would be most appreciated also

Thanks
John

johnryan

Posts : 3
Join date : 2009-09-23

View user profile

Back to top Go down

Re: Conforming Dimensions - Standardising, De-duplicating and Suvivorship

Post  BoxesAndLines on Wed Sep 23, 2009 10:35 pm

Depends. If the addresses have the same meaning, i.e. they are both billing addresses then I would apply a survivorship rule. If the addresses represent different meanings, i.e. shipping and billing, I would store both addresses.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Conforming Dimensions - Standardising, De-duplicating and Suvivorship

Post  johnryan on Thu Sep 24, 2009 7:21 am

Thanks for the reply.

So in the case that they are both billing address, is the surviorship rule as simple as just saying always take the address from Data Source A?

Also - I'm still unsure - The Kimball surviviroship support meta data (pg 159) - is this something that someone must populate manually, or can it be populated automatically during the ETL?

johnryan

Posts : 3
Join date : 2009-09-23

View user profile

Back to top Go down

Re: Conforming Dimensions - Standardising, De-duplicating and Suvivorship

Post  BoxesAndLines on Thu Sep 24, 2009 8:38 am

It could be source A, it could be most recent, it could be composite, it could be the validated address. There are all sorts of survivorship rules you can define.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Conforming Dimensions - Standardising, De-duplicating and Suvivorship

Post  ngalemmo on Thu Sep 24, 2009 11:08 am

The issues you are bringing up are what Master Data Management (MDM) tools are all about. Things can get very complicated very quicky which is why most MDM solutions use rules based engines to evaluate the sources and figure out what to do. I would only attempt to do such tasks in ETL if they are simple and very clearly defined by the business. Also, if I was to do it in ETL, I would design the process where no data is destroyed so that the inevitable "gee... did we say do that?" issues can be resolved without a lot of pain.

Even with an MDM system, you should expect some level of error as well as ongoing reworking of the rules which may adversly affect the DW content. As a rule, where I have multiple independent sources for a dimension, I maintain rows for each source (to retain source identity) and use attributes to 'integrate' enities across sources.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Conforming Dimensions - Standardising, De-duplicating and Suvivorship

Post  johnryan on Mon Sep 28, 2009 10:52 am

Thanks for the replies,

I'd presumed that this is all possible without MDM - but I guess then it can get to a point where it just gets too complex without MDM.

As a final request is anyone aware of any downloadable examples of this implemented within the ETL? E.g. an SSIS package?

johnryan

Posts : 3
Join date : 2009-09-23

View user profile

Back to top Go down

Re: Conforming Dimensions - Standardising, De-duplicating and Suvivorship

Post  BoxesAndLines on Tue Sep 29, 2009 6:48 pm

It is possible without a MDM application. In fact I read the other day that the majority of MDM/CDI implementations are in-house solutions.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Conforming Dimensions - Standardising, De-duplicating and Suvivorship

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum