Multiple sources for the same data - which one to extract from?


Post  arowshan on Thu Oct 20, 2011 2:50 pm

Here is the situation:

The system that generates the data does not talk to a database directly. It talks to other components (systems) that end up storing the same source data in different ways. I understand that there are probably some architectural problems with this model, but that is what we currently have. The question is: should we build different data marts from these sources, or should we choose the most reliable one and extract from there? The argument for extracting from all of them is that it gives end users a way to cross-check between them.

Thanks


Last edited by arowshan on Fri Oct 21, 2011 11:51 am; edited 1 time in total

arowshan

Posts : 23
Join date : 2011-10-18
Location : Vancouver, Canada


Re: Multiple sources for the same data - which one to extract from?

Post  BoxesAndLines on Thu Oct 20, 2011 9:54 pm

Extract from all, consolidate into a unified dimensional model.
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA


Re: Multiple sources for the same data - which one to extract from?

Post  arowshan on Fri Oct 21, 2011 11:56 am

If you are getting almost the same data, say a coin-in amount, from three different sources (possibly with different grains), how would you consolidate it into the same dimensional model? Is that one fact table? In Kimball terms, those all represent the same business process. Could you elaborate?

arowshan

Posts : 23
Join date : 2011-10-18
Location : Vancouver, Canada


Re: Multiple sources for the same data - which one to extract from?

Post  ngalemmo on Fri Oct 21, 2011 3:51 pm

Generally speaking, best practice is to get the data from the original source and store it at the lowest grain attainable. Not knowing your exact situation or what the other systems do to the data, it's hard to say what is the best course of action. But I would be very wary of trying to get the same data from multiple sources, all of which have manipulated the data in some manner.
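As a rough illustration of why the lowest grain matters: detail rows can always be rolled up to a coarser grain later, but a stored daily total cannot be split back into events. The sketch below is only an assumption about what coin-in data might look like; the field names (`machine_id`, `date`, `coin_in`) and the `roll_up` helper are hypothetical and not taken from any real system.

```python
from collections import defaultdict

# Hypothetical lowest-grain records: one row per coin-in event.
events = [
    {"machine_id": "M1", "date": "2011-10-20", "coin_in": 5.00},
    {"machine_id": "M1", "date": "2011-10-20", "coin_in": 2.50},
    {"machine_id": "M2", "date": "2011-10-20", "coin_in": 10.00},
    {"machine_id": "M1", "date": "2011-10-21", "coin_in": 1.25},
]

def roll_up(rows, keys, measure):
    """Sum `measure` over rows, grouped by the columns named in `keys`."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row[measure]
    return dict(totals)

# The same detail rows serve two coarser grains.
daily_by_machine = roll_up(events, ("machine_id", "date"), "coin_in")
daily_total = roll_up(events, ("date",), "coin_in")
```

Going the other way, from `daily_total` back to individual events, is not possible, which is the practical reason to load the finest grain the original source offers.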
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Re: Multiple sources for the same data - which one to extract from?

Post  BoxesAndLines on Fri Oct 21, 2011 7:48 pm

arowshan wrote: If you are getting almost the same data, say a coin-in amount, from three different sources (possibly with different grains), how would you consolidate it into the same dimensional model? Is that one fact table? In Kimball terms, those all represent the same business process. Could you elaborate?

Like information should be pulled from the highest-quality source. What you generally find in the different sources are different data points that add context to your measures. One dimension could come from one source, another dimension from yet another. In the worst cases, even your measures are sourced from multiple sources. Often you will find facts that arrive from only one source and that the others do not contain. In this case, to get a holistic view, you need to ensure all distinct facts are loaded into a common model.
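A minimal sketch of that idea, under stated assumptions: the source names (`system_a`, `system_b`, `system_c`), their quality ranking, the `(machine_id, date)` key, and the `consolidate` helper are all hypothetical. Where several sources report the same fact, the highest-ranked source wins; facts that appear in only one source are still loaded.

```python
# Hypothetical quality ranking, best source first.
SOURCE_PRIORITY = ["system_a", "system_b", "system_c"]

# Hypothetical extracts, already conformed to the same grain and key:
# (machine_id, date) -> coin-in amount.
facts_by_source = {
    "system_a": {("M1", "2011-10-20"): 7.50},
    "system_b": {("M1", "2011-10-20"): 7.45, ("M2", "2011-10-20"): 10.00},
    "system_c": {("M3", "2011-10-20"): 3.00},
}

def consolidate(facts_by_source, priority):
    """Merge per-source facts into one model, keeping the best source per key."""
    consolidated = {}
    for source in priority:  # best source first
        for key, amount in facts_by_source.get(source, {}).items():
            # setdefault keeps the value from the first (best) source seen.
            consolidated.setdefault(key, {"coin_in": amount, "source": source})
    return consolidated

common_model = consolidate(facts_by_source, SOURCE_PRIORITY)
```

Recording the `source` next to each measure is a design choice in this sketch, not something the thread prescribes; it keeps lineage visible so users can still see where a number came from.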
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

