Business Logic: DWH vs. Source system

View previous topic View next topic Go down

Business Logic: DWH vs. Source system

Post  inglev on Tue Feb 17, 2009 4:07 am

Hello everyone,

We are currently implementing a Data Warehouse (consolidating data from several source systems) and we had an argument on where the business logic should reside.

Simplified example: The source contains the fields “amount”, a “discount” and a “total amount”. The “total amount” is supposed to be the “amount”*(1-“discount”), but for us, the DWH team, it is already available as a readily loadable fixed field.
So, should we (a) just take the “total amount” field or (b) take the raw fields “amount” and “discount” to calculate the “total amount” in the ETL?

In the first case, one would basically not bother about how the field is derived and leave its logic to the source system. However, I reckon there will be almost certainly records where the total amount != amount*(1-discount) leaving to inconsistencies if people start to compare the source system-delivered “total amount” with the “actual amounts” and “discounts”.

On the other hand, doing the calculation in the ETL would at least ensure consistency as it would capture the most detailed data. However, the DWH would be not just a dispositive system anymore. I would say that this is ok – in fact, that the DWH is actually supposed to be the central repository of business rules and that it is easier and more consistent than relying on a potentially unstable source system.


So, what’s the best way?


Btw- I know that there is always the possibility to load both and leave it up to the users which field to choose, but my question is more of a general nature on what the best practice would prescribe.

inglev

Posts : 3
Join date : 2009-02-17

View user profile

Back to top Go down

Re: Business Logic: DWH vs. Source system

Post  BrianJarrett on Tue Feb 17, 2009 7:39 am

I'm not sure that there's an official "best practice" for something like this (of course there might be and I could be unaware of it). But if it were up to me I think I would use the pre-calculated source system total field, provided I could ensure its accuracy. Using that field most accurately represents your system of record and eliminates the possibility of introducing math errors. It also saves you a step (albeit a simple one) by eliminating the need to calculate.

However, if the field is occasionally inaccurate then I'd first try to push that fix upstream to the source system people. If they won't fix it then you could either calculate the total amount yourself or you could create a variance field that would allows the users to see the inaccuracies while still allowing them to sum the fields together to get the correct total amount, despite the source system not providing it. Of course you could also just dump the source system pre-calculated total field altogether and do it yourself in the ETL, provided your users are okay with that.

True you could store both an ETL-calculated total field and a source system-calculated total field but I think that might be confusing to end users.

In the end I think it depends on the accuracy of your source data and your end user requirements.

Hope this helps.
avatar
BrianJarrett

Posts : 61
Join date : 2009-02-03
Age : 43
Location : St. Louis, MO

View user profile

Back to top Go down

Re: Business Logic: DWH vs. Source system

Post  BoxesAndLines on Tue Feb 17, 2009 10:46 am

Take the raw columns and derived columns. At some point, you'll want both. You don't necessarily have to expose both to the reporting community. I would imagine that some people would be interested in raw columns while others are simply interested in the total. This is also an easy error to catch during the ETL process. Review Ralph's error check writings for a good process to capture and report these defects. You would be surprised how motivated source systems are at cleaning up their dirty laundry once it is published as daily metrics on a dashboard!
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Business Logic: DWH vs. Source system

Post  inglev on Tue Feb 17, 2009 11:26 am

Many thanks guys for the anwers. I guess, the best way indeed might be the way to take them both and decide later on how to publish the figures.

inglev

Posts : 3
Join date : 2009-02-17

View user profile

Back to top Go down

Re: Business Logic: DWH vs. Source system

Post  higerdm on Tue Feb 17, 2009 2:36 pm

One thing to consider it that "discount" is a non-additive field. Unless you are looking at detail rows, it has no value. When summarizing, your formula should be Summary Discount = 1 - (sum(total amount)/sum(amount)) to get a weighted average across all the records in the group.

Another thing, assuming we are talking about invoice data. Did the customer paid the Total Amount regardless of whether it is calculated correctly? If so, then you don't want to recalculate it in the DW. If you recalculate it, then you DW will loose credibility because it does not represent reality and reality is the name of the game. Also, recalculating it, even if it is correct in the source system, may pose a problem with rounding errors, if the rounding is done differently.

Personally, I'd put the Amount and Total Amount in the fact table and calculate the discount.

higerdm

Posts : 1
Join date : 2009-02-03
Age : 57
Location : Wichita, KS

View user profile

Back to top Go down

Results

Post  rpcasey001 on Tue Sep 22, 2009 4:27 pm

Did this get resolved?

rpcasey001

Posts : 7
Join date : 2009-05-29

View user profile http://www.R-P-C-Group.com

Back to top Go down

Re: Business Logic: DWH vs. Source system

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum