ETL for Fact load

View previous topic View next topic Go down

ETL for Fact load

Post  shrikant.kulkarni on Sat Feb 22, 2014 10:43 pm

I have customer interaction fact which has series of all business events across life cycle of customer. This fact table is suppose to grow vertically, expected columns are : customer_key, BusinessEvent_key, EventDate_key, EventCount (measure) and there could be lot of more measures derived based on self joining this fact across multiple business events for same customer. This table is expected to grow HUGE as it records every business event for customer.

I also have separate facts tables across these business events which provides much more detailed metadata about each of business event. For example - customer interaction fact would say customer called on date x, filed a claim on y date, claim was fulfilled on z date. claim fact would tell me peril, geography, product, policy and all rest of metadata of this event at that point in time, and same goes with fulfillment fact. Effectively, there are 6-7 different fact tables which can potentially derive customer interaction fact table.

Question I have is how should I keep ETL flexible across multiple facts. One way is I could populate all of the facts individually in parallel and then derive interaction fact by union of each of these fact. Second way is I keep ETL for interaction fact completely isolated from other facts and populate it incrementally from persisted history.
Advantage of second option is I can customize rest of facts and its grain as when I want (example - what if my claim fact end up more than one record for same claim id for same business event for multiple times, then I won't know which row to push for interaction fact). Also I don't want to respond business saying customer interaction fact does not have update information until fulfillment fact is ready (say fulfillment fact failed in ETL load)
Disadvantage of this option is, counts reported from interactions fact or claim fact for claims could differ (as these two are asynchronous processes) and I could potentially write common ETL process across two or more streams unless I serialize their execution (which I absolutely don't want, I am looking out to run as many things in parallel as they could be).

What is best practice here?
I follow kimball's two tier arch - persisted history / staging as tier 1 and star schema as tier 2. So I could potentially achieve both options. One more thing, business wants interaction fact as first deliverable in DW, at this point they don't care about the point in time metadata of those business events and they could join this fact with staging and still get those.


Best Regards
Shrikant







shrikant.kulkarni

Posts : 2
Join date : 2014-02-22

View user profile

Back to top Go down

Re: ETL for Fact load

Post  nick_white on Fri Feb 28, 2014 9:02 am

Hi,
I'm not sure there is a best practice - any solution has its pros and cons (as you outlined below) and the "best" solution would be whatever suits your particular circumstances. In order to put some structure about how to come to a conclusion as to the best design, I would think about what your "drivers" are and their relative importance to you e.g.

Maintenance: if a business event fact or interaction fact design changed would you need to update both or can you come up with a solution that isolates changes to one from changing the other?

Performance: if the interaction fact has measures based on multiple business events then presumably this involves fact table updates? If this is the case then, given that updates are slow and should be avoided if possible, consider a design that allows deletes/inserts of the fact table when changes happen. This probably means accumulating all the business event records for each load into a staging area version of the interaction fact and then at the end merging it with the existing corresponding fact record, deleting that and inserting the merged record. This approach may lead you away from trying to populate the business event and interaction fact tables at the same time.

Sorry - probably doesn't really answer your question but may help your thought processes a bit

nick_white

Posts : 366
Join date : 2014-01-06
Location : London

View user profile

Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum