Does This Warrant the Use of One or More Fact Tables?

View previous topic View next topic Go down

Does This Warrant the Use of One or More Fact Tables?

Post  scruzloose33 on Mon Feb 06, 2012 3:20 pm

I am trying to create a dimensional model that contains information about online consumers and how they are matched to mortgage lenders in their search for a refinance. There is a lot of information about each consumer that is driven to the site, such as:

Contact Information
Date They Registered
Geographic Location
Source Information (Search Engine, Keyword, etc.)

There is also a lot of information about the mortgage lenders each consumer is matched with. A consumer can be matched with one or more lenders and each lender might pay a different price to be connected with the consumer.

I am interested to be able to see things such as:

1. How many lenders does a consumer get connected with on average?
2. Details on lender purchases (average price paid, average low price, average high price, etc.)
3. Information about lender purchase history over time

The finest granularity fact table would call for one record to represent the transaction of connecting a consumer to a single lender (i.e., there would be multiple records for each consumer). But, I am worried that with this model that it will become very difficult to see data trends for the individual consumer (e.g., average number of lenders an individual consumer is connected with).

Would it make more sense with this to have two data marts (therefore two different fact tables): one storing a single record for each consumer and one storing a fact table entry for each connection of a consumer to a lender? Or, should this be considered one business process and therefore I should try to make it work with only one data mart and fact table?

Thanks,
Shane

scruzloose33

Posts : 5
Join date : 2012-02-06

View user profile

Back to top Go down

Re: Does This Warrant the Use of One or More Fact Tables?

Post  scruzloose33 on Mon Feb 06, 2012 4:39 pm

It seems like this is a similar topic to the one discussed here: http://forum.kimballgroup.com/t1529-finding-the-grain-with-one-to-many-fact-tables

Right now, I am more interested in the aggregate data, but it sounds like I should be thinking about defining the grain at the consumer-to-lender connection level. I can also build an aggregate fact table at the one-record-per-consumer grain if necessary, but the detailed consumer-to-lender information seems necessary to avoid the need to rebuild later.

Does that sound correct? I was originally thinking that I should only have one fact table per business process, but it doesn't seem that there is any harm in having multiple fact tables with different granularity.

scruzloose33

Posts : 5
Join date : 2012-02-06

View user profile

Back to top Go down

Re: Does This Warrant the Use of One or More Fact Tables?

Post  ngalemmo on Wed Feb 08, 2012 11:35 pm

I see the business process issue a bit differently than I believe you do. Its true an atomic fact should represent a single business event/process and that fact tables that are derivations (aggregates) of other facts is fine. But, I think you are intermixing two different events that should be kept separate. One event is the customer generating a referral. Each referral represents revenue to the company and should be tracked at the lowest possible level. Maintaing all related dimensions for the fact allows the information to be represented in many different ways. In addition to referrals, you also have actual sales from those referrals which occurs at a different time using different processes. Such sales events should be maintained in their own atomic fact table.

Once you have these established, aggregate the data any way you want. The decision to actually persist the aggregate in a new fact table is a decision based on SLA's, costs, and user benefit.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Does This Warrant the Use of One or More Fact Tables?

Post  scruzloose33 on Thu Feb 09, 2012 12:03 am

ngalemmo wrote:I see the business process issue a bit differently than I believe you do. Its true an atomic fact should represent a single business event/process and that fact tables that are derivations (aggregates) of other facts is fine. But, I think you are intermixing two different events that should be kept separate. One event is the customer generating a referral. Each referral represents revenue to the company and should be tracked at the lowest possible level. Maintaing all related dimensions for the fact allows the information to be represented in many different ways. In addition to referrals, you also have actual sales from those referrals which occurs at a different time using different processes. Such sales events should be maintained in their own atomic fact table.

Once you have these established, aggregate the data any way you want. The decision to actually persist the aggregate in a new fact table is a decision based on SLA's, costs, and user benefit.

Thanks for the input, that is helpful. I have been teetering back and forth as to whether the original online lead generation is truly a separate process from the sale/connection of the lead to the lender(s). I now believe they are indeed different processes, but I have just confused myself because they are so related and because of my lack of experience in dimensional modeling. Since I want to be able to see things such as profit margin per lead, I need to be able to see the end-to-end connection from the lead generation source (cost) to the lead sale to the lender(s) (revenue). The cost of the lead generation is definitely going to be a measure in the first fact table, but what is the best practice for storing this in the 2nd fact table that primarily represents the sales? Since one lead may cost $20 to generate, but it can be sold 4 times to different lenders, should I be dividing the total cost by the number of sales so the cost correctly rolls up in profit reports on the 2nd fact table? The tricky part here is that all the sales might not happen at once, so two sales might happen the first day (giving each a $10 cost), but then 2 more sales might happen a few days later. It seems odd that at that point I would want to go back and update the charges for all 4 of those sales to $5.

I see my options as follows:

1. Divide the total cost for each lead by the number of times it was sold and store that in the fact table that contains the sales transactions to the lenders. The tricky part here is that I would need to retroactively update the cost measure for several records whenever a new sale is made on a lead that was already sold in the past. This doesn't seem very elegant.

2. Don't store the cost in the fact table that represents lead sales to lenders at all and only use that table solely for revenue and lender purchase reporting. In order to get profit margin information, the first fact table (that represents the original lead capture transactions) would most likely need to be an accumulating snapshot because it would also need to store revenue that could change as new sales are made.

3. Link the sales transaction fact table back to the lead generation fact table and report across both of them through this "foreign key" relationship. From what I understand, this is really more relational thinking and doesn't usually make sense in a dimensional model.

Thanks again for any suggestions!

scruzloose33

Posts : 5
Join date : 2012-02-06

View user profile

Back to top Go down

Re: Does This Warrant the Use of One or More Fact Tables?

Post  ngalemmo on Thu Feb 09, 2012 12:43 am

The best practice is to store it in fact table #3.

The atomic fact tables are what they are. An accurate record of leads generated and a separate, and an accurate record of sales generated by the lead. The metrics you are talking about are aggregates of these two base facts. You can either generate these metrics at time of query, or, if it is too slow, create an aggregate of the two atomic facts, creating table #3.

So, option 3 is what you do, but not as you describe it. In a dimension model, you combine facts along common dimensions. The generic pattern to do this is you query each fact separately pulling needed measures and the same dimensional attributes, summing the rows by the set of dimensional attributes. You then combine the two sets by attribute using a join or union. If you do not build your atomic facts with common conforming dimensions you cannot combine them, which is why you create atomic tables in the first place, since an atomic table contains all possible dimensions pertaining to the measures.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Does This Warrant the Use of One or More Fact Tables?

Post  scruzloose33 on Thu Feb 09, 2012 12:53 am

ngalemmo wrote:The best practice is to store it in fact table #3.

The atomic fact tables are what they are. An accurate record of leads generated and a separate, and an accurate record of sales generated by the lead. The metrics you are talking about are aggregates of these two base facts. You can either generate these metrics at time of query, or, if it is too slow, create an aggregate of the two atomic facts, creating table #3.

So, option 3 is what you do, but not as you describe it. In a dimension model, you combine facts along common dimensions. The generic pattern to do this is you query each fact separately pulling needed measures and the same dimensional attributes, summing the rows by the set of dimensional attributes. You then combine the two sets by attribute using a join or union. If you do not build your atomic facts with common conforming dimensions you cannot combine them, which is why you create atomic tables in the first place, since an atomic table contains all possible dimensions pertaining to the measures.

Once again, very helpful. Thanks for the clarification!

scruzloose33

Posts : 5
Join date : 2012-02-06

View user profile

Back to top Go down

Re: Does This Warrant the Use of One or More Fact Tables?

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum