Fact Normalisation Question

View previous topic View next topic Go down

Fact Normalisation Question

Post  nash on Wed Aug 10, 2011 10:41 pm

I understand that generally normalising a fact is not a good idea, unless there are good reasons. However, I have the following scenario (not uncommon) from insurance industry and seeking you views on it.

Scenario:
I have a normalised database and then this needs to be transformed into a dimensional schema for reporting.

I have following two tables:
'Policy': keeps record of individual policies
'Policy Status': Keeps track of status changes such as Active, Cancelled, Lapsed etc.
A policy can have many statuses.

The requirement is to report on:
1- number of policies (comes from 'Policy' table)
2- number of active policies (comes from 'Policy Status' table)
3- number of cancelled policies (comes from 'Policy Status' table)
4- number of lapsed policies (comes from 'Policy Status' table)

Questions
a. Should I create two fact tables, on around Policy and other around 'Policy Status' entities?

OR

b. Should I create one fact, with all measures (1 to 4 above) denormalised? This will require me to keep individual date columns for active, cancel and lapse status.
(Furthermore, with this option, I will be updating the same row when a policy moves status and lose the 'system entry date' for individual statuses. Which means I can't reproduce same report for same period that is run on two different dates if there is a data movement between the report runs. But if don't update the same row, then I need to keep one row per change and keep something like current row flag (which doesn't sound very elegant)).

Please share your views on this.

Regards,

Nash

nash

Posts : 18
Join date : 2010-03-12

View user profile

Back to top Go down

Re: Fact Normalisation Question

Post  hang on Thu Aug 11, 2011 1:59 am

nash wrote:I understand that generally normalising a fact is not a good idea, unless there are good reasons.
I wonder if it is a typo. All fact tables should be highly normalised while dimension tables are generally denormalised.

Anyway, to answer your question, the option b seems to be the best approach falling into the category of Kimballís accumulating snapshot fact table.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Fact Normalisation Question

Post  nash on Thu Aug 11, 2011 2:50 am

nash wrote: I wonder if it is a typo. All fact tables should be highly normalised while dimension tables are generally denormalised.

No this is not typo.

Yes fact table should be normalised, but the fact (attribute) itself need not be. I will try explaining this...
By fact normalisation what I meant to say (which may not have come out quite right) was that you don't generally add 'Fact Type' as dimension on the fact itself. This will spread the different fact types across the rows; i.e. fact normalisation. For example if you have different facts like order amount, discount amount, sales amount etc, you would generally want to keep them on the same row), possiblly sparsely populating the fact. Normalising this would mean introducing 'Amount Type' dimension (with order, discount, sales values).

Now yes, while I am inclined towards option 2, it has those issues such as not being able to reproduce same report figures for same date-range on two different runs if data is subsequent modified for the reporting date-range (this is a requirement for us). This is because same fact row will be updated for other columns (and I will lose sys_insert_date). For example late landing facts.

And, with option 2, if I want to keep sys_insert_date for each type of fact, then I need to introduce current_row_flag. This will also make the fact 'insert only' (just like normalising the fact by spreading different facts across the rows). I will also have to be careful that I only select one row when reporting (i.e. BI layer to apply filtering etc) - doesn't sound very elegant.

Not sure if I explained this well!

nash

Posts : 18
Join date : 2010-03-12

View user profile

Back to top Go down

Re: Fact Normalisation Question

Post  VHF on Thu Aug 11, 2011 10:08 am

Both approaches to fact table design are valid. Putting multiple measures (order amount, discount amount, sales amount, etc.) into a single record in the fact table is more typical. In this case the grain of the fact table is one record representing (for example) a line item on an order.

However, for certain types of analysis adding the Amount Type dimension and having a row in the fact table with a single Amount measure represent a pricing component is useful. This design would also be useful if you are modeling too many pricing components to reasonably handle with a separate field for each. For example, if there were 10 different types of discounts.

You may find a need to have both types of fact tables in your DW to support various reporting.

(Iím not sure what the best terminology is to describe these two approaches, but I would consider both to be normalized. Having individual amount columns for specific measures such as order amount, discount amount, etc. is not a violation of 3NF. Now, if you had amount1, amount2, amount3, etc. that would be denormalized!)

VHF

Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US

View user profile

Back to top Go down

Re: Fact Normalisation Question

Post  BoxesAndLines on Thu Aug 11, 2011 12:08 pm

What you are describing is not a normalization issue but rather a data modeling generalization technique. Regardless, a Policy can only be in one status at a point in time. So your statement that a Policy has many Statuses is not true. The end result is all you need is one fact table with a Policy Status dimension. If you want to track the history of Policy status changes, then make the Policy Status dimension a type 2.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Fact Normalisation Question

Post  ngalemmo on Thu Aug 11, 2011 1:31 pm

It boils down to 'can your query tool handle such a generalization?'. You are only dealing with a hand full of columns and pretty much every BI tool I've seen perfers columns in a row rather than a name/value pair structure. In this particular case it seems like an overcomplication to a fairly simple problem.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Fact Normalisation Question

Post  nash on Thu Aug 11, 2011 5:59 pm

Thanks for your answers. Appreciate you taking the time to read through this.

Fact normalisation seems the term some people use to describe this generalisation techniques. I think Kimball also uses this to describe the scenario I presented here, although he's not the only one.

BoxesAndLines wrote: ...a Policy can only be in one status at a point in time. So your statement that a Policy has many Statuses is not true. The end result is all you need is one fact table with a Policy Status dimension. If you want to track the history of Policy status changes, then make the Policy Status dimension a type 2.

Yes - at a point-in-time it can have only one and over time it can many statuses. One fact table will be great if I can understand how it will satisfy all of the four requirements (pls see above). Having a status dimension will satisfy last 3 requirements, but then for the first requirement, I will have to have a separate fact for Policy counts isn't it? I don't think I would be able to get Policy count from this fact (gracefully).
I don't have any issue with that at all, unless there is better way.

ngalemmo wrote:...pretty much every BI tool I've seen perfers columns in a row rather than a name/value pair structure. In this particular case it seems like an overcomplication to a fairly simple problem.

I admit this is not an uncommon scenario and may be sounding a bit of an overcomplication.
What makes it a little bit challenging (or different for me) is that the measures like active policies, cancelled policies and lapsed policies have their own respective dates (active date, cancel date, lapse date). This is not the case with example of order amount, sales amount etc (they all transact at the same time). Now keeping them on the same row is not problem, but then I am not able to satisfy reproduction of the same report requirement with same results on two different runs where data has moved for that reporting period. This is due to losing control fileds (like sys_date) on the same row for subsequent status updates.


Last edited by nash on Thu Aug 11, 2011 7:27 pm; edited 1 time in total

nash

Posts : 18
Join date : 2010-03-12

View user profile

Back to top Go down

Re: Fact Normalisation Question

Post  hang on Thu Aug 11, 2011 6:48 pm

nash wrote:By fact normalisation what I meant to say (which may not have come out quite right) was that you don't generally add 'Fact Type' as dimension on the fact itself.
I agree with the guideline, especially in your case when the fact row is not necessarily sparsely populated. Here is another relevant topic with Warrenís more elaborate comment on the fact over-normalization, which I referred to the single fact column shared by multiple measures, to differentiate general understanding on ER normalization (eg. 3NF or 4NF).

http://forum.kimballgroup.com/t816-multiple-measures-in-a-fact-table-modelling-question

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Fact Normalisation Question

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum