Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

Fact Normalisation Question

5 posters

Go down

Fact Normalisation Question Empty Fact Normalisation Question

Post  nash Wed Aug 10, 2011 10:41 pm

I understand that generally normalising a fact is not a good idea, unless there are good reasons. However, I have the following scenario (not uncommon) from insurance industry and seeking you views on it.

Scenario:
I have a normalised database and then this needs to be transformed into a dimensional schema for reporting.

I have following two tables:
'Policy': keeps record of individual policies
'Policy Status': Keeps track of status changes such as Active, Cancelled, Lapsed etc.
A policy can have many statuses.

The requirement is to report on:
1- number of policies (comes from 'Policy' table)
2- number of active policies (comes from 'Policy Status' table)
3- number of cancelled policies (comes from 'Policy Status' table)
4- number of lapsed policies (comes from 'Policy Status' table)

Questions
a. Should I create two fact tables, on around Policy and other around 'Policy Status' entities?

OR

b. Should I create one fact, with all measures (1 to 4 above) denormalised? This will require me to keep individual date columns for active, cancel and lapse status.
(Furthermore, with this option, I will be updating the same row when a policy moves status and lose the 'system entry date' for individual statuses. Which means I can't reproduce same report for same period that is run on two different dates if there is a data movement between the report runs. But if don't update the same row, then I need to keep one row per change and keep something like current row flag (which doesn't sound very elegant)).

Please share your views on this.

Regards,

Nash

nash

Posts : 18
Join date : 2010-03-12

Back to top Go down

Fact Normalisation Question Empty Re: Fact Normalisation Question

Post  hang Thu Aug 11, 2011 1:59 am

nash wrote:I understand that generally normalising a fact is not a good idea, unless there are good reasons.
I wonder if it is a typo. All fact tables should be highly normalised while dimension tables are generally denormalised.

Anyway, to answer your question, the option b seems to be the best approach falling into the category of Kimball’s accumulating snapshot fact table.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

Back to top Go down

Fact Normalisation Question Empty Re: Fact Normalisation Question

Post  nash Thu Aug 11, 2011 2:50 am

nash wrote: I wonder if it is a typo. All fact tables should be highly normalised while dimension tables are generally denormalised.

No this is not typo.

Yes fact table should be normalised, but the fact (attribute) itself need not be. I will try explaining this...
By fact normalisation what I meant to say (which may not have come out quite right) was that you don't generally add 'Fact Type' as dimension on the fact itself. This will spread the different fact types across the rows; i.e. fact normalisation. For example if you have different facts like order amount, discount amount, sales amount etc, you would generally want to keep them on the same row), possiblly sparsely populating the fact. Normalising this would mean introducing 'Amount Type' dimension (with order, discount, sales values).

Now yes, while I am inclined towards option 2, it has those issues such as not being able to reproduce same report figures for same date-range on two different runs if data is subsequent modified for the reporting date-range (this is a requirement for us). This is because same fact row will be updated for other columns (and I will lose sys_insert_date). For example late landing facts.

And, with option 2, if I want to keep sys_insert_date for each type of fact, then I need to introduce current_row_flag. This will also make the fact 'insert only' (just like normalising the fact by spreading different facts across the rows). I will also have to be careful that I only select one row when reporting (i.e. BI layer to apply filtering etc) - doesn't sound very elegant.

Not sure if I explained this well!

nash

Posts : 18
Join date : 2010-03-12

Back to top Go down

Fact Normalisation Question Empty Re: Fact Normalisation Question

Post  VHF Thu Aug 11, 2011 10:08 am

Both approaches to fact table design are valid. Putting multiple measures (order amount, discount amount, sales amount, etc.) into a single record in the fact table is more typical. In this case the grain of the fact table is one record representing (for example) a line item on an order.

However, for certain types of analysis adding the Amount Type dimension and having a row in the fact table with a single Amount measure represent a pricing component is useful. This design would also be useful if you are modeling too many pricing components to reasonably handle with a separate field for each. For example, if there were 10 different types of discounts.

You may find a need to have both types of fact tables in your DW to support various reporting.

(I’m not sure what the best terminology is to describe these two approaches, but I would consider both to be normalized. Having individual amount columns for specific measures such as order amount, discount amount, etc. is not a violation of 3NF. Now, if you had amount1, amount2, amount3, etc. that would be denormalized!)

VHF

Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US

Back to top Go down

Fact Normalisation Question Empty Re: Fact Normalisation Question

Post  BoxesAndLines Thu Aug 11, 2011 12:08 pm

What you are describing is not a normalization issue but rather a data modeling generalization technique. Regardless, a Policy can only be in one status at a point in time. So your statement that a Policy has many Statuses is not true. The end result is all you need is one fact table with a Policy Status dimension. If you want to track the history of Policy status changes, then make the Policy Status dimension a type 2.
BoxesAndLines
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

Back to top Go down

Fact Normalisation Question Empty Re: Fact Normalisation Question

Post  ngalemmo Thu Aug 11, 2011 1:31 pm

It boils down to 'can your query tool handle such a generalization?'. You are only dealing with a hand full of columns and pretty much every BI tool I've seen perfers columns in a row rather than a name/value pair structure. In this particular case it seems like an overcomplication to a fairly simple problem.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Fact Normalisation Question Empty Re: Fact Normalisation Question

Post  nash Thu Aug 11, 2011 5:59 pm

Thanks for your answers. Appreciate you taking the time to read through this.

Fact normalisation seems the term some people use to describe this generalisation techniques. I think Kimball also uses this to describe the scenario I presented here, although he's not the only one.

BoxesAndLines wrote: ...a Policy can only be in one status at a point in time. So your statement that a Policy has many Statuses is not true. The end result is all you need is one fact table with a Policy Status dimension. If you want to track the history of Policy status changes, then make the Policy Status dimension a type 2.

Yes - at a point-in-time it can have only one and over time it can many statuses. One fact table will be great if I can understand how it will satisfy all of the four requirements (pls see above). Having a status dimension will satisfy last 3 requirements, but then for the first requirement, I will have to have a separate fact for Policy counts isn't it? I don't think I would be able to get Policy count from this fact (gracefully).
I don't have any issue with that at all, unless there is better way.

ngalemmo wrote:...pretty much every BI tool I've seen perfers columns in a row rather than a name/value pair structure. In this particular case it seems like an overcomplication to a fairly simple problem.

I admit this is not an uncommon scenario and may be sounding a bit of an overcomplication.
What makes it a little bit challenging (or different for me) is that the measures like active policies, cancelled policies and lapsed policies have their own respective dates (active date, cancel date, lapse date). This is not the case with example of order amount, sales amount etc (they all transact at the same time). Now keeping them on the same row is not problem, but then I am not able to satisfy reproduction of the same report requirement with same results on two different runs where data has moved for that reporting period. This is due to losing control fileds (like sys_date) on the same row for subsequent status updates.


Last edited by nash on Thu Aug 11, 2011 7:27 pm; edited 1 time in total

nash

Posts : 18
Join date : 2010-03-12

Back to top Go down

Fact Normalisation Question Empty Re: Fact Normalisation Question

Post  hang Thu Aug 11, 2011 6:48 pm

nash wrote:By fact normalisation what I meant to say (which may not have come out quite right) was that you don't generally add 'Fact Type' as dimension on the fact itself.
I agree with the guideline, especially in your case when the fact row is not necessarily sparsely populated. Here is another relevant topic with Warren’s more elaborate comment on the fact over-normalization, which I referred to the single fact column shared by multiple measures, to differentiate general understanding on ER normalization (eg. 3NF or 4NF).

http://forum.kimballgroup.com/t816-multiple-measures-in-a-fact-table-modelling-question

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

Back to top Go down

Fact Normalisation Question Empty Re: Fact Normalisation Question

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum