Flexible model

DBADS Wed Aug 25, 2010 10:06 am

We are designing a model for anaylsis and we are relatively new to that. We came across two alternatives:
1- Apply the dimensional model design as facts and dimensions (the normal star format)

Fact:
Dim1_ID|Dim2_ID|Dim3_ID|Measure1_Value|Measure2_Value|Measure3_Value

2- A new approach is suggested due to the following 2 givens:
i- The number of measures is big (10 per fact) while the data is relatively small
ii- The measures will definitely increase in the future for the same facts

So instead of adding new columns to existing facts, it was suggested to design the fact this way:
Dim1_ID|Dim2_ID|Dim3_ID|Measure_ID|Measure_Value
While all measures are stacked in one table:
Measure_ID|Measure_Category|Measure_Name|etc...

So, which way to go?

Thank you

ngalemmo Wed Aug 25, 2010 12:04 pm

10 measures is not necessarily a lot, but either approach is valid. It all depends on what it is you are modeling and what value the measure type dimension provides.

BoxesAndLines Wed Aug 25, 2010 1:06 pm

I like option 1 better as it is a cleaner design. With option 2, you really have no idea what's in the fact table by looking at the data model.

ngalemmo Wed Aug 25, 2010 3:11 pm

True, but again, it depends.

In a model developed for a large CPG manufacturer, one of the data feeds was pricing components relating to orders. The components were a level of detail below the order line and specified the bits that made up the particular price being charged. The components and final price would vary from one order to the next. There were some substantial attributes relating to a component, such as promotion/deal, accounting information an so on. While it would have been possible to categorize the components and flatten the structure across multiple fact columns, significant information would have been lost or aggregated.

hang Wed Aug 25, 2010 8:23 pm

It’s all about dimensionality. If the measure dimension is a part of business requirements, then you don’t have other choice but create another dimension like your measure dimension. However I would have a more specific dimension name for it. I would not create a measure dimension only because there are more measures in the fact table, although too many measures, say more than 20, in a single fact table does warrant some consideration of remodelling.

Sometimes the fact table is modelled with too many measures that are the result of cross-tabbing a single measure on some dimension (say date). So in that case, the fact table should be normalised with single measure and let the application cross-tab the metrics accordingly.

In my experience, I have created a band dimension for a fact table that initially contained measures for Band1..Band10 and BandPercent1..BandPercent10, where clearly only two measures, Band and BandPercent plus a band dimension are required. In this case the fact is said to be dimensionalised by Band as well. With the new Band dimension, not only does the fact become more normalised, but also does the band dimension provide you with more useful attributes, like the band sort order.

DBADS Thu Aug 26, 2010 2:53 am

But we don't have all the previous cases.
These are a bunch of independent measures having the same aggregation level.
The only valid arguement (till now) in favor of option 2 is the flexibility of adding new measures to the fact(s) without doing any structural change (adding a new column).
But isn't this the tipical case with data marts?

DBADS Thu Aug 26, 2010 3:31 am

Another point related to this evaluation:
Is there a performance gain in favor of one of the two options?

hang Thu Aug 26, 2010 5:07 am

In that case, BoxesAndLines is right. Don't mess up the fact table with option 2 as it is bad for both performance and dimensional modeling.

If the facts share the same dimensionality, they need to be diced and sliced by the same set of dimension attributes and shown side by side for the purpose of comparison. With option 2, you would have to do self join on the same fact table to achieve that consolidated view, bad for performance, while option 1 consolidates the facts naturally. The option 2 also muddles the fact table with some presumed constraints in order to get any fact to work, or otherwise you could easily double or triple aggregate the metrics.

If you do have distinct measure groups separated by micro business processes and the number of the measures has reached an uncomfortable level, you could create separate fact tables with same set of dimension keys so that you can still consolidate the facts by the dimension conformance. But take that as exception as we normally do the other way around, trying to pack the facts with the same dimensionality into a single fact table.

Sponsored content

Flexible model

Flexible model

Re: Flexible model

Re: Flexible model

Re: Flexible model

Re: Flexible model

Re: Flexible model

Re: Flexible model

Re: Flexible model

Re: Flexible model