Wide fact tables

View previous topic View next topic Go down

Wide fact tables

Post  foops on Mon Aug 06, 2012 3:00 pm

I'm in a bit of a dilemma and I need some advice. I've currently designed a datamart it contains 10 dimensions and 2 fact tables. One for the detail and one for aggregrated data. My client has identified approximately 66 measures that need to be calculated by Day, WTD, MTD and YTD for the current year and the previous year. I have created a dimension called measure dimension that captures all these measures and a fact that captures the measures for each time period.

Measure Dimension
surrogate key
measure name

Fact (table contains about 5 foreign keys)
current daily count
current wtd count
current mtd count
current ytd count
previous daily count
previous wtd count
previous mtd count
previous ytd count

A microstrategy developer has said that he does not approve this design and would like to have one wide fact table with a column for each measure. His arguement is that the current design means that the fact table will grow to a billion rows in 3 years.

We are building the ODS in sql server, we do not have the enterprise addition and so cannot take advantage of partitioning.

Please advise on the best approach to take here. I do not think that the wide fact table is the answer.

Thank you

foops

foops

Posts : 2
Join date : 2012-08-06

View user profile

Back to top Go down

Re: Wide fact tables

Post  Jeff Smith on Mon Aug 06, 2012 3:45 pm

Will all 66 measures be calculated for every item? If not, the flattened table could be larger than your normal design.

How often will all 66 measures be used in reports? I think the key is to minimize the volume of data that has to be pulled from the disk as I/O tends to be slower than processors these days.

A poor mans way of creating partitions is to create different fact tables for each year or month, or however you were going to partition the the fact table as if you had the enterprise edition. Treat each table like you would a partition, pointing it to the appropriate filegroup. Create a clustered index on the field as if it were a partition. Create a union view across the individual tables. The view should take advantage of the clustered index.

But you should still get the enterprise edition.

Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

Re: Wide fact tables

Post  Mike Honey on Mon Aug 06, 2012 8:21 pm

Hi foops,

I'd be trying to solve this without aggregating the fact data. If you have the base detail of the event rows you are counting with their dates, combined with a Date dimension that makes it easy to determine the relative time aspects (current/previous day, wtd, mtd, ytd etc), and it's all properly indexed, you will probably get surprisingly good query performance.

Heres the sort of SQL I'm thinking of:

SELECT COUNT(*)
FROM Fact_Events
INNER JOIN Dim_Dates ON Fact_Events.Date_SKey = Dim_Dates.Date_SKey
WHERE Dim_Dates.WTD = 'Current'

You will also save a lot of time, effort and resources by not pre-aggregating.

Good luck!
Mike
avatar
Mike Honey

Posts : 185
Join date : 2010-08-04
Location : Melbourne, Australia

View user profile http://www.mangasolutions.com

Back to top Go down

Re: Wide fact tables

Post  ngalemmo on Mon Aug 06, 2012 10:51 pm

Or, if you must aggregate, why not just a daily value? Doing the to-date stuff should't be all that bad. In terms of the measures themselves, its not clear why there are 60+. Are these derived from a much thinner fact or its just the way it is? If they are derived, keep the thin fact, summarized by day and any other dimensions and build a view that derives the different values based on business rules.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Wide fact tables

Post  foops on Fri Aug 10, 2012 3:23 pm

Thank you all for your responses ..... I highlighted my concerns to our management and we have decided to look into upgrading to the enterprise edition rather than put work arounds in place.

foops

Posts : 2
Join date : 2012-08-06

View user profile

Back to top Go down

Re: Wide fact tables

Post  LAndrews on Fri Aug 10, 2012 4:25 pm

I'm not sure if partitioning will solve your issue.

It appears the grain of the fact is daily .... everything else is calculated based on that day. Basically the same number of rows with or without the WTD,MTD,YTD stuff.

I think the microstrategy developer is more impacted by the measure dimension ... eliminating that would reduce the rows from 2 billion to about 30 million.

However, without knowing more about the measures and how they are used, its hard to suggest a solution. For example, if the measures are typically reported/analyzed side-by-side, then I prefer to have each measure as a column. (e.g. Units & Cost). If the measures are reported as rows (as in a P&L report) then I prefer to have a row for each measure. (e.g. Revenue and Profit)




LAndrews

Posts : 132
Join date : 2010-05-13
Location : British Columbia, Canada

View user profile

Back to top Go down

Flattening approach and pros/cons

Post  vickyejain on Mon Aug 20, 2012 3:48 am

I have seen the question of flattening vs. normalizing come up various times wherein the report developers most of the times tend to prefer the flattened version and ETL developers prefer the normalized version. Something I have done in the past is to process all data in a normalized version and then flatten the data into a giant wide fact table that has place holder columns for multiple metrics considering future growth (say 100 measure columns in your case - measure1, measure2, measure3, etc.)

You can store a mapping between a measure id/name and the pivot number that tells you (and the ETL code) which column to put data in. After this, you are free to discard the normalized table created during ETL processing. The upside here is that the data model is generic and can accommodate increase/decrease in measures easily and will be straight-forward to develop reports on (especially if your performance requirements are stringent), although a major downside is that your data model is friendly to end users (the generic measure names do not say much). Additionally, the I/O when reading from the fact table has to be considered whenever reading data - running simple 'select * from' queries can be a lot more taxing here without restricting to specific columns.

I personally don't recommend a generic flattened approach owing to these downsides, but sometimes the business needs (changing business metrics in dynamic markets, report performance requirements) demand solutions like this.

vickyejain

Posts : 7
Join date : 2012-08-20

View user profile

Back to top Go down

Re: Wide fact tables

Post  murugan2012 on Wed Aug 22, 2012 3:19 pm

It is not advisable to create these kind of persistent preaggregated tables for many reasons like lack of flexibility, occupying more space, taking more time to build these aggregates and also not every day the users can do these type of reporting, but still you need to refresh the to-date values assuming some one will be accessing it. These types of aggregates to be built purely on the basis of need and also on the fly by using the views or using any reporting tools. All the reporting tools handle these type of to-date calcuations easily. You can achieve this in creating olap cubes or dimensional meta data pacakges. simply store the daily level facts for current and previous years.

murugan2012

Posts : 5
Join date : 2012-08-22

View user profile

Back to top Go down

Re: Wide fact tables

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum