Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

How to report on sparse areas of sparse fact table

2 posters

Go down

How to report on sparse areas of sparse fact table Empty How to report on sparse areas of sparse fact table

Post  gxclarke Sat Oct 08, 2011 6:53 pm

A source system tracks student attendance for a school district by reporting absence events. Attendance on any particular day can be determined by examining three datasets: school calendar, student enrollment, and absence.

On any given school day, the number of enrolled students in attendance is usually much larger than the number that are absent, so this approach reduces the number of records stored to track attendance significantly.

I am trying to determine the proper way to represent daily attendance in a dimensional model. The most obvious way is to create a factless table with a grain per school day per student, and an attendance dimension that has values for both attendance and absence reasons. This is quite straightforward to work with OLAP, but the downside is the size of the fact table.

For example, for 30,000 students and 188 school days means that there are approximately 0.5 million records per year (if this doesn't seem large enough to be an issue, then consider an example in which attendance must be reported on per period rather than per day). Contrast this to a fact table that records only absences and the number is considerably smaller. However, if I do this, then I am not sure how to build cubes that aggregate daily attendance facts.

The specific OLAP technology being used is SQL Server Analysis Services 2008 R2. Any thoughts?


Last edited by gxclarke on Sun Oct 09, 2011 7:03 pm; edited 1 time in total (Reason for editing : Corrected typo)

gxclarke

Posts : 2
Join date : 2011-10-08

Back to top Go down

How to report on sparse areas of sparse fact table Empty Re: How to report on sparse areas of sparse fact table

Post  Mike Honey Sun Oct 09, 2011 7:43 pm

Hi gxclarke,

Your "obvious" design sounds more useful to me.

From my experience with SQL Server Analysis Services 2008 R2, I'd say your data volumes are relatively light. I'd expect you could comfortably process that data on a commodity server in a matter of minutes, and expect query response times below 5 seconds.

For example, on a virtual dev server at my current site I can process a roughly equivalent-sized fact (2.5m rows) in 30 seconds (including "30%" aggregations).

Good luck!
Mike
Mike Honey
Mike Honey

Posts : 185
Join date : 2010-08-04
Location : Melbourne, Australia

http://www.mangasolutions.com

Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum