How to report on sparse areas of sparse fact table

View previous topic View next topic Go down

How to report on sparse areas of sparse fact table

Post  gxclarke on Sat Oct 08, 2011 6:53 pm

A source system tracks student attendance for a school district by reporting absence events. Attendance on any particular day can be determined by examining three datasets: school calendar, student enrollment, and absence.

On any given school day, the number of enrolled students in attendance is usually much larger than the number that are absent, so this approach reduces the number of records stored to track attendance significantly.

I am trying to determine the proper way to represent daily attendance in a dimensional model. The most obvious way is to create a factless table with a grain per school day per student, and an attendance dimension that has values for both attendance and absence reasons. This is quite straightforward to work with OLAP, but the downside is the size of the fact table.

For example, for 30,000 students and 188 school days means that there are approximately 0.5 million records per year (if this doesn't seem large enough to be an issue, then consider an example in which attendance must be reported on per period rather than per day). Contrast this to a fact table that records only absences and the number is considerably smaller. However, if I do this, then I am not sure how to build cubes that aggregate daily attendance facts.

The specific OLAP technology being used is SQL Server Analysis Services 2008 R2. Any thoughts?


Last edited by gxclarke on Sun Oct 09, 2011 7:03 pm; edited 1 time in total (Reason for editing : Corrected typo)

gxclarke

Posts : 2
Join date : 2011-10-08

View user profile

Back to top Go down

Re: How to report on sparse areas of sparse fact table

Post  Mike Honey on Sun Oct 09, 2011 7:43 pm

Hi gxclarke,

Your "obvious" design sounds more useful to me.

From my experience with SQL Server Analysis Services 2008 R2, I'd say your data volumes are relatively light. I'd expect you could comfortably process that data on a commodity server in a matter of minutes, and expect query response times below 5 seconds.

For example, on a virtual dev server at my current site I can process a roughly equivalent-sized fact (2.5m rows) in 30 seconds (including "30%" aggregations).

Good luck!
Mike
avatar
Mike Honey

Posts : 185
Join date : 2010-08-04
Location : Melbourne, Australia

View user profile http://www.mangasolutions.com

Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum