What is the right granularity of time?

View previous topic View next topic Go down

What is the right granularity of time?

Post  Bergtroll on Tue Mar 06, 2012 7:14 am

Hi again,

We have a subscription system over here where our customers are able to retrieve informational data from. Depending on their contract some customers have to order subscriptions in advance. Some of them may correct it to the actual number used retroactive. Others may just use the informational data and report their subscription retrospective afterwards. To make it even more complicated, also based on contract, subscription fees may be calculated on a daily, monthly or quarterly base.

Since the heterogeneity of the many customers data processing systems and the easy duplication and redistribution of informational data there never was an effort for a fine-grained fool-proof data usage tracking system. The subscription fees are solely based on what our customers reports. To avoid misapplication there are on-site audits checking how the processing systems handle our data. We are then calculating usage numbers from our audit and compare it to what has been reported.

My question now is: What is the right granularity for the time dimension and how should I model it? The lowest grain is to store each subscription on a day-by-day base with a date foreign key. But since most of the subscriptions are for several month or even years, it would really explode the fact table with dense and redundant data. E.g. a 36 month period of monthly subscriptions from customer C for product P from 01.01.2009 till 31.12.2011 would result into 1095 day-by-day entries. Having more than 25 million customer-product combinations already a three years period could easily lead to more than 100 billion records. On the other hand I could model it with two date foreign keys e.g. subscription_start and subscription_end resulting in maybe 100 million records.

My worry is, that using subscription periods instead of day by day entries will make comparison of the reported data with the audit results much harder.

Hope you can share your experience with me on this. Kind regards,

Bergtroll

Bergtroll

Posts : 15
Join date : 2011-02-02

View user profile

Back to top Go down

Re: What is the right granularity of time?

Post  Bergtroll on Tue Mar 06, 2012 7:18 am

P.S. Maybe its a good idea to store the periods and join a day-by-day table only occasionally on reporting purpose?

Bergtroll

Posts : 15
Join date : 2011-02-02

View user profile

Back to top Go down

Re: What is the right granularity of time?

Post  Bergtroll on Tue Mar 06, 2012 7:44 am

I forgot a very important constraint. The reporting for a certain customer, product and date has to be unambiguous. Therefore no overlap between two subscription periods is allowed. This should be validated in the cleaning step but is by far not as easy as checking a set of row attributes to me unique. I worry that this will be a big performance impact in the data preparation process.

Bergtroll

Posts : 15
Join date : 2011-02-02

View user profile

Back to top Go down

Re: What is the right granularity of time?

Post  BoxesAndLines on Tue Mar 06, 2012 10:35 am

Use an accumulating snapshot fact table. There is one row per subscription with a start and end date. Adding a row for every subscription for every customer is not a good idea.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: What is the right granularity of time?

Post  Bergtroll on Tue Mar 06, 2012 12:38 pm

Thank you, I have the feeling that this could totally fit our needs. Regarding your tip I read the following very helpful articles.

http://www.kimballgroup.com/html/designtipsPDF/DesignTips2002/KimballDT37ModelingPipeline.pdf
and
http://www.kimballgroup.com/html/designtipsPDF/DesignTips2003/KimballDT42Combining.pdf

Now I only have to find out how to transform our month based reporting into ongoing subscriptions :-). If I find out this should be a nice topic for a kettle tutorial!

Best regards,

Bergtroll

Bergtroll

Posts : 15
Join date : 2011-02-02

View user profile

Back to top Go down

Re: What is the right granularity of time?

Post  BoxesAndLines on Tue Mar 06, 2012 2:50 pm

Month based reporting is easy. Create a metric (0 or 1) that indicates whether the subscription is currently active and sum on that column. To see how many were active last month, you would need to use your date dimensions.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: What is the right granularity of time?

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum