Beginner Modeling Question - Determining the right grain

View previous topic View next topic Go down

Beginner Modeling Question - Determining the right grain

Post  brian.scott@talgov.com on Wed Nov 04, 2015 5:02 pm

Hello friends -

I got my hands on some smart meter data for water meters that report to me hourly and daily consumption amounts (other utilities are available, but for this exercise I'm focusing on water consumption).   I wrote an ETL and BI solution that can collect all of the water consumption, find customers who have had continuous hourly consumption over the past three days (i.e,. 72 consecutive hour readings with some consumption), take those identifiers, merge to customer and work management data to build a dispatching dashboard.  It works great at detecting leaks.  I have something similar to detect meters that no longer communicate with the server; i.e., dead meters.   Great.  But, I'm writing to a .tde from the ETL tool and publishing out to tableau directly; i.e., I have no underlying relational data store.  

So, I know that the next batch of questions from the corner office guys will be things like, average percent of system with leaks, how many leaks on Day X, increase or decrease in leaks over time, water saved since we started implementing, etc.  Sooooo, I need to start putting this stuff somewhere persistent.  

My basic grain that I came up with to answer those questions was: status of a meter on a particular date.  I had this tagged in my head as a periodic snapshot with one row for each meter on each day.  Facts included:

gallons consumed
cost of gallons consumed
approximate gallons consumed due to leak
approximate cost consumed due to leak
approximate size of leak (non additive) [? or add as a binned dim?]
historical daily average gallons consumed (semi additive?)

For dimensions, my initial guess was:

date
meter
customer
owner
weather
leak characteristics (i.e., gradual increase in consumption, large growth with plateau, spikes in consumption, none)
consumption characteristics (i.e., continuous consumption, zero consumption, normal consumption, no record [i.e., dead meter])

For leaks, I know that one of the questions coming my direction is average time from leak detected to leak drop off.  It seems like my ability to calculate this would be hindered with the above model in the event that a service address had more than one leak over time.  I considered adding a 'number of days since continuous consumption began' non-additive fact, but I kind of hated the idea.  Similarly, I thought about adding a 'date continuous consumption begins' property, but hated that even more.  

As far as detecting leaks versus detecting dead meters, should I have different grained facts, or is it OK to combine to a level of meter status per day per meter? I had it suggested to me elsewhere that what I am describing is a set analysis, and I should *only* include meters as they come on the leak list, and then don't add a row when they stop showing continuous consumption. This might work, but my BI solution that is in place gives the user the ability to view consumption historically over timeframes far before continuous consumption started (i.e., show the user consumption for the same thirty days in previous calendar year); which is why I was kind of trying to store all my data in one fact so that the BI tool isn't trying to join fact to fact. Am I doing this wrong?

I know what I'm trying to model isn't unique, but I'm struggling to put all of the pieces in place in my head.  Any insight/suggestions/critiques would be *greatly appreciated*.  

Thanks!

brian.scott@talgov.com

Posts : 4
Join date : 2015-03-05

View user profile

Back to top Go down

RE: Beginner Modeling Question - Determining the right grain

Post  zoom on Thu Nov 05, 2015 3:33 pm

I think your current state of data is that you do not have a transaction fact that captures hourly meter status. If you do have it, then your next daily snapshot of Fact table should be loaded from that transaction fact table. DW is built around users need. My advice is to wait for the requirements and then act.
Here is my comments on the Facts you listed:

A. gallons consumed --- make sense to use
B. cost of gallons consumed --- make sense to use

C. approximate gallons consumed due to leak
1) Define a leak? I would use a more generic name for it like “Calculated water used value”, because this value represents what is actual defined in your “leak characteristics” dim
2) How do you establish the base or starting point of water consumption to calc its value? Since it is daily snap fact then do you compare yesterday “gallons consumed” to current day “gallons consumed “ value to come up with it (user should tell you how to calc it).

D. approximate cost consumed due to leak
Cost to charge water usage can change any day. How do you handle that situation when one day water cost was $5 per gallon and next day cost increase or decrease per gallon ? Again user should tell you how to calc it.
E. approximate size of leak (non additive) [? or add as a binned dim?] --- you do not need this. Looks like same as “C” above.
F. historical daily average gallons consumed (semi additive?)
How many days are used to calc daily average? I would not store this value but provide date range input that a user can enter on the report and then calc it on the fly and show all measures based on that date range.

zoom

Posts : 97
Join date : 2010-08-23
Location : Chicago

View user profile

Back to top Go down

Re: Beginner Modeling Question - Determining the right grain

Post  nick_white on Fri Nov 06, 2015 7:23 am

To echo some of zoom's comments, I would approach it as follows:
1. Work out what the events/activities/etc. are that you you actually have data for - this is your lowest level of information - and then build fact tables based on these. Anything that cannot be derived from these low level facts is not something you can deliver (unless you can get the data from elsewhere and load them as low-level facts)
2. Work out how to build up the low level facts into the information you actually want to report on and then determine the best way to implement this - aggregates, (accumulating) snapshots or in your BI tool rather than your dimensional model

nick_white

Posts : 366
Join date : 2014-01-06
Location : London

View user profile

Back to top Go down

Re: Beginner Modeling Question - Determining the right grain

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum