Modeling Times within Times

View previous topic View next topic Go down

Modeling Times within Times

Post  Dylan on Wed Oct 22, 2014 5:08 pm

I have a business case that Iím having an issue deciding how it should be modeled. There's already a fact table but the design predates everybody remaining on the team, and there isn't a consensus if things should change or not when we're making other changes.

This process essentially projects prices for different entity types. It runs every 15 minutes, projects prices for 15 minute intervals for various entities, and publishes 10 intervals into the future.

Different users want to be able to look at process level details differently. Some want process id so they can line it up with other reports they have from different systems. Some want process end time, and some want process begin time which doesnít actually exist and is just the end time Ė 15 minutes. Some people want to look at the Nth prediction within the process. And some people want to look at predictions of a given price.


So a single execution of the process will look like this.

Process
ID: abc
Finish Time: 13:30
Calculated Start Time: 13:15

Predictions
ID: abc
Entity: xyz
Times: †(end of a 15 minute interval)
(1) 13:45
(2) 14:00
(3) 14:30
(4) 14:45
(5) 15:00
(6) 15:30
(7) 15:45
(8) 16:00
(9) 16:15
(10)16:30

The key on the fact is currently: entity, process_start_time, interval_begin_time. Its only measure it price.
We have a 15-minute interval time-dimension that has rows for every day ((4*24) = 96 rows per day). Both times in the fact are aligned to this dimension.

Our discussions started with wanting to add the Nth value (1-10) to the fact table. The pros of this being that it's simple and direct. But some people didn't like that there was no dimensional attribute for 1-10, or that we're essentially stuck always depending on everything being separated by 15 minutes. The follow up to this being that if intervals or processes became something else, we could copy the existing 15-minute dimension so historically they're correct, and going forward interval and processing lengths could differ.

This was followed by somebody proposing that we change the grain of the dimension so that it's natural key was expanded from interval_begin_time to also include process_start_time. The main selling point of this is that (a) we now have the 1-10 attribute, and (b) we can use partition pruning on either date, even if the rules around the process change (interval lengths change, number of predictions change, time between process and first predictions change, etc). The downside being that the table will have 10x more rows every day, and be nearly 2x as wide.

Nobody can seem to convince each other that one method wiser than the other. Does anybody have an recommendations on how they would model this?

Dylan

Posts : 5
Join date : 2010-11-30

View user profile

Back to top Go down

Re: Modeling Times within Times

Post  nick_white on Thu Oct 23, 2014 3:23 am

Hi - when you say "We have a 15-minute interval time-dimension that has rows for every day ((4*24) = 96 rows per day)" - does that mean that you have the date in your time dimension as well as the time? If so, I would start by splitting those two out into separate Dims.

Your fact table then then have an FK to Date Dim for Process date (and/or any other relevant dates you want to report on).
You would also have an FK for Process Time
Your Time Dim has a grain of second or minute (however accurate you want to be). It would contain whatever higher level attributes you need - 15 min interval, half hour, hour, etc. - however your business splits up its day.
If you do this then it doesn't matter if your business moves from 15 minute intervals to something else (or even uneven intervals) as you are just recording the time of the event.
Possible include a lag measure (if you think it might be useful) - difference between the process start time and the event time.
Add a degenerate dimension for the measure number (1-10).

So your fact table might contain these columns:
Process Start Date FK
Process End Date FK (if a process can cross midnight)
Process ID FK
Entity ID FK
Interval Time FK
Interval Number (DD)
Interval Measure
Lag Measure

Just a thought: if you are always going to have 10 measures per process then you could put all 10 in a single fact record - with an Interval DD and Interval Time FK for each measure. Makes it easier if you need to do calculations based on multiple measures in the same process - but obviously would require development effort if you ever decided to have more than 10 measures per process

nick_white

Posts : 366
Join date : 2014-01-06
Location : London

View user profile

Back to top Go down

Re: Modeling Times within Times

Post  Dylan on Thu Oct 23, 2014 4:41 am

Thanks for the response.

The 15 minute-interval dimension has a date-time key. It also has an out-trigger to a hour level time dimension (whose key is also a date-time) which in turn has an out-trigger to a day level dimension.

If process-id is unique to that start would it belong in a "process" dimension? There are a couple other facts that would be in a similar boat to this one, all of which would benefit from having access to that value.

And if there is a relationship between a process-start-time and the intervals in it, is is safe to relate those dimensions together? I would imagine it resulting in something like this...

Process Dim:
Start time (key)
End time
Start hour (fk to hour dim)
Process id
First Interval (1st fk to interval dim)
Last Interval (2nd fk to interval dim)
...Other process related attributes

Interval Dim:
Start Time (key)
End Time
Start hour (fk to hour dim)
First process time (1st ky to process dim)
Last process time (2nd key to process dim)
...other Interval related attributes

Fact
Process Start Time (fk)
Interval Start Time (fk)
Entity Id (fk)
Interval Number (DD)
Interval Measures

Dylan

Posts : 5
Join date : 2010-11-30

View user profile

Back to top Go down

Re: Modeling Times within Times

Post  Dylan on Thu Oct 23, 2014 4:43 am

Forgot to note, all intervals in a process will be contiguous, and all processes that an interval appears in will be contiguous. Its essentially a tolling window that advances every 15 min.

Dylan

Posts : 5
Join date : 2010-11-30

View user profile

Back to top Go down

Re: Modeling Times within Times

Post  nick_white on Thu Oct 23, 2014 8:21 am

You should always split Date and Time into separate Dims - so your Date Dim has 1 record per day and your Time Dim has one record per minute (or second or millisecond...) in a day i.e. if your Time is in minutes it would have 60*24 = 1440 records.

Joining Dims to Dims (snowflaking) is normally a bad idea - the exception is normally joing to the Date Dim so you can filter on Date attributes. So I think you need to take out the FKs between Process and Interval. Dims should be related to each other via Fact tables, not directly.
You don't need links to an hour Dim (or an Hour Dim at all) as you already have this information via your links to the Time Dim.

You should always have much fewer records in a Dim than a Fact. If your design ends up with a 1:1 relationship between a Dim and a Fact then the design is probably wrong.

If you have a group of attributes that relate to Process (or Interval) then put them in a Dim - but only if they don't mean you'll end up with a Dim record per Fact table

Hope this helps?

nick_white

Posts : 366
Join date : 2014-01-06
Location : London

View user profile

Back to top Go down

Re: Modeling Times within Times

Post  Dylan on Thu Oct 23, 2014 9:41 am

Each process will spit out predictions of this type for 500+ entities, so the relationship from fact to dim would be 500+:1. But they would both be growing at a constant rate.

So if it's advised to not snowflake dimensions (which I do know to avoid) even though it is just to align times, then the process ID would have be repeated in every fact that this type of interval exists? The source spits out different prediction types (ie different measures for different entity types) to different source system tables which end up in different facts.

Dylan

Posts : 5
Join date : 2010-11-30

View user profile

Back to top Go down

Re: Modeling Times within Times

Post  nick_white on Fri Oct 24, 2014 7:55 am

Does a process have multiple attributes that do not change each time it is run (Name, owner, type, etc.)? If it does, and you wish to report on them, then I would put them in a Process Dim and relate that Dim directly to your Fact.
If a process only has attributes that have values when the process is run, and are different for each run of the same process, (start time, process run id, etc) then I would not put these in a process Dim as, as you say, you would have a new Process Dim record every time you ran the process. Most of these run-specific attributes I would assign directly to the fact e.g. Process Start Date = FK to Date Dim, Process Start Time= FK to Time Dim, Your Process ID probably ends up as a Degenerate Dim directly on the fact table if you need it - other attributes may uniquely identify the Process (Process Start Date, if no two processes can start at the same time) so you may not need it.

nick_white

Posts : 366
Join date : 2014-01-06
Location : London

View user profile

Back to top Go down

Re: Modeling Times within Times

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum