Two-Part Fact Design Question

View previous topic View next topic Go down

Two-Part Fact Design Question

Post  ThomVF on Thu Jun 26, 2014 9:32 am

OK.  The question isn't two-part...the Fact is.

I have a source "Event-Log" table that clearly qualifies as a Transaction Fact, containing some junk, regular, and degenerate Dimension attributes:  Event-ID, Event-Date, Event-Type, Action-Code, Cycle#, Department-ID, Employer-ID, etc.

However, there is a corresponding "Message" that is optionally associated with Event.  A "Message" is essentially a text string that was optionally sent to the Employer in question, as part of the "Event".  Each Message is stored in a different source table and they are grouped into "Message Definition Type" categories.  Each "Message Type" is essentially a text-block template, containing one or more embedded  positional parameters ({0}, {1}, {2}...), like you would see in a word processing mail-merge template.  Each Message also has 0..N "Message Value" children entries, which are the actual values that were substituted into the template to form the completed message.

Since a Message cannot correspond to more than one Event, and vice-versa, they at the same grain.  I need to keep track of the fully-formed Message (not the template), as well as the Event.  I can do this by replacing the 0..N Message values into the template text.

Problem is, there are 70M Events and about 95% of them have a corresponding Message.  My instinct was to create a Message Dimension for the Event Log Fact, containing the substituted Message Text, but that's a 1:1 Dim:Fact with ~70M records in each.  Not exactly a good Dim:Fact situation.  

Since the Message is essentially just a [Message#, Message-Type, and Message-Text], it doesn't really lend itself to being a fact.

To be honest, Message feels more like Operational Data, not Dimensional.  But since I'll have an Event Log Fact, I need to find a way to design in the Message data with it.

I'm hoping the business decides that they don't really need to see the Message text, and only needs to track the Message-Type of each Event Log Fact.  I'm working on this but may not convince them of it.

What's the general approach to this situation, when you have a large volume 1:1 Dim:Fact?
What is the approach when you have to account for Dimensional data that has corresponding Operational Data?

ThomVF

Posts : 3
Join date : 2014-06-26

View user profile

Back to top Go down

Two Part Fact Design

Post  PLYounger on Thu Jun 26, 2014 12:23 pm

Hi Thom.

You could store the message with the fact, but I would tend to either 1) create a factless fact or 2) create it as a dimension. If you created it as a dimension, the users could use it to filter by message type, etc.

I'm interested to know what you decide :-)

PLYounger

Posts : 8
Join date : 2012-03-16
Location : Washington DC

View user profile

Back to top Go down

Re: Two-Part Fact Design Question

Post  ngalemmo on Thu Jun 26, 2014 3:50 pm

If the message is 1:1 with event, you could just store it in the event dimension. The message value data could be a bridge based on event or you could incorporate the values in the stored message, however if these values are derived based on the specific message target and are actually attributes of the target, I wouldn't store them at all… at best just store the metadata related to the value.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Two-Part Fact Design Question

Post  nick_white on Fri Jun 27, 2014 8:08 am

I assume that you are not using the message data for analysis purposes, you just want to be able to display it in a report along with the event it relates to? In which case just put it in a table with an FK to the Fact table PK and just join to it in your reporting tool when you need to display it - effectively you aren't treating it as part of a dimensional model, it is just transactional data.
Kimball discusses this in his book (search for "textual comments") and his recommendation is either use a Comments Dim (if cardinality of comments is less than the grain of the fact) or treat it as an attribute in the transaction dimension but says this second option is likely to give you a performance hit - hence my preference for taking out of the dimensional model entirely. In your reporting tool you'd run a subquery to select all the events you want to report on (using your dimensional model) and then join this dataset to your standalone message table

nick_white

Posts : 366
Join date : 2014-01-06
Location : London

View user profile

Back to top Go down

Re: Two-Part Fact Design Question

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum