Fact Table Grain at a Sub-Atomic Level

View previous topic View next topic Go down

Fact Table Grain at a Sub-Atomic Level

Post  seadog2010 on Thu Mar 04, 2010 6:24 pm

I'm thumbing through the Design Tips and would love to find some thought given to the statement made in The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (pg 34):

"Preferably you should develop dimensional models for the most atomic information captured by a business process. Atomic data is the most detailed information collected; such data cannot be subdivided further."

I'm now supporting fact tables that all have a grain that is not at the atomic, but sub-atomic level. For example, if a policy's premium is the lowest grain of data we collect, and that premium is split between 3 brokers, the premium metric is split by 3, resulting in 3 fact records, not 1. Bridge tables could handle these multivalued dimension values but are not used ("too complicated"). Taken to an extreme, which I've seen frequently, when that same policy premium is split between 3 brokers, 3 sales reps, and 3 sales offices, the resulting grain (premium, per broker per sales rep per sales office) is now represented in 27 fact records, not 1!

Aside from the obvious downside of size (generating many large fact records when one would suffice) and confusion (premium, or quoted lives, are already at the atomic level), I'd like to see further cons (and maybe some pros) to violating the above statement (as our shop has done, all too willingly). Subdividing the grain (quoting 0.7 lives) beyond it's natural level, seems, well, unnatural. In what cases would this be acceptable practice?

seadog2010

Posts : 23
Join date : 2010-03-04

View user profile

Back to top Go down

Re: Fact Table Grain at a Sub-Atomic Level

Post  ngalemmo on Thu Mar 04, 2010 6:44 pm

These are not 'sub-atomic' facts, otherwise you would need to account for the loss of mass and subsequent release of energy when the atom is split (E=mc^2 and all that).

What you describe are simply different manifestations of the facts for different purposes, and in reality, the premium distribution you describe would be considered an aggregate... since the premium revenue or commission distribution is a derived value based on the aggregation of policy information and business rules.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Fact Table Grain at a Sub-Atomic Level

Post  seadog2010 on Fri Mar 05, 2010 10:27 am

If I read past the semantics, it appears you're allowing metrics below the lowest collected grain (what I would call de-aggregation, or fractionalzing), as well as deriving the metrics about the lowest collected grain (summarizing, or aggregation) - the latter being a normal practice anyway.

In essence, all metrics are derived and therefore can be considered 'aggregated' from business rules and data; I'm reluctant to use this terminology loosely.

Again, allowing such de-aggregation seems counter to the statement in 'The Data Warehouse Toolkit', unless I'm misreading it.

-thx for the quick response.

seadog2010

Posts : 23
Join date : 2010-03-04

View user profile

Back to top Go down

Re: Fact Table Grain at a Sub-Atomic Level

Post  ngalemmo on Fri Mar 05, 2010 12:32 pm

Yeah... its all semantics.

Aggregate is a more general term, while summary is more specific. An aggregate is a combination of things, for example, concrete, which is a mixture of limestone, sand and rock. It can also be used to refer to a summary, which is the definition Kimball uses when he says an aggregate fact is a reduction in grain. Both are valid.

For example, in a order lifecycle fact table, where you combine orders, shipments and invoice data (each being atomic facts on their own) you can create an aggregate that would be at an increased grain (greater dimensionality) than the original atomic facts on their own.

So, when you talk about an atomic fact table, a fact table representing a particular business event or state, the grain of that fact is at its lowest level of detail as it relates to that specific business event or state.

In your case, collecting a policy premium is a business event, reflected in your fact table. The distribution of revenue relating to that premium is another business event reflected in another fact table that happens to be at a higher grain.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Fact Table Grain at a Sub-Atomic Level

Post  seadog2010 on Fri Mar 05, 2010 12:53 pm

I agree that the measurement of premium belongs in one fact table (a specific business process) and the allocation of that premium (for commission purposes) belongs in another fact table (a specific business process). Unfortunately that's not the case here, but that's another story (designing dimensional models using an all-in-one, 'kitchen sink' approach).

To be clear, premium is not assigned 100% to one sales rep or another, or to one broker or another. If this were the case you could readily summarize total premium amounts for each rep or broker - a higher grain. The reality is that premium ($100) is split between sales reps (for bonus purposes), such that one rep might see $40 and another $60 apportioned to them. Of that same $100, (for compensation purposes) one broker might see $75, another $20, and yet another $5 apportioned to them.

When each fact record represents not simply premium, but premium by sales rep by broker, it suggests the grain has been forced lower (more detailed) than what is collected from the business.

seadog2010

Posts : 23
Join date : 2010-03-04

View user profile

Back to top Go down

Re: Fact Table Grain at a Sub-Atomic Level

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum