Factless fact table versus Dimension (downstream implications)

dkbunkers Tue Apr 06, 2010 4:45 pm

Factless fact table versus Dimension (downstream implications)

Let's keep this simple for discussion purposes (I’m leaving off dimensions and surrogate key fields on fact table, it really isn’t this small)

We are trying to build a dimensional DW for a source system which
tracks developer tasks and the time being logged to those tasks.

Example (OLTP tables)
TASK
TaskId
ProjectId
TaskClassification
TaskSection
TaskDescription
PROJECT
ProjectId
ProjectType
ProjectName
ProjectOwnerEmpId
TASK_HOURS_WORKED
TaskId
DeveloperId
WorkedDt
WorkedHours

Dimension design option #1 – Treat the TASK as a factless task table itself.

FactTask
ProjectKey
TaskClassificationKey
TaskSectionKey

DimProject
ProjectKey
ProjectTypeDesc
ProjectOwnerName
ProjectOwnerDepartment
ProjectName
DimTaskClassification
TaskClassKey
TaskClassificationId
ClassificationType
DimSection
SectionKey
SectionGroup
SectionType
SectionName

ClassificationGroup
ClassificationName

Above seems correct / workable when treating the instance of a task as a “fact”

However, When I add in the FactTaskWorkedHours table I find myself reproducing/supporting the same dimensional keys on that fact table which already exist on the FactTask. Leads me to believe that FactTask, DimSection and DimClassification should not exist and should be part of a DimTask table to store those attributes and thus treat DimTask as a fact table when needed.

The down side to easier maintenance that goes along with a “DimTask” as both a dimension and a Fact table would be; I don’t have a “master list” of available Classifications and can only rely on the classifications that are actually assigned to a task, not to mention the expense of going thru a larger “dim table” just to get an small attribute like classification (100,000 tasks versus 20-30 classifications)

ngalemmo Tue Apr 06, 2010 5:27 pm

However, When I add in the FactTaskWorkedHours table I find myself reproducing/supporting the same dimensional keys on that fact table which already exist on the FactTask. Leads me to believe that FactTask, DimSection and DimClassification should not exist and should be part of a DimTask table to store those attributes and thus treat DimTask as a fact table when needed.

What do you need to accompish? There is nothing inherently wrong with using dimensions in more than one fact table.

From what I understand, the FactTask table exists to record all possible tasks associated with a project, while the other fact table is to record hours posted against specific tasks for the project. If you need both to support user requirements, so be it. You are tracking two different things relating to the same set of dimensions... that's just a normal day at the office. Two fact tables is the correct approach.

dkbunkers Tue Apr 06, 2010 5:40 pm

Thanks for input. Trying to weigh the pros/cons of dimensional theory and shared dimensions between the 2 facts (Task itself and logged work time on the task) against the the maintenance of having 10-15 dimensions hanging off FactTask that also need to hang off of FactTaskWorkedHours since I care about those dimensions as attributes of the worked hours (since they are attributes of the task itself). The relational oltp side of me things life is simpler if DimTask exist with all the "descriptors" within DimTask itself versus de-normalizing those same attributes as part of many dimensions any time I want to measure something that is related to the task as well). So you would agree that dim theory says that if/when a new Task Attritbute(s) is created in OLTP system that I'd be pulling an new dimension and adding keys back to dimension on both FactTask and FactTaskWorkedHours?

seadog2010 Fri Apr 09, 2010 10:17 am

I'm not quite sure this was covered above so I'll put this out for discussioni:

What about the scenario where one business requirement might be best satisfied by defining a factless table, yet another best be satisfied by a dimension?

Example: Policy. I could go either way with this one. It may be more efficient to decompose all elements of a policy (status, broker, etc) into various dimensions. It equally makes sense to make Policy a conformed dimension, allowing you to then attach it to multiple fact tables when there's interest in measuring premium (or case counts, or elig lives counts, etc).

* * *
1. Any general guidelines that would tell me when a concept (like Policy) be a factless table or a dim?
2. Can both a Policy_Factless_Fact and a Policy_Dim coexist in a system, depending on their usage? Pros and cons?

-Seadog

dkbunkers Fri Apr 09, 2010 10:55 am

Exactly the scenario I was trying to futher understand regarding best practices or guidelines , although worded much better.

seadog2010 Fri Apr 09, 2010 11:14 am

Thanks for the feedback.

Most dimensions to me are clear-cut - the static, 'reference' ones - status, priority, jurisdiction, occupation, and the 'master data' ones - customer, product, etc.

The ones I struggle with (those that have been implemented prior to my joining the company) are the ones more 'transactional' in nature, implemented as dimensions, and are quite volatile and thereform FCD's. It might just be the way they were modeled that is causing them to be FCD's (i.e., they contain multiple cols (paid-to-date, policy status date) that to me are transactional and therefore very volatile. The dimensional models were orig built by developers (not architects) familiar with relational, not dimensional, modeling, so I'm having to undo years of muddy design and understanding. When all dates are put into dimensions, you have little choice but to link dimension to dimension (the dreaded snowflake) in order to take advantage of the robust attributes of the Date dimension.

To me, a policy is a transactional concept. Pulling the various components together - customer, product, benefits, broker, plan, etc - seems to be the purpose of a factless table. So many of these components rely on dates (prod eff date, broker eff date, benefit eff date, etc) that are constantly in flux. On the other hand, it could also be considered a living entity, like a Customer, that changes over time.

Yes, another day at the office. I just want to get a better sense of the 'Great Taste - Less Filling' debate of when a concept (like Policy) is a transaction and when is it a dimension..... and whether the two can coexist when they may represent the same thing.

-Seadog

seadog2010 Mon Apr 12, 2010 10:27 am

ngalemmo;

any comments on the above?

-Seadog

ngalemmo Mon Apr 12, 2010 12:05 pm

The basic function of a fact table is to record business activity or business states. The relationship between policy and broker, for example, is a business state. If policy is a conforming dimension, then if the intent is to track things, such as, which policies a broker carries, then a factless fact table makes sense.

But, if 'policy' actually represents a unique combination of products, deductables, riders and other conditions specific to a sale, then from a sales perspective, policy (like order number) is a degenerate dimension. The sales fact would contain a combination of dimensions and facts relating to what makes up the policy. In such a situation, tying the definition of a policy to a sale (or quote) make sense because a policy (in this context) does not exist unless a sale is made.

seadog2010 Mon Apr 12, 2010 1:51 pm

The statement that a fact table functions to “record business activity or business states” is key.

Most certainly a Policy represents a unique combination of products, deductibles, riders, etc. It also contains strong transactional aspects as well (changing status, billing dates, etc). Your note seems to suggest that Policy could be modeled either way, as a factless table or as a conformed dimension. Could it be both?

I'm leanding towards Policy as a data mart, with fact table and surrounding component dimensions, rather than as an individual conformed dimension. If used as a DD in other fact tables, Policy ID could be used to drill-across to this Policy Data Mart for all info about that Policy. This takes care of the volatile dates within the dimension. Would you agree with this approach?

But, are there ever times where it makes sense to model Policy as both a data mart (factless or with facts, like premium) and as a conformed dimension, for purposes depending on the query? Or is this overkill in your mind?

Thanks.

-Seadog

ngalemmo Mon Apr 12, 2010 2:32 pm

If a policy is the insurance equivalent of an order, trying to make a dimension of it, to represent all the characteristics of a policy, would drive you crazy... literally. Sure, there is a policy number, but that is about it. Treat it as a degenerate dimension for the fact tables.

What matters is the fact tables and what they represent. It would seem to me there would be multiple facts at different grains representing different events and states to cover the full lifecycle. I wouldn't try for some uber-table that condenses the entire lifecycle until I have at least built all the components.

seadog2010 Mon Apr 12, 2010 3:45 pm

Thanks, I think I'm straight on this now.

-Seadog

Sponsored content

Factless fact table versus Dimension (downstream implications)

Factless fact table versus Dimension (downstream implications)

Re: Factless fact table versus Dimension (downstream implications)

Re: Factless fact table versus Dimension (downstream implications)

Re: Factless fact table versus Dimension

Re: Factless fact table versus Dimension (downstream implications)

Re: Factless fact table versus Dimension (downstream implications)

Re: Factless fact table versus Dimension (downstream implications)

Re: Factless fact table versus Dimension (downstream implications)

Re: Factless fact table versus Dimension (downstream implications)

Re: Factless fact table versus Dimension (downstream implications)

Re: Factless fact table versus Dimension (downstream implications)

Re: Factless fact table versus Dimension (downstream implications)