Modeling of dimensions based on location

View previous topic View next topic Go down

Modeling of dimensions based on location

Post  kjfischer on Wed Jun 08, 2011 3:56 pm

The fact table is based on invoice_detail data which includes the measures of quantity_shipped and invoice_amount.

The dimensional attributes associated with the invoice are part, customer, channel, business_coordinator, shipto_location.

The business_coordinator is an employee who is assigned to a zip, state, country (location). This is their territory.

The customer has address with zip, state, country (location)

The shipto_location has address with zip, state, country (location).

There also is a concept of geographic regions that rollup from country to country_groups to regions.

I can see the desire to rollup sales by customer location all the way up to geographic region. Also, the same for sales by shipto location up to a geographic region. Would this make sense to create a geo_location dimension that rolls up from zip code, state, country, country_group, region? Would you include that geo_location_id in the customer_dim table and the invoice_fact table? Or would it make sense to create a location dimension which just included zip, state, and country? And that could be referenced in another dimension which would rollup to country_group and then region?

The business_coordinator dimension seems different because the business_coordinator employee is only assigned a zip, state, country territory and it changes.

So, it seems like I would want to know the business_coordinator assigned to the invoice sale, but this doesn't rollup to a geogrphic region.

Any input is appreciated.

kjfischer

Posts : 28
Join date : 2011-05-04

View user profile

Back to top Go down

Re: Modeling of dimensions based on location

Post  Bob Probst on Wed Jun 08, 2011 4:14 pm

It sounds like you're wanting to snowflake here, I'd avoid it.

I would definitely join only through the fact table and not include the location ID in the customer or shipto dims.

So one location Dim will have 2 FKs in the fact: customer location id and shipto location id.

I see no reason not to do the same with the coordinator if you suspect that people will want to drill by Coordinator geography.

Bob Probst

Posts : 18
Join date : 2010-05-26

View user profile http://datajuggler.blogspot.com/

Back to top Go down

Re: Modeling of dimensions based on location

Post  LAndrews on Wed Jun 08, 2011 6:48 pm

This is the power of the dimensional model - you don't need to worrk about duplication of attributes ... the key is to keep them conformed.

I'd conform the location attributes (zip, state, country, country_group, region) across all three dimensions (business_coordinator, customer, location.


LAndrews

Posts : 132
Join date : 2010-05-13
Location : British Columbia, Canada

View user profile

Back to top Go down

Re: Modeling of dimensions based on location

Post  hang on Thu Jun 09, 2011 3:33 am

This is one of few cases where you should snowflake your dimension. Kimball called it Location Outrigger. The rational is you donít want to maintain location related attributes in many primary dimension tables if the changes within location structure need to be applied consistently.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Modeling of dimensions based on location

Post  Bob Probst on Thu Jun 09, 2011 11:02 am

hang wrote:This is one of few cases where you should snowflake your dimension. Kimball called it Location Outrigger. The rational is you donít want to maintain location related attributes in many primary dimension tables if the changes within location structure need to be applied consistently.
But why would you choose to snowflake that instead of just putting the Location Key in the fact table?

Bob Probst

Posts : 18
Join date : 2010-05-26

View user profile http://datajuggler.blogspot.com/

Back to top Go down

Re: Modeling of dimensions based on location

Post  hang on Fri Jun 10, 2011 12:52 am

Please refer to Kimballís dimensional modeling book, chapter 10, page 226. However Kimball did not say location should not be a FK in the fact table. I guess in case of monster dimension, most of SCD 2 attributes, if not all, should be linked through fact table, monster dimension treatment should be considered as exception, as the performance becomes more critical, rather than general guideline .

My understanding on dimensional modeling is, there are two primary goals, ease of use and performance. To achieve these two goals, denormalised dimension and normalised fact tables set the main theme but with proper balance. Generally, fact tables are deep and dimension tables are shallow except for monster dimension. So query for relationship through fact tables is much more costly and less obvious than through a single self contained dimension table or dimension with outrigger. If the location attributes are not shared by many other dimensions, then it should be denormalised into the main dimension table. Since the location attributes are normally shared by many dimensions, the maintenance becomes a major consideration as part of ease of use, and therefore a location outrigger is a properly balanced arrangement, eliminating a highly repeated FK in the deep fact table. Unnecessary FKs could be the main cause of Centipede fact table.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Modeling of dimensions based on location

Post  hang on Fri Jun 10, 2011 5:38 pm

Bob Probst wrote:But why would you choose to snowflake that instead of just putting the Location Key in the fact table?
Bob, there is a crucial point that I forgot to highlight in the pervious post. The dimension relationship reflected by fact table may not give you a full picture as not all the dimension records will appear in the fact even though they truly exist in the relevant dimensions.

Now you may ask why the treatment to monster dimensions is valid. I think it is a trade-off between performance and dimension self-containess/completeness. Fact driven dimension relationship will still work when dimensions are only used to provide contexts for facts, and that would suffice most BI requirements.

However I can still think of a couple of concepts introduced by Kimball to address the issues left by the trade-off, transaction dimension and periodic factless snapshot table dedicated to keeping dimension relationships. But I won't elaborate the technique here as you may refer to Kimball's book to get the full gist of the concepts.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Modeling of dimensions based on location

Post  Bob Probst on Mon Jun 13, 2011 9:27 am

Thanks for the elaboration Hang. I was just taking the most straight-forward approach as it occurred to me.

There were no performance concerns and nothing in his post suggested any over riding need to snowflake. I've found that associating dimensions through the fact table to simplify the ETL process and will simplify the process if they want to aggregate on any of those location dimensions.

You're referring to some very specific solutions (and have prompted me to dig out my Kimball books for the first time in a couple year!) and you might be right on the mark -- but I tend to favor simplicity over complexity unless performance dictates the latter.

Bob Probst

Posts : 18
Join date : 2010-05-26

View user profile http://datajuggler.blogspot.com/

Back to top Go down

Re: Modeling of dimensions based on location

Post  kjfischer on Mon Jun 13, 2011 4:43 pm

Thank you for your comments. It did become apparent to me that I could/should have a single location dimension which rolls up from zipcode to a geographic region. And link to the location dimension from the invoice_fact with a customer_location and a shipto_location.

And, after giving it more thought, I think the Business_coordinator is probably a separate dimension. A business_coordinator is assigned a zip, state, and country. But, it doesn't make sense to rollup to a world region. Additionally, the assignments will likely change routinely as with many sales / territory assignments lending to a type 2 dimension whereas the location dimension is more static.

kjfischer

Posts : 28
Join date : 2011-05-04

View user profile

Back to top Go down

Re: Modeling of dimensions based on location

Post  kjfischer on Thu Jun 23, 2011 12:37 pm

I have a followup question to my original regarding how to model the business_coordinator associated with the sales fact. While there is always a shipto_location associated with the sale, there is not always a business_coordinator, i.e not every geogrpahic location has a business_coordinator assignment. I can see the need to query sales by shipto_location and I will have this info from zipcode upto world geographic region based on the data provided on the order. I can also see the need to query by business_coordinator assignment who may be assigned to a zipcode, state, country. But, as mentioned, there are many geographic regions who will not have an assignment. Will this be ackward to have so many sales facts linked to a "unassigned" row in a dimension?

kjfischer

Posts : 28
Join date : 2011-05-04

View user profile

Back to top Go down

Re: Modeling of dimensions based on location

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum