Snowflaking and hierarchies

View previous topic View next topic Go down

Snowflaking and hierarchies

Post  sr123 on Thu May 03, 2012 5:30 pm

I'm having to make complicated (at least I think so!) design decisions and am hoping to hear others' opinions regarding this -

So I have 4 hierarchies that have been identified by the business guys, A-->B-->C ; C-->D ; C-->E and C-->F (--> being 1 to many here) ie. for instance if A is location, B is data center, C is room, and D, E and F lets say are things in or pertaining to the room. Therefore, a location could have many data centers which in turn can have many rooms; similarly a room can have more than 1 storage and it can have more than 1 server etc. Each of these are individual tables at the source and have been identified as dimensions for DW and I think the way to build these into the model is to snowflake them!? Am I right in thinking so? These are all tables with lot of columns and individual attributes, and also these are reported against pretty independently in the source system today, hence I don't think combining theses hierarchical tables into a single dimension is the way to handle this..the fact that dimension 'C' is a part of all these hierarchies seems confusing to implement..

Also, some of the reference (joining) columns in these tables are nullable and don't have any data..for instance, storage records without room column. How will these roll up in a hierarchy? Is it a good practice to replace these nulls with 'room not found' etc? Long post with lot of questions, but any help/advise is highly highly appreciated!!
Thanks in advance!

sr123

Posts : 10
Join date : 2012-03-05

View user profile

Back to top Go down

Re: Snowflaking and hierarchies

Post  ngalemmo on Thu May 03, 2012 6:29 pm

You have one hierarchy, not four. Its a location hierarchy, you have a location (c) that is within another location (b) and so on. The things in the location are not part of the hierarchy, they are things in a location. So if you are trying to keep track of the things, you have a fact table that indicates the location of the things. You have a bridge table for location to support the location hierarchy so you can roll up the things in different locations.

If you want one fact table to reference all things, you can handle the dimensions as a collection of subtype dimensions. To do this easily, it helps if the dimensions have mutually exclusive surrogate primary keys as it requires creation of a new dimension that represents each row in the other dimensions (total rows = sum of row count in each subtype dimension) and contains attributes common among the subtypes. If you have mutually exclusive keys, it allows this new dimension to have the same PK as the subtype dimensions, meaning you only need one FK in the fact. If you need to report equipment in general, use the new dimension and report common attributes. If you need to report a specific type of equipment, use the fact FK to reference that specific dimension. This query would only look at facts for that particular class of equipment since the sub-dimension would not contain rows for other types. There is no snowflaking.

Another alternative is to simply treat the various equipment types in a single 'equipment' dimension. This could be workable if there isn't a big difference between the attributes. You simply have a row with all possible attributes among the types. Attributes would be populated or null depending on the type of equipment. All because they appear separately in the operational system, it doesn't mean the cannot be integrated in the DW. It really depends on how different they are (unique attributes) and the use cases for the information.

If you do not have mutually exclusive keys, you need to carry two FKs on the fact. One for the new common dimension table and the second referencing the type specific dimension table. The common dimension should carry a type code to use as a filter for type specific queries. Queries do get messy, and are prone to error since the type specific surrogate keys may be duplicated in multiple dimension types (queries must filter or consider type when joining). But, still no snowflaking...
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Snowflaking and hierarchies

Post  sr123 on Fri May 04, 2012 5:45 pm

Thanks for your response.

I forgot to mention that ALL the dimensions are SCD Type 2. The source system is an implementaion of ServiceNow internally here, which is an object-oriented model, where 8 of the 14 dimensions (entitities that user community would like to continue reporting on independently, like they do today) are basically inherited from another table, with additional columns added to each of them. So yes, there definitely are common columns amongst these tables. Btw, I cannot use this common table (inerestingly, it has a 'type' column, which would be the type code you're referring to below) since the counts do not add up.
After reading up on bridge tables, it seems as if that is not recommended for Type 2 correct?
Also, yes I may have gotten my hierarchies wrong, they are basically different paths inside one hierarchy
Location --> Cage_Room --> Rack and then Rack --> Network_Gear and Rack --> Storage and finally Rack --> Servers
Thing is there is absolutely no relationship between network_gear, storage and servers so it doesn't make sense to have these 3 keys in the same fact and hence just for this reason, I have 3 fact tables, network_fact, storage_fact and host_fact which would in addition have the other common dimension keys.

Since maintaining a combined dimension table (with 2 FKs on the fact, since I don't have mutually exclusive keys, they are all sequence numbers) for type 2 may get complex, any other suggestions on how I could design this without snowflaking them?

Thanks again!

sr123

Posts : 10
Join date : 2012-03-05

View user profile

Back to top Go down

Re: Snowflaking and hierarchies

Post  ngalemmo on Sun May 06, 2012 3:49 am

Why are all dimensions type 2? When have the attributes of a steel rack ever changed? And why would anyone care?

If you have one fact or three, it still doesn't matter. Make location type 1 and create a hierarchy bridge.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Snowflaking and hierarchies

Post  sr123 on Sun May 06, 2012 5:16 pm

All dimensions are type 2 because that's the biggest driver in trying to get the data into DW..to be able to look at historical data, which is the biggest shortcoming in the source system for users today. Rack attribues may not change that often, however the rack install status, for example from 'received' to 'live' is something they definitely want to look at as opposed to the last updated status. Infact, they want to be able to compare all these on a month to month basis, especially counts, so no. of racks in a given location with x status last month etc,.

sr123

Posts : 10
Join date : 2012-03-05

View user profile

Back to top Go down

Re: Snowflaking and hierarchies

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top


 
Permissions in this forum:
You cannot reply to topics in this forum