Star vs Snowflake with many different attributes
4 posters
Page 1 of 1
Star vs Snowflake with many different attributes
My fact table has many different code attributes ( > 20) that need translation descriptions. Would it be best to keep these as individual keys(dimensions) on the Fact table to keep the Star schema or break them off into a snowflake dimension? Either way I would end up with 20 dimensions or should they all be contained in one dimension? I am trying to keep performance in mind as I build this model.
JoyMcCall- Posts : 3
Join date : 2014-04-24
Re: Star vs Snowflake with many different attributes
Are these different attributes or a single attribute that is multi-valued or a combination of the two?
Is there a fixed number of attributes that a fact could have (even if some of them may be null for a particular fact record)?
Do you already have a design for your fact/dimensions - your post suggests that you do but in that case I'm confused why adding text descriptions to codes would result in a change to your design?
If you could provide a bit more detail around exactly what you are modelling it would help when trying to come up with some solutions. Thanks
Is there a fixed number of attributes that a fact could have (even if some of them may be null for a particular fact record)?
Do you already have a design for your fact/dimensions - your post suggests that you do but in that case I'm confused why adding text descriptions to codes would result in a change to your design?
If you could provide a bit more detail around exactly what you are modelling it would help when trying to come up with some solutions. Thanks
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Re: Star vs Snowflake with many different attributes
I am creating the data model now for our OLTP system. it contains a Fact for an Insurance Policy that has attributes such as Billing Method, Company Code, Permission To Order Credit, Reinstatement Type Code, Print Option, Transaction Reason Code. These attributes are codes - many different values, for each transaction in the Fact. They have translations/descriptions associated with them. thanks for your help!
JoyMcCall- Posts : 3
Join date : 2014-04-24
Re: Star vs Snowflake with many different attributes
let me clarify. I am creating a data warehouse from the OLTP system and creating the data model for the new dimensional data warehouse. The model is not for the OLTP system.
JoyMcCall- Posts : 3
Join date : 2014-04-24
Re: Star vs Snowflake with many different attributes
When you have a bunch of codes that have no particular place to go, common practice is to combine them into one or more junk dimensions. The natural key to the dimension is the codes themselves, so you would have one row for each unique combination of codes.
The number of dimension tables is driven by the cardinality and correlation of the codes. One table for each code is too many, and one table for all the codes is probably too few. If the cardinality is very low, combining uncorrelated columns is not a big deal as you would not encounter too many combinations. If cardinality is high, combining it with highly correlated columns will not result in excessive numbers of combinations. You only create rows for combinations that actually exist.
As far as the descriptions of the codes go, you will need a list of values table(s) to support ETL so you can populate the description when adding a new row in the junk dimension, as well as maintaining the dimension should there be a change in a description.
The number of dimension tables is driven by the cardinality and correlation of the codes. One table for each code is too many, and one table for all the codes is probably too few. If the cardinality is very low, combining uncorrelated columns is not a big deal as you would not encounter too many combinations. If cardinality is high, combining it with highly correlated columns will not result in excessive numbers of combinations. You only create rows for combinations that actually exist.
As far as the descriptions of the codes go, you will need a list of values table(s) to support ETL so you can populate the description when adding a new row in the junk dimension, as well as maintaining the dimension should there be a change in a description.
Re: Star vs Snowflake with many different attributes
OK - I have a better understanding now. My suggestions would be:
1) Codes/descriptions - if both are meaningful to your end users then include them both. If only the descriptions are meaningful to end users then whether you include the codes is a decision you would take based on things like technical support being easier for you if they are there - but you would hide them from the end users by using DB Views and/or configuring your reporting toolset
2) There is nothing technically wrong with having them all (codes and descriptions) in a single Dimension - a dimension can have 100s of columns. Whether it makes sense to group them into a single dimension or split them really depends on how correlated/related they are.
One of the examples you give is "company code" - does this mean that there is a concept of "company" in your model with attributes besides this code? If so then it may be a good design decision to split these out into a separate Company Dimension
Billing Method, Reinstatement Type Code, Transaction Reason Code all sounds like attributes of the transaction event and so might go in a Transaction Dimension
Attributes of the Policy itself (type of policy, etc.) might go in a Policy Dimension
Hope this helps?
1) Codes/descriptions - if both are meaningful to your end users then include them both. If only the descriptions are meaningful to end users then whether you include the codes is a decision you would take based on things like technical support being easier for you if they are there - but you would hide them from the end users by using DB Views and/or configuring your reporting toolset
2) There is nothing technically wrong with having them all (codes and descriptions) in a single Dimension - a dimension can have 100s of columns. Whether it makes sense to group them into a single dimension or split them really depends on how correlated/related they are.
One of the examples you give is "company code" - does this mean that there is a concept of "company" in your model with attributes besides this code? If so then it may be a good design decision to split these out into a separate Company Dimension
Billing Method, Reinstatement Type Code, Transaction Reason Code all sounds like attributes of the transaction event and so might go in a Transaction Dimension
Attributes of the Policy itself (type of policy, etc.) might go in a Policy Dimension
Hope this helps?
nick_white- Posts : 364
Join date : 2014-01-06
Location : London
Re: Star vs Snowflake with many different attributes
If the business uses the code to drill down to the fact table, then add as a standalone dimension. If they just use it for reporting (not an entry point), then a junk dimension will work fine.
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA
Similar topics
» Snowflake or Star Schema?
» Converting Snowflake to Star
» Employee compensation - Star or Snowflake?
» Customer addresses in a high volume retail environment
» Dimension Attributes and Fact attributes storing same data in multiple data marts??
» Converting Snowflake to Star
» Employee compensation - Star or Snowflake?
» Customer addresses in a high volume retail environment
» Dimension Attributes and Fact attributes storing same data in multiple data marts??
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum
|
|