Derived Dimension
3 posters
Page 1 of 1
Derived Dimension
Let's say I have these entities from my source system:
- sales - grain is employee, date, and product
- budget - grain is department, year, quarter
- department
- employee - employee belongs to a department
- product
- product attributes - one product has multiple attributes records for the changes throughout its life, sort of like SCD type 2 tracked in the source system with start and end dates
I'm designing a data mart to track sales, budgeting, and the product attributes changes business processes.
The product attributes are separated because it changes more frequently than the product dimension so it makes sense to have it in a separate dimension. It also makes sense not to flatten department into employee because it is used in the budgeting business process for my conformed dimension.
Now, for my sales fact, do I want to include department key and product attributes? The product attributes is determined by the date the sale occur and the product sold. In my real-life scenario I'll have quite a few of these types of derived date sensitive dimensions.
For the department key, if there is too much columns already in my fact table I know I have the option to snowflake, but how much is too much? Its growing at a linear rate currently at 4M records a year. Is it wise to just have the minimal grain defining dimensions, and use views or OLAP cube to bring in the additional dimensions? Sounds too much relational OLTP design to me.
- sales - grain is employee, date, and product
- budget - grain is department, year, quarter
- department
- employee - employee belongs to a department
- product
- product attributes - one product has multiple attributes records for the changes throughout its life, sort of like SCD type 2 tracked in the source system with start and end dates
I'm designing a data mart to track sales, budgeting, and the product attributes changes business processes.
The product attributes are separated because it changes more frequently than the product dimension so it makes sense to have it in a separate dimension. It also makes sense not to flatten department into employee because it is used in the budgeting business process for my conformed dimension.
Now, for my sales fact, do I want to include department key and product attributes? The product attributes is determined by the date the sale occur and the product sold. In my real-life scenario I'll have quite a few of these types of derived date sensitive dimensions.
For the department key, if there is too much columns already in my fact table I know I have the option to snowflake, but how much is too much? Its growing at a linear rate currently at 4M records a year. Is it wise to just have the minimal grain defining dimensions, and use views or OLAP cube to bring in the additional dimensions? Sounds too much relational OLTP design to me.
rnb- Posts : 6
Join date : 2010-09-21
Re: Derived Dimension
You don't need to include the department key in your sales fact table, as you can join to your Budget table via Department in your Employee dimension.
If your product table has fast changing attributes, then a Product Attributes Mini-dimension is perfectly acceptable.
If your product table has fast changing attributes, then a Product Attributes Mini-dimension is perfectly acceptable.
Re: Derived Dimension
Hi John, thanks for the reply. I think I almost got it. My product attributes dimension will actually not have any reference to product at all. In fact it will only have the distinct attribute values used which means it will be smaller number of records compared to the product attributes table in the source system because of some record reuse. I would not populate all the different combinations a-la mini dimension due to the high cardinality of some of the attribute values. Some are free-form comments. Are there any design guidelines around this issue?
rnb- Posts : 6
Join date : 2010-09-21
Re: Derived Dimension
If some of the attributes in the mini dimension have high cardinality, they might defeat the purpose of mini dimension by making it not so mini. If the free-form comments are specific to the product, they probably should be in product dimension. However if the comments are specific to the sales, create its own dimension and have FK in the fact table especially when they are sparsely populated or rarely used.
The key point is you can always use fact table to reflect the correlations between attributes if any fast changing attribute is the cause of explosive size of an SCD dimension. I guess the purpose of the mini dimension is twofold. It can keep the big main dimension reasonably static, as it moves away all the fast changing attributes. The mini dimensions also provide a light entry points in the fact for those highly useful attributes that would only be obtained otherwise by joining to much bigger main dimensions which is important for high level aggregates from performance perspective.
The key point is you can always use fact table to reflect the correlations between attributes if any fast changing attribute is the cause of explosive size of an SCD dimension. I guess the purpose of the mini dimension is twofold. It can keep the big main dimension reasonably static, as it moves away all the fast changing attributes. The mini dimensions also provide a light entry points in the fact for those highly useful attributes that would only be obtained otherwise by joining to much bigger main dimensions which is important for high level aggregates from performance perspective.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Derived Dimension
The attributes are product specific, but also more frequently changing, that is why I've separated them into a different dimension. They can be for instance, the product description, or the promotional comments at that point in time. And there's a lot of them.
If I decide I want to keep them in a separate dimension, I would not populate them with all possible combinations of the values having high cardinality. Should I at least get the distinct values, or drop whatever the source system has with the reference to the main product dimension?
Here's an oversimplified example with only one attribute:
Example 1 - Simpler ETL
Or
Example 2 - Less space in data warehouse, but no direct relationship to the product dimension
If I decide I want to keep them in a separate dimension, I would not populate them with all possible combinations of the values having high cardinality. Should I at least get the distinct values, or drop whatever the source system has with the reference to the main product dimension?
Here's an oversimplified example with only one attribute:
Example 1 - Simpler ETL
- Code:
ProductAttributeKey ProductKey StartDate ProductDescription
===================================================================================
1 ABC 1/1/2010 This is our best product yet!!!
2 DEF 1/1/2011 This is our best product yet!!!
Or
Example 2 - Less space in data warehouse, but no direct relationship to the product dimension
- Code:
ProductAttributeKey ProductDescription
==============================================================
1 This is our best product yet!!!
rnb- Posts : 6
Join date : 2010-09-21
Re: Derived Dimension
The comments seem to be sales specific as they are about products at the time of sales. I would have a standalone dimension for the comments only and keep distinct values in the table with unique surrogate key (SK). You could have a product nature key (product ID), instead of its SK, in the comment dimension so that you can string comments together by product ID without joining to the fact. However, you don’t need StartDate in the comment dimension as the accurate comment changes for any product are tracked in the sales fact table with FKs for date, product and comment dimensions.
hang- Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia
Re: Derived Dimension
hang,
You are right that they are specific point in time data, but they are not only applicable to sales. For instance, in shipping or any other fact with a date grain we may want to retrieve these attributes. That is the reasoning behind me creating a dimension, instead of placing them directly in the fact table as degenerate dimension.
Even if I have the natural key of product in the comments dimension, I'll still need a fact table in order to get the comment change transactions, where the fact to dimension is 1:1 or very close due to the high cardinality of the comments. I'm still separating them from the main product dimension to separate the relatively static attributes and the more frequently changing ones. This is the solution that I'm leaning towards right now. From what I read in various forums this is acceptable.
Thanks a lot for your help, this is really helpful for me.
You are right that they are specific point in time data, but they are not only applicable to sales. For instance, in shipping or any other fact with a date grain we may want to retrieve these attributes. That is the reasoning behind me creating a dimension, instead of placing them directly in the fact table as degenerate dimension.
Even if I have the natural key of product in the comments dimension, I'll still need a fact table in order to get the comment change transactions, where the fact to dimension is 1:1 or very close due to the high cardinality of the comments. I'm still separating them from the main product dimension to separate the relatively static attributes and the more frequently changing ones. This is the solution that I'm leaning towards right now. From what I read in various forums this is acceptable.
Thanks a lot for your help, this is really helpful for me.
rnb- Posts : 6
Join date : 2010-09-21
Similar topics
» bridge table and junk dimension on customer dimension (bank/credit union)
» Derived fact tables, aggregation and views
» Should rule-derived columns go into the fact table?
» Derived Fact table with additional measures / foreign keys ... ?
» How to create fact table with measures derived from comparing two fact table rows
» Derived fact tables, aggregation and views
» Should rule-derived columns go into the fact table?
» Derived Fact table with additional measures / foreign keys ... ?
» How to create fact table with measures derived from comparing two fact table rows
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum
|
|