Derived Dimension

rnb Mon Apr 04, 2011 7:28 pm

Let's say I have these entities from my source system:
- sales - grain is employee, date, and product
- budget - grain is department, year, quarter
- department
- employee - employee belongs to a department
- product
- product attributes - one product has multiple attributes records for the changes throughout its life, sort of like SCD type 2 tracked in the source system with start and end dates

I'm designing a data mart to track sales, budgeting, and the product attributes changes business processes.
The product attributes are separated because it changes more frequently than the product dimension so it makes sense to have it in a separate dimension. It also makes sense not to flatten department into employee because it is used in the budgeting business process for my conformed dimension.

Now, for my sales fact, do I want to include department key and product attributes? The product attributes is determined by the date the sale occur and the product sold. In my real-life scenario I'll have quite a few of these types of derived date sensitive dimensions.

For the department key, if there is too much columns already in my fact table I know I have the option to snowflake, but how much is too much? Its growing at a linear rate currently at 4M records a year. Is it wise to just have the minimal grain defining dimensions, and use views or OLAP cube to bring in the additional dimensions? Sounds too much relational OLTP design to me.

John Simon Tue Apr 05, 2011 10:49 pm

You don't need to include the department key in your sales fact table, as you can join to your Budget table via Department in your Employee dimension.

If your product table has fast changing attributes, then a Product Attributes Mini-dimension is perfectly acceptable.

rnb Fri Apr 08, 2011 12:42 pm

Hi John, thanks for the reply. I think I almost got it. My product attributes dimension will actually not have any reference to product at all. In fact it will only have the distinct attribute values used which means it will be smaller number of records compared to the product attributes table in the source system because of some record reuse. I would not populate all the different combinations a-la mini dimension due to the high cardinality of some of the attribute values. Some are free-form comments. Are there any design guidelines around this issue?

hang Fri Apr 08, 2011 10:48 pm

If some of the attributes in the mini dimension have high cardinality, they might defeat the purpose of mini dimension by making it not so mini. If the free-form comments are specific to the product, they probably should be in product dimension. However if the comments are specific to the sales, create its own dimension and have FK in the fact table especially when they are sparsely populated or rarely used.

The key point is you can always use fact table to reflect the correlations between attributes if any fast changing attribute is the cause of explosive size of an SCD dimension. I guess the purpose of the mini dimension is twofold. It can keep the big main dimension reasonably static, as it moves away all the fast changing attributes. The mini dimensions also provide a light entry points in the fact for those highly useful attributes that would only be obtained otherwise by joining to much bigger main dimensions which is important for high level aggregates from performance perspective.

rnb Mon Apr 11, 2011 1:02 pm

The attributes are product specific, but also more frequently changing, that is why I've separated them into a different dimension. They can be for instance, the product description, or the promotional comments at that point in time. And there's a lot of them.

If I decide I want to keep them in a separate dimension, I would not populate them with all possible combinations of the values having high cardinality. Should I at least get the distinct values, or drop whatever the source system has with the reference to the main product dimension?

Here's an oversimplified example with only one attribute:
Example 1 - Simpler ETL

Code:: ProductAttributeKey ProductKey StartDate ProductDescription =================================================================================== 1 ABC 1/1/2010 This is our best product yet!!! 2 DEF 1/1/2011 This is our best product yet!!!

Or

Example 2 - Less space in data warehouse, but no direct relationship to the product dimension

Code:: ProductAttributeKey ProductDescription ============================================================== 1 This is our best product yet!!!

hang Mon Apr 11, 2011 7:47 pm

The comments seem to be sales specific as they are about products at the time of sales. I would have a standalone dimension for the comments only and keep distinct values in the table with unique surrogate key (SK). You could have a product nature key (product ID), instead of its SK, in the comment dimension so that you can string comments together by product ID without joining to the fact. However, you don’t need StartDate in the comment dimension as the accurate comment changes for any product are tracked in the sales fact table with FKs for date, product and comment dimensions.

rnb Mon Apr 11, 2011 10:49 pm

hang,
You are right that they are specific point in time data, but they are not only applicable to sales. For instance, in shipping or any other fact with a date grain we may want to retrieve these attributes. That is the reasoning behind me creating a dimension, instead of placing them directly in the fact table as degenerate dimension.
Even if I have the natural key of product in the comments dimension, I'll still need a fact table in order to get the comment change transactions, where the fact to dimension is 1:1 or very close due to the high cardinality of the comments. I'm still separating them from the main product dimension to separate the relatively static attributes and the more frequently changing ones. This is the solution that I'm leaning towards right now. From what I read in various forums this is acceptable.
Thanks a lot for your help, this is really helpful for me.

Sponsored content

Derived Dimension

Derived Dimension

Re: Derived Dimension

Re: Derived Dimension

Re: Derived Dimension

Re: Derived Dimension

Re: Derived Dimension

Re: Derived Dimension

Re: Derived Dimension