Textual values in dimension tables

View previous topic View next topic Go down

Textual values in dimension tables

Post  emonchen on Thu Feb 11, 2010 3:25 pm

With much interest I am reading "The Data Warehouse Toolkit", and I stumble upon a design thing that I have done differently in my Data Warehouse design (this is described in the book on page 42). It describes about the dimension table that it can contain a lot of columns with the text values of any categorization, in the books' example the product category, brand and department. In my Data Warehouse, I have created extra tables with a referential integrity so I have the description stored only once. I did not decide this to save space because yes, the dimension tables hardly take up space. But what if for example a department name changes from "Bakery" to "Baked goods"? In my situation I’d have to change only one text field, and in the books' example, all fields that contains 'Bakery' have to be changed to 'Baked goods'.

I also have multiple levels in my hierarchy, so for every level, I have created an extra table with a PK-FK relationship.
I am still in the early stages of the Data Warehouse design, so hopefully someone could tell me if I should change my design to what is illustrated in the book and why that's in much favor not just to save space.

Thanks!

emonchen

Posts : 12
Join date : 2010-02-11
Age : 39
Location : Delft, The Netherlands

View user profile

Back to top Go down

Re: Textual values in dimension tables

Post  VHF on Thu Feb 11, 2010 6:03 pm

One of the hardest things in moving from relational DB design to dimensional DW modeling is letting go of the temptation to apply 3NF design to the DW! I have had to stop myself from doing this repeatedly.

So why should we repeat data in the dimension tables rather than factor it out with relational design? I don't have enough DW experience under my belt to provide a definitive answer, but key reasons include (1.) improving query performance by eliminating joins and (2.) making the DW simple for users to query.

Although query performance can vary by platform, in general a simple star schema--with the fact table in the middle and a single join to each dimension table--produces the fastest results. The star schema keeps queries from needing to be overly complex and allows end users and BI tools to utilize the DW. Not many end users will understand a query that involves multiple relational "hops" (joins), and BI tools that are expecting a star schema may not be able to follow it either.

From a maintenance standpoint, you should not be manually maintaining dimension values in the DW. Values (such as the "Bakery" department name) should either be derived from a source system (via an ETL process) such that changes are propagated automatically--or if that is not possible, add a maintainable reference table in your DW "kitchen" from which the dimension values can be loaded. This would give you a place to maintain the value in a single place.


Last edited by VHF on Fri Feb 12, 2010 3:33 pm; edited 1 time in total (Reason for editing : clarity)

VHF

Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US

View user profile

Back to top Go down

Re: Textual values in dimension tables

Post  emonchen on Tue Feb 16, 2010 3:03 am

Thanks for your reply.

The issue that I am facing at the moment is the following. I have a dimension called "Product", that contains attributes to higher levels called "Product Assortment" and "Assortment Classification". The hierarchy is then as follows:

Assortment Classification -> Product Assortment -> Product

In the dimension table, I have a product key and the surrogate key is the product ID of my source system. The problem is, with some business processes, in the fact table I have the product as the fact, but sometimes the Product Assortment is the lowest level in the fact table. That would mean I have to link the Product Assortment level to the fact table. But that can't be linked to the Product Assortment field in the dimension table because it's just a textual field and not the primary key.

I am doubting if I should keep two dimension tables, one for the Product, and one for the Product Assortment. That way I can link the appropriate tables to the fact tables, but then again I am diverting from the star schema.

Thanks.

emonchen

Posts : 12
Join date : 2010-02-11
Age : 39
Location : Delft, The Netherlands

View user profile

Back to top Go down

Re: Textual values in dimension tables

Post  VHF on Tue Feb 16, 2010 11:08 am

Two fundamental principles of the Kimball method are "star schema" and "declare the grain of the fact table." It appears your main design challenge is avoiding a mixed grain fact table--sometimes you have facts attributable to a specific product, but other times the fact is only attributable to a product assortment.

One approach is to create records in your dimension table specifically to handle facts that are not attributable to a specific product. Let’s look at an example product dimension table:

SKProduct NoProduct DescClassificationAssortment
11234Large Wool Men's SweaterClothesSweaters
21235Small Wool Men's SweaterClothesSweaters
3GeneralGeneralClothesSweaters
45678Large Men's Wool SocksClothesSocks
55679Medium Men's Wool SocksClothesSocks
6GeneralGeneralClothesSocks

Any fact that applies to a specific product would have a FK pointing to the SK for that specific product. Any fact that applies only at the assortment level would use the SK for “General” entry for that assortment. (This can be called “Unspecified” or “N/A” or whatever is appropriate.)

So how does this design affect reporting? Any reports summarized at the Assortment level or above will produce the expected aggregates. Any reporting that drills down to the individual product level will include values for the “General” facts in addition to those that apply to a specific product. It would be wise to review with your business users if this is the desired/acceptable output.

(BTW—just to clarify, in Kimball terminology the SK (surrogate key) is the arbitrary key assigned by the DW as opposed to the business key (or natural key) that comes from the source system.)

VHF

Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US

View user profile

Back to top Go down

Create Surrogate Keys for Rollup Points in the Hierarchy

Post  Jeff Smith on Tue Feb 16, 2010 6:56 pm

It is possible to have multiple dimension keys in a dimension table. If you have a hierarchy with 3 levels, surrogate or dimension key can be created for each of the 3 levels. This is very helpful when creating aggregate tables.

An example would be a Date Dimension. There would be a surrogate key for each date. But if you need to aggregate data to the Year Month, you could create a surrogate key for each Year month combination. This could also be done for Year Quarter.

The next step is to create views from the Dimension table for the desired levels, using a Select distinct. The views would be the Year Month Dimension View (which would capture the Year Month related columns, the year Quarter, and so forth).

Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

Re: Textual values in dimension tables

Post  ngalemmo on Wed Feb 17, 2010 1:30 pm

It's not clear from your description what facts you are collecting and what a 'product assortment' represents. Are these sales and is a product assortment multiple products sold as a single unit? If that is the case, if you need to report sales as is AND as individual products you need to treat product as a multivalued dimension with a bridge table containing the product key as sold, the component product and a weighting factor to distribute sales. The product dimension would contain all SKUs and you would use the bridge only when performing component level analysis.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Textual values in dimension tables

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum