Relationship between scd hierarchies and then between fact tables

View previous topic View next topic Go down

Relationship between scd hierarchies and then between fact tables

Post  Claudia_CR on Mon Jan 09, 2012 6:25 pm

I have some SCD Type 2 dimension to be related hierarchically, as follows:

DIM_CATEGORY
Categ_ID Categ_Code C_DescriptionC_start_Date C_end_date C_active C_Other_data
1 C1CAT1.0 01/01/2011 05/03/2011 0 ZZZ
2 C1 CAT1.105/03/2011 08/20/2011 0 ZXY
3 C1CAT1.2 08/20/2011 1 PPP
4 C2CAT2.0 04/15/2011 1 LLL

DIM_SUBCATEGORY
Sub_ID Sub_code S_DescriptionS_start_date S_end_date S_active S_Other_dataCateg_Code
1 S1 SUB1.0 05/01/2011 10/25/2011 0 DsdsC1
2 S1 SUB1.1 10/25/2011 1 Sass C1
3 S2 SUB2.001/24/2011 02/25/20110 Qidiu C1
4 S2SUB2.1 02/25/2011 06/13/20110 PretC1
5 S2 SUB2.2 06/13/2011 1 LkdjC1
6S3 SUB3.1 04/21/2011 1Wwl C2

DIM_PRODUCT
Prod_ID Prod_codeP_Description P_start_date P_end_dateP_active P_Other_data Sub_Code
1 P1 PROD1.0 06/05/2011 1Rrrs S1
2P2 PROD2.0 07/13/2010 09/10/20110 Gfsr S1
3 P2 PROD2.109/10/2011 1 Pkjkj S1
4 P3 PROD3.005/10/2011 12/06/2011 1 PlllS2
5P4 PROD4.0 01/12/2011 1 Tsfs S3

DIM_TIME
Time_IDMonth Semester Year
1 011 2011
2 021 2011
3 031 2011
4 041 2011
5 051 2011
6 061 2011
7 072 2011
8 082 2011
9 092 2011
10 102 2011
11 112 2011
12 122 2011
13 011 2012

Also I have two fact tables related to Product dimension.

I need to relate the dimensions so that:

1) To see all subcategories by category
2) To see all products by subcategory and category
3) To see the lastest data of each hierchachy related to data in fact tables
4) To see data from fact tables, associated wiht 1, 2 and 3.
4) To show by month, the respective data from dimensions and facts according the "time". For example, for November 2011, if I want to see information (from fact tables) associated with Prod_Code=P1, I need to show the data product wich corresponds to Prod_Id = 1, category data which corresponds to Categ_Id =3 and the subcategory data which corresponds to Sub_Id=2.

I have found information about how to design these associations, but I always find examples where there is a single scd hierarchy level for dimensions, related with fact tables.

I would appreciate any idea how to solve this situation.


Last edited by Claudia_CR on Tue Jan 10, 2012 6:25 pm; edited 1 time in total (Reason for editing : missing prod_code)

Claudia_CR

Posts : 4
Join date : 2012-01-09

View user profile

Back to top Go down

Re: Relationship between scd hierarchies and then between fact tables

Post  pcs on Tue Jan 10, 2012 5:58 pm

Hello,
Interesting problem...
I am curious why you have split out your hierarchies. (As opposed to including category and subcategory attribute values in the product dimension) Have you reasons for not consolidating the values into a single dimension?

pcs

Posts : 20
Join date : 2009-02-03

View user profile

Back to top Go down

Re: Re: Relationship between scd hierarchies and then between fact tables

Post  Claudia_CR on Tue Jan 10, 2012 6:49 pm

I have split out my hierarchies because each one has its own history, ie, as I show in the example, the data of the categories may change without the data from the subcategories or products change, and vice versa. Also I can not associate a top-level hierarchy to a hierarchy of lower level, because the ranges of validity of each time may not coincide. For example, suppose that the data of the any category change and effective ranges are as follows:

01/01/2011 - 06/15/2011 --> Category Data 1
06/05/2011 - to present --> Category Data 1.1

And then suppose the data of the associated subcategory is:

03/05/2011 - to present --> Subcategory Data 3

To represent changes in the category for all subcategories and associated products, I would have to create "n" records, and this will not show the data corresponding to every month.

My big problem is, how I can relate the changes between the hierarchies, how to represent this over time (to show the respective data by month) and also associate them with the facts?

Claudia_CR

Posts : 4
Join date : 2012-01-09

View user profile

Back to top Go down

Re: Relationship between scd hierarchies and then between fact tables

Post  hang on Wed Jan 11, 2012 2:32 am

Based on Kimball's dimensional modeling, category and subcategory should be denormlised into product dimension, unless it's not hierarchical. The exact business case has be elaborated in the first 2 chapters in his Toolkit 2nd edition.

To sum up, category and subcategory are just two sets of attributes in product dimension which should be type 2 SCD if the hierarchical relationship change needs to be tracked over the time.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Relationship between scd hierarchies and then between fact tables

Post  sgudavalli on Wed Jan 11, 2012 4:35 am

This is a snow flake structure where you are trying to normalize the product dimension for a set of reasons.

what i would suggest is populate CAT_ID instead of CAT_CODE in DIM_Subcategory and in this way you are always referring to the proper version of the Category when you are adding\revising the SubCategory table..

and is the same case with PROUDCT, populate SUB_ID instead of SUB_CODE..

its enough for your fact to get associated to a product and time dimensions.... and using PRODUCT_ID instead of PRODUCT_CODE you can tie back proper revisions


sgudavalli

Posts : 29
Join date : 2010-06-10
Age : 32
Location : Pune, India

View user profile

Back to top Go down

Re: Relationship between scd hierarchies and then between fact tables

Post  pcs on Wed Jan 11, 2012 7:37 am

OK - I think I follow - It might be easier to manage if you combine them all in one table, however given your constraints let me know if something like this might work:
requirements #1 & #2 & #3 can be handled with a multipass queries or perhaps leveraging a durable surrogate key to each dim
additionally -any facts recorded at a transaction level will capture the relationships at a given time (or use a daily factless table for this) for the first #4 req
how about a monthly accumulating snapshot like this (captured the last day of each month) for the second #4 requirement...

DATETIME_IDPROD_IDSUB_IDCATEG_IDMEASURES
1/31/2011 1 5 -1 -1 N
2/28/2011 2 5 -1 -1 N
3/31/2011 3 5 -1 -1 N
4/30/2011 4 5 6 4 N
5/31/2011 5 5 6 4 N
5/31/2011 5 4 4 1 N
6/30/2012 6 4 5 2 N
6/30/2011 6 1 1 2 N
6/30/2011 6 5 6 4 N
7/31/2011 7 4 5 2 N
7/31/2011 7 1 1 2 N
7/31/2011 7 2 1 2 N
7/31/2011 7 5 6 4 N
8/31/2011 8 4 5 3 N
8/31/2011 8 1 1 3 N
8/31/2011 8 2 1 3 N
8/31/2011 8 5 6 4 N
9/30/2011 9 1 1 3 N
9/30/2011 9 2 1 3 N
9/30/2011 9 3 1 3 N
9/30/2011 9 5 6 4 N
10/31/2011 10 4 5 3 N
10/31/2011 10 1 2 3 N
10/31/2011 10 3 2 3 N
10/31/2011 10 5 6 4 N
11/30/2011 11 1 2 3 N
11/30/2011 11 3 2 3 N
11/30/2011 11 4 5 3 N
11/30/2011 11 5 6 4 N
12/31/2011 12 1 2 3 N
12/31/2011 12 3 2 3 N
12/31/2011 12 5 6 4 N

pcs

Posts : 20
Join date : 2009-02-03

View user profile

Back to top Go down

Re: Relationship between scd hierarchies and then between fact tables

Post  hang on Wed Jan 11, 2012 8:31 am

sgudavalli wrote:This is a snow flake structure where you are trying to normalize the product dimension for a set of reasons.
True, you also denormalise the hierarchy into the product dimension for a couple of good reasons, minimising the number of joins when you need to access the hierarchy and simplifying the ETL process dealing with three SCD2 dimensions and keeping them in synch. Could you elaborate for what set of reasons you should normalise the product dimension, or a dimension in general.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Relationship between scd hierarchies and then between fact tables

Post  Claudia_CR on Wed Jan 11, 2012 5:04 pm

Thank you all for your responses.

According to the recommendation of pcs:

"...any facts recorded at a transaction level will capture the relationships at a given time (or use a daily factless table for this) for the first #4 req
how about a monthly accumulating snapshot like this (captured the last day of each month) for the second #4 requirement..."

Sounds good, but this means that I have to include in all fact tables the relationship between category, subcategory and product at a given time, through references to the id's of each one?

About the other recommendation:

"requirements #1 & #2 & #3 can be handled with a multipass queries or perhaps leveraging a durable surrogate key to each dim"

I had initially thought of designing this way the relations:



If you observe I had added a "durable surrogate key" for each dimension, but I have the following doubts(considering the data from my first example):

1) Dim_Product uses the ID_Subcategory to specify the associated subcategory, and this does not represents all information about the data changes of any subcategory. For example:

- Sub_Id's 1 and 2 in Dim_Subcategory belong to Sub_cod = S1

- Prod_Id = 1 in Dim_Product belongs to Prod_cod = P1, which in turn belongs to the subcategory S1, but what Sub_id, 1 or 2? From 06/05/2011 until 10/25/2011 the Sub_id that corresponds is 1, and from 10/25/2011 until now the Sub_id that corresponds is 2, but in Dim_Product I have only one record, so I'll probably lose the relationship between Prod_Id = 1 and Sub_Id = 1.

2) Considering the above, for each new record in Dim_Subcategory I need to update the latest records in Dim_Product?

3) I have the same situation (1 and 2) with the changes in Dim_Category and as reflected in the subcategories.

Claudia_CR

Posts : 4
Join date : 2012-01-09

View user profile

Back to top Go down

Re: Relationship between scd hierarchies and then between fact tables

Post  hang on Wed Jan 11, 2012 7:29 pm

Claudia, I think you have gone too far down the relational track. Could you tell us why you don't denormalise category and subcategory into a single product dimension and handle the hierarchical changes by SCD2. In dimensional modeling, you should think dimensionally, not relationally. That is, denormalise dimension tables and normalise fact tables.

There are some cases you should normalise dimension tables, but only if denormalisation dose not work. So the guideline is, try to denormalise the dimension, and if it works, don't even think about normalising it. I highly suggest to you to read about chapter 1, 2 in Kimball's dimensional modeling book, unless you want to use alternative methodologies for your DW project.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Relationship between scd hierarchies and then between fact tables

Post  pcs on Thu Jan 12, 2012 12:08 pm

I have to agree with Hang on this - make sure denormalization cannot work for you before pursuing other avenues...

pcs

Posts : 20
Join date : 2009-02-03

View user profile

Back to top Go down

Re: Relationship between scd hierarchies and then between fact tables

Post  Claudia_CR on Tue Jan 17, 2012 12:31 pm

Thanks for your answers.

I understand that you say about denormalise category and subcategory into a single dimension and handle the hierarchical changes by SCD2.

But I am worried because every change in any data in my category, subcategory or product needs to be used in reports, these changes may occur at any time and independently, i.e., category data can be changed without the data from the subcategory or product change, or vice versa. If I design only a dimension that includes the tree entities I will need to create for every data category change "n" * "m" rows (n -> subcategories associated and m-> products associated to every subcategory) and for every data subcategory change "m" rows, so it could have a lot of rows and I dont know if this would be a good practice; besides my dimension table would have at least 60 attributes (columns) that would be the union of the attributes of the three entities.

Maybe my concerns are not unfounded, so I would appreciate if you can help in this regard.

On the other hand, about create a monthly accumulating snapshot like you proposed (captured the last day of each month), how it would relate this table to the dimension tables and fact tables?

Thanks again.

Claudia_CR

Posts : 4
Join date : 2012-01-09

View user profile

Back to top Go down

Re: Relationship between scd hierarchies and then between fact tables

Post  hang on Tue Jan 17, 2012 11:45 pm

A typical retail business would stock tens of thousand products which should not be of any concern of explosive size because of SCD2 changes, even for the dimension with hunderes of thousand records. You only need to consider normalisation on a dimension when its potential size will go beyond millions which we call monster dimension.

Think of product dimension this way. Category and subcategory are just classifications of products. So it's logical to treat them as attributes of products. Any change in the category or sub category is the change of product attributes. So in a classic SCD 2 dimension, you add another record to your product dimension for the change, no matter if it is on a normal attribute or an attribute in a hierarchy. It's true that there would be a lot of data redundancy. But comparing with the complexity the single SCD dimension has resolved and the space it has saved in the fact table, it's a good tradeoff.

By the way, even if you normalise the relationship as suggested by sgudavalli, the total number of recrods incured by SCD 2 in product dimension will not reduce. The only data redundancy you would save is the attributes specific to category/subcategory. Again compared to 60+ attributes in product, that saving is ignorable.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Relationship between scd hierarchies and then between fact tables

Post  Vishy on Fri Mar 16, 2012 5:21 am

If there are too many attributes then you might want to create a coverage factless fact table, having time,product,cat and subcat details.


Vishy

Posts : 75
Join date : 2012-02-28
Location : India

View user profile

Back to top Go down

Re: Relationship between scd hierarchies and then between fact tables

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum