SCD Type 1 and 2

View previous topic View next topic Go down

SCD Type 1 and 2

Post  rcb4561 on Thu Feb 11, 2010 3:34 pm

Would it be wise to design all SCDs with 2 surrogate keys? Let's take a customer dimension for example. There would be one SK to identify the customer and another SK to identify the type 2 records for a customer. Then we would use both keys on all fact tables. This provides the ability to report on facts using the historical or current values for the customer dimension.

This approach seams more flexible. The disadvantages are increased columns and indexing on the facts, and separate objects for history vs. current (due to different joins) in the query tool.

I am asking this as it seams like a good approach, but I have not seen this approach reccomended by the Kimball Group. Rather, the reccomended approach appears to be to add the current columns to all type 2 records as needed.

rcb4561

Posts : 1
Join date : 2010-02-11

View user profile

Back to top Go down

Re: SCD Type 1 and 2

Post  BoxesAndLines on Thu Feb 11, 2010 5:45 pm

That's how I do it when I need current and historical views of the dimensions.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Interesting Approach but not All Clear

Post  surfinwind on Thu Feb 11, 2010 7:24 pm

That is an interesting approach, but I am not able to understand its logic...

Would such an SCD look like this? Where SK2 is updated every time when there is a new SK1?

SK1 SK2 Current_Row Value
1 3 N v1
2 3 N v2
3 3 Y v3

If so, do you update all fact data for that changed dimension entity?

Currently, I just add Type1 attribute for a Type 2 attribute that requires current value analysis of its historical value.

puzzled

surfinwind

Posts : 6
Join date : 2009-02-03

View user profile

Back to top Go down

Re: SCD Type 1 and 2

Post  mar78 on Sun Feb 14, 2010 9:19 pm

As i understood from what Kimball Uni recommends: there is no need for your 1st Sk because it's the natural key or the production key of the customer which is the "glue" hlods the historical records of the same customer together. while your 2nd Sk is the type 2 SCD record key (primary key).

But i am asking a question here: how do we manage a "natural key" reassignment in the source systems ?

let's assume such a product dimension :

SK Product_SKU description
1 1 hammer

Following the Kimball Uni recommendations, this reassignment would only be seen as a simple attribute change for SCD 2 and therefor a new row with a new SK is gonna be added.

SK Product_SKU description
1 1 hammer
2 1 pen

if we ever want to have an historical view of the product just by contraining the product_SKU field, we would be misled since the hammer and the pen are diffrent products.

Does anyone have an idea how to deal with that ?

mar78

Posts : 3
Join date : 2010-02-14

View user profile

Back to top Go down

Re: SCD Type 1 and 2

Post  BoxesAndLines on Mon Feb 15, 2010 10:51 am

You have to hope the IT and business folks are smart enough not to reuse identifiers.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: SCD Type 1 and 2

Post  mar78 on Mon Feb 15, 2010 12:15 pm

BoxesAndLines wrote:You have to hope the IT and business folks are smart enough not to reuse identifiers.

Yeah sure, but unfortunately hope by it's own dosn't prevent us such a burden.

According to kimball, it's even one of the reasons that makes us not to trust natural keys and opt for surrogate ones.

here (you can check the 6 points answer to why should we have surrogates intstead of naturals)

Reusing the natural keys should be predicted in some cases.

And i guess the fact of having 2 surrogate keys for a dimension just like mentionned above in the beginning of this thread could be the solution to that if in the source table we have an exsisting field that indicates the date of creation of a row. That way if we have a match in the source systems of the natural key of our dimension, we have to take a look to that creation date field, if it's a prior date to the last ETL run then we have an SCD. If not, then we have a NEW dimension with a reused natural key.

The 1st SK is the primary key of the dimension. The 2nd is the ID of the dimension that matches all the historical versions of it.

What do you think ?

mar78

Posts : 3
Join date : 2010-02-14

View user profile

Back to top Go down

Re: SCD Type 1 and 2

Post  BoxesAndLines on Mon Feb 15, 2010 6:39 pm

I think you've got it. New natural key, new surrogate key. Update to existing natural key, insert new row with new surrogate key, keep the current_row surrogate key the same.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

SCD 1 and SCD2(Natural vs surrogate key)

Post  kasi on Tue May 04, 2010 3:43 pm

Based on below can I assume that link (The 2nd is the ID of the dimension that matches all the historical versions of it) 2nd id and create date (composite key) to link with fact table to extract dimension table data to report hisory?
I have a situation where we are creating data warehousing for multi tenant based systems.
Where HR module need to report historical data of employees based on dimensional level changes ex: Deparment names as sales on 1/1/2010, changed to finance 2/1/2010.Business requirement to report same employee as sales before 1/1/2010 and as finace from 2/1/2010.

I am planning to design as given below
Dept_dim (Key - ID+Effective_date)
ID Dept_Nm Effctive_date Exp_date
10 Sales 1/1/210 1/31/2010
10 finace 2/1/210 2/1/9999

Fact
ID Employee_ID Dept_ID Dept_ Effctive_date
1 E1 10 1/1/2010

I thought of using join of fact.dept_id=dept_dimension.dept_id in BI modeling.When I report based on <=1/1/21 date,showing department as sales, >=2/1/2010 shown finance.

Please advise your thoughts on good design of scds to do historical reporting without inserting repeated records into fact.Appriciate your help.

kasi

Posts : 2
Join date : 2010-03-03

View user profile

Back to top Go down

SCD 1 and SCD2(Natural vs surrogate key)

Post  da_2030 on Tue May 11, 2010 2:37 pm

For historical reporting ( point in time ) use
fact.dept_id=dept_dimension.dept_id AND
fact.Dept_ Effective_date = dept_dimension.Effective_date

For latest reporting use
fact.dept_id=dept_dimension.dept_id AND
dept_dimension.Exp_date='2/1/9999'

assuming 2/1/9999 is your highest exp_date ( we normally use 12/31/9999 as exp_date )

da_2030

Posts : 5
Join date : 2010-05-10

View user profile

Back to top Go down

Re: SCD Type 1 and 2

Post  ngalemmo on Tue May 11, 2010 3:15 pm

The two key method is ok. Another approach to consider are two tables, one type 1 and the other type 2. This is a better solution when the number of attributes they wish to maintain type 2 history is less than the total number of attributes. The type 1 table would contain all attributes while the type 2 would only contain those of historical significance. It has the same advantages as the two key method with the added benefit of a much smaller type 2 table and faster/simpler update processes.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: SCD Type 1 and 2

Post  da_2030 on Wed May 12, 2010 10:31 am

Or In case of Oracle we can put historical and latest records on separate partitions for performance reasons.

da_2030

Posts : 5
Join date : 2010-05-10

View user profile

Back to top Go down

Re: SCD Type 1 and 2

Post  ngalemmo on Wed May 12, 2010 12:57 pm

da_2030 wrote:Or In case of Oracle we can put historical and latest records on separate partitions for performance reasons.

Not really sure that buys you much. If you partition, you want to use criteria that is stable (so an update doesn't cause the row to move to a different partition) and gives you a fairly even distribution. If you have Star Optimization turned on, Oracle is going to access the dimension table in one of two ways: either a scan to resolve query predicates or by the primary key from the fact FK. In the case of the scan, having the data evenly distributed is best. In the case of a PK reference, paritioning doesn't help much at all.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: SCD Type 1 and 2

Post  da_2030 on Wed May 12, 2010 1:38 pm

ngalemmo wrote:
da_2030 wrote:Or In case of Oracle we can put historical and latest records on separate partitions for performance reasons.

Not really sure that buys you much. If you partition, you want to use criteria that is stable (so an update doesn't cause the row to move to a different partition) and gives you a fairly even distribution. If you have Star Optimization turned on, Oracle is going to access the dimension table in one of two ways: either a scan to resolve query predicates or by the primary key from the fact FK. In the case of the scan, having the data evenly distributed is best. In the case of a PK reference, paritioning doesn't help much at all.


If I need to query on latest partition I can put additional filter
WHERE expiration_date = '31-DEC-9999'

This will result in Oracle choosing PARTITION RANGE SINGLE picking only latest partition as opposed to PARTITION RANGE ALL

Also when joining on PK I can choose Historical partition by trying additional filter
WHERE expiration_date < '31-DEC-9999'

Also individual partition can be selected using table_name PARTITION ( partition_name ) clause in FROM.

da_2030

Posts : 5
Join date : 2010-05-10

View user profile

Back to top Go down

Re: SCD Type 1 and 2

Post  ngalemmo on Wed May 12, 2010 3:53 pm

Understood, but I don't believe an arrangement like that would significantly improve performance.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: SCD Type 1 and 2

Post  hang on Thu May 13, 2010 9:10 am

having two keys in fact table requires updates on the fact table whenever the dimension has changed. I would create a view on the fact table having a logical dimension key for current view by joining the dimension on business key with SCD status set to true, so that there is no extra undesirable updates on the fact table in your ETL.

In this construct, the dimension is a role playing dimension on the fly, and any historical aggregates on the physical key (SCD2) are still valid, however any aggregates on the logical key will need to be reprocessed. Cube process will pick it up automatically if configured in OLAP server.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: SCD Type 1 and 2

Post  ngalemmo on Thu May 13, 2010 11:52 am

having two keys in fact table requires updates on the fact table whenever the dimension has changed.

No, thats not the way it works. The dimension has two keys, the primary key, which is the type 2 key and a constant type 1 key and a current flag. When a new entry is created, the type 1 and type 2 keys are the same. When the dimension is updated, the type 2 key changes and the type 1 key value is carried over from the previous current record.

The fact table would contain both the type 1 and type 2 keys. Since the type 1 key never changes (query is qualified by the current flag) you do not need to update the facts. This techique avoids the natural key self join in the dimension table that would be necessary if the fact table only contains a type 2 key. For aggregates you would use either the type 1 key, type 2 key or both depending on the nature of the aggregation.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: SCD Type 1 and 2

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum