Modified SCD 2 implementation

View previous topic View next topic Go down

Modified SCD 2 implementation

Post  Abhiraizada on Fri Jul 05, 2013 3:22 am

We have currently implemented conventional SCD 2 dimensions in our model. Following represent the sample data resulting in our DWH enviroment -

Patient dimension having 2 versions for one patient -

CFDM Patient Dimension

PT_KEY   NAME    GENDER     DOB      PT_ID     CURRENT FLAG
1           JOHN       NULL    3 JULY'13   PT1             N
2           JOHN        M       3 JULY'13   PT1             Y

Med. Administration records for patients -

CFDM Med Admin Fact

Med Admin Key     PT_KEY     LOCATION_KEY      PROVIDER_KEY       CONCEPT_KEY      ATTR_1       ATTR_2        ATTR_N       CURRENT FLAG
    1                      1                   1                      1                         1                    X1                                                     Y
    2                      1                   1                      1                         1                    X2                                                     Y
    3                      1                   1                      1                         1                    X1              X3                                    Y
    4                      2                   1                      1                         1                    X2              X4                                    Y

In the above example, if we write a query to retrieve number of drug administration records for all male patients will result in complex query where we will create a view on top of patient dimension which will then join to med admin fact. This approach is not scalable if we have huge volume of records and queries are being fired dynamically.

Alternative approach simplifies the data retrieval queries and also maintains history of changes to the record. The proposed approach will require composite primary key for dimension consist of – Surrogate key and Version Number.

CFDM Patient Dimension

PT_KEY    VERSION NUM              NAME    GENDER     DOB      PT_ID     CURRENT FLAG
1               1                             JOHN       NULL    3 JULY'13   PT1             N
1               2                             JOHN        M       3 JULY'13   PT1             Y

And relationship between dimension and fact will be created on the basis of this composite primary key –

CFDM Med Admin Fact

Med Admin Key     PT_KEY     PT_VERSION_NUM    LOCATION_KEY      PROVIDER_KEY       CONCEPT_KEY      ATTR_1       ATTR_2        ATTR_N       CURRENT FLAG
    1                      1                   1                       1                      1                         1                    X1                                                     Y
    2                      1                   1                       1                      1                         1                    X2                                                     Y
    3                      1                   1                       1                      1                         1                    X1              X3                                    Y
    4                      1                   2                       1                      1                         1                    X2              X4                                    Y

Above design will help us create simplified queries avoiding complex recursive views. In this approach surrogate key for patient table does not change with every change in the record instead version number keeps getting incremented with each record change resulting single/unique surrogate key across multiple versions. This helps us in writing queries where all med admin records needs to be analyzed/viewed against latest instance of patient and also retains the capability to do conventional BI analytics of SCD type 2.

Similar approach is suggested in one of the design tips by Kimball - http://www.kimballgroup.com/2013/02/05/design-tip-152-slowly-changing-dimension-types-0-4-5-6-7/ ( refer last section describing Type 7).

In this approach fact table contains dual foreign keys for a given dimension: a surrogate key linked to the dimension table where type 2 attributes are tracked, plus the dimension’s durable/pseudo supernatural key linked to the current row in the type 2 dimension to present current attribute values. This results in following modification to Patient dimension and Med administration fact.

CFDM Patient Dimension

PT_KEY    PSEUDO_KEY                NAME    GENDER     DOB      PT_ID     CURRENT FLAG
1               1                             JOHN       NULL    3 JULY'13   PT1             N
2               1                             JOHN        M       3 JULY'13   PT1             Y

CFDM Med Admin Fact

Med Admin Key     PT_KEY     PSEUDO_PT_NUM    LOCATION_KEY      PROVIDER_KEY       CONCEPT_KEY      ATTR_1       ATTR_2        ATTR_N       CURRENT FLAG
    1                      1                   1                       1                      1                         1                    X1                                                     Y
    2                      1                   1                       1                      1                         1                    X2                                                     Y
    3                      1                   1                       1                      1                         1                    X1              X3                                    Y
    4                      2                   1                       1                      1                         1                    X2              X4                                    Y

We are planning to move from conventional SCD 2 implementation approach to either of the above suggested approach. It will be helpful if i can get any expert advise before I go ahead and do any changes.

Thanks in Advance
Abhishek Raizada

Abhiraizada

Posts : 20
Join date : 2011-05-24

View user profile

Back to top Go down

Re: Modified SCD 2 implementation

Post  BoxesAndLines on Fri Jul 05, 2013 10:40 am

Yes I have done this in the past and it works well.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Modified SCD 2 implementation

Post  LAndrews on Fri Jul 05, 2013 3:07 pm


I find the dual key approach increases the size of the fact table unneccessarily, and the joins become more complex as you need to include the current_flag='Y' criteria to get the correct results.

My preference is to add a new type-1 attribute to the dimension (current_pt_key). This allows for easy snowflaking when current attributes are required.


LAndrews

Posts : 132
Join date : 2010-05-13
Location : British Columbia, Canada

View user profile

Back to top Go down

Re: Modified SCD 2 implementation

Post  Abhiraizada on Mon Jul 08, 2013 1:10 am

LAndrews wrote:
I find the dual key approach increases the size of the fact table unneccessarily, and the joins become more complex as you need to include the current_flag='Y' criteria to get the correct results.

My preference is to add a new type-1 attribute to the dimension (current_pt_key). This allows for easy snowflaking when current attributes are required.


I think you are suggesting adding CURRENT_DIMENSION_KEY attribute to the dimension table and use it when ever needed. I understand this will not impact the size of fact table but still will not reduce the query complexity, as every time I want to analyze fact data corresponding to latest instance of dimension I need to make recursive join ( the same is needed in conventional type 2 implementation) also it will considerably impact the ETL for dimension tables as every time new version of dimension record is created we need to update this CURRENT_KEY column to current surrogate value.

Abhiraizada

Posts : 20
Join date : 2011-05-24

View user profile

Back to top Go down

Re: Modified SCD 2 implementation

Post  Abhiraizada on Mon Jul 08, 2013 1:10 am

BoxesAndLines wrote:Yes I have done this in the past and it works well.

Thanks this give me some confidence !!!

Abhiraizada

Posts : 20
Join date : 2011-05-24

View user profile

Back to top Go down

Re: Modified SCD 2 implementation

Post  ngalemmo on Mon Jul 08, 2013 5:22 am

I've used the dual key approach in the past and it works fine. It also gives you the option of having both a type 1 and type 2 version of the dimension should you so desire. It can help performance if most queries only need current values.

The version numbers don't work as well. It is essentially the same thing except your dimension load processes are more complicated than they need to be.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Modified SCD 2 implementation

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum