# How to split the dimension members with SCD Type 2?

## How to split the dimension members with SCD Type 2?

Hi all,

I want to create dimension table, where will be some members

and each of them has its own history (so it has own versions in time),

where at one time is one and only one always valid.

So it is commonly solved by SCD Type 2.

For example DimTable:

SurrKey | KeyDurable | Code | Description | StartDate | EndDate |IsValid

1 | 1 | A | this is first version of A member | 2010101 | 20100131 | 0

2 | 1 | A | this is second version of A member | 2010201 | 99991231 | 0

3 | 1 | A | this is third version of A member | 2010301 | 99991231 | 1

4 | 4 | B | this is first version of B member | 2010101 | 20100131 | 0

5 | 4 | B | this is second version of B member | 2010201 | 20100228 | 0

6 | 4 | B | this is third version of B member | 2010301 | 99991231 | 1

Where:

- SurrKey = surrogate key = PK of DimTable;

- KeyDurable = durable surrogate key which identiffies each dimension member

(but not its concrete version);

- Code = natural key = business key from source table;

- StartDate = start date of validity of concrete version of dimension member;

- EndDate = end date of validity of concrete version of dimension member;

- IsValid = 1/0 like Yes/No = sign of current validity

There are 2 dimension members (A and B) and each of them has three versions

(first, second and third) and just now is valid the third version of them.

So till now this is the basic SCD Type 2 for dimension attributes like Description.

But my members will vary not only through its Description attribute,

but also it can split during time into more parts. Every dimension member

can split.

So here is my question: How can I do this?

For example A-third version (SurrKey=3) will some day

split into two "sons" like A1 (SurrKey=7) and A2 (SurrKey=8)

[these are two "normal" dimension members, each of them can have its own history,

but they "belong" by its origin to A member].

And then A2 will split into three "sons" like A2.1 (SurrKey=9),

A2.2 (SurrKey=10) and A2.3 (SurrKey=10). Etc.

So DimTable changes now into hierarchy table: so should I

add new column (ParentKeySurr or ParentKeyDurable ?? - which of them, or both of them??)

into DimTable like in ragged hierarchy?

Say I have some FactTable with 3 keys and one measure:

- DimSurrKey (FK into DimTable),

- DimKeyDurable (FK into DimTable),

- DateOfTransaction (date of transaction in source system where is stored Number value),

- Number (measure which is a result of some transaction of source system).

I would like to generate some reports, where I would like to see

some SUMS (for values of Number column of FactTable)

for A member like A1 member + A2 member or

for A2 member like A2.1 member + A2.1 member + A2.1 member.

Please how to solve this?

Thank you very much.

Mirek

I want to create dimension table, where will be some members

and each of them has its own history (so it has own versions in time),

where at one time is one and only one always valid.

So it is commonly solved by SCD Type 2.

For example DimTable:

SurrKey | KeyDurable | Code | Description | StartDate | EndDate |IsValid

1 | 1 | A | this is first version of A member | 2010101 | 20100131 | 0

2 | 1 | A | this is second version of A member | 2010201 | 99991231 | 0

3 | 1 | A | this is third version of A member | 2010301 | 99991231 | 1

4 | 4 | B | this is first version of B member | 2010101 | 20100131 | 0

5 | 4 | B | this is second version of B member | 2010201 | 20100228 | 0

6 | 4 | B | this is third version of B member | 2010301 | 99991231 | 1

Where:

- SurrKey = surrogate key = PK of DimTable;

- KeyDurable = durable surrogate key which identiffies each dimension member

(but not its concrete version);

- Code = natural key = business key from source table;

- StartDate = start date of validity of concrete version of dimension member;

- EndDate = end date of validity of concrete version of dimension member;

- IsValid = 1/0 like Yes/No = sign of current validity

There are 2 dimension members (A and B) and each of them has three versions

(first, second and third) and just now is valid the third version of them.

So till now this is the basic SCD Type 2 for dimension attributes like Description.

But my members will vary not only through its Description attribute,

but also it can split during time into more parts. Every dimension member

can split.

So here is my question: How can I do this?

For example A-third version (SurrKey=3) will some day

split into two "sons" like A1 (SurrKey=7) and A2 (SurrKey=8)

[these are two "normal" dimension members, each of them can have its own history,

but they "belong" by its origin to A member].

And then A2 will split into three "sons" like A2.1 (SurrKey=9),

A2.2 (SurrKey=10) and A2.3 (SurrKey=10). Etc.

So DimTable changes now into hierarchy table: so should I

add new column (ParentKeySurr or ParentKeyDurable ?? - which of them, or both of them??)

into DimTable like in ragged hierarchy?

Say I have some FactTable with 3 keys and one measure:

- DimSurrKey (FK into DimTable),

- DimKeyDurable (FK into DimTable),

- DateOfTransaction (date of transaction in source system where is stored Number value),

- Number (measure which is a result of some transaction of source system).

I would like to generate some reports, where I would like to see

some SUMS (for values of Number column of FactTable)

for A member like A1 member + A2 member or

for A2 member like A2.1 member + A2.1 member + A2.1 member.

Please how to solve this?

Thank you very much.

Mirek

**mirek.1**- Posts : 3

Join date : 2011-01-10

Location : Prague, Czech Republic, Europe

## Re: How to split the dimension members with SCD Type 2?

One comment: why do you store the so called DimKeyDurable in your fact table?

In case of dimension splitting you are right: your dimension table will have a new column: the surrogate key of the parent dimension.

In case of dimension splitting you are right: your dimension table will have a new column: the surrogate key of the parent dimension.

**gvarga**- Posts : 43

Join date : 2010-12-15

## Re: How to split the dimension members with SCD Type 2?

Hi gvarga,

I want to have DimSurrKey and DimKeyDurable in FactTable

because it will be easier to get correct version of member for:

1)

in JOIN condition of SELECT

and I will get correct CHANGING historical

version of dimension member (depending on transaction date);

2)

AND some 'calendar date' BETWEEN dimTable.StartDay AND dimTable.EndDay

in JOIN condition of SELECT

and I will get STATIC version of dimension member

which was valid at 'calendar date' (and not at date of transaction).

User can specify 'calendar date' like his "

This attitude was described by Joy Mundy at its article

http://www.rkimball.com/html/articles_search/articles2007/07012IE.html

(but he uses operational natural key plus the Row Start Date

and I want to use durable surrogate key plus the Row Start Date,

but the principe is the same - using BETWEEN for date).

Mirek

I want to have DimSurrKey and DimKeyDurable in FactTable

because it will be easier to get correct version of member for:

1)

**as of**... I can use FactTable.DimSurrKey = DimTable.SurrKeyin JOIN condition of SELECT

and I will get correct CHANGING historical

version of dimension member (depending on transaction date);

2)

**as was**... I can use FactTable.DimKeyDurable=DimTable.KeyDurableAND some 'calendar date' BETWEEN dimTable.StartDay AND dimTable.EndDay

in JOIN condition of SELECT

and I will get STATIC version of dimension member

which was valid at 'calendar date' (and not at date of transaction).

User can specify 'calendar date' like his "

**reference point of view**" here.This attitude was described by Joy Mundy at its article

http://www.rkimball.com/html/articles_search/articles2007/07012IE.html

(but he uses operational natural key plus the Row Start Date

and I want to use durable surrogate key plus the Row Start Date,

but the principe is the same - using BETWEEN for date).

Mirek

**mirek.1**- Posts : 3

Join date : 2011-01-10

Location : Prague, Czech Republic, Europe

## Re: How to split the dimension members with SCD Type 2?

Hi Mirek,

I try to suggest the following star model:

Dim table

SurrKey | KeyDurable | Code | Description | StartDate | EndDate |IsValid

1 | 1 | A | this is first version of A member | 2010101 | 20100131 | 0

2 | 1 | A | this is second version of A member | 2010201 | 20100228 | 0

3 | 1 | A | this is third version of A member | 2010301 | 99991231 | 1

4 | 4 | B | this is first version of B member | 2010101 | 20100131 | 0

5 | 4 | B | this is second version of B member | 2010201 | 20100228 | 0

6 | 4 | B | this is third version of B member | 2010301 | 99991231 | 1

Datum table with some rows

DateKey__Datum

1________20100101

2________20100102

3________20100103

32_______20100202 ..

Fact table with some rows

DateKey__DimSurrKey___Fact

1________1_____________N1

2________1_____________N2

..

32_______2_____________N30

Query1: you should sum facts from 20100101 to 20100202 by Dimensions as they were in the time of the transaction

There will be 2 rows concerning the dim A

A.First version of A ________ N1+N2+

A.Second version of A ______N30

SELECT Dim.Description, sum(Fact.fact)

FROM Dim dim, Fact fact, Datum datum

WHERE datum.Datum between to_date('20100101', 'YYYYMMDD') and to_date('20100202', 'YYYYMMDD')

and Fact.DateKey = Datum.dateKey

and Fact.DimSurrKey = Dim.SurrKey

GROUP by Dim.Description

Query2: you should count facts from 20100101 to 20100228 by Dimensions and you want to report for the common Code

A________N1+N2+..N30 (This time you will group the fact data for the Code).

SELECT Dim.code, sum(Fact.fact)

FROM Dim dim, Fact fact, Datum datum

WHERE datum.Datum between to_date('20100101', 'YYYYMMDD') and to_date('20100202', 'YYYYMMDD')

and Fact.DateKey = Datum.dateKey

and Fact.DimSurrKey = Dim.SurrKey

GROUP by Dim.code

I try to suggest the following star model:

Dim table

SurrKey | KeyDurable | Code | Description | StartDate | EndDate |IsValid

1 | 1 | A | this is first version of A member | 2010101 | 20100131 | 0

2 | 1 | A | this is second version of A member | 2010201 | 20100228 | 0

3 | 1 | A | this is third version of A member | 2010301 | 99991231 | 1

4 | 4 | B | this is first version of B member | 2010101 | 20100131 | 0

5 | 4 | B | this is second version of B member | 2010201 | 20100228 | 0

6 | 4 | B | this is third version of B member | 2010301 | 99991231 | 1

Datum table with some rows

DateKey__Datum

1________20100101

2________20100102

3________20100103

32_______20100202 ..

Fact table with some rows

DateKey__DimSurrKey___Fact

1________1_____________N1

2________1_____________N2

..

32_______2_____________N30

Query1: you should sum facts from 20100101 to 20100202 by Dimensions as they were in the time of the transaction

There will be 2 rows concerning the dim A

A.First version of A ________ N1+N2+

A.Second version of A ______N30

SELECT Dim.Description, sum(Fact.fact)

FROM Dim dim, Fact fact, Datum datum

WHERE datum.Datum between to_date('20100101', 'YYYYMMDD') and to_date('20100202', 'YYYYMMDD')

and Fact.DateKey = Datum.dateKey

and Fact.DimSurrKey = Dim.SurrKey

GROUP by Dim.Description

Query2: you should count facts from 20100101 to 20100228 by Dimensions and you want to report for the common Code

A________N1+N2+..N30 (This time you will group the fact data for the Code).

SELECT Dim.code, sum(Fact.fact)

FROM Dim dim, Fact fact, Datum datum

WHERE datum.Datum between to_date('20100101', 'YYYYMMDD') and to_date('20100202', 'YYYYMMDD')

and Fact.DateKey = Datum.dateKey

and Fact.DimSurrKey = Dim.SurrKey

GROUP by Dim.code

**gvarga**- Posts : 43

Join date : 2010-12-15

## Re: How to split the dimension members with SCD Type 2?

Do a self join on the member dimension on the KeyDurable. Make sure you filter the desired date to be between the Start Date and End Date.

**Jeff Smith**- Posts : 471

Join date : 2009-02-03

## Re: How to split the dimension members with SCD Type 2?

Thank you very much to you gvarga for fully concrete example! It follows

So it suits to standard, natural,

And thank you very much to you Jeff. Your advice solves "artificial" date, that is the date

which can user select like other, ad hoc,

If I use Jeff advice to gvarga example I will get this solution:

SELECT Dim2.Description, SUM(Fact.fact) -- here is Description column for second Dim table

FROM Dim dim, Fact fact, Datum datum, Dim dim2 -- here is second Dim table

WHERE datum.Datum BETWEEN to_date('20100101', 'YYYYMMDD') AND to_date('20100202', 'YYYYMMDD')

AND Fact.DateKey = Datum.dateKey

AND Fact.DimSurrKey = Dim.SurrKey

AND dim.KeyDurable = dim2.KeyDurable -- here is JOIN condition for self join

AND to_date ('20100215', 'YYYYMMDD') -- here is date of user choice (for second version of A)

BETWEEN dim2.StartDate AND dim2.EndDate

GROUP BY Dim2.Description

And it works nice - it returns one row: total sum (N1+N2+...+N30) with description for

And if I will change '20100215' to '20100105',

I will get one row: total sum (N1+N2+...+N30) with description for

But I have one comment to it.

if I add DimKeyDurable column into Fact table like second foreign key to Dim table:

and number 4 for B member

Then I can reformulate SELECT into simplier form like this:

SELECT Dim.Description, SUM(Fact.fact)

FROM Dim dim, Fact fact, Datum datum

WHERE datum.Datum BETWEEN to_date('20100101', 'YYYYMMDD') AND to_date('20100202', 'YYYYMMDD')

AND Fact.DateKey = Datum.dateKey

AND Fact.DimKeyDurable = dim.KeyDurable

AND to_date ('20100215', 'YYYYMMDD') BETWEEN dim.StartDate AND dim.EndDate

GROUP BY Dim.Description

So

User can also

instead of some concrete date (like '20100215').

But there are two minuses:

1) Fact table will be increased by using of durable key (DimKeyDurable) = 4 bytes (INT) for every dimension.

2) ETL process will be a little more complicated.

But all we have solved up to this moment was only the beginning, because

we have concerned only

but

The dimension member with code 'A' was persisted during changing of its Description column.

But I am interested

into say two dimension members (like A1 and A2) and later splitting A2 into three dimension members

(like A2.1, A2.2 and A2.3) etc. Then I want that my SELECT will sum measures columns from Fact table (N1 + N2 + ...)

for A + A1 + A2 etc., because

This is more complicated task and I think it can be solved only with help of some sort of hierarchy,

maybe with

maybe

Have you any next idea?

Thanks in advance.

Mirek

**classic SCD Type 2**.So it suits to standard, natural,

**"historical", transaction date**, and it works very nice.And thank you very much to you Jeff. Your advice solves "artificial" date, that is the date

which can user select like other, ad hoc,

**"reference" date**of interpretation od dimension members.If I use Jeff advice to gvarga example I will get this solution:

SELECT Dim2.Description, SUM(Fact.fact) -- here is Description column for second Dim table

FROM Dim dim, Fact fact, Datum datum, Dim dim2 -- here is second Dim table

WHERE datum.Datum BETWEEN to_date('20100101', 'YYYYMMDD') AND to_date('20100202', 'YYYYMMDD')

AND Fact.DateKey = Datum.dateKey

AND Fact.DimSurrKey = Dim.SurrKey

AND dim.KeyDurable = dim2.KeyDurable -- here is JOIN condition for self join

AND to_date ('20100215', 'YYYYMMDD') -- here is date of user choice (for second version of A)

BETWEEN dim2.StartDate AND dim2.EndDate

GROUP BY Dim2.Description

And it works nice - it returns one row: total sum (N1+N2+...+N30) with description for

**SECOND**version of A.And if I will change '20100215' to '20100105',

I will get one row: total sum (N1+N2+...+N30) with description for

**FIRST**version of A.But I have one comment to it.

__I can avoid using self join in SELECT__if I add DimKeyDurable column into Fact table like second foreign key to Dim table:

__DimKeyDurable will be constant__for january, february etc - it will be number 1 for A memberand number 4 for B member

Then I can reformulate SELECT into simplier form like this:

SELECT Dim.Description, SUM(Fact.fact)

FROM Dim dim, Fact fact, Datum datum

WHERE datum.Datum BETWEEN to_date('20100101', 'YYYYMMDD') AND to_date('20100202', 'YYYYMMDD')

AND Fact.DateKey = Datum.dateKey

AND Fact.DimKeyDurable = dim.KeyDurable

AND to_date ('20100215', 'YYYYMMDD') BETWEEN dim.StartDate AND dim.EndDate

GROUP BY Dim.Description

So

__SELECT is easier to write__and - I hope - will be rapidly executed.User can also

__imitate SCD Type 1__, if he uses 'TODAY' (in Oracle SYSDATE)instead of some concrete date (like '20100215').

But there are two minuses:

1) Fact table will be increased by using of durable key (DimKeyDurable) = 4 bytes (INT) for every dimension.

2) ETL process will be a little more complicated.

But all we have solved up to this moment was only the beginning, because

we have concerned only

**process of changing**of dimension attributes (Description column)but

__the identity of dimension member was solid (unchanged)__for all the time:The dimension member with code 'A' was persisted during changing of its Description column.

But I am interested

__how to solve potentional__(like 'A')**process of splitting**of one member dimensioninto say two dimension members (like A1 and A2) and later splitting A2 into three dimension members

(like A2.1, A2.2 and A2.3) etc. Then I want that my SELECT will sum measures columns from Fact table (N1 + N2 + ...)

for A + A1 + A2 etc., because

__A1 and A2__, A1 and A2 are its "sons".**"belongs"**to AThis is more complicated task and I think it can be solved only with help of some sort of hierarchy,

maybe with

**ragged hierarchy**- so I will be forced to add new column into Dim table,maybe

**SurrKeyParent**(or**KeyDurableParent**??). But how to continue with creating of SELECTs?Have you any next idea?

Thanks in advance.

Mirek

**mirek.1**- Posts : 3

Join date : 2011-01-10

Location : Prague, Czech Republic, Europe

Similar topics

» The HSP Dimension is 4 years old today!

» Gallery (Members)

» I like big butts and I cannot lie

» User Guide and Handbook - A must Read.

» Ebay My Vehicles sweepstakes ends 3/15/12 *Usa ebay members only*

» Gallery (Members)

» I like big butts and I cannot lie

» User Guide and Handbook - A must Read.

» Ebay My Vehicles sweepstakes ends 3/15/12 *Usa ebay members only*

Page

**1**of**1****Permissions in this forum:**

**cannot**reply to topics in this forum