Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

Modeling Dimensional Parent-Child Relations

3 posters

Go down

Modeling Dimensional Parent-Child Relations Empty Modeling Dimensional Parent-Child Relations

Post  mostafa_mahrous75 Wed Feb 25, 2015 4:10 am

We have a Data Modeling Case where there’s a CUSTOMER Dimension and Account Dimension and both of these two dimensions are SCD Type2. Typically a customer may have multiple accounts but each account belongs ALWAYS to a specific customer
CUSTOMER and ACCOUNT Dimension are referenced by TRANSACTIONS Fact Table. The issue is that we can track the relation between a customer and his account only at the time that a transaction occurs, we need to track that relation regardless of the transactions
We’re thinking of 2 options:

1- Having a Bridge Table which is joined to CUSTOMER and ACCOUNT Dimensions using Natural Keys of both Dimensions (Customer Number, Account Number)

2- Embedding the Customer Natural Key (Customer Number) into Account Dimension as a Snowflake

To get the relation at a specific point of time, we need to filter both of CUSTOMER and ACCOUNT dimension using ACTIVE_FROM_TIMESTAMP and ACTIVE_TO_TIMESTAMP columns

Our question is what is the Pros and Cons of both options, and whether you have any other ideas

mostafa_mahrous75

Posts : 8
Join date : 2011-11-20

Back to top Go down

Modeling Dimensional Parent-Child Relations Empty Re: Modeling Dimensional Parent-Child Relations

Post  ngalemmo Wed Feb 25, 2015 4:35 am

"The issue is that we can track the relation between a customer and his account only at the time that a transaction occurs, we need to track that relation regardless of the transactions"

That is exactly what type 2 dimensions are supposed to do.

Your question really has to do with the versions of each entity, not the relationship between entities.

When you design a type 2 dimension you include a 'durable' key, one that does not change across versions of the entity.  This may be the natural key or a surrogate (the latter being more efficient as a key).

The basic way to use a type 2 in your situation would be to perform a self join on the dimension using the durable key to locate the particular version of the entity you wish to use.  Typically this is the 'current' version, but it is also possible to choose a version at a particular point in time.

If you don't like self-joins, an alternate method is to store the durable key on the fact as well (always use a surrogate).

In a bridge you would use the durable keys, so the bridge does not change as entities change (again, always use a surrogate).
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Modeling Dimensional Parent-Child Relations Empty Re: Modeling Dimensional Parent-Child Relations

Post  mostafa_mahrous75 Wed Feb 25, 2015 5:56 am

In case of using the surrogate keys in the Bridge Table, then we’ll have to include ACTIVE_FROM_TIMESTAMP and ACTIVE_TO_TIMESTAMP column in the Bridge Table which will lead to that whenever CUSTOMER OR ACCOUNT dimensions is affected by Type 2 Change, the same effect should be propagated to the Fact record as well, so that ACTIVE_TO_TIMESTAMP column in the old fact record is updated to the change time and a new Fact record is created with ACTIVE_FROM_TIMESTAMP column is set to the change time and ACTIVE_TO_TIMESTAMP is set to infinity. This will surely add a burden to ETL

However, if use Natural Key instead of Surrogate keys, we’ll avoid reflecting Type 2 changes to the Fact Table (of course we must filter both of Account and Customer Dimensions to a specific point of time to get the corresponding versions together)

mostafa_mahrous75

Posts : 8
Join date : 2011-11-20

Back to top Go down

Modeling Dimensional Parent-Child Relations Empty Re: Modeling Dimensional Parent-Child Relations

Post  ngalemmo Wed Feb 25, 2015 1:21 pm

No. A durable key is the same as a type 1 key. It has nothing to do with changes to the entity. If you have an active date range it would indicate the time the customer-account relationship existed, not when something about the customer or account changed.

The dates that relate to when a particular version of a customer or account is in effect is stored in the respective dimension table. Not on bridges or facts.

Don't think that a natural key is any different from a surrogate key. You can have type 2 surrogate keys or type 1 surrogate keys. A type 1 key is 1:1 with the natural key. That is the durable key I am referring to. It is an alternate non-unique key to the dimension table. You use the effective timestamps, also carried in the dimension table, to locate a specific row. It is the exact same thing as a natural key, but with additional benefits (smaller, more efficient, stable). Storing the type 1 surrogate on the fact is optional. It is not necessary to handle your situation (i.e. you do a self join on the dimension).
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Modeling Dimensional Parent-Child Relations Empty Re: Modeling Dimensional Parent-Child Relations

Post  hang Wed Feb 25, 2015 6:27 pm

Durable key (DK) is more thoughtful as suggested by ngalemmo, however if your natural key (NK) never gets reused, you may just refer to natural key as durable key for simplicity.

Your option 1 is good for tracking the change of Customer-Account relationship. It would be very efficient to get the current version of both dimensions with current flag bitmap/columnar indexed, but needs to date-range both dimensions to get the historic version. However, surrogate key (SK) in bridge is mainly focused on historic version by straight joins. The approach could grow the bridge alarmingly if any relevant dimension is type 2 frequently changing dimension.

Option 2 only works if an account will never be shared by more than one customers. However you still have the dilemma of NK vs. SK. Since account dimension is likely big, you may just store customer NK in the account dimension, if current customer version is your main focus, for the same reason in bridge scenario. Alternatively, you may include both customer SK and NK in account dimension to cater for both purposes, but need to have type 2 response on account in case of type 2 customer change.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

Back to top Go down

Modeling Dimensional Parent-Child Relations Empty Re: Modeling Dimensional Parent-Child Relations

Post  ngalemmo Wed Feb 25, 2015 6:57 pm

If you are doing a self-join to locate a version of an entity, the natural key is fine. However, if you plan to propagate it to bridges or facts it raises a number of issues. That is why I strongly recommend a surrogate durable key.

It's pretty easy to set up. You set it to the dimension row's primary key when you add the entity for the first time, then simply copy it for any new version of the entity you create.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Modeling Dimensional Parent-Child Relations Empty Re: Modeling Dimensional Parent-Child Relations

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum