Junk Dimension Implementation

View previous topic View next topic Go down

Junk Dimension Implementation

Post  bulu on Tue Sep 16, 2014 1:36 am

Hi,

I am new to dimensional modeling. I have been asked to implement a junk dimension on a fact table that has too many char/varchar columns.

I plan to move many of those columns that are of the type 'flags' or 'status indicator' into the junk dimension table.

The problem however is that many of these columns that I plan to move to junk dimension, are used by ETL jobs when populating the fact table. For example, the source data coming from source system is left joined with the existing fact table to decide what to load to the fact table and some of the columns I plan to move to junk dimension are part of the join, and some are used to filter data further.

Now, if I go ahead with this implementation, it will require significant changes to existing ETL jobs. Even then, how is it to be done? Do I first join the junk dimension with the fact table to make it as it was before and then compare with the source system data?

Regards,
Bulu

bulu

Posts : 3
Join date : 2014-09-15

View user profile

Back to top Go down

Re: Junk Dimension Implementation

Post  nick_white on Tue Sep 16, 2014 12:24 pm

Hi - your existing design sounds odd (not that it is necessarily wrong as there are always reasons for building what might be considered non-standard designs). Why are you joining back to the source data to determine what to load into a Fact table? A Fact is usually an event so when it occurs in your source system you just create a new record in your fact table - so your logic would all be in your system as to whether to create a fact record rather than having to refer to your target.
Comparing source and target is normal for Dimension tables, but not for Facts.

Can you provide a bit more detail around what the Fact record actually holds, why you have to compare it to the source, what these junk columns are that are being used in the ETL process, etc?

Thanks,

nick_white

Posts : 366
Join date : 2014-01-06
Location : London

View user profile

Back to top Go down

Re: Junk Dimension Implementation

Post  ngalemmo on Tue Sep 16, 2014 1:49 pm

Yeah the design sounds odd… but whatever…

The fundamental principle you need to follow is to conform your source data to the fact before you compare the two. That means you need to resolve the surrogate keys. Which won't work if what you are trying to do is detect changed rows. Rows that have changed would have different keys and you will not be able to find the row you are trying to check.

Things would be a lot better if you can identify a stable PK for the fact.

Another approach would be to maintain a source compatible version of the facts in staging and use that as a point of comparison. This staging table need not contain all facts, just those rows for a period of time suitable to cover when such changes may occur.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

reply

Post  bulu on Fri Sep 19, 2014 1:41 am

Hi,

Thanks for the response. Yes, the design is not perfect and it was done with some time constraint.

It's basically a timesheet application where people enter the hours worked and related codes.

The source system transaction table gets records for new entries and sometimes existing entries can be updated too along with a timestamp that says when it wwas updated. Currently ETL jobs fetch the new and changed records with the changed timestamp.

The reason those are joined with the existing fact table in the target system is that, apart from inserting the changed records into the fact table, ETL jobs look for in the existing fact table the older versions of the records that got changed in the source system and mark them as inactive by updating a column. So for the changed records from source system, we keep the history in the target fact table.

Now, if some columns were used in that join, is it alright to move them to a junk dimension? If we can then how to implement it?

Regards,
Bulu


bulu

Posts : 3
Join date : 2014-09-15

View user profile

Back to top Go down

Re: Junk Dimension Implementation

Post  ngalemmo on Mon Sep 22, 2014 11:47 am

But what you have is an ETL solution disguised as a query environment. Have a persistent staging area of transition information that is essentially a model of the source data, and use it to detect differences (change) between versions. Record the time of the change in the fact and store the difference.

The net change approach allows the fact table to be write-only during load. Fact rows are not updated. This allows the user to view historical values based on a point of view, as transaction time and update (to DW) time are both available as dimensions.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Junk Dimension Implementation

Post  bulu on Thu Sep 25, 2014 6:50 am

Hi Nick,

Thanks for your comments. I may not have the mandate to change the existing design at this stage, but I am wondering about what you said below.

"Fact rows are not updated. "

Are you saying that any updating of the Fact rows, instead of an insert, is not a good practice?


Regards,
Bulu

bulu

Posts : 3
Join date : 2014-09-15

View user profile

Back to top Go down

Re: Junk Dimension Implementation

Post  ngalemmo on Thu Sep 25, 2014 11:15 am

No, I am not saying that. I was offering an alternative approach to your situation. From what you describe, it appears the crux of your problem is you do not have a proper primary key in the fact table to perform the updates.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Junk Dimension Implementation

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum