Business keys or Natural keys in the Fact table

View previous topic View next topic Go down

Business keys or Natural keys in the Fact table

Post  ParuD on Fri Sep 16, 2011 5:32 am

As I have read from Kimball methodology the fact table always contains surrogate keys of the dimensions and the measures. My take away on this is when the reporting is done the entry point to the facts is always through dimensions that joins on the surrogate keys in the facts. I was under the impression that we don't need the business keys at all in the fact table.

But very recently I have seen few architects who say there is advantage in keeping the business keys in the fact table as it sometimes avoid joins to the dimensions in the reports. I am not very clear under which scenarios would this happen.

Can you please help me to understand in which scenarios are the business keys included in the fact table and the advantages of doing so?

ParuD

Posts : 2
Join date : 2011-09-13

View user profile

Back to top Go down

Re: Business keys or Natural keys in the Fact table

Post  BoxesAndLines on Fri Sep 16, 2011 9:27 am

These people are leading you astray. Joins are not bad. Wide fact tables are bad. First they add the natural key columns, then it will be another column and pretty soon you have 300 columns in your fact table of which 50% is dimension data.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Business keys or Natural keys in the Fact table

Post  KS_EDW on Fri Sep 16, 2011 10:06 am

Couple of things…

We maintain some keys for quick reference during deletes. Many times the production system will delete a row for legal reasons (HR, Law Enforcement, courts, etc.). Those need changed immediately in EDW so, having the keys helps us find and delete the fact row within minutes of notification.

Sometimes the attributes alone are not enough to maintain the grain of the fact table, thus the natural key is maintained. (This is extremely very really rare but does happen.)

Our EDW supports large extracts and db links for external systems (organization oversight systems (regulatory), contractors, etc.). In many cases, we need to pass them our business keys. Having those in the fact facilitates that requirement.

Most BI tool semantic/metadata layers will allow a fact to fact connection – using the natural keys is tempting. I would NOT recommend doing this because it creates exceptions in an otherwise standard methodology, i.e. “all these hundreds of facts use a dimension to communicate but these two are the exception”.

Honestly, I would not recommend carrying the key in the fact table as a means to connect the fact tables… that method has “potential regret” written all over it. BUT… with that said, in your situation… it may work. (clear as mud… right? Add to that… BoxesAndLines is correct… they will keep pushing.)

In our shop we avoid it where we can but… it is most important that we meet our user’s requirements… if the users say, “key” (with good reason) we say “yes”.
avatar
KS_EDW

Posts : 20
Join date : 2011-09-07
Age : 42
Location : Kansas

View user profile

Back to top Go down

Re: Business keys or Natural keys in the Fact table

Post  ngalemmo on Fri Sep 16, 2011 4:05 pm

The only acceptable reason to put a natural key into a fact table is the case of a degenerate dimension.

Storing the natural key in the fact table, which also appears in a dimension table with the appropriate surrogate FK from the fact, does not improve one's ability to identify a fact row and only serves to hamper performance for no real gain.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Business keys or Natural keys in the Fact table

Post  John Simon on Sun Sep 18, 2011 7:29 pm

The only reason you may want to store a natural business key in the fact table is if you need to see the latest version for that particular dimension. This should be very rare - see here for examples of performance using natural keys vs surrogate keys:
http://jsimonbi.wordpress.com/2011/09/24/tracking-history-with-slowly-changing-dimensions/


Otherwise you are simply affecting performance.

Here's a link to an post on my blog that shows the performance impact of storing the natural key and other non-foreign keys in the fact table:
http://jsimonbi.wordpress.com/2011/05/16/dimensional-modeling-worst-practices/

John Simon

Posts : 186
Join date : 2009-05-29
Location : Melbourne, Australia

View user profile http://jsimonbi.wordpress.com

Back to top Go down

Business / Natural Key in Fact Table

Post  pitbull mix on Wed Oct 24, 2012 11:03 am

ngalemmo wrote:The only acceptable reason to put a natural key into a fact table is the case of a degenerate dimension.

Storing the natural key in the fact table, which also appears in a dimension table with the appropriate surrogate FK from the fact, does not improve one's ability to identify a fact row and only serves to hamper performance for no real gain.

Hello,

You say not to put a natural or business key in the fact table. However, after I created my model I was faced with the task of keeping my fact table updated as related to the source system. If you don't have a way to know what changes happened related to facts how do you keep your fact table up to date? I could put all of the events in a dimension (yuck!), but that creates a one to one relationship that I don't want to maintain and is considered poor design.

In my scenario I record absence events for students for various source systems. Changes occur when a student transfers from one school to another meaning the absence event is now related to where the student is currently. I don't want to delete the event/fact, but do want to acknowledge that this record or the event has been removed or moved in the source system. I just mark the record to 'N' for not current.

All of this relates to keeping the fact table up to date. I'm finding it very useful to place a natural key in the fact table, but am open ideas or another approach to staying in synch with the source systems. Also, this business key is not used in a dimension that connects to the fact table.

Thanks!


pitbull mix

Posts : 8
Join date : 2012-06-11

View user profile

Back to top Go down

Re: Business keys or Natural keys in the Fact table

Post  LAndrews on Wed Oct 24, 2012 1:06 pm

In my scenario I record absence events for students for various source systems. Changes occur when a student transfers from one school to another meaning the absence event is now related to where the student is currently. I don't want to delete the event/fact, but do want to acknowledge that this record or the event has been removed or moved in the source system. I just mark the record to 'N' for not current.

What you are describing here is restating the fact .... dangerous road.

The absence event occured in school A. That is the fact.

When the student transfers to school B, any new absences occur in school B. That is the fact.

If you want to report all the students absences as if they occured in school B, there are many reporting techniques to do this (i.e. reporting using the current school).


LAndrews

Posts : 132
Join date : 2010-05-13
Location : British Columbia, Canada

View user profile

Back to top Go down

Re: Business keys or Natural keys in the Fact table

Post  pitbull mix on Wed Oct 24, 2012 2:29 pm

LAndrews wrote:
In my scenario I record absence events for students for various source systems. Changes occur when a student transfers from one school to another meaning the absence event is now related to where the student is currently. I don't want to delete the event/fact, but do want to acknowledge that this record or the event has been removed or moved in the source system. I just mark the record to 'N' for not current.

What you are describing here is restating the fact .... dangerous road.

The absence event occured in school A. That is the fact.

When the student transfers to school B, any new absences occur in school B. That is the fact.

If you want to report all the students absences as if they occured in school B, there are many reporting techniques to do this (i.e. reporting using the current school).


Ah yes, this is the adjustment that I need to make in the ETL when loading into the fact table. This will help clear up mixing of events between locations. I agree with you that where the absence that occurs is the fact. I need to revise my ETL. I can report on the current school by using the student dimension which tracks the current school. Thanks!

pitbull mix

Posts : 8
Join date : 2012-06-11

View user profile

Back to top Go down

Re: Business keys or Natural keys in the Fact table

Post  ngalemmo on Wed Oct 24, 2012 3:48 pm

cdowney wrote:You say not to put a natural or business key in the fact table. However, after I created my model I was faced with the task of keeping my fact table updated as related to the source system. If you don't have a way to know what changes happened related to facts how do you keep your fact table up to date? I could put all of the events in a dimension (yuck!), but that creates a one to one relationship that I don't want to maintain and is considered poor design.

Your dimension tables contain the surrogate key and the natural key. Your fact tables only contain the surrogate key. How would you not know the natural key value based on facts? Relational databases allow you to join tables.

There is a 1:1 relationship between natural keys and surrogate keys. If you know one it shouldn't be hard to figure out the other.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Business Key in Fact Table

Post  pitbull mix on Wed Oct 24, 2012 4:18 pm

ngalemmo wrote:
cdowney wrote:You say not to put a natural or business key in the fact table. However, after I created my model I was faced with the task of keeping my fact table updated as related to the source system. If you don't have a way to know what changes happened related to facts how do you keep your fact table up to date? I could put all of the events in a dimension (yuck!), but that creates a one to one relationship that I don't want to maintain and is considered poor design.

Your dimension tables contain the surrogate key and the natural key. Your fact tables only contain the surrogate key. How would you not know the natural key value based on facts? Relational databases allow you to join tables.

There is a 1:1 relationship between natural keys and surrogate keys. If you know one it shouldn't be hard to figure out the other.

Are you saying that I do need to have a dimension that has every absence event? The surrogate and natural key would be in this dimension and related to the fact table, but then it becomes a one to one. So for every absence fact that I enter I will enter a dimension record. Is this poor design and what's the alternative to finding the fact if you want to update , remove? Thanks!


pitbull mix

Posts : 8
Join date : 2012-06-11

View user profile

Back to top Go down

Re: Business keys or Natural keys in the Fact table

Post  adypoko on Wed Oct 24, 2012 4:33 pm

Depending on the business key (natural key) data type and the likelihood of it changing over time, or it being recycled, you may decide to use it, or create a surrogate key for it. I always have a unique surrogate key on my dimensions, and that will be a FK in the fact. But in the event the dimension has SCD 2 attributes then I am also considering adding the the business key (if appropriate data type) to the fact table, or create another surrogate key that is one to one to the business key in the dimension (it doesn't change when the dimension record versions) and add that additional surrogate key to the fact table. This allows for 3 types of reporting:
1. As Was (fact joins to the dimension on the unique surrogate key).
2. Current (fact joins to the dimension on the business key or additional surrogate key and a current record indicator being true)
3. As Of (fact joins to the dimension on the business key or additional surrogate key and As Of Date being between the effective dates of the dimension). This makes sense when the As Of date is different than the fact date.
Employee Dimension
Surrogate Key Business Key/ Additional Surrogate KeyEmployee Name Region Eff StartDt Eff End Dt Current
1 107 John Smith East 01/01/2011 07/01/2011 FALSE
2 107 John Smith West 07/02/2011 12/31/2099 TRUE
3 243 Jane Doe East 01/01/2011 12/31/2099 TRUE

Fact table
SaleDate Emp SurrKey Emp Add KeyDollar Amount
02/15/2011 1 107 $50
03/30/2011 3 243 $25
06/30/2011 1 107 $5
08/24/2012 2 107 $10

So, sales by "As Was" region :
Select Region, SUM (Dollar Amount)
FROM Fact
Join Employee ON Emp surr Key = Surrogate Key
group by REGION
REGION SUM
EAST $50 + $25 + $5
WEST $10

sales by "Current" region :
Select Region, SUM (Dollar Amount)
FROM Fact
Join Employee ON Emp Add Key = Additional Key and Current = TRUE
group by REGION
REGION SUM
EAST $25
WEST $50 + $5 + 10

sales by "As Of 01/01/2012" region :
Select Region, SUM (Dollar Amount)
FROM Fact
Join Employee ON Emp Add Key = Additional Key and 01/01/2012 BETWEEN eff st dt and eff end dt
group by REGION
REGION SUM
EAST $25
WEST $50 + $5 + 10

There may be instances where there is no reason to create that additional surrogate key and just use the Business Key or Natural Key instead: e.g. when data type doesn't impact joining performance and it is not recycled by the business and it is not re-cast, or when none of the drawbacks of using a business key instead of a surrogate key are present. In those instance by just using the business key or natural key will save you some etl processing time, some maintenance of an additional set of surrogate key, and it's readily available in the fact to create metrics such as counts distinct, as count distinct would not work on a true unique surrogate key for SCD 2 dimensions. Also joining to a dimension and getting a measure from there might not work well with some BI tools. E.g. counting distinct business key from the dimension instead of having the business key on the fact table itself.


adypoko

Posts : 5
Join date : 2012-10-24

View user profile

Back to top Go down

Re: Business keys or Natural keys in the Fact table

Post  hang on Wed Oct 24, 2012 6:16 pm

I think both SK and NK or DK (Durable Key) in fact table make sense based on Kimball's recent article:

http://www.kimballgroup.com/2010/06/21/extreme-status-tracking-for-real-time-customer-analysis/

Before reading this article, I used to rely on self join on dimension to achieve the same result, ie. applying current dimension version for the fact in the past. Self join would work quite well if the dimension is not big. However in case of customer dimension, an extra join would be quite significant for performance, while adding a NK in the fact table is a good tradeoff.

I guess it comes down to educating user to understand what these different types keys in fact table are meant to be used. On the same token, self join approach also involves some education.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Business keys or Natural keys in the Fact table

Post  Dave Jermy on Thu Oct 25, 2012 8:47 am

We have put the NK into one fact table. The fact is a lifetime summary of a customer's activity and as such has only one row per customer. Since our customer dimension is SCD2, either have to use the NK or go and update the SK in the fact every time there is a new SCD2 change. The downside is remembering that you have to return a specific customer row (usually the current one) when querying the lifetime fact.

Dave Jermy

Posts : 33
Join date : 2011-03-24
Location : London, UK

View user profile

Back to top Go down

Re: Business keys or Natural keys in the Fact table

Post  adypoko on Thu Oct 25, 2012 10:12 am

hang wrote:I think both SK and NK or DK (Durable Key) in fact table make sense based on Kimball's recent article:

http://www.kimballgroup.com/2010/06/21/extreme-status-tracking-for-real-time-customer-analysis/

Before reading this article, I used to rely on self join on dimension to achieve the same result, ie. applying current dimension version for the fact in the past. Self join would work quite well if the dimension is not big. However in case of customer dimension, an extra join would be quite significant for performance, while adding a NK in the fact table is a good tradeoff.

I guess it comes down to educating user to understand what these different types keys in fact table are meant to be used. On the same token, self join approach also involves some education.
That is similar to what I suggested above. What Kimball calls a durable key, I was calling it an additional surrogate key and suggesting that depending on the natural key, the natural key may be used instead of the additional surrogate key. I am happy to see that Kimball is on the same page with me. Also, I may sound a bit of a wise guy, but as a subscriber to Kimball's desing tips, I wouldn't call a June 2010 article recent.

adypoko

Posts : 5
Join date : 2012-10-24

View user profile

Back to top Go down

Re: Business keys or Natural keys in the Fact table

Post  ngalemmo on Thu Oct 25, 2012 5:26 pm

hang wrote:I think both SK and NK or DK (Durable Key) in fact table make sense based on Kimball's recent article:

http://www.kimballgroup.com/2010/06/21/extreme-status-tracking-for-real-time-customer-analysis/

Before reading this article, I used to rely on self join on dimension to achieve the same result, ie. applying current dimension version for the fact in the past. Self join would work quite well if the dimension is not big. However in case of customer dimension, an extra join would be quite significant for performance, while adding a NK in the fact table is a good tradeoff.

I guess it comes down to educating user to understand what these different types keys in fact table are meant to be used. On the same token, self join approach also involves some education.

This has been around for years... the durable key is not the natural key however (NK's are not very durable), it is a type 1 surrogate key. Works fine and eliminates the need for a self-join to get the current row. Alternately, you field two dimension tables, a type 1 version and a type 2 version.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Business keys or Natural keys in the Fact table

Post  ngalemmo on Thu Oct 25, 2012 5:31 pm

cdowney wrote:
Are you saying that I do need to have a dimension that has every absence event? The surrogate and natural key would be in this dimension and related to the fact table, but then it becomes a one to one. So for every absence fact that I enter I will enter a dimension record. Is this poor design and what's the alternative to finding the fact if you want to update , remove? Thanks!
No. Events do not go in dimension tables. If you mean, should you have an event type dimension? Probably. This would be very small and the fact would reference that to get information describing the kind of event.... such as: absence - no notice, absence - sick, etc...
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Kimball Group Design tip on Durable Key

Post  jmagana on Fri Nov 09, 2012 3:37 pm

There is a July 2012 article on Durable Keys. Also I do remember reading a posting where ngalemmo recommended this approach for historical lookups.
http://www.kimballgroup.com/2012/07/10/design-tip-147-durable-super-natural-keys/

jmagana

Posts : 5
Join date : 2010-11-29
Location : California

View user profile

Back to top Go down

Re: Business keys or Natural keys in the Fact table

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum