Assign default value for a dimension

View previous topic View next topic Go down

Assign default value for a dimension

Post  MLaforge on Wed Apr 07, 2010 4:36 pm

Hi,

We have the following case:
For the product code in the source data, lookup the product natural key in the product dimension and assign the surrogate key returned by the lookup to the fact. If the lookup fails, assign the unknown product whose natural key is "Unknown". The unknown product is entered initially in the dimension and should never change afterwards, we use negative SID for default entries (-1 for "unknown", -2 for "not applicable" and -3 for "not available").

In the ETL design, there is a debate on how to implement this, some want to hard code the SID for all dimension with default values (ex: if the lookup fails put the value -1 in the key field). However, I have doubts since the ETL has to validate at all times that the DB integrity is not compromised (integrity is turned off in the DB). My opinion is that the "Unknown" dimension entry should always be retrieved for a "default" lookup in the ETL to verify this integrity. This would result in the following ETL:
1. Lookup the product SID in the product dimension using the natural key
2. If the product is found, assign the product SID to the fact
3. Else lookup the default SID using the "Unknown" natural key and assign the SID retrieved. (a similarl lookup can be done for "not applicable" or "not available)

This method works for all dimensions and guarantees the integrity of the database. This also protects against "hard coding" a surrogate key which doesn't sound too good imho... Even if the default keys will always be -1, -2 and -3 we should not use this values directly.

Any thoughts on this topic?

Thanks in advance,

Mathieu

MLaforge

Posts : 2
Join date : 2010-04-07

View user profile

Back to top Go down

Re: Assign default value for a dimension

Post  ngalemmo on Wed Apr 07, 2010 5:10 pm

For the product code in the source data, lookup the product natural key in the product dimension and assign the surrogate key returned by the lookup to the fact.

You lookup the product natural key.... so how is it the natural key is unknown?

If you cannot find a natural key in a dimension table you insert a new row WITH THAT NATURAL KEY. You do not throw it away and use some generic value. Now, you may set attributes, such as product description, to 'Unknown', but you need to retain the identity of the dimension row at all costs.

You need to ask yourself why would such a situation occur... assuming the application system has a product master that it refers to in transactions? It usually happens because something went wrong with the dimension update (job failed or wasn't run). You need to retain the product identity in the dimensions so that if and when the source system product master is updated, it would properly populate dimension attributes for that row.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Assign default value for a dimension

Post  MLaforge on Thu Apr 08, 2010 9:04 am

Thank you for your answer.

In that case, the source is not allowed to insert a new product. Products are maintained by a separate master source and an invalid product will be the result of an error in the source system for this ETL. We want to keep the record for the monetary fields but a report will be sent to the source with the invalid product code and necessary keys to identify the record.

I guess my questions could be summarized this way: "Are you in favor or not of hard coding a surrogate key in an ETL?"

Regards,

Mathieu

MLaforge

Posts : 2
Join date : 2010-04-07

View user profile

Back to top Go down

Re: Assign default value for a dimension

Post  ngalemmo on Thu Apr 08, 2010 11:32 am

The only time you may hard-code a surrogate key reference is in the case of a NULL natural key, but from an ETL standpoint it is easier to simply hard code a default natural key value when you encounter nulls and allow the normal surrogate key lookup process to handle it.

But the point I am trying to make is if you receive a transaction that deals with product "XYZ" and you don't know what "XYZ" is (i.e. it is not in the dimension table), you need to create an entry for "XYZ" so, at the very least, you know what product ID the numbers refer to. Without that, how do you reconcile things if there are 10 different facts for different unknown products all pointing to the same dimension row?
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Assign default value for a dimension

Post  Joy on Mon Apr 12, 2010 1:42 pm

There is no "one size fits all" for solving this problem. But hard-coding a default value (e.g. -1) is near the bottom of my list. Here is my list of potential solutions to the problem of how to handle RI failures between fact and dimension:

  • Halt processing and abort. Hard to imagine this is a good idea.
  • Throw away the bad fact row. I can imagine a handful of scenarios where this could be what you want to do, but rarely.
  • Divert the fact row to an error table. This is a common, simple technical solution. It should be implemented ONLY if you develop policies and procedures for getting the bad fact rows out of jail.
  • Add a placeholder row in the dimension table. Technically the hardest to implement (but not very hard). A nice solution because once you do receive the correct dimension attributes, you simply update the dimension row. The fact table has the correct surrogate key and hence does not need any correction. Avoiding fact table updates is good. Before implementing this strategy, I like to understand why I might receive a fact row before its corresponding dimension member.
  • Hard code to the "unknown member" (e.g. -1) This is commonly implemented because it's easy. The problem is that you're throwing away the information (transaction system key) needed to fix the problem. I really do not like this solution, and cannot recommend it as a best practice.

Good luck with your project. -- Joy
avatar
Joy

Posts : 20
Join date : 2009-02-03
Location : Kimball Group

View user profile http://www.kimballgroup.com

Back to top Go down

Re: Assign default value for a dimension

Post  sac587118 on Mon Dec 20, 2010 8:02 am

If you are not able to update dimension table, you can create one more reference table which can be inserted and updated and will hold negative running number (like surogate key) for each product which was not found in dimension table. You need to follow below steps
1. lookup on existing product dimension table if not found then lookup on newly added refence table, If lookup failed with respect to new reference table then insert one record with negative running number and product information.
2. When next time same product will come ( which not found in dimension table) that will be found in reference lookup table and negative running number will be passed to fact table.

This solution has 3 benifits
1. Allows every data load into fact table.
2. No need to use single default value which leads confusion in aggregation.
3. No data integrity issues in fact table.

While reprots you can take union of product dimension and new reference lookup table and then join with fact to get full data.

Cheers,
Sachin


avatar
sac587118

Posts : 2
Join date : 2010-12-10

View user profile

Back to top Go down

Re: Assign default value for a dimension

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum