Assign default value for a dimension
4 posters
Page 1 of 1
Assign default value for a dimension
Hi,
We have the following case:
For the product code in the source data, lookup the product natural key in the product dimension and assign the surrogate key returned by the lookup to the fact. If the lookup fails, assign the unknown product whose natural key is "Unknown". The unknown product is entered initially in the dimension and should never change afterwards, we use negative SID for default entries (-1 for "unknown", -2 for "not applicable" and -3 for "not available").
In the ETL design, there is a debate on how to implement this, some want to hard code the SID for all dimension with default values (ex: if the lookup fails put the value -1 in the key field). However, I have doubts since the ETL has to validate at all times that the DB integrity is not compromised (integrity is turned off in the DB). My opinion is that the "Unknown" dimension entry should always be retrieved for a "default" lookup in the ETL to verify this integrity. This would result in the following ETL:
1. Lookup the product SID in the product dimension using the natural key
2. If the product is found, assign the product SID to the fact
3. Else lookup the default SID using the "Unknown" natural key and assign the SID retrieved. (a similarl lookup can be done for "not applicable" or "not available)
This method works for all dimensions and guarantees the integrity of the database. This also protects against "hard coding" a surrogate key which doesn't sound too good imho... Even if the default keys will always be -1, -2 and -3 we should not use this values directly.
Any thoughts on this topic?
Thanks in advance,
Mathieu
We have the following case:
For the product code in the source data, lookup the product natural key in the product dimension and assign the surrogate key returned by the lookup to the fact. If the lookup fails, assign the unknown product whose natural key is "Unknown". The unknown product is entered initially in the dimension and should never change afterwards, we use negative SID for default entries (-1 for "unknown", -2 for "not applicable" and -3 for "not available").
In the ETL design, there is a debate on how to implement this, some want to hard code the SID for all dimension with default values (ex: if the lookup fails put the value -1 in the key field). However, I have doubts since the ETL has to validate at all times that the DB integrity is not compromised (integrity is turned off in the DB). My opinion is that the "Unknown" dimension entry should always be retrieved for a "default" lookup in the ETL to verify this integrity. This would result in the following ETL:
1. Lookup the product SID in the product dimension using the natural key
2. If the product is found, assign the product SID to the fact
3. Else lookup the default SID using the "Unknown" natural key and assign the SID retrieved. (a similarl lookup can be done for "not applicable" or "not available)
This method works for all dimensions and guarantees the integrity of the database. This also protects against "hard coding" a surrogate key which doesn't sound too good imho... Even if the default keys will always be -1, -2 and -3 we should not use this values directly.
Any thoughts on this topic?
Thanks in advance,
Mathieu
MLaforge- Posts : 2
Join date : 2010-04-07
Re: Assign default value for a dimension
For the product code in the source data, lookup the product natural key in the product dimension and assign the surrogate key returned by the lookup to the fact.
You lookup the product natural key.... so how is it the natural key is unknown?
If you cannot find a natural key in a dimension table you insert a new row WITH THAT NATURAL KEY. You do not throw it away and use some generic value. Now, you may set attributes, such as product description, to 'Unknown', but you need to retain the identity of the dimension row at all costs.
You need to ask yourself why would such a situation occur... assuming the application system has a product master that it refers to in transactions? It usually happens because something went wrong with the dimension update (job failed or wasn't run). You need to retain the product identity in the dimensions so that if and when the source system product master is updated, it would properly populate dimension attributes for that row.
Re: Assign default value for a dimension
Thank you for your answer.
In that case, the source is not allowed to insert a new product. Products are maintained by a separate master source and an invalid product will be the result of an error in the source system for this ETL. We want to keep the record for the monetary fields but a report will be sent to the source with the invalid product code and necessary keys to identify the record.
I guess my questions could be summarized this way: "Are you in favor or not of hard coding a surrogate key in an ETL?"
Regards,
Mathieu
In that case, the source is not allowed to insert a new product. Products are maintained by a separate master source and an invalid product will be the result of an error in the source system for this ETL. We want to keep the record for the monetary fields but a report will be sent to the source with the invalid product code and necessary keys to identify the record.
I guess my questions could be summarized this way: "Are you in favor or not of hard coding a surrogate key in an ETL?"
Regards,
Mathieu
MLaforge- Posts : 2
Join date : 2010-04-07
Re: Assign default value for a dimension
The only time you may hard-code a surrogate key reference is in the case of a NULL natural key, but from an ETL standpoint it is easier to simply hard code a default natural key value when you encounter nulls and allow the normal surrogate key lookup process to handle it.
But the point I am trying to make is if you receive a transaction that deals with product "XYZ" and you don't know what "XYZ" is (i.e. it is not in the dimension table), you need to create an entry for "XYZ" so, at the very least, you know what product ID the numbers refer to. Without that, how do you reconcile things if there are 10 different facts for different unknown products all pointing to the same dimension row?
But the point I am trying to make is if you receive a transaction that deals with product "XYZ" and you don't know what "XYZ" is (i.e. it is not in the dimension table), you need to create an entry for "XYZ" so, at the very least, you know what product ID the numbers refer to. Without that, how do you reconcile things if there are 10 different facts for different unknown products all pointing to the same dimension row?
Re: Assign default value for a dimension
There is no "one size fits all" for solving this problem. But hard-coding a default value (e.g. -1) is near the bottom of my list. Here is my list of potential solutions to the problem of how to handle RI failures between fact and dimension:
Good luck with your project. -- Joy
- Halt processing and abort. Hard to imagine this is a good idea.
- Throw away the bad fact row. I can imagine a handful of scenarios where this could be what you want to do, but rarely.
- Divert the fact row to an error table. This is a common, simple technical solution. It should be implemented ONLY if you develop policies and procedures for getting the bad fact rows out of jail.
- Add a placeholder row in the dimension table. Technically the hardest to implement (but not very hard). A nice solution because once you do receive the correct dimension attributes, you simply update the dimension row. The fact table has the correct surrogate key and hence does not need any correction. Avoiding fact table updates is good. Before implementing this strategy, I like to understand why I might receive a fact row before its corresponding dimension member.
- Hard code to the "unknown member" (e.g. -1) This is commonly implemented because it's easy. The problem is that you're throwing away the information (transaction system key) needed to fix the problem. I really do not like this solution, and cannot recommend it as a best practice.
Good luck with your project. -- Joy
Re: Assign default value for a dimension
If you are not able to update dimension table, you can create one more reference table which can be inserted and updated and will hold negative running number (like surogate key) for each product which was not found in dimension table. You need to follow below steps
1. lookup on existing product dimension table if not found then lookup on newly added refence table, If lookup failed with respect to new reference table then insert one record with negative running number and product information.
2. When next time same product will come ( which not found in dimension table) that will be found in reference lookup table and negative running number will be passed to fact table.
This solution has 3 benifits
1. Allows every data load into fact table.
2. No need to use single default value which leads confusion in aggregation.
3. No data integrity issues in fact table.
While reprots you can take union of product dimension and new reference lookup table and then join with fact to get full data.
Cheers,
Sachin
1. lookup on existing product dimension table if not found then lookup on newly added refence table, If lookup failed with respect to new reference table then insert one record with negative running number and product information.
2. When next time same product will come ( which not found in dimension table) that will be found in reference lookup table and negative running number will be passed to fact table.
This solution has 3 benifits
1. Allows every data load into fact table.
2. No need to use single default value which leads confusion in aggregation.
3. No data integrity issues in fact table.
While reprots you can take union of product dimension and new reference lookup table and then join with fact to get full data.
Cheers,
Sachin
sac587118- Posts : 2
Join date : 2010-12-10
Similar topics
» Effective and Expiry date for Dimension Default Row
» Hot swappable dimension and conformed dimension usage (bank/ credit union)
» Having manually inserted 'Default' rows in DIM tables
» bridge table and junk dimension on customer dimension (bank/credit union)
» Joining Type II Dimension Hierarchy directly to Fact versus through a Dimension table
» Hot swappable dimension and conformed dimension usage (bank/ credit union)
» Having manually inserted 'Default' rows in DIM tables
» bridge table and junk dimension on customer dimension (bank/credit union)
» Joining Type II Dimension Hierarchy directly to Fact versus through a Dimension table
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum
|
|