Early Arriving Facts

View previous topic View next topic Go down

Early Arriving Facts

Post  apc on Wed May 08, 2013 5:26 am

When early arriving facts (fact data comes before the reference dimension data) are a common reallity you have the option of creating a placeholder in the respective dimension in order to avoid rejecting the early arriving row and losing data.

When the fact arrives and the lookup process does not identify the surrogate key, you have basically 2 main options:

- reject the fact (redirect to exception report and handle later);
- accept the fact by creating a placeholder for the incoming dimension and using the new surrogate key just created;

Imagining that we create a new row in the dimension with the incoming source system key (natural key), the new surrogate key and unknown values for dimension descriptive values (example: -1, 'Not Known at the time').
We've created these descriptive values with the default value (example: -1, 'Not Known at the time') so we can accept the fact and later be able to populate the newly created dimension row with the actual descriptive values.

My question is: When the (late) dimension row arrives and the ETL is going to update it, imagine that some of these descriptive values are of Type 2 (SCD). In this case there is the need to create a new surrogate key for this row to handle the SC2 attributes that changed from the default value to their actual value.

Is this a normal procedure? If early arriving facts are common, this situation could lead to a lot of new rows being created in the dimension table due to SCD2 handling.

I know that first it's important to understand why is this early arriving facts situation so common and correct it in the source, but I was wondering if this solution of using dimension palceholders is acceptable.

Can anybody enlighten me?

Thanks.

apc

Posts : 8
Join date : 2012-10-11

View user profile

Back to top Go down

Re: Early Arriving Facts

Post  ngalemmo on Wed May 08, 2013 5:30 pm

Generally, when you infer dimension rows like you describe, they are treated as if they don't exist from the point of view of the dimension update process. In other words, the initial row is updated in place, no history. Subsequent updates would create history as you normally would for a type 2 or type 3 dimension.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Early Arriving Facts

Post  apc on Thu May 09, 2013 3:30 am

Yes, it makes sense. When this dimension row only has surrogate key and natural key and all other values "empty", then I should not keep history of it. Perhaps I should have a conditional check in my ETL to identify this kind of rows.

Thanks.

apc

Posts : 8
Join date : 2012-10-11

View user profile

Back to top Go down

Re: Early Arriving Facts

Post  gilroy on Tue Jun 25, 2013 5:40 am

I know that first it's important to understand why is this early arriving facts situation so common and correct it in the source, but I was wondering if this solution of using dimension palceholders is acceptable.

gilroy

Posts : 1
Join date : 2013-06-25

View user profile

Back to top Go down

Re: Early Arriving Facts

Post  svmayor on Sat Oct 12, 2013 12:45 pm

Yes, placeholder dimension records are the way to go, but use a real surrogate key and assign all the attributes with default values that can be logically identified in the ETL process. You don't want to use a -1 value in the dimension and fact table records. Here's why:

Dimensions

- The surrogate key is the primary key for the dimension. Therefore, you could only have one -1 dimension record. You need a dimension record for each natural key value associated with the fact table record(s).

Facts

- if you use a -1 as a foreign key this will result in a unique key violation error occurring and the process will fail at the database level.

By generating a dimension record with a unique dimension key, you can capture the natural key that didn't resolve to a valid dimension record. Also any hierarchy data along with the associated metrics can be reported from the fact table record, which makes it easier to investigate the data for problem solving.

avatar
svmayor

Posts : 3
Join date : 2013-03-06

View user profile

Back to top Go down

Re: Early Arriving Facts

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum