Fact dates before begin date of Dimension

View previous topic View next topic Go down

Fact dates before begin date of Dimension

Post  Galen Boyer on Fri Mar 04, 2011 9:59 pm

Hi all,

For our Type 2 SCD, we have the begin_date and end_date named as "relevance_begin_date" and "relevance_end_date". (I fought a good fight to make these the names of those fields).

We are currently setting the original relevance_begin_date to a particular date in the source based on our knowledge of the source. For example, dim_employees is set to (original_hire_date - 30). The etl problem is that some of the facts are coming in with dates before the original relevance therefore not landing because the src.ALTERNATE_KEY is found but the fact.DATE BETWEEN RELEVANCE does not find a relevant record. The reporting problem is that this fact happened, it was accepted by a system. Do we hold off the fact and find the fix to the dimension, or do we do runtime surgery on the original dimensional record?

Here's some of my ideas on runtime surgery.
1) Amend the original dimensional record's relevance to be before the fact.
2) Allow the fact in even with this anamoly, but have a quality dimensional key call that anamoly out.
3) Put this fact table into an error table
4) Add another dimensional record that is before the original dimensional record to support this fact.

Galen Boyer

Posts : 7
Join date : 2011-03-04

View user profile

Back to top Go down

Re: Fact dates before begin date of Dimension

Post  ngalemmo on Sat Mar 05, 2011 2:07 am

If you have bad dates in the system, its a problem that needs addressing, but it shouldn't mess up referencing dimensions. Sure you find the low dates, but what about dates that are wrong but fit into the effectivity period of a row?

When I implement effective dates on dimensions, the first row for a NK usually has an early date, such as 1/1/1900 with the realization that even if dates are bad, doing anything else won't solve the date problem.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Fact dates before begin date of Dimension

Post  Galen Boyer on Sat Mar 05, 2011 10:36 am

Thanks!

I've recommended we do pretty much what you are saying, but I've recommended that we actually create a dimensional record that is for this case, per source key. Whenever the initial dimensional record is created, have a dimensional record created that spans the time period of 1900 to the start date of that record. The reason for this is because it would make an easy "after the fact" query for this problem. SELECT ALL FACTS WHERE THEIR KEYS ARE IN THE PRE-RELEVANCE-BUCKET.

The downside would be the dimension would be doubled in # of records initially. The upside is straightforwardness of this question across all facts. Also, the UI would show these, clearly error, attributes and therefore the EDW team wouldn't find it easy to leave them uncorrected. Those records would always demand attention when facts are tied to them.

The upside of your approach is 2-fold. 1) No doubling, 2) An already designed in way of identifying the "original" dimension. The downside is that this date problem we would need to constantly be vigilant about solving is now a different question per dimension.

Galen Boyer

Posts : 7
Join date : 2011-03-04

View user profile

Back to top Go down

Re: Fact dates before begin date of Dimension

Post  hang on Sun Mar 06, 2011 12:05 am

Don't mix up SCD date with business date attribute (original_hire_date). The SCD dates are updated by the system to keep track of SCD changes, while business attributes come from source and will be the source of the changes. For the fact SK lookup, use SCD dates, and as ngalemmo suggested, make the SCD start date early enough to cover all the fact references. For business constraints use dimension attributes.

In case of the late arriving dimension in the fact, add a new dimension record that has only SK and NK with other attributes set to null in your dimension, and let ETL to populate the null attributes in due course when the true dimension details arrive.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Fact dates before begin date of Dimension

Post  Galen Boyer on Sun Mar 06, 2011 10:23 am

Thanks. Our actual SCD dimensions are young enough and the number of them are small enough that we can change to that "philosophy".

What I don't like is you say "In case of the late arriving dimension in the fact, add a new dimension record that has only SK and NK with other attributes set to null in your dimension, and let ETL to populate the null attributes in due course when the true dimension details arrive." Shouldn't the dimensional attributes all be NON NULL? Therefore, all attributes of this bucket record would be something like "NOT HERE YET"?

Galen Boyer

Posts : 7
Join date : 2011-03-04

View user profile

Back to top Go down

Re: Fact dates before begin date of Dimension

Post  hang on Mon Mar 07, 2011 7:46 am

Galen Boyer wrote:Shouldn't the dimensional attributes all be NON NULL?
Why? What if you have a date or numeric attribute. Don't get confused between non-existence and late arriving dimension. The former is about having an unknown dimension record to bucket all NULL valued dimension references in the fact to ensure referental integrity, while the latter is about dimensions that appear in the fact, but not loaded into dimension table yet.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Fact dates before begin date of Dimension

Post  Galen Boyer on Mon Mar 07, 2011 9:02 am

Its stated that way on this website. http://www.ralphkimball.com/html/10dt/DT128SelectingDefaultValuesNulls.pdf.

SECTION -> Handling Null Attribute Values in Dimension Tables
Nulls should also be avoided when we can’t provide a value for a dimension attribute in a valid dimension row.

Galen Boyer

Posts : 7
Join date : 2011-03-04

View user profile

Back to top Go down

Re: Fact dates before begin date of Dimension

Post  hang on Mon Mar 07, 2011 5:22 pm

The file is not accessible and I am really interested to see the justification.

But anyway, what would you put into a date and numeric attribute that is originally null in the source. Without seeing a convincing argument, I still believe, in many cases, null value has its place in dimension attributes. Especially for monster dimensions, any sparsely populated textual attribute would incur tremendous unnecessary waste of storage, compromising all the savings on the fact tables.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Fact dates before begin date of Dimension

Post  Galen Boyer on Mon Mar 07, 2011 5:47 pm

I put a period after the URL. Try the following.
http://www.ralphkimball.com/html/10dt/DT128SelectingDefaultValuesNulls.pdf

Its also stated in the Data Warehouse Design Solutions (foreward by Kimball) as the "Null avoidance principle", pg 73.
Here is what is said.

Handling Null Attribute Values in Dimension Tables
Nulls should also be avoided when we can’t provide a value for a dimension attribute in a valid dimension row.

There are a several reasons why the value of a dimension attribute may not be available:
Missing Value – The attribute was missing from the source data.
Not Happened Yet – The attribute is not yet available due to source system timing issues.
Domain Violation – Either we have a data quality issue, or we don’t understand all the business
rules surrounding the attribute. The data provided by the source system is invalid for the column
type or outside the list of valid domain values.
Not Applicable – The attribute is not valid for the dimension row in question.

Text attributes in dimension tables usually can contain the actual values that describe the null
conditions. Try to keep in mind the effect on BI tools downstream that have to display your special null
value description in a fixed format report. Avoid tricks we’ve seen, such as populating the default
attributes with a space or meaningless string of symbols like @@@ as these only confuse the business
users. Consider the default values for each dimension attribute carefully and provide as much meaning
as possible to provide context to the business users.

Numeric attributes in dimension tables will need to have a set of special values. A value of zero often is
the best choice because it is usually obvious to the users that it is artificial. Some numeric attributes will
present you with a difficult choice if the business users combine these values in numeric computations.
Any actual numeric value used to stand in for null (say, zero) will participate in the computation but give
misleading results. An actual null value often will cause an error in a computation, which is annoying but
at least does not produce a falsely confident result. Perhaps you can program your BI tool to display null
numeric dimension attributes with “null” so that you can both report and compute on these attributes
without worrying about distorted data.

Finally, these default value choices should be re-used to describe common null conditions across
business processes and dimension tables in your dimensional data warehouse.


Galen Boyer

Posts : 7
Join date : 2011-03-04

View user profile

Back to top Go down

Re: Fact dates before begin date of Dimension

Post  hang on Mon Mar 07, 2011 8:26 pm

Galen, thanks for the reference. Although I agree with most of the reasons why attributes should not be left as null, I still think it should be a guideline rather than a rule. There would be some exceptions, like the case of monster dimension in my previous post, or even late arriving dimensions where a dogmatic rule could cause a lot of extra unnecessary work by ignoring a simple effective solution.

Date attribute in dimension is another case that has not been directly addressed by the article. A simply null value in place of date attribute would suffice for most business requirements, instead of a confusing extreme early date or future date, or even snowflakeing your dimension by date dimension, which could be annoying when dealing with many dates.

Just for the sake of argument, sometimes a null value cannot always be replaced by a ‘0’ value for a numeric attribute. A typical SQL aggregate function AVG would most likely work properly with null values. Again some zeroed null values may work better for other aggregate functions.

All I am saying is, let's not use rules to dictate the design, even with dimensional modeling at its core fundamentals. It's a general guideline to denormalise dimensions instead of snowflaking them, but...

Here’s another relevant post with different point of view.
http://forum.kimballgroup.com/t707-null-values-best-practice-digest


hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Fact dates before begin date of Dimension

Post  Galen Boyer on Tue Mar 08, 2011 12:07 am

I guess, what would be most important to me is something in the dimensional row beyond NULL that identified the record as having been created to support a fact vs a the row being created through the normal ETL processes. That's why the NOT NULL rule makes sense to me. Upon creating the row from the fact, set the descriptive attributes to something like, "NOT FROM SOURCE" or something along those lines. If its just NULL, then there is no information from it. Did the NULL come from the source or from the ETL code? On numbers, I can see how 0 or something like that would work, in that you aren't going to be storing facts on the dimensions, so you shouldn't need to do sql avg or median or such. Dates seems to be were the amount of work to make it happen might be more than what Kimball is trying to gain, especially if you follow his advice that the dates on the dimensions actually have FKs back to the date table. But, even there, all you need is a single date on the date dim that is outside the dates of the date_dim and stands for "NOT FROM SOURCE". Create that one date, and that's what all dates for all these records would be set to. So, I think following the advice is quite doable, and I'd rather follow the Kimball advice until I can't.

Galen Boyer

Posts : 7
Join date : 2011-03-04

View user profile

Back to top Go down

Re: Fact dates before begin date of Dimension

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum