How to manage nulls in Fact talble

View previous topic View next topic Go down

How to manage nulls in Fact talble

Post  wizard on Wed Dec 01, 2010 1:07 pm

Hello, I am new with data warehousing. I need some help with following questions

1. should the nulls be left as nulls in fact table or should they be replaced with some number that might never be used like -7 or -1. If they should be replced with -1/-7 what's the logic behind this?

2.Secondly how to handle nulls in a fact table with datatype as date . We can not insert a number like -7. Is it a good idea to change the date data type columns present in fact table to varchar2 or char as they are just there for display purpose this way we will be able to insert -7/-1.

Thaks in advance.


wizard

Posts : 13
Join date : 2010-11-30

View user profile

Back to top Go down

How to manage nulls in the Fact table

Post  warrent on Wed Dec 01, 2010 6:26 pm

First, we have to be clear about what columns in the fact table might have nulls. If the column is a foreign key column to one of the dimensions, it must have a value in it. Leaving it null means the fact row will not be included in the results set in a standard inner join between the fact table and the dimension. The -1 idea comes from the way we generally manage surrogate keys in our dimension tables. We usually start with 1 and increment by 1. That leaves the negative numbers available. You would add a row to the dimension with the key of -1 and fill in the descriptive attributes with appropriate entries, like "Missing Product", and "Missing Region".

Here's a link to an old design tip on this topic: Dealing With Nulls In The Dimensional Model

If, on the other hand, the Null values are measures in the fact table, you have to be very careful what you do with them. -1, or -7 would not be appropriate because they look like real measures and would be included in any aggregates. For example, if the field is SalesQty and you put in -7, clearly any user query will get the wrong answer for Total SalesQty. These should probably be left null. Only put a real number in a measure field after careful deliberation with your business folks.

The third possibility is you have various values in your fact table that are not measures and do not join to dimensions. You mention a date as an example. I would argue that you should try to convert the date to an integer surrogate key and join it back to your Date dimension. That way, you can add a -1 row to your Date dimension as described in the first paragraph above.

If you don't have a dimension join to translate the meaning of a -1 row, substituting any value for Nulls will most likely lead to confusion.

--Warren
avatar
warrent

Posts : 41
Join date : 2008-08-18

View user profile

Back to top Go down

How to manage nulls in the Fact table

Post  wizard on Wed Dec 01, 2010 10:23 pm

Thanks for you prompt reply. This clears lots of ambiguities in my mind.
Regarding third possibility, values in your fact table that are not measures and do not join to dimensions, you have suggested to join these values back to the dimension so that -1 one can be inserted. Here is why I decided to goahead without joining to them to dimension.
These values will always only be displayed and never be queried upon meaning they will never be in a where caluse. Joining them to a dimension will always create an extra join. Secondly granularity level of these dates is to a min which will make date dimension huge as we have dates from 1850 till 2015.

Please let me know if I am missing out anything.

Thanks Again

--Wizard

wizard

Posts : 13
Join date : 2010-11-30

View user profile

Back to top Go down

Re: How to manage nulls in Fact talble

Post  hang on Thu Dec 02, 2010 7:33 am

I would still use date key for these date fields in the fact as they are not measures and degenerated dimensions, so they are normal dimensions. Using smart surrogate integer key yyyymmdd and -1 for nonexistence date can make the key self deriving and avoid null date value. The advantage of FK to the date dimension is that you can show more descriptive information for any odd dates stored in much smaller dimension instead of repeating textual values in the fact.

If 1850 and 2015 are a valid years in the fact table, then extending the date dimension to cover enough history and future dates is quite reasonable approach. A date dimension with tens of thousand records is still a small dimension compared to what we refer to as big dimensions. You don't have to populate all the date attributes for those out-of-range dates as long as referential integrity is ensured. You could incrementally extend your date dimension based on the dates in the fact if that makes the date dimension significantly smaller.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: How to manage nulls in Fact talble

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum