Avoiding Nulls in Dimension Tables

View previous topic View next topic Go down

Avoiding Nulls in Dimension Tables

Post  Jeff Smith on Fri Mar 11, 2011 4:46 pm

I am trying to prevent any of the columns in a dimension table from containing nulls and came across an interesting situation.

I have First Name, Last Name, Suffix. The Suffix is be null most of the time. If the 3 fields have to be concatenated in an report or ad hoc query and the user isn't careful, the results of concatenating the 3 fields can be null.

I should probably add in a field called full name and have the concatenation already done, but besides that, what's the best way of handling nulls in fields that have meaning such as a Suffix or Prefix in a Name? I would like to avoid, "NA. John Doe, NA".

Is it OK to enter in a space? Doing so makes my skin crawl but it's got to be better than "NA. John Doe, NA".

Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

Re: Avoiding Nulls in Dimension Tables

Post  hang on Fri Mar 11, 2011 8:07 pm

In my way of thinking, any good methodology or best practice could be just as detrimental as bad ones if used dogmatically. NULL value is generally bad for query, but it still has its place in database in general and DW in particular, otherwise it would be deprecated in all new versions of database systems.

I can come up a number of reasons why NULL is irreplaceable by any other values. For numeric field, you could use 0 to replace NULL, but we all know they are totally different things. NULL value can make aggregate functions, including AVG, work properly, 0 will not. In your query, NULL can make calculations like A/NULL work properly, however A/0 will spoil the whole query.

Another example is for strings, again NULL will never be the same as blank string. NULL has its unambiguous meaning, while a blank string could be what, empty string, or one space or many spaces. And it is even more confusing to SQL, as spaces will be automatically trimmed off in some operations.

The last example, but not the least important, is the powerful ANSI standard function COALESCE(), aimed to replace vender specific functions, DECODE in Oracle and ISNULL in SQL Server, and sometimes awkward CASE statements. Bear in mind, COALESCE takes variable number of parameters which can replace a messy nested CASE statements much more elegantly and efficiently. However by throwing out the concept of NULL, you basically give up leveraging the power of COALESCE function within DW. Trust me, once you have started using it, giving it up is a huge pain.

In any types of project developments, I donít like people set rules using the best practice as excuse. I tend to refer to rules as guidelines and try not to rule out giving a go to different approaches. So my question is, have you ever tried relaxing non NULL rule and see how bad it makes your DW. Based on my experience, it would only make my life a lot easier.


Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

add in a field called full name and have the concatenation already done

Post  Mike Honey on Sun Mar 13, 2011 7:16 pm

Hi Jeff,

I'd definitely recommend: "add in a field called full name and have the concatenation already done". I think the dimension designer should be making these decisions and not leaving them to various query/report authors to come up with their own logic. The coding/technical issue you highlight is only one of several - what if one author prefers a "Lastname, Prefix Firstname" format? You have eroded the consistency that your DW/DM is supposed to encourage.

FWIW I rely on "Unspecified" or similar to replace NULLs. If that makes it hard for report authors to code their own derivations, that's a good thing in my view - it will drive them back to the DM/DW to use the data provided or request new derivations in that layer.

Good luck!
Mike Honey

Posts : 185
Join date : 2010-08-04
Location : Melbourne, Australia

View user profile http://www.mangasolutions.com

Back to top Go down

Re: Avoiding Nulls in Dimension Tables

Post  John Simon on Mon Mar 14, 2011 3:14 am

I agree with Jeff and Mike. Showing bad data forces the business to clean up its act.
If you want to get around adding an "Unknown" value, I'd go with a computed column for the FullName.

John Simon

Posts : 186
Join date : 2009-05-29
Location : Melbourne, Australia

View user profile http://jsimonbi.wordpress.com

Back to top Go down

Re: Avoiding Nulls in Dimension Tables

Post  Sponsored content

Sponsored content

Back to top Go down

View previous topic View next topic Back to top

- Similar topics

Permissions in this forum:
You cannot reply to topics in this forum