character data in a fact table?

View previous topic View next topic Go down

character data in a fact table?

Post  wjordan on Thu May 06, 2010 7:36 am

I just changed companies. i was accustomed to pure star schemas with fact tables containing only foreign keys to dimension tables and number measures that could be aggregated (sum, average, min, max).

At my new job, I find dims with foreign keys (making more of a snowflake) but still lots of denormalizing, so it isn't an attempt at a 3NF warehouse either. However, most perplexing is character data (names, for example) in "fact" tables. Is this a recognized viable technique that I just haven't seen before? It seems so strange and very wrong to me, but before I wave red flags, I just wanted to verify that I'm not missing the latest trend.

wjordan

Posts : 6
Join date : 2010-05-05

View user profile

Back to top Go down

Re: character data in a fact table?

Post  ngalemmo on Thu May 06, 2010 11:40 am

No, having names and other text fields in a fact table is not good practice. The only character fields should be degenerate dimensions, such as order number, and maybe the occasional flag. But, it will also depend on what database platform is being used. For traditional databases, such as Oracle, DB2 or SQL Server, it is not a good thing. But for column oriented databases, such as Sybase IQ and Vertica, it doesn't really matter. And, if you are using an MPP platform, it will depend.

But as to what to do about it? You would have a difficult time trying to convince anybody to go back and recode existing processes that load existing tables. After all, it isn't broken, right? If users start complaining about performance and someone is willing to fund a restructuring, then you can do something. In the mean time, just follow good practice on anything new that needs to be built.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

good advise received

Post  wjordan on Thu May 06, 2010 12:24 pm

Thanks. I wanted to make sure I wasn't missing some new trend. We are Oracle. And coworkers are still wanting to build new facts with names, etc. So I will challenge it on the new, but not try to rewrite history.
Thanks

wjordan

Posts : 6
Join date : 2010-05-05

View user profile

Back to top Go down

Re: character data in a fact table?

Post  hang on Sat May 08, 2010 6:46 am

Any names or descriptions appearing in the fact is sign of bad normalisation in its source transaction table from a OLTP system. In DW, they should be loaded into their respective dimension tables by using SELECT DISTINCT. The fact table should be highly normalised mostly containing surrogate dimension keys and numeric values. For most of low selectivity flags and indicators, they may either belong to certain dimension tables or can be wrapped into a junk dimension, and hence you end up with a clean fact table.

Sometimes the degenerate dimensions are necessary for some character fields like order number in the fact table if they are of high selectivity. Storing these values in dimensions and having their surrogate keys in the fact table does not help performance nor does it save any space. Very likely the count on them might be sort of fact. In some cases, a numeric value could be both fact and dimension, such as age. In Microsoft term, they are called fact dimension and can be configured in the cube without having a physical dimension table. For other platform, a physical dimension table may be derived from fact by SELECT DISTINCT.

Occasionally datetime fields rather than date keys could exist in fact table, especially when the hours and minutes are of interest. Having a time dimension down to minute may create a massive dimension or make date and time dimensions more complex. I used to use meaningless date key in fact tables based on Kimball's justification in his book "The Complete Guide to Dimensional Modeling, 2nd Edition", however the current trend is to use yyyymmdd integer key as the only exception for date dimension. Using yyyymmdd integer can also cater for non existence date and has the following additional advantages:

a. It's easier to debug Fact table load.
b. Fact table partitioning becomes easier.
c. Changing starting date in date dimension does not need to rebuild the fact.

As best practice and general rule, dimension table should be denormalised especially constructing hierarchies. Snow-flaking dimensions defeats the purpose of dimensional modeling and makes ETL more complex when dealing with more than one slowly changing dimensions in the relationship.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Debate at work concerning text fields in a fact table

Post  mewilliams2001 on Thu Jun 03, 2010 5:16 pm

At my job I am experiencing this debate. I know from 'The Data Warehouse Toolkit' and all the other design tips that I have read that text fields belong in Dimensions and not in Fact tables. They provide entry through filters and they decrease the size of the fact table. However, I find myself in a debate with others who disagree and believe that having the text fields in the Fact table make the text fields more accessible for reporting purposes. The reason given is that there is no longer a need to join to other dimensions to get the text data beyond the initial join into the fact table.

Note a few things about this situation. First, this model does have dimension tables surrounding the fact table. Second, this model has all of the dimension fields included in the fact table as well. Third, this is in Microsoft Analysis Services and not a relational Data Warehouse. Fourth, the developer who created this model says he is okay with the extra spaced taken up by having all the text descriptors in the fact table because he witnesses an increase in performance on the reporting application.

So essentially what we have is a Financial Transaction Fact Table with dimensions like Customer and Product surrounding it. But the Customer and Product fields are also included on the Fact table itself.

I debate that this is still bad practice and should be changed. I debate that any performance gain observed by putting these text fields into the fact table should be available in other ways.

Does this analysis sound accurate or is there a time when text fields in the fact table is a viable solution to reporting performance issues?

Also, could someone provide a detailed set of reasons why text fields on the fact table are inappropriate? Because of the lack of concern over fact table size, I am looking specifically for reasons beyond that.

Thanks,
Marc.

mewilliams2001

Posts : 2
Join date : 2010-06-03

View user profile

Back to top Go down

Re: character data in a fact table?

Post  ngalemmo on Thu Jun 03, 2010 6:59 pm

Don't confuse dimensional design using relational databases with the implementation of cubes in multi-dimensional databases (such as Analysis Services). An MDDB doesn't have 'tables' per se, just measures and dimensions.

The 'one less join' argument just doesn't carry much weight in a dimensional design using a traditional relational database. A fact table is usually many orders of magnitude larger than a dimension table and any additional text fields will usually significantly increase the size of a row compared to one made up solely numeric values (keys and measures). Since analytic queries typically pull large numbers of fact rows, the increase in data volume being accessed more than offsets any gain by eliminating a join.

Besides, with memory as cheap as it is and 64bit processors, in-memory caching of the dimension tables eliminates any presumed overhead a join entails.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: character data in a fact table?

Post  hang on Fri Jun 04, 2010 7:37 am

Using SSAS should not be an excuse of abusing fact table with text fields if they are not degenerate dimensions in nature (high cardinality). Any highly repeated and lengthy textual values in fact table are unnecessary waste of space for gaining nothing but potential performance killer when the table reaches certain size. It's bad practice in both dimensional and relational modeling for any purposes.

In your case, the text fields may be short ones, or the fact table size is not that big (<10 million rows) so the performance and space saving is not a big deal. However, it is the matter of principle when designing a database system in general, and once any kind of loose attitude starts gaining ground, people starts throwing any fields into any tables they like. A single improperly placed lengthy field in fact table can easily double (or triple) the size of the table. It's a simple math when considering the performance impact.

The 'one less join' argument just doesn't carry much weight in a dimensional design using a traditional relational database
In my experience, the performance of joining dimensions is just as fast as they are in the fact tables. It seems convenient by avoiding joining dimensions, but what about trying to provide a drop down list from these textual values for filtering purpose. You would use the costly SELECT DISTINCT from a fact table on the fly. Even with SSAS cube operation, how can you use these fields to filter your measures if you don't put them into dimensions. You might have to configure the cube in a hard way, and at expense of cube processing time, to render these fact dimensions while there is a simple and elegant way to get the structure right in first place.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Debate at work concerning text fields in a fact table

Post  mewilliams2001 on Fri Jun 04, 2010 8:14 am

I would like to thank you both for your quick response and experiance in this. I am afraid that the majority of my experiance in this particular area of database design and software development comes primarily from Kimball's books and kimballgroup.com. So I speak primarily on principal rather than from experiance and full understanding.

I will continue to argue that text fields do not belong in the fact table, especially when the fields are so clearly members of a dimension and even already exist in a dimension.

Thank you,
Marc.

mewilliams2001

Posts : 2
Join date : 2010-06-03

View user profile

Back to top Go down

Re: character data in a fact table?

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum