1 to 1 Text Data in a Fact Table

View previous topic View next topic Go down

1 to 1 Text Data in a Fact Table

Post  Brian on Fri Dec 03, 2010 2:57 pm



Hi Everyone,

I just bought two of Ralph Kimballs books and found this forum so was hoping I could get some help on a design issue I have. After a break from BI I have found employment and now have the chance to start doing some BI work in hopes of moving over fulltime to BI projects. My first small project is to review a dimensional model still in the design stage for a department. In my previous work I had worked as a BI developer for 4 years but solely worked on an aggregate data model.

The model I am reviewing now is a completely different one in which measures are not being stored but text type event data. There is no need and there will never be a need to capture numeric data as far as I was told. The source system and database stores all of its data in free text fields, there are no type tables and users can type in different data for the same item already stored. They can even type it in wrong as there is no validation.

I have no control over the source system, it will not be changed.

So in the dimensional model the fact tables are storing text data in place of the measures and are joined by conformed dimensions. The data in the star schemas are all 1 to 1. I suggested that they push their text data out into dimensions and make the fact tables, factless fact tables, and the person I was talking to asked why its better. I mentioned the flexibility of being able to aggregate if needed and keeping it in the form of a best practice and cleaning up the fact tables.

I was trying to think of a logical reason why we could not keep all the text data in the fact table rather than push it out in the dimension tables. The only plus I can think of for moving it is that it only allows keys and measures in the fact table. The downside to this is now we have more keys and joins and the 6,000 text items are now in a dimension table rather than in the fact table. If we left it in the fact table there would be no need for the extra joins to the dimension tables and I thought would make it quicker.

With the above information, would this be a case where we could keep all the text data in the fact table since its one to one data?

Thanks for any insight into this.

Brian


Brian

Posts : 7
Join date : 2010-12-03

View user profile

Back to top Go down

Re: 1 to 1 Text Data in a Fact Table

Post  Jeff Smith on Fri Dec 03, 2010 3:51 pm

The model is not totally clear but there are a couple of things. Is every field a free form text field in the dimension tables a free for text field? What would be the point of staring data that can never be queried at least in any meaningful way?

In any event, I suppose that keeping the fact Table clean and pushing all of the attribute data to dimensions even if it were a 1 to 1 on everything would improve the performance of the table when fewer than all of the dimensions where being queried. Having a wide fact table would be a nightmare to read off the disk.

Now, if the queries were always none with all of the dimensions then I suppose it wouldn't make a difference.

How is such data to be used?

Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

Re: 1 to 1 Text Data in a Fact Table

Post  Brian on Sat Dec 04, 2010 1:22 pm


About 90% of the fields from the source database are free text believe it or not. I was floored when I saw it. I then asked is there anyway we can get them to fix it and was told no. So of course we have to bring that in to the dimensional model which I think is our big problem.

This is why it is basically a one to one fact table with the dimensions. There might be times when itís a one to many but that would be a fluke from what I was told. They want to star the data to make it easier for the end user to report on in an ad hoq way basically.

The fact table if it held the text data would be about 15 fields wide.

So from what your telling me its going to improve performance if I push out all the text data into a dimension table? I then store only the FK to this new dimension in the fact table.

So for each event there will be one record in the fact table with the various keys and then each dimension t able will also have one record joined one to one. There are about 2300 records in the source db right now so its small. So I would guess that each dim table would be around 2,000 rows, with the fact table at 2,300.

The data is used to basically report on events. An incident happened on this date at this time, involving these people with these descriptions. Was anyone charged, if so with what offense? Etc etc, and its all free text entries. I donít know who designed the database.

What the dimensional modeler wants to do is place a key in the fact table that is obtained from the source db that is the key to every table, objected. They would then place this key in each dimension table as well so that when a user selects an event, the same objected is used to link all the data. At first I said this is completely wrong and that we need to push data out and use proper FKs etc. But the more I thought about it, since its one to one it might work. I was told that this was done on a previous project successfully. But I really wanted to get some other opinions as to me it sounds different than my experience has shown.

I hope I have given enough information for a general analysis, if not I can try to add more. I am really interested to see if this can or should be done.

Thanks

Brian



Jeff Smith wrote:The model is not totally clear but there are a couple of things. Is every field a free form text field in the dimension tables a free for text field? What would be the point of staring data that can never be queried at least in any meaningful way?

In any event, I suppose that keeping the fact Table clean and pushing all of the attribute data to dimensions even if it were a 1 to 1 on everything would improve the performance of the table when fewer than all of the dimensions where being queried. Having a wide fact table would be a nightmare to read off the disk.

Now, if the queries were always none with all of the dimensions then I suppose it wouldn't make a difference.

How is such data to be used?

Brian

Posts : 7
Join date : 2010-12-03

View user profile

Back to top Go down

Re: 1 to 1 Text Data in a Fact Table

Post  Brian on Sat Dec 04, 2010 4:32 pm

I guess the part where they want to use the ObjectID everywhere as PK and FK is not required and not a best practice. I'll suggest we go with a surrogate key that is just an integer UID. I doubt it will ever be changed in the source database but it makes more sense to separate the two completely in this regard just in case.

Brian

Posts : 7
Join date : 2010-12-03

View user profile

Back to top Go down

Design Tip?

Post  Brian on Mon Dec 06, 2010 1:47 pm

Hi All,

I was just doing some reading under the Design Tips section and found the topic

Kimball Design Tip #13: When A Fact Table Can Be Used As A Dimension Table.

I was wondering if this tip fits my situation? For example
Quote:
Such a fact table blurs the distinction between a fact table and a
dimension table because this fact table is filled with discrete textual values and non additive numeric
values that cannot be summarized, but are instead the targets of end user constraints.


Seems to mirror what is being modeled in our data mart as we are storing event data. Would I be correct in using this tip document as support for using a textual fact table? Or did I missunderstand anything? I just really want to make sure we implement the best possible model.

Thanks for any help

Brian




Brian

Posts : 7
Join date : 2010-12-03

View user profile

Back to top Go down

Re: 1 to 1 Text Data in a Fact Table

Post  Brian on Wed Dec 08, 2010 1:15 pm

I have to finalize my data mart suggestions by Friday so thanks for the reply and I'll assume that my last suggestion is a Kimball best practice. I just wanted to make sure I was on the right path given our situation before the team actually starts construction.

Thanks again

Brian

Brian

Posts : 7
Join date : 2010-12-03

View user profile

Back to top Go down

Re: 1 to 1 Text Data in a Fact Table

Post  hang on Wed Dec 08, 2010 6:01 pm

Brian,

What Kimball suggested is to use special treatment for special cases where all the textual attributes are treated as degenerate dimensions because of the random nature of data entries. But that's an exception rather than a generally good practice for designing your dimensional model.

Let's face it, many legacy systems were built like spreadsheet without any sense of normalisation and referential integrity. Our job as an data modeller in this case, is not only just build a dimensionally modelled BI system, but also a properly structured operational system to ensure data quality. Without a proper designed OLTP system to guarantee the data quality, the BI/DW system would be of little use. There is another relevant post that addresses similar issue: http://forum.kimballgroup.com/dimensional-modeling-and-data-architecture-f6/star-schema-vs-all-in-one-table-t812.htm#3481.

I would suggest to build an Operational Data Store (ODS) first, to have a normalise database for the purpose of data quality, and then ETLed the data into the Dimensional Data Store (DDS) to load properly designed facts and dimensions, especially with proper SCD process in place, to keep all the history data. It sounds a lot of overhead, but it's really just about adding another data layer to have a separate focus, ensuring data quality by means of 3rd normal form and all the nice RDMBS features.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: 1 to 1 Text Data in a Fact Table

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum