case and client. One dimension or two?

View previous topic View next topic Go down

case and client. One dimension or two?

Post  Tim Webber on Thu Feb 21, 2013 3:17 pm

I am working on a public sector "case management" project. A case is open for a period of time (year, maybe two) and receive services. The Fact we are building is like an accumulating snapshot though we only need to store current "state" of the case which makes it easy(ish). My question is, we are "seeing" a case dimension that has lots of attributes like case status and a whole bunch of flags that would seem to belong in a case dimension and we also have lots of person/client oriented attributes like birth date, gender, race and a whole bunch of other person/client flags.

From the operational data, I can see that even though there should normally be only one case per person/client, there are "some" that have two. This is OK with the business and can happen. My question is this....Given that most of the cases are associated with only one person/client and that these two entities seem to be used somewhat synonymously in the business talk, can I merge them into one dimension? Note that at this point, there is not plan to slowly change on any of the attributes that would be contained in either either dimension. The total number of attributes in a combined dimension would be between about 60. Split apart, it would be about a 25/35 split

What are the key drivers to making a decision like this?

Tim Webber

Posts : 11
Join date : 2013-02-01

View user profile

Back to top Go down

Re: case and client. One dimension or two?

Post  Mike Honey on Thu Feb 21, 2013 11:04 pm

Hi Tim,

I'd go with two dimensions and present the data warehouse as an opportunity to consolidate the client data and represent the true relationships - added value that would be impractical from the source application.

Once you have delivered a solution like this the business decision makers will be happy that someone has finally got that relationship right, and your unique transformation of the data will make your solution indispensable. In your scenario the business value might be understanding the "high touch" clients with many cases, currently lost in the noise.

Alternatively you could continue to propagate the source application designers or users mistakes and present the same skewed results as everyone else. This probably makes your solution easy to do without.

Good luck!
Mike
avatar
Mike Honey

Posts : 185
Join date : 2010-08-04
Location : Melbourne, Australia

View user profile http://www.mangasolutions.com

Back to top Go down

Re: case and client. One dimension or two?

Post  Tim Webber on Fri Feb 22, 2013 12:51 am

Thanks Mike. I split them apart in my logical model this afternoon wondering what response I might get and what the advice might be. There's another nuance I would like discuss if there's interest. Many of the "attributes" that the business would like to see are really "data quality" type flags. Let me give you an example. A case may have plans, goals, activities, order and other types of related. The business wants a bunch of flags that describe whether or not a number records/related entities exist or not. The are all Yes/No type of flags on the fact table. These could go into a "case" dimension but I shudder to put them there because they are really derived based on the presence or absence of other entities related to the "case". How about stuffing these things into a junk dimension? There's about 10 of them which would be 1024 row if my math is correct.

Tim Webber

Posts : 11
Join date : 2013-02-01

View user profile

Back to top Go down

Re: case and client. One dimension or two?

Post  BoxesAndLines on Fri Feb 22, 2013 10:28 am

Sounds like a great use of a junk dimension.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

performance testing

Post  Tim Webber on Mon Mar 18, 2013 1:42 am

I have a few more junk dimensions to build that are sets of similar Y/N columns. One of my colleagues feels a junk dimension is unnecessary - he would like to see "all" Y/N flags be attributes on the fact table arguing that performance using a DIM will suffer. I argued that this a junk dimension is better because it keeps the fact table clean and small, gives a far better user experience when browsing the dimensions (ie this junk dimension is 1000 rows and the fact will grow to 10 million in a few years) and it gives flexibility to add columns with relative ease (harder) in fact tables. I was not sure about performance so I tested it out in our Oracle 11g DW with a 10 million row fact table. What I found was that it took about 1 second to get a 100,000 row query back using just one join from the fact table to the junk dimension. This was using Oracle's star transformation and bit map indexes on the keys. I tried normal indexes and they were slower.

Next I queried a bit map indexed Y/N attribute on the fact table. The query results were about .05 seconds. Pretty big improvement in query results. Now if you start selecting more flags and reducing the size of the result set returned the fact table the junk dimension joined query started to become as efficient as the Y/N attribute on fact table query.

My question is this. I think my colleague has a point given my test results. Is there anything else I can consider in terms of performance? I can certainly argue the above points and also that it will take the end user 2-3 seconds to get build a lookup/filter using attributes on the fact table which is way slower than a filter/lookup built from my 1000 row junk dim. Is there anything else I am missing? Any other relevant points to consider? Thanks!

Just want to add that I also tried the same two queries as above but adding in a another dimension and filter on that dimension. The query using the junk dimension took 30 seconds while the query using the Y/N attribute on the fact took 18 seconds. Junk dimension query is quite a bit slower.

Tim Webber

Posts : 11
Join date : 2013-02-01

View user profile

Back to top Go down

Re: case and client. One dimension or two?

Post  BoxesAndLines on Mon Mar 18, 2013 9:51 am

So you want to create a bit map index on each indicator on the fact table? That will clearly impact load times.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: case and client. One dimension or two?

Post  Tim Webber on Mon Mar 18, 2013 12:38 pm

I am not sure what we 'want' to do but attributes on the fact is one option and put them in Junk dimensions is the other option! ETL load times are a "factor" but so far we have not had too many issues given our data volumes. We have about 30-40 of these flags. Bitmap indexing them on a 10million row table takes about 10 seconds each. 400 seconds is not alot of time if this is the "right" thing to do.


Last edited by Tim Webber on Mon Mar 18, 2013 4:09 pm; edited 1 time in total (Reason for editing : deleted an analysis that didn't make alot of sense :).)

Tim Webber

Posts : 11
Join date : 2013-02-01

View user profile

Back to top Go down

Re: case and client. One dimension or two?

Post  BoxesAndLines on Mon Mar 18, 2013 6:54 pm

A lot of the times, indicator columns are really impersonating a metric. That sounds like the case here as well. I would change the datatype from char(1) to smallint to facilitate easy summing. As many have mentioned, disk access is your biggest hit when accessing data. Oracle retrieves the whole row of data regardless of the number of columns specified. Over time, performance will degrade as the fact rows continue to get longer and more data are added. In your case, at 10M rows, you are still better off putting the indicators in the fact. If 10M rows is years away from reality, put the columns in the fact.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: case and client. One dimension or two?

Post  Tim Webber on Mon Mar 18, 2013 7:07 pm

Interesting. 10 million rows is years away. It sounds like you are suggesting putting the indicator/flag into the fact table:
1) as a char if 10 million rows is years away (and it is....) or
2) as a number (dont think oracle has all the types sql server has...) if the table 10 million and growing fast

Sound about right (ie no junk dim)? I tend to agree with attributes on the fact table given the performance hit you take putting them into a dimension. My dilemma "was" that Kimball is pretty clear about his preference for these type of objects in a junk dim. Last comment is about an "indicator columns are really impersonating a metric". This is often true in my case. However, sometimes users will set multiple flags at the same time to get a count --> ie: where attribute_A = 'Y' and attribute_B = 'Y' and attribute_C = 'Y' ect....

I think option 1) above is sounding good right about now.

Tim Webber

Posts : 11
Join date : 2013-02-01

View user profile

Back to top Go down

Re: case and client. One dimension or two?

Post  BoxesAndLines on Mon Mar 18, 2013 7:55 pm

My preference is the same as yours. Put the columns in the junk dimension. If the junk dimension fulfills the non-functional performance requirements, I'd go with that. If not, smallint columns with a 1 or 0 in the fact isn't the worst compromise you'll have to make as a dimensional modeler. :-)

Most of the data I'm working with now has billions of rows, so I'm sure the junk dim would be a better approach in this case.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: case and client. One dimension or two?

Post  Tim Webber on Mon Mar 18, 2013 8:08 pm

Thx for working through this with me.

ps. just got off a call center project DW where the model was much clearer but had billions or rows (and they dont have partitioning...yet). Performance was always raising its ugly head and benchmarking and troubleshooting data quality problems was painful (queries take so long...). Nice to have low data volumes for a change.

Tim Webber

Posts : 11
Join date : 2013-02-01

View user profile

Back to top Go down

Re: case and client. One dimension or two?

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum