Dimension Size

View previous topic View next topic Go down

Dimension Size

Post  akjason on Tue Oct 09, 2012 1:20 pm

Hello, I have a question about modeling a dimension table to minimize its size. I will have a fact table of applications received. Each applicant applies once per year and must reapply again next year so they will get a new application record. How can I best model the Person dimension? It appears to me that this dimension is going to get very large and I thought dimensions were supposed to be small compared to the fact tables. We receive about 1M apps per year.

Thanks,
Jason

akjason

Posts : 11
Join date : 2012-10-08

View user profile

Back to top Go down

Re: Dimension Size

Post  BoxesAndLines on Tue Oct 09, 2012 2:05 pm

Person data could be independent of application data. They are different entities. If you want to keep the Person dimension smaller, model application data in a different dimension (or fact as this is really an event).
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Dimension Size

Post  akjason on Tue Oct 09, 2012 2:10 pm

Thanks for the feedback. I forgot to mention that the person information can change from application to application and the users want to track this. The address, phone number etc. So does it make sense to add these details to an Applications dimension (although now it would get large) or simply include these text details in the applications fact table?

akjason

Posts : 11
Join date : 2012-10-08

View user profile

Back to top Go down

Re: Dimension Size

Post  BoxesAndLines on Tue Oct 09, 2012 6:31 pm

We don't track textual data in the fact table. We track it in the dimension. You can either move the volatile dimension columns into a "mini dimension" to minimize the impact of the dimension updates, or break the dimension into logical parts, address dim, party dim, and possibly application dim (although this sounds like a one-to-one relationship with the fact table), or partition the person dim based on some column that nicely partitions the data according to most queries (year of application date?).
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Dimension Size

Post  akjason on Tue Oct 09, 2012 8:23 pm

I think I will need an applications dimension, but as you noted it will be a one-to-one with the application fact table which is why I was considering putting the data in the fact table. If all my attributes are going in the applications dimension, what can I put in the application fact table so I can have a count measure?

akjason

Posts : 11
Join date : 2012-10-08

View user profile

Back to top Go down

Re: Dimension Size

Post  BoxesAndLines on Wed Oct 10, 2012 2:52 pm

You put a numeric column with a value of 1 in your fact table.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Dimension Size

Post  akjason on Wed Oct 10, 2012 5:28 pm

I have been reading that it is not a good idea to have a 1 to 1 relationship between a fact table and a dimension table, but I also understand that I should not have text fields in my fact table. If both of these design goals are to be met, how can I model an application that has text data that I will need to allow the users to see in drill down and provide a count measure? Some of these text fields are things like address and phone number but, for the same applicant, they can change from application to application so I can't really put these in the person dimension or that dimension will get multiple records for every applicant, which would just make the person dimension very large like the application fact so I am back to square one. Any advice would be greatly appreciated.

akjason

Posts : 11
Join date : 2012-10-08

View user profile

Back to top Go down

Re: Dimension Size

Post  TheNJDevil on Thu Oct 11, 2012 9:55 am

Keeping text out of a fact table is the only guideline (of the 2 that were stated) that you should stick to. If a dimension ends up as a 1:1, then it ends up being 1:1.

I also believe you are overthinking that dimension of actually being 1:1. Yes, the potential for it to be 1:1 is there, but in reality, will EVERY customer change something EVERY time they apply? That is highly doubtful. Of my 750,000 customers, only about 25% of them change something within a year's timeframe. It won't be as 1:1 as you think.

TheNJDevil

Posts : 68
Join date : 2011-03-01

View user profile

Back to top Go down

Re: Dimension Size

Post  akjason on Thu Oct 11, 2012 12:52 pm

Thanks for the reply. The address and phone number were just two items on the application that might change, there are many more so maybe I will split the application dimension up like DimAppQuestions and DimAppPerson. I will have to experiment a bit and find the combination that makes the dims smaller.

akjason

Posts : 11
Join date : 2012-10-08

View user profile

Back to top Go down

Re: Dimension Size

Post  BoxesAndLines on Thu Oct 11, 2012 3:21 pm

Don't forget the junk dimension. It's great at consolidating a handful of low cardinality columns.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Dimension Size

Post  hang on Sat Oct 13, 2012 3:53 am

BoxesAndLines wrote:Person data could be independent of application data. They are different entities. If you want to keep the Person dimension smaller, model application data in a different dimension (or fact as this is really an event).
Along the line suggested by B&L, I guess application is a fact table, possibly a yearly periodic snapshot, maybe also an accumulating snapshot within a year partition with some application milestones. The person dimension can be type 2 with only very slowly changing attributes, so that the dimension may never grow alarmingly. I doubt address and phone number change would double the dimension size even after a few years. However if you do have some changing attributes affecting most dimension members, you may extract them out and relate them through yearly snapshot fact table so the change will be tracked by the year key.

Your application fact table is growing by the size of person dimension each year, and would be 10 times bigger in 10 years. So even the size of dimension is more than a million records, at least it is very static and will be no where near to the size of the fact table.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Dimension Size

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum