dimension model for many indicators field

View previous topic View next topic Go down

dimension model for many indicators field

Post  j on Sat May 15, 2010 1:58 am

I have 30+ fileds that has either 1 or 0 ( true or false)value in a fact table

Patid -- -- --- PTInd_1 Ptind_2 Ptind_3........ PTind_30 -- etc

1 0 1 1



I need to be able to create a dimension using thse attributes( patind1 ~ 30)..what is the best way to design the dimensional model for these fileds? there could be new indcator filed in the future...
so it has to be easy to add new attribute ...

junk dimension can have too many rows 2^30... and it will be hard to maintain if there are new fields need to be added right?




any suggestions would be appreciated

j

Posts : 6
Join date : 2010-05-15

View user profile

Back to top Go down

Re: dimension model for many indicators field

Post  BoxesAndLines on Sat May 15, 2010 10:40 am

How many combinations do you actually have in the data? The junk dimension works since multiple facts all have the same combination of values.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: dimension model for many indicators field

Post  j on Sat May 15, 2010 2:44 pm

thansk for the reply..
The problem is if we add a new indicator filed,.let's say patind_31.. we have to updte all the fact records.. or not.. how will the junk dimension work in case new columns got added to a fact table and need to add them to a junk dimension ?

right now.. it has 30 fileds.. so it can be multi millions combination.. what is the best way to do this?

j

Posts : 6
Join date : 2010-05-15

View user profile

Back to top Go down

Re: dimension model for many indicators field

Post  BoxesAndLines on Sat May 15, 2010 7:14 pm

I'll ask again, how many actual combinations do you have in the source data? Potential and reality are what make junk dimensions a viable solution.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: dimension model for many indicators field

Post  hang on Sat May 15, 2010 9:00 pm

Storing 31 attributes in a single junk dimension can reduce 31 fields to only one surrogate key in your fact table assuming most of the combinations are highly repeated. You can use SELECT DISTINCT field1, ..., field31 from the fact to figure out the number. If it is a manageable number (say thousands, definitely not millions), and the trend for future facts is clear, then you can incrementally build your junk dimension from the fact. You cannot pre-build the junk dimension by cartesian join in this case as it would exceed 1 billion records which is absolutely not a go for a dimension. Don't forget you might have to cater for null value as well.

Junk dimension is a very effective and elegant dimensional modelling technique when dealing with small number (say less than 10) of low cardinality (say less than 5) attributes in the fact. It avoid having duplicated character fields in fact table if the fields are not classified as degenerate dimensions, so that the fact table becomes normalised.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: dimension model for many indicators field

Post  ngalemmo on Mon May 17, 2010 11:50 am

If you are really worried about the number of combinations, do some correlation analysis on the data.

As B&L pointed out, most of the time, the number of actual combinations are far less than all possible combinations. But, 30+ attributes may be pushing it (even if they are only yes/no fields). The way around it is to identify attributes that correlate well and group them into smaller junk dimensions.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: dimension model for many indicators field

Post  j on Mon May 17, 2010 12:37 pm

"The way around it is to identify attributes that correlate well and group them into smaller junk dimensions"
thanks.. can you give me some data example?

Alos, If a new column get added in junk dimension, do I have to update all the fact record? what will happened? and how can I handle if there are new columns got added in junk dimension?

j

Posts : 6
Join date : 2010-05-15

View user profile

Back to top Go down

Re: dimension model for many indicators field

Post  ngalemmo on Mon May 17, 2010 2:01 pm

For example, you have a demographics junk dimension containing level of education and salary. These two would have a strong correlation because, most of the time, those with higher education levels have higher salaries. While something like zip code and level of education would have a low correlation, since for most zip codes one could expect a mix of education levels. Whereas, zip code and city and state have a very high correlation. So, you would take these correlations into account when combining attributes into a junk dimension. You may put city, state and zip in one junk dimension and education, salary level into another.

As far as new columns go, junk dimensions are essentially point-in-time. If you need to retroactivly assign values historically, you would need to rebuild the dimesion and rebuild the facts. Not a task one should take lightly. However, it is usually the case that such new attributes do not have historical significance. In which case it is simply a matter of adding the column and assign appropriate keys moving forward. Adding such an attribute to a junk dimension, however, will impact the overall size of the dimension table.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: dimension model for many indicators field

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum