Centipede Ok in this situation?

View previous topic View next topic Go down

Centipede Ok in this situation?

Post  makeitso on Tue Apr 12, 2011 2:21 pm

Hi,

I am attempting to model Public Safety type incidents (span many years up thru current incidents) and have run into the "centipede" design issue. Incidents have about 30 "codes" that can have from 2 to 100 distinct values. I looked at using a junk dimension, but there are just too many possible combinations. Should I build a fact table (incident) with over 30 dimensions, one for each of the codes?

For example some of the codes associated with each incident are:
Heat source code
Area of origin code
Cause of ignition code
etc.

The fact table would then have a HeatSourceId which links to the HeatSourceType table where both the HeatSourceCode and HeatSourceDescription, etc. would be found (?)

Thanks for any help on this...

makeitso

Posts : 3
Join date : 2011-04-12

View user profile

Back to top Go down

Re: Centipede Ok in this situation?

Post  caderoux on Tue Apr 12, 2011 4:02 pm

You could always use several junk dimensions (i.e. a compromise between one junk dimension and 30 dimensions). If you have 30 dimension attributes with 100 values each, then it is 10^60 (i.e. 100^30) combinations in a single dimension, but if you break that up into 3 dimensions, each with 10 attributes with 100 values, then it is 10^20 (100^10) in each - go to 6 and it's 10^10 (100^5).

And there many be combinations which never occur, which can severely cut down on the actual storage in any given junk dimension. Choice of which columns to go in which junk dimension would then also be important - and of course, the closer you start to ascribing any meaning to a particular dimension based upon what you've chosen to go in there, the less it looks like a junk dimension...

caderoux

Posts : 8
Join date : 2009-02-03

View user profile

Back to top Go down

Re: Centipede Ok in this situation?

Post  makeitso on Tue Apr 12, 2011 4:39 pm

Thanks caderoux, that may be what I do. Here's something else I thought of doing....

Since there is not going to be any other information about these "codes" other than their description, I could just make both the "code" and its description attributes of the main incident fact table. Would this be a limit to BI tools when it comes to querying on these attributes from the fact table rather than having them in dimensions? I plan on using MS Analysis Services as the front end slice and dice tool.


makeitso

Posts : 3
Join date : 2011-04-12

View user profile

Back to top Go down

Re: Centipede Ok in this situation?

Post  John Simon on Tue Apr 12, 2011 7:19 pm

Do not include these attributes in your fact table - it will adversely affect performance for both SQL and SSAS.
You will also need to create separate dimensions for these attributes anyway, so save yourself the heart-ache and do it the right way from the begininning.
Caderoux's advice sounds appropriate to me.

John Simon

Posts : 186
Join date : 2009-05-29
Location : Melbourne, Australia

View user profile http://jsimonbi.wordpress.com

Back to top Go down

Re: Centipede Ok in this situation?

Post  hang on Tue Apr 12, 2011 7:59 pm

makeitso wrote:I could just make both the "code" and its description attributes of the main incident fact table.
I would not do that, simply because it’s not a good practice. although SSAS can be configured to use fact (degenerate) dimensions. It seems to me that all your “codes” are of low cardinality which are unnecessary waste of storage due to high data redundancy on the textual attributes in the fact table. You will eventually pay the price for bad performance of your fact table.

I agree with Caderous and John. Combine low cardinality (<10) attributes into 1 or several junk dimensions and put slightly high cardinality (>10) dimensions into separate dimensions and have FKs in the fact table. Only put attributes as degenerate dimension in the fact table that are off very high cardinality comparable to the level of fact itself.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Centipede Ok in this situation?

Post  makeitso on Wed Apr 13, 2011 12:10 pm

Thanks for the recommendations John and hang. "Junk" dimensions seem to be the way to go.

makeitso

Posts : 3
Join date : 2011-04-12

View user profile

Back to top Go down

Re: Centipede Ok in this situation?

Post  BrianJarrett on Sat Apr 16, 2011 2:25 am

If the only attribute about these codes is their description, couldn't you push them all into the same dimension table, then alias them for each reference in the fact table? So you'd first mash all the codes into a single dimension, including the type. Let's call it CODE_DIM.

CODE_DIM_KEY | CODE_TYPE | CODE_ID | DESCRIPTION
================================
1 | HEAT_SOURCE | A | some a descr
2 | HEAT_SOURCE | B | some b descr
3 | HEAT_SOURCE | C | some c descr
4 | AREA_OF_ORIGIN | X | some x descr
5 | AREA_OF_ORIGIN | y | some y descr
...and so on.

(Your natural key on this table is CODE_TYPE and CODE_ID.)

Your fact table will have 30 discreet code dim key fields, named differently for each type of code. Then alias your code dim table in your queries for each fact table surrogate key:

Code:
SELECT
    HEAT_SOURCE_CODE_DIM.CODE_ID AS HEAT_SOURCE_CODE_ID
    ,HEAT_SOURCE_CODE_DIM.DESCRIPTION AS HEAT_SOURCE_CODE_DESCRIPTION
    ,AREA_OF_ORIGIN_DIM.DESCRIPTION AS AREA_OF_ORIGIN_DESCRIPTION
    ,FACT.QTY
FROM
    YOUR_FACT_TABLE FACT
    INNER JOIN CODE_DIM HEAT_SOURCE_CODE_DIM ON
        HEAT_SOURCE_CODE_DIM.CODE_DIM_KEY = FACT.HEAT_SOURCE_CODE_DIM_PK
    INNER JOIN CODE_DIM AREA_OF_ORIGIN_CODE_DIM ON
        AREA_OF_ORIGIN_DIM.CODE_DIM_PK = FACT.AREA_OF_ORIGIN_CODE_DIM_PK
WHERE
    ...

Then do this aliasing of CODE_DIM for however many different codes you have stored in your fact table. This avoids having to physically build and maintain 30 different code dimension tables. If you have a tool like Business Objects you can create these aliased tables as actual objects in the semantic layer, then it'll build the queries properly for you without having to write the SQL again. We do this where I work now and it works great.

Also where I work we have a fact table with almost 70 surrogate keys. It performs fine, but the reality is that we're not joining to every single surrogate key field in every single query. Centipede avoidance is a good rule of thumb, but in reality it's doable if necessary.

Hope this helps.
avatar
BrianJarrett

Posts : 61
Join date : 2009-02-03
Age : 43
Location : St. Louis, MO

View user profile

Back to top Go down

Re: Centipede Ok in this situation?

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum