Number of Columns in Fact Tables vs. Dimension Tables

View previous topic View next topic Go down

Number of Columns in Fact Tables vs. Dimension Tables

Post  mostafa_mahrous75 on Mon Dec 01, 2014 5:37 am

Hi There

Fact Table is assumed to have large number of records, that is why it should include only numeric columns such as Foreign Keys, Measures and Degenerate Dimensions (if any) in order to minimize IO for accessing such large tables, for character columns we should move it to one of the surrounding dimensions in Star Schema (ex. in a Junk Dimension)

On the other hand, Dimension Table should theoretically include all the attributes that describe the entity (numeric, alpha-numeric and character columns) resulting in a wide dimension table and therefore less number of records can be stored in a single data block which consequently increases IO for accessing such wide Dimension Tables. This issue is obvious with Monster Dimensions such as Customer or Account Dimensions

Mini-dimension technique is a good practice to enhance Dimension Table IO for low cardinality attributes

Now I'm thinking of high cardinality attributes such as address, notes/comments in addition to the other less important attributes, what about if we split it into another table that is 1-1 joined to the main Dimension? This way, the number of attributes in the main dimension is less resulting in more rows to be stored in a single data block, however we "snow flaking" the dimension which results in more complex model and low performance query (in case we need to any attributes from the "extra" dimension table

Appreciating your feedback to assess the suggested technique and propose any suitable technique to resolve that dilemma

mostafa_mahrous75

Posts : 8
Join date : 2011-11-20

View user profile

Back to top Go down

Re: Number of Columns in Fact Tables vs. Dimension Tables

Post  nick_white on Mon Dec 01, 2014 7:33 am

Hi,
you seem to be discussing a theoretical situation rather than an actual situation - if that is the case then I would stick to the standard Kimball best practice and not worry about theoretical situations until they actually occur. Splitting some attributes out of the main dimension might improve query performance - but if both approaches result in sub-second queries then why spend time looking at different approaches? Keeps things as simple as possible unless you know you have to make them more complex.
If you do have query performance issues then before changing your dimensional design I would look at all the more usual performance tuning techniques: has the hardware been spec'd appropriately; is the partitioning/indexing strategy correct; have appropriate aggregates been built; etc?

If you do need to change your dimensional model then I doubt there is any best practice or template you could follow - any solution is going to be (experience-driven) trial and error on your part. The reason for this is that performance is so dependent on data structures, data volumes, hardware, types of queries, volumes of queries,etc. that what works for one implementation won't necessarily work for another.
One person might split some attributes out of a main dimension and see a massive improvement while another person do something that appears to be similar but then sees worse performance.

nick_white

Posts : 366
Join date : 2014-01-06
Location : London

View user profile

Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum