Calculating Fact table sizes and free form text

View previous topic View next topic Go down

Calculating Fact table sizes and free form text

Post  arfoot653@gmail.com on Fri Mar 09, 2012 3:56 pm

Iam looking at the fact table that i have designed and have a few questions.

1) Calculating size of fact table - i assume 4 bytes for integer, 2 for small int, 16 for timestamp, 8 for float measures and add all the sizes up, does that sound right?
2) There is a free form text field varchar(100) that somebody wants to include in the fact table. Argument is, if its not included in the fact table, anyway we need to join to a dimension which will be 1:1 to the fact table and this column will be used often. This is free form text, no way to dimensionalize it.
3) Kimball toolkit says include timestamps in fact table. For Accumulating snapshots with 9-10 date roles, this really increases fact table size.

Any comments?

Thanks,
Arun.

arfoot653@gmail.com

Posts : 9
Join date : 2012-02-01

View user profile

Back to top Go down

Re: Calculating Fact table sizes and free form text

Post  ngalemmo on Fri Mar 09, 2012 5:00 pm

It really depends. Consult the documentation that came with your DBMS for data type sizes. Also, many compress things and there is row overhead to consider. It is far easier to load the data then check the sizes. Also, indexes make up such a large portion of the total space used on most DBMS systems that raw data size is almost meaningless. At best you are going to get a ballpark number that you then multiply by 3 or 4 to make up for indexing, transaction logging and other overhead. So, precision isn't really an issue... just make sure the number is pessimistic if you are using it for disk procurement.

As far as free form text goes, it also depends. Some DBMS systems will allow you to place CLOB type data in a separate 'table' with a DB maintained link to the rest of the row. The problem with physically including it in the same row and table is it bloats the table and slows down queries for everybody, not just the ones needing the text. By separating it, only those using the text take the performance hit.

What Kimball means is that if time is important, it should not be part of the date dimension, but rather stored as is in the fact. You would still have FKs to the date dimension. The question is, is time really necessary for all the 9-10 date roles you mention? If there are attributes related to time, you usually have a separate time dimension (i.e. just 24 hours worth) at some level of precision (usually minutes is enough).
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Calculating Fact table sizes and free form text

Post  Vishy on Sat Mar 10, 2012 3:31 am

If your requriement requires accumlated fact then you need to use accumlated fact,there is no hiding.
But you can do one thing is to put date_key there in fact instead of dates ( if you required dates and not very precise timestamps).

ngalemmo -
The problem with physically including it in the same row and table is it bloats the table and slows down queries for everybody

what do you mean by bloating??

Vishy

Posts : 75
Join date : 2012-02-28
Location : India

View user profile

Back to top Go down

Re: Calculating Fact table sizes and free form text

Post  BoxesAndLines on Sat Mar 10, 2012 8:57 pm

A bloated feeling is what you feel after you eat too much food. A bloated fact table has lots of textual columns which ultimately decrease the performance of the fact table since it takes more bytes to retrieve one row of data. The astute will note that DBMS' don't fetch a row of data, they fectch a block of data. The bigger the row, the less rows retrived per block.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Calculating Fact table sizes and free form text

Post  John Simon on Sun Mar 11, 2012 12:38 am

arfoot653@gmail.com wrote:Iam looking at the fact table that i have designed and have a few questions.

1) Calculating size of fact table - i assume 4 bytes for integer, 2 for small int, 16 for timestamp, 8 for float measures and add all the sizes up, does that sound right?
2) There is a free form text field varchar(100) that somebody wants to include in the fact table. Argument is, if its not included in the fact table, anyway we need to join to a dimension which will be 1:1 to the fact table and this column will be used often. This is free form text, no way to dimensionalize it.
3) Kimball toolkit says include timestamps in fact table. For Accumulating snapshots with 9-10 date roles, this really increases fact table size.

Any comments?

Thanks,
Arun.
1. Sounds like you are using SQL Server as those numbers sound right to me.
2. Don't include it in the fact tavle. It doesn't matter if it will be used often - everytime I've had this kind of freeform text column users want to look at a only a small set of data, generally a hundred records or less. Any performance impact here will be minimal. How often will you be doing sums or other aggregates? This is where you will see a performance hit. See this post on my blog to see the actual impacts on performance:
http://jsimonbi.wordpress.com/2011/05/16/dimensional-modeling-worst-practices/
3. If you already store the timestamp in a transactional fact table, I wouldn't bother putting it in an Accumulating Snapshot unless there was a clear business requirement for it.



John Simon

Posts : 186
Join date : 2009-05-29
Location : Melbourne, Australia

View user profile http://jsimonbi.wordpress.com

Back to top Go down

Re: Calculating Fact table sizes and free form text

Post  Vishy on Mon Mar 12, 2012 4:23 am

John,
I read your article and it is good, but sometime we need to have degenerate dimensions which are of char type, what do you think of that?? Is there any criteria like we should store char type degenerated dimension which are upto 2-3 chars length, anything more then this should take a proper dim shape.

Vishy

Posts : 75
Join date : 2012-02-28
Location : India

View user profile

Back to top Go down

Re: Calculating Fact table sizes and free form text

Post  John Simon on Mon Mar 12, 2012 5:46 am

Vishy wrote:John,
I read your article and it is good, but sometime we need to have degenerate dimensions which are of char type, what do you think of that?? Is there any criteria like we should store char type degenerated dimension which are upto 2-3 chars length, anything more then this should take a proper dim shape.

That's fine Vishy. If we need to have multiple degenerate dimension columns, then in the pastI have put them in a separate dimension. But if you only have one or two small ones, I don't think you'll notice much of a performance impact.

John Simon

Posts : 186
Join date : 2009-05-29
Location : Melbourne, Australia

View user profile http://jsimonbi.wordpress.com

Back to top Go down

Re: Calculating Fact table sizes and free form text

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum