Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

Guidelines for FACT Table Design for High-Transaction Volume and High # of Measures ...

2 posters

Go down

Guidelines for FACT Table Design for High-Transaction Volume and High # of Measures ... Empty Guidelines for FACT Table Design for High-Transaction Volume and High # of Measures ...

Post  kminboston Fri Nov 13, 2009 5:44 pm

Hi All -

This inquiry could actually be two questions, but the requirement is really presentation of row-wise high volume transactional data in a column-wise (columnar) FACT table.

Here is a sample transactional data set:

Interval Date TimeUnit IDChannelValue
7/2/2009 12:00:00.000000 AM 24781143.75
7/2/2009 12:00:00.000000 AM 24781352.17
7/2/2009 12:05:00.000000 AM 24781152.35
7/2/2009 12:05:00.000000 AM 24781351.98
7/2/2009 12:10:00.000000 AM 24781151.96
7/2/2009 12:10:00.000000 AM 24781356.78
7/2/2009 12:15:00.000000 AM 24781159.77
7/2/2009 12:15:00.000000 AM 24781362.75
7/2/2009 12:20:00.000000 AM 24781165.44
7/2/2009 12:20:00.000000 AM 24781365.53
7/2/2009 12:25:00.000000 AM 24781165.75
7/2/2009 12:25:00.000000 AM 24781364.78
7/2/2009 012:30:00.000000 AM 24781165.76
7/2/2009 012:30:00.000000 AM 24781365.76


Here are some data requirements:
1. Pivot Interval Date Time into columns, each column at a 5-minute increment for up to a day (288 columns); DIM levels in row;
2. Pivot Channel data into columns, Interval data in rows; Consider up to 47 measure columns;


1. Here is a sample of columnar interval dates with channel '1' values (requirement #1):
7/02 12:00am 7/02 12:05am 7/02 12:10am 7/02 12:15am7/02 12:20am 7/02 12:25am 7/02 12:30am
Level Value 143.75 52.35 51.96 59.77 65.44 65.75 65.76
Level Value 2 99.99 ... ... ... ... ... ...

2. Here is a sample of columnar measures with channels, interval date time as rows (requirement #2):
Channel 1 Amt Channel 2 Amt Channel 3 Amt Channel 4 AmtChannel 5 Amt Channel 6 Amt ...
7/02 12:00am43.75 52.35 51.96 59.77 65.44 65.75 65.76
7/02 12:05am 99.99 ... ... ... ... ... ...

With regard to guidelines, I am trying to understand design thresholds with respect to FACT tables supported within a dimensional model (happens to be Oracle 11g RDBMS) - how many FACT measures = 'too much', etc. I understand my alternatives for columnar store DBs / appliances, had to ask this question.

If there are any high-volume gurus out there that have modeled for heavy pivot data and understand these thresholds, I would definitely owe you one! Any referenceable periodical would certainly help too (yes, I've read all of Dr. Kimball's work!).

Take care.

kminboston

Posts : 2
Join date : 2009-11-13

Back to top Go down

Guidelines for FACT Table Design for High-Transaction Volume and High # of Measures ... Empty Re: Guidelines for FACT Table Design for High-Transaction Volume and High # of Measures ...

Post  ngalemmo Fri Nov 13, 2009 6:15 pm

The right way to do this is in steps...

The first thing is an atomic fact table for the measures being collected. It would essentially mimic the transaction feed with a date dimension, time dimension, unit dimension, channel dimension and value measure.

The date dimension would be by day and the time dimension would be independent of day and at whatever level of granularity you need (for example, by minute). You may also consider storing the timestamp itself if you need the true time for other purposes.

The pivots, on the other hand, are aggregations of the facts. The question is, do you need to materialize these? Any decent BI tool will do the pivots and perform well with the base fact table as it is. The need to actually build aggregate tables will depend on service level requirements. But, if you have the atomic fact table, building aggregates is a simple task. If you are using Oracle, they have SQL extensions that will pivot the data for you. Building a table with 300 columns is no big deal from a database standpoint... but can be a problem from a BI tool/user interface point of view.

I don't get your comment: "I understand my alternatives for columnar store DBs / appliances, had to ask this question." ... These really don't make a significant difference from a modelling standpoint (other than physical considerations).

By the way, what is "high volume"? 288 data points/day/unit/channel for how many units and channels?
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum