Slow changing fact data with an effective date

View previous topic View next topic Go down

Slow changing fact data with an effective date

Post  jfallik on Wed May 23, 2012 4:10 pm

I have run into a run into a modeling issue and I am unsure of what the best approach is to address it.

I am modeling a portfolio management system. In my transactional system the most grainular piece of data is a taxlot. Each tax lot can have multiple closing dates when less then all of the shares are sold. So it is a traditional 1 to many relationship of taxlots to closed taxlots.

To model it dimensional I flattened out the relationship and use a -1 for the date_closed_id for taxlots that are still open. The example below shows a single tax lot of 60 shares opened on date 5480 and 4 shares were closed on 6207:

as_of_date_id security_id date_opened_id date_closed_id current_price_id closed_taxlot_number open shares
6250 4 5480 -1 33888 NULL 1 56
6250 4 5480 6207 33888 1 0 4

The following day an additional 4 shares were sold and the rows look like the following:

as_of_date_id security_id date_opened_id date_closed_id current_price_id closed_taxlot_number open shares
6252 4 5480 -1 33889 NULL 1 56
6252 4 5480 6207 33889 1 0 4
6252 4 5480 6251 33889 1 0 4

Now in the future there is a 2:1 split so the records look as follow:

as_of_date_id security_id date_opened_id date_closed_id current_price_id closed_taxlot_number open shares
6300 4 5480 -1 43001 NULL 1 112
6300 4 5480 6207 43001 1 0 4
6300 8 5480 6251 43001 1 0 8

My problem is the open taxlot is a slow changing fact if there is a split and if there is no split it will never change, but for every day I am creating a new record in my fact_taxlot table. I am storing a lot of redundent data by flattening the taxlots and closed taxlots. If the average portfolio has 15 securities and there are ~250 trading days a year and there are 5000 portfolios (and forecast to increase at 10% per year), that is 18,750,000 records per year in this one table. I have only included a subset of the columns in the table to demonstrate the problem, The fact table includes about a dozen aggregate calculations as well as additional foreign keys.

The system displays portfolio values as of any historical date, if there were no splits and the open taxlot information could not change I would make 2 tables the static open taxlots as a dimension and closed taxlots as the fact, but since splits can occur I am not sure how to model this parent child relationship so that accurate values can be displayed on an as of date basis, but also not store so much redundent data.

My test warehouse database is at roughly 100gb already running data for 8 1/2 years for 210 accounts with 13 securities in each account. I need a smaller foot print.

Any suggestions?

Thanks in advance,
-Jay
[code]

jfallik

Posts : 1
Join date : 2012-05-23

View user profile

Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum