Periodic snapshot: insert when not changed?

View previous topic View next topic Go down

Periodic snapshot: insert when not changed?

Post  jochem_van_grondelle on Thu Oct 15, 2009 10:25 am

Hi there,

Just a little Kimball theory question. This is about my earlier described student dwh, but to explain it easier I'll use the inventory chapter of the DWH Toolkit.

If we would use a periodic snapshot fact table which will store the amount of inventory each week per product per warehouse, then there would be 10000 products and 5000 warehouses. If I understand the book well, every week there would be inserted 10000x5000 rows with inventory facts, together with a reference date.

Another option would be about the same, but we only insert all records that have been changed. So there would be an initial load with the same 10000 x 5000, but after that every week only all products would be added of which something have been changed in inventory records. Together with a date_valid_from and date_valid_to (which could be n/a).

I think in the second option, querying would be a bit more complicated but not too complicated. If necessarry we could always build a view in which we can simulate the first option.

So would it be better to always insert all products/warehouses or only insert the updated ones?

Thank you,

Jochem

jochem_van_grondelle

Posts : 11
Join date : 2009-09-22

View user profile

Back to top Go down

Re: Periodic snapshot: insert when not changed?

Post  ngalemmo on Thu Oct 15, 2009 12:06 pm

For inventory it would be a bad thing, but for something like school/class enrollment, maintaining time based incrementals is not so bad.

The problem with time based incremental snapshots is, to get a total count you need to sum from the last full snapshot up to the point in time you are interested in. For something like inventory, a lot of time may pass since the last full snapshot, making queries run long as well as requiring potentially complex selection logic to get the right count. But, in your case, I would assume you would have semester as one of the dimensions to the snapshot, so it is fairly easy to control when the first (and probably only) full snapshot occurs and the scope of the incremental changes.

Its going to depend on the planned query environment and how much you can hide from the end user. Full periodic snapshots are dead simple to use, whereas incremental snapshots require a subquery to construct the particular point in time desired. If the BI tool you are using has a robust meta layer that makes the difference transparent, go ahead.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum