Daily Snapshot or SCD 2

View previous topic View next topic Go down

Daily Snapshot or SCD 2

Post  leon.panokarren on Thu May 02, 2013 1:59 pm

Hello,

I am trying to understand the different considerations one should look at to determine if a particular process should be modeled as a type 2 SCD or a daily snapshot. Here's what I looked at so far, but if you could add/correct this list, I would truly appreciate it -

1) Frequency of changes to the attributes or measures in scope - if these are going to change daily, daily snapshot makes sense? if these are going to change much less frequently, SCD 2 would result in much more compact table?
2) User guidance - with a daily snapshot, user would run equality joins between the date dimension and the fact table; with the SCD 2, inequality joins (argument date/s in the range) - not much more complicated
3) User queries - if there is trend/time series analysis being performed for almost all/large percentage of the history for the process, then the daily snapshot (since the user queries end up requiring a large percentage of the snapshots) - if not, SCD 2?


All this assumes that the SCD 2's change tracking is on all of the attributes in the fact table's grain.

ANYTHING ELSE?????

leon.panokarren

Posts : 7
Join date : 2013-05-02

View user profile

Back to top Go down

Re: Daily Snapshot or SCD 2

Post  BoxesAndLines on Thu May 02, 2013 2:11 pm

This isn't a question I ever ponder. A type 2 SCD is a dimension. A snapshot is a fact. If I need history for a dimension, then I make it type 2. If my metrics for my fact table are point in time and not additive (account balances, customer counts, etc), I use a snapshot fact.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Daily Snapshot or SCD 2

Post  leon.panokarren on Thu May 02, 2013 2:17 pm

Thanks boxesandlines!

BoxesAndLines wrote:This isn't a question I ever ponder. A type 2 SCD is a dimension. A snapshot is a fact. If I need history for a dimension, then I make it type 2. If my metrics for my fact table are point in time and not additive (account balances, customer counts, etc), I use a snapshot fact.

Modeling the non-additive measure as a fact does not preclude a user from adding it up. User guidance is still required, no? Also, if the changes in the measure and the context around it are relatively in frequent, what additional information does the snapshot fact provide over the SCD 2? beside the ready physical records? That is really my question.

leon.panokarren

Posts : 7
Join date : 2013-05-02

View user profile

Back to top Go down

Re: Daily Snapshot or SCD 2

Post  leon.panokarren on Thu May 02, 2013 5:14 pm

anybody????

leon.panokarren

Posts : 7
Join date : 2013-05-02

View user profile

Back to top Go down

Re: Daily Snapshot or SCD 2

Post  cjrinpdx on Thu May 02, 2013 6:38 pm

I think you are mixing two separate concepts. Let me attempt to perform a Jedi mind meld. Yes, non-additive facts should be snapshotted. Are you proposing that you update the facts? If you have a bank account fact table and the grain is one record per account per snapshot (month end) you would have...

CustKey AccountInfoKey Balance SnapshotDateKey
4 9 500 03313013
4 9 300 04303013

Are you thinking that you would have one row per account and just update the balance each month? If you have a SCDT2 on the address attributes of the customer dimension, and the customer moves in April, then you will have their accurate address for the 03313013 and 04303013 snapshot. Are you thinking of modeling the facts as a SCDT2? That is essentially a periodic snapshot fact table. Hope this helps.


Last edited by cjrinpdx on Thu May 02, 2013 6:38 pm; edited 1 time in total (Reason for editing : type)

cjrinpdx

Posts : 51
Join date : 2011-07-14
Location : Portland, OR

View user profile

Back to top Go down

Re: Daily Snapshot or SCD 2

Post  leon.panokarren on Thu May 02, 2013 7:13 pm

cjrinpdx wrote:Yes, non-additive facts SHOULD be snapshotted.
The SHOULD is what I am trying to reason - please bear with me. I recognize your guidance is completely voluntary and hence I truly appreciate the time!


cjrinpdx wrote:Are you proposing that you update the facts? If you have a bank account fact table and the grain is one record per account per snapshot (month end) you would have...

CustKey AccountInfoKey Balance SnapshotDateKey
4 9 500 03313013
4 9 300 04303013

Are you thinking that you would have one row per account and just update the balance each month?
Nope, I certainly understand that there would be a snapshot generated at the frequency defined by the grain. In the case I am trying to model, I am actually seeing the process modeled as a daily snapshot.

cjrinpdx wrote:If you have a SCDT2 on the address attributes of the customer dimension, and the customer moves in April, then you will have their accurate address for the 03313013 and 04303013 snapshot. Are you thinking of modeling the facts as a SCDT2? That is essentially a periodic snapshot fact table. Hope this helps.
That is ALMOST a periodic snapshot except for that a periodic snapshot -
1) is much bigger than the corresponding SCD 2 (in your example, may be not so much since the account balance will change daily, in my case, its the inventory of hardware, so it changes much less frequently)
2) cant easily answer when a specific attribute change occurred
3) wont show changes in measure/attributes between snapshots (unless you set the snapshot frequencies to such a low level that you can now capture most measure changes as well - ONLY at the expense of a lot of redundant snapshots)

while a SCD-2 with change tracking on all of the attributes that could be reported by the snapshot's grain
1) creates versions only when a relevant change occurs
2) still allows for any roll ups that the corresponding snapshot would allow

The ONLY difference I see is the nature of the joins from the date dimension (inequality joins forthe SCD 2 v/s equality joins for the snapshot fact ). Depending on how the daily snapshot fact is physically organized, even this may not be an advantage in favor of the periodic snapshot fact (as date id index scans will still be involved - although, the fact is probably partitioned by snapshot date).

I am trying to determine - ARE THERE ANY QUESTIONS that an SCD 2 cant (as effectively) answer that a periodic snapshot can?


Using an example for the case at hand, what questions does this SCD 2

Asset Data Center Avl Count Effective End Eff
Power Sply1 CHICAGO 2 01012013 01072013
Power Sply1 CHICAGO 6 01082013 12312999

not (as effectively) answer as this periodic snapshot fact

Asset Data Center Avl Count Snapshot date
Power Sply1 CHICAGO 2 01012013
Power Sply1 CHICAGO 2 01022013
Power Sply1 CHICAGO 2 01032013
Power Sply1 CHICAGO 2 01042013
Power Sply1 CHICAGO 2 01052013
Power Sply1 CHICAGO 2 01062013
Power Sply1 CHICAGO 2 01072013
Power Sply1 CHICAGO 6 01082013


Thoughts?

leon.panokarren

Posts : 7
Join date : 2013-05-02

View user profile

Back to top Go down

Re: Daily Snapshot or SCD 2

Post  BoxesAndLines on Thu May 02, 2013 9:54 pm

I think I got it. The answer is a type 2 scd doesn't have any metrics. So unless you are proposing creating one big table with dimensions and facts you can't and shouldn't model your metrics in your dimensions.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Daily Snapshot or SCD 2

Post  leon.panokarren on Thu May 02, 2013 10:03 pm

Boxesandlines,

I was looking to model the SCD 2 (much like the periodic snapshot) to have only the dimensional SKs and not all the attributes of all the participating dimensions -

so
periodic snapshot = (all dimensional SKs + measures + snapshot date)
v/s
SCD 2 = (all dimensional SKs + measures + effective date + end effective date) -- where [dimensional SKs, measures] are the attributes to be tracked for changes

I understand measures in the scd 2 dimension table may sound heretical - but taking terminology (& therefore dogma) out of the equation, I am not getting any rational argument why this is any worse than a daily snapshot for a given process

leon.panokarren

Posts : 7
Join date : 2013-05-02

View user profile

Back to top Go down

Re: Daily Snapshot or SCD 2

Post  BoxesAndLines on Thu May 02, 2013 10:30 pm

Heretically speaking, check out design tip #13. Tip #13
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Daily Snapshot or SCD 2

Post  cjrinpdx on Fri May 03, 2013 11:44 am

I think it's valid to model your facts this way, just don't call it a SCD. Checkout design tip #145 by Joy about time stamped accumulating snapshots.

cjrinpdx

Posts : 51
Join date : 2011-07-14
Location : Portland, OR

View user profile

Back to top Go down

Re: Daily Snapshot or SCD 2

Post  leon.panokarren on Fri May 03, 2013 12:34 pm

Thanks cjrinpdx! I recognize unconventional designs can throw some off .. but so do unconventional requirements!

leon.panokarren

Posts : 7
Join date : 2013-05-02

View user profile

Back to top Go down

Re: Daily Snapshot or SCD 2

Post  leon.panokarren on Fri May 03, 2013 5:57 pm

This is the material I was looking for (Thanks to my colleague Yosuke) -

Provide count of how many days a certain asset has been available in the last 90 days (e.g. 89/90 days available - 1 day downtime)

A) Daily snapshot fact solution
count(*).where status = active and date between (today-90days) and today ---- EASY

B) Pure SCD 2 solution
Select sum( case when end date >= today then today Else end date End - Case when start date <= today - 90 then today - 90 Else start date End) active_days
From scd2
Where status =active and today >= start date and today - 90 <= end date ---- NOT THE EASIEST

C) Hybrid SCD 2 solution -- this essentially materializes the SCD 2 back to a snapshot to answer the question - although this could easily be index scans
count(*)
from scd2 join d_date on d_date.full date between scd2.start date and scd2.end date
where d_date.full_date between today and today - 90 days and scd2.status = active
where status = active and date between (today-90days) and today ---- ALMOST as EASY as the periodic snapshot


Now consider this question

Average lifespan of the asset in active status Assume that asset can repeat statuses

A) Daily snapshot fact solution

Generate the SCD-2 from the daily snapshot and then run the SCD-2 solution below

select
asset,
status,
snapshot_date start_date,
nvl(lead(snapshot_date) over (partition by asset order by snapshot_date), HIGH_DATE ) end_date,
nvl(lead(status) over (partition by asset order by snapshot_date), 'DUMMY EOL STATUS') next_status
from
(
select
asset,
status,
snapshot_date
from
(
select
asset,
status,
nvl(lag(status) over (partition by asset order by snapshot_date), 'DUMMY SOL STATUS') as prev_status,
nvl(lead(status) over (partition by asset order by snapshot_date), 'DUMMY EOL STATUS') as next_status,
snapshot_date
from snap_fact
where asset = argument asset
)x
where asset not in (prev_status, next_status)
)
status = active and status != next_status

---- EASY.... SURE!!!!

Pure SCD 2 solution
Select sum(end date - start date + 1 )/count(*)
From scd2
Where status = active for the asset


Each design responds to different questions with significantly degree of efficiency. Thoughts??

leon.panokarren

Posts : 7
Join date : 2013-05-02

View user profile

Back to top Go down

Re: Daily Snapshot or SCD 2

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum