How to handle situations where the data is deleted from the source system?

Post  larus on Tue Nov 13, 2012 2:27 pm

How should I handle data that has been deleted from the source system? How can the deleted rows be detected? Should they also be deleted (or marked as deleted) in the DW?

larus

Posts : 5
Join date : 2011-03-01


Re: How to handle situations where the data is deleted from the source system?

Post  ngalemmo on Tue Nov 13, 2012 2:40 pm

Detecting deletes in a source is a challenge. Unless there is an explicit transaction to work from (or DB logs), the only way to detect them would be to match/merge the source population against the DW population.

On the DW side, you normally flag the rows as 'deleted'. Physically deleting a row, particularly a dimension row, will break referential integrity with the facts.
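As a rough illustration of the match/merge plus flagging idea, here is a minimal Python sketch. The table layout and the names `natural_key`, `is_deleted`, `customer_sk` and `flag_deleted` are assumptions made for the example, not anything from a particular tool; in practice the same comparison would normally be an anti-join in SQL or an ETL step.

```python
def flag_deleted(dim_rows, source_keys):
    """Soft-delete dimension rows whose natural key is gone from the source.

    dim_rows: list of dicts with hypothetical 'natural_key' and 'is_deleted' fields.
    source_keys: iterable of natural keys currently present in the source.
    Returns the natural keys that were newly flagged.
    """
    current = set(source_keys)
    newly_flagged = []
    for row in dim_rows:
        if not row["is_deleted"] and row["natural_key"] not in current:
            # Flag only: the row and its surrogate key stay, so facts still join.
            row["is_deleted"] = True
            newly_flagged.append(row["natural_key"])
    return newly_flagged


# Hypothetical dimension contents and the keys extracted from the source.
dim_customer = [
    {"customer_sk": 1, "natural_key": "C001", "is_deleted": False},
    {"customer_sk": 2, "natural_key": "C002", "is_deleted": False},
    {"customer_sk": 3, "natural_key": "C003", "is_deleted": False},
]
source_keys = ["C001", "C003"]  # C002 no longer exists in the source

flagged = flag_deleted(dim_customer, source_keys)
```

Only the natural keys are needed from the source, and no dimension row is physically removed.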

Usually, this type of process need only be run periodically, such as weekly or monthly, depending on the need to know. You don't need much source information other than the natural keys of current members. If the need to know is more immediate, you may consider implementing triggers on the source tables of interest to write key information to a log table when a delete occurs, then drive the DW update process using the log.
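The trigger-and-log approach might look something like the sketch below. It uses Python's built-in `sqlite3` module and an in-memory database purely so the example is self-contained; the table and trigger names (`customer`, `customer_delete_log`, `trg_customer_delete`) are hypothetical, and trigger syntax differs between database products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id TEXT PRIMARY KEY, name TEXT);

    -- Log table holding only the key and the time of the delete.
    CREATE TABLE customer_delete_log (
        customer_id TEXT NOT NULL,
        deleted_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    -- Write the key of every deleted source row to the log.
    CREATE TRIGGER trg_customer_delete
    AFTER DELETE ON customer
    BEGIN
        INSERT INTO customer_delete_log (customer_id) VALUES (OLD.customer_id);
    END;
""")

conn.executemany(
    "INSERT INTO customer (customer_id, name) VALUES (?, ?)",
    [("C001", "Ann"), ("C002", "Bob")],
)
conn.execute("DELETE FROM customer WHERE customer_id = 'C002'")
conn.commit()

# The DW load would read this log to decide which rows to flag as deleted.
logged_keys = [row[0] for row in conn.execute(
    "SELECT customer_id FROM customer_delete_log ORDER BY deleted_at"
)]
```

The DW update then only has to process the keys in the log rather than compare the full populations.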
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Re: How to handle situations where the data is deleted from the source system?

Post  Mike Honey on Wed Nov 14, 2012 3:33 am

Hi larus,
I assume you are trying to maintain a Dimension table? I usually handle this requirement by a kind of "reverse lookup" step in the ETL.

In the course of delivering the current data from the source system, I cache the set of source system business keys (e.g. in an SSIS Lookup cache). Then I add a post-step that reads all the Dimension table "current" business keys and compares them to the cache. Any rows that don't match are candidates to be "deleted". If you don't use SSIS, I guess you could use a Staging table for this.

The "delete" would actually be to turn off a "Row Is Current" flag and close an Effective Date range.
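A minimal Python sketch of that post-step, assuming a dimension with `business_key`, `effective_from`, `effective_to` and `row_is_current` columns (all names here, including `expire_missing` and the 9999-12-31 open-ended date, are illustrative assumptions rather than anything SSIS-specific):

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumed marker for an open-ended date range


def expire_missing(dim_rows, cached_source_keys, as_of):
    """Close out current rows whose business key is not in the source cache."""
    expired = []
    for row in dim_rows:
        if row["row_is_current"] and row["business_key"] not in cached_source_keys:
            row["row_is_current"] = False  # turn off the "Row Is Current" flag
            row["effective_to"] = as_of    # close the Effective Date range
            expired.append(row["business_key"])
    return expired


# Hypothetical dimension: P2 has one historical row and one current row.
dim_product = [
    {"business_key": "P1", "effective_from": date(2012, 1, 1),
     "effective_to": HIGH_DATE, "row_is_current": True},
    {"business_key": "P2", "effective_from": date(2011, 1, 1),
     "effective_to": date(2012, 6, 30), "row_is_current": False},
    {"business_key": "P2", "effective_from": date(2012, 7, 1),
     "effective_to": HIGH_DATE, "row_is_current": True},
]

# Business keys cached while loading the current source data; P2 is absent.
source_key_cache = {"P1"}

expired = expire_missing(dim_product, source_key_cache, date(2012, 11, 14))
```

Only rows that are still current are touched, so historical versions keep their original date ranges.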

HTH
Mike
Mike Honey

Posts : 185
Join date : 2010-08-04
Location : Melbourne, Australia

http://www.mangasolutions.com
