how to identify and remove faulty records from dimension and facts

View previous topic View next topic Go down

how to identify and remove faulty records from dimension and facts

Post  ramakrishna on Wed Sep 07, 2011 12:14 pm

There is a difference of records between source table and the dimension table. From that we found that this is either due to the record may be not present in source as it may be deleted in the source. As source tables are loaded with full load truncating the table and dimension table are loaded with incremental
load.So if there is a duplicate record in dimension table we can delete the record but how come we can update the sids either in dimension, even the same sid being using in fact, so how to handle the effect or update the sid in both dimensions and facts.And at last how to cleanse the faulty records.

Thanks
Rama

ramakrishna

Posts : 5
Join date : 2011-09-07
Age : 28
Location : chennai

View user profile

Back to top Go down

Re: how to identify and remove faulty records from dimension and facts

Post  ngalemmo on Wed Sep 07, 2011 4:07 pm

As source tables are loaded with full load truncating the table

I assume you are taking about staging the source data.

So if there is a duplicate record in dimension table we can delete the record

Why would you do that? Why not simply update the dimension row?
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: how to identify and remove faulty records from dimension and facts

Post  ramakrishna on Thu Sep 08, 2011 1:33 pm

why would you do that? Why not simply update the dimension row?
as i said that we are not getting the source and target counts of the dimension equal. so for that we found the records that are prsent in target but not in source, so we want to delete those records. so is it good to delete or how to solve the above thing to get equal counts.
thanks for ur reply

ramakrishna

Posts : 5
Join date : 2011-09-07
Age : 28
Location : chennai

View user profile

Back to top Go down

Re: how to identify and remove faulty records from dimension and facts

Post  ngalemmo on Thu Sep 08, 2011 2:49 pm

Deleting rows from the dimension table? You never delete rows from a dimension table (unless you are carefully purging old data from DW), otherwise you lose referential integrity with the facts.

So if there is a duplicate record in dimension table we can delete the record but how come we can update the sids either in dimension, even the same sid being using in fact, so how to handle the effect or update the sid in both dimensions and facts.

You avoid the problem you are trying to solve by not deleting dimension rows. SIDs don't change.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: how to identify and remove faulty records from dimension and facts

Post  ramakrishna on Sat Sep 10, 2011 4:00 am

thanks for ur reply

ramakrishna

Posts : 5
Join date : 2011-09-07
Age : 28
Location : chennai

View user profile

Back to top Go down

An example

Post  Zoran Milokanović on Tue Mar 20, 2012 5:56 pm

Could you please be more precise?

Let’s say that I have a contact SCD2 dimension.

ID (surrogate)
Address
Valid_From
Valid_To
Is_Current_Flag

And I want to track record deletion as well. As I understood this can be done via additional attributes like Is_Deleted_Flag and if necessary Delete_Date.

So let’s say that I have (history) situation like this one:

ID Address Valid_From Valid_To Is_Current_Flag Is_Deleted_Flag Delete_Date
1 Palm Street 01/25/2012 01/27/2012 N N null
2 Baker Street 01/28/2012 01/01/3000 Y N null

Deletion on production is performed on 02/15/2012. Did I understand well that the second record will end up like

2 Baker Street 01/28/2012 01/01/3000 Y Y 02/15/2012

?

Thanks in advance,
Zoran

Zoran Milokanović

Posts : 3
Join date : 2012-03-20

View user profile

Back to top Go down

Re: how to identify and remove faulty records from dimension and facts

Post  ngalemmo on Tue Mar 20, 2012 9:35 pm

If you are going to have a delete flag, why not set the valid to date to the delete date, as the row would no longer be valid? You really don't need the extra column.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: how to identify and remove faulty records from dimension and facts

Post  Zoran Milokanović on Mon Apr 16, 2012 7:56 pm

ngalemmo wrote:If you are going to have a delete flag, why not set the valid to date to the delete date, as the row would no longer be valid? You really don't need the extra column.

Yeah, this is a really good question that I even asked myself. ;DD
First of all I accidently did a mistake posting a question here instead of inside topic “Question on deleting records…”

http://forum.kimballgroup.com/t1181-question-on-deleting-records-from-dimension-tables

Anyway, the thing is that I didn’t find in any of Kimball books that I have at home how to deal with (source) record deletion when having a dimension with valid from/to attributes (the one that have at least one SCD 2 attribute).
With SCD 1 dimension (the one with all SCD 1 attributes) I found few suggestions that include additional attribute to mark record (in)activity. And If I want to know exact dateTime when record became inactive or deleted that would mean one more attribute as well (Delete_Date).

With, let’s call it, SCD 2 dimension I found zero suggestions and after reading the conversation from link above I made a wrong conclusion and asked a question in order to clarify it. So, my question would be… How to handle it in SCD2 dimensions?
Common sense says exactly what you suggested….that Valid_To has to be set to delete dateTime. I guess I would need attribute Is_Deleted_Flag to simply distinguish those records that are regularly closed with SCD2 attribute change from those that are deleted on the source side. Maybe attribute Change_Reason is also something good to have but let’s say that is less important. A small mystery could be what to do with Is_Current_Flag in case when record is deleted. I would say that it has to be set to value N and three reasons come up on my mind why. First, current records are those that I could currently find on the source and the one deleted I cannot find there. Second, current records are those with MAX DATE TIME value contained in Valid_To attribute and deleted record don’t satisfy that condition. Third, in general if I want to maintain SCD 2 attribute change I'd create a new record with next surrogate key and I'd try to find previously most current record in order to update it to N value and to update Valid_To attribute. Let's say that deleted record apears on the source after some time in the future. The record than was initially deleted is already properly set (delete time, current flag = 'N') and it would be wrong to handle it as a regular SCD 2 attribute change (second activation/entry date minus 1 second or 1 day, current flag = 'N'). Am I right?


Last edited by Zoran Milokanović on Wed Apr 18, 2012 5:03 am; edited 1 time in total

Zoran Milokanović

Posts : 3
Join date : 2012-03-20

View user profile

Back to top Go down

Re: how to identify and remove faulty records from dimension and facts

Post  ngalemmo on Tue Apr 17, 2012 12:11 pm

In a typical type 2 update, the pattern is to expire the current row and create a new current row. Note, 'current row' is not the same thing as the last known row for a particular natural key. So, if you soft-delete a type 2 row by setting the expire date and making it no longer current (set current flag to false), the process to handle a future resurrection is the same as any other type 2 update. You would not find a 'current row' and would simply treat it as a new row with a new effective timestamp. There would be no update to the prior row. No special logic necessary.

As for having a delete flag to distinguish between updates and source system deletes, that's fine.

avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: how to identify and remove faulty records from dimension and facts

Post  Zoran Milokanović on Wed Apr 18, 2012 5:02 am

Exactly that was my assumption. So, current records in dimension table (flag Y) represent current picture of the source system NOT the last known one, meaning that deleted row has to have current flag set to N. Many, many thanks. :DD

I just wanted to add that I understand why I didn’t find (much) articles about deleting records that are used as a source for SCD 2 dimension. In general this shouldn’t be possible due to referential integrity in a source system. However in practice, especially in a large (bank or telco) systems it happens during the regular DB maintenance where old records are backed up and purged. I know that in most cases this affects fact records but sometimes old services and products that are not part of current/active company portfolio could be purged as well. In that case by purging some of the records from lookup tables it affects corresponding dimension tables that already have those records.

Zoran Milokanović

Posts : 3
Join date : 2012-03-20

View user profile

Back to top Go down

Re: how to identify and remove faulty records from dimension and facts

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum