How to avoid duplicate Records in the fact table?

View previous topic View next topic Go down

How to avoid duplicate Records in the fact table?

Post  JimBeam on Fri Sep 23, 2011 9:20 am

I have a periodic snapshot fact table to capture the business process "real estate sales". Then we have one row per property per buyer per day, that to say a property can be sold at most once in a day to one buyer.

Now I want to ensure data quality and avoid faulty duplicate records coming into the fact table (bad data everywhere, you never know). If the Dimension tables (Property, Buyer, Date) are not slowly changing, then I could put in the fact table a unique index on these 3 Surrogate FK Keys (PropertyKey, BuyerKey, DateKey), and that would be enough.

But what would you do, if one of the Dimension Tables is of SCD2 type? Then that unique index will be failing to prevent entering duplicate fact rows, because the new dimension Record for the same entity (property, buyer) gets a new Key in the fact table, although its still the same entity in real life. For example suppose the Buyer with the same Natural Key that is slowly changing (moving to a new address), gets a new BuyerKey but is still the same Buyer.

----------------------------------
DimBuyer
----------------------------------
BuyerKey (PK)
BuyerID (Natural Key)
City (SCD2)
----------------------------------


JimBeam

Posts : 2
Join date : 2011-09-20

View user profile

Back to top Go down

Re: How to avoid duplicate Records in the fact table?

Post  VHF on Mon Sep 26, 2011 5:34 pm

JimBeam wrote:I have a periodic snapshot fact table to capture the business process "real estate sales".
That sounds like a transactional fact table rather than a periodic snapshot.

If the BuyerID (Natural Key) is the only durable identifier that spans SCD2 rows for a given buyer, then you would have to use it to uniquely identify the buyer.

You could build a table that you would keep in your staging database (kitchen) which has an identifier for buyer, property, and date. You can then put a unique index on this table (could even be the clustered primary key).

DateKey, BuyerID, PropertyID*

During ETL, first insert into this table. If there is a duplicate it will fail. If it passes the load into this table, then load into your actual fact table.

*this could be PropertyKey if property is SCD1

VHF

Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US

View user profile

Back to top Go down

Re: How to avoid duplicate Records in the fact table?

Post  VHF on Mon Sep 26, 2011 5:39 pm

Alternatively, you could use this query after-the-fact to see if a duplicate has crept into your fact table:

SELECT f.PropertyKey, b.BuyerID, f.DateKey, COUNT(1) AS [RecordCount]
FROM FactRealEstateSales f
INNER JOIN DimBuyer b
ON b.BuyerKey = f.BuyerKey
GROUP BY f.PropertyKey, b.BuyerID, f.DateKey
HAVING COUNT(1) > 1

VHF

Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US

View user profile

Back to top Go down

Re: How to avoid duplicate Records in the fact table?

Post  nmorgan96 on Wed Jan 11, 2012 10:51 pm

thanks for the tip's guys....

nmorgan96

Posts : 1
Join date : 2012-01-11

View user profile

Back to top Go down

Re: How to avoid duplicate Records in the fact table?

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum