Dealing with deduplication, merging records, etc in the warehouse layer

View previous topic View next topic Go down

Dealing with deduplication, merging records, etc in the warehouse layer

Post  Juan on Tue Dec 06, 2011 1:32 pm


From my experience and Kimball ETL Architecture training I know that there are (not cast in concrete) 4 layers:
1) Staging 2) Transform 3) Conform 4) Warehouse (might be named differently by others)
The dedup/merging/cleaning, etc are all done in the first 8 subsystems ie. the first two layers.

What are the pros and cons of doing client merging, client deduplication, and client key management in the warehouse layer and not in the "kitchen" layer?

Your opinions will be appreciated



Posts : 1
Join date : 2010-07-15
Location : Cape Town, ZA

View user profile

Back to top Go down

Re: Dealing with deduplication, merging records, etc in the warehouse layer

Post  ngalemmo on Tue Dec 06, 2011 8:25 pm

I am not a proponent of merging, de-duping in any layer other than at the reporting level. Key management (if you mean the assignment of dimension keys) occurs in the transformation process prior to loading into the warehouse.

It all has to do with business keys. Clients in the source system are identified by a business key (whatever form that takes). The transactional source, which would contain the business key, needs to be related to the dimension using through the dimension's surrogate PK. It is important that the relationship between a fact and dimension is based on the natural key in the transaction rather than base it on some "merged" picture of the client. This correctly reflects the true nature of the relationship under which the transaction took place.

The "merged" client is simply a representation of a client. You handle this by including additional attributes to reflect merged client. From an end-user standpoint, you can present them with a merged view or an as-is view or both based on the attributes you decide to expose. Also, if the merge associations change (as the merge rules change) you can easily reflect the new merged view by simply updating the merged view attributes.

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile

Back to top Go down

View previous topic View next topic Back to top

- Similar topics

Permissions in this forum:
You cannot reply to topics in this forum