Surrogate Key in Stage table

View previous topic View next topic Go down

Surrogate Key in Stage table

Post  dwh.arvind on Thu Nov 15, 2012 4:11 am

Is it good idea to have Surrogate Key in stage table ? if yes then obviously why?

dwh.arvind

Posts : 4
Join date : 2012-10-25

View user profile

Back to top Go down

Re: Surrogate Key in Stage table

Post  Jeff Smith on Thu Nov 15, 2012 3:40 pm

I don't think the question is "why would you add surrogates to the staging tables". I think the more appropriate question is why wouldn't you?

Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

Re: Surrogate Key in Stage table

Post  ngalemmo on Thu Nov 15, 2012 7:42 pm

Do you mean to assign surrogate key values to stating table rows, or do you mean should natural keys be converted to surrogates as part of the staging process?

Generally, no to the first and yes to the latter.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Surrogate Key in Stage table

Post  dwh.arvind on Thu Nov 15, 2012 10:38 pm

The surrogate key is a meaningless key but still it is good to have with the many reason; as may times the data coming from a source has no unique identifier or sometimes the unique identifier is a composite key; in such cases when data issue is found with any of the row it is very difficult to identify the particular row or even mention it. When a unique row number is assigned to each row in the staging table it becomes really easy to reference it.

dwh.arvind

Posts : 4
Join date : 2012-10-25

View user profile

Back to top Go down

Re: Surrogate Key in Stage table

Post  min.emerg on Fri Nov 16, 2012 5:51 am

Surrogate keys are necessary if you are wanting to use delta loading (where only updated records are loaded into the warehouse when the load is performed) and/or slowly changing dimensions.

Pretend that in our HR system we have an Employee table, and in our data warehouse staging database we have another table that matches it.

HR.dbo.Employee (EmployeeId INT PK, Name NVARCHAR(128), UpdatedDate DATETIME)
DWStaging.HR.Employee (EmployeeId INT PK, Name NVARCHAR(128), UpdatedDate DATETIME)

The first time we load data from HR.dbo.Employee into DWStaging.HR.Employee, all records should come through. The second time we load data, only records in HR.dbo.Employee that have been updated since the last load should come through (because we're doing delta loading). The problem is that you could potentially updated the same record in HR.dbo.Employee every day, which would result in the new version of the record being loaded into DWStaging.HR.Employee (because we don't delete data from DWStaging.HR.Employee). This is a problem if we want to make EmployeeId the primary key in DWStaging.HR.Employee because it could potentially not be unique (because we have multiple versions of the same record because the record is updated in HR.dbo.Employee).

What you can do is update the staging table to have a surrogate key:
DWStaging.HR.Employee (ImportKey INT PK, EmployeeId INT, Name NVARCHAR(128), UpdatedDate DATETIME)

ImportKey is now our auto-incrementing primary key in DWStaging,HR.Employee. We can use the UpdatedDate field to check which version of the record was the one that was 'active' at a particular point in time.

Hope this helps.

min.emerg

Posts : 39
Join date : 2011-02-25

View user profile

Back to top Go down

Re: Surrogate Key in Stage table

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum