"Previous" attributes

View previous topic View next topic Go down

"Previous" attributes

Post  obiapps on Fri Oct 22, 2010 12:34 am

I am working on a HR Enterprise datawarehouse. I have a requirement for a number of "previous" attributes. For example, an employee's previous business unit, supervisor, last name. Business Unit, Manager already have their own Type 2 SCD Dimension table that has a key that joins to the event fact table. last name is a part of the employee dimension which also has a key to the event fact table. Questions I have are: Is it better to have a "previous" key on the event fact table or does this need to be done on the dimension table itself (possibly a column that captures the previous value? For example, I can put a "previous" key for Business Unit and Supervisor ID on the event fact table. However, for last name, it seems this would have to go on the Employee Dimension table..not the fact. Not sure of this makes sense but was wondering if someone can guide me from a modeling perspective. Is there a best practice when it comes to assigning previous attributes such as this? Thanks.

obiapps

Posts : 21
Join date : 2010-09-28

View user profile

Back to top Go down

Re: "Previous" attributes

Post  ngalemmo on Fri Oct 22, 2010 12:02 pm

I assume the event you are referencing is an employee action of some sort. Previous should be part of the event fact as an additional FK with an appropriate role name. If there is no change for that particular job, position, business unit, etc.

However I would not do that for supervisor, as a change in supervisor is typically not an employee action. Usually supervisor is part of the position hierarchy and changes to that structure should be dealt with by maintaining a hierarchy history. Of course, if an employee changes position, that should be reflected in the action event fact. With that said, I would probably also have an current supervisor attribute in the employee dimension to simplify reporting if there is a common need for it.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: "Previous" attributes

Post  obiapps on Fri Oct 22, 2010 5:52 pm

Thank you for your response. For items that should not have a previous FK as they are not related to an event, would these them be modeled as a extension column on the dimension table? For example, lets suppose I have a large Employee Dimension which has several columns as type 2 SCD. Suppose I want to know what the PREVIOUS last name was for any given employee. Would this be done by adding another column called 'Previous Name' and embedding the logic in the SCD ETL logic? or can we just use a "Previous Last Name" flag of some sort? I have an exisisting CURRENT flag but as this is changed anytime ANY of the SCD type 2 columns get updated, I cannot use this to identify only last name changes. Any thoughts on how to design this?

obiapps

Posts : 21
Join date : 2010-09-28

View user profile

Back to top Go down

Re: "Previous" attributes

Post  ngalemmo on Fri Oct 22, 2010 7:01 pm

If you are modeling employee as a type 2, then you have what you need (sort of). However, what you describe (storing a current and previous attribute value on the same row) is a type 3 dimension. If you have a situation where the business is only interested in the previous (and maybe the previous previous) value, doing it as a type 3 is much simpler than implementing a type 2 dimension.

The other thing to consider is, what changes on an employee that isn't reflected in a fact (employee action) fact table? And how much history of those things do you really need?

Any movement (change of job, position, etc), raise, hiring, transfer, termination and so forth are all reflected in the action history (facts). There isn't much left on the employee... name, # dependents, phone numbers, union status, addresses, tax information... that warrants a type 2. If there are only a handful of attributes that need type 2 treatment, you may consider junk dimensions.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: "Previous" attributes

Post  obiapps on Fri Oct 22, 2010 7:12 pm

well explained, thank you. Will proceed with a type 3.

obiapps

Posts : 21
Join date : 2010-09-28

View user profile

Back to top Go down

Re: "Previous" attributes

Post  hang on Fri Oct 22, 2010 9:02 pm

I don't think type 3 will adequately address the issue and it is rarely used in dimensional modeling since type 2 predominantly resolves most, if not all, change tracking issues. All you need to do is to apply type 2 on any attribute that you want to keep changes and let query to find out the desired attribute version that is current, prior, prior minus 1, prior minus 2 etc.

The supervisor should be an employee in a parent-child relationship in employee dimension. The employee-business unit (org) correlation can be reflected in the event fact table. So there would be some kind of self join on natural keys in both dimensions in order to get the desired version of correlation.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: "Previous" attributes

Post  ngalemmo on Mon Oct 25, 2010 12:04 pm

Hang,
I disagree. While a type 2 will most certainly provide the most complete history of dimensional change, most of the time it is overkill. In this case, where they are concerned about previous values, type 2 is not the right solution... as it begs the question: How do you find the previous value in a type 2 dimension?

The previous value is not the value on the previous row. A new type 2 row may be triggered by a change in any attribute, so the previous value for any one attribute may be in a row many generations prior to the current row. The SQL to locate such a value is very cumbersome at best. It is further complicated if the query requires previous values from more than one attribute, as the previous values for these attributes may be on different rows. In addition, the FK may reference any version of the entity, while previous is a concept usually associated with the current value. I would challenge anyone to come up with reasonable SQL that can locate current and previous values of more than one attribute in a type 2 dimension.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: "Previous" attributes

Post  hang on Mon Oct 25, 2010 8:27 pm

I agree that type 3 is a simple solution if only last previous SCD value is needed. However for a complete BI solution, you may still need type 2 to have a complete picture of all the historical data. So it may well be a hybrid SCD dimension in this case.

In case you do need to retrieve any version of changes, or for sake of argument, it is challenging but still achievable if you utilise row_number() in ETL and store in a table a unique list of changed attributes ordered by history number. In TSQL it would be as follows:

Select EmployeeSK, EmployeeNK, LastName,
row_number() over (partition by EmployeeNK order by EmployeeSK desc) as PrevNum
from
(
Select EmployeeSK, EmployeeNK, LastName,
row_number() over (partition by EmployeeNK, LastName order by EmployeeSK desc) as PrevNum1
from dimEmployee
) a
Where PrevNum1=1

So with this table in place, you could retrieve any version of changes on LastName by filtering on PrevNum. In my test, it only takes less than 1 min for nightly ETL to store the list of couple of million rows.


Last edited by hang on Mon Nov 01, 2010 5:03 pm; edited 3 times in total

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: "Previous" attributes

Post  ngalemmo on Tue Oct 26, 2010 1:05 pm

"it is challenging but still achievable if you utilise row_number or rownum in ETL and store in a table a unique list of changed attributes ordered by history number."

Could you elaborate. And, how would I implement something like this if my BI front end uses a generic tool such as BO or Cognos?

I mean, it's not that you couldn't do it, but wouldn't it be simpler to just have two dimension tables, a type 3 and a type 2 if you need a full historical record?

The thing is, its is always necessary to have a full, robust dimensional history? My experience has been that no, not really. Most of the time, the business just doesn't care because most of the time such a history has little or no value. Sure, they always say they need history, usually because they read about it in an article or white paper, but when you dig deeper, you find that the scope of the historical perspective is much smaller than "everything". Focusing on what is really important can signficantly reduce development as it allows for alternatives to a type 2, as well as improve clarity and useability of the environment.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: "Previous" attributes

Post  hang on Tue Oct 26, 2010 8:10 pm

The script is purly back-end process and has nothing to do with front-end semantic layer. It is is something I would use in my ETL process to have a convenient table ready to join with the master dimension in order to get the desired version of attributes. I could have a view to wrap up the entire logic. The reason I implement it in ETL process is really for performance consideration. Say the table is called EmployeeLastName, I would have the following script to retrieve last previous version:

Select e.*, eln.LastName as PrevLastName
From dimEmployee e
left join EmployeeLastName eln
on eln.EmployeeNK=e.EmployeeNK
and eln.PrevNum=2 -- 1: current version, 2: current -1, 3: current-2 etc.

In Oracle, Itís even better as I would just build the logic into a materialised view with scheduled update based on load frequency. I could, in theory, wrap the logic using multiple left join with multiple in-line sub-queries into a single materialised view to cater for type 3 requirements on all other attributes.

I agree that historical view is not what the business normally wants to see. In my current project, the business only wants to see the current view, but demanded the historical data be ready to be needed for future exploration, or even just for audit and verification purposes.

The point is, one of the main focuses of data warehousing is about storing historical data in a dimensionally modelled format, and type 2 SCD is big part of dimensional modelling. Before dimensional modelling, there was no standard type 2 SCD and its relevant methodical ETL process around. There were a lot of ad-hoc approaches such as simplistic type 1, snapshot based dimension, and of course type 3. I guess the term type 3 SCD is only used to stress the importance of type 2 SCD.

I donít agree that type 2 SCD is an overkill for most of business requirements. On the contrary, to build a future proof data warehouse, the thought process should be like this, always use type 2 SCD for dimension change tracking by default, only combine type 2 with other types (eg. Type 3) if necessary. Sooner or later, you will find type 2 is unavoidable; why avoiding it If it is so effective and implementing it is really not that hard, given the full elaboration by Kimball.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: "Previous" attributes

Post  hang on Sun Oct 31, 2010 7:58 pm

I have done some research over weekend and proved the SQL scripts in my previous posts are compliant with ANSI stadard, meaning you could use them in all major SQL environments including SQL Server and Oracle, if the version is new enough to comply with ANSI SQL-99. BTW, I have also fixed a typo and updated the script in my first post by adding a missing FROM clause in the subquery.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: "Previous" attributes

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum