Implementing Employment SCD2 Dimension

View previous topic View next topic Go down

Implementing Employment SCD2 Dimension

Post  at710 on Wed Feb 26, 2014 4:50 am

Hi Community,
just to check my concept i'd be interested in your opinion regarding the following requirement / approach.

My customer does have a lot of temporary employees. Those are typically employed for the duration of customer projects they are assigned to (which may last from several days up to a couple of years). I do distinguish between persons and employments, since one person may be employed multiple times for several distinct projects. During one employment however attributes like working hours, etc. of the employment do change. I track those changes like scd2 within my employment dimension. Persons do hold only name and address and are kept as a non historized dimension. Employments relate n...1 to persons and do have the attributes entry-date and leave-date.
However at the start of an employment the leave-date is almost always empty, so i interpret this as ongoing employment. During my SCD2 handling logic (SQL stored procedure) i detect attribute changes and create a new history record whose effective-date is set to the last edit timestamp from the frontoffice system. Now it regularly occurs that on the next ETL run an leave-date is provided which is BEFORE the effective-date of one (or even several) most recent history rows.

My employment dimension history only ever holds records which are valid from the entry-date to the-leave date of each employment. I guess i do not fully understand the consequences of this approach. Is this common practice or a definite no go?

Thanks in advance

at710

Posts : 10
Join date : 2012-11-02

View user profile

Back to top Go down

Re: Implementing Employment SCD2 Dimension

Post  nick_white on Wed Feb 26, 2014 5:47 am

Hi,
if you could provide some examples of what your reporting requirements are then it may make it easier to comment. One thought I have had: isn't employment a fact (possibly a factless fact) rather than a dimension? You have Employee/Person, Project, Start Date, End Date, ... Dimensions all related via an Employment Fact?

Regards,

nick_white

Posts : 366
Join date : 2014-01-06
Location : London

View user profile

Back to top Go down

Re: Implementing Employment SCD2 Dimension

Post  BoxesAndLines on Wed Feb 26, 2014 10:00 am

The effective and end dates identify when a change occurred. The employment end date identifies when the employment is scheduled to end. There is no dependency between the columns.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Implementing Employment SCD2 Dimension

Post  at710 on Wed Feb 26, 2014 11:41 am

@ nick_white
you are right employments can also bee seen as fact data. Please correct me it i am wrong but as far as i understend this is more of an academic question since the decision wether to handle an entity/event/process as a dimension or as a fact is just a matter of how often the data will be updated.

in fact i handle employments in two ways:
1) inside my dwh (ms sql server db) i have one table with history entries which can be seen as a dimension. History is only tracked for two fields (contract-type, weekly-working-hours) all other fields (except of course surrogate-key, business-key and validity dates) are updated retroactively for all the history rows.

2) In the reporting frontend (qlikview) this table is split into one dimension table (1:1 copy of dwh table) and a second fact table which holds one row for every day of the employment from entry-date to leave-date. This fact table is populated during load time which results in a non accumulating daily snapshot fact table (linked to dimensions calendar, employment, person). Employments as well as derived facts of it are not linked to projects, since one employee may be assigned to several projects at a point in time. I understand this as a different business event/case. Project assignment is stored in the same central fact table within qulikview with only confirmed dimensions as common fields among the different fact types. However this approach is qulikview specific and i think those two types of fact data would be stored in two distinct fact tables in almost any other reporting tool.

My main reporting requirement is to show "# of employments" and "sum of full time equivalents" the latter is calculated in the frontend based on the weekly-working-hours

@BoxesAndLines
thank you for this valuable hint, after a somewhat lengthy discussion with it staff, we discovered that the practice of limiting the tracked history by the two attributes entry-date and leave-date leads to data inconsitency if a history change is recorded for an ongoning contract and at the following ETL run an leave-date prior to the latest history record effective-date is delivered. We agreed to abandon the logic binding between the leave-date attribute and the expiration-date. Still we want to make sure that the entry-date matches the effective-date of our first historic crecord. Which in turn requires that effective-date of our first record has to be updated in case of an entry-date change. Anyhow i have the bad feeling that this is apitfall too. (how to handle if entry-date is moved past the effective-dates of several already recorded history rows ...)

However this would mean that it is perfectly legal to have history entries with effective dates before the entry-date and after the leave-date of an employment.
In my reporting frontend however i would still plan to only populate snapshot facts wetween entry-date and leave-date because this way i just have to count distinct employment-business-key for a selected period in time. My goal is not a full fledged point in time reporting, my time axis is only to enable correct FTE calculation over time. Of course i could also write daily snapshot facts for already discontinued emloyments to track how many employees already have left the company (if this will ever be of interest).

Maybe all my confusion roots in lack of understanding how to properly implement scd2 dim and derived fact table. I feel i'm driving in the right direction however still not have the final clue on how to proceed. I guess this will take some hands on elaboration with my data model.

Thanks for your input and any further hints.

at710

Posts : 10
Join date : 2012-11-02

View user profile

Back to top Go down

Re: Implementing Employment SCD2 Dimension

Post  ngalemmo on Wed Feb 26, 2014 1:37 pm

It is not an academic discussion, it is fundamental to dimensional modeling… an employee is a dimension, employment is a fact. What you are doing is not a dimensional design.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Implementing Employment SCD2 Dimension

Post  at710 on Thu Feb 27, 2014 2:54 am

Ok after some sleep and some more thinking i assume the root af all evil in my case is that i try to handle emploments as scd2 dimension although this type of informaiton is dynamic data refelcting a process. What i would need is a consistent representation of an employment containing the whole history of contract-type and weekly-working-hours changes over time. The frontoffice system however does not track history at all, it just delivers a last-edited-timestamp. What i have done until today is trying to build up this history of changes artificialy within my db table by implementing an scd2 like approach which is in fact not my ultimate goal and in any case can only lead to a fuzzy depiction of the actual employment because if i detect a change of weekly-working-hours i have no way to distinguish if this is a correction of false input or a true change and if so from which date onwards this change is valid. For example someone changes the weekly-working-hours of an employment at 05.02.2014 from 20 to 30 hours but the empoyed person in reality did start to work for 30 hours a week by 01.02.2014 rather than the 5th. This can be seen as a best guess estimation.

Since i do not want a true point in time reporting it also makes no sense to implement scd2 at all. Unfortunaltely the frontoffice system does only deliver one (last) tuple of this "history".

Ok i agree on employments are fact data, but also in this case i do not know how i would handle this situation differently. My db table containing the best guess history of employment changes can be seen as a staging table which helps me populate a non accumulating daily snapshot fact  via slicing my intervals into days as described above. Wether i do this within qulikview or a separate db table does not greatly affect the final result. Splitting out a seperate employment dimension in my data model should also be okay, i interpret this as a junk dimension to reduce number of fields in my central fact table. Interesting on this approach however is the granularity of this dimension. I mean it could only hold one row per employment, or it could (as is currently) one temporal section of an employments contract-type, weekly-working-hours validity.

Current implementation:
dimension table:

sk-employmentbk-employmentjobcontract-typeweekly-working-hoursentry-dateleave-dateeffective-dateexpiration-daterecent-flag
1101DeveloperA2001.01.201403.01.201401.01.201403.01.20141
2102ConsultantA4001.01.2014null01.01.201401.01.20140
3102ConsultantB1001.01.2014null02.01.201404.01.20140
4102ConsultantAA2001.01.2014null05.02.2014null1
(derived) fact table:

datesk-employmentcontract-typefte
01.01.20141A0,5
02.01.20141A0,5
03.01.20141A0,5
01.01.20142A1
02.01.20143B0,25
03.01.20143B0,25
04.01.20143B0,25
05.01.20144AA0,5
...4AA0,5

at710

Posts : 10
Join date : 2012-11-02

View user profile

Back to top Go down

Re: Implementing Employment SCD2 Dimension

Post  at710 on Thu Feb 27, 2014 3:12 am

I think my scd2 like approximate tracking of employment changes is just a kludge to somehow overcome the shortcomings of our frontoffice system. If i want to truly handle employments as facts the only correct way i can think of is to abandon all artificial history tracking for employments and instead rebuild my fact table during each ETL run based upon entry-date leave-date attributes of currently extracted data rows (i currently already rebuild the whole fact table, but based upon inproperly historized db table rather than just current extract). However this would mean my numbers won't be a guess any more but i am afraid nevertheless further away from the "truth" than if i would artificially track relevant changes... i know i am not responsible for inflowing data but still i have to do my best to do whatever to improve report usefulness... beside de crux that i am actually doing "data astrology" and therefore masking existing shortcomings of our frontoffice sys... i'd be grateful for any hints how to better handle this unfortunate situation. Or is this issue just a question of proper data handling on my side?

Thank you guys!

at710

Posts : 10
Join date : 2012-11-02

View user profile

Back to top Go down

Re: Implementing Employment SCD2 Dimension

Post  nick_white on Thu Feb 27, 2014 4:43 am

Hi,
I think your problems are due to the fact that you are treating employment as a dimension and not a fact. If you put the start/end dates of the employment as FKs on your employment fact table, pointing to your Date Dim then I think most of your issues go away - especially around SCDs.
Obviously you need to consider how to deal with changes in your employment fact: you could just update the existing fact table when information changes (which doesn't allow you to track history) or you could change the end date FK on the existing fact record to "now" and then create a new fact record with a start date of "now". If you take this 2nd approach then you'll also need to consider carefully how your fact measures can be aggregated in queries - are they non-, semi- or fully additive?

Regards,

nick_white

Posts : 366
Join date : 2014-01-06
Location : London

View user profile

Back to top Go down

Re: Implementing Employment SCD2 Dimension

Post  at710 on Thu Feb 27, 2014 5:21 am

Ok so your proposed 2nd method would add validity dates for fact rows which would allow point in time reporting (= see how the numbers of last week were at any point in time). I think this is adding more complexity than i actually do need.

Splitting the topic of wether my history estimation kludge makes sense or not from the question about how to update my fact table and focusing solely on the latter in the first place seems very promising.

Starting with a set of fact rows (wether these are derived upon an history tracked staging table or just upon current extract should not matter) i will focus on wether to completely replace all fact rows for one employment at once, or instead insert/update/delete some of them.

Thank's to all for your time, effort & brains!

at710

Posts : 10
Join date : 2012-11-02

View user profile

Back to top Go down

Re: Implementing Employment SCD2 Dimension

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum