SCD2 and partial information

View previous topic View next topic Go down

SCD2 and partial information

Post  exhortae on Mon Jul 21, 2014 4:03 pm

Hi,

Im trying to build a dimension from a database and a flat file.
Say I have the dimension consisting of network nodes.

My source data consist of:

- Node_id (business key)
- Node_label
- Node_type

There is only 2 attributes in this dimension and I want to enrich it by adding the following attributes provided in a flat file:

- DSLAM
- City
- And of course the Node_id (so I can join the data from the database with the flat file)


The resulting dimension would be:

- ID_DM (surrogate key)
- Node_Id (business key)
- Node_Label SCD1
- Node_type SCD1
- DSLAM SCD2
- City SCD2

So far so good.

The problem comes from the fact that there is 20 000 nodes and for each node the DSLAM and the City need to be entered manually in the flat file. So at the beginning the flat file will only contain partial information.
First flat file provided

Node_id | DSLAM | City
N1 |DSLAM1 | Chicago
N2 | DSLAM2 | Not Available

The concern that I have is with the 2nd line (notice the not available value in the city attribute)

After the first load of the dimension I will have something like this

Dm_NODE

ID_DM | Node_label | Node_Type | DSLAM | City effect_date | expiry_date
.
3 | N2 | Interface | DSLAM2 | Not available | 2014-01-01 | 9999-12-31


After the first load, the flat file is updated with the following information (the flat file always contain all the nodes, not just the delta).

New flat file:

Node_id | DSLAM | City
N1 | DSLAM1 | Chicago
N2 | DSLAM2 | Los Angeles
N3 | DSLAM3 | Houston


So basically in the second load and considering Im using SCD2 for the City column I end up with :

DM_Node

ID_DM | Node_label | Node_Type | DSLAM | City effect_date | expiry_date
.
3 | N2 | Interface | DSLAM2 | Not available | 2014-01-01 | 2014-03-20
.
6 | N2 | Interface | DSLAM2 | Houston | 2014-03-20 | 9999-12-31
.

Which of course is incorrect because I wasnt supposed to create a new line for the N2 node but only update the first row with the information made available. The problem is that the city attribute is an SCD2 attribute (because the city of a node can change say from Houston to Philadelphia) .

To solve this, I have added in the ETL process the following processing.

- During the scd check, if I have generated a new line for an already existing node, I check if in the old line the attribute city is set to no available, if it is the case, I discard the new line, and just update the old line (I change the scd2 with an scd1 behavior). If it is not the case I close the old line and insert the new line (scd2 behavior).

But In all honesty I dont like this solution, because I find it to complicated (I always consider that if the solution is complicated for a simple problem, it is because I missed a more elegant solution).

I would be glad if someone more experienced could come with some insight.

Thank you for reading me.

exhortae

Posts : 30
Join date : 2010-08-01

View user profile

Back to top Go down

Re: SCD2 and partial information

Post  ngalemmo on Tue Jul 22, 2014 12:43 am

The question is: what do you want to happen?

There is nothing wrong with special circumstances that require a deviation from the normal pattern, as long as you represent things the way you need to represent them.

You have two dependent attributes: DSLAM and City. You have been maintaining history for DSLAM. As there is a direct relationship between DSLAM and City, what's wrong with updating all rows in the dimension setting the City based on the DSLAM value?
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: SCD2 and partial information

Post  exhortae on Tue Jul 22, 2014 3:27 pm

ngalemmo wrote:The question is: what do you want to happen?

There is nothing wrong with special circumstances that require a deviation from the normal pattern, as long as you represent things the way you need to represent them.

You have two dependent attributes: DSLAM and City. You have been maintaining history for DSLAM. As there is a direct relationship between DSLAM and City, what's wrong with updating all rows in the dimension setting the City based on the DSLAM value?

Hi,

I'm sorry but I don't understand this part :

As there is a direct relationship between DSLAM and City, what's wrong with updating all rows in the dimension setting the City based on the DSLAM value?


could you please elaborate a little.

thank you

exhortae

Posts : 30
Join date : 2010-08-01

View user profile

Back to top Go down

Re: SCD2 and partial information

Post  BoxesAndLines on Wed Jul 23, 2014 8:07 am

If I understand your post, you want to treat city as a type 2 attribute except when its value is 'Not Available' in which case you want to treat it as a type 1. Works for me.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: SCD2 and partial information

Post  exhortae on Wed Jul 23, 2014 3:04 pm

BoxesAndLines wrote:If I understand your post, you want to treat city as a type 2 attribute except when its value is 'Not Available' in which case you want to treat it as a type 1. Works for me.

Yes that's what I want to do. Do you think the method I describe above to do it is OK ?

thanks

exhortae

Posts : 30
Join date : 2010-08-01

View user profile

Back to top Go down

Re: SCD2 and partial information

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum