Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

SCD2 and partial information

3 posters

Go down

SCD2 and partial information Empty SCD2 and partial information

Post  exhortae Mon Jul 21, 2014 4:03 pm

Hi,

I’m trying to build a dimension from a database and a flat file.
Say I have the dimension consisting of network nodes.

My source data consist of:

- Node_id (business key)
- Node_label
- Node_type

There is only 2 attributes in this dimension and I want to enrich it by adding the following attributes provided in a flat file:

- DSLAM
- City
- And of course the Node_id (so I can join the data from the database with the flat file)


The resulting dimension would be:

- ID_DM (surrogate key)
- Node_Id (business key)
- Node_Label – SCD1
- Node_type – SCD1
- DSLAM – SCD2
- City – SCD2

So far so good.

The problem comes from the fact that there is 20 000 nodes and for each node the DSLAM and the City need to be entered manually in the flat file. So at the beginning the flat file will only contain partial information.
First flat file provided

Node_id | DSLAM | City
N1 |DSLAM1 | Chicago
N2 | DSLAM2 | Not Available

The concern that I have is with the 2nd line (notice the not available value in the city attribute)

After the first load of the dimension I will have something like this

Dm_NODE

ID_DM | Node_label | Node_Type | DSLAM | City effect_date | expiry_date
….
3 | N2 | Interface | DSLAM2 | Not available | 2014-01-01 | 9999-12-31


After the first load, the flat file is updated with the following information (the flat file always contain all the nodes, not just the delta).

New flat file:

Node_id | DSLAM | City
N1 | DSLAM1 | Chicago
N2 | DSLAM2 | Los Angeles
N3 | DSLAM3 | Houston


So basically in the second load and considering I’m using SCD2 for the City column I end up with :

DM_Node

ID_DM | Node_label | Node_Type | DSLAM | City effect_date | expiry_date
….
3 | N2 | Interface | DSLAM2 | Not available | 2014-01-01 | 2014-03-20
….
6 | N2 | Interface | DSLAM2  | Houston | 2014-03-20 | 9999-12-31
….

Which of course is incorrect because I wasn’t supposed to create a new line for the N2 node but only update the first row with the information made available. The problem is that the city attribute is an SCD2 attribute (because the city of a node can change say from Houston to Philadelphia) .

To solve this, I have added in the ETL process the following processing.

- During the scd check, if I have generated a new line for an already existing node, I check if in the old line the attribute city is set to no available, if it is the case, I discard the new line, and just update the old line (I change the scd2 with an scd1 behavior). If it is not the case I close the old line and insert the new line (scd2 behavior).

But In all honesty I don’t like this solution, because I find it to complicated (I always consider that if the solution is complicated for a simple problem, it is because I missed a more elegant solution).

I would be glad if someone more experienced could come with some insight.

Thank you for reading me.

exhortae

Posts : 30
Join date : 2010-08-01

Back to top Go down

SCD2 and partial information Empty Re: SCD2 and partial information

Post  ngalemmo Tue Jul 22, 2014 12:43 am

The question is: what do you want to happen?

There is nothing wrong with special circumstances that require a deviation from the normal pattern, as long as you represent things the way you need to represent them.

You have two dependent attributes: DSLAM and City. You have been maintaining history for DSLAM. As there is a direct relationship between DSLAM and City, what's wrong with updating all rows in the dimension setting the City based on the DSLAM value?
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

SCD2 and partial information Empty Re: SCD2 and partial information

Post  exhortae Tue Jul 22, 2014 3:27 pm

ngalemmo wrote:The question is: what do you want to happen?

There is nothing wrong with special circumstances that require a deviation from the normal pattern, as long as you represent things the way you need to represent them.

You have two dependent attributes: DSLAM and City.  You have been maintaining history for DSLAM.  As there is a direct relationship between DSLAM and City, what's wrong with updating all rows in the dimension setting the City based on the DSLAM value?

Hi,

I'm sorry but I don't understand this part :

As there is a direct relationship between DSLAM and City, what's wrong with updating all rows in the dimension setting the City based on the DSLAM value?


could you please elaborate a little.

thank you

exhortae

Posts : 30
Join date : 2010-08-01

Back to top Go down

SCD2 and partial information Empty Re: SCD2 and partial information

Post  BoxesAndLines Wed Jul 23, 2014 8:07 am

If I understand your post, you want to treat city as a type 2 attribute except when its value is 'Not Available' in which case you want to treat it as a type 1. Works for me.
BoxesAndLines
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

Back to top Go down

SCD2 and partial information Empty Re: SCD2 and partial information

Post  exhortae Wed Jul 23, 2014 3:04 pm

BoxesAndLines wrote:If I understand your post, you want to treat city as a type 2 attribute except when its value is 'Not Available' in which case you want to treat it as a type 1.  Works for me.  

Yes that's what I want to do. Do you think the method I describe above to do it is OK ?

thanks

exhortae

Posts : 30
Join date : 2010-08-01

Back to top Go down

SCD2 and partial information Empty Re: SCD2 and partial information

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum