Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

Sql Server Integration Services (2008) - Sort Data Flow Task / Advanced OLE Source .....

Go down

Sql Server Integration Services (2008) - Sort Data Flow Task / Advanced OLE Source ..... Empty Sql Server Integration Services (2008) - Sort Data Flow Task / Advanced OLE Source .....

Post  ian.coetzer Wed Mar 23, 2011 9:54 am

Hi,
In SSIS I have a VERY strange phenomenon that I have not been able to reproduce which is resulting in me wasting a lot of time by removing data flow tasks and tweaking the advanced settings of my data flow tasks ....

The Situation:
==========

I have say 2 OLE_DB source tasks within the same data flow tasks.
I need to merge join these two results.

Now to do this as you know both sources have to be sorted.

There are 2 ways to sort your data sources.

1) (Easiest) use a Sort data flow task to sort the incoming records,
this always seems to work - HOWEVER it consumes a LOT of memory on the server when working with several million records.

2) in the OLE Source task use an ORDER BY sql clause to order both sources in the same manner.
then one has to of course set the advanced properties of the ole output to IsSorted = True
and then set the SortKeyPosition ofthe output fields accordingly.


Now .... this is all great and well, BUT when I make use of the 2nd option it does not always work ...
What I mean is that it does not properly join all records sometimes some records are missing in the output ...

I have to everytime test all my results by writing complex stored procedures to do exactly the same thing:
- selecting all records from source A and then source B then doing my JOIN etc. and then at the end compare th row count of total joined records and also do spot checks of the values.
(I usually use FULL OUTER JOINS)

Let's say source A looks like this:
Key CustomerName
1 "AAA"
2 "BBB"
3 "CCC"

and source B looks like this:
Key CustomerAge
2 55
3 45
4 43

After a full outer join the results may be:

Key_A Key_B CustomerName CustomerAge
1 Null "AAA" Null
2 2 "BBB" 55
3 3 "CCC" 45
Null 4 Null 43

I then derive the new key so that i write the following results to my tables:
NewKey CustomerName CustomerAge
1 "AAA" Unknown
2 "BBB" 55
3 "CCC" 45
4 "Unknown" 43

Now sometimes using method 2 this last resultset does not look like this ...
it usually happens when i have more than one field making up the key and when one of them is a string,
but this is also not consistent

any ideas? do you guys use the sort task often and just assign 10's of thousands of gigs of ram to the server?


several million records
ian.coetzer
ian.coetzer

Posts : 57
Join date : 2010-05-31
Age : 43
Location : South Africa

Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum