Kimball Forum
Would you like to react to this message? Create an account in a few clicks or log in to continue.

Using 3rd party Sort packages in ETL stream

2 posters

Go down

Using 3rd party Sort packages in ETL stream Empty Using 3rd party Sort packages in ETL stream

Post  juz_b Wed Jul 15, 2009 5:13 pm

I was wondering if anyone can share your experience with using a 3rd party Sort package (CoSort, SyncSort etc) in your ETL Stream.

I have been using Business Objects Data Integrator for the last 5 years and never had a chance to integrate a 3rd party Sort package into the ETL stream. My understanding is that it is always faster to process Flat file (source and target), compared to a database table. My dilemma is that by processing using a flat file, you lose the ability to query in the event you need to troubleshoot the data stream. I have on rare occasions used a Flat file staging for dumping my initial extract from source, followed by db tables downstream. All my aggregates are performed in the DB engine.

I guess my question really is when does it make sense (performance and storage) to integrate the use of a Flat file (staging, target?) with a 3rd Party Sort Package in the ETL stream? What are the disadvantages? This sound like an open ended question but any comment would be appreciated.

Thanks!

Brandon

juz_b

Posts : 17
Join date : 2009-02-07

Back to top Go down

Using 3rd party Sort packages in ETL stream Empty Re: Using 3rd party Sort packages in ETL stream

Post  ngalemmo Thu Jul 16, 2009 1:51 am

It really depends upon how much data you are dealing with. I have used CoSort in the past (about 8 years ago) with DataStage and it provided significant improvement in sorting times.

Generally speaking, sort capabilities within an ETL tool are pretty poor compared to CoSort or SyncSort. However, with 64 bit CPUs and many GB's of RAM, the ETL tool's internal sort functionality will do a decent job if they can keep everything in memory.

But the most common practice today is to do the sorts in database queries. Sort performance in many database systems is comparable to external tools in most applications. But, if your data is not in a database, is of significant size and you really need to reduce your processing time, the sort tools are well worth considering.
ngalemmo
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Back to top Go down

Using 3rd party Sort packages in ETL stream Empty Re: Using 3rd party Sort packages in ETL stream

Post  juz_b Mon Jul 20, 2009 2:36 pm

Thanks for your input! This really helps clear some confusion.

juz_b

Posts : 17
Join date : 2009-02-07

Back to top Go down

Using 3rd party Sort packages in ETL stream Empty Re: Using 3rd party Sort packages in ETL stream

Post  Sponsored content


Sponsored content


Back to top Go down

Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum