Why extract?

View previous topic View next topic Go down

Why extract?

Post  indignatz on Wed Apr 18, 2012 10:29 am

Hi all,
I question for you which will hopefully generate some discussion.

Most of the source systems we extract from are Oracle. There are mirrors set up for the production databases that are currently refreshed nightly, but can be refreshed on a more frequent schedule as it takes a matter of minutes to do the refresh. It's been suggested that instead of extracting the data to our staging database, we could pull directly from the mirrors and bypass the extract step altogether. I'm wondering what every thinks would be the pros and cons of this approach?

I think it would be more difficult to audit where the data came from, what you are loading each night etc, as well as maintaining a copy of history, but beyond that I'm having a hard time seeing what the downside(s) would be. It would certainly speed up the loading process if we didn't have to extract data every night.

What are your thoughts?

Thanks,
-Mark

indignatz

Posts : 4
Join date : 2009-03-13

View user profile

Back to top Go down

Re: Why extract?

Post  ngalemmo on Wed Apr 18, 2012 11:43 am

bypass the extract step altogether

Do you mean staging? After all, you are still extracting are you not?

If you have a true mirror available, and the system admins let you do it, it shouldn't hurt pulling from the mirror instead of the live system. However, admins usually don't like the idea because the mirror is usually the failover system and they don't want any accessing it so it is ready to go should the need arise.

But, regardless of where the data comes from, staging still has a purpose.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Why extract?

Post  indignatz on Wed Apr 18, 2012 11:51 am

Sorry, yes, the extract would still occur, but instead of dropping the data in a staging database, we would transform and load it directly from the mirror.

What are the potential gotchas of not staging the data?

indignatz

Posts : 4
Join date : 2009-03-13

View user profile

Back to top Go down

Re: Why extract?

Post  ngalemmo on Wed Apr 18, 2012 1:20 pm

One reason to stage is to isolate a unit of work. It provides clarity to a process and a recovery point. Also, if you are dealing with multiple sources for the same data, it allows for soft transformations into a common staging view, allowing a single load process regardless of source.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Why extract?

Post  Jeff Smith on Wed Apr 18, 2012 2:24 pm

You're still staging it. The difference is that you will be "pulling" data instead of having the data "pushed" to you.

Pros and cons to both. A lot of times, the people creating the pushed data know the source system really well and the people involved in pulling the data know the DW really well. If you pull the data, your knowledge of the source system has to go up. Imagine pulling data from multiple source systems in multiple types of database platforms.

Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

Re: Why extract?

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum