ETL from Oracle to SQL Server 2008 Data Warehouse

View previous topic View next topic Go down

ETL from Oracle to SQL Server 2008 Data Warehouse

Post  sman on Sun Jan 30, 2011 2:41 am

Hello,

Our data warehouse runs on SQL 2008 servers. One of the transactions systems we have runs on an Oracle database and managed by a separate team.

What are the recommended methods to extract source data from an Oracle database and loading to a SQL Server 2008 data warehouse?

One option I can think of is to export data from Oracle to a set of flat files (CSVs) and deliver them to the Data warehouse team. The data warehouse team can use an ETL tool like SSIS to load these CSVs to SQL Server 2008. Is this practical?

Any other methods?

Thank you





sman

Posts : 22
Join date : 2011-01-30

View user profile

Back to top Go down

Re: ETL from Oracle to SQL Server 2008 Data Warehouse

Post  BoxesAndLines on Sun Jan 30, 2011 3:20 pm

Use a tool like Informatica. Different databases or servers are not an issue.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: ETL from Oracle to SQL Server 2008 Data Warehouse

Post  hang on Sun Jan 30, 2011 7:21 pm

You could also use OLE DB connection in SSIS to extract data from Oracle database directly. A typical approach is to load all the relevant tables into staging area in SQL Server and let ETL process carry out all the necessary down-stream activities either by stored procedures or by tools.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: ETL from Oracle to SQL Server 2008 Data Warehouse

Post  John Simon on Sun Jan 30, 2011 7:26 pm

You can use SSIS to load directly into the Data Warehouse via an OLEDB connection. However the issue with this is that if there is a problem on either end, you may lose the data at that point. I think it's safer to get an extract as a csv file and load with SSIS.

Boxes and Lines, why use Informatica when you get SSIS for free?

John Simon

Posts : 186
Join date : 2009-05-29
Location : Melbourne, Australia

View user profile http://jsimonbi.wordpress.com

Back to top Go down

Re: ETL from Oracle to SQL Server 2008 Data Warehouse

Post  sman on Sun Jan 30, 2011 11:08 pm

Thanks John Simon for the reply.

How about getting a back up of Oracle database, restore in an staging Oracle environment in Data Warehouse team and use in ETL tool to extract data and load into SQL Server data warehouse? Obviously, this relieves the work load on Oracle DBA team (Transaction system), but adds more work for the ETL team in the Data warehouse team. Any thoughts on this approach?

sman

Posts : 22
Join date : 2011-01-30

View user profile

Back to top Go down

Re: ETL from Oracle to SQL Server 2008 Data Warehouse

Post  hang on Sun Jan 30, 2011 11:22 pm

You could use the SQL Server's data replication facility to move data from Oracle to SQL Server for initial load and then incremental load by cvs files.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Change Data Capture

Post  sman on Mon Jan 31, 2011 12:42 am

Thanks Hang.

Do you know whether types of Change Data Capture stratergies are available in Oracle?

I have come across various CDC methods like relying on created/updated datatime columns in transaction system tables, triggers, transaction logs, hash values etc.

This is what I tend to think as a possible solution. Adding a hash column to every table in data-warehouse that is calculated based on all the columns of the table so that updated records can be identified by comparing hash columns. inserted and deleted records can be identified by left and right outer joins respectively between transaction-extract and existing-data-warehouse.

Does this sound sensible / practical?


Last edited by sman on Mon Jan 31, 2011 12:44 am; edited 1 time in total (Reason for editing : fixed a typo)

sman

Posts : 22
Join date : 2011-01-30

View user profile

Back to top Go down

Re: ETL from Oracle to SQL Server 2008 Data Warehouse

Post  hang on Mon Jan 31, 2011 1:39 am

There is a relevant topic on the forum: http://forum.kimballgroup.com/t519-fact-table-incremental-load#3546

The built in CDC feature in SQL Server 2008 would be only suitable when OLTP is in SQL Server 2008. Hopefully the post would give you some clue about loading data stored in Oracle. I think the issue is to identify and export only delta data into csv or staging area for incremental load.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: ETL from Oracle to SQL Server 2008 Data Warehouse

Post  John Simon on Mon Jan 31, 2011 4:05 am

Keep it simple.
The transaction system would have dates in it for new/updated records.
You can store the date of your last process in a control table, the use SSIS to pull data from Oracle where the insert_update date in the OLTP system is greater than in your control table. The update the control table to the latest date.
This way you only pull the deltas.

John Simon

Posts : 186
Join date : 2009-05-29
Location : Melbourne, Australia

View user profile http://jsimonbi.wordpress.com

Back to top Go down

Re: ETL from Oracle to SQL Server 2008 Data Warehouse

Post  sman on Mon Jan 31, 2011 6:48 am

John Simon,

In this approach, how are the deleted records identified?

sman

Posts : 22
Join date : 2011-01-30

View user profile

Back to top Go down

Re: ETL from Oracle to SQL Server 2008 Data Warehouse

Post  John Simon on Mon Jan 31, 2011 6:19 pm

Why would you delete anything? You want to keep a history. What kind of record are you thinking of?
If a reference record is deleted, then it can remain valid in your SCDType2 and it doesn't matter. Why would the source system be doing physical deletes?

John Simon

Posts : 186
Join date : 2009-05-29
Location : Melbourne, Australia

View user profile http://jsimonbi.wordpress.com

Back to top Go down

Re: ETL from Oracle to SQL Server 2008 Data Warehouse

Post  sman on Mon Jan 31, 2011 7:56 pm

Transaction system deletes records that are no longer valid. It does a physical delete rather than a soft delete.

When this happens, data warehouse has to mark the record as deleted by setting warehouse_end_date column. The data warehouse stages Transaction system data with time banding (ie. every table has warehouse_start_date, warehouse_end_date columns).

Eg.
Active records have '9999-12-31' in warehouse_end_date column. When a record is deleted in the Transaction system, the deleted date is entered into 'warehouse_end_date' column.


Last edited by sman on Mon Jan 31, 2011 7:59 pm; edited 1 time in total (Reason for editing : typo)

sman

Posts : 22
Join date : 2011-01-30

View user profile

Back to top Go down

Re: ETL from Oracle to SQL Server 2008 Data Warehouse

Post  John Simon on Mon Jan 31, 2011 8:52 pm

You could run a piece of SQL that does a comparison between your extract and your dimension e.g.

UPDATE a
SET a.EndDate = GETDATE() -1
FROM DimProduct a
WHERE NOT EXISTS(SELECT 1 FROM SrcProduct b WHERE a.ProductCode = b.ProductCode)


John Simon

Posts : 186
Join date : 2009-05-29
Location : Melbourne, Australia

View user profile http://jsimonbi.wordpress.com

Back to top Go down

Re: ETL from Oracle to SQL Server 2008 Data Warehouse

Post  ngalemmo on Tue Feb 01, 2011 3:12 pm

Oracle maintains transaction logs and there are ETL tools that can read these logs to capture change information. SSIS is not one of them.

Alternately you can modify the Oracle database and add a DELETE trigger on the tables of interest and write code to capture the key information of the row being deleted into another table.

If none of those options are available, all you have left is the old fashioned way... extract all rows from the source and compare them to the rows in the DW to find ones that are no longer in the source.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: ETL from Oracle to SQL Server 2008 Data Warehouse

Post  sman on Wed Feb 02, 2011 12:31 am

Thanks everyone for the replies.

sman

Posts : 22
Join date : 2011-01-30

View user profile

Back to top Go down

Re: ETL from Oracle to SQL Server 2008 Data Warehouse

Post  herman reid on Fri Feb 04, 2011 4:38 pm

Our Data Warehouse is being hosted on our Servers. The only problem is it is off site. I have no problems with the database. I have issues with backup. I find it difficult to query strings on our domain controlled environment. I am not sure if this issue is localized on my end or on the domain itself. Need to find answers

Herman Reid
http://www.blenderreviewer.com

herman reid

Posts : 2
Join date : 2011-02-04
Location : USA

View user profile http://www.blenderreviewer.com

Back to top Go down

Oracle to SqlServer - CDC Options

Post  GMELHAFF on Tue Mar 01, 2011 3:01 pm

Feel free to contact me if you want detail. I have made I think pretty much every mistake (although always looking for new ones) that can be made and willing to share experience.

I would recommend thinking this through for your situation and not putting too much weight on one specific proposed solution, or mentioned lack of solution, in this thread. Questions like "why buy a tool if SSIS is free?" is a valid question only in certain circumstances as SSIS isn't AbInitio or Informatica or Datastage. There are more considerations than just cost. Think about your larger strategy (hopefully you have one) in terms of what you're building and its longevity and support levels. There are many options for CDC, depending on your source system and objectives so unfortunately it is one of those classic "it depends". And some of the "depends" is how much experience one has in trying different options in general and then specifically with the combination of specific ETL tool and source/target database types.

Re. some of the replies...Log scraping (cdc tools) can work but there are very significant caveats. No, SSIS/SqlServer's built in CDC doesn't work for Oracle source, but Attunity has a CDC add-in to SSIS that does. Oracle's CDC may be an option but only for certain circumstance (a SqlServer DW not being one of them). All log scraping tools require supplemental logging on Oracle and the ability to make backups wait for the log scraper to get the redo/archived redo before its compressed or moved. No you don't have to extract to flat files. No, don't just add triggers to the source tables to do custom CDC. These are all options but again, it depends. And it's these and other "depends" are what keeps you from pooping your CDC pants, as it were. Sorry couldn't resist.

Considerations that will make or break your CDC solution:
  • What is the total volume of data in your source system?

  • Is the source system COTS application or custom - what are your options for adding triggers to the source tables?

  • Are there "update timestamps" on your larger source tables and are they 100% reliable?

  • Does every source table have a primary key? (some CDC tools require them)

  • Do you need every change to rows or just state of data at specific points in time?

  • What frequency of update do you need for your DW (and are you hoping to build an active DW?)?

  • How many developers will be working with your ETL tool? (SSIS has no built-in checkin/checkout like informatica does, SSIS requires remote desktop to develop/debug using server service, etc.)

  • What's your budget for ETL tools? How important is saving money on the tool?

  • What's the cost/availability of training in the ETL tools under consideration?

  • Does your shop already have an ETL tool and expertise in that tool?

  • What's your metadata strategy? (Technical metadata is heavily dependent on ETL tools these days)

  • Does your source system have Blobs/Clobs or other hard to handle datatypes like SDO_GEOMETRY? Nearly all CDC and ETL tools do not support them)

  • Are you building a datamart or an enterprise-wide solution?


This is not a comprehensive list but should get you a good start in making sound judgement on next steps. And they should not be difficult questions to answer. If they are, then warning signs should go off and I would recommend getting sufficiently confident in the answers before making final decisions on approach and tools.

Gary

GMELHAFF

Posts : 3
Join date : 2011-03-01
Location : Olympia, WA

View user profile

Back to top Go down

Re: ETL from Oracle to SQL Server 2008 Data Warehouse

Post  Mike Honey on Tue Mar 01, 2011 8:24 pm

My preferred method is to extract using SSIS, transform in the SSIS Dataflow and load directly into your target SQL tables. Intermediate CSVs or staging tables will slow the process dramatically and add to the testing and maintenance challenges IMO. Almost all scenarios where you might consider intermediate storage can nowadays be resolved in the SSIS Control or Data Flow.

I prefer to connect to Oracle using the Oracle Provider for OLE DB which comes with the 11G Oracle Client install in either x32 or x64 flavours. This only falls down on Oracle BLOB columns, where I revert to an ADO Net Source (.Net Providers\OracleClient Data Provider).

I retire old SCD type 2 rows using a similar UPDATE statement to the one above.

The big gotcha with Oracle-to-SQL ETL is often the extreme date issue. Oracle DATE datatype can handle more extreme dates than SQL datetime (e.g. 1/1/1693), with ugly results. I recommend using SQL datetime2.

Good luck!
Mike
avatar
Mike Honey

Posts : 185
Join date : 2010-08-04
Location : Melbourne, Australia

View user profile http://www.mangasolutions.com

Back to top Go down

Re: ETL from Oracle to SQL Server 2008 Data Warehouse

Post  sman on Wed Mar 23, 2011 8:10 pm

Garry and Mike,

Thank you very much for your detailed and informative insights.


sman

Posts : 22
Join date : 2011-01-30

View user profile

Back to top Go down

Re: ETL from Oracle to SQL Server 2008 Data Warehouse

Post  AndersonDarrel on Mon Mar 28, 2011 5:21 pm

good post, thanks for post

AndersonDarrel

Posts : 1
Join date : 2011-03-28

View user profile http://essay-writer.org/writers.php

Back to top Go down

Re: ETL from Oracle to SQL Server 2008 Data Warehouse

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum