Extracting data when there is no timestamp in the source system

View previous topic View next topic Go down

Extracting data when there is no timestamp in the source system

Post  DWH on Sat Feb 20, 2010 9:00 am

Guys, what are your strategies to extract data when the source system does not have any timestamps. Is there any way to still get deltas ?

Thanks

DWH

Posts: 2
Join date: 2010-02-18

View user profile

Back to top Go down

Re: Extracting data when there is no timestamp in the source system

Post  beyeguru on Thu Feb 25, 2010 7:01 pm

Your question is too general. Are you trying to extract transaction or dimension records?

beyeguru

Posts: 5
Join date: 2009-08-03

View user profile

Back to top Go down

Re: Extracting data when there is no timestamp in the source system

Post  DWH on Sat Feb 27, 2010 8:37 am

You are right beyeguru, it is very general. But I wanted to see how people have solved this problem. It can be for transactions or dimensions. We can always use CRC to do this but than we would have to load all the data in the staging everyday and then compare CRC. Has anyone used other ways ?

Thanks

DWH

Posts: 2
Join date: 2010-02-18

View user profile

Back to top Go down

Re: Extracting data when there is no timestamp in the source system

Post  BoxesAndLines on Sun Feb 28, 2010 12:49 pm

I use a checksum if I cannot get a delta feed.

BoxesAndLines

Posts: 231
Join date: 2009-02-04

View user profile

Back to top Go down

Re: Extracting data when there is no timestamp in the source system

Post  mselway on Fri Mar 19, 2010 4:18 am

So long as the table has a primary key you can left join the destination table to get only the new records.
Example:

Select s.Attribute1, s.Attribute2, s.Attribute3......
From SourceTable s
Left JOIN TargetTable t
ON s.PrimaryKey = t.PrimaryKey
Where t.PrimaryKey IS NULL

this will retrieve the records not already in the target.

mselway

Posts: 4
Join date: 2009-10-14
Location: London, England

View user profile http://www.swiftcover.com

Back to top Go down

Re: Extracting data when there is no timestamp in the source system

Post  mugen_kanosei on Mon Apr 12, 2010 12:14 am

I don't know if other database systems have something similar, but for Oracle, I used the ora_rowscn psuedo column because I didnt trust the application controlled modify time field. Its a psuedo column that stores the system change number that gets generated for each transaction. Unfortunantly it applies the SCN to all the rows in the block unless you use rowdependencies. There is an article at OraFaq about it. I keep track of the last ora_rowscn transaction number in my warehouse, and then pull anything greater than that. That will pull all the modifications as well as any other rows that were in that database block. From that I find out the delta to only get the actual modified columns. Oracle provides a better way with Change Data Capture, but due to reasons outside my control, I'm unable to implement.

mugen_kanosei

Posts: 11
Join date: 2009-02-04
Age: 28
Location: Japan

View user profile

Back to top Go down

Re: Extracting data when there is no timestamp in the source system

Post  ngalemmo on Mon Apr 12, 2010 10:21 am

Interesting... I wasn't aware of ora_rowscn. It is certainly useful when there is resistance to modify the OLTP system to support DW extracts... which is usually the case 99% of the time.

Unfortunately, you probably can't index it... so you wind up doing table scans. But, for lack of anything better, it will significantly reduce the number of rows you need to evaluate in the ETL process.

ngalemmo

Posts: 557
Join date: 2009-05-16
Location: Los Angeles

View user profile http://dimensionaldw.com

Back to top Go down

View previous topic View next topic Back to top


Permissions of this forum:
You cannot reply to topics in this forum