Automating dimension and fact loading ETL at the database level - Good or Bad?

View previous topic View next topic Go down

Automating dimension and fact loading ETL at the database level - Good or Bad?

Post  js2794 on Wed Jul 27, 2011 7:46 pm

I got started on a major data warehouse project a few years ago with a lot of experience in programming, SQL, and report development but very little experience in data warehousing. Although I would have preferred to go with Microsoft, funds were non-existent and we already had an Oracle database that wasn't being used. Unfortunately, the only ETL functionality Oracle offers with the RDBMS is the "Basic" version of Oracle Warehouse Builder (which is basically limited to non-heterogeneous, non-OLAP mappings with limited access to some basic transformations)... Having worked with OWB's buggy UI in the past, I didn't see any reason to start using it again.

I went the "home-grown" route with the goal of making it easy to build out future data marts based upon templates I put together early on. Having very little experience in data warehousing, I was quite surprised to see just how similar my templates ended up looking to the Excel spreadsheet code generators developed by the Kimball Group (mine were also Excel spreadsheets).

One major difference between what I ended up putting together and just about every thing else I've seen out there is that the code my templates would generate essentially did away with the need to perform complex ETL to load/update dimensions. The reason I went this route was because I thought it would be much easier if I could update my dimensions the same way I could update any other normal table--with plain, old SQL DML. I could also still leverage the "basic" OWB features this way, too (but later decided it would actually be easier to evolve the code templates to handle the basic ETL as well).

How it works is simple: Each dimension has a database package associated with it along with some triggers and other database objects which perform a lot of the "behind-the-scenes" work--this is all generated based upon the specifications in the Excel document. The end product is a dimension table that can take a SQL DML command like:

UPDATE dim_table_name
SET some_type1_attr_column = 'New Overwritten Data',
  some_type2_attr_column = 'Data Triggering A New Record'
WHERE some_other_column = 123

Note that the "WHERE" clause is not limited to have the natural keys--in fact, I could issue the command without a WHERE clause at all if I wanted, but that would probably upset quite a few clients. I could also issue commands like:

DELETE FROM dim_table_name
WHERE some_column = 'Some String'

Obviously, this command wouldn't actually delete the rows--it would just expire them (I left a "backdoor" method open to allow me to actually delete rows: Before issuing the DELETE statement, I would first need to update a specific "control" column which would then disable the automatic SCD processing for those specific rows).

So--now to my question! Is there a reason why something like this isn't being used more often (or in commercial products)? I am just now beginning to work with some real ETL tools to replace my code templates-based solution and beginning to realize that some of these ETL products really stink at data warehouse-specific functionality. Obviously, it's a good idea to use a standardized ETL tool over home-grown code for maintainability but beyond that factor--are there any other reasons why these custom trigger-based dimensions should be avoided?

Thanks in advance for the help!



Posts : 1
Join date : 2011-07-27
Location : California

View user profile

Back to top Go down

Re: Automating dimension and fact loading ETL at the database level - Good or Bad?

Post  robber on Fri Aug 12, 2011 8:35 am

The concept is great and taking it a step further rather than using excel capture your specifications in a database (metadata repository) and then generate all your objects based on those specifications. Start with your source system definitions, then staging, then dims and facts. Now you have the capability to generate full documentation including data lineage and impact analysis. There are a handful of commercial products out there that use this metadata driven approach to data warehousing...


Posts : 41
Join date : 2009-02-28
Location : Canada

View user profile

Back to top Go down

Re: Automating dimension and fact loading ETL at the database level - Good or Bad?

Post  Tootia on Wed Aug 31, 2011 2:04 am

What you are writing (your templates) is the core functionality of ETL, however ETL is not only about updating the current records, normally we need to keep the history of changes in records, that's where using an ETL tool are useful. These tools make is easier for us to do table comparison and history preservation, something that need lots of effort to do using pure SQL scripts.

Still we can use the SQL scripts to test the jobs you demonstrate the mapping rules for each table in DW.


Posts : 7
Join date : 2011-08-30
Location : Australia

View user profile

Back to top Go down

Template driven ETL by saas

Post  Amentzer on Wed Sep 07, 2011 1:20 am

For template driven ETL based on MS SSIS check out this SAAS tool

"Leganto" is a saas solution which accelerates the building of data warehouses. By using predefined dynamic templates leganto can generate your complete business model from staging and data warehouse tables to ETL packages


Posts : 1
Join date : 2011-09-07

View user profile

Back to top Go down

Re: Automating dimension and fact loading ETL at the database level - Good or Bad?

Post  Sponsored content

Sponsored content

Back to top Go down

View previous topic View next topic Back to top

- Similar topics

Permissions in this forum:
You cannot reply to topics in this forum