First DW and hopelessly out of my depth

View previous topic View next topic Go down

First DW and hopelessly out of my depth

Post  theboyholty on Mon Dec 16, 2013 7:09 am

Hi. A brief bit of history - my background is in SQL Server database development including SSRS and SSIS and I have done reporting from a data warehouse, so when I was asked in the interview for my current job whether I would be able to redesign their existing data warehouse, I said yes but everyone was aware that I'd not built one before. I read some of the Kimball book and most of what I read seemed so blindingly obvious, that I decided in my wisdom to scrap the book and just get on with building the warehouse the way I thought was best, basing most of my architechture on the existing DW and my own knowledge of building databases and applications, plus common sense.

The whole point of building a new warehouse when they already had one, was because it was designed by someone who had no experience of DW design and therefore didn't really know what he was doing and as a result its hard to manage and full of holes, not to mention hundreds of unknown products and customers (but more on that later). The fact that they've employed me with no DW experience to build them a better one is an irony utterly lost on the management here.

Anyway it turns out that a data warehouse is a whole different beast as I'm sure you're all aware and having managed 90% of the journey from blank piece of paper to working DW, I'm struggling with some of the finer points.

The basic architechture of my warehouse is as follows. We sell things, so I have a central SalesFact table. We sell products to customers so I have a ProductDimension and a CustomerDimension. We operate in global divisions so I have a RegionDimension which contains the DivisionName, Currency, ExchangeRate etc. I also have a DateDimension table. I would imagine that most of this is pretty standard stuff.

I've even built the SSIS packages to bring all this together. It works and all my components are the same sizes and lined up neatly so it looks lovely, especially when its all green.

My problem is with products, specifically handling unknowns.

We have literally tens of thousands of products because we make these big complex machines and they have loads of piddling little parts so its quite common to have new parts come through the process and often they aren't listed in my source tables for one reason or another so they get assigned as unknown products. I've added a -1 "Unknown Product" to my ProductDimension table and these rows in the SalesFact table pick these up but I don't know what is supposed to happen after that. Kimball's book says to assign these -1s but are these products supposed to remain unknown forever? How do I go about adding a new Product manually to the ProductDimension? You see my ProductDimension table is truncated daily and re-populated from the source tables so if I add a Product in, it just gets wiped every morning.

I'm thinking that I could build a table for manually added products and use a UNION ALL to add these to the ProductDimension as a part of the import process but that seems like using a sticking plaster, when a more robust solution would be better. Another option would be to use the MERGE function instead of a TRUNCATE and total repopulation of the table but again that's messy because I'd have to put the data in a staging table first.

Does anyone have any advice you can give me to better handle these unknowns?

Many thanks.

Nick

theboyholty

Posts : 6
Join date : 2012-12-18
Age : 44
Location : Bury, Lancashire

View user profile http://www.mannyroadend.co.uk

Back to top Go down

Re: First DW and hopelessly out of my depth

Post  Jeff Smith on Mon Dec 16, 2013 5:06 pm

Why is your Product dimension getting truncated and reloaded each day?

Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

Re: First DW and hopelessly out of my depth

Post  ngalemmo on Mon Dec 16, 2013 7:29 pm

Truncating and reloading a dimension is really bad form. You are supposed to assign surrogate keys to dimension rows to ensure stability with the facts. This requires a process that inserts new dimension rows and updates to changed rows. Facts reference rows using the surrogate keys, that should not change.

As far as new items coming in from facts, you should infer new dimension rows. In addition to the surrogate key, there should be a natural, or business, key that identifies the product. Assuming the fact information provides the business key, you create a new row with the business key and a new surrogate key (not -1) and have the fact refer to that row using the new surrogate key. When the source system is eventually updated with that new product, you update the dimensions row, based on the business key, so that the attributes for that product are populated and become available to the fact.

The '-1' row is only for a situation where you have absolutely nothing (no business key). Such an assignment cannot be changed, unless you get an update to the fact itself.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: First DW and hopelessly out of my depth

Post  theboyholty on Tue Dec 17, 2013 6:15 am

Jeff Smith wrote:Why is your Product dimension getting truncated and reloaded each day?

Good question. Truth is its because this is what the previous DW did and as I know my predecessor was a Kimball disciple I assumed it was common practice. We live and learn, so I'm now in the process of fixing this. Thanks for your input.

theboyholty

Posts : 6
Join date : 2012-12-18
Age : 44
Location : Bury, Lancashire

View user profile http://www.mannyroadend.co.uk

Back to top Go down

Re: First DW and hopelessly out of my depth

Post  theboyholty on Tue Dec 17, 2013 6:24 am

ngalemmo wrote:Truncating and reloading a dimension is really bad form.  You are supposed to assign surrogate keys to dimension rows to ensure stability with the facts.  This requires a process that inserts new dimension rows and updates to changed rows.  Facts reference rows using the surrogate keys, that should not change.

As far as new items coming in from facts, you should infer new dimension rows.  In addition to the surrogate key, there should be a natural, or business, key that identifies the product.  Assuming the fact information provides the business key, you create a new row with the business key and a new surrogate key (not -1) and have the fact refer to that row using the new surrogate key.  When the source system is eventually updated with that new product, you update the dimensions row, based on the business key, so that the attributes for that product are populated and become available to the fact.

The '-1' row is only for a situation where you have absolutely nothing (no business key).  Such an assignment cannot be changed, unless you get an update to the fact itself.

First of all thanks for your feedback and help, I can't explain how valuable this is.

I get the surrogate keys thing and I already have those implemented (I just didn't know that was what they were called). I'm working on scrapping the truncate and reload feature and I can now see what I should be doing with the ProductDimension.
I was planning to use a stored proc and the SQL MERGE function (whch is one of my favourite new SQL features) to handle inserting new dimension rows and updates to changed rows. However, I'm using SSIS for moving all my data around, doing lookups etc, so is there a specific SSIS component that can do this?

theboyholty

Posts : 6
Join date : 2012-12-18
Age : 44
Location : Bury, Lancashire

View user profile http://www.mannyroadend.co.uk

Back to top Go down

Re: First DW and hopelessly out of my depth

Post  ngalemmo on Tue Dec 17, 2013 9:14 am

I've not used SSIS. My knowledge of it is fairly high-level. I would suspect there is a component that would do inference, as it is a common technique.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: First DW and hopelessly out of my depth

Post  theboyholty on Tue Dec 17, 2013 9:18 am

ngalemmo wrote:I've not used SSIS. My knowledge of it is fairly high-level.  I would suspect there is a component that would do inference, as it is a common technique.

Well just as a matter of interest then, if you don't mind me asking, what do you use if not SSIS (which I, in my limited experience, had assumed was the industry standard for getting data into a warehouse and processing it)?

theboyholty

Posts : 6
Join date : 2012-12-18
Age : 44
Location : Bury, Lancashire

View user profile http://www.mannyroadend.co.uk

Back to top Go down

Re: First DW and hopelessly out of my depth

Post  ngalemmo on Tue Dec 17, 2013 9:40 am

SSIS is a Microsoft product that comes with SQL Server.  There are a host of other ETL products (Informatica, Data Stage, Ab-Initio, etc…) that work with any database system.  They vary in how things are done, but overall they are all decent products.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: First DW and hopelessly out of my depth

Post  BoxesAndLines on Tue Dec 17, 2013 9:47 am

If you have SQL Server, then the vast majority use SSIS.  I've only seen one that does not.  If you have a different database(s) then here you go, .

As you can see, SSIS is a niche player since it is only used on SQL Server.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: First DW and hopelessly out of my depth

Post  theboyholty on Tue Dec 17, 2013 9:59 am

Ah ok. I'd naively assumed we were all Microsoft SQL Server users as it the only DB system of note that I've ever used & probably ever will.

Thanks all.

theboyholty

Posts : 6
Join date : 2012-12-18
Age : 44
Location : Bury, Lancashire

View user profile http://www.mannyroadend.co.uk

Back to top Go down

Re: First DW and hopelessly out of my depth

Post  ngalemmo on Tue Dec 17, 2013 10:28 am

Oracle was the most used DBMS for data warehousing for a long time, but I would guess SQL Server has probably surpassed them in number of implementations.  SQL Server implementations tend to be smaller than Oracle implementations.  Combined they are probably 60-70% of the market by # of implementations.  Other players include (in no particular order) IBM DB2, Teradata, Netezza (aka IBM PureData), Vertica, Greenplum, ParAccel, and many others.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: First DW and hopelessly out of my depth

Post  Jeff Smith on Tue Dec 17, 2013 12:07 pm

We're on SQL Server but use Cognos Data Manager as the ETL tool. We are planning to switch to Infomatica in 2014/15.


Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

Re: First DW and hopelessly out of my depth

Post  BoxesAndLines on Tue Dec 17, 2013 12:21 pm

#2. :-)
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: First DW and hopelessly out of my depth

Post  Mike Honey on Tue Dec 17, 2013 10:24 pm

Hi Nick,

I use SSIS, and it does have a "Slowly Changing Dimension" wizard which builds components to deliver dimension data. However it has a quite a few flaws and quirks, and architecturally it processes row-by-row, so I wouldnt recommend it.

I normally build an SSIS Data Flow task to populate a Staging table, then follow that with an Execute SQL task to run a SQL MERGE statement.

Good luck!
Mike
avatar
Mike Honey

Posts : 185
Join date : 2010-08-04
Location : Melbourne, Australia

View user profile http://www.mangasolutions.com

Back to top Go down

Re: First DW and hopelessly out of my depth

Post  theboyholty on Wed Dec 18, 2013 3:13 am

Mike Honey wrote:Hi Nick,

I use SSIS, and it does have a "Slowly Changing Dimension" wizard which builds components to deliver dimension data.  However it has a quite a few flaws and quirks, and architecturally it processes row-by-row, so I wouldnt recommend it.  

I normally build an SSIS Data Flow task to populate a Staging table, then follow that with an Execute SQL task to run a SQL MERGE statement.

Good luck!
Mike

Cheers Mike, that was initially my thinking only I don't like to use staging tables too much as they end up cluttering the database. I'll probably have a play with the Slowly Changing Dimension wizard but I expect I'll end up using MERGE, as I'm sure you'll agree its one of the best features to be added to SQL Server in years.

theboyholty

Posts : 6
Join date : 2012-12-18
Age : 44
Location : Bury, Lancashire

View user profile http://www.mannyroadend.co.uk

Back to top Go down

Re: First DW and hopelessly out of my depth

Post  Mike Honey on Wed Dec 18, 2013 6:44 pm

Hi Nick,

Yes SQL MERGE is effective, but I wish they had implemented it as an SSIS component - that might've avoid needing staging tables and the pages of custom code required to drive a SQL MERGE.

If you are trying the SCD Wizard, here is some info on its limitations:
http://forum.kimballgroup.com/t1772-microsoft-ssis-bids-scd-type-2-development
http://connect.microsoft.com/SQLServer/feedback/details/574910/ssis-slowly-changing-dimension-scd-transformation-with-fixed-attributes-never-detects-changing-attributes
http://connect.microsoft.com/SQLServer/feedback/details/586000/ssis-slowly-changing-dimension-component-spurious-error-inferred-indicator-column-must-be-of-type-dt-bool
http://connect.microsoft.com/SQLServer/feedback/details/658605/ssis-slowly-changing-dimension-component-does-not-detect-historical-changes-in-dt-dbtimestamp2-datetime2-columns

Good luck!
Mike
avatar
Mike Honey

Posts : 185
Join date : 2010-08-04
Location : Melbourne, Australia

View user profile http://www.mangasolutions.com

Back to top Go down

Re: First DW and hopelessly out of my depth

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum