Warehouse Structure

View previous topic View next topic Go down

Warehouse Structure

Post  jryan on Thu Sep 27, 2012 11:07 am

Hi,

I'm working with SQL Server and am about to embark on creating a DW that will hit many subject areas. I've created many SQL DWs in the past and my approach has been normally to create 1)One or more sql staging dbs 2)The SQL DW. It's the latter I want to get some help on.

In the past I've always gone with the kimball modelling approach, creating star schema facts and dimensions in a denormalised design. I've typically taken data in from 4 or 5 systems, spanning 2 or 3 subject areas, so there's been say 10 fact tables and say 30 dimensions. A bit bigger (in terms of # tables) than adventure works DW, for example.

This time around, the number of systems will be much bigger and consequently the # dim and fact tables will be much larger. I'm curious what to do from an organisation point of view. Should it be all one SQL Server database with all the dims and facts for the whole business? That could get pretty big. Should I split it up into seperate databases? Or seperate schemas? Is there a cut off point? I'm currrently favouring all in one schema, one db.

Arguabley, aside from the modelling approach, this isn't Kimabll and definitely isn't Inmon. As I understand with true Kimball there should be a number of seperate data marts and with Inmon the tables would be based on the pure data model. Is this approach of Kimball but all tables in one database a bad idea? There will be one or more SSAS cubes or tabular models on top.

Thanks

jryan

Posts : 33
Join date : 2010-09-27

View user profile

Back to top Go down

Re: Warehouse Structure

Post  min.emerg on Thu Sep 27, 2012 12:06 pm

Hi jryan

My initial apprehension of splitting the data warehouse between several databases is that it will be difficult to use conformed dimensions. If your employee dimension is in database 1, you won't be able to have a foreign key reference to it from your fact tables in database 2 (that is, a foreign key constraint enforced by SQL Server). I have never split a warehouse between multiple databases before, so I'm uncertain as to whether there's an easy way to overcome this challenge that I'm just not aware of.

There obviously are ways to split data between databases (think of the NASDAQ and Wallmart data warehouses - there's no way they can sit on a single database instance). If I were you I'd start off using a single database that contains all facts and dimensions. You can leverage schemas to segregate data, but I have typically only used schemas to support data segregation using security (i.e.: a user can see all tables in a specific schema). If you come across a good enough reason for the warehouse to spill over onto another server, do it - don't over-engineer it right at the beginning. You could always turn to SQL Server clustering if performance becomes a concern.

Just my 2c, hope it helps.


min.emerg

Posts : 39
Join date : 2011-02-25

View user profile

Back to top Go down

Re: Warehouse Structure

Post  jryan on Mon Oct 01, 2012 1:08 pm

Thanks for the reply,

I'm not too concerned over performance, more managability. E.g. how long should you let the list of tables get? If I ended up with say 400 dimensions and 50 fact tables, would that be too many objects to be manageable in one database? I'm not concerned about performance, there are ways to manage that (e.g. filegroups, partitioning, indexing etc). SQL can physically handle that amount of tables no problem, so it should be fine. I'm more thinking from an organisation point of view, does it make sense to have that many tables in one place?

I was also under the impression that Kimball advocates seperate data marts, in the sense of having seperate databases containing copies of the same dimensions. Maybe I've misunderstood that bit?

As I said though, I've always gone with the approach that you've said, which is to put everything in one database. Just wondering if that still applies for the whole Enterprise.

jryan

Posts : 33
Join date : 2010-09-27

View user profile

Back to top Go down

Re: Warehouse Structure

Post  ngalemmo on Mon Oct 01, 2012 5:56 pm

No, I can't imagine Kimball recommending each mart in its own database. A 'mart' essentially is a single star schema.

Generally you want to try to keep as much as you can on a single server. They all permit subdividing things into logical groupings (schema/database), but still allow you to integrate across them very easily. Servers can get very large, with many CPUs supporting large disk arrays.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Warehouse Structure

Post  jryan on Tue Oct 02, 2012 4:07 am

Thanks all for the help,

That clears it up - I'll stick with keeping everything in one database

jryan

Posts : 33
Join date : 2010-09-27

View user profile

Back to top Go down

Re: Warehouse Structure

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum