Start Schema vs Snow flake schema

View previous topic View next topic Go down

Start Schema vs Snow flake schema

Post  indy on Wed Mar 10, 2010 5:57 am

In my organisation we are on a way to implement data warehouse and data mart for sale.
we have one dimension City/District/Store.
1. Store name can be split/merge from one district to other in next quarter.
2. there is possibility of changing City name e.g. london to east london, west london.

issue :
My team have two different opinion to design this dimension.
1st view is to create three separate table join them CITY->District ->Store and another view is to create a single dimension with surrogate key and implement SCD2.

could you please add your opinion to solve this issue

regards

indy

Posts : 3
Join date : 2010-03-10

View user profile

Back to top Go down

Re: Start Schema vs Snow flake schema

Post  ngalemmo on Wed Mar 10, 2010 12:54 pm

I would go with the single dimension, be it either a 'junk' dimension or a type 2.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Start Schema vs Snow flake schema

Post  indy on Thu Mar 11, 2010 3:49 am

Thanks !
In your point of view, does following statement having any sense "It is good idea to create snowflake schema (normalise dimension.. City is further divided into 3 tables) and one dimension in reporting mart.
In that way if there is change in name of city or merger of branch, it will be easy to manage."
?

Regards

indy

Posts : 3
Join date : 2010-03-10

View user profile

Back to top Go down

Re: Start Schema vs Snow flake schema

Post  ngalemmo on Thu Mar 11, 2010 1:13 pm

If you are building a dimensional data warehouse, the warehouse and the reporting mart are one in the same thing. The mart is simply one piece integrated with the rest of the marts through conforming dimensions.

To manage description changes where multiple values are denormalized in a single dimension, what I do is define simple list of value support tables. These tables are not part of the warehouse per se but are very useful to manage maintenace of dimensions as well as support list of value functions in various reporting and query tools.

I would have the denormalized dimension with the codes and descriptions of store, city and district as well as three separate LOV tables with code/description (and other related attributes) for each. I would have an ETL process that pulls the code/description from the source and update the LOV tables. I would use these LOV tables as lookups in the ETL process that updates the dimension table itself. I would also have an update trigger on the LOV table so when a description changes, it updates the descriptions in the appropriate dimension tables. I would not expose these LOV tables to the end user except as a list of values when selecting parameters for query filters.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Start Schema vs Snow flake schema

Post  indy on Fri Mar 12, 2010 11:02 am

thanks !
It make sense.

indy

Posts : 3
Join date : 2010-03-10

View user profile

Back to top Go down

RNF in Staging Area

Post  Jeff Smith on Fri Mar 12, 2010 3:08 pm

I think it's a good idea to keep things as normalized as possibly in the Staging Area.

One way to look at a Data Warehouse is that it is made up of 2 types of databases - one type is the Data Marts and the other type is the Staging Area. I use the term "Types" because the Staging Area can be one or more databases and the Data Marts can be in one or more databases. The key being the use of conformed dimensions.

I think a good way Staging Area Vs Mart is a good way of separating work. The Staging Area is where the Source Data enters the stream. This database can be more IT-centric, using 3NF, with the focus being on getting the data accurately into the DW enviornment. We even have a Reference Database that contains the Hierarchies that don't exist in source systems as well as the values for goals or targtet scores all in 3NF. Keeping the staged data in 3NF makes building the dimension tables much easier.

The Marts are more Business Unit Centric. The design of the marts are intended to be easy to use and fast. This is also the place where you might want to create aggregate tables and modeled data. For example, we have a "Household Type" that is based on products purchased by the household that exists in the marts.

Some organizations make a clear delineation between these 2 types or areas of the DW with 2 different groups supporting them. Others have one group doing it all. In either case, staging data in 3NF is a good idea.

Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

Re: Start Schema vs Snow flake schema

Post  hang on Sat May 15, 2010 9:56 am

Hi, I am a new comer to the forum and just started going through the topics.

I found this thread is very relevant to what I have been through in my past five yeas of learning and practising dimensional modelling in particular and data warehousing in general. The term Staging Area sometimes has been loosely used to represent any upstream data stores compared to the Dimensional Data Store. I would call the type of Staging Area described by Jeff as Operational Data Store (ODS) as it is normalised relational area.

In one of my experiences building data warehouse, part from loading data from a main OLTP source in Oracle, I was asked to create an area where some Excel data can be incorporated into data warehouse directly. In addition, users want to enter the data into the data warehouse through front-end application. In four of my Kimball books, ODS was introduced as a less desirable part of data warehousing and I had no idea how it can become a justifiable component of data warehouse, until I found this book "Building a Data Warehouse: With Examples in SQL Server" by Vincent Rainardi. Armed with both Kimball's visionary theory and Vincent's generous give away of his valuable works in form of full sample scripts, I have successfully delivered two DW projects with ODS as a necessary component. In Vincent's book, he also touched on the concept of Master Data Management, which I guess is similar to Ngalemmo's elaboration on managing LOV tables.

In my understanding, the Staging Area is supposed to be highly volatile and should not be used to enforce your normalisation heavily for data quality or ER purpose. If the source data system is not properly normalised or does not exist in form of database, you may have to create an separate ODS in 3NF as a firewall to protect data quality. Yeah, you almost have to create an enhanced OLTP system. I believe a good dimensional modeller must be a good relational modeller to be able to create real life data warehouses. Actually in an idea dimensional model, all fact tables should be as normalised as possible while keeping all dimensions as denormalised as possible. The normalisation on fact and denormalisation on dimension is not a hybrid arrangement but rather a guiding principle for dimensional modeling.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Start Schema vs Snow flake schema

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum