Normalization in DWH environment

View previous topic View next topic Go down

Normalization in DWH environment

Post  kapoor_dh on Tue Dec 08, 2009 1:24 am

Hi All,

Recently I came across a question,would like to post it here for the views of the members.

What normalization form are the Fact and dimension tables and why (with examples aligning to the Normalization definition)

Thanks,
Dheeraj

kapoor_dh

Posts : 24
Join date : 2009-12-08

View user profile

Back to top Go down

Re: Normalization in DWH environment

Post  ngalemmo on Tue Dec 08, 2009 2:01 am

A fact table is 3nf while a dimension table is not (usually 2nf, sometimes 1nf, but rarely 3nf).

Other than an academic discussion, it really doesn't matter.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Normalization in DWH environment

Post  kapoor_dh on Tue Dec 08, 2009 2:27 am

Can you provide any example with defnitions applied to them...or provide me the link which talks about the fact and dimensions and the mormalization

kapoor_dh

Posts : 24
Join date : 2009-12-08

View user profile

Back to top Go down

Re: Normalization in DWH environment

Post  ngalemmo on Tue Dec 08, 2009 3:22 pm

Codd & Date covered the definitions of normal forms a long time ago. I'm sure the same is available on Wiki...
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Normalization in DWH environment

Post  kapoor_dh on Wed Dec 09, 2009 1:11 am

Most of them are on the typical transaction tables...none of them so far I have seen on the Fact/Dimensions (typical star schema),do let me know if you find else leave that to me ...;)

kapoor_dh

Posts : 24
Join date : 2009-12-08

View user profile

Back to top Go down

-normalizaion in dw

Post  jagadish on Fri Jan 15, 2010 8:07 am

hi
basically with respect to data organized in Datawarehousing

its not fully normalized boss...i think so,
if you look at the database design in OLTP application , its fully normalized

basically in the datawarehouse we bother about how much data we are storing,
we store the same data some times in morethan one place as it is against in OLTP design
becase we need it for analization,
the primary goal we need it for analisations,we dont bother how redundantly store the data.

it doesn't mean that its completely denoramized, it is normalised but not fully.ok
may be up to 3rd NF

better you go through with database designing techniques by CODD

but to me its not possible to fully normalize the data in DWH
thanx
jagadish

jagadish

Posts : 2
Join date : 2010-01-15

View user profile

Back to top Go down

Re: Normalization in DWH environment

Post  ngalemmo on Mon Jan 18, 2010 4:58 am

kapoor_dh wrote:Most of them are on the typical transaction tables...none of them so far I have seen on the Fact/Dimensions (typical star schema),do let me know if you find else leave that to me ...;)

Which is why it is pointless to discuss normalization in the context of dimensional design. A normalized schema is one designed using normalization rules. There is a very rigid, mechanical process by which you achieve various levels of normalization. Dimensional design, on the other hand, does not use normalization to achieve a data model. Creating a dimensional model is more art than science. While such a model, when implemented in a relational database, can be categorized to be in a particular normal form (based on the relationship between the row's contents and its key), such categorization is meaningless since normalization (i.e. ER modeling) was not used to develop the model.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Normalization in DWH environment

Post  BoxesAndLines on Tue Jan 19, 2010 1:00 am

I think you discount the value of understanding normalization rules in building a dimensional model. Many of the questions posted here have roots in normalization or traditional data modeling. People new to modeling dimensional or normalized models always struggle with where do I put this attribute, should this column be null, should I collapse this hierarchy, etc. Experienced modelers fly through the modeling process in their heads without even giving a second thought to what they are doing. This is why I believe people often struggle with proper dimension design. It's irrelevant if you are a Kimball or an Inmon kool aid drinker, at some point you need, at a minimum, a normalized model to understand the data relationships and dependencies. It is an understanding of these dependencies that drive a quality dimensional model. It would be interesting to see which of Ralph's books have sold the most copies. My guess is the the Complete Guide to Dimensional Modeling.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Normalization in DWH environment

Post  ngalemmo on Tue Jan 19, 2010 2:37 am

BoxesAndLines wrote:I think you discount the value of understanding normalization rules in building a dimensional model.
Yes I do. Normalization is used as a crutch in Dimensional design. Normalization is a specific methodology to achive a data model for a specific purpose, that being efficient and effective OLTP. Dimensional design, as described in DW Toolkit is a completely different methodology to achive a different data model for a different purpose (i.e. data warehousing).

To get a good dimensional design, you need to think differently than you would trying to design an OLTP database. I've seen, on numerous occasions, dimensional models getting balled up because of modelers falling back on normalization principles and second guessing themselves.

Read page 11 of Ralph's book.

And, as far as DW books go, DW Toolkit is by far the largest seller. When I asked him about it around 6 years ago, it was around 70,000 copies, which is a huge number for this type of book (10,000 is considered a best seller). It's always been at the top of the Amazon list.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Normalization in DWH environment

Post  kapoor_dh on Tue Jan 19, 2010 4:15 am

"Experienced modelers fly through the modeling process in their heads without even giving a second thought to what they are doing"

Dimensional Modeling if not considered for Normalization,which is true in most of the cases is less of Modelling but more of the success comes from the fact how good are the requirements gathered and then it goes to building aggregate creation for performance tuning and more of a straight forward thing as you know what kind of stcructure you are going to come up with (Star Schema).While I am myself into Datawarehousing I see the more efforts are involved in ETL (maintaing SCDs,Error tracking etc) and Performance tuning either thru reporting tools or SQL tuning.While a lot is desired out of Modeling in OLTP and normalization scenarios,and there is another group of Modeling that lies in between 2 of them(which corporates call for real time reporting),Foundation Layer or ODS Layer.This is the kind of scenarios where you need to decide between Normalization or De-normalization or hybrid.

kapoor_dh

Posts : 24
Join date : 2009-12-08

View user profile

Back to top Go down

Re: Normalization in DWH environment

Post  ngalemmo on Tue Jan 19, 2010 1:45 pm

kapoor_dh wrote:This is the kind of scenarios where you need to decide between Normalization or De-normalization or hybrid.
Excuse me while I break into a rant... I really do not like the term 'hybrid' as if somehow the designer has discovered some unique balance between Kimball and Inmon that addresses all issues and handles everything in some magical solution. What it usually means, particulalry if they have somehow managed to put together a mix of ER and Dimensional concepts into the same model, is that they have basically produced a mess that requires a savant to understand.

There is no such thing as a 'hybrid' approach between Kimball and Inmon. The Inmon approach IS a hybrid approach. Under Inmon you build a somewhat 3NF foundation layer from which you publish dimensional data marts for end user consumption. Under Kimball, the foundation layer is dimensional, which can be queried directly by end users. So, you do one or the other... there is no 'in-between'.

There is no argument that different approaches apply to different applications (althought I would argue that a normalized model is necessary for real time DW applications), but the choice is not between a normalized model or a de-normalized model... because those are the same thing. They simply defer in a matter of degree (a model in 3NF is a de-normalized model when compared to a model in 4NF). It's all about the methodology you use to achieve the model... you are either developing an ER model or a Dimensional model. Both require different thought processes to accomplish.

To give you an example... a General Ledger Chart of Accounts is an example of a dimensional model (Note: the chart itself, I'm not referring to anything dealing with computers or databases). The concept of categorizing a value based on time (accounting period), business unit, cost center, natural account, project and so on, is an example of dimensional thinking. You cannot design a chart of accounts using ER modeling techniques and come up with a solution as simple and elegant as a COA can be.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Normalization in DWH environment

Post  BoxesAndLines on Thu Jan 21, 2010 12:28 pm

Here's the rub for Kimball EDW's. The dimensional model is well proven for high performance and understandability. What it doesn't do well is reflect the underlying data relationships and dependencies. If you do not want that information then you are golden. If you do want that information, you will need a normalized data model.

Examples:
Dimension Column optionality - I need to model all of my dimension columns as NULL since I normally include a default row for not applicable.
Dimension Optionality - A fact table will have mandatory relationships to all dimensions. I can not tell whether a given dimension is mandatory or optional for a given metric.
Dimension Hierarchies - These are constructed by collapsing or denormalizing the OLTP data models. I can no longer see the hierarchy structure via the dimension data model.

I personally am interested in these things so I encourage my clients to build a normalized data model to minimally help the DW team build the dimensional models. Whether you expose this metadata to business community is personal preference. So to clarify my original response, normalization skills beyond 1NF are not necessary for building a dimensional model, however, I believe the dimensional data warehouse needs to be supplemented with a normalized model to address the previous examples.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Normalization in DWH environment

Post  ngalemmo on Thu Jan 21, 2010 5:05 pm

I disagree. There is no such thing as 'dimensional optionality' for a measure in a correct dimensional model. A fact table should contain measures appropriate for the grain of the fact table. Otherwise, you are mixing grains, which is not proper modeling.

How do you model hierarchies in an ER model that are any clearer than what happens in a dimensional model? Most hierarchies in a normalized model are represented by recursive structures. In a dimensional model you could retain that (if your query environment supports it) or more often, by an explosion of the recursive structure or a flattening of the hiearachy. The latter form pretty much spells out a hierarhcy as clearly as anyone could make it.

As for dimensional column optionality... so what? That is a data entry issue, which is not what dimensional models are for. When somebody does a query, the responsibility of the data warehouse is to ensure that the data being provided is accurate.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Normalization in DWH environment

Post  BoxesAndLines on Thu Jan 21, 2010 9:48 pm

ngalemmo wrote:I disagree. There is no such thing as 'dimensional optionality' for a measure in a correct dimensional model. A fact table should contain measures appropriate for the grain of the fact table. Otherwise, you are mixing grains, which is not proper modeling.

Take the ubiquitous Promotion dimension. It certainly won't apply to all facts. What gets stored in the fact is either the N/A row or the 'No promotion' row. Sometimes you know the difference, sometimes you don't. The best case scenario is the 'No Promotion' row is stored. The only way you know that Promotions are optional is by examining the data within the dimension. The dimensional data model provides no insight here.

So what!? (LOL) Maybe the business doesn't care, but the developers, modelers, and testers, all care. A data model is a great place to capture this metadata. No can do for dimensional models.

Great discussion, BTW.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Normalization in DWH environment

Post  ngalemmo on Fri Jan 22, 2010 3:28 pm

But the need to know this in a systematic manner (i.e. through database constraints) is a transactional application requirement. From a dimensional design standpoint the dimensional reference is always required (the FK cannot be null)... it is just that in some instances it is referencing a row for the not applicable case. If it needs to be documented in the metadata, adding a comment will do the trick.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Normalization in DWH environment

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum