Reasons to use a 3NF design over a Dimensional Model design for an EDW - Discussion

Page 2 of 2 Previous  1, 2

View previous topic View next topic Go down

Re: Reasons to use a 3NF design over a Dimensional Model design for an EDW - Discussion

Post  BoxesAndLines on Tue Apr 12, 2011 8:03 pm

Sanjay, your solution addresses distribution of fact data. The problem is all the of the dimensions have a different primary key and primary index. All joins will cause redistribution which is a performance hit for Teradata. I have heard that aggregate join indexes solve this problem but I have not personally tested it. The downside to AJI's is the additional storage requirements. The fact is, you can't just drop an Oracle or SQL Server dimensional model on Teradata and expect similar results. It also seems that building dimensional models on Teradata is akin to fitting a square peg in a round hole. You're not leveraging the full power the Teradata offers (which on normalized models is quite amazing).
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Reasons to use a 3NF design over a Dimensional Model design for an EDW - Discussion

Post  ngalemmo on Wed Apr 13, 2011 1:15 am

It also seems that building dimensional models on Teradata is akin to fitting a square peg in a round hole.

Agreed, but the issue is with Teradata in particular, not MPP architectures in general. Netezza, for example, particularly with their newer TwinFin systems, does a really good job with star schema. Other vendors, such as Aster, have made accomodations, such as allowing replication dimension tables across processors, to resolve the co-location issue.

avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Reasons to use a 3NF design over a Dimensional Model design for an EDW - Discussion

Post  seadog2010 on Wed Apr 13, 2011 5:25 pm

"It also seems that building dimensional models on Teradata is akin to fitting a square peg in a round hole."

Rightly put.

Few questions for you to answer:
1) Are you creating just a data mart or a data warehouse?
2) Who would consume the data? BI tool and/or power users?
3) If BI tool, what BI tool? (most BI tool are dimensional model friendly, so eventually, you will have to create a dimensional model anyway)


1. Currently have about 24 data marts, most built on a semi-relational layer (integration layer).
2. Consumers - BI tool, services, and of course power users. In short, just about any consumer you can think of. Overnight reports & real-time queries.
3. Cognos, plus and in-house SQL access tool based on SSRS.


So, if Teradata is the problem, and I'm stuck with it, how best to distribute and co-locate? I'm not a DBA, don't care to be one, but unfortunately I rely on them to fine-tune the system based on how I convey the mart will be used/accessed. I've revisited past marts and found the performance was lousy - the DBA did not take into account how the mart would be accessed, and discussed, and so took the easy route and slammed it in. Keep in mind that our Teradata DBA's do not like dimensional modeling, so it's a catch-22.

We're migrating from using dimensions based on roles (one each for customer, ee, broker, although these are all variations of a 'Party') to using dimensions with multiple role views (one Party Dim, but many role views such as Customer, EE, Broker, etc). How best to distribute this heavily used dimension, as well as others (like TIME_DIM, STAT_DIM, etc) and the fact tables to make this beast work?

In layman's terms, what approaches should be taken? I'd hate to think most dimensional models need to have flat report tables placed on top of them in order to get performance.

seadog2010

Posts : 23
Join date : 2010-03-04

View user profile

Back to top Go down

Star schema the fact table is always fully normalised?

Post  twofivepie on Fri Apr 15, 2011 8:42 am

PHough wrote:... in a Star schema the fact table is "always" fully normalised and optimised....

Hi Paul,

I have been reading through this very interesting thread and noticed this comment from you and I wondered a lot about it. My understanding is that a fact table is DE normalised in the sense that it repeats information. For example if you had a fact table of order line items you would have the order key and the customer key in each order line item row. In a normalised form you would have a customer order_header table and a customer order_line_items table, joined on order number and the customer key would only exist once, in the orders header table.

In what sense were you using the term 'normalised'?

regards

Simon

twofivepie

Posts : 9
Join date : 2011-04-14

View user profile

Back to top Go down

Does this discussion hinge on whether its only BI or BI AND MDM being considered?

Post  twofivepie on Fri Apr 15, 2011 8:52 am

In the Glossary section of the Kimball book "Data Warehouse Lifecycle toolkit" they declare in the entry for EDW:

"...Others in the industry refer to the EDW as a centralized, atomic, and normalised layer of the data warehouse, used as a source for dimensional data marts that are created and published to the user community on-demand. We discourage this interpretation of the EDW, preferring to think of the EDW as the largest possible union of presentation server data. "

This seems to be a summary of this thread so far.

I am relatively new to this field, but it seems to me that one can come up to the idea of an EDW from two directions.

Firstly, you may want a normalised EDW for use as the centre of a MDM solution where you want your 'single version of the truth' shared between applications. In other words, your motivation for an EDW is for operational efficiency and interworking between applications.

Alternatively you might come at it from the Kimball direction where you don't care so much about the imperfections/inconsistencies in the source applications (they are just issues to be dealt with), but you do want a set of conformed star-schema data marts as the end-product for BI efficiency.

I am involved in a relatively newly set-up Information Management team within an IT group and it seems to me that I want both a shared 'single-version-of-the-truth' warehouse available for real-time use within applications AND I want a set of conformed datamarts for efficient analytics. So the question I am grappling with is whether the Kimball datamart building techniques are suitable for building both, or whether in reality, they are targeted at operational reporting and business analytics - and, whether these can be one and the same thing?

regards

Simon

twofivepie

Posts : 9
Join date : 2011-04-14

View user profile

Back to top Go down

Re: Reasons to use a 3NF design over a Dimensional Model design for an EDW - Discussion

Post  ngalemmo on Fri Apr 15, 2011 2:13 pm

twofivepie wrote:
PHough wrote:... in a Star schema the fact table is "always" fully normalised and optimised....

Hi Paul,

I have been reading through this very interesting thread and noticed this comment from you and I wondered a lot about it. My understanding is that a fact table is DE normalised in the sense that it repeats information. For example if you had a fact table of order line items you would have the order key and the customer key in each order line item row. In a normalised form you would have a customer order_header table and a customer order_line_items table, joined on order number and the customer key would only exist once, in the orders header table.

In what sense were you using the term 'normalised'?

regards

Simon

It's irrelevant, other than academicly, to describe a star schema in terms of normal forms, since normalization techniques are not used to develop the model.

However, if one was to desribe a fact table (and only a fact table) in terms of its normal form, it would usually be 3NF, as the contents of the row are bound by its key (i.e. the dimension keys make up the primary key) and only the key.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Reasons to use a 3NF design over a Dimensional Model design for an EDW - Discussion

Post  twofivepie on Fri Apr 15, 2011 5:32 pm

ngalemmo, I guess it could be argued that is academic, but the title of the post is '3NF vs dimensional modeling', so I just wanted to clarify my understanding.

I maintain that a fact table is not in 3nf normal form, because it contains redundancy in potentially several keys - a sales order fact table may have many dimension keys not required for unique key in to the fact, after sales order number and line number, you may have order date, sold-to, salesman, payment terms that will all be characteristics of the order, but which will be repeated on each row. If this was 3NF, you would have an order header table and and order lines table. Or have I misunderstood 3NF in this context?

regards

Simon

twofivepie

Posts : 9
Join date : 2011-04-14

View user profile

Back to top Go down

Re: Reasons to use a 3NF design over a Dimensional Model design for an EDW - Discussion

Post  twofivepie on Fri Apr 15, 2011 6:02 pm

ok - now read Chapter 6 - Introducing Dimensional Modeling, "The Data Warehouse Lifecycle toolkit" and Kimabll does indeed say "...fact tables are typically normalised to 3NF in a dimensional model because the related context is removed to dimension tables." - hmm...

twofivepie

Posts : 9
Join date : 2011-04-14

View user profile

Back to top Go down

Re: Reasons to use a 3NF design over a Dimensional Model design for an EDW - Discussion

Post  hang on Fri Apr 15, 2011 8:08 pm

Maybe I am wrong, but as far as I can remember from one of Kimball's books, fact tables are in 3NF and dimension tables 2NF.

My understanding about 3NF is the repeated textual attributes should be removed and put in another table and then have FK in the subject table. So repeating keys instead of contents is 3NF thinking.

Another guiding principle about dimensional modeling is normalised facts and denormalised dimensions. However when we talk about applying 3NF for data warehouse system is really about snow-flaking/normalising dimensions. As far as fact tables are concerned, true dimensional modeling practitioners always advocate normalisation more vigorously than any relational thinkers in an OLTP system.

The key objectives on the dimensional modeling is, normalised fact will deliver the best performance and denormalised dimensions deliver the easy of use (and possibly performance as well). In layman's term, big (deep) fact tables should be normalised while small dimension (shallow) tables should be denormalised.

However the guidelines should never be used as dogmatic rules dictating your design. Kimball has pointed out, in his books and articles, many scenarios in which you should be more flexible about the guidelines. For instance, a bridge table needs to be introduced for multivalued dimensions, a monster dimension should be normalised to avoid performance and storage issues. I noticed in both relational and dimensional camps, many people go from one extreme to another and rarely try to take advantages from other side, or even worse, applying the methodology in wrong situations.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Reasons to use a 3NF design over a Dimensional Model design for an EDW - Discussion

Post  ngalemmo on Sat Apr 16, 2011 1:39 am

twofivepie wrote:ngalemmo, I guess it could be argued that is academic, but the title of the post is '3NF vs dimensional modeling', so I just wanted to clarify my understanding.

I maintain that a fact table is not in 3nf normal form, because it contains redundancy in potentially several keys - a sales order fact table may have many dimension keys not required for unique key in to the fact, after sales order number and line number, you may have order date, sold-to, salesman, payment terms that will all be characteristics of the order, but which will be repeated on each row. If this was 3NF, you would have an order header table and and order lines table. Or have I misunderstood 3NF in this context?

regards

Simon

If you were applying normalization you would not have a dimensional model. I do not describe dimensional models in terms of normal forms. Wither one thinks a fact table is one normal form or another doesn't matter because it is a fact table, not a table or entity in the ER modeling sense. It is just that people like to give labels to things and most of the time a fact table is considered to be in 3NF or something close to it.

As I said, its irrelevant.

The thread is not... it started out as a discussion as to wither a normalized data store (Inmon) is more/less appropriate than a dimensional data store (Kimball).
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Reasons to use a 3NF design over a Dimensional Model design for an EDW - Discussion

Post  ngalemmo on Sat Apr 16, 2011 1:44 am

As far as fact tables are concerned, true dimensional modeling practitioners always advocate normalisation more vigorously than any relational thinkers in an OLTP system.

Really? When I'm desiging a dimensional model, normalization is the farthest thing from my mind. When I teach the subject I always recommend that any ER modelers forget everything they've learned before...

avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Reasons to use a 3NF design over a Dimensional Model design for an EDW - Discussion

Post  BoxesAndLines on Wed Apr 20, 2011 7:11 pm

ngalemmo wrote:
As far as fact tables are concerned, true dimensional modeling practitioners always advocate normalisation more vigorously than any relational thinkers in an OLTP system.

Really? When I'm desiging a dimensional model, normalization is the farthest thing from my mind. When I teach the subject I always recommend that any ER modelers forget everything they've learned before...


That's because normalization is ingrained into your thought processes. After normalizing models for years, you don't actually think about normalization, you just know how the attributes should be modeled. I find this is still a valuable skill in teaching how to appropriately model a dimension. The beginner dimensional modelers all struggle with the attribution of their dimensions because Kimball never provides a methodology for attributing dimensions.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Reasons to use a 3NF design over a Dimensional Model design for an EDW - Discussion

Post  ngalemmo on Thu Apr 21, 2011 2:18 am

That's because normalization is ingrained into your thought processes.

...not my thought process. I started designing databases before relational databases became available. Normalization was just a topic of a research paper then. When I design star schema I think in terms of business events, states, and context.

For example, an employee belongs to a department. Is it one dimension or two? Normalization demands it be two tables, no question. In dimensional modeling, on the other hand, it depends on what that relationship means to the business... its it just an attribute or part of a more complex organizational structure?
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Reasons to use a 3NF design over a Dimensional Model design for an EDW - Discussion

Post  Jeff Smith on Thu Apr 21, 2011 12:14 pm

But 3nf is useful in the staging area. It certainly makes creating the Dimension tables easier.

Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

Re: Reasons to use a 3NF design over a Dimensional Model design for an EDW - Discussion

Post  Sponsored content


Sponsored content


Back to top Go down

Page 2 of 2 Previous  1, 2

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum