Data Vault vs Kimball

Post  itcouple on Tue Nov 27, 2012 6:07 am

Hi

I'm trying to understand how Data Vault Modelling fits Kimball. I have limited knowledge of Data Vault and I understand it as a way to store data without modifications.

From this point of view it looks to me like Data Vault sits "before" the Star Schema, and I would probably place some kind of MDM (for dimensions) in between for cleansing purposes (business interpretation + extra reporting functionality like ordering). Is that accurate, or is Data Vault supposed to replace the Kimball way of building a Star Schema?

Data Vault is quite appealing to me, as it might solve the problem of losing data and not being able to re-create history or re-create dimensions (switching to SCD), and obviously audit is becoming critical for more and more organizations.

I would appreciate your comments.
Regards
Emil


Re: Data Vault vs Kimball

Post  ngalemmo on Tue Nov 27, 2012 12:25 pm

Data Vault is a store and publish architecture. It is similar to Inmon, the difference being the technique used to model the data store. Like Inmon, users do not directly query the store; instead, data is published to data marts (star schema) or other forms of extracts.

what about Managing Dimensions

Post  itcouple on Tue Nov 27, 2012 5:41 pm

Thanks for your reply. I think one missing link in my understanding is where MDM fits in. By MDM I mean only managing dimensions (codes, sorting, extra categorization, hierarchies, etc.) and modifying data. By modifying data I mean descriptive fields, like mapping old codes to new codes, or taking data from a system that has the business key but wrong attribute values and replacing them with 'master data'.

Many thanks
Emil


Data Vault

Post  itcouple on Thu Mar 07, 2013 7:58 am

I've almost finished reading the Data Vault book and I must admit I see a lot of sense in Data Vault. The benefits differ depending on project and industry, but recently I've worked more on healthcare projects (UK / NHS), and there it would simplify development and minimize re-work. Obviously that doesn't mean that after over 5 years of using Kimball I won't use it, as there are many more factors that need to be taken into consideration and Kimball is the most popular approach (which does not mean it is always implemented properly), but I have a feeling Data Vault will become more and more popular in the next several years.

I hope the last 20% of the book will tell me how data marts are built and whether there are any differences between a Data Vault mart and a Kimball DW/data mart.

Take care
Emil


Re: Data Vault vs Kimball

Post  BoxesAndLines on Thu Mar 07, 2013 9:22 am

Please keep us posted on your findings.

Re: Data Vault vs Kimball

Post  hang on Fri Mar 08, 2013 9:50 am

Very interesting and controversial topic. IMHO, no matter how much we love dimensional modeling, Data Vault definitely deserves some attention.

Obviously, itcouple is not alone. I have also been spending some time on Data Vault, after experiencing some major misuses under the name of Kimball dimensional modeling. To me, Data Vault is closer to data, or system if you like, while dimensional modeling is closer to business.

As a data modeler, I want to be closer to the data so that I can have better control over the data structure through a more disciplined approach. As a BI or report developer, I may want to present every piece of information that makes the business happy. I can see two distinctly different focuses within the data warehouse, similar to the line we always draw between OLTP for operational purposes and OLAP/BI systems for analytic purposes.

I guess what dimensional modeling is trying to do is achieve data storage, performance and business consumption in one hit. Data Vault, on the other hand, focuses more on the system: how historical data should be structured and stored efficiently, and how data can be easily partitioned vertically for parallel processing to achieve better performance. However, Data Vault does rely on a downstream star schema for user consumption, as it is too normalized for the business to query.

With dimensional modeling, modelers can more easily make critical mistakes and still believe the model follows the Kimball methodology. It is very common for a fact table to be modeled as a dimension in the name of dimension conformance, or for it to be unclear whether a relationship belongs in a fact table, a bridge, or a dimension. Do we need SCD2 historical dimensional context, or is the business always interested in the current view regardless of how historical and futuristic SCD2 is? At the end of the day it is case-by-case treatment, and misusing an approach that is perfectly suitable in another case can be very detrimental, even worse than just copying the ER model straight into the DW.

It seems to me Data Vault is more disciplined, like normalization. Violations can be more easily identified. You don't need to declare what is a fact or a dimension, and so you avoid mistaking a fact for a dimension. I do have one dilemma about Data Vault: how the downstream star schema should be modeled to cater for trend analysis, which can be elegantly realized by SCD2 and a fact table with time series. I believe Data Vault only assumes a current-view star schema as its presentation.
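For what it's worth, here is a rough Python sketch of how an SCD2-style dimension might be derived from satellite history, assuming the satellite keeps one insert-only row per change together with a load date. The names `satellite_rows`, `to_scd2` and `version_at`, and the sample data, are hypothetical; this only illustrates the idea and is not something taken from the Data Vault book.

```python
from datetime import date

# Hypothetical satellite history: (business_key, load_date, attributes).
satellite_rows = [
    ("CUST-1", date(2012, 1, 1), {"city": "Leeds"}),
    ("CUST-1", date(2012, 6, 1), {"city": "York"}),
    ("CUST-2", date(2012, 3, 1), {"city": "Bath"}),
]

def to_scd2(rows):
    """Turn insert-only satellite rows into SCD2 rows with validity ranges."""
    by_key = {}
    for key, load_date, attrs in sorted(rows, key=lambda r: (r[0], r[1])):
        by_key.setdefault(key, []).append((load_date, attrs))
    dimension = []
    for key, versions in by_key.items():
        for i, (valid_from, attrs) in enumerate(versions):
            is_last = i == len(versions) - 1
            dimension.append({
                "surrogate_key": len(dimension) + 1,
                "business_key": key,
                "valid_from": valid_from,
                # The next version's load date closes this version.
                "valid_to": None if is_last else versions[i + 1][0],
                "is_current": is_last,
                **attrs,
            })
    return dimension

def version_at(dimension, business_key, day):
    """Find the dimension row that was valid on a given day (for trend analysis)."""
    for row in dimension:
        if (row["business_key"] == business_key
                and row["valid_from"] <= day
                and (row["valid_to"] is None or day < row["valid_to"])):
            return row
    return None
```

Filtering the result on `is_current` would give the current-view dimension, while `version_at` is the lookup a fact load would use to pick up the historical context.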


Re: Data Vault vs Kimball

Post  itcouple on Fri Mar 08, 2013 10:32 am

Hi

I completely agree with you. I believe I've read some articles that list common mistakes with Kimball, and by using Data Vault those problems are largely minimized. The two main aspects I have in mind are these. First, SCD, where you change your mind (or it was implemented incorrectly) but you cannot recreate the data. Second, the idea of conformed dimensions/facts, which is very important but tremendously increases complexity. In most small/medium projects I have worked on it is not feasible, as communication between business and IT is very poor, so the business expects IT to do it without any structure (framework). That often ends up in a high Total Cost of Ownership, not because Kimball is not the way to do it but because people are not ready.

The key element is that the business doesn't know what it needs until it is built. Data Vault is an extra step (we cannot argue with that), BUT you know that you can store the data and postpone part of the data profiling and data cleansing, and meanwhile build a PoC (I use PowerPivot), show it to the business, let them use it for a while, and then do the core "presentation" layer development. It also fits very well with my idea of self-service, which is Excel + PowerPivot (cube), pre or post data cleansing (depending on the case). Obviously self-service BI has its own challenges, but Data Vault is still a simple design, as in most cases you have Hub + Satellite (= Dimension) and Link + Satellite (= Fact with Degenerate Dimension).
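To make the Hub + Satellite (= Dimension) and Link + Satellite (= Fact) mapping a bit more concrete, here is a minimal Python sketch. All table and column names (`hub_patient`, `sat_patient`, `link_admission`, `sat_admission`, `admission_no` and so on) are made up for illustration, and the tables are just lists of dicts rather than a real database.

```python
# Hypothetical raw vault tables, represented as lists of dicts.
hub_patient = [{"patient_hk": 1, "patient_id": "P001"}]
sat_patient = [
    {"patient_hk": 1, "load_date": "2013-01-01", "name": "A. Smith"},
    {"patient_hk": 1, "load_date": "2013-02-01", "name": "A. Smyth"},
]
link_admission = [
    {"admission_hk": 10, "patient_hk": 1, "ward_hk": 5, "admission_no": "ADM-77"},
]
sat_admission = [
    {"admission_hk": 10, "load_date": "2013-02-03", "length_of_stay_days": 3},
]

def latest(sat_rows, key_name):
    """Keep the most recent satellite row per parent key (current view)."""
    current = {}
    for row in sat_rows:
        k = row[key_name]
        # ISO date strings compare correctly as plain strings.
        if k not in current or row["load_date"] > current[k]["load_date"]:
            current[k] = row
    return current

def descriptive(row, key_name):
    """Strip the parent key and load date, leaving only the payload columns."""
    return {k: v for k, v in row.items() if k not in (key_name, "load_date")}

def build_dimension(hub, sat, hk, bk):
    """Hub + Satellite -> dimension row: business key plus current attributes."""
    cur = latest(sat, hk)
    return [{hk: h[hk], bk: h[bk], **descriptive(cur.get(h[hk], {}), hk)} for h in hub]

def build_fact(link, sat, hk):
    """Link + Satellite -> fact row: foreign keys, degenerate key, measures."""
    cur = latest(sat, hk)
    return [{**l, **descriptive(cur.get(l[hk], {}), hk)} for l in link]

dim_patient = build_dimension(hub_patient, sat_patient, "patient_hk", "patient_id")
fact_admission = build_fact(link_admission, sat_admission, "admission_hk")
```

Here `admission_no` plays the role of the degenerate dimension: it travels with the fact row and has no dimension table of its own.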

What I also particularly like about Data Vault is that you have full access to raw data and that makes everyone's life much easier.

Take care
Emil


Re: Data Vault vs Kimball

Post  BoxesAndLines on Fri Mar 08, 2013 10:51 am

Data Vault has the same inherent problem as Inmon's CIF: it's too expensive. It's another layer in the information lifecycle. More modeling, more ETL, more servers, etc. With all the added work, it takes more time to enhance the EDW. A change that should take a couple of days now takes a month. That's a deal breaker for all but the biggest companies.


Re: Data Vault vs Kimball

Post  itcouple on Fri Mar 08, 2013 11:37 am

Too expensive? I think that depends on the case. I have worked on a number of projects, and most of them were not massive, so I see almost no increase in hardware costs in these projects. I actually see a decrease in "hidden cost", which is something that is not visible in the first year but is rather obvious when you come into a project that has been running for 1-2 years and you see that everyone wants to re-design it again and again.

Most costs (for most of the projects I have worked on, which are not massive) are perms/contractors, so "fluent" development without major pitfalls is very important.

Let's bear in mind that I have used Kimball (only) for over 5 years and I do like it and will use it, but in certain environments I can say Data Vault is a better choice, which doesn't mean Kimball will not work, because it will, but often I see people thinking they do Kimball but actually they don't do Kimball, and I believe these people will be better off with Data Vault, mainly because it is much simpler.

Take care
Emil


Re: Data Vault vs Kimball

Post  hang on Fri Mar 08, 2013 7:27 pm

itcouple wrote: I can say Data Vault is a better choice, which doesn't mean Kimball will not work, because it will, but often I see people thinking they do Kimball but actually they don't do Kimball, and I believe these people will be better off with Data Vault, mainly because it is much simpler.

I believe the majority of existing DWs fall into this category, hence the very high rate of failure in the field. What makes it hard to put effective dimensional modelling in place is striking a good balance in your design. Most modellers don't like a balanced approach, or they get carried away one way or the other. Once they have learnt dimensional modelling, they want to move away from normalisation completely, or they get stuck with what Kimball said in his early books without coming back to see what Kimball's new thoughts are. I would be interested to know how many dimensional modellers understand the following points:

- Why should we sometimes have a natural key, or durable key per se, in fact tables?
- What is the difference between a factless table and a bridge?
- Why is a common degenerate dimension in multiple fact tables also a perfect form of dimension conformance?
- Have you tried to avoid an accumulating snapshot fact table by modeling it as a dimension with a whole bunch of date outriggers?
- Under what circumstances should a dimension be normalised, or even completely normalised if applicable?
- How is a bridge table leveraged to keep a fact table in its natural grain?
- Is it true that most businesses don't care about historical dimensional context, and that SCD2 is therefore only used as an exception?
- Do you know that in most cases a time dimension should be avoided by having a timestamp in the fact table if time of day is required?

I know I did not get most of these points in the first few years of my dimensional modelling practice, and I have also realised that in all those early projects, any pitfall caused by ignorance of these points was a flaw in the model. And bear in mind, Kimball's dimensional modelling is a methodology that evolves, with new thoughts coming along over time. I think alternative methodologies should have their space in DW, so that new thoughts on dimensional modelling can be introduced to overcome many of the issues we are facing today. If you still believe no other methodology should live alongside dimensional modeling, hopefully the following article from the Kimball Group might change your mind.

http://www.kimballgroup.com/2012/08/01/design-tip-148-complementing-3nf-edws-with-dimensional-presentation-areas/


Re: Data Vault vs Kimball

Post  ngalemmo on Fri Mar 08, 2013 8:10 pm

I think there is a little bit of political positioning taking place, rather than any new thinking.

"Organizations who’ve adopted this architecture often find some business users developing reporting and analytic applications directly against the atomic 3NF data structures."

Yeah, that happens, but that is what Inmon says should not happen.

"Ideally, these users would leverage an architected downstream analytic platform."

That is an essential point of Inmon's and Linstedt's architectures.

"Unfortunately, many organizations either populate the downstream environments with summary rather than atomic data, or worse, never get around to building the user assessable environments. Inevitably this results in a set of frustrated business users."

Which often leads to the failure of many such implementations. None of this is news. It is as true when the referenced article was written as it was 10-15 years ago.

There are those who believe that the only way to create a cohesive, clean historical record of enterprise data is by creating an Inmon EDW or a Linstedt Data Vault, and there are those who don't. I am of the latter group.

I believe that a properly designed enterprise dimensional data warehouse is just as capable as the other architectures. That, to me, is the Kimball architecture.

All three architectures are valid and all three can be successful. Each has its pros and cons.

What bothers me is this notion of a 'hybrid' architecture. It simply doesn't exist. To me, what exists are the architectures, and the methodology around dimensional modeling.

The implementation of an "architected downstream analytic platform" is inherent in the store and publish architectures of Inmon and Linstedt. One would be wise in embracing the dimensional modeling methodology in doing so.

Re: Data Vault vs Kimball

Post  hang on Sat Mar 09, 2013 7:38 pm

ngalemmo wrote: "Unfortunately, many organizations either populate the downstream environments with summary rather than atomic data, or worse, never get around to building the user assessable environments. Inevitably this results in a set of frustrated business users."
.....
What bothers me is this notion of a 'hybrid' architecture. It simply doesn't exist. To me, what exists are the architectures, and the methodology around dimensional modeling.

Here are relevant lines from the article:

"To overcome these challenges, a popular modification to the Kimball Architecture has evolved. This hybrid architecture leverages the existing 3NF data warehouse as a primary source of clean, integrated data to feed a dimensionally-structured enterprise presentation area. The resulting dimensional presentation area would consist of a number of atomic, business process-centric fact tables integrated via a set of conformed dimensions.

The key advantages of enhancing the 3NF environment with a dimensional presentation area are to present an atomic, integrated, consistent environment to the business community that is significantly less complex. As a result, the data is easier to understand, users can more readily create the queries they require, and the queries themselves are less complex. In addition, the query response from the underlying dimensional structures will be significantly quicker."
ngalemmo wrote: None of this is news. It is as true when the referenced article was written as it was 10-15 years ago.

True, none of that is new. But the ideas of a fact table having a durable key, an effective date pair, a smart integer date key and a timestamp measure are new, and are not in Kimball's books written 10-15 years ago.


Re: Data Vault vs Kimball

Post  ngalemmo on Sun Mar 10, 2013 12:53 am

I am not arguing that the dimensional design methodology has not advanced. I am saying 'hybrid' is a misnomer, a marketing/political term. A 3NF data store published to a dimensional presentation area is an Inmon architecture. The only suggestion being made is that facts should be atomic. Which I have no argument with, but it doesn't change the architecture in place.

Re: Data Vault vs Kimball

Post  itcouple on Sun Mar 10, 2013 6:54 am

never get around to building the user assessable environments. Inevitably this results in a set of frustrated business users

I don't think any business user cares about any technical aspects. They are very important, but in my experience the key challenge is not the methodology we use but how we use it to deliver business value. A Data Warehouse needs to be "profitable" (directly or indirectly), and that often means "breaking the rules", as the business doesn't want to spend the money to get the ideal solution (and they might be right) or they don't understand the implications, which results in frustrated business users and a high TCO.

I've worked on 6 DW projects for different clients in the last 2 years, and I must admit I haven't seen a single client with a team of developers and business users who are "on the same page". So the CTO wants a Ferrari, the user wants just a standard car, and the developer says that with money & time they can build a car, but it ends up like the one below.

[image]


Take care
Emil


Re: Data Vault vs Kimball

Post  itcouple on Sun Mar 10, 2013 7:12 am

One more comment. I've used Kimball in all my projects, and there was only one organization where I felt Kimball was a bit problematic. That organization is the NHS (National Health Service) in the UK, one of the biggest employers in the world. It operates through "trusts", and each trust has its own environment and gets data from various independent systems (partially implemented with MDM). They also need to use national codes (a kind of master data, but often applied after extraction) and submit data in a pre-defined format on a frequent basis to a national system.

The problem is a 'conflict of interest', as there are multiple audiences and rules that are subject to interpretation. The NHS also has frequent merges/splits and goes through a "big change" this April. Each audience is also generally interested in its own data/source and does not care about other sources at all, as they really have little to do with them. Very often they need access to untouched data but do not have access to the source system themselves.

I see Data Vault working for the NHS; this is the only organization where I believe Kimball is at a disadvantage. For me this is the third NHS contract in the last 2 years (3 different trusts), and I hope I can influence the team to consider Data Vault. Personally I believe that, as they have worked in the NHS for many years, they will get it very quickly.

Take care
Emil


Re: Data Vault vs Kimball

Post  hang on Sun Mar 10, 2013 9:34 am

ngalemmo, I got your point and agree with you. I am not a big fan of the term 'hybrid', or DW 2.0 for that matter. I don't even trust hybrid cars. I have never done an Inmon DW, as it was out of favor when I started looking into Kimball's methodology, when his second edition came out just over 10 years ago. You might have better knowledge of and practical experience with Inmon's style.

However, itcouple makes a very valid point. The business does not care what methodology you are using for DW modeling. They also do things like backdating changes in the source and asking you to cater for that in the DW. They are only interested in the current view of all the dimensions, as they have only seen operational reports in their lives and don't have any clue about the real value of an OLAP system. So unupdatable facts and slowly changing dimensions are not the best fit for these primary requirements. Data Vault, on the other hand, is well placed to give the business what it initially wants while laying the foundation for downstream data mart population once the business starts to think analytically.

Believe me, I am more Kimball than anyone else I have worked with. I have even been accused of being too Kimball in my recent experience. If done right, the Kimball methodology is the most effective solution for DW. The question is how many self-proclaimed Kimball practitioners really get the gist of his thoughts. If done improperly, dimensional modeling worst practice can destroy a DW project much more easily than a normalized model.

BTW, I really want to hear comments on the dimensional modeling points in my previous post. I know some very experienced modelers did not have any clue about many of these points, or at least not in the way I understand them based on all of Kimball's books and recent articles. The worst one is the difference between a factless table and a bridge table, and I know the same topic has appeared in this forum at least twice.

My point is, if most dimensional modelers are not clear about these crucial points, then dimensional modeling is too hard to be used properly.


Re: Data Vault vs Kimball

Post  BoxesAndLines on Sun Mar 10, 2013 12:14 pm

Hang,
You have the same issues with 3NF modeling. When interviewing application data modelers, I always dwell on the normalization aspects of data modeling. The vast majority of candidates can't recite first, second, and third normal form. Given a simple example, they can't normalize the data. Very few can provide an example of a 3NF violation.
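As an illustration of the kind of example meant here, below is a small Python sketch of a classic 3NF violation: an order table where `customer_city` depends on `customer_id` rather than directly on the key `order_id` (a transitive dependency). The table, column names and helper functions are invented for the sketch, and checking a dependency against sample rows only suggests that it holds; it does not prove it.

```python
# Hypothetical denormalized table: key is order_id.
# customer_city depends on customer_id, not directly on order_id.
rows = [
    {"order_id": 1, "customer_id": "C1", "customer_city": "Leeds"},
    {"order_id": 2, "customer_id": "C1", "customer_city": "Leeds"},
    {"order_id": 3, "customer_id": "C2", "customer_city": "York"},
]

def determines(rows, lhs, rhs):
    """True if each lhs value maps to exactly one rhs value in this sample."""
    seen = {}
    for r in rows:
        if seen.setdefault(r[lhs], r[rhs]) != r[rhs]:
            return False
    return True

def split_3nf(rows, key, via, dependent):
    """Move the transitively dependent column into its own table keyed by `via`."""
    orders = [{key: r[key], via: r[via]} for r in rows]
    customers = {r[via]: r[dependent] for r in rows}
    return orders, customers

# A non-key column determining another non-key column is the 3NF violation.
violation = determines(rows, "customer_id", "customer_city")
orders, customers = split_3nf(rows, "order_id", "customer_id", "customer_city")
```

After the split, the city is stored once per customer, so changing it no longer means updating every order row.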

Most application data modelers I know have never made the jump to dimensional modeling. They still want to live in the world of "logical only", don't want to profile the data, much less consider the performance implications of their designs. Sure, they read the book, but with little to no physical database experience, and no desire to understand the benefits of bitmap indexes over btree indexes, their designs will always be subpar. It's not their fault. The role of modeler within most large organizations has not evolved in the last 20 years. I view this as the biggest reason for the wide discrepancy in quality for dimensional models.

My current client is implementing a data vault warehouse, so I am looking forward to understanding the intricacies (i.e. gotchas) related to the approach. My first reactions are, easy to load, easy to enhance, a bear to extract.

Emil is spot on with "I don't think any business user cares about any technical aspects." At the end of the day, I need to run a report, identify trends, run a campaign.

Re: Data Vault vs Kimball

Post  hang on Sun Mar 10, 2013 8:36 pm

Thanks, B&L. You are absolutely right about modelers' quality in general, and perhaps we were among them at some point in our earlier data experience. I always say that to be a good dimensional modeler, you have to be an excellent relational modeler to start with. To transcend relational thinking and become a good dimensional modeler, you've got to jump out of your comfort zone.

The world has left us a mountain of badly designed OLTP systems as the source for the data warehouse, and we are expected to resolve all the data quality and integration issues in ETL and sort them out in the dimensional model. I have noticed some writings from the Kimball Group pointing out the importance of having an upstream MDM system in the DW architecture to alleviate the problems existing in the source. I think the idea is one step closer to a Data Vault + Data Mart architecture. Maybe some people don't like to link MDM's Hub-Spoke architecture to Data Vault's Hub-Satellite-Link, but I know they are very similar, and I do remember hearing Kimball refer to Data Vault as Hub-Spoke in class.



Re: Data Vault vs Kimball

Post  ngalemmo on Mon Mar 11, 2013 12:11 am

Yep, the business doesn't care, they just want a solution that works. But as professionals in the field, we should care. That's what a business would expect of us.

Re: Data Vault vs Kimball

Post  itcouple on Mon Mar 11, 2013 2:53 am

and we all do


Re: Data Vault vs Kimball

Post  Jeff Smith on Mon Mar 11, 2013 5:01 pm

Is there really an argument of Data Vault Vs Kimball?

I was under the impression that the data vault was kind of a super staging area for a Data Warehouse. Not all of the data from the Data Vault is loaded into the warehouse, as the data vault may contain data that may not be appropriate for a data warehouse.

But then again, maybe the Data Vault is one of those things that is in the eye of the beholder.


Re: Data Vault vs Kimball

Post  itcouple on Tue Mar 12, 2013 3:51 am

I think it always comes down to usage. If I were to "create an image", then Kimball for me is like a supermarket (with a warehouse behind it). Clients have access to everything they expect to see in a supermarket, and they can "browse" and choose what they need.

Data Vault is like a huge network of suppliers that store "raw" products and supply them to the supermarket warehouse. In my opinion they don't compete; they coexist.

And here are the simple steps that I follow when I implement Kimball:
Phase 1: extract + Source specific DQ
Phase 2: stage + DQ
Phase 3: DW Fact/Dimension

What we "lost" is the raw data. Obviously we could extend staging, but what is the point if Data Vault does it much better? So if we need a strong staging phase + raw data, then I choose Data Vault.

The Data Vault book didn't mention anything about the "presentation"/data mart layer, which in my experience is generally a Kimball dimensional model. As I don't have a different point of view, I will still use Kimball as the presentation layer until I find someone who has a different point of view, and if it is a good one I might change mine.

Just to recap: Data Vault is an extra step but has extra uses. In the NHS I see it as ideal, but at this point in time I still believe Kimball is needed for the presentation layer on top of Data Vault. In the NHS I need raw data, it is a requirement, and it also helps to split complex requirements between the Data Vault and the presentation layer (Kimball), which means they don't compete.

I personally don't care who creates the definition of a data warehouse, Kimball, Inmon or Linstedt (Data Vault), as it will evolve anyway and is largely subject to interpretation by each company and each person. I read Inmon (DW 2.0) & Data Vault because I saw limitations in Kimball for the NHS, and now... I have more data and can make better decisions... (is that not what a DW is about ;p)

Take care
Emil


Kimball vs Data Vault

Post  itcouple on Tue Mar 12, 2013 4:01 am

Jeff Smith wrote: I was under the impression that the data vault was kind of a super staging area for a Data Warehouse

Just a clarification. From my point of view you can think of Data Vault as a super staging area, but "super staging" is more suited to Kimball, as it would generally be handled the "Kimball way". Data Vault is like super staging, but the important differences are the flexibility of the design and very easy adaptation to business change (relationships etc.), and that it is isolated from business rules, which means you don't apply business rule transformations when you load it (which might not be the case for standard staging or super staging).
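To show what "no business rule transformations on load" could look like in practice, here is a minimal Python sketch of an insert-only satellite load. It assumes a hash of the raw attributes is used to detect change; the function names `hash_diff` and `load_satellite` and the row layout are hypothetical and only illustrate the idea.

```python
import hashlib

def hash_diff(attrs):
    """Hash the raw attribute values so a change can be detected cheaply."""
    payload = "|".join(f"{k}={attrs[k]}" for k in sorted(attrs))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def load_satellite(satellite, business_key, attrs, load_date):
    """Append a row only when the raw attributes changed; never update or cleanse.

    Assumes rows are appended in load order, so the last row for a key
    is its most recent version.
    """
    history = [r for r in satellite if r["business_key"] == business_key]
    new_hash = hash_diff(attrs)
    if history and history[-1]["hash_diff"] == new_hash:
        return False  # nothing changed, so nothing is stored
    satellite.append({
        "business_key": business_key,
        "load_date": load_date,
        "hash_diff": new_hash,
        **attrs,  # values are stored exactly as the source delivered them
    })
    return True
```

Mapping old codes to new codes, or replacing wrong values with master data, would then happen downstream when the presentation layer is built, so the raw history stays available for audit.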


Kimball vs Data Vault

Post  itcouple on Tue Mar 12, 2013 4:14 am

.... thinking about it.... Kimball is like a supermarket, but I presume Data Vault is more like Amazon: you buy directly from suppliers and can get much more than from a supermarket, for a better price (as you can choose which "source" to use), but it is still all under one roof called Amazon.
