Should an enterprise data warehouse contain all imformation asset or only the ones that we think is intresting

View previous topic View next topic Go down

Should an enterprise data warehouse contain all imformation asset or only the ones that we think is intresting

Post  Ramtin on Sun May 08, 2011 3:49 am

I am working on a project that is trying to pick an choose what attributes should be in the data warehouse but looking at existing reports in reporting systems and the ones that people put together on spreadsheets. This however represents a much smaller number data elements that exists on source systems.
These elements are not system related or system operation data they are actually business process data however they don't appear on any report and it will double the size of the effort.
The argument is that they are not intresting and hence they only make the project expensive.
My concern is that this compremises the completeness of the data warehouse and hence reducing it's analytical value. Also intresting is a subjective argument that can change over time which means we have to for ever go back and re-design every time someone decides that something is intresting.
The project team is very new to data warehousing and barely undrestands Kimball methodology however they have decided to adopt it somewhat. We have a bus matrix that implements some aspect of each subject area leaving gaps. for example we are looking at General ledger but not accounts receivable or payable. We take the Posting accounts but not the chart of accounts and account heirarchy.

Has anyone been in such a situation and did it work out ok for them? or how did they do the hard sell and get everyone to change their mind to go back and model each subject area completely.

I guess fundementally this is the first time I have been asked model by requirements and by process and infromation asset.

Love to hear even comments as I am not sure whether I am being idealistic or there is some serious grounds for concern.




Ramtin

Posts : 12
Join date : 2011-03-10

View user profile

Back to top Go down

Re: Should an enterprise data warehouse contain all imformation asset or only the ones that we think is intresting

Post  ngalemmo on Sun May 08, 2011 7:15 pm

The project team is very new to data warehousing and barely undrestands Kimball methodology however they have decided to adopt it somewhat. We have a bus matrix that implements some aspect of each subject area leaving gaps. for example we are looking at General ledger but not accounts receivable or payable. We take the Posting accounts but not the chart of accounts and account heirarchy.

It is not unusual, in fact generally recommended, that the DW be built in phases. So, starting with the GL is ok in itself. The subledgers would be covered by other fact tables, so it is easy to incorporate that data later as the business requires.

Where you need to be careful is the dimensions. If GL is the subject area, you want to be sure all available dimensions are covered. Once identified, they should be fully populated. It does not 'save' effort by not loading attributes because they are not in current reports. Once the DW environment is in place, users begin to demand more information. You are better off loading it in the first place, rather than do rework.

Hierarchies can be deferred if needed to meet deadlines. If you have established the dimensions correctly, hierarchies can be retrofitted as bridge tables without impacting the existing schema.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Should an enterprise data warehouse contain all imformation asset or only the ones that we think is intresting

Post  Ramtin on Sun May 15, 2011 3:10 am

ngalemmo, thanks for the reply.
What you were suggesting implies that the subject area, if not fully modelled at least is fully undrestood even if not fully implemented. All Dimensions are identified but additional facts and bridges are left for later. That is not case here. The issue I am having is development is directed and scope is restricted based on reports required from the subject area. With This approach how do I know I got all or even the right dimensions if I have such a narrow view of the subject area. The reports require only 5 dimensions on GL fact. While from little work I have done I know there is more than 10 dimensions so far. There is not enough time allocated to each implementation to go outside requirements scope. If you were to investigate the entire subject would you not model the whole thing even if you implement some of the dimensions as just place holders with just the business key during the ETL stage? That way you will only have to do minimal work on dimensions for re work later and will not have to rework the fact.
The next pass will be based on requirements of some other reports and then again there is no garauntee that the subject area will be done by same modeller or completed. Am I being a puritan in saying that the point of an enterprise data warehouse is to provide an analytical space to the enterprise and not just a reporting system? With all the gaps in enterprise information assets created by Report Requirement based approach are we not severly diminishing the viablity of data warehouse for analytics? And are we not setting ourselves for sever rework? It is the first time I have seen this approach by a data warehouse project and architects and I do not have the experiance of this method of delivery. It just does not feel right to me.

I guess I am using the forum as sounding board to either put my worries to rest or get good case to argue against the delivery methodology.


Ramtin

Posts : 12
Join date : 2011-03-10

View user profile

Back to top Go down

Re: Should an enterprise data warehouse contain all imformation asset or only the ones that we think is intresting

Post  ngalemmo on Sun May 15, 2011 9:15 pm

Desiging a data warehouse for specific reports is a bad idea. It is usually the result of a misapplication of scope controls from a project management standpoint. The PM error comes in misstating scope. If cost and sustainability are the key concerns, scope needs to be defined in terms of subject areas, not data elements. When implementing a dimension or fact, a reasonable effort should be made to capture all useful and available attributes and measures, not just those 'in a report'. The cost to rework an existing model far exceeds the small incremental effort needed to capture additional data items and dimensions that are readily available in the first place.

That is not to say having reports as requirements is a bad idea. To the contrary. Reports should guide the data warehouse designer in identifying the subject areas of interest. From there the design should fully address all attributes and measures germaine to that subject area. The reports then serve as a basis for gap analysis to confirm the design meets minimal requirements.

There are some who would say that bringing in all the data is a bad idea from a data quality perspecitve. I disagree. The role of the data warehouse is to accurately reflect the contents of the source systems. That is easily attainable. Data cleansing is a function of data governance and maintenance of the source systems. A data warehouse, with its accurate representation and ability to efficiently perform analysis on large amounts of data serves as a tool to identify data issues and to aid resolving them in the source. The corrected data then flows back into the data warehouse.

With that said, there are techniques to mitigate the dimension issues when dealing with strict scope controls. The simplest is to 'stub' dimensions that are currently out of scope. You pull all natural keys when extracting for facts, and associate the fact with all applicable dimensions. In the case of a stub, it contains nothing more than the surrogate PK and the natural key. The stubs serve as placeholders until the need for attributes comes up. The dimension is then expanded to include the necessary attributes and a process is developed to load the dimension. The existing stub rows would then be updated with attribute values, which become immediately available to related facts.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Should an enterprise data warehouse contain all imformation asset or only the ones that we think is intresting

Post  Ramtin on Mon May 16, 2011 1:31 pm

Thank you.
These were the very thoughts that had me concerned however you articulated them much better and that was helpful.
I in fact am using stubs already however I have run out time and now have about 5% my fact rows that seem to be linked to only the account_code and nothing else. I fear there may be some unknown dimension in play however and may be there isn't .I do not have to discover it.
I am putting a business case together to discuss change of approach.

Thanks again

Ramtin

Posts : 12
Join date : 2011-03-10

View user profile

Back to top Go down

Re: Should an enterprise data warehouse contain all imformation asset or only the ones that we think is intresting

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum