Reporting table data repository vs. Dimensional data store

View previous topic View next topic Go down

Reporting table data repository vs. Dimensional data store

Post  hang on Thu Jul 21, 2011 9:08 pm

I am facing this business situation. Our current fact-dimension based dimensional store is under scrutiny which has found out some of the key fact and dimension tables have issues in their implementations. The impact of the issues has been highlighted to justify replacing the current architecture with another methodology, the reporting table based data repository, for our BI reporting services.

Now we have implemented a quite good effective date based data store that can provide all the dimensional and fact data at any given point of time for a few years history. Some argument points out why we need another redundant data store in form of fact and dimension tables when we can simply load data to fully de-normalised reporting tables that suffice all the required information for the front-end. And the argument goes on, If we need historical data, we then load reporting data snapshots into the reporting tables. BTW, OLAP Cubing is nice to have but not imperative at the moment.

In the proposed approach, we could eliminate one extra data layer, and the ETL becomes very straightforward without worrying about all the dimensional modelling complexities, such as SCD, bridge and junk dimension structure. Our reporting data are always much smaller subsets of original facts and dimensions in dimensional model. So the reporting tables may not go too deep even with a few snapshots, although very wide.

Now the justification seems convincing and the new approach will have some repercussion on national services once adopted. I know this simplistic approach goes against my BI/DW understanding and practice in many years, I really need some expert advice that will pin point the flaw and consequence, if there is one, of this reporting table practice.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Reporting table data repository vs. Dimensional data store

Post  hang on Fri Jul 22, 2011 7:15 pm

Ngalemmo, Warrent. any comment on this?

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Reporting table data repository vs. Dimensional data store

Post  John Simon on Mon Jul 25, 2011 5:02 am

Firstly, performance. Fully denormalised tables do not improve performance.
Secondly, you don't build reports from your database. You don't build your database for your reports or you will lock yourself in to an inflexible arrangement where you need to recreate tables for each report.
Thirdly, data quality and reuse. You can easily drill across tables in a DDS. If you need to create a new report then your development times are dramatically increased due to loading, testing each report table.

I did some consulting at a bank that tried the above approach and they had a lot of issues with having the same measure being incorrectly calculated and other data issues each time a report is required.

John Simon

Posts : 186
Join date : 2009-05-29
Location : Melbourne, Australia

View user profile http://jsimonbi.wordpress.com

Back to top Go down

Re: Reporting table data repository vs. Dimensional data store

Post  BoxesAndLines on Mon Jul 25, 2011 9:06 am

I think you'll run into trouble when you have the wide and deep table. Most of today's DBMS' return the whole row of data from disk even though you only have a few columns in the select statement. OTOH, if you're running Netezza, Teradata, or a columnar database, this may work just fine.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Reporting table data repository vs. Dimensional data store

Post  ngalemmo on Mon Jul 25, 2011 10:18 am

It depends on what you are trying to do.

One cardinal rule I live by is you don't design a data warehouse to produce a set of reports. You look at the big picture and build to address the business.

With that said, if you aready have a data warehouse, building a publication layer to produce 'report files' or other extracts is not out of the question provided there is some benefit in doing so.

But, trying to 'fix' a data warehouse by replacing it with a collection of report oriented tables (or cubes) is a dead end. You wind up with the mess that prompted the creation of data warehouses in the first place.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Reporting table data repository vs. Dimensional data store

Post  ngalemmo on Mon Jul 25, 2011 10:22 am

BoxesAndLines wrote:OTOH, if you're running Netezza, Teradata, or a columnar database, this may work just fine.

I've been working a lot with Netezza lately. It tends to work better with large thin tables than large wide ones. You generally get significantly (orders of magnitude) better performance with star schema and its joins than trying to query a one-table equivalent.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Reporting table data repository vs. Dimensional data store

Post  BoxesAndLines on Mon Jul 25, 2011 11:43 am

Interesting. What is a shame is I just attended the Netezza conference in Boston a couple of weeks ago and this type of information was severely lacking. I learned exactly nothing about the appropriate database designs for Netezza.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Reporting table data repository vs. Dimensional data store

Post  hang on Tue Jul 26, 2011 8:21 am

John Simon wrote:you don't build reports from your database. You don't build your database for your reports or you will lock yourself in to an inflexible arrangement where you need to recreate tables for each report.
I can't agree more! However in many people's sight, the delivery of reports is the quick and tangible benchmark of a successful BI system.
ngalemmo wrote:But, trying to 'fix' a data warehouse by replacing it with a collection of report oriented tables (or cubes) is a dead end.
It appears we could head in that direction, I wish our decision makers could take your comment seriously. However, the opponent of the dimension modeling has been quite successful in producing reports by using effective date based datasets for almost everything, facts and dimensions. The approach has also be claimed more resilient to changes as it does not rely on surrogate keys.

Ngalemmo, I know you have mentioned, a few times in the relevant posts, about the effective date pattern for fact table, and it's almost like applying SCD2 on the fact tables, but without dedicated SK and NK as in the dimension table. So the question is, is it feasible to use this generic approach to produce equivalent results which would otherwise be achieved by periodic snapshots, or even accumulating snapshots. I can imagine every thing has to be date ranged in the absence of SK on dimensions.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Reporting table data repository vs. Dimensional data store

Post  ngalemmo on Tue Jul 26, 2011 10:34 am

However, the opponent of the dimension modeling has been quite successful in producing reports by using effective date based datasets for almost everything, facts and dimensions. The approach has also be claimed more resilient to changes as it does not rely on surrogate keys.

Sure. Its quick an easy. Oh, and you rarely change a report file... if you need new data you just create a new one. Let's see how its going in 3-4 years...
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Reporting table data repository vs. Dimensional data store

Post  BoxesAndLines on Tue Jul 26, 2011 10:42 am

Nick's spot on. On a recent engagement, I was converting SSRS reports to a dimensional model. The SSRS reporting guru showed up with a smug gring and a huge binder full with over 500 "reports". Two fact tables and 30 some odd dimensions later, the whole stack was running on the dimensional model.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Reporting table data repository vs. Dimensional data store

Post  Mike Honey on Tue Jul 26, 2011 8:44 pm

From my experience, the biggest negative against the Reporting table approach is the exponential duplication of code it produces. Consider boxesandlines scenario of 500 "reports". The code for many of the business rules (expressed as joins, lookups, calculated columns etc) would be repeated in literally hundreds of different sections of code. Surely no-one with any IT training or experience can consider this a good solution?

Even with the best of efforts and resources, it is impossible to maintain consistency across such a code base - consider that each rule implementation for each specific report will be in a subtly different context, to meet the requirements of each report. Duplicated inconsistent code then leads to inconsistent results, and implies huge code maintenance and testing overheads for any changes.

The dimensional datamart optimises this challenge by implementing each business rule in a single piece of code and materialising the result in a single piece of data, which can then be easily queried by report authors and testers.

Hang I think your best counter-attack strategy would be to insist on a peer code review of the code produced so far for the supposedly successful Reporting table approach. Odds are you will be able to point to many duplicated business rules, a very high "Lines of code" count and maybe even some inconsistencies creeping in already.

Good luck (and keep the faith)!
Mike
avatar
Mike Honey

Posts : 185
Join date : 2010-08-04
Location : Melbourne, Australia

View user profile http://www.mangasolutions.com

Back to top Go down

Re: Reporting table data repository vs. Dimensional data store

Post  hang on Tue Jul 26, 2011 8:45 pm

ngalemmo wrote:Sure. Its quick an easy. Oh, and you rarely change a report file... if you need new data you just create a new one. Let's see how its going in 3-4 years...
Sounds like quick and dirty. But anyway, itís out of my control. B&L, I see your point, as now we started this simplistic approach, our SSRS developers can take over the tasks and stuff data together without any sense of data modelling.


Last edited by hang on Thu Aug 04, 2011 12:34 am; edited 1 time in total

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Reporting table data repository vs. Dimensional data store

Post  hang on Tue Jul 26, 2011 9:14 pm

Mike Honey wrote:Hang I think your best counter-attack strategy would be to insist on a peer code review of the code produced so far for the supposedly successful Reporting table approach. Odds are you will be able to point to many duplicated business rules, a very high "Lines of code" count and maybe even some inconsistencies creeping in already.
Mike, thanks for the advice and it seems to be an effective one. You are right, there are heaps of ugly copy-paste duplications all over the place.


Last edited by hang on Thu Aug 04, 2011 12:35 am; edited 1 time in total

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Reporting table data repository vs. Dimensional data store

Post  John Simon on Tue Jul 26, 2011 10:11 pm

Here's a list of issues:
  • Performance
  • Little Re-use of existing calculations, leading to:
  • Increased development time
  • Highly prone to bugs in business rules since they are re-done for each table
  • High risk of ending up with hundreds or thousands of reports and table leading to maintenance and support nightmares


I cannot think of a single good reason to go down that path.

Please let us know how it goes.

John

John Simon

Posts : 186
Join date : 2009-05-29
Location : Melbourne, Australia

View user profile http://jsimonbi.wordpress.com

Back to top Go down

Re: Reporting table data repository vs. Dimensional data store

Post  hang on Wed Jul 27, 2011 7:35 am

Good points John, I will keep you updated.


Last edited by hang on Thu Aug 04, 2011 12:37 am; edited 1 time in total

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Reporting table data repository vs. Dimensional data store

Post  ngalemmo on Wed Jul 27, 2011 10:27 am

hang wrote:
ngalemmo wrote:Sure. Its quick an easy. Oh, and you rarely change a report file... if you need new data you just create a new one. Let's see how its going in 3-4 years...
Sounds like quick and dirty. Itís a shame that I have to wait that long to see its failure. However as saying goes, if itís doomed why doing it in first place. But anyway, itís out of my control. B&L, I see your point, as now we started this simplistic approach, our SSRS developers can take over the tasks and stuff data together without any sense of data modelling.

The real problem is it is kind of like crack. Really addictive and you lose sight of the real cost of doing it. Outside observers will see a disjoint mess while those on the inside wil think they are humming along just fine. People will still be running reports off files that were obsolete years ago because they don't know there was a replacement. IT will still be producing those files because 'the business still uses it' and nobody has the time to check to see what that file really is nor does anyone have time to convert those reports to the new file. Hardware will be upgraded because there isn't enough time or space to generate all the files on the old machine (but that's normal isn't it?).

Eventually, an intervention is needed.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Reporting table data repository vs. Dimensional data store

Post  BoxesAndLines on Wed Jul 27, 2011 4:44 pm

ngalemmo wrote:...Eventually, an intervention is needed.
Since Hang is in Australia, I volunteer.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Reporting table data repository vs. Dimensional data store

Post  hang on Wed Jul 27, 2011 9:45 pm

BoxesAndLines wrote:
ngalemmo wrote:...Eventually, an intervention is needed.
Since Hang is in Australia, I volunteer.
B&L, I appreciate your offer, and will give you a call when the number of reporting tables have reached a few hundreds.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Reporting table data repository vs. Dimensional data store

Post  John Simon on Wed Jul 27, 2011 11:00 pm

As a consultant I've been asked in the past to put forward cost estimates for migrating 1000 reports, and more recently 2,500 reports.

Ridiculous!

John Simon

Posts : 186
Join date : 2009-05-29
Location : Melbourne, Australia

View user profile http://jsimonbi.wordpress.com

Back to top Go down

Re: Reporting table data repository vs. Dimensional data store

Post  Ham09 on Thu Jul 28, 2011 5:24 am

Hi Hang,

If I was tackling this scenario I would document some concrete scenarios with real data applicable to your business and then highlight to pros and cons of each method and give a score card value on how important each aspect is to your business now and in the future. Don't forget to include the cost of developing and maintaining each approach (I'm thinking labour costs and available skill sets here), what client tools will need connect to data repository, will they need custom interfaces building (labour costs again), time spent developing reports with each approach. The Cost of having more than one version of the truth in your solution. The cost of supporting hardware and software as it scales.

The biggest thing that sticks out to me in your post is the 'OLAP Cubing is a nice to have at the moment' comment. Well if you go down the reporting table route only, the OLAP option goes out of the window along with power of the queries you can create in MDX versus the potentially slower and complex SQL equivalent. I would think this is a significant part of your scenario based evaluation above.

I know your question was purely a data architecture related, however this kind of decision has implications on the technologies you may want to use now and in the future which in turn impacts the bottom line of the business.

Simon

Ham09

Posts : 8
Join date : 2011-07-26
Location : United Kingdom

View user profile

Back to top Go down

Re: Reporting table data repository vs. Dimensional data store

Post  hang on Thu Jul 28, 2011 5:53 am

Ham09 wrote:Well if you go down the reporting table route only, the OLAP option goes out of the window along with power of the queries you can create in MDX versus the potentially slower and complex SQL equivalent. I would think this is a significant part of your scenario based evaluation above.
Simon, you hit the nail on the head! I really love your comments. That's almost what I told one of the important stakeholders. In my view, if the model is not cube friendly, then there must be some flaw in it. Obviously, reporting table approach will require a lot of work to be able to feed OLAP cube, if ever possible. I think it is the most powerful and convincing argument point. Thanks for further strengthening my confidence.


Last edited by hang on Thu Aug 04, 2011 12:39 am; edited 1 time in total

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Reporting table data repository vs. Dimensional data store

Post  hellovineet on Thu Jul 28, 2011 4:06 pm

Hang,

We had a same issue with one of a telecom client where in they have an architect who with lots of OLTP exp suggested to have the analytical reports in reports table approach.

In the beginning when the tables were small less than 300-400K, it worked fine, but later on as the table size increased, the performance issue started to creep in. This resulted in lots of business users shouting as now they had to wait for over a minute to get their reports. Instead of getting to nip this in the bud by adopting the dimensional approach, the management went with the another brilliant idea of same architect by introducing the aggregate report tables on top the reports table where ever require to increase performance...Now imagine the developer pain when he has to first introduce the logic of populating these aggregate tables with the base data and then changing each report to refer to them in all there were 60 reports.

Here comes the real problem, one fine day business users asked to introduce another aggregation levels (from cities to states to regions to country). Now it bowled everyone as the current architecture does not support this as the reports tables are static and changing them would result in corruption of the reports. thus forced the development team to create another set of 180 tables and reports( 60 * 3 new levels) with sql written to aggregate at region level. you can imagine the maintenance problem introduce in it, like changing a label for a type of report requires change in reports for each level.

So reporting table/reports bound the scalability of reporting environment which dimensional model provide.


hellovineet

Posts : 2
Join date : 2011-07-28

View user profile

Back to top Go down

Re: Reporting table data repository vs. Dimensional data store

Post  hang on Fri Jul 29, 2011 6:15 pm

hellovineet wrote:Here comes the real problem, one fine day business users asked to introduce another aggregation levels (from cities to states to regions to country).
A very likely same requirement change in our case, as the flexible location navigation is such a valuable part of our reporting environment.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Reporting table data repository vs. Dimensional data store

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum