Textual fact or Junk dimension?

View previous topic View next topic Go down

Textual fact or Junk dimension?

Post  grgs27 on Fri Aug 27, 2010 6:07 am

Hello,

I have been following the forum for a while now and I must say that many doubts about dimensional modeling have gone thank to you all.
However I am facing now the modeling of a cube and I am somewhat stuck on what is the best implementation.

The cube is rather simple:

Dimension
- Commodity
- Country
- Time
...

Measure
- Production
- Export
- Import
- Re export
...

with each measure I have a non-numeric "attribute" which defines the quality of the measure (for each cell): estimated, official, unofficial, derived and so forth.

Based on your experience would it be better

- to store them as a non-numeric facts along with the corresponding measure.
or
- to store them as a junk dimension (as suggested order management in page 117)

considering also that it is not expressed as requirement (so far but most likely will be) the need to "aggregate" the indicator with a custom algorithm (example: if the summed production of some commodity in EMEA has some unofficial figure the indicator should show "unofficial" in the total).

Thanks in advance.

grgs27

Posts : 3
Join date : 2010-08-27

View user profile

Back to top Go down

Re: Textual fact or Junk dimension?

Post  BoxesAndLines on Fri Aug 27, 2010 9:41 am

It sounds like a regular dimension to me. What makes it a junk dimension?
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Textual fact or Junk dimension?

Post  grgs27 on Fri Aug 27, 2010 10:52 am

BoxesAndLines wrote:It sounds like a regular dimension to me. What makes it a junk dimension?

Because it is related strictly to a fact...

The production (measure) of butter (commodity) in Germany (country) 2007 (time) was 10000 tons.

Now this is the fact and its attribute can be: this figure is derived, official, unofficial.
I don't have three possible figures for that but only one with a "quality indicator" of that fact.

In one fact-table I have up to 14 measure, with the dimension approach I would have 14 more dimension pointing at a dimension table of 5 rows.
It doesn't make sense to me...





grgs27

Posts : 3
Join date : 2010-08-27

View user profile

Back to top Go down

Re: Textual fact or Junk dimension?

Post  LAndrews on Fri Aug 27, 2010 12:18 pm

The grain of your fact should determine how to solve this.

For example, if the grain of the row states that all measures are of the same quality, then you have a single "quality" dimension with your 5 values in it.

Alternately, if the grain is independant of quality, then each measure needs to be denormalized by quality (e.g. Derived_Production, Official_Production etc).

This is similar to the budgeting models,where you can have multiple scenarios (Actual, Plan, Forecast etc).

Hope this helps.

LAndrews

Posts : 132
Join date : 2010-05-13
Location : British Columbia, Canada

View user profile

Back to top Go down

Re: Textual fact or Junk dimension?

Post  VHF on Fri Aug 27, 2010 7:41 pm

If I understand correctly, the OP wants to store a data quality metric for each of (up to) 14 measures in the fact table--essentialy a piece of metadata about each measure. So are these data quality metrics (a.) non-numeric measures or are they (b.) a dimension attribute?

I think a key question--touched on in the original post--is how these values will be handled when data is aggregated. If the need is to slice/dice/group data by the quality measurements, then they are (b.) a dimension attribute.

However, the original post mentioned a custom aggregation. That leads me to belive that these values are (a). non-numeric measures in themselves. To keep the fact table as narrow as possible AND to allow the aggregation function to work with numeric rather than string values, I would store a numeric value rather than a textual description. Something like 10=estimated, 20=derived, 30=uofficial, 40=official. (You could of course have a lookup table to tie the numeric code to a textual description for reporting purposes.)

Measure1
Measure1DataQuality
Measure2
Measure2DataQuality
...

The main trick is to make sure your data quality measure isn't aggrated inappropriatley--after all, a SUM of discreate data quality values doesn't tell you much. If your data quality was on a scale (such as 1 to 10) or a percentage than AVG might be useful. Assuming the numeric values are in ascending order of data quality, you should be able to use MIN to tell you the lowest quality piece of data included in your aggreated results. Otherwise, you may very well be looking at that custom aggregation.

Hope this helps with thinking through the process--and I hope your environment has good support for custom aggregations!

(I just tried to create a custom aggreation in SQL Server 2008, imagine my surprise that there was no "New" option for Aggregate Functions! I can create new Table-valued Functions, and I can create new Scalar-valued Functions, but not Aggregate functions. Is this an Enterprise Edition-only feature? No, it turns out Aggregate Functions can only be created using a CLR function written in C# or VB .NET!)


VHF

Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US

View user profile

Back to top Go down

Re: Textual fact or Junk dimension?

Post  hang on Fri Aug 27, 2010 11:13 pm

BoxesAndLines wrote:It sounds like a regular dimension to me. What makes it a junk dimension?
BoxesAndLines makes the point. The junk dimension is not the only way to reduce the number of dimensions, or measures for that matter. It is the proper dimensional modeling in general that makes the structure sound.

If you don't have too many measures, then LAndrews' alternative approach is the simplest and yet elegant way to model it, storing all the measures side by side.

If you are really concerned about the excessive number of measures, you make look a couple of pages further (p119) in Kimball's book where the Multiple Currencies scenario is discussed. You may not need a fact table like currency conversion fact, but you do need to store some deriving logic (say rate) in a dimension table. If the derived values are purely arbitrary, you might have to go back to LAddrews' approach. It is possible, but not recommended, to reduce the number of columns in fact table by partitioning your fact table vertically and join them by dimension conformance.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Textual fact or Junk dimension?

Post  grgs27 on Tue Aug 31, 2010 11:03 am

VHF wrote:

[...]

Hope this helps with thinking through the process--and I hope your environment has good support for custom aggregations!

(I just tried to create a custom aggreation in SQL Server 2008, imagine my surprise that there was no "New" option for Aggregate Functions! I can create new Table-valued Functions, and I can create new Scalar-valued Functions, but not Aggregate functions. Is this an Enterprise Edition-only feature? No, it turns out Aggregate Functions can only be created using a CLR function written in C# or VB .NET!)


Thank you VHF

we are going to "develop" our environment ourselves extending mondrian and few other things from the pentaho suite.
So for the support is not a big deal... and Postgres has aggregation support.

g


grgs27

Posts : 3
Join date : 2010-08-27

View user profile

Back to top Go down

Re: Textual fact or Junk dimension?

Post  Jeff Smith on Tue Aug 31, 2010 3:14 pm

I think the original post has the answer. The field in question is referred to as an "attribute". That makes it a dimension.

Jeff Smith

Posts : 471
Join date : 2009-02-03

View user profile

Back to top Go down

Re: Textual fact or Junk dimension?

Post  VHF on Wed Sep 01, 2010 9:16 am

Ah, but is an "attribute" always an attribute?!? :-)

The answer to the question I posed--are these data quality metrics (a.) non-numeric measures or are they (b.) a dimension attribute?--might be (c.) both, in which case it would be appropriate to model them both ways, as a dimension attribute for filtering/grouping based on data quality AND as a measure in order to aggregate a measure of data quality.

VHF

Posts : 236
Join date : 2009-04-28
Location : Wisconsin, US

View user profile

Back to top Go down

Re: Textual fact or Junk dimension?

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum