Handling Routine One-To-Many Relationships in the Dimensional Model

View previous topic View next topic Go down

Handling Routine One-To-Many Relationships in the Dimensional Model

Post  vbcastora on Wed Feb 18, 2009 12:44 pm

The more I am exposed to dimensional modeling, the more I understand it is as much an art as it is a science. In addition, my experience in IT has taught me that there a number of ways to skin the proverbial cat and that the same problem may be solved a number of different and valid ways. Lately, I have been struggling with your run-of-the-mill one-to-many relationships (based in the OLTP system) that need to be transformed into a dimensional model. Specifically, the business users are looking to slice data by things such as a persons degrees or qualifications (MD, PhD, JD, etc.) and possibly specialty (or specialties). In the OLTP, this is your standard Person->PersonDegree<-Degree, kind of thing, and I am not sure if there some sort of general pattern for moving this and other data relationships of this type into the dimensional world.

In addition, the DW team has been specifically asked to not "limit" the relationship. In other words, we can not simply have three hard degree attributes in the person dimension - so it truly is a one-to-many that needs to be considered. Being that it is a multivalued dimension related more to a dimension (person), than a fact (a payment, for example), I am not sure this is a candidate to push a degree group key down to the fact table via a bridge table - although I don't know. Perhaps this is an acceptable use case for snowflaking. Any guidance or opinions?

Thanks.

vbcastora

Posts: 2
Join date: 2009-02-18

View user profile

Back to top Go down

Re: Handling Routine One-To-Many Relationships in the Dimensional Model

Post  mugen_kanosei on Thu Feb 19, 2009 3:37 am

This sounds very much like the Keyword outrigger described in The Data Warehouse Toolkit (Second Edition), Chapter 8, page 194. There it describes two ways to handle this situation. The first involves each keyword being a separate row in the outrigger with a skill primary key, and skill group primary key. The problem listed with this approch though was having to use UNION or INTERSECT depending on if your doing an AND/OR for a persons skill sets. The second method involved rolling up the skill group into one line with the skills separated by a character, ex: \PHD\GED\BA\ and doing a substring search with wildcards.

This article by Ralph Kimball also describes the keyword outrigger.

mugen_kanosei

Posts: 12
Join date: 2009-02-03
Age: 32
Location: Germany

View user profile

Back to top Go down

Re: Handling Routine One-To-Many Relationships in the Dimensional Model

Post  vbcastora on Thu Feb 19, 2009 6:02 am

Thanks for the reply and I understand the differences in each approach. While Ralph does explain usage of this data with SQL, how would an analytical tool, such as Analysis Services, handle this? If skills are concatenated into a single string, does this mean facts can only be sliced by a skill group and not by an individual skill? Given the example skill group, \PHD\GED\BA\, can an analysis tool answer a simpler question about one particular skill, such as how many payments were made to people with a BA?

vbcastora

Posts: 2
Join date: 2009-02-18

View user profile

Back to top Go down

Re: Handling Routine One-To-Many Relationships in the Dimensional Model

Post  BoxesAndLines on Fri Feb 20, 2009 10:35 am

You have to do this just as you would in an OLTP environment. In the OLTP worlkd we call them associative entities. Ralph doesn't want anything related to traditional data modeling in his nomenclature. I've heard them called bridge tables, groupers, outriggers, in the dimensional world. Just store the keys and not the describing attributes.

BoxesAndLines

Posts: 1077
Join date: 2009-02-03
Location: USA

View user profile

Back to top Go down

Re: Handling Routine One-To-Many Relationships in the Dimensional Model

Post  BrianJarrett on Thu Feb 26, 2009 7:34 am

This is a tough one. It seems there could be a few approaches here.

This doesn't appear to be a candidate for an outrigger, at least not a dimensional low-cardinality outrigger. Typically that would be a one-to-one relationship between the dimension and the outrigger table. What you have here, as I understand it, is a one-to-many relationship, where one person might have multiple degrees.

To fit this more into a traditional model I'd probably recommend pushing it down to the fact table and adding in a bridge to handle the many to many. We've already spent some time discussing this here on the board. Even though the number of degrees a person might not directly correlate with your fact record, the number of degrees that a person had at the time the fact record was created is still true and can be represented with the bridge table.

Here's another thought. This breaks some of the rules but we actually did this at my previous company. We had a household dimension that represented a single record for a customer's household. These households had zero or more dependents (children). Our design had a one-to-many relationship between the household dimension and a dependent table. It wasn't really a true outrigger, due to the cardinality of the dependent table. Now there are obvious problems, such as duplicate records when returning all the household's dependents and we were forced to do outer joins to the table to return households with no dependents. In order to return unique records we would normally return the youngest dependent but in the case of twins-or when we wanted all the dependents-duplicates were permissible. Also since we rarely tied these dependents to a fact record the duplicates were a non-issue. Most often these two tables were used to generate mailing lists and the criteria for a mailing list extraction didn't care about fact records.

I don't know that this design is applicable for you. At a minimum it could use some improvement (such as assigning household with no dependents a -1 "no dependent" key) but it's another approach that we actually used successfully, despite some of the design issues. I'm not even sure I'd recommend it; I'd probably use it as a last-ditch solution.

So to summarize, this doesn't seem to be a candidate for a traditional outrigger due to cardinality. The cleanest way I think would be to join to the fact table with a bridge (including a divisor). This third solution is a bit unconventional but I thought I'd toss it out there in case it might help.

Hopefully this helps a bit. Let me know if not.

BrianJarrett

Posts: 61
Join date: 2009-02-03
Age: 40
Location: St. Louis, MO

View user profile

Back to top Go down

Re: Handling Routine One-To-Many Relationships in the Dimensional Model

Post  BrianJarrett on Thu Feb 26, 2009 7:47 am

I guess you could also think about a "helper" bridge table between the two dimensions as well. There's more info on that here: http://www.intelligententerprise.com/010810/412warehouse1_1.jhtml

I think BoxesAndLines might have also referenced this approach in an earlier post.

BrianJarrett

Posts: 61
Join date: 2009-02-03
Age: 40
Location: St. Louis, MO

View user profile

Back to top Go down

Re: Handling Routine One-To-Many Relationships in the Dimensional Model

Post  BoxesAndLines on Thu Feb 26, 2009 6:19 pm

One other option I used for at a non-profit that dealt with lots of doctors is to just store the degrees in a text field. This will only work if you do not need to filter on degree. We did not so this was a viable solution. Additionally, doctors are very paticular on the degree order they put on the signature line. It's usually not enough to just list the degrees. You need to list them in the order that you received them.

BoxesAndLines

Posts: 1077
Join date: 2009-02-03
Location: USA

View user profile

Back to top Go down

View previous topic View next topic Back to top

- Similar topics

Permissions in this forum:
You cannot reply to topics in this forum