Handling Routine One-To-Many Relationships in the Dimensional Model

vbcastora Wed Feb 18, 2009 4:44 pm

The more I am exposed to dimensional modeling, the more I understand it is as much an art as it is a science. In addition, my experience in IT has taught me that there a number of ways to skin the proverbial cat and that the same problem may be solved a number of different and valid ways. Lately, I have been struggling with your run-of-the-mill one-to-many relationships (based in the OLTP system) that need to be transformed into a dimensional model. Specifically, the business users are looking to slice data by things such as a persons degrees or qualifications (MD, PhD, JD, etc.) and possibly specialty (or specialties). In the OLTP, this is your standard Person->PersonDegree<-Degree, kind of thing, and I am not sure if there some sort of general pattern for moving this and other data relationships of this type into the dimensional world.

In addition, the DW team has been specifically asked to not "limit" the relationship. In other words, we can not simply have three hard degree attributes in the person dimension - so it truly is a one-to-many that needs to be considered. Being that it is a multivalued dimension related more to a dimension (person), than a fact (a payment, for example), I am not sure this is a candidate to push a degree group key down to the fact table via a bridge table - although I don't know. Perhaps this is an acceptable use case for snowflaking. Any guidance or opinions?

Thanks.

mugen_kanosei Thu Feb 19, 2009 7:37 am

This sounds very much like the Keyword outrigger described in The Data Warehouse Toolkit (Second Edition), Chapter 8, page 194. There it describes two ways to handle this situation. The first involves each keyword being a separate row in the outrigger with a skill primary key, and skill group primary key. The problem listed with this approch though was having to use UNION or INTERSECT depending on if your doing an AND/OR for a persons skill sets. The second method involved rolling up the skill group into one line with the skills separated by a character, ex: \PHD\GED\BA\ and doing a substring search with wildcards.

This article by Ralph Kimball also describes the keyword outrigger.

vbcastora Thu Feb 19, 2009 10:02 am

Thanks for the reply and I understand the differences in each approach. While Ralph does explain usage of this data with SQL, how would an analytical tool, such as Analysis Services, handle this? If skills are concatenated into a single string, does this mean facts can only be sliced by a skill group and not by an individual skill? Given the example skill group, \PHD\GED\BA\, can an analysis tool answer a simpler question about one particular skill, such as how many payments were made to people with a BA?

BoxesAndLines Fri Feb 20, 2009 2:35 pm

You have to do this just as you would in an OLTP environment. In the OLTP worlkd we call them associative entities. Ralph doesn't want anything related to traditional data modeling in his nomenclature. I've heard them called bridge tables, groupers, outriggers, in the dimensional world. Just store the keys and not the describing attributes.

BrianJarrett Thu Feb 26, 2009 11:34 am

This is a tough one. It seems there could be a few approaches here.

This doesn't appear to be a candidate for an outrigger, at least not a dimensional low-cardinality outrigger. Typically that would be a one-to-one relationship between the dimension and the outrigger table. What you have here, as I understand it, is a one-to-many relationship, where one person might have multiple degrees.

To fit this more into a traditional model I'd probably recommend pushing it down to the fact table and adding in a bridge to handle the many to many. We've already spent some time discussing this here on the board. Even though the number of degrees a person might not directly correlate with your fact record, the number of degrees that a person had at the time the fact record was created is still true and can be represented with the bridge table.

Here's another thought. This breaks some of the rules but we actually did this at my previous company. We had a household dimension that represented a single record for a customer's household. These households had zero or more dependents (children). Our design had a one-to-many relationship between the household dimension and a dependent table. It wasn't really a true outrigger, due to the cardinality of the dependent table. Now there are obvious problems, such as duplicate records when returning all the household's dependents and we were forced to do outer joins to the table to return households with no dependents. In order to return unique records we would normally return the youngest dependent but in the case of twins-or when we wanted all the dependents-duplicates were permissible. Also since we rarely tied these dependents to a fact record the duplicates were a non-issue. Most often these two tables were used to generate mailing lists and the criteria for a mailing list extraction didn't care about fact records.

I don't know that this design is applicable for you. At a minimum it could use some improvement (such as assigning household with no dependents a -1 "no dependent" key) but it's another approach that we actually used successfully, despite some of the design issues. I'm not even sure I'd recommend it; I'd probably use it as a last-ditch solution.

So to summarize, this doesn't seem to be a candidate for a traditional outrigger due to cardinality. The cleanest way I think would be to join to the fact table with a bridge (including a divisor). This third solution is a bit unconventional but I thought I'd toss it out there in case it might help.

Hopefully this helps a bit. Let me know if not.

BrianJarrett Thu Feb 26, 2009 11:47 am

I guess you could also think about a "helper" bridge table between the two dimensions as well. There's more info on that here: http://www.intelligententerprise.com/010810/412warehouse1_1.jhtml

I think BoxesAndLines might have also referenced this approach in an earlier post.

BoxesAndLines Thu Feb 26, 2009 10:19 pm

One other option I used for at a non-profit that dealt with lots of doctors is to just store the degrees in a text field. This will only work if you do not need to filter on degree. We did not so this was a viable solution. Additionally, doctors are very paticular on the degree order they put on the signature line. It's usually not enough to just list the degrees. You need to list them in the order that you received them.

Sponsored content

Handling Routine One-To-Many Relationships in the Dimensional Model

Handling Routine One-To-Many Relationships in the Dimensional Model

Re: Handling Routine One-To-Many Relationships in the Dimensional Model

Re: Handling Routine One-To-Many Relationships in the Dimensional Model

Re: Handling Routine One-To-Many Relationships in the Dimensional Model

Re: Handling Routine One-To-Many Relationships in the Dimensional Model

Re: Handling Routine One-To-Many Relationships in the Dimensional Model

Re: Handling Routine One-To-Many Relationships in the Dimensional Model

Re: Handling Routine One-To-Many Relationships in the Dimensional Model