Many to many relationship question

ekv11 Tue Oct 27, 2009 6:43 pm

Hi all. I know the common standard practice for many to many relationships is to use a bridge table, and I've done that already in my model. But I've come across a situation where I'm thinking of using a group table related directly with the fact. Let's use the example of an insurance policy for a trucking company. For each policy the company can haul multiple commodities. These commodities are free form entered into the system, along with a percentage of the total time that particular commodity is hauled. The group of commodities the company hauls will add up to 100 percent. Let's say there are 5 fact tables that will use this for analysis. Typically we would create a bridge table something like this:

Many to many relationship question Model210

Many to many relationship question Model210

This would require that we create a separate bridge table to each fact table we want to be able to analyze commodities with. Instead of that, how about creating a commodities group table something like this:

Many to many relationship question Model111

Many to many relationship question Model111

Between all of the typos and different spellings, we're not saving much by putting the commodity name into it's own dimension anyway. This way we won't have to create the bridge table for each fact table. We can still relate the group of commodities to the policy they belong to through the fact. We obviously won't be able to enforce referential integrity through a physical FK in the database since there is no FK, we'll have to enforce it in the ETL. My question is, has anyone tried something like this before, and if so, how did it work out? Thanks.

mlflakus Tue Oct 27, 2009 9:52 pm

I am confused by what you are trying to accomplish with your 'grouping' strategy. If the data is so dirty, that you want to create a 'common' name for all the permutations of the commodity name, then you can do this with a helper / lookup table to match the various instantiations of a commodity name to the 'standard' dimension value.

BTW -- A more classic many to many relationship for your model would be: Commodity Dim to CommoditiesPolicyDim (aka Factless Fact table) to Policy Dim to Policy Fact.
See Marc Russo's paper, "The many-to-many revolution", at: http://sqlblog.com/blogs/marco_russo/default.aspx
You may also want to check out "Analysis Services Many-to-Many Dimensions: Query Performance Optimization Techniques": http://www.microsoft.com/downloads/details.aspx?familyid=3494E712-C90B-4A4E-AD45-01009C15C665&displaylang=en

However, if you are trying to use a grouping strategy to reduce the number of records in the bridge table due to having a large number of policies related to a set of fairly 'common' commodity groups / mixtures, the a 'grouping' strategy will help this, but you have the cardinality wrong for this type of solution. The bridge table would contain the Commodities Dim Key and the Commodiities Group key. This group key is then migrated to the inner Dim (in your case Policy Dim). This solution ususally uses a helper table that concatenates the set of commodities / mix together into a string with a group id. The commodity set / mix of each policy is then concatenated into a like string to identify the group to which it belongs. If no matching group exists, then one is created. This solution increases complexity and is only useful when it significantly reduces the number of records in the bridge table.

ekv11 Wed Oct 28, 2009 10:42 am

Thanks for the info. We are not using Analysis Services for our presentation layer, but I will check out the docs that you provided. A bit more additional info:

1. We would like to be able to analyze each fact record directly using the commodities, hence the desire to have the commodities relationship directly on the fact tables. We're trying to keep from attaching the commodities to the Policy dimension.
2. The Policy Fact is a factless fact. The grain is 1 record per policy. Keep in mind that the commodities will also be related to other facts with varying levels of grain.
3. The idea was to try to make presentation of the data as simple as possible -- users will be querying the data directly (through views), not using presentation such as cubes in Analysis Services. This was an idea of a way to eliminate a separate bridge table between each fact and the commodity dimension.

Your classic many-to-many relationship that you describe is what we were trying to avoid -- having a bridge table from the Policy Dimension to the Commodity Dimension. Your CommoditiesPolicyDim factless fact is in essence serving the same purpose as a bridge table would. Our Policy Fact factless fact table is what will be used for various policy counts, and is where the analysis of commodity needs to take place. It was just a thought that I was hoping someone may have tried. Anyone else ever tried this?

Thanks for the ideas mlflakus.

ngalemmo Wed Oct 28, 2009 12:04 pm

mlflakus is providing good advice. I am unable to see your diagrams (not sure why), but I don't understand your concern about the bridge.

First, you have a group dimension whose natural key is a delimited concatentation of the commodities specified for the policy. Every unique combination gets a unique group key. That goes on the fact table. That pretty much gets to what you are asking for, but how do you expect to do analysis on commodities? Substring scans? That is where the bridge comes in. You create a commodity dimension with the commodity as received (forced to either upper or lower case to reduce variance) as the natural key and have a bridge relating the commodity to the group. Having a commodity dimension allows you to clean things up. Have additional attributes in the dimension table to allow entry of a 'correct' commodity name and maybe category codes if needed. Something like this can be manually maintained on a periodic basis (you just need to fix new rows) and allow you to consitantly report commodity information. I did a similar thing with search keywords for a website's clickstream warehouse. Even with hundreds of thousands of visitors a day, keyword correction became trivial as people have a tendancy to misspell words the same way. Maintenace was done with a simple form in Access linked to the warehouse keyword dimension table.

mlflakus Wed Oct 28, 2009 12:49 pm

In your design, if you group / aggregate facts by commodity, you may count the same fact record more than once. If you then sum the values of the each grouping aggregation, it will most likely overstate the number of policies covering the set of commodities. SSAS 2005 or greater corrects for this. You will need to use another strategy. There are several Kimball Design tip articles on this design problem.

ekv11 Wed Oct 28, 2009 2:02 pm

I'm not sure I follow how a fact record can be double counted in this case. There is one record per commodity per group in the Commodities Group Dim. For example:

Commodities Group ID Commodity Pct Hauled
3 LIVESTOCK 25
3 STEEL PRODUCTS 25
3 PRODUCE 50
5 CARDBOARD 60
5 PAPER 40
...etc

The Commodities Group ID would be on the fact record. 1 group belongs to 1 policy and a commodity cannot be repeated on a given policy. Our example has a Policy Fact record with a grain of 1 row per policy. No matter how we search, I don't see how double counting would occur. Do you have an example of how that would happen?

mlflakus Wed Oct 28, 2009 3:02 pm

Let's say that 20 policies have CommodityGroupID = 3

Select A.Commodity, count(*) as PolicyCount
From CommodityDim A
Inner Join PolicyFact B
On A.CommodityGroupID = B.CommodityGroupID
Where CommodityGroupID = 3
Group by A.Commodity

Results:

Commodity PolicyCount
------------------- -----------
LIVESTOCK 20
STEEL PRODUCTS 20
PRODUCE 20

If you total the result, then you will get 60, not 20, but the true distinct policy count is 20.
This is only an issue if the results are presented in a tool that subtotals or totals the column values.

I'll let you run through the excercise when 'LIVESTOCK' and 'PRODUCE' are in more than one group and you want to count the distinct number of policies where Commodity = 'LIVESTOCK' and 'PRODUCE' by commodity.

There are better examples in the Kimball design tips and Marc Russo's paper on this subject.

ekv11 Wed Oct 28, 2009 3:42 pm

Ah, I see what you're saying, but we won't be asking the question of "How many policies have commodity group 3?". We will be asking questions more like "How many policies haul PRODUCE?" And ultimately, "What are the total losses for policies that haul PRODUCE?" (that would obviously involve another fact table).

I think your follow-up would come down to the question of "How many policies haul both PRODUCE and LIVESTOCK?". That particular question can be answered like this:

SELECT COUNT(1)
FROM PolicyFact P
INNER JOIN
(SELECT CommoditiesGroupID, COUNT(1)
FROM CommoditiesDim
WHERE CommodityName = 'LIVESTOCK'
OR CommodityName = 'PRODUCE'
GROUP BY CommoditiesGroupID
HAVING COUNT(1) = 2) D
ON D.CommoditiesGroupID = P.CommoditiesGroupID

I believe that same strategy would need to be used with or without a bridge table for that particular question.

kapoor_dh Tue Dec 08, 2009 2:19 am

Your design mignt fail where you need to join the more than one record from the fact table to the dimension table based on the group id

Consider your own example where the dimension has the values like
3 LIVESTOCK 25
3 STEEL PRODUCTS 25
3 PRODUCE 50
5 CARDBOARD 60
5 PAPER 40
...etc

Policy Fact

Policy Id =P1
Dimension Group id = 3

Policy Id =P2
Dimension Group id = 3

Policy Id =P2
Dimension Group id = 5

Now if we need to join the fact table with the dimension table based on the group id = 3 to get any such kind of query answered,it will end up in the cartesian join fact X dimension = 2x6 rows

Sponsored content

Many to many relationship question

Many to many relationship question

Many to Many using a Grouping strategy

Re: Many to many relationship question

Re: Many to many relationship question

As designed, beware of counting the same fact record multiple times in totals

Re: Many to many relationship question

Re: Many to many relationship question

Re: Many to many relationship question

Re: Many to many relationship question

Re: Many to many relationship question