Cascading Many to Many Relationships

BICorvette Sun Sep 14, 2014 11:10 pm

Hello,

I am currently modeling a scenario implicating several multi-valued attributes that are related to one another. Basically many, many-to-many dimensions that are related to one another.

Here is some background information regarding my dimensional model.

Business Process
I am modeling a periodic snapshot oriented automobile insurance related business process that will measure premium and claim dollars on a monthly basis.

Business Process Grain
The fact table I am modeling contains one record per insurance coverage (bodily injury, property damage etc…), per vehicle, per policy, valued on a monthly basis.

One policy can contain multiple vehicles, and one vehicle can contain multiple coverages. One vehicle can also have multiple drivers assigned to it. Drivers in turn can have multiple violations associated to them (tickets, traffic infractions etc..).

The various components of a policy can be mirrored in this manner:

Policy => Vehicle => Coverages (coverage is the most atomic level in which facts are measured)
Vehicle => Drivers => Violations/Infractions

The fact table measures premium and loss dollars at the most atomic level which is at the coverage level of granularity.

Business Problem
Now, one requirement that is being presented is business users wanting to calculate loss ratios based on driver characteristics. For example, they would like to calculate a loss ratio for all vehicles having a single female driver, having a speeding violation on file.

So far, I have created a “[Driver]” dimension and established a many-to-many relationship between the fact table and the [Driver] dimension using a group and bridge table. The [Driver] dimension is considered multi-valued being that one vehicle can contain one or more drivers.

I unfortunately cannot include the driver within the grain of the fact table as the operational source system does not allocate premium dollars to individual drivers. If I were to lower the grain of the fact table in order to include all drivers on the policy, I would duplicate measured facts as facts are not measured at the driver level of granularity.

There is also no “allocation” or “weight” metric I can apply to allocate premium dollars across multiple drivers on a vehicle, as again, no numerical facts are directly associated to drivers.

Hence, drivers on a policy are more viewed as “rating characteristics” of a given vehicle, and would be used by users to "constrain" or "filter" numerical facts. Users would be aware of the potential double and triple counting of measures if users attempt to "slice" and "dice" numerical measurements by drivers on a vehicle.

Up until now, my fact/dimension relationships between the fact table and [Driver] table would be modeled as such:

[Fact Table] <= [Driver Group Table] => [Driver Bridge] <= [Driver Dimension]

The many-to-many relationships above work great as users can utilize the multi-valued [Driver] dimension to answer questions such as:

“How many premium dollars have we earned this year for vehicles having a female driver aged 25, that also had a speeding ticket?”

Now, as previously mentioned above, one driver on a policy can have multiple violations/infractions on file.

Due to the many-to-many relationship already established above between the [Driver] dimension and fact table, I am questioning the feasibility of relating the [Violation] dimension as a multi-valued dimension from the [Driver] dimension.

This would result in the following “chain” of modeled relationships/tables:

[Fact Table] <= [Driver Group Table] => [Driver Bridge] <= [Driver Dimension] <= [Violation Group Table] => [Violation Bridge] <= [Violation Dimension].

I would essentially be relating 2 multi-valued dimensions to each other using 2 bridge tables. This would allow users to drag and drop the attributes within both dimensions into filters and constrain aggregate measures by the selected attributes.

Have you ever seen this scenario in which 2 multi-valued dimensions are related to one another in this manner?
Is this modeling scenario valid and the correct approach for my unique scenario?

I am worried that this would cause major performance issues as we will be using SQL Server Analysis Services to present this data to users.

However, as mentioned above, I cannot change the grain of the fact table and include drivers as a single valued attribute as premium dollars are not allocated to individual drivers.

I thought about using a factless fact table to “track” all vehicles, drivers, and their associated violations/infractions, but users will still need to have access to ALL rating characteristics on a policy to generate the desired loss ratios, but still filter the loss ratio by driver and their associated violation/infraction characteristics.

Users will be using BI tools to access this data and so they will need to access all data from a single fact table for ease of use.

Are there any other options for modeling such a scenario, or am I stuck with having to implement this “cascading” multi-valued dimension relationship?

I greatly appreciate your thoughts on this.

Thanks!

ngalemmo Mon Sep 15, 2014 2:22 am

It would help if you can place the policy as a dimension on the violation fact. I also don't see where you have claims for the loss side of things (assuming the coverage fact just has premiums).

But, the query to figure out loss ratio is a multi-fact query, which does not require bridge tables. Any join across fact tables are many to many, therefore you aggregate the facts across common dimensions, creating 1:1 relationships and join across the aggregates.

BICorvette Mon Sep 15, 2014 4:10 am

Thanks for the quick reply.

The fact table in question is a periodic snapshot based fact table where each policy down to the coverage level is measured on a monthly basis. During this monthly valuation, I calculate the premium earned and the amount incurred on the claims side and place these 2 aggregates as measures all within the same row within the fact table.

The loss ratio can then be computed using a calculated measure via a BI tool as the 2 aggregates encompassing the loss ratio calculation are stored within the fact table all within the same row.

I do have a Policy dimension that is associated to the fact table which contains the core, base attributes of a policy. I also have many other conformed dimensions associated to the fact table such as vehicle, and policy coverage.

With this said, I have no issues computing loss ratios as I am able to slice and dice this metric by a rich set of attributes available within the aforementioned dimensions.

My issue is the cascading multi-valued dimensions that are related to each other.

My [Driver] dimension establishes a M2M relationship with the fact table via group and bridge tables as I previously outlined.

The problem is that violations/infractions are also associated to individual drivers on the policy. I need to associate my [Violation] dimension to the multi-valued [Driver] dimension.

Being that the [Driver] dimension is already considered a multi-valued dimension off of the fact table, I would think I would be forced to model an additional multi-valued dimension (Violation) off of the [Driver] dimension.

Using group and bridge tables, I would think this would result in the following table relationships:

[Fact Table] <= [Driver Group Table] => [Driver Bridge] <= [Driver Dimension] <= [Violation Group Table] => [Violation Bridge] <= [Violation Dimension]

As shown above, one fact record could be associated to one or more [Driver] entities, and one or more [Driver] entities could be associated to one or more [Violation] entities.

Is my approach to modeling the association of violations to drivers above correct?

As I previously outlined, I am not sure I have any other choices/methods as my fact table is currently configured optimally to allow the aggregation of facts based on my other core dimensions. It is just the multi-valued nature of my drivers and their associated violations that has me questioning if there is any other way to model this.

Again, if I was able to lower the grain of the fact table to store the associated drivers there I would have no issues. I unfortunately cannot do this as no premium is allocated to the drivers on an individual basis, hence the need for the [Driver] dimension using a M2M relationship off of the fact table.

I greatly appreciate your thoughts on this.

Thanks!

nick_white Wed Sep 17, 2014 4:04 pm

Hi - this is obviously quite a complex scenario so please take the following as thoughts/comments rather than necessarily a solution. If I have understood your description correctly I think fundamentally this is a business issue rather than a technical issue.
You make the following 2 statements in your original post:
"they would like to calculate a loss ratio for all vehicles having a single female driver, having a speeding violation on file."
"the operational source system does not allocate premium dollars to individual drivers"

The first issue I see is that presumably the "loss ratio" is a calculation based on premiums paid in and claims paid out? Suppose a policy covers 2 single female drivers, both having a speeding violation on file and both having a claim in the same month; how does the business expect this to be handled? If you treat each claim individually then aren't you double counting the premium in your loss ratio?

The second issue may be that you are trying to make one fact table do too much work. I would create a (possibly factless) fact table that holds information about drivers, their characteristics and their violations and possibly another fact table that holds the relationships between policies, vehicle and drivers. Then to answer the query "calculate a loss ratio for all vehicles having a single female driver, having a speeding violation on file" you would query the table to get a list of all "single female drivers, having a speeding violation on file", use this result set to filter a query on the second table to get a distinct list of policies, vehicles, etc. and then use this result set to finally filter the query against your snapshot table that holds the loss ratio.

To summarise, I find it much easier to model complex relationships in fact tables and then use conformed dimensions to filter multiple queries - rather then trying to do this with multiple bridge tables and groups where you constantly have issues with measures being doubled counter and generally assigned incorrectly.

Hope this helps a bit?

Regards,

Sponsored content

Cascading Many to Many Relationships

Cascading Many to Many Relationships

Re: Cascading Many to Many Relationships

Re: Cascading Many to Many Relationships

Re: Cascading Many to Many Relationships

Re: Cascading Many to Many Relationships