In which layer to relate facts to other facts?

Al Wood Tue Jun 25, 2013 7:12 am

Hi,

I have set up a data feed of events from a Radiology system, which is separate from our main hospital system. I am being asked to load this as fact tables into the Data Warehouse. But before that, I am being asked to create a layer of data, in the staging server, in which the Radiology events are related to Episodes in the main hospital system. We have a good patient identifier across the systems, but nothing to link the events in the two systems where patients have several events in each. This will need to involve some sort of closest-match-in-time logic, with some rather arbitrary rules governing the matching, based on the date difference and Inpatient/Outpatient and other flags.

This design seems wrong to me. Should we match events from different fact tables using the attributes of the date dimension?
The finance people want to match each Radiology event directly to a single Episode of care, even if it falls outside the date range of that episode.
I think that if the staging area contains this matching result, they will want to see that come through into the fact tables as a hard link.

Is my feeling right, that this is not the best way to do it?
Can a Data Warehouse perform a rules-based fuzzy date matching, and how?

Many thanks,
Al Wood

ngalemmo Tue Jun 25, 2013 3:15 pm

Quick recap: You have an existing data warehouse covering patient events and are introducing new facts relating to radiology. You need to associate radiology events to instances of care based on existing facts and a host of business rules.

I would develop a bridge table that identifies an instance of care based on the existing patient events. This can be done without affecting existing facts. Such a table would include the patient identifier, diagnosis group (or maybe primary diagnosis), and a date range that defines the care period. Such a table is inherently unstable as new patient events may occur that expand the care period, so a process needs to be in place to update the end date of the period as new events are added. However, it is a relatively small table.

A fundamental challenge in defining an instance of care involves complex health situations. A patient may undergo a series of events relating to different diagnosis which, depending on your business rules, may be reflected as different instances of care with overlapping timeframes. Which is why diagnosis in an important discriminator in the bridge. To be useful, the radiology data should carry the diagnosis for which the tests are being performed. If this is not available, you could build a bridge without it, but the defined instance of care would be less accurate.

I would load radiology as an independent fact table. With the bridge you can then combine different facts at query time.

Due to the complexitiy and instability of defining an instance of care, I would not 'hard link' facts. It is far simpler to regenerate the bridge as rules change rather than rekey existing fact tables.

Al Wood Thu Jun 27, 2013 9:18 am

Hi,

Thanks for replying. I'm sorry, I should have explained more fully. Our situation is a little bit different.

Our main hospital system creates the episode data, with from and to dates, and finance want us to map radiology events to episodes. Unfortunately the episode data is not always very accurate, and for example a radiology scan may happen a few days after an episode is closed - but the episode is not updated. Then another episode might begin a few days after that, so it's unclear which episode the scan belongs to. Even worse, episode dates, or scan dates, may change long after the events have taken place.
Another difference is that we never have overlapping episodes. We specialize in cancer treatment, and this is a policy based on the fact that it's very rare that we treat two conditions in isolation.
Further, even if we add a broad diagnosis group to an episode, it may change at a late stage in the treatment. Standard diagnosis codes do not always fit the patients as our doctors see it. We have a complex diagnosis and noting system, which is not included in this scope.

How should we relate an event fact to an episode fact? Is a bridging table still appropriate?

Many thanks,
Al Wood

ngalemmo Thu Jun 27, 2013 12:54 pm

I would use the episode data the other system provides as a bridge. Since you specialized in Oncology, diagnosis isn't as significant.

The challenge would be what to do with outliers. This is a question for the business. If a radiology event occurs between two episodes, is it considered part of the earlier episode? If that is the case, I would add an new 'extended' end time to the episode based on the start time of the next episode and use that for associations.

Al Wood Fri Jun 28, 2013 10:02 am

Thanks again,

I was hoping to create an Episode fact table, which would have Start and End dates. Would this duplicate the bridging table? Unless I put the extended Start and End dates in the bridging table along with the Event ID and Episode ID?

The finance people are keen to have what we call "date based fuzzy matching", that they use in their costing system. It produces a hard link in the activity tables themselves, but it also fills extra columns which show how close in time the matching is. I would like to be able to recreate something like this in fact and dimension tables or cubes, so I didn't have to do it in the staging layer. Perhaps the bridging table could hold these extra columns which show the quality of the match?

Does this seem like a sensible design?

Many thanks,
Al Wood

ngalemmo Fri Jun 28, 2013 1:01 pm

Yes, it makes sense. As you describe it, it is closer to a fact table than a bridge, but they basically serve the same purpose. Call it what you want.

Sponsored content

In which layer to relate facts to other facts?

In which layer to relate facts to other facts?

Re: In which layer to relate facts to other facts?

Re: In which layer to relate facts to other facts?

Re: In which layer to relate facts to other facts?

Re: In which layer to relate facts to other facts?

Re: In which layer to relate facts to other facts?

Re: In which layer to relate facts to other facts?