How to model a financial security dimension where we may find out members are actually the same after the fact

View previous topic View next topic Go down

How to model a financial security dimension where we may find out members are actually the same after the fact

Post  Benjamin Pierce on Thu Jan 30, 2014 3:05 pm

Hello,

I'm currently building out a financial services dimension to model securities. This dimension will be populated from different source systems. Unfortunately securities are uniquely identified differently in each country/source system so there's not always a way to know right away that a particular security in 2 or more systems is really the same security.

For instance, say we have the following dimension model (abbreviated for simplicity):

DimSecurity
SecurityId INTEGER -- this is a system generated surrogate key and is the FK used by all fact tables.
ISIN VARCHAR(255) -- securities in some European systems are represented by these.
CUSIP VARCHAR(255) -- securities in North American systems are represented by these.
SEDOL VARCHAR(255) -- securities in one of our systems are represented by these.
Symbol VARCHAR(255) -- other systems just use the ticket symbol

On day 1 we might load ISIN "ABC" and Symbol "XYZ" as two distinct rows in this dimension. The next day we find out that ISIN "ABC" and Symbol "XYZ" are in fact the same security, so ideally we'd like to "collapse" this down to one row in the system, however we can't just delete one of the rows because it's already being used by the previous day's fact table records (unless we also go back and update the fact tables as well). To further complicate matters it's possible that the next day we find out that someone made a mistake in thinking that ISIN "ABC" and Symbol "XYZ" are the same, so now the system needs to split them back into two distinct records. Hopefully this example illustrates how it could quickly become messy.

I'm inclined to think that this dimension should be modelled as a SCD whereby every "collapse" or "expand" operation creates a new record version in the database, however that could still potentially leave fact table data pointing to stale data.

Has anyone had success modelling something similar to this? Is there a best practice for this particular scenario?

Any advice would be greatly appreciated!

Cheers,

Ben


Benjamin Pierce

Posts : 1
Join date : 2014-01-30

View user profile

Back to top Go down

Re: How to model a financial security dimension where we may find out members are actually the same after the fact

Post  ngalemmo on Fri Jan 31, 2014 12:31 am

Keep the rows and update all of them. It doesn't really matter if there is 1 row or 20 rows for the same security. Simply keep the attributes straight and no one will know the difference.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: How to model a financial security dimension where we may find out members are actually the same after the fact

Post  nick_white on Fri Jan 31, 2014 4:36 am

Hi,
it may be helpful if you think about what you are trying to report on and the facts that support this rather than the dimensions - if a dimension record is not linked to a fact record then it makes no difference if you split, combine or SCD it.
If you start with two security dim records and then realise you need to merge them into one does that mean you have multiple fact records that need to be effectively merged as well? If so then you're going to need to update the fact record pointing to the "losing" Dim record in the merge to flag it as invalid or delete it - or you could drop both old fact records and create one new one that point to the correct Dim record.
Similarly, if you split one Dim record in to two then does that mean you need to duplicate all the fact records that referenced the original Dim record?

It may also be helpful if you look at the concept of "durable supernatural keys": basically you create your own business key for each security and use this in your Security Dim (alongside your surrogate key) and then have a cross-reference table that maps the actual keys from each source system to your BK. If your reporting/business requirements can be met by updating this cross-reference table rather than your Dims/Facts then this may be the simplest solution

nick_white

Posts : 366
Join date : 2014-01-06
Location : London

View user profile

Back to top Go down

Re: How to model a financial security dimension where we may find out members are actually the same after the fact

Post  ngalemmo on Fri Jan 31, 2014 2:25 pm

As a general rule you want to avoid any solution that involves updating the fact table.

The dimension table is orders of magnitude smaller and much easier to manage. It is also important to note that the reason why surrogate keys are used is to mitigate issues such as this. If your fact table used natural keys to join to the dimension, you would be in a world of hurt. But it does not. So, if you go through the trouble of assigning surrogate keys, why not take advantage of it and make your life a lot simpler?

Multiple dimension rows for the same entity is not a problem for queries. Queries use attributes to summarize and filter, not surrogate keys. It is also the normal state in a Type 2 dimension.

The issue in a Type 1 dimension is managing natural keys and ensuring all instances contain the same attribute values. On the natural key side, you need to ensure that a natural key lookup does not result in multiple surrogate keys. You can do that by taking the MIN or MAX key value. It really doesn't matter as long a you have one key to put on the fact. You also need to cross-populate the various natural keys as new coding relationships are discovered.

Your attribute update process needs to be constructed so it updates multiple rows. As it is, these processes usually use the natural key, so the process should do that normally.

avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: How to model a financial security dimension where we may find out members are actually the same after the fact

Post  nick_white on Mon Feb 03, 2014 4:19 am

I quite agree that amending fact tables should be your last resort - however if the business logic of combining/splitting security records results in a change of fact then you have no choice e.g if you have a fact record per security and then split that security into 2 Dim records (because you had incorrectly merged them originally) then you would presumably need to create a new fact record and you may or may not need to update the old fact record (obviously any process you can come up with that means you don't have to update the existing fact record would be preferred).

Just to be clear, I was not suggesting using natural keys to join to your fact table, the "durable supernatural keys" are used to map multiple source records to a single Dim record. For example, supposing you have customer records in 3 source systems, no corporate MDM system and therefore you need to de-dupe them before loading into your DW. You identify that the same customer exists in all 3 source systems and has the natural keys of A, B and C in each one, so you create a cross-reference table that maps all three of these to the same "durable supernatural key" that you define e.g. 'XX' and this is the natural key you hold in your Dim alongside the usual numeric surrogate key that links the Dim to the Fact

nick_white

Posts : 366
Join date : 2014-01-06
Location : London

View user profile

Back to top Go down

Re: How to model a financial security dimension where we may find out members are actually the same after the fact

Post  ngalemmo on Mon Feb 03, 2014 2:27 pm

While 'splitting' dimensions is a somewhat unusual situation, I would argue there is never a need to 'combine' dimension rows and as such modify the fact table as a result. Its a really bad idea, particularly in this case.

The particular case that Ben has described can be handled fully in the dimension table with no ill effects to queries or analysis. You do not 'combine' dimension rows, a more accurate term is to 'coordinate' or 'syncronize' them. It is much simpler and far more efficient. The notion of a super key is an unnecessary complexity.

Furthermore, I would also argue that having multiple dimension rows is a significant advantage. Any given dimension row should have one and only one natural key. For example, a security could be traded using a CUSIP or SEDOL. There would be two dimension rows, one whose natural key is the CUSIP and the other the SEDOL ID. Each trade would reference one or the other depending on the ID used in the trade. The dimension itself may contain attributes for each of the different possible ID's (CUSIP, SEDOL, ISIN, etc) as well as common attributes and other attributes unique to the particular ID (such as symbol). When the dimension is updated, IDs and common attributes can be maintained across all similar rows.

What this also provides is an easy way to adjust the dimensional values if a CUSIP/SEDOL relationship is incorrect or changed for some reason. The facts are never affected because they are referencing the dimension row based on the specific ID used in the transaction. That will never change. What may change is its relationship with other IDs and that can be handled by altering the attributes in the row.

If the trade itself is incorrect, that is handled by adjusting trade entries, one to reverse the original trade and the other to post the corrected trade. Those facts would simply point the the appropriate dimension based on the ID used in the transactions. You never update the fact itself.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: How to model a financial security dimension where we may find out members are actually the same after the fact

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum