Fraud Dimension
2 posters
Page 1 of 1
Fraud Dimension
Hi,
I am currently working on modeling a Fraud Dimension and have run into some issues/concerns when trying to design this dimension. A brief description - it is a dimension that holds information based on First/Last Names, Phone Numbers, or Addresses that have been marked as fraudulent. The issue is that each row in the source table contains columns for name, address, phone number, and other demographic information. On some reports, they match on Name, some reports they match on Address, and other reports they match on Phone Number to try to see if there is a known match. On every report, they want to print out all of the rows content regardless of the match type. The issue(s) I am running into are:
- There is no business key on this table - I was going to try to use a hashbyte column to determine new rows, etc.
- There can be multiple rows depending on the lookup criteria. An example, Joe Smith could have three rows because one row his Address is different, and the other rows his phone number is different. The same holds true for any combination - there can be multiple rows for the same fraud phone number (tied to different address or Name). On a report/fact table, they want a row for each match - so the fact table would have 3 rows for Joe smith if the match was on name.
Since they want to see every possible match, should I just have the dimension table contain all of the columns and in the ETL, perform lookups based on the criteria for that fact table or should I break the dimension into a few based on different criteria (have a Fraud Name, Fraud Address, Fruad Phone, etc) or should I have a bridge table?
Thanks!
I am currently working on modeling a Fraud Dimension and have run into some issues/concerns when trying to design this dimension. A brief description - it is a dimension that holds information based on First/Last Names, Phone Numbers, or Addresses that have been marked as fraudulent. The issue is that each row in the source table contains columns for name, address, phone number, and other demographic information. On some reports, they match on Name, some reports they match on Address, and other reports they match on Phone Number to try to see if there is a known match. On every report, they want to print out all of the rows content regardless of the match type. The issue(s) I am running into are:
- There is no business key on this table - I was going to try to use a hashbyte column to determine new rows, etc.
- There can be multiple rows depending on the lookup criteria. An example, Joe Smith could have three rows because one row his Address is different, and the other rows his phone number is different. The same holds true for any combination - there can be multiple rows for the same fraud phone number (tied to different address or Name). On a report/fact table, they want a row for each match - so the fact table would have 3 rows for Joe smith if the match was on name.
Since they want to see every possible match, should I just have the dimension table contain all of the columns and in the ETL, perform lookups based on the criteria for that fact table or should I break the dimension into a few based on different criteria (have a Fraud Name, Fraud Address, Fruad Phone, etc) or should I have a bridge table?
Thanks!
tpolak- Posts : 6
Join date : 2012-12-13
Re: Fraud Dimension
Does Joe Smith have a valid address and zero to many fraudulent addresses? Or do you only have a collection of fraudulent information? If the later, then your idea of having separate dimensions sounds good. You can just let the fact table be the intersection of the fraudulent dimensions.
bruce.szalwinski- Posts : 8
Join date : 2012-12-11
Re: Fraud Dimension
Basically the table is a collection of fraud information. The issue with seperating them completely is that on a report, I would have to know which Name belongs to which address and phone number. Even if I match up against say name, or just phone, they want the whole fraud record (even if there are three names that match John Smith, three rows will be on the report). Thats what I was thinking a bridge table may be best that can link all of them together, or maybe I dont even need anything more than the one Fraud dimension with a hashbyte column to compare and get new rows.
Thanks!
Thanks!
tpolak- Posts : 6
Join date : 2012-12-13
Similar topics
» Predictive Analytics (Fraud Score) and Dimensional Models
» Replacing Aggregate dimension with Conformed base dimension at lowest level.
» How to handle a Type I or II dimension with a snowflaked customer sub dimension (kimball book page 337, 338)
» bridge table and junk dimension on customer dimension (bank/credit union)
» Modeling an Employee Dimension to a Fact which has two columns relating to the Dimension
» Replacing Aggregate dimension with Conformed base dimension at lowest level.
» How to handle a Type I or II dimension with a snowflaked customer sub dimension (kimball book page 337, 338)
» bridge table and junk dimension on customer dimension (bank/credit union)
» Modeling an Employee Dimension to a Fact which has two columns relating to the Dimension
Page 1 of 1
Permissions in this forum:
You cannot reply to topics in this forum
|
|