Dimensional model - denormalized source

View previous topic View next topic Go down

Dimensional model - denormalized source

Post  joegenshlea on Tue Nov 13, 2012 7:11 pm

I have a dimensional model challenge.

The project is a single-source data warehouse.

The source system has a very wide table called 'case' and is a logical collection of various classes arranged horizontally (one row per case)

Account (customer)
Project
Property
Loan
Borrower (1-3)
County

The case itself doesn't have too many attributes....there are some status codes and similar workflow-related columns, but 90% of the columns in this case table are related to the property and loan associated with the case. In the source system, the case table is associated with many tables related to workflow, documentation, billing etc. Moreover, most of those classes (property, borrower, Loan, etc) don't have reliable unique natural ids.

Issue #1
I have not had a chance to work with a source system this denormalized in the past. I own a copy of Data warehouse toolkit (1st and 2nd versions) and the Data warehouse Life Cycle toolkit, but I can't find any discussion on dealing with how to model dimensions from source data when the natural unique id is not well defined.

Issue#2
All the reports I've analyzed use Account and project in the filter (the relation ship is 1:M Account ----< Project) in the filter. In UML world I'd say that relationship between Project and Account is a "composition", in other words, Project does not exist out side of Account, you would never see a ProjectID in a table without a AccountID. However, Account does exist from time to time without a project.

I am tempted to combine Account and Project in a single account dimension (much like the customer dimension in the Telco Database in DWLTK). If I do this, I'm worried about how join the Account dimension to a fact table when project is not applicable. Thoughts?




joegenshlea

Posts : 4
Join date : 2012-11-13

View user profile

Back to top Go down

Re: Dimensional model - denormalized source

Post  ngalemmo on Tue Nov 13, 2012 8:32 pm

#1. Worry about natural keys after you identified your dimensions. The choice of dimensions is driven by the nature of the business.

#2 Why would you want to combine project and account in the same dimension? Are projects specific to an account or is the project:account relationship M:M?
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Dimensional model - denormalized source

Post  joegenshlea on Tue Nov 13, 2012 8:57 pm

ngalemmo wrote:


#2 Why would you want to combine project and account in the same dimension? Are projects specific to an account or is the project:account relationship M:M?

Well, yes. Accounts have many projects. and the relationship is O:M (more on this below). A project always has an account, but accounts do not always have projects. But There is a wrinkle: the PK on the source project table is [Account, ProjectCd]....so this must mean M:M? Well, no not really. The application will allow users to enter any ProjectCd they want, even if it has been used previously for another Account. When there are two rows in the Project table with the same ProjectCD, this is purely coincidence, as there is no shared commonality between the duplicate values (making my life difficult in the dimensional model). Because of this, SQL WHERE clauses at the source require an AccountID when a projectID is specified. A ProjectCD is meaningless with out an Account qualifier.

Business point of view:
Account -+------O< project

Database implementation:
Account >+------O< project


I considered two dimensions, dimProject and dimAccount. The problem is that every time the dimProject is used in a query, dimAccount must be used unless I put accountID in the dimProject dimension. But isn't it a bad practice to put attributes in multiple dimensions?

Maybe an outrigger might be the solution?



joegenshlea

Posts : 4
Join date : 2012-11-13

View user profile

Back to top Go down

Re: Dimensional model - denormalized source

Post  ngalemmo on Wed Nov 14, 2012 1:58 am

Ok. So the natural key to a project must include the account number. It still doesn't mean they need to be the same dimension. Project grain fact tables would reference the account and project dimensions, while an account grain fact table would only reference the account dimension. This gives you conformance and points of integration between the two facts.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Dimensional model - denormalized source

Post  joegenshlea on Wed Nov 14, 2012 1:07 pm



I decided to propose two dimensions to the other members of the design team: Account and Project. The project dimension will have at least one account attribute (the account number), otherwise the project is not uniquely identifiable.

Thank you - I appreciate your time.


joegenshlea

Posts : 4
Join date : 2012-11-13

View user profile

Back to top Go down

Re: Dimensional model - denormalized source

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum