Beginning Data Modeling for DW

View previous topic View next topic Go down

Beginning Data Modeling for DW

Post  raghuk on Fri Jun 19, 2009 1:34 pm

Hello,

Please bear with me as these are a beginner's struggle to implement the DW the best way possible. I recently started studying about Kimball modeling.

I would like to ask a few questions on common practices on modeling the DW. I need to store historical data all the time; and I'm thinking I should use SCD2.

The following are a few tables. I just need a little help grouping them into a fact/dim category and if there are any intermediate tables that I need to group them appropriately.

Policy (Dim/Fact)
PolicyHistory (Dim/Fact)
======================================================
Id Policy# Term Eff.Date Exp. Date TermPremium Billed Written Status CommissionCode
---------------------------------------------------------------------------------------------------------------------------
1 AB0001 1 1/1/2000 5/31/2000 $1000 900 900 ACTIVE CO-130
2 AB0001 2 6/1/2000 12/31/2000 $1900 94.8 1100 CANCELLED TR-1A0
3 AB0001 3 1/1/2001 5/31/2001 $1200 302.44 1000 ACTIVE BB-1C0

AutoCoverages (AutoCoverageHistory) (Dim/Fact)?
================================================
Id PolicyId Type Premium Billed Written Fee1 Fee2
------------------------------------------------------------------------------
1 1 BI $10 $20 $10 $1 $5
2 1 COLL $20 $30 $20 $1 $5
3 1 PD $30 $40 $20 $1 $5

HomeCoverages
=============
Id PolicyId Type Premium Billed Written Fee1 Fee2
------------------------------------------------------------------------------
1 1 COVA $10 $20 $10 $1 $5
2 1 COVB $20 $30 $20 $1 $5
3 1 COVC $30 $40 $20 $1 $5

The dimensions are
AutoDriver (auto policy)
AutoVehicle (auto policy)
HomeLocation(home policy)(city, state, zip)

Questions:
==========
1. What criteria should I use in the above tables to group them into Fact/Dim? All the tables can be categorized into Fact/Dim depending on how you look at it. This is where I'm stumped.

2. Surrogate Keys.
In the above example, I have a policy with 3 rows, each with different term.
a. Should I use the combination of Policy# and Term as the natural key?
b. Or should I just have the same PolId with (IDENTITY) in addition to Surrogate Key?

3. Since AutoCoverages and HomeCoverages are subsets of Policy, how should I handle this design?

4. Is the history maintained in OLTP or OLAP?
I. Run ETL every night to capture changes and bring them over to DW. Which of the following methodologies is used?
1) Use checksum to determine if a row has changed and then process the rows accordingly.
2) Create triggers on source tables (I, U, D) and process the queue frequently (or at the end of the day).

Any help is greatly appreciated.

Thanks,

RK.

raghuk

Posts : 8
Join date : 2009-06-16

View user profile

Back to top Go down

Re: Beginning Data Modeling for DW

Post  Joy on Fri Jun 19, 2009 2:28 pm

I highly recommend that you read the Insurance chapter of The Data Warehouse Toolkit, 2nd Edition (starting on page 305). It addresses a lot of your questions.
avatar
Joy

Posts : 20
Join date : 2009-02-03
Location : Kimball Group

View user profile http://www.kimballgroup.com

Back to top Go down

Re: Beginning Data Modeling for DW

Post  raghuk on Fri Jun 19, 2009 2:49 pm

Joy wrote:I highly recommend that you read the Insurance chapter of The Data Warehouse Toolkit, 2nd Edition (starting on page 305). It addresses a lot of your questions.

Joy,

Thanks for a quick reply. I just placed an order for this book from Amazon after reading your message.

Can you please let me know your thoughts about other general questions?

I have other situations where I cannot determine if a table should be deemed a fact or dim. Am I wrong to think that all the data in the warehouse is either a fact or a dim? or are there situations where a table is both used as Fact and Dim? What is your recommendation in the afore-mentioned situation? Probably I'll have to study more on this in another book that I placed the order for 'The Complete Guide To Dimensional Modeling' by RKimball and MRoss.

It would be good to hear your thoughts before I get my hands on those books.

Questions:
==========
1. What criteria should I use in the above tables to group them into Fact/Dim? All the tables can be categorized into Fact and Dim both depending on how you look at it. This is where I'm stumped.

2. Surrogate Keys.
In the above example, I have a policy with 3 rows, each with different term.
a. Should I use the combination of Policy# and Term as the natural key?
b. Or should I just have the same PolId with (IDENTITY) in addition to Surrogate Key?

3. Since AutoCoverages and HomeCoverages are subsets of Policy, how should I handle this design?

4. Is the history maintained in OLTP or OLAP?
I. Run ETL every night to capture changes and bring them over to DW. Which of the following methodologies is used?
1) Use checksum to determine if a row has changed and then process the rows accordingly.
2) Create triggers on source tables (I, U, D) and process the queue frequently (or at the end of the day).

Thanks,

raghuk

Posts : 8
Join date : 2009-06-16

View user profile

Back to top Go down

Fact or Dim Determination

Post  JoeSalvatore on Fri Jun 19, 2009 3:38 pm

A common myth is that a given source table will only ever be use as a fact or dimension table in a dimensional model.
One given business process a business data entity like employee may be a dimension while for another it may be a fact (e.g. HR metrics).

The key is that you follow a process of design analysis that really does not look at specific data tables and columns but focuses on what you are trying to create with the dimensional model. You seem very data centric, as am I; as such it is really comfortable to just look at data and try to classify it as fact or dimension.

The process really invovles getting to a whiteboard and answering the following questions (from DW Toolkit Ch2):
1. What business process are you modeling and what are the interesting questions that come out of those processes?
2. What is the grain for the business process - What will a row in the fact table represent?
3. For fact rows, how do you describe/classify/group those measurements? [Dimensions]
4. What are the numeric (usually additive) measurements of the business process? [Facts]

After answering those questions you begin to revist data to determine where that information resides and how it would map to the design generated out of yours answers.
avatar
JoeSalvatore

Posts : 4
Join date : 2009-06-19

View user profile

Back to top Go down

Re: Beginning Data Modeling for DW

Post  BoxesAndLines on Wed Jun 24, 2009 9:13 pm

raghuk wrote:Hello,

--snip--
==========
1. What criteria should I use in the above tables to group them into Fact/Dim? All the tables can be categorized into Fact/Dim depending on how you look at it. This is where I'm stumped.
Think transaction or event, these types of entities are prime candidates for fact tables. For dimensions, think context. Entities that provide context for a transaction or event are good candidates for dimensions.


2. Surrogate Keys.
In the above example, I have a policy with 3 rows, each with different term.
a. Should I use the combination of Policy# and Term as the natural key?
b. Or should I just have the same PolId with (IDENTITY) in addition to Surrogate Key?
Ralph has written extensively on using surrogate keys. Use them here.

3. Since AutoCoverages and HomeCoverages are subsets of Policy, how should I handle this design?
Several options available for you here, a hierarchy, different dimensions, even fact tables.

4. Is the history maintained in OLTP or OLAP?
I. Run ETL every night to capture changes and bring them over to DW. Which of the following methodologies is used?
1) Use checksum to determine if a row has changed and then process the rows accordingly.
2) Create triggers on source tables (I, U, D) and process the queue frequently (or at the end of the day).
I use the easiest method supported by the application. If they support sending me deltas, I use deltas. These are all just different tools in the toolbox. My goal is to process the least amount of data to support the business requirements. History is supported in the warehouse. It may also be supported in the OLTP application.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

Re: Beginning Data Modeling for DW

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum