Mini Dimension Needed?

View previous topic View next topic Go down

Mini Dimension Needed?

Post  schone on Sun Feb 26, 2012 8:01 am

Hi all,

I have a Client Dimension and a Fact table which tracks Sessions with Clients, these have the following columns:

Code:

[DimClient]
----------
PK_ClientKey
ClientNumber
EmailAddress
Postcode
PostcodeLongitude
PostcodeLatitude
DateOfBirth
Gender *
Sexuality *
CulturalIdentity *
LanguageSpokenAtHome *
CountryOfBirth
UsualAccommodation *
LivingWith *
OccupationStatus *
HighestLevelOfSchooling *
RegistrationDate
LastLoginDate
Status

[FactSession]
-------------
PK_SessionKey
FK_ClientKey
...

My first requirement was to start grouping the age of the Clients at a specific Session (FactSession), the best way to approach this was to create a Age Group dimension and create a foreign key (FK_AgeGroupKey) in the FactSession to the DimAgeGroup dimension.

Now I'm thinking it would be good to track all the columns with an * (above). These could (not yet proven) have a high correlation against Sessions. Reading through the DWH Toolkit it seems a Mini Dimension to accomodate all the * columns along with the Age Group would suit best, so I put together the following structure:

Code:

[DimClient]
----------
PK_ClientKey
ClientNumber
...
Status

[DimDemographic]
-----------------
PK_DemographicKey
AgeGroup
Gender
Sexuality
...
HighestLevelOfSchooling

[FactSession]
-------------
PK_SessionKey
FK_ClientKey
FK_DemographicKey

The DimDemographic table would need to utilize a SCD Type 2 to be able to track the changes over time. Would this be the best approach to my requirements?

Additionally, I have RegistrationDate and LastLoginDate columns on my Client Dimension, in the case where a Client registers but never logs in what would be the best value to put in the LastLoginDate field? Something like '1900-01-01' or NULL?

Sorry for the long post but hopefully I have given enough information Thanks in advance!

schone

Posts : 2
Join date : 2012-02-26

View user profile

Back to top Go down

Answered my own question...

Post  schone on Sun Feb 26, 2012 8:46 pm

After reviewing the requirement I believe it is best to leave the Client dimension as is and make sure it follows a SCD of Type 2.

However, what would be best practice when tracking the Age/AgeGroup of a Client over time? As the source does not update would I need to introduce a layer of logic in the ETL process that checks if a new Session has occured a year later and to update the DimClient with the new AgeGroup?

Input welcome!

Also, with regards to the LastLoginDate, I will be inputting NULL instead of an actual date.

schone

Posts : 2
Join date : 2012-02-26

View user profile

Back to top Go down

Re: Mini Dimension Needed?

Post  Vishy on Thu Mar 01, 2012 1:43 am

Hi,
I am not sure about SCD 2 here as you dont have any clientnumber in DimDemographic dim. If client number is not present or no connection with ClientDim then in the demographic dim how you are going to relate it.

If you have to maintain history then offcourse it has to be a dimension but seeing the columns you are going to store in the new dimension are mostly not type 2

Agegroup -- will change each year Type2
Gender -- can not change
Sexuality -- can not change ( I feel)
CulturalIdentity -- can not change
LanguageSpokenAtHome - can not change
CountryOfBirth -- can not change
UsualAccommodation -- Type 2
LivingWith -- Type 2
OccupationStatus -- TYpe 2
HighestLevelOfSchooling -- hardly will change

what is stopping you to have all these in clientdim only ??

Vishy

Posts : 75
Join date : 2012-02-28
Location : India

View user profile

Back to top Go down

Re: Mini Dimension Needed?

Post  hang on Thu Mar 01, 2012 2:27 am

If ClientDim is small, say tens of thousands, it may not be worth having a mini dimension and just leaving the attributes in the main dimension would suffice, as Vishy suggested. However the age or age group should not go into the main dimension but connected to the fact separately.

If the main dimension is a big dimension, say close to or above a million, then you should have a mini dimension to control the ever growing size of the main dimension due to SCD2 attributes. The concept of mini dimension is to move all the low cardinality attributes out of the main dimension, regardless of their SCD type, and put them in a junk dimension and connect it to the fact table directly, so that all the historical values and relationship will be preserved along the time series in the fact table.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Mini Dimension Needed?

Post  Vishy on Thu Mar 01, 2012 2:36 am

If only AgeGroup is giving trouble, you can also think of calculating this at reporting level.

i.e TimeDim.Date-ClientDim.DOB

Vishy

Posts : 75
Join date : 2012-02-28
Location : India

View user profile

Back to top Go down

Re: Mini Dimension Needed?

Post  hang on Thu Mar 01, 2012 2:48 am

But you still need a AgeGroup dimension to slice and dice the fact. So you are far better off storing the dimension key or through a mini dimension right in the fact table by ETL.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Mini Dimension Needed?

Post  Vishy on Thu Mar 01, 2012 4:06 am

Slice and Dice can be done even if agegroup is calculated at the report level.
Report will be come the source for cubes.

I am from Cognos so I can confirm on this. I am not sure what reporting tool you are using and whether itcan take a calculated column. But logically speaking it should be possible as slice n dice happens via cube and cube needs a source, you can make report as a source of it or a query which will have this calculated column.

Vishy

Posts : 75
Join date : 2012-02-28
Location : India

View user profile

Back to top Go down

Re: Mini Dimension Needed?

Post  hang on Thu Mar 01, 2012 8:23 am

I am sure Cognos can be configured to construct an age group dimension for the cube. I am using SSAS cube and there may be a way to configure an age group based on DOB and current date for a SSAS cube as well.

However we are talking about dimensional modeling which is not just feeding data for cube but also for general reporting. The important dimension like age group should be defined in base model and hence shared across the board, including all kinds of reporting methodologies.

Now look at the issue this way. Is it easier and more straightforward to build an age group dimension and connect it to fact table and feed the the schema to the cube than trying to fiddle with cube to achieve the same functionality. I am sure I would not try it in SSAS as I know it is not that simple, although possible.

In my DW experience, the hardest thing is to strike a proper balance between dimensional and relational, normalised and de-normalised, physical and logical, operational and analytical. If a DW project gets these balances right, it will highly likely deliver as expected.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Mini Dimension Needed?

Post  Vishy on Thu Mar 08, 2012 3:31 am

I totally understand your point Hang, what I meant was when we do dimensional modelling ETL as well as reporting guys take part in that and discussing their comfort level as well ideas that can improve performance helps in overall design that will be delivered.

whatever I project I worked in I had age group as calculated item and not coming from the fact/dim, it helps when you need to change the age group, you have 20-25 years of age group and someday customer wants a different kind of age group, it is very easy to make changes in reporting and just pass it to customer then making changes at dimension definition and run the ETL's again and update the entire dimension.



Vishy

Posts : 75
Join date : 2012-02-28
Location : India

View user profile

Back to top Go down

Re: Mini Dimension Needed?

Post  hang on Fri Mar 09, 2012 1:02 am

Vishy, I understand your situation about age group, and have similar experience. But I still think the definition should be driven by data rather than code or reporting logic. What I do is to create a standalone type 2 dimension for age group, it's actually the age dimension referenced by the fact table, say AgeKey that covers all the possible ages (1..150). You then have a group attribute (20-25) as type 2, so the changes are limited in the dimension without resetting anything in the fact table.

The point is you should always be able to reproduce any historical report under old definition while reporting using the current definition (by self join on the NK), so that you don't have to mess up the reporting logic from time to time.

Another point about age is that it should not be a type 2 attribute in the mian dimension, neither does it have to be calculated out by report. It can be worked out on the factual record by ETL. Once realising this, you can have the age group logic driven by data inseted of reporting logic.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Mini Dimension Needed?

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum