Dimensional model for projecting households and population

View previous topic View next topic Go down

Dimensional model for projecting households and population

Post  Janis-EST on Mon May 16, 2011 2:38 am

Hi,

1. There are two type of objects:
Persons
attributes:
Sex, Age (hierarchical), Occupation, Status of Employment, Country of Citizenship (hierarchical) etc. Total nr of attributes is approximately 150

Households
attributes:
Type of private household (hierarchical), Size of private household (hierarchical), enure status of households etc. Total nr of attributes approximately 10

2. Person can be a member of 1..n household (so there's many-to-many relationship between persons and households)

3. Persons have relationships within household, for example "person_1 is the husband of person_2"

------------------

At the moment we have two fact tables
Fact Population
One record per one person, all the attributes are references to dimensions, so there are 150+ dimensions, some of them are hierarchical, but most are just id, code, text

Fact Households
One record per one household, all the attributes are dimensions (10+)

And several bridge tables
Bridge for persons in households
One record per person in houshold.
Attributes:
Person_id
Household_id

Bridge for relationships between persons
One record per one relationships between two persons.
Attributes:
person_1_id
person_2_id
household_id
relationship_type_id (reference to "Dimension Relationship")

------------------

From theory I know at least one reason why it's not correct way to model this, because bridge tables are for managing relationships between dimensions and facts, but what's the way to go here?
Should persons and households be considered as dimensions and relationships between them as facts (but what about attributes then)?

All suggestions are appreciated.

Thanks in advance,
Janis

Janis-EST

Posts : 3
Join date : 2009-02-25

View user profile

Back to top Go down

Re: Dimensional model for projecting households and population

Post  Dave Jermy on Mon May 16, 2011 8:33 am

What measures do you have on your fact tables?

As far as the Population fact goes, what you've described is a centipede fact (think of all the dimension joins as legs...), which is not recommended because of the amount of space required to store all the foreign keys (among other reasons). Take a look at reducing the number of dimensions.

First of all, look at all the simple code & description dimensions which only have a small-number of possible values. Things like Sex (two-values), Status of Employment (say, five values, Marital Status (say, four values). A single 'junk' dimension that contains all possible values for those three attributes would have just 40 rows and means you would only need one foreign key in the fact instead of three - a big saving on space with a large number of fact rows. Adding more low-cardinality attributes into the junk dimension will save even more at the cost of a few hundred or a couple of thousand rows in the dimension.

Dave Jermy

Posts : 33
Join date : 2011-03-24
Location : London, UK

View user profile

Back to top Go down

Re: Dimensional model for projecting households and population

Post  ngalemmo on Mon May 16, 2011 10:46 am

From theory I know at least one reason why it's not correct way to model this, because bridge tables are for managing relationships between dimensions and facts, but what's the way to go here?

It is called a bridge table if it is managing a relationship between a fact table and a dimension table. If it is managing a relationship between dimension tables it is called a fact table (or factless fact table if there are no measures). Simple, no? They are just labels to specify the role of the table... the basic structure is the same.

Your Population fact sounds fishy... I find it difficult to believe you would need 150+ dimensions to describe a person... 150+ attributes, maybe, but not dimensions. A big wide fact table with 150+ foreign keys is never a good idea. Condsider consolidating low cardinaltity and correlated attributes into junk dimensions as Dave suggests.

It is also not clear why household is included in relationships between persons. What you have is relationships between persons in the same household. If you require more generality, I would omit the household key.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Dimensional model for projecting households and population

Post  Janis-EST on Tue May 17, 2011 4:28 am

Thanks for the replies, I really appreciate.

Dave Jermy wrote:What measures do you have on your fact tables?
There's actually no measures so they are basically factless fact tables.

Dave Jermy wrote:
Adding more low-cardinality attributes into the junk dimension will save even more at the cost of a few hundred or a couple of thousand rows in the dimension.
ngalemmo wrote:
Condsider consolidating low cardinaltity and correlated attributes into junk dimensions as Dave suggests.
I was actually thinking about that but found it difficult to group the attributes into logical groups (junk dimensions) so that model would still be easily understandable for business users. I'm wondering if it makes sense to design one big junk dimension (person attributes) for all the attributes that are low cardinality code-text pairs without hierarchies?


ngalemmo wrote:
It is also not clear why household is included in relationships between persons. What you have is relationships between persons in the same household. If you require more generality, I would omit the household key.
It's there because one person can be a member of 1..n households.

ngalemmo wrote:
It is called a bridge table if it is managing a relationship between a fact table and a dimension table. If it is managing a relationship between dimension tables it is called a fact table (or factless fact table if there are no measures). Simple, no? They are just labels to specify the role of the table... the basic structure is the same.
Do I understand correctly that I should consider persons as dimensions to describe relationships between them? At the moment "Fact population" represents persons so there would be 1-1 relationship between "Dim Person" and "Fact Population" (the same nr of records). Or should the junk dimension "Dim Person Attributes" be designed as dimension containing persons?


Regards,
Janis

Janis-EST

Posts : 3
Join date : 2009-02-25

View user profile

Back to top Go down

Re: Dimensional model for projecting households and population

Post  Janis-EST on Thu May 26, 2011 1:53 am

any suggestions?

Thanks,
Janis

Janis-EST

Posts : 3
Join date : 2009-02-25

View user profile

Back to top Go down

Re: Dimensional model for projecting households and population

Post  hang on Thu May 26, 2011 8:50 am

The low cardinality Junk dimension has to be separate from person dimension that is of very high cardinality. I imagine your person dimension is a monster dimension (> 1 million rows?), so you should take out all the attributes that you want to keep changes for, and put them in individual dimensions if cardinality is high. However you may put all the low cardinality attributes in junk dimensions regardless of their SCD types, and group them into 2 or 3 groups if there are more than 20 such attributes.

Now the person dimension should only have SCD 1 or very slowly changing attributes that won't grow the dimension alarmingly due to SCD treatment, such as DOB, gender or names etc. You should also snowflake those SCD 1 high cardinality dimensions and junk dimensions by pointing to current profile, so that you can minimise the dimension entries in the fact table.

Your population fact table is likely a periodic snapshot fact table containing almost just FK's to all the dimensions. It is a factless fact table but with a very important implicit measure, the count. Actually many valuable measurements, cubes and KPI's are based on this humble starting point, and the sense of fact (measurement) only becomes more obvious at aggregate levels.

I guess you could have a Family group dimension appearing as FK in a Family member (relationship) bridge table with Person FK and an additional family role type attribute (wife, husband, child) to indicate the relationship. The fact table should have FK for the family group dimension as well so that the fact table also covers the facts on families and the change of families if it is a periodic snapshot table.


hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Dimensional model for projecting households and population

Post  Dave Jermy on Thu May 26, 2011 9:34 am

I think you've got it all backwards. If the 'facts' are just occurences of a particular relationship (between a person and a household or between two people), then the tables you've labelled as bridges are actually your fact tables and your fact tables are actually your dimensions.

Dim Person
One record per person, 150 attributes

Dim Household
One record per household, 10 attributes

Dim Relationship Type
One record per type of relationship

Fact Householders
One record for each occurence of a person in a household
Columns:
Person_id
Household_id
Person_Count (always equal to 1)


Fact Householder Relationship
One record per relationship between two people in a household
Columns:
Household_id
Person_1_id
Person_2_id
Relationship_type_id
Relationship_Count (always equal to 1)

A few points to note:
  • Although the fact tables are essentially 'factless', putting a count metric on them allows you to easily answer questions such as 'How many households have people aged 16-24 living in them?' or 'What's the average number of relationships in a suburban home?'.
  • Should the Relationship fact contain two rows per relationship - e.g. one where Person 1 is the mother and Person2 is the daughter and the second where Person 1 is the daughter and Person2 is the mother?
  • You could still reduce the size of the Person dimension by moving some attributes onto a junk dimension. This would be of particular benefit if the dimension is slowly changing - move the attributes that never (or at least, very rarely) change.


Hope that helps.

Dave Jermy

Posts : 33
Join date : 2011-03-24
Location : London, UK

View user profile

Back to top Go down

Re: Dimensional model for projecting households and population

Post  hang on Thu May 26, 2011 9:41 pm

Dave, Maybe I am wrong, but I think you might have missed a fundamental point in dimensional modeling, what is the fact table.

Firstly a fact table is the centre of the star schema for a subject area, and it has grains and at least one time series. None of your fact tables has a clear grain and time series.

Secondly a fact table should be built at atomic level. So in this case, the individual person is the most granular level we should base the fact on, meaning one person in each row, instead of in many rows in your fact tables. Its really important to have clear grain in the fact table as you dont risk double-counting your facts. Family or Household are at aggregate level in relation to Person and may be derived through either a denormalised dimension, a bridge table if there is any multi-valued attribute involved and possibly a factless fact table if relationship history needs to be analysed. Having both Person FK and Family FK in the fact can give snapshot picture on person and family level at different point of time, assuming one person is in one family, similar roles played by other aggregate level FKs.

In my understanding, householder is a little different from family as it can hold unrelated persons and there is a further complication about multiple residences for one person. So I would not put family relationships in household. I could have a snapshot coverage factless fact table for householder given the fluid changing nature of residents in the household, but I would not have a physical count metric in the table as it would be misleading. Nevertheless the primary focus is really about the Population fact table.

Lastly a fact table should have a time series (Date Key), even if its a factless fact table, so that you can do OLAP trend analysis over history, otherwise you would just stick to OLTP system for reporting purpose.

The two fact tables you proposed are just relationship tables, and all the counts are not additive. Only the count (=1) metric in the true fact table, the population fact, is additive within a snapshot, as follows:

Dim Person
Person Key (SK)
Person ID (NK)
Surname
Given name
DOB
Other static attributes

Dim Geographic
Geographic Key
States
City


Dim Demographic 1
Demographic 1 key
Gender
Employed
Disabled
Homeless
..

Dim Demographic 2
Demographic 2 key
AgeGroup
Income Band


Dim Family
Family Key (PK)
Primary Contact person /Key
No. of Persons

Dim FamilyPerson
Faimily Key
Person Key
Family Role ( you could snowflake by a Role FK, but we may not need to)

Dim Household
Household Key
Household ID
Contact
Address


Fact Population
Snapshot Date Key
Person Key
Family Key
Primary Household Key
Geographic Key
Demographic 1 key
Demographic 2 key
other FK's ...
Count=1


Fact Household
Snapshot Date key
Household Key
Person key

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Dimensional model for projecting households and population

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum