Datawarehousing design questions

View previous topic View next topic Go down

Datawarehousing design questions

Post  RobinvB on Fri Jul 30, 2010 3:43 am

Hello,

We are designing a new datawarehouse based upon a new service platform. My role in the team is not technical; I do not design packages nor do I decide how data is stored. The people in the team that do have a technical role come from a different, not technical background. After reviewing their work I came upon a couple of decisions made that I doubt as being the ideal solution. When discussing these issues with them, I failed to explain to them why (if at all) it might be better to use another solution. So I decided to ask help from professionals so that I can better explain the issues I came upon (if the issues I raise are issues indeed, not just moot points).

The use of indicators
As always, we have a lot of indicators in the datawarehouse. Not only the DWH_Active indicator, but many more (e.g. Gender, Deb/Cred etc) Instead of using a bit-field for this, many different forms have been chosen (Nchar(1), Nvarchar(1), Numeric(1), Char(1)). The reason for not using bit-fields is that a ‘1’ or a ‘0’ has no meaning to an end-user. Is this a valid point (or; for what amount of data is this point valid)? I think it might be better to use a bit-field (smaller, better performance) and use metadata to describe what a ‘1’ means.

The use of views versus tables
A couple of dimensions hold information that have multiple roles. Relation is one such dimension. A relation can either be a supplier or a customer (in our case). So besides DimRelation, two views (not persistent) are created, one for DimSupplier and one for DimCustomer. Is it not better (performance) to use two tables for this? Or is it just as fine to persist the views?

Ntext…
One of the fields used in the datawarehouse is of type ntext in the source. I think it might be wise to convert this field to nvarchar(max) since ntext will not be supported anymore. However, the reply I got was that the field was ntext in the source and thus it will be ntext in the datawarehouse. How can I better explain why not to use ntext in a datawarehouse?

The use of an ‘unknown member’
We use two different levels of the unknown member, -1 unknown and -2 not applicable. At the moment only the key value and the first nvarchar-field of sufficient length in a dimension will have information in them with regard to the unknown member. The other attributes will just be null. I thought that when using an unknown member, you have to assign a distinct value to all attributes or else the aggregation of the fact will not work as intended (if you aggregate over a nullable attribute then the unknown members will be included as well, if the attribute happens to be null). Is this correct?

Nullable fields
Only the primary key is not nullable in each table, all the other attributes are nullable. Even the application primary key is nullable. Is this wise? Or does it not matter?

This is it (for now…). A lot of questions. Any help will be much appreciated.

Regards,
Robin

RobinvB

Posts : 1
Join date : 2010-07-30

View user profile

Back to top Go down

Re: Datawarehousing design questions

Post  BoxesAndLines on Fri Jul 30, 2010 10:12 am

RobinvB wrote:Hello,

...So I decided to ask help from professionals so that I can better explain the issues I came upon (if the issues I raise are issues indeed, not just moot points).
This advice is free and intended solely for entertainment purposes.

RobinvB wrote:
The use of indicators
As always, we have a lot of indicators in the datawarehouse. Not only the DWH_Active indicator, but many more (e.g. Gender, Deb/Cred etc) Instead of using a bit-field for this, many different forms have been chosen (Nchar(1), Nvarchar(1), Numeric(1), Char(1)). The reason for not using bit-fields is that a ‘1’ or a ‘0’ has no meaning to an end-user. Is this a valid point (or; for what amount of data is this point valid)? I think it might be better to use a bit-field (smaller, better performance) and use metadata to describe what a ‘1’ means.
I make all my indicators number(1). In my current DB, you cannot sum a bit which I like to do for my accumulating snapshots. I agree with you.

RobinvB wrote:The use of views versus tables
A couple of dimensions hold information that have multiple roles. Relation is one such dimension. A relation can either be a supplier or a customer (in our case). So besides DimRelation, two views (not persistent) are created, one for DimSupplier and one for DimCustomer. Is it not better (performance) to use two tables for this? Or is it just as fine to persist the views?
I just rolename the FK and let the BI tool manage which columns to display. I wouldn't create two tables though.

RobinvB wrote:Ntext…
One of the fields used in the datawarehouse is of type ntext in the source. I think it might be wise to convert this field to nvarchar(max) since ntext will not be supported anymore. However, the reply I got was that the field was ntext in the source and thus it will be ntext in the datawarehouse. How can I better explain why not to use ntext in a datawarehouse?
How about change it or I'll find someone else to change it for me. I make datatype changes for source columns to EDW all the time. The fact that Ntext is being deprecated makes it a no brainer.

RobinvB wrote:The use of an ‘unknown member’
We use two different levels of the unknown member, -1 unknown and -2 not applicable. At the moment only the key value and the first nvarchar-field of sufficient length in a dimension will have information in them with regard to the unknown member. The other attributes will just be null. I thought that when using an unknown member, you have to assign a distinct value to all attributes or else the aggregation of the fact will not work as intended (if you aggregate over a nullable attribute then the unknown members will be included as well, if the attribute happens to be null). Is this correct?
If both columns are null then they will be aggregated together.

RobinvB wrote:Nullable fields
Only the primary key is not nullable in each table, all the other attributes are nullable. Even the application primary key is nullable. Is this wise? Or does it not matter?

This is it (for now…). A lot of questions. Any help will be much appreciated.

Regards,
Robin
I use NOT NULL constraints on columns were applicable. I like my metrics to be NOT NULL as well as all FK columns. In the dimensions, you are handicapped due to the default row. This row prevents you from enforcing NOT NULL constraints since the default row will normally not have a value.
avatar
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

View user profile

Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum