Data Problems in Datawarehouse

View previous topic View next topic Go down

Data Problems in Datawarehouse

Post  alpsdev on Mon Jul 23, 2012 8:24 am

Hello,

I'm in a scenario where the datawarehouse has 2 or 3 dimensions with duplicate data, and has multiple intermediate databases (not staging) to do the load into final target database.

Issues started when the data sources which belonged to two different companies merged into one (ex: two sources for sales customer data). Both were separately identified with DB1 and DB2 identifier column. Later DB1 data was moved to DB2 in the source system, and all the DB1 data got loaded to the warehouse again with DB2 identifier.

Now while building the analytical cube we have arrived at a workaround by using composite keys , and hence picking non duplicate values only in the analytical database. My questions are

1. Is the approach right?
2. Will this cube be infallible in future, considering the underlying database already has issues.
3. What are the predictable cons that can occur in such design.
4. Is ther any other solution for this?
5. Is building the entire warehouse from scratch a better alternative?

Thanks in advance for your valuable inputs.

alpsdev

Posts : 2
Join date : 2012-02-20

View user profile

Back to top Go down

Re: Data Problems in Datawarehouse

Post  ngalemmo on Thu Jul 26, 2012 1:27 am

Interesting questions.

1. Is the approach right?

Don't know. It would be good to know what the 'composite key' thing is and the general state of the warehouse.

2. Will this cube be infallible in future, considering the underlying database already has issues.

Its the 'issues' comment that causes concern, which is why I hedged the first question. I've never known a cube to be infallible.

3. What are the predictable cons that can occur in such design.

Other than space I would not have a problem having more than one row for a customer if each could be tracked back to the source system they came from. The issue is the attributes you are maintaining for these customers. Keep two sets of attributes for each customer... possibly field a second table using the same primary keys. The original table would contain the actual attributes that were received from an original source system while the companion dimension contains the attributes you would want the business to see... customer attributes that have been standardized to the new system and are common to all rows that represent the same customer. Users will see them as one customer in their queries. Those who need to trace things back to the original source can always use the original dimension version to do forensic analysis.

4. Is ther any other solution for this?

Sure. But it depends on the problem and any background issues, what you are trying obtain, and the value it brings to the business.

5. Is building the entire warehouse from scratch a better alternative?

That is one solution. Better? It depends.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum