Loading data without key

View previous topic View next topic Go down

Loading data without key

Post  hennie7863 on Thu Nov 12, 2009 8:56 am

For a customer of mine i'm loading messages in a datawarehouse. The messages don't have an id(?!). With this message i want to load some dimensions and the fact. Are there best/Good practices of doing this? Currently i'm thinking of giving these messages a self generated key. Load the data and compare afterwards if the load went ok. So the dimensions are using this key and the fact.

I'm not very happy with this solution. So i hope that someone gives me a better solution. O and we're talking about 100000 messages a day..

Thanks.

hennie7863

Posts : 31
Join date : 2009-10-19

View user profile

Back to top Go down

Re: Loading data without key

Post  ngalemmo on Thu Nov 12, 2009 1:08 pm

The message is the key (sort of...). It really depends on what you want to do with the message. If all you need to do is store it with the facts and be able to spit it back out, the easiest thing to do is store it in a CLOB column in the fact table. What is nice about this is most database systems will place the CLOB in a separate data structure and place a pointer to it in the fact table. Essentially the same thing as creating a separate dimension table but without the extra work. If your query does not include the message you will not get a performance hit from the message.

If you want to include the ability to do efficient keyword searches on the message, then you are looking at a bit more work...
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Loading data without key

Post  hennie7863 on Thu Nov 12, 2009 4:07 pm

Thanx for your reply. I want to extract some data and put it into a dimension, like the sender of the message. Also, i want to store the message for retrieval (showing on the screen for some reason). Yes i want to store the message and my plan was to store it in a dimension as a Varchar(MAX) in SQL Server 2008. My ETL proces is first loading the dimensions, store the surogate key with the application key in the lookup table and use this for loading the foreign keys in the fact. BUT i don't have a key....So i can't lookup for a surogate key. So the only solution i have so far is that i'll keep record of the surrogate key in autonumbering field and do a lookup.

Another problem is that i want to check whether i've already loaded the message. Normally, first i do a fast comparison based on a hash and to make sure i'll compare the fields. Then the record is inserted, updated or marked deleted. But in this case the size of the messages can be 10000 characters and after a couple of years the dimension table can be huge. The process will be performance killer over a couple of years. And i have three more types of messages and another increment of 3 messages. So the problem is multiplied to six.

hennie7863

Posts : 31
Join date : 2009-10-19

View user profile

Back to top Go down

Re: Loading data without key

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum