Bridge table with SCD on either side

View previous topic View next topic Go down

Bridge table with SCD on either side

Post  remenaker on Wed Apr 25, 2012 9:19 am

I am running into a situation I am not sure how to solve. I have the need for a bridge table due to the fact that my source stores a many-to-many relationship that I have to maintain in my DW. Basically there are three source tables... QUESTIONS, CHOICES, and QUESTION_CHOICES. The QUESTION_CHOICES table is what is driving my bridge table. Each of these has it's own CDC (change data capture) table driven by triggers that I pull data from. In order to preserve history properly, my QUESTIONS and CHOICES are both Type 2 SCD. The problem I am running into is how to properly maintain a bridge table that points to two Type 2 SCDs on either side. Has anyone had experience doing this before that might be able to provide input?

Thanks in advance...
Rob

remenaker

Posts : 16
Join date : 2011-03-08

View user profile

Back to top Go down

Re: Bridge table with SCD on either side

Post  umutiscan on Wed Apr 25, 2012 10:43 am

I guess one response may be related with more than one question.
Your example looks like the classical account-customer problem.
You have to reflect the changes to your bridge table when any change occurs in QUESTION_CHOICES table, or any type 2 update occurs in your dimensions.

umutiscan

Posts : 51
Join date : 2010-11-26
Age : 36
Location : Istanbul, Turkey

View user profile

Back to top Go down

Re: Bridge table with SCD on either side

Post  ngalemmo on Wed Apr 25, 2012 2:04 pm

If you are trying to maintain a history of questions and the choices provided for those questions, you need to create a faceless fact table, not a bridge. You are maintaining a history of states.

If you are recording responses to a test and need to know what the choices were for that particular question, you do not link the question with the choices. The fact table would include both the question and a choice group (representing the list of choices as a multivalued dimension) as dimensions for that particular instance of the exam. The bridge would be between the choice group and the choice dimension.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Bridge table with SCD on either side

Post  umutiscan on Wed Apr 25, 2012 2:45 pm

I think Time-Varying bridge table can be a solution for the problem. Am I wrong?

umutiscan

Posts : 51
Join date : 2010-11-26
Age : 36
Location : Istanbul, Turkey

View user profile

Back to top Go down

Re: Bridge table with SCD on either side

Post  remenaker on Wed Apr 25, 2012 3:09 pm

I have timestamps VALID_FROM and VALID_TO on the bridge table and it sits between dimensions. Perhaps that is wrong. I have never used a factless-fact table.

remenaker

Posts : 16
Join date : 2011-03-08

View user profile

Back to top Go down

Re: Bridge table with SCD on either side

Post  ngalemmo on Wed Apr 25, 2012 3:19 pm

remenaker wrote:I have timestamps VALID_FROM and VALID_TO on the bridge table and it sits between dimensions. Perhaps that is wrong. I have never used a factless-fact table.

It is a matter of semantics more than anything else. A bridge table is defined as a table that sits between a fact table and a dimension. It is used to resolve multi-valued dimensions and hierarchies. Only a fact table may relate multiple dimensions, which is what you are trying to do. So, if you have a fact table that references the question dimension and the choice dimension and include effective and expiration dates as degenerate dimension, you have a factless fact table. What makes it factless is it has no measures, only dimensions.
avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Bridge table with SCD on either side

Post  remenaker on Thu Apr 26, 2012 9:13 am

I think part of my problem is handling things when someone removes (deletes) a choice from the question in the source system. I get a DELETE record from the CDC_QUESTION_CHOICES table out of the source system and right now I end-date the existing one, and cut a new one with a DELETE flag set. That way people who answered the question when it had more choices is preserved based on the date they answered the question and people who answered after the choice was removed show the question having fewer choices. It's kind of a pain to maintain and I hope I am doing it correctly. It just feels odd to join the answer dimension to the question_choice_bridge...perhaps there is a better way:

Code:

SELECT  du.name
      ,dq.question
      ,dc.choice
      ,da.answer
      ,fqs.score
  FROM  fact_question_score        fqs
      ,dim_user                  du
      ,dim_answer                da
      ,dim_question              dq
      ,dim_choice                dc
      ,dim_question_choice_bridge dqcb
 WHERE  fqs.dim_user_id          = du.dim_user_id
  AND  fqs.dim_question_id      = dq.dim_question_id
  AND  fqs.dim_answer_id        = da.dim_answer_id
  AND  dq.dim_question_id        = dqcb.dim_question_id
  AND  dc.dim_choice_id          = dqcb.dim_choice_id
  AND  da.answer_date      BETWEEN dqcb.valid_from
                                AND dqcb.valid_to
  AND  dqcb.deleted              = 'N'

remenaker

Posts : 16
Join date : 2011-03-08

View user profile

Back to top Go down

Re: Bridge table with SCD on either side

Post  ngalemmo on Thu Apr 26, 2012 1:39 pm

Please re-read my first post on this thread. You are trying to do two things in this fact table, record results and keep a history of test structures. It's not the way you should do it. Also, if this is a structured multiple choice test, why would you need separate answer and choice dimensions? An answer is one of the choices, is it not? Only if you have additional attributes about the answer that cannot be supported by the choice dimension would you consider a separate dimension. So you would have the same dimension serving two roles. Include a 'no answer' row in your choice dimension for those who do not answer a question.

What you need to do in your testing fact table is link to the choices that were actually on the test when the test was given. You don't care about what the choices were before or after that particular test (that information is for another fact table).

Lets' say you have a choice dimension with the following rows:

1 Yes
2 No
3 All of the above
4 What?

When the test was first published, choices 1, 2 and 3 were used for a question. It was later changed to 1, 2 and 4. You would record each choice group in a bridge:

group/choice
1/1
1/2
1/3
2/1
2/2
2/4

The fact table would refer to the specific group that appeared on the test that is being scored. You store the group key on the fact and join through the bridge to get the list of choices that were available to the person taking the test. You do not need dates.



avatar
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

View user profile http://aginity.com

Back to top Go down

Re: Bridge table with SCD on either side

Post  remenaker on Thu Apr 26, 2012 2:20 pm

After reading what you said, this makes a little more sense. I see what you are saying in that I am trying to do two different things in my FACT table and that's probably what's causing me headaches.

So the source system is indeed multiple choice and the one hitch is that the correct answer for a given question can be either one or multiples of the choices. So my question is, how do I store that in the fact table? Would I have a separate fact row that links to each choice dimension row that they selected for a given question? The source system tells me which choices for each question are the correct answers and which ones are incorrect. Just not sure how to model that and I guess and my solution is not very good.

My next question is about how to handle the data as it comes to me to line all of this up. I get the following CDC tables from the source:

The following is the user information:
Code:

CREATE TABLE CDC_OO_AUTH_USER
(
  CDC_AUTH_USER_ID INTEGER NOT NULL
, CDC_TIMESTAMP TIMESTAMP NOT NULL
, CDC_OPERATION VARCHAR2(255) NOT NULL
, ID INTEGER
, USERNAME VARCHAR2(255)
, FIRST_NAME VARCHAR2(255)
, LAST_NAME VARCHAR2(255)
, EMAIL VARCHAR2(255)
);

This sends the campaign (or test) name and info:
Code:

CREATE TABLE CDC_OO_IVY_SURVEY_CAMPAIGN
(
  CDC_IVY_SURVEY_CAMPAIGN_ID INTEGER NOT NULL
, CDC_TIMESTAMP TIMESTAMP NOT NULL
, CDC_OPERATION VARCHAR2(255) NOT NULL
, ID INTEGER
, NAME VARCHAR2(255)
, DESCRIPTION VARCHAR2(300)
);

This sends the link between the campaign and it's questions and the order they appear:
Code:

CREATE TABLE CDC_OO_IVY_SRVY_CMPGNQUESTIONS
(
  CDC_IVY_SRVY_CMPGNQSTNS_ID INTEGER NOT NULL
, CDC_TIMESTAMP TIMESTAMP NOT NULL
, CDC_OPERATION VARCHAR2(255) NOT NULL
, ID INTEGER
, CAMPAIGN_ID INTEGER
, QUESTION_ID INTEGER
, "ORDER" INTEGER
);

This sends all of the possible choices in the system...they can be re-used.
Code:

CREATE TABLE CDC_OO_IVY_SURVEY_CHOICEPOOL
(
  CDC_IVY_SURVEY_CHOICEPOOL_ID INTEGER NOT NULL
, CDC_TIMESTAMP TIMESTAMP NOT NULL
, CDC_OPERATION VARCHAR2(255) NOT NULL
, ID INTEGER
, TEXT CLOB
);

This links up your question to it's possible choices and indicates which are considered correct along with the order they should appear on the test:
Code:

CREATE TABLE CDC_OO_IVY_SRVY_QSTNCHOICES
(
  CDC_IVY_SRVY_QSTNCHOICES_ID INTEGER NOT NULL
, CDC_TIMESTAMP TIMESTAMP NOT NULL
, CDC_OPERATION VARCHAR2(255) NOT NULL
, ID INTEGER
, QUESTION_ID INTEGER
, CHOICE_ID INTEGER
, CORRECT CHAR(1)
, "ORDER" INTEGER
);

The answers come from two different tables that are joined together, but the CDC is on each table. The first is a header that indicates which question they answered:
Code:

CREATE TABLE CDC_OO_IVY_SRVY_CHECKBOXANSWER
(
  CDC_IVY_SRVY_CHECKBOXANSWER_ID INTEGER NOT NULL
, CDC_TIMESTAMP TIMESTAMP NOT NULL
, CDC_OPERATION VARCHAR2(255) NOT NULL
, ID INTEGER
, USER_ID INTEGER
, CAMPAIGN_ID INTEGER
, QUESTION_ID INTEGER
);

Then another CDC table comes over that links to CDC_IVY_SRVY_CHECKBOXANSWER that indicates the choice they picked:
Code:

CREATE TABLE CDC_OO_IVY_SRVY_CHKBXANSWR_CHC
(
  CDC_IVY_SRVY_CHKBXANSWR_CHC_ID INTEGER NOT NULL
, CDC_TIMESTAMP TIMESTAMP NOT NULL
, CDC_OPERATION VARCHAR2(255) NOT NULL
, ID INTEGER
, CHECKBOXANSWER_ID INTEGER
, CHOICEPOOL_ID INTEGER
);

The CDC_OPERATION can be INSERT, UPDATE, or DELETE. Most of my issue is trying to determine how to handle the CDC tables that are joined. DIM_USER, DIM_CAMPAIGN, DIM_QUESTION, and DIM_CHOICE are all easy to manage since they are completely independent of each other. Where I run into problems is the others that are joined which could contain deletes, updates, etc. Should I stage that data before just shoving it into BRIDGE tables like I was trying to do?

Sorry for posting so much info, but I feel like I am back at the drawing board with this


Last edited by remenaker on Thu Apr 26, 2012 4:02 pm; edited 2 times in total (Reason for editing : Formatting)

remenaker

Posts : 16
Join date : 2011-03-08

View user profile

Back to top Go down

Re: Bridge table with SCD on either side

Post  hang on Sat Apr 28, 2012 11:46 pm

I would model the case differently, simply by having a periodic snapshot factless fact table for question-choice corelations as the key solution. The question-choice snapshot would be taken on all the campaign dates and may be connected by a junk dimension with two attributes, IsCorrect and ChoiceOrder as follows:

QuestionChoiceFact

CampaignDateKey
QuestionKey
ChoiceKey
AttributeKey (junk dim)

Your question and choice dimension can still be SCD2. Don't worry about any delete and update on their relationship, as they are naturally reflected by the snapshot.

Your campaign/test/answer table is another snapshot table as follows:

CampaignFact

CampaignDateKey
CampaignKey
PersonKey
QuestionKey
AnsweredChoiceKey

I believe querying against these two simple factless fact tables will give you all the results you want to know about the campaign.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Bridge table with SCD on either side

Post  remenaker on Sun Apr 29, 2012 9:44 am

Hang,

This is the solution I have started modeling and I think it makes far more sense. Shouldn't there be a campaign_key in the QuestionChoiceFact table you designed? And if so, wouldn't I have to cut a brand new DIM_CAMPAIGN record and key anytime a question or choice changed and insert a whole new set of combinations in the QuestionChoiceFact table to preserve the history?

Thanks to everyone for the input...this is helping greatly!

remenaker

Posts : 16
Join date : 2011-03-08

View user profile

Back to top Go down

Re: Bridge table with SCD on either side

Post  hang on Sun Apr 29, 2012 6:26 pm

remenaker wrote:Shouldn't there be a campaign_key in the QuestionChoiceFact table you designed?
I guess there should be, if the campaign dimension has been properly updated before you load the snapshot. It's like a coverage fact. In this case, it helps to query what has not been answered for the campaign, but covered in the QuestionChoiceFact.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Bridge table with SCD on either side

Post  remenaker on Mon Apr 30, 2012 3:25 pm

So I am at the point of trying to figure out how to manage this new FACT_CAMPAIGN_QUESTION table when three different things happen. My FACT_CAMPAIGN_QUESTION table looks like such:

Code:

DIM_CAMPAIGN_ID
DIM_QUESTION_ID
DIM_CHOICE_ID
CHOICE_ORDER
CORRECT_CHOICE

Since my DIM_CAMPAIGN, DIM_QUESTION, and DIM_CHOICE tables are all SCD2, if a change comes from CDC_CAMPAIGN, CDC_QUESTION, or CDC_QUESTION_CHOICE, do I always have to at least make a new DIM_CAMPAIGN SCD2 record even though it might not have changed in order to preserve history? This is where it gets messy for me. The only way I can think of to maintain how the test (campaign) looked whenever something was changed, is to create a whole new DIM_CAMPAIGN record so I can use that new ID. The problem I have with that is that it will happen when a QUESTION or CHOICE update comes over resulting in a new DIM_CAMPAIGN record that just has a new valid date range. Otherwise that new DIM_CAMPAIGN record will be exactly like the version before it.

Am I missing something here?!?! Is there some precedence for a JUNK dimension table just for linking up these group_keys?

remenaker

Posts : 16
Join date : 2011-03-08

View user profile

Back to top Go down

Re: Bridge table with SCD on either side

Post  hang on Mon Apr 30, 2012 6:06 pm

I think you might over complicate the design. Unless snowflaking, don't change any SCD dimension because other SCD dimension has changed. I guess the simplest approach is to keep droping the snapshot for the current campaign and reinserting the snapshot reflecting the current SCD dimension keys until there is no more SCD changes for the relevant campaign. If you partition the snapshot table properly, I don't think performance would be an issue.

Actually, you should be able to produce a snapshot with dimension context as of any point in time. Don't try to reflect every micro SCD changes in the snapshot fact table, you only need to snapshot when your SCD dimensions stablise for the campaign.

hang

Posts : 528
Join date : 2010-05-07
Location : Brisbane, Australia

View user profile

Back to top Go down

Re: Bridge table with SCD on either side

Post  Sponsored content


Sponsored content


Back to top Go down

View previous topic View next topic Back to top

- Similar topics

 
Permissions in this forum:
You cannot reply to topics in this forum