Text Analytics and Dimensional Models



Post  sgrover3 on Wed Apr 20, 2011 11:05 am

I have been doing some research in the Text Analytics / Big Data space. Do dimensional models have a role to play in this space?
If yes, how? What is the architecture? What kind of measures / facts would belong in the fact table?
Is there a real-world example of this?

sgrover3

Posts : 8
Join date : 2011-04-14


Re: Text Analytics and Dimensional Models

Post  ngalemmo on Wed Apr 20, 2011 12:51 pm

Yes and no... If you intend to structure the data in some manner, a dimensional model can perform well. For example, I implemented a dimensional model for a website which included clickstream analysis based on search phrases used to access the site. The phrase was represented as a multivalued dimension of keywords allowing users to analyze visitor behavior based on combinations of keywords used. It also included attributes to determine if access was through a paid link or generic search.

If you need to deal with large volumes of free text, the issue isn't so much the model as it is the effort to parse large volumes of text for the purposes of structuring it. Depending on the resources available to you, parsing may become a bottleneck in the load process. However, parsing and reducing the text to a series of surrogate keys can significantly reduce the data storage requirements.
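To make the multivalued keyword dimension above a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the actual implementation described in this post. The table names (keyword_dim, phrase_keyword_bridge, visit_fact), the columns, and the sample rows are all assumptions made up for illustration. The idea is that the fact row carries one phrase key, a bridge table maps that key to its keywords, and users can then ask for visits whose search phrase contained a given combination of keywords.

```python
import sqlite3

# Hypothetical schema and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE keyword_dim (keyword_key INTEGER PRIMARY KEY, keyword TEXT UNIQUE);
-- Bridge: one row per (phrase, keyword) pair.
CREATE TABLE phrase_keyword_bridge (phrase_key INTEGER, keyword_key INTEGER);
-- Fact: one row per visit, pointing at the phrase used to reach the site.
CREATE TABLE visit_fact (
    visit_id INTEGER PRIMARY KEY,
    phrase_key INTEGER,
    paid_link INTEGER,   -- 1 = paid link, 0 = generic search
    page_views INTEGER
);

INSERT INTO keyword_dim VALUES (1, 'data'), (2, 'warehouse'), (3, 'training');
INSERT INTO phrase_keyword_bridge VALUES
    (10, 1), (10, 2),            -- phrase 10: "data warehouse"
    (11, 1), (11, 2), (11, 3),   -- phrase 11: "data warehouse training"
    (12, 3);                     -- phrase 12: "training"
INSERT INTO visit_fact VALUES
    (100, 10, 0, 4), (101, 11, 1, 7), (102, 12, 0, 2), (103, 10, 1, 3);
""")


def visits_with_all_keywords(conn, keywords):
    """Return visits whose search phrase contains every keyword given."""
    placeholders = ",".join("?" for _ in keywords)
    sql = f"""
        SELECT f.visit_id, f.paid_link, f.page_views
        FROM visit_fact f
        WHERE f.phrase_key IN (
            SELECT b.phrase_key
            FROM phrase_keyword_bridge b
            JOIN keyword_dim k ON k.keyword_key = b.keyword_key
            WHERE k.keyword IN ({placeholders})
            GROUP BY b.phrase_key
            HAVING COUNT(DISTINCT k.keyword) = ?
        )
        ORDER BY f.visit_id
    """
    return conn.execute(sql, (*keywords, len(set(keywords)))).fetchall()


for row in visits_with_all_keywords(conn, ["data", "warehouse"]):
    print(row)
```

The HAVING clause is what turns "any of these keywords" into "all of these keywords". Whether a bridge table like this or some other layout suits a given workload would depend on query patterns and volumes.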
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Re: Text Analytics and Dimensional Models

Post  sgrover3 on Wed Apr 20, 2011 1:10 pm

There are some tools out there that "convert" unstructured data into structured data (tools like Attensity). Do we then have the same problem or bottleneck?
In the model you built, did the dimensions store large columns such as free text, or just keywords?

sgrover3

Posts : 8
Join date : 2011-04-14


Re: Text Analytics and Dimensional Models

Post  ngalemmo on Wed Apr 20, 2011 1:49 pm

There was a phrase table and a keyword table. The phrase table was used to control surrogate key assignment for the multivalued dimension and to reduce the amount of parsing to be done. The source data was search phrases, not long text like documents... they rarely exceeded a few words and were often duplicated. The ETL process (including the parsing) was handled using Informatica. Only new phrases were parsed, and after a while the new phrases encountered were a small fraction of all the phrases received.
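As a rough illustration of the "only parse new phrases" idea, here is a small Python sketch. The original process was built in Informatica, so this is not that implementation. The class name PhraseLoader, the whitespace and lowercase normalization, the regex tokenizer, and the in-memory dictionaries standing in for the phrase and keyword tables are all assumptions.

```python
import re


class PhraseLoader:
    """Hypothetical stand-in for the phrase table, keyword table, and bridge."""

    def __init__(self):
        self.phrase_keys = {}    # normalized phrase -> phrase surrogate key
        self.keyword_keys = {}   # keyword -> keyword surrogate key
        self.bridge = []         # (phrase_key, keyword_key) rows
        self.parse_count = 0     # how many phrases actually had to be parsed

    def _parse(self, phrase):
        """Split a phrase into distinct keywords (assumed tokenizer)."""
        self.parse_count += 1
        return sorted(set(re.findall(r"[a-z0-9]+", phrase)))

    def phrase_key_for(self, raw_phrase):
        """Return the surrogate key for a phrase, parsing it only if it is new."""
        phrase = " ".join(raw_phrase.lower().split())
        key = self.phrase_keys.get(phrase)
        if key is not None:
            return key           # seen before: no parsing needed

        key = len(self.phrase_keys) + 1
        self.phrase_keys[phrase] = key
        for word in self._parse(phrase):
            kw_key = self.keyword_keys.setdefault(word, len(self.keyword_keys) + 1)
            self.bridge.append((key, kw_key))
        return key


loader = PhraseLoader()
incoming = ["data warehouse", "Data  Warehouse", "warehouse training", "data warehouse"]
keys = [loader.phrase_key_for(p) for p in incoming]
print(keys, "parsed:", loader.parse_count, "of", len(incoming))
```

In this toy run, four incoming phrases lead to only two parses, because the repeats resolve through the phrase lookup. The fact rows would then store just the phrase key rather than the text itself.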
ngalemmo

Posts : 3000
Join date : 2009-05-15
Location : Los Angeles

http://aginity.com

Re: Text Analytics and Dimensional Models

Post  BoxesAndLines on Wed Apr 20, 2011 7:17 pm

Wouldn't Google be an example of this, big data that is? I don't think they're using dimensional models. There was a nice article on big data a while back in Information Management; here's a link
BoxesAndLines

Posts : 1212
Join date : 2009-02-03
Location : USA

