how to handle unstructured data?
how to handle unstructured data?
What approach have other people taken to process and store unstructured data? Inmon calls this textual ETL, as part of DW 2.0.
If we want to process text sources such as documents, emails, Facebook, Google Analytics, etc., what ETL tools or functions would you use to transform the text?
Where would you store the results? Assuming we have an existing dimensionally modelled data warehouse, do we create a separate set of tables that resides independently of our dims and facts?
Curious what the rest of the Kimball community are doing...
robber- Posts : 41
Join date : 2009-02-28
Location : Canada
Re: how to handle unstructured data?
I haven't ventured too far into the unstructured world. Unless you're doing big data, the best solution for unstructured data is to add structure. Informatica has screen scrapers, PDF scrapers, and just about any other type of report-scraping capability; I've used them to pull in externally produced soft-copy reports. While social media sentiment analysis gets lots of play in the media, I've not seen a whole lot of use of it in financial services, telecom, or even health care.
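To make "add structure" concrete, here is a rough sketch of scraping a plain-text report into rows that an ordinary ETL job could then load. It is only an illustration: it uses a Python regular expression rather than the Informatica tooling mentioned above, and the report layout, the column names, and the `parse_report` helper are all made up for the example.

```python
import re
from typing import Iterator

# Hypothetical soft-copy report; a real report's layout will differ.
SAMPLE_REPORT = """\
Branch Report    Date: 2009-02-28
Account   Region   Balance
A-1001    East     1,250.00
A-1002    West       310.75
Total                1,560.75
"""

# One detail row: account id, region, balance with optional thousands commas.
ROW_PATTERN = re.compile(
    r"^(?P<account>[A-Z]-\d+)\s+(?P<region>\w+)\s+(?P<balance>[\d,]+\.\d{2})$"
)


def parse_report(text: str) -> Iterator[dict]:
    """Yield one dict per detail row; titles, headers and totals are skipped."""
    for line in text.splitlines():
        match = ROW_PATTERN.match(line.strip())
        if match is None:
            continue  # not a detail row
        yield {
            "account": match["account"],
            "region": match["region"],
            "balance": float(match["balance"].replace(",", "")),
        }


rows = list(parse_report(SAMPLE_REPORT))
for row in rows:
    print(row)
```

Once the text is in rows like these, it can go through the usual staging and dimensional load like any other source.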
BoxesAndLines- Posts : 1212
Join date : 2009-02-03
Location : USA