Integrating Hadoop environment with a DW environment


Post  juanvg1972 on Mon Jan 18, 2016 7:50 am

I would like to know the different ways to integrate a DW environment with a Hadoop environment.

I mean the typical DW environment: PowerCenter + Teradata + MicroStrategy

My first option for integrating a Hadoop cluster with the DW is to use Hadoop as a first stage. I put all my source data (raw data) in the Hadoop cluster (structured, semi-structured and unstructured). The structured part of my source data I load via ETL into the Teradata RDBMS.
This structured data is the same data that I was integrating into the DW before adding Hadoop to my environment.

The big data that I cannot handle in my DW will stay in Hadoop, where I process and analyze it.
After processing, aggregating and consolidating, some of that big data will be transferred to the DW in order to enrich it.

My question is:

Is there another possible and useful architecture for integrating a typical DW and a Hadoop cluster? I suppose so; any advice will be greatly appreciated.

Thanks in advance

Juan


Re: Integrating Hadoop environment with a DW environment

Post  ngalemmo on Mon Jan 18, 2016 3:40 pm

Well... yeah, there are other ways to do this.

I do not recommend that you throw away existing processes for the sole purpose of using Hadoop, as in: "I put all my source data (raw data) in the Hadoop cluster (structured, semi-structured and unstructured)." If you have an existing DW and already have processes in place, why move them to Hadoop?

Each platform provides significant advantages depending on the nature of the data. Structured data performs extremely well in a traditional SQL environment, while unstructured, code-driven processing is very performant in a Hadoop environment.

Both are important components and should be leveraged for their strengths.

For example, you may have Hadoop-based analytics that identify consumer preferences from tweets. This analysis is going to produce structured information, which should then be integrated into the relational data warehouse for further use. The original data can be retained on Hadoop or discarded as desired. It would not go to the relational DW.
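
To make the tweet example concrete, here is a minimal sketch in Python of that hand-off: unstructured text is reduced to a small structured result set, and only that result is staged for the warehouse load. The brand list, the keyword lists and the CSV layout are all hypothetical, and the naive keyword matching merely stands in for whatever analytics would actually run on the Hadoop side.

```python
import csv
import io
from collections import Counter

# Hypothetical vocabulary; real preference analysis would be far richer.
BRANDS = {"acme", "globex"}
POSITIVE = {"love", "great", "like"}
NEGATIVE = {"hate", "bad", "awful"}


def classify(tweet):
    """Yield (brand, sentiment) pairs for each known brand mentioned in a tweet."""
    words = set(tweet.lower().split())
    if words & POSITIVE:
        sentiment = "positive"
    elif words & NEGATIVE:
        sentiment = "negative"
    else:
        sentiment = "neutral"
    for brand in sorted(words & BRANDS):
        yield brand, sentiment


def summarize(tweets):
    """Collapse raw tweets into structured rows: (brand, sentiment, mentions)."""
    counts = Counter()
    for tweet in tweets:
        counts.update(classify(tweet))
    return [(brand, sentiment, n) for (brand, sentiment), n in sorted(counts.items())]


def to_csv(rows):
    """Render the rows as CSV text, a stand-in for a warehouse staging file."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["brand", "sentiment", "mentions"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    raw = [
        "I love my new acme phone",
        "acme support is great",
        "globex delivery was awful",
        "nothing to see here",
    ]
    print(to_csv(summarize(raw)))
```

The raw tweets never leave the function that summarizes them; only the few aggregated rows would be picked up by the existing ETL and loaded into the relational DW.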

DW and Hadoop

Post  juanvg1972 on Mon Jan 18, 2016 5:25 pm

Thanks Galemmo,

I don't mean throwing away existing processes. The ETL processes of my DW remain the same, but the starting point for the raw data is the Hadoop cluster instead of a normal server. I don't change my DW processes, only the starting point. This way, I can have all my raw data in Hadoop.

I understand your idea. One question: what is code-driven data?


Thanks in advance,


Re: Integrating Hadoop environment with a DW environment

Post  ngalemmo on Mon Jan 18, 2016 8:29 pm

Well, if you are using the map/reduce model, you are coding map and reduce classes in Java to do the work you need to do. The logic and complexity of that work is too much for a typical relational environment. The need to construct these classes is why I refer to it as 'code driven'. Hadoop is basically a framework that allows you to execute objects of these classes in a massively parallel environment.
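
For a feel of what 'code driven' means, here is a minimal single-process sketch of the map/reduce model in Python. It is not the Hadoop API (there you would write Mapper and Reducer classes in Java), and the function names are made up for illustration. It only shows the three phases that the framework would run in parallel across a cluster: map, shuffle by key, and reduce.

```python
from collections import defaultdict


def map_phase(record):
    """Mapper: emit a (key, value) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values seen for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce sequentially; a cluster would parallelize them."""
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(run_job(["big data big cluster", "data warehouse"]))
```

All of the business logic lives in the two functions you write yourself, which is the point: the framework supplies the distribution and grouping, and your code supplies everything else.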
