A Comparison of Four Workflow Engines

HamakeComparisonWithOtherWorkflowEngines  

Phase-Support
Updated Jun 6, 2010 by vorl.s...@gmail.com

The table below attempts to compare Hamake with similar workflow engines for Hadoop (Oozie, Azkaban, Cascading) based on some key features. Although all of these systems could be used to solve similar problems, they differ significantly in design, philosophy, target user profile, usage scenarios, etc., so our feature-by-feature comparison is in no way conclusive. Please use it as a guideline, but read each system's documentation to better understand which one is more suitable for your problem.

| Feature | Hamake | Oozie | Azkaban | Cascading |
|---|---|---|---|---|
| Workflow description language | XML | XML (xPDL-based) | text file with key/value pairs | Java API |
| Dependency mechanism | data-driven | explicit | explicit | explicit |
| Requires Servlet/JSP container | no | yes | yes | no |
| Workflow progress tracking | console/log messages | web page | web page | Java API |
| Can schedule a Hadoop job to run at a given time | no | yes | yes | yes |
| Execution model | command-line utility | daemon | daemon | API |
| Can run Pig Latin scripts | yes | yes | yes | yes |
| Event notification | no | no | no | yes |
| Requires installation | no | yes | yes | no |
| Supported Hadoop versions | 0.18+ | 0.20+ | currently unknown | 0.18+ |
| Retries | no | at workflow node level | yes | yes |
| Can run arbitrary commands | yes | yes | yes | yes |
| Can run on Amazon EMR | yes | no | currently unknown | yes |

From the FAQ:

What is the difference between Hamake and Cascading?

In short: Cascading is an API, while Hamake is a utility (the sketch after this list illustrates the contrast). Some differences:

  • Hamake does not require any custom programming. It helps you automate running your existing Hadoop tasks and Pig scripts.
  • We found Hamake especially suitable for incremental processing of datasets.
  • You can use Hamake to automate tasks written in other languages, for example via Hadoop Streaming.
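
To make the "API vs. utility" distinction concrete, here is a minimal word-count-style sketch of what a Cascading workflow looks like: it is ordinary Java code that you compile and ship, rather than a declarative workflow file. The class names follow the Cascading 1.x API as we understand it; treat this as an illustrative sketch, not a definitive example, and check it against the documentation for the Cascading version you use.

```java
import java.util.Properties;

import cascading.flow.Flow;
import cascading.flow.FlowConnector;
import cascading.operation.aggregator.Count;
import cascading.operation.regex.RegexGenerator;
import cascading.pipe.Each;
import cascading.pipe.Every;
import cascading.pipe.GroupBy;
import cascading.pipe.Pipe;
import cascading.scheme.TextLine;
import cascading.tap.Hfs;
import cascading.tap.Tap;
import cascading.tuple.Fields;

public class WordCountFlow {
  public static void main(String[] args) {
    // HDFS paths for the input text and the output counts, passed on the command line
    Tap source = new Hfs(new TextLine(), args[0]);
    Tap sink = new Hfs(new TextLine(), args[1]);

    // Pipe assembly: split each line into words, group by word, count occurrences
    Pipe assembly = new Pipe("wordcount");
    assembly = new Each(assembly, new Fields("line"),
        new RegexGenerator(new Fields("word"), "\\S+"));
    assembly = new GroupBy(assembly, new Fields("word"));
    assembly = new Every(assembly, new Count());

    // Plan the assembly into one or more Hadoop jobs and run them
    Flow flow = new FlowConnector(new Properties()).connect(source, sink, assembly);
    flow.complete();
  }
}
```

With Hamake, by contrast, a step like this would simply reference an existing job JAR or Pig script from the XML workflow file, and Hamake would decide at run time whether the step needs to execute based on the state of its input and output data.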

How does Hamake differ from Oozie and Azkaban?

Oozie and Azkaban are server-side systems that have to be installed and run as a service. Hamake is a lightweight client-side utility that requires no installation and has a very simple syntax for workflow definition. Most importantly, Hamake is built on dataflow programming principles: the execution sequence of your Hadoop tasks is controlled by the data.
