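Background
So far, my understanding of Oozie goes no further than this: it is a job scheduling system. As for whether Hadoop performs noticeably better with Oozie than without it, the official site gives no comparative figures, so the difference cannot be seen directly. What Oozie does add over plain Hadoop is the ability to define a workflow, so that jobs can be triggered by time within a specified period. For jobs that have to be run repeatedly, that is a real convenience.
Below is the definition given on the official Oozie site: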
Oozie is a workflow scheduler system to manage Apache Hadoop jobs.
Oozie Workflow jobs are Directed Acyclical Graphs (DAGs) of actions.
Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability.
Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as Java programs and shell scripts).
Oozie is a scalable, reliable and extensible system.
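To make the two central ideas in that definition concrete for myself, here is a small Python sketch. It is not Oozie code and uses none of Oozie's APIs. The action names, the `execution_order` and `nominal_times` helpers, and the six-hour frequency are all made up for illustration. It only shows what "a DAG of actions" and "triggered by time (frequency)" mean in principle.

```python
from datetime import datetime, timedelta
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each action maps to the set of actions it depends on.
# These names are illustrative only and do not come from Oozie.
workflow = {
    "import-data": set(),
    "clean-data": {"import-data"},
    "aggregate": {"clean-data"},
    "export-report": {"aggregate"},
}


def execution_order(dag):
    """Return one valid run order for the actions.

    Raises graphlib.CycleError if the graph contains a cycle,
    which is why a workflow has to be acyclic.
    """
    return list(TopologicalSorter(dag).static_order())


def nominal_times(start, end, frequency):
    """List the times in [start, end) at which a time-triggered job would fire."""
    times = []
    current = start
    while current < end:
        times.append(current)
        current += frequency
    return times


order = execution_order(workflow)
runs = nominal_times(
    datetime(2024, 1, 1), datetime(2024, 1, 2), timedelta(hours=6)
)

print(order)
for run in runs:
    print(run.isoformat())
```

In real Oozie the workflow and the coordinator are described in XML definitions rather than in Python, and a coordinator can also wait for input data to become available, which this sketch leaves out.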
Similar products include Amazon Data Pipeline, Simple Workflow Engine, Azkaban, Cascading, and Hamake. I have not looked into how they compare yet, so I will not presume to pass judgment on them.
Hortonworks Data Platform, Cloudera Manager, Amazon