Our Oozie Tutorials will cover most of the available workflow actions with and without Kerberos authentication.
Let’s have a look at some basic concepts of Oozie.
What is Oozie?
Oozie is an open source workflow management system. We can use it to schedule Hadoop jobs, including Hive, Pig, Sqoop and other actions. Oozie can trigger workflows based on data availability, job dependency, scheduled time, etc.
More information about Oozie is available here.
Oozie Workflow:
An Oozie workflow is a DAG (directed acyclic graph) that contains a collection of actions. The DAG contains two types of nodes: action nodes and control nodes. An action node is responsible for executing a task such as MapReduce, Pig or Hive; we can also execute shell scripts using an action node. A control node is responsible for the execution order of actions.
Oozie Co-ordinator:
In production systems it is often necessary to run Oozie workflows at a regular time interval, to trigger workflows when input data becomes available, or to execute workflows after a dependent job completes. This can be achieved with an Oozie co-ordinator job.
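As a rough illustration, a co-ordinator that runs a workflow once a day could be defined as below. This is a minimal sketch, not a definition taken from this tutorial's cluster: the name shell-coord and the properties start, end and workflowAppUri are placeholders that would have to be supplied in the co-ordinator's job.properties.

```shell
# Write a minimal coordinator.xml (sketch). The quoted 'EOF' keeps the shell
# from expanding ${...}; Oozie resolves those placeholders at submit time.
cat > coordinator.xml <<'EOF'
<coordinator-app name="shell-coord" frequency="${coord:days(1)}"
                 start="${start}" end="${end}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.2">
    <action>
        <workflow>
            <!-- HDFS directory that holds the workflow.xml to run -->
            <app-path>${workflowAppUri}</app-path>
        </workflow>
    </action>
</coordinator-app>
EOF
```

A co-ordinator is submitted in the same way as a workflow, except that job.properties points to it with oozie.coord.application.path instead of oozie.wf.application.path.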
Oozie Bundle jobs:
A bundle is a set of Oozie co-ordinators. It lets us start, stop, suspend and resume multiple co-ordinators together instead of managing each one separately.
Oozie Launcher:
The Oozie launcher is a map-only job that runs on the Hadoop cluster. For example, to run a Hive script without Oozie, you can run the “hive -f <hql-script-name>” command from any edge node; this directly invokes the Hive CLI installed on that edge node, which executes the queries in the HQL script. Oozie handles this differently: it first runs a launcher job on the Hadoop cluster, which is a map-only job, and the launcher then triggers further MapReduce jobs (if required) by calling the client APIs for Hive, Pig, etc., as defined in workflow.xml.
Let’s get started by running a shell action in an Oozie workflow.
Step 1: Create a sample shell script and upload it to HDFS
[root@sandbox shell]# cat ~/sample.sh
#!/bin/bash
echo "`date` hi" > /tmp/output
hadoop fs -put sample.sh /user/root/
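Before uploading, it can help to run the script once locally. The following is a minimal sketch that assumes a throwaway working directory; it uses a variant of the script that accepts an optional output path (an addition for illustration, not part of the original sample.sh) so that the test run does not touch /tmp/output.

```shell
# Create a scratch directory for the local test run.
workdir=$(mktemp -d)

# Variant of sample.sh: the first argument overrides the output path
# (hypothetical addition; the default matches the tutorial's /tmp/output).
cat > "$workdir/sample.sh" <<'EOF'
#!/bin/bash
out="${1:-/tmp/output}"
echo "$(date) hi" > "$out"
EOF

# Run it once and show what it wrote.
bash "$workdir/sample.sh" "$workdir/output"
cat "$workdir/output"
```

The printed line should be the current date followed by "hi", which is the same shape of output that Step 7 expects from the cluster run.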
Step 2: Create a job.properties file that matches your cluster configuration.
[root@sandbox shell]# cat job.properties
nameNode=hdfs://<namenode-hostname>:8020
jobTracker=<resource-manager-hostname>:8050
queueName=default
examplesRoot=examples
oozie.wf.application.path=${nameNode}/user/${user.name}
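A missing property is a common reason for a submission to fail. As a hedged sketch, the loop below checks that a properties file defines the keys this tutorial's workflow relies on. The file name job.properties.sample and the hostnames in it are placeholders, and the key list is an assumption based on the workflow.xml used in Step 3.

```shell
# Example file with placeholder hostnames (replace with your cluster's values).
props=job.properties.sample
cat > "$props" <<'EOF'
nameNode=hdfs://namenode.example.com:8020
jobTracker=resourcemanager.example.com:8050
queueName=default
oozie.wf.application.path=${nameNode}/user/${user.name}
EOF

# Collect every key that the workflow expects but the file does not define.
missing=""
for key in nameNode jobTracker queueName oozie.wf.application.path; do
    grep -q "^${key}=" "$props" || missing="$missing $key"
done

if [ -n "$missing" ]; then
    echo "missing properties:$missing"
else
    echo "$props looks complete"
fi
```

To check a real file, set props to its path and skip the step that writes the example.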
Step 3: Create a workflow.xml file for your shell action.
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<workflow-app xmlns="uri:oozie:workflow:0.3" name="shell-wf">
    <start to="shell-node"/>
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>sample.sh</exec>
            <file>/user/root/sample.sh</file>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
Step 4: Upload the workflow.xml file created in the previous step to HDFS, into the directory set as oozie.wf.application.path in job.properties.
hadoop fs -copyFromLocal -f workflow.xml /user/root/
Step 5: Submit the Oozie workflow by running the command below.
oozie job -oozie http://<oozie-host>:11000/oozie -config job.properties -run
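On submission the CLI prints the new job's ID in the form "job: <id>". A minimal sketch for capturing that ID and querying the job from the command line is shown below. The helper name extract_job_id is made up for this example, the sample ID is illustrative only, and the real oozie calls are left commented out because they need a live Oozie server.

```shell
# Pull the ID out of the "job: <id>" line that `oozie job ... -run` prints.
extract_job_id() {
    sed -n 's/^job: *//p'
}

# Demonstration with a made-up ID in place of real CLI output.
job_id=$(echo "job: 0000001-160403190000000-oozie-oozi-W" | extract_job_id)
echo "$job_id"

# On a real cluster (same placeholders as in the submit command above):
# job_id=$(oozie job -oozie http://<oozie-host>:11000/oozie -config job.properties -run | extract_job_id)
# oozie job -oozie http://<oozie-host>:11000/oozie -info "$job_id"
# oozie job -oozie http://<oozie-host>:11000/oozie -log "$job_id"
```

The -info call reports the workflow and per-action status, and -log prints the job log, which is useful when the web UI is not reachable.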
Step 6: Check the Oozie UI to get the status of the workflow.
http://<oozie-host>:11000/oozie
If you click a workflow ID, you will get the detailed status of each action (see the screenshot below).
To check the logs of a running action, click Action Id 2 (the shell-node action) and then open the Console URL by clicking the magnifier icon at the end of it.
Step 7: Check the final output from the command line. Note that the script ran on the NodeManager host where the Oozie launcher was started, so /tmp/output exists only on that node and the command below must be run there.
[root@sandbox shell]# cat /tmp/output
Sun Apr 3 19:44:52 UTC 2016 hi