Basics of Oozie and Oozie SHELL action

Our Oozie Tutorials will cover most of the available workflow actions with and without Kerberos authentication.

Let’s have a look at some basic concepts of Oozie.

 

What is Oozie?

Oozie is an open source workflow management system for Hadoop. It can schedule Hadoop jobs, including Hive, Pig, and Sqoop actions, and it can trigger workflows based on data availability, job dependencies, or a scheduled time.

More information about Oozie is available in the official Apache Oozie documentation.

 

 

[Figure: Oozie architecture]

 

Oozie Workflow:

An Oozie workflow is a DAG (directed acyclic graph) of actions. The DAG contains two types of nodes: action nodes and control nodes. An action node executes a task such as a MapReduce, Pig, or Hive job; it can also run a shell script. Control nodes determine the order in which the actions execute.

 

[Figure: MapReduce workflow DAG]

 

Oozie Co-ordinator:

In production systems it is often necessary to run Oozie workflows at a regular time interval, to trigger them when input data becomes available, or to run them after a dependent job completes. An Oozie co-ordinator job covers all three cases.
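To make the time-based case concrete, here is a minimal sketch of a co-ordinator definition that runs a workflow once a day. The name shell-coord and the properties start, end, and workflowAppUri are assumptions made up for this example; you would set them in job.properties (along with oozie.coord.application.path) to match your own cluster.

```shell
#!/bin/bash
# Write a minimal coordinator.xml into the current directory.
# The quoted 'EOF' stops the shell from expanding ${...}, so the
# Oozie EL expressions reach the file unchanged.
cat > coordinator.xml <<'EOF'
<coordinator-app name="shell-coord" frequency="${coord:days(1)}"
                 start="${start}" end="${end}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.2">
    <action>
        <workflow>
            <!-- workflowAppUri is an assumed property: the HDFS directory holding workflow.xml -->
            <app-path>${workflowAppUri}</app-path>
        </workflow>
    </action>
</coordinator-app>
EOF
```

The start and end values are UTC timestamps such as 2016-04-04T00:00Z. Data-availability triggers need additional dataset and input-event elements, which this sketch leaves out.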

 

[Figure: Oozie co-ordinator]

 

Oozie Bundle jobs:

A bundle is a set of Oozie co-ordinators that can be started, stopped, suspended, and resumed together, which makes multiple co-ordinators easier to manage.

 

 

Oozie Launcher:

The Oozie launcher is a map-only job that runs on the Hadoop cluster. Suppose you want to run a Hive script. From an edge node you can simply run “hive -f <hql-script-name>”; this invokes the Hive CLI installed on that edge node, which executes the queries in the HQL script. Oozie handles this differently: it first runs a launcher job (a map-only job) on the Hadoop cluster, and the launcher then triggers further MapReduce jobs, if required, by calling the client APIs for the Hive, Pig, or other actions defined in workflow.xml.

 

Let’s get started by running a shell action in an Oozie workflow.

 

Step 1: Create a sample shell script and upload it to HDFS

[root@sandbox shell]# cat ~/sample.sh
#!/bin/bash
echo "`date` hi" > /tmp/output

 

hadoop fs -put sample.sh /user/root/
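Before uploading, it can help to run the script locally once. The sketch below is a hypothetical dry run, not part of the original steps: it writes a variant of sample.sh into a scratch directory and adds an OUTPUT_FILE override (an assumption of this example) so the test run does not touch /tmp/output.

```shell
#!/bin/bash
# Create a scratch directory for the dry run.
workdir=$(mktemp -d)

# Same logic as sample.sh, except that the target path can be overridden.
# OUTPUT_FILE is an assumed variable; without it the script writes /tmp/output.
cat > "$workdir/sample.sh" <<'EOF'
#!/bin/bash
echo "$(date) hi" > "${OUTPUT_FILE:-/tmp/output}"
EOF
chmod +x "$workdir/sample.sh"

# Run the script with the override and show what it wrote.
OUTPUT_FILE="$workdir/output" "$workdir/sample.sh"
cat "$workdir/output"
```

If the last command prints a date followed by "hi", the script itself is fine and any later failure is more likely to come from the workflow configuration.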

 

Step 2: Create job.properties file according to your cluster configuration.

[root@sandbox shell]# cat job.properties
nameNode=hdfs://<namenode-hostname>:8020
jobTracker=<resource-manager-hostname>:8050
queueName=default
examplesRoot=examples
oozie.wf.application.path=${nameNode}/user/${user.name}
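A missing key in job.properties usually only shows up when the job is submitted, so a quick local check can save a round trip. The helper below is a rough sketch; check_props is a made-up name, and the key list simply mirrors the properties that the workflow in this tutorial refers to.

```shell
#!/bin/bash
# check_props FILE: print every expected key that FILE does not define.
# Returns 0 when all keys are present, 1 otherwise.
check_props() {
  local file=$1 missing=0 key
  for key in nameNode jobTracker queueName oozie.wf.application.path; do
    if ! grep -q "^${key}=" "$file"; then
      echo "missing: ${key}"
      missing=1
    fi
  done
  return "$missing"
}

# Demonstration on a scratch file that lacks queueName.
demo=$(mktemp)
printf 'nameNode=hdfs://nn-host:8020\njobTracker=rm-host:8050\n' > "$demo"
printf 'oozie.wf.application.path=${nameNode}/user/${user.name}\n' >> "$demo"
check_props "$demo" || echo "fix the keys listed above before submitting"
```

Against the real file you would call check_props job.properties from the directory that contains it.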

 

Step 3: Create workflow.xml file for your shell action.

<!--
 Licensed to the Apache Software Foundation (ASF) under one
 or more contributor license agreements. See the NOTICE file
 distributed with this work for additional information
 regarding copyright ownership. The ASF licenses this file
 to you under the Apache License, Version 2.0 (the
 "License"); you may not use this file except in compliance
 with the License. You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->
<workflow-app xmlns="uri:oozie:workflow:0.3" name="shell-wf">
    <start to="shell-node"/>
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>sample.sh</exec>
            <file>/user/root/sample.sh</file>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

 

Step 4: Upload the workflow.xml file created in the previous step to HDFS, at the oozie.wf.application.path set in job.properties.

hadoop fs -copyFromLocal -f workflow.xml /user/root/

 

Step 5: Submit the Oozie workflow by running the command below.

oozie job -oozie http://<oozie-host>:11000/oozie -config job.properties -run
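On success the submit command prints a line of the form "job: <workflow-id>". If you prefer the command line to the UI, you can capture that ID and pass it to oozie job -info. The sketch below assumes that output format; extract_job_id is a made-up helper name, and the ID in the demonstration is invented.

```shell
#!/bin/bash
# extract_job_id: read the output of `oozie job ... -run` on stdin
# and print only the ID from the "job: <id>" line.
extract_job_id() {
  sed -n 's/^job: *//p'
}

# Against a real server (replace <oozie-host> with your own host):
#   job_id=$(oozie job -oozie http://<oozie-host>:11000/oozie \
#              -config job.properties -run | extract_job_id)
#   oozie job -oozie http://<oozie-host>:11000/oozie -info "$job_id"

# Offline demonstration with an invented ID:
echo "job: 0000001-160404000000000-oozie-oozi-W" | extract_job_id
```

The -info call reports the workflow status and the status of each action, which is the same information the UI shows in the next step.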

 

Step 6: Check the Oozie UI to get the status of the workflow.

http://<oozie-host>:11000/oozie

 

[Screenshot: workflow status in the Oozie UI]

 

If you click the workflow ID, you will see the detailed status of each action (see the screenshot below).

 

[Screenshot: detailed status of each action in the workflow]

 

To check the logs for a running action, click Action Id 2 (the shell-node action) and then open the Console URL by clicking the magnifier icon at the end of it.

 

[Screenshots: shell-node action details and the job logs behind the Console URL]

 

Step 7: Check the final output from the command line. Note that you need to run the command below on the NodeManager host where the Oozie launcher ran, because the script writes to that host's local /tmp directory.

[root@sandbox shell]# cat /tmp/output
Sun Apr 3 19:44:52 UTC 2016 hi

 

Please comment if you have any feedback/questions/suggestions. Happy Hadooping!! :)
