1. Oozie usage
- Developing the workflow configuration files
    - Azkaban:
        - a .job file defines each program
        - a .zip archive defines the workflow
    - Oozie:
        - workflow.xml: the configuration file that defines the workflow
            - stored on HDFS
            - the XML defines actions; each action is one job
            - defines the DAG
        - job.properties: the bootstrap configuration file for running the workflow
            - specifies the HDFS location of the workflow definition file [workflow.xml]
            - defines parameter variables
- Node: a job in the workflow
    - multiple actions defined in the XML build up the workflow
    - Control node: used to control the execution flow of the program
        - start: the entry node of the workflow
        - end: the normal end node
        - kill: the node that force-terminates the workflow
        - fork: branch; several programs depend on one program
        - join: merge; one program depends on several programs
    - workflow.xml example with fork/join (skeleton; action bodies elided as comments):
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    <start to="action1"/>
    <action name="action1">
        <!-- first processing step -->
        <ok to="forknode"/>
        <error to="kill"/>
    </action>
    <fork name="forknode">
        <path start="action2"/>
        ...
        <path start="action3"/>
    </fork>
    <action name="action2">
        <!-- processing step -->
        <ok to="joinnode"/>
        <error to="kill"/>
    </action>
    <action name="action3">
        <!-- processing step -->
        <ok to="joinnode"/>
        <error to="kill"/>
    </action>
    <join name="joinnode" to="action4"/>
    ...
</workflow-app>
    - decision: conditional branch (switch/case)
<decision name="check-output">
<switch>
<case to="action1">
${wf:actionData('shell-node')['my_output'] eq 'Hello Oozie'}
</case>
<case to="action2">
${wf:actionData('shell-node')['my_output'] eq 'Hello Hue'}
</case>
<default to="action3"/>
</switch>
</decision>
<action name="action1">
    <!-- first kind of processing -->
    <ok to="end"/>
    <error to="kill"/>
</action>
<action name="action2">
    <!-- second kind of processing -->
    <ok to="end"/>
    <error to="kill"/>
</action>
<action name="action3">
    <!-- default processing -->
    <ok to="end"/>
    <error to="kill"/>
</action>
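    - The decision above reads my_output from the output of an upstream shell action named shell-node (the name comes from the EL expression). A minimal sketch of what such an action could look like; the script name check.sh is a placeholder, and <capture-output/> is what makes the script's stdout readable via wf:actionData():
<action name="shell-node">
    <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>check.sh</exec>
        <capture-output/> <!-- stdout must contain key=value lines, e.g. my_output=Hello Oozie -->
    </shell>
    <ok to="check-output"/>
    <error to="kill"/>
</action>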
    - Action node: the program you actually want to run
        - a MapReduce program
        - a Hive program
        - a shell script
2. Official examples
- Enter the Oozie home directory
cd /export/servers/oozie-4.1.0-cdh5.14.0/
- Extract the official examples
tar -zxvf oozie-examples.tar.gz
- Create a directory for our own cases
mkdir usercase
3. Shell action
- Copy the official example
cp -r examples/apps/shell/ usercase/
- Edit job.properties
#define some properties
nameNode=hdfs://node-01:8020
jobTracker=node-03:8032
queueName=default
EXEC=user.sh
#HDFS location of the workflow application
oozie.wf.application.path=${nameNode}/user/oozie/app/shell
- Create the case directory on HDFS
hdfs dfs -mkdir -p /user/oozie/app
- Edit workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
<start to="shell-node"/>
<action name="shell-node">
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>${EXEC}</exec>
<file>${nameNode}/user/oozie/app/shell/${EXEC}#${EXEC}</file>
<capture-output/>
</shell>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
- Write a custom script
vim usercase/shell/user.sh
#!/bin/bash
echo "this is a bigdata class" >> /export/datas/oozie.txt
#add execute permission
chmod u+x usercase/shell/user.sh
- Upload to HDFS
hdfs dfs -put usercase/shell /user/oozie/app/
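- Optionally, the workflow definition can be checked against the Oozie XML schema before submission (a sketch, assuming the validate sub-command of the Oozie 4.x CLI):
bin/oozie validate usercase/shell/workflow.xml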
- Run
    - Oozie reads the local job.properties to find the location of workflow.xml on HDFS
bin/oozie job -oozie http://node-01:11000/oozie -config usercase/shell/job.properties -run
- Monitoring
    - Note: if the job fails, check the logs as follows
        - Step 1: check the logs inside Oozie
        - Step 2: check the application errors on YARN
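- A minimal sketch of the corresponding CLI checks; the workflow job ID and YARN application ID below are placeholders (the real workflow ID is printed by the -run command):
# status and per-action state of the workflow
bin/oozie job -oozie http://node-01:11000/oozie -info 0000000-200626120000000-oozie-oozi-W
# Oozie-side log of the workflow
bin/oozie job -oozie http://node-01:11000/oozie -log 0000000-200626120000000-oozie-oozi-W
# YARN-side logs of the launched application
yarn logs -applicationId application_1593000000000_0001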
4. MapReduce action
- Copy the official example
cp -r examples/apps/map-reduce usercase/
- Put the program jar package in place (see the sketch below)
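- Oozie adds jars found in the lib/ subdirectory of the workflow application to the action's classpath; a sketch, assuming the WordCount classes referenced below are packed in a jar (jar name and path are placeholders):
mkdir -p usercase/map-reduce/lib
cp /path/to/wordcount.jar usercase/map-reduce/lib/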
- Edit job.properties
nameNode=hdfs://node-01:8020
jobTracker=node-03:8032
queueName=default
oozie.wf.application.path=${nameNode}/user/oozie/app/map-reduce
- Edit workflow.xml
    - The official example program uses Hadoop 1 properties throughout
    - Our actual environment is Hadoop 2, so the following properties must be added to enable the new MapReduce API:
<property>
<name>mapred.reducer.new-api</name>
<value>true</value>
</property>
<property>
<name>mapred.mapper.new-api</name>
<value>true</value>
</property>
<workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
<start to="mr-node"/>
<action name="mr-node">
<map-reduce>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/wordcount/output-oozie"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>mapred.reducer.new-api</name>
<value>true</value>
</property>
<property>
<name>mapred.mapper.new-api</name>
<value>true</value>
</property>
<property>
<name>mapreduce.job.map.class</name>
<value>com.bigdata.hanjiaxiaozhi.mapreduce.wc.WCMapper</value>
</property>
<property>
<name>mapreduce.job.reduce.class</name>
<value>com.bigdata.hanjiaxiaozhi.mapreduce.wc.WCReduce</value>
</property>
<property>
<name>mapreduce.job.maps</name>
<value>1</value>
</property>
<property>
<name>mapreduce.input.fileinputformat.inputdir</name>
<value>/wordcount/input</value>
</property>
<property>
<name>mapreduce.output.fileoutputformat.outputdir</name>
<value>/wordcount/output-oozie</value>
</property>
<property>
<name>mapreduce.map.output.key.class</name>
<value>org.apache.hadoop.io.Text</value>
</property>
<property>
<name>mapreduce.map.output.value.class</name>
<value>org.apache.hadoop.io.IntWritable</value>
</property>
</configuration>
</map-reduce>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
- Upload
hdfs dfs -put usercase/map-reduce /user/oozie/app/
- Run
bin/oozie job -oozie http://node-01:11000/oozie -config usercase/map-reduce/job.properties -run
- Observe YARN
    - Two MapReduce jobs are launched
    - The first MapReduce job is the Oozie launcher that wraps the workflow action
    - The second MapReduce job is the actual WordCount
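- For reference, a minimal sketch of what the WCMapper/WCReduce classes referenced in workflow.xml might look like against the new (org.apache.hadoop.mapreduce) API; the package and class names come from the configuration above, the method bodies are assumptions:
// WCMapper.java
package com.bigdata.hanjiaxiaozhi.mapreduce.wc;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// emits (word, 1) for every word in the input line
public class WCMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String w : value.toString().split("\\s+")) {
            if (!w.isEmpty()) {
                word.set(w);
                context.write(word, ONE);
            }
        }
    }
}

// WCReduce.java (same package; imports as above plus org.apache.hadoop.mapreduce.Reducer)
// sums the counts per word, matching the Text/IntWritable map output classes configured above
public class WCReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable total = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        total.set(sum);
        context.write(key, total);
    }
}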
5. Hive action
- Copy the official example
cp -r examples/apps/hive2 usercase/
- Edit job.properties
nameNode=hdfs://node-01:8020
jobTracker=node-03:8032
queueName=default
jdbcURL=jdbc:hive2://node-03:10000/default
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/oozie/app/hive2
- Edit workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.5" name="hive2">
<start to="hive2-node"/>
<action name="hive2-node">
<hive2 xmlns="uri:oozie:hive2-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="/user/oozie/app/hive2/output-data/hive2"/>
<mkdir path="/user/oozie/app/hive2/input-table/table"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<jdbc-url>${jdbcURL}</jdbc-url>
<script>script.q</script>
<param>INPUT=/user/oozie/app/hive2/input-table/table</param>
<param>OUTPUT=/user/oozie/app/hive2/output-data/hive2</param>
</hive2>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Hive2 (Beeline) action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
- Edit the SQL script (script.q)
DROP TABLE IF EXISTS test;
CREATE EXTERNAL TABLE test (a INT) STORED AS TEXTFILE LOCATION '${INPUT}';
insert into table test values(1);
insert into table test values(2);
insert into table test values(3);
INSERT OVERWRITE DIRECTORY '${OUTPUT}' SELECT * FROM test;
- Upload
hdfs dfs -put usercase/hive2 /user/oozie/app/
- Run
bin/oozie job -oozie http://node-01:11000/oozie -config usercase/hive2/job.properties -run
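- Once the workflow succeeds, the exported result can be inspected directly on HDFS (the path comes from the OUTPUT param above); it should contain the three rows inserted by the script:
hdfs dfs -cat /user/oozie/app/hive2/output-data/hive2/*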
6. Workflow chaining
- Copy one of the previous apps as a starting point
cp -r usercase/map-reduce usercase/flow
- Desired effect
    - run the MapReduce job first
    - then run the shell script
    - finally run the Hive job
- Edit job.properties
#define the variables we need
nameNode=hdfs://node-01:8020
jobTracker=node-03:8032
queueName=default
#HDFS location of the workflow.xml file
oozie.wf.application.path=${nameNode}/user/oozie/app/flow
#HiveServer2 address
jdbcURL=jdbc:hive2://node-03:10000/default
#use the Oozie system share lib on HDFS (needed by the hive2 action)
oozie.use.system.libpath=true
EXEC=user.sh
- Edit workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.2" name="map-reduce-wf">
<start to="mr-node"/>
<action name="mr-node">
<map-reduce>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/wordcount/output-oozie"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>mapred.reducer.new-api</name>
<value>true</value>
</property>
<property>
<name>mapred.mapper.new-api</name>
<value>true</value>
</property>
<property>
<name>mapreduce.job.map.class</name>
<value>com.bigdata.hanjiaxiaozhi.mapreduce.wc.WCMapper</value>
</property>
<property>
<name>mapreduce.job.reduce.class</name>
<value>com.bigdata.hanjiaxiaozhi.mapreduce.wc.WCReduce</value>
</property>
<property>
<name>mapreduce.job.maps</name>
<value>1</value>
</property>
<property>
<name>mapreduce.input.fileinputformat.inputdir</name>
<value>/wordcount/input</value>
</property>
<property>
<name>mapreduce.output.fileoutputformat.outputdir</name>
<value>/wordcount/output-oozie</value>
</property>
<property>
<name>mapreduce.map.output.key.class</name>
<value>org.apache.hadoop.io.Text</value>
</property>
<property>
<name>mapreduce.map.output.value.class</name>
<value>org.apache.hadoop.io.IntWritable</value>
</property>
</configuration>
</map-reduce>
<ok to="shell-node"/>
<error to="fail"/>
</action>
<action name="shell-node">
<shell xmlns="uri:oozie:shell-action:0.2">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<exec>${EXEC}</exec>
<file>${nameNode}/user/oozie/app/shell/${EXEC}#${EXEC}</file> <!--Copy the executable to compute node's current working directory -->
<capture-output/>
</shell>
<ok to="hive2-node"/>
<error to="fail"/>
</action>
<action name="hive2-node">
<hive2 xmlns="uri:oozie:hive2-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="/user/oozie/app/hive2/output-data/hive2"/>
<mkdir path="/user/oozie/app/hive2/input-table/table"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
</configuration>
<jdbc-url>${jdbcURL}</jdbc-url>
<script>script.q</script>
<param>INPUT=/user/oozie/app/hive2/input-table/table</param>
<param>OUTPUT=/user/oozie/app/hive2/output-data/hive2</param>
</hive2>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Workflow action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
- Upload
hdfs dfs -put usercase/flow /user/oozie/app/
- Submit
bin/oozie job -oozie http://node-01:11000/oozie -config usercase/flow/job.properties -run
7. Coordinator
- Scheduled execution
    - implements scheduling based on time dependencies
- Usage
    - job.properties
        - defines variables
        - points to the location of workflow.xml
        - points to the location of coordinator.xml
    - workflow.xml
        - the workflow definition
    - coordinator.xml
        - configures the time-based schedule
- Copy the official example
cp -r examples/apps/cron usercase/
- Edit job.properties
nameNode=hdfs://node-01:8020
jobTracker=node-03:8032
queueName=default
oozie.coord.application.path=${nameNode}/user/oozie/app/cron
start=2020-06-26T16:25+0800
end=2020-06-28T00:00+0800
workflowAppUri=${nameNode}/user/oozie/app/cron
- Edit coordinator.xml
    - Option 1: specify a frequency with an EL function
    frequency="${coord:minutes(10)}" runs once every 10 minutes
<coordinator-app name="cron-coord" frequency="${coord:minutes(10)}" start="${start}" end="${end}" timezone="GMT+0800"
xmlns="uri:oozie:coordinator:0.2">
<action>
<workflow>
<app-path>${workflowAppUri}</app-path>
<configuration>
<property>
<name>jobTracker</name>
<value>${jobTracker}</value>
</property>
<property>
<name>nameNode</name>
<value>${nameNode}</value>
</property>
<property>
<name>queueName</name>
<value>${queueName}</value>
</property>
</configuration>
</workflow>
</action>
</coordinator-app>
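- The other built-in frequency EL functions work the same way; a sketch of the attribute values only:
frequency="${coord:hours(1)}"   runs hourly
frequency="${coord:days(1)}"    runs daily
frequency="${coord:months(1)}"  runs monthly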
    - Option 2: Linux crontab syntax (every 5 minutes in the example below)
<coordinator-app name="cron-coord" frequency="*/5 * * * *" start="${start}" end="${end}" timezone="GMT+0800"
xmlns="uri:oozie:coordinator:0.2">
<action>
<workflow>
<app-path>${workflowAppUri}</app-path>
<configuration>
<property>
<name>jobTracker</name>
<value>${jobTracker}</value>
</property>
<property>
<name>nameNode</name>
<value>${nameNode}</value>
</property>
<property>
<name>queueName</name>
<value>${queueName}</value>
</property>
</configuration>
</workflow>
</action>
</coordinator-app>
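- The cron expression uses the usual minute / hour / day-of-month / month / day-of-week fields; a couple of common patterns as examples:
0 2 * * *      runs every day at 02:00
0 0 1 * *      runs at midnight on the first day of every month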
- Upload
hdfs dfs -put usercase/cron /user/oozie/app/
- Run
bin/oozie job -oozie http://node-01:11000/oozie -config usercase/cron/job.properties -run
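- A running coordinator keeps materializing workflow instances until its end time; a sketch of the usual management commands, where the coordinator job ID (ending in -C) is a placeholder printed by the -run command:
# list the materialized actions and their status
bin/oozie job -oozie http://node-01:11000/oozie -info 0000001-200626120000000-oozie-oozi-C
# stop the coordinator when done testing
bin/oozie job -oozie http://node-01:11000/oozie -kill 0000001-200626120000000-oozie-oozi-C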