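This assumes that both Hive and Oozie are already configured. Hive is simple to set up; for problems with the Oozie setup, see my previous blog post.
For my CM 5.7 installation, the relevant file is located at:
hive-site.xml: /opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/etc/hive/conf.dist/hive-site.xml
For example, my job.properties is configured as follows: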
nameNode=hdfs://cloud171:8020
# YARN ResourceManager port; with MRv1 the JobTracker port is 8021
jobTracker=cloud171:8032
# Values defined in job.properties are referenced from workflow.xml, e.g. ${queueName}
queueName=default
hiveSitePath=hdfs://cloud171:8020/user/hive/hive-site.xml
# HDFS directory to which the Hive dependency jars were uploaded
oozie.libpath=hdfs://cloud171:8020/user/hive/share/lib
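Since workflow.xml resolves ${nameNode}, ${jobTracker} and ${queueName} from this file, a missing key only shows up at submission time. Below is a minimal sketch of a pre-flight check; the function name check_props is hypothetical and the demo runs against a throwaway copy of the values above. Note that a workflow submission normally also needs oozie.wf.application.path, which the listing above does not show.

```shell
# Report any of the given keys that are missing from a properties file.
# Usage: check_props FILE KEY...
check_props() {
    file=$1
    shift
    missing=""
    for key in "$@"; do
        # A key counts as present if some line starts with "key=".
        if ! grep -q "^${key}=" "$file"; then
            missing="$missing $key"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}

# Demo on a temporary file holding the same keys as the listing above.
demo_props=$(mktemp)
cat > "$demo_props" <<'EOF'
nameNode=hdfs://cloud171:8020
jobTracker=cloud171:8032
queueName=default
EOF
check_props "$demo_props" nameNode jobTracker queueName
```

Against the real file, the call would look like `check_props /opt/job.properties nameNode jobTracker queueName oozie.wf.application.path`.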
The hive action in an Oozie workflow also supports parameter variables in the Hive script, written as ${VARIABLES}.
Below is the hive action syntax example from the official documentation:
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    ...
    <action name="[NODE-NAME]">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>[JOB-TRACKER]</job-tracker>
            <name-node>[NAME-NODE]</name-node>
            <prepare>
                <delete path="[PATH]"/>
                ...
                <mkdir path="[PATH]"/>
                ...
            </prepare>
            <job-xml>[HIVE SETTINGS FILE]</job-xml>
            <configuration>
                <property>
                    <name>[PROPERTY-NAME]</name>
                    <value>[PROPERTY-VALUE]</value>
                </property>
                ...
            </configuration>
            <script>[HIVE-SCRIPT]</script>
            <param>[PARAM-VALUE]</param>
            ...
            <param>[PARAM-VALUE]</param>
            <file>[FILE-PATH]</file>
            ...
            <archive>[FILE-PATH]</archive>
            ...
        </hive>
        <ok to="[NODE-NAME]"/>
        <error to="[NODE-NAME]"/>
    </action>
    ...
</workflow-app>
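The main elements in this syntax are:
prepare
    If HDFS directories need to be created or deleted before the Hive job runs, add a prepare element listing the HDFS paths to create or delete.
job-xml
    The HDFS path of hive-site.xml. On a cluster built with CDH, this file can be found in the /etc/hive/conf directory on any Hive gateway machine. If this path is not specified, the hive action will not work.
configuration
    Properties passed to the Hive job. This element is optional; without it, the defaults are used throughout.
script
    The HDFS path of the HQL script. This element is required for a hive action. Inside the script, ${VARIABLES} can be used to reference values defined by the param elements of the hive action.
param
    Defines the values of the variables that the HQL script needs.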
<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.2" name="hive-wf">
    <start to="hive-node"/>
    <action name="hive-node">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <job-xml>${nameNode}/user/hive/hive-site.xml</job-xml>
            <!--
            <prepare>
                <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/hive"/>
                <mkdir path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data"/>
            </prepare>
            -->
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <script>${nameNode}/user/hive/hive.hql</script>
            <!--
            <param>INPUT=/user/${wf:user()}/${examplesRoot}/input-data/table</param>
            <param>OUTPUT=/user/${wf:user()}/${examplesRoot}/output-data/hive</param>
            -->
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Hive failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
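The workflow above keeps its INPUT and OUTPUT params commented out. If they were enabled, the HQL script could reference them as ${INPUT} and ${OUTPUT}. The following is only a sketch of such a script, written from the shell: the table name demo_table is hypothetical, and the upload command in the comment assumes the /user/hive/hive.hql path used in the workflow.

```shell
# Write a small parameterized HQL script. The quoted heredoc delimiter ('EOF')
# stops the shell from expanding ${INPUT} and ${OUTPUT}, so they reach Oozie intact.
hql_file=$(mktemp)
cat > "$hql_file" <<'EOF'
-- "demo_table" is a hypothetical table name used only for illustration.
CREATE EXTERNAL TABLE demo_table (a INT) STORED AS TEXTFILE LOCATION '${INPUT}';
INSERT OVERWRITE DIRECTORY '${OUTPUT}' SELECT * FROM demo_table;
EOF

# List the variables the script expects; each one needs a matching
# <param>NAME=value</param> in the hive action.
grep -o '\${[A-Z_]*}' "$hql_file" | sort -u

# Upload step (not executed here, requires a Hadoop client):
#   hdfs dfs -put -f "$hql_file" /user/hive/hive.hql
```

Listing the variables before uploading makes it easy to spot a script variable that has no corresponding param in workflow.xml.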
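Key point: how an Oozie job runs
Finally, the hive workflow is launched with the following command.
Run on the command line: oozie job -oozie https://cloud174:11000/oozie -config /opt/job.properties -run
My job.properties is stored under /opt, hence the path above; substitute the location of your own file.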
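When the run is scripted, it helps to capture the job ID for later -info or -log calls. The sketch below assumes the client prints a line of the form `job: <id>`, as in the submit example in this post; the helper name extract_job_id is hypothetical, and OOZIE_URL is used here simply as a shell variable holding your server URL.

```shell
# Pull the job ID out of the client's "job: <id>" line read from stdin.
extract_job_id() {
    sed -n 's/^job: *//p'
}

# Real use (not executed here, requires the Oozie client):
#   job_id=$(oozie job -oozie "$OOZIE_URL" -config /opt/job.properties -run | extract_job_id)
#   oozie job -oozie "$OOZIE_URL" -info "$job_id"

# Demo with a canned line in the same format:
echo "job: 14-20090525161321-oozie-joe" | extract_job_id
```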
Below are some Oozie client commands for reference:
1. Submit a job; the job enters the PREP state
oozie job -oozie http://localhost:11000/oozie -config job.properties -submit
job: 14-20090525161321-oozie-joe
2. Start a previously submitted job
oozie job -oozie http://localhost:11000/oozie -start 14-20090525161321-oozie-joe
3. Run a job directly
oozie job -oozie http://localhost:11000/oozie -config job.properties -run
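4. Suspend a job (the job must be in RUNNING, RUNNINGWITHERROR or PREP state beforehand)
oozie job -oozie http://localhost:11000/oozie -suspend 14-20090525161321-oozie-joe
The workflow job will then be in SUSPENDED status.
5. Kill a job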
oozie job -oozie http://localhost:11000/oozie -kill 14-20090525161321-oozie-joe
6. Change job parameters (a job in KILLED state cannot be changed)
oozie job -oozie http://localhost:11000/oozie -change 14-20090525161321-oozie-joe -value "endtime=2011-12-01T05:00Z;concurrency=100;pausetime=2011-10-01T05:00Z"
7. Rerun a job
oozie job -oozie http://localhost:11000/oozie -config job.properties -rerun 14-20090525161321-oozie-joe
000000-130817230824019-oozie-ceny-W
Rerunning a Coordinator Action or Multiple Actions
oozie job -rerun <coord_Job_id> [-nocleanup] [-refresh]
[-action 1, 3-4, 7-40] (-action or -date is required to rerun.)
[-date 2009-01-01T01:00Z::2009-05-31T23:59Z, 2009-11-10T01:00Z, 2009-12-31T22:00Z]
Rerunning a Bundle Job
oozie job -rerun <bundle_Job_id> [-nocleanup] [-refresh]
[-coordinator c1, c3, c4] (-coordinator or -date is required to rerun.)
[-date 2009-01-01T01:00Z::2009-05-31T23:59Z, 2009-11-10T01:00Z, 2009-12-31T22:00Z]
(if neither -coordinator nor -date is given, the exception will be thrown.)
8. Check job status
oozie job -oozie http://localhost:11000/oozie -info 14-20090525161321-oozie-joe
oozie job -oozie http://localhost:11000/oozie -info 0000001-111219170928042-oozie-para-W@mr-node -verbose
9. View logs
oozie job -oozie http://localhost:11000/oozie -log 14-20090525161321-oozie-joe
oozie job -log <coord_job_id> [-action 1, 3-4, 7-40] (-action is optional.)
10. Validate a workflow XML file
oozie validate myApp/workflow.xml
11. Submit a Pig job
oozie pig -oozie http://localhost:11000/oozie -file pigScriptFile -config job.properties -X -param_file params
12. Submit a MapReduce job
oozie mapreduce -oozie http://localhost:11000/oozie -config job.properties
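The -info command from item 8 can be combined with a small filter to wait for a job to finish. This is a rough sketch only: it assumes the -info output contains a line of the form `Status : VALUE`, which may differ between Oozie versions, and the helper name job_status is hypothetical.

```shell
# Extract the value of the "Status" line from `oozie job -info` output on stdin.
# Assumes a "Status : VALUE" line; adjust the pattern if your version differs.
job_status() {
    awk -F: '/^Status[ \t]*:/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Polling loop (not executed here, requires the Oozie client and a job ID):
#   while :; do
#       s=$(oozie job -oozie http://localhost:11000/oozie -info "$job_id" | job_status)
#       case "$s" in
#           PREP|RUNNING) sleep 30 ;;
#           *) echo "$s"; break ;;
#       esac
#   done

# Demo on canned text in the assumed format:
printf 'Job ID : 14-20090525161321-oozie-joe\nStatus        : SUCCEEDED\nRun           : 0\n' | job_status
```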
http://oozie.apache.org/docs (consult the documentation that matches your Oozie version)