Oozie serial and parallel (bundle) scheduled jobs: usage notes (Sqoop + shell)

Contents:

I. Using Sqoop from Oozie to import an Oracle table into HDFS

II. Serial scheduled jobs in Oozie

III. Parallel scheduled jobs in Oozie

IV. Problems encountered

 

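I. Using Sqoop from Oozie to import an Oracle table into HDFS

The Oracle JDBC driver (ojdbc) jar must be present in Oozie's lib directory; otherwise the job fails with an error.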
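
One way to make the driver available is to copy it into the HDFS directory that oozie.libpath points to in job.properties. The following is only a sketch: the jar name ojdbc6.jar is an assumption (use the driver version that matches your database), and on some installations the share lib lives in a different or timestamped directory. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: copy the Oracle JDBC driver into the Sqoop lib directory on HDFS.
# JAR is a hypothetical file name; LIB_DIR mirrors oozie.libpath in job.properties.
JAR="${JAR:-ojdbc6.jar}"
LIB_DIR="${LIB_DIR:-/user/oozie/share/lib/sqoop}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

upload_driver() {
    run hdfs dfs -put -f "$JAR" "$LIB_DIR/"
    run hdfs dfs -ls "$LIB_DIR"
}

upload_driver
```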

workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.4" name="sqoop-wmz">
    <start to="sqoop-node"/>
    <action name="sqoop-node">
        <sqoop xmlns="uri:oozie:sqoop-action:0.3">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>oozie.sqoop.log.level</name>
                    <value>WARN</value>
                </property>
            </configuration>
            <!-- The command starts with the Sqoop tool name (import), without a leading "sqoop" -->
            <command>import --connect jdbc:oracle:thin:@***.***.**.***:1521:orcl --username ** --password ** --table ** --delete-target-dir --target-dir /yss/guzhi/**/** --m 1</command>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Sqoop failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

job.properties

nameNode=hdfs://bj-rack001-hadoop002:8020
jobTracker=bj-rack001-hadoop003:8050
queueName=default
examplesRoot=wmz_test
oozie.libpath=hdfs://bj-rack001-hadoop002:8020/user/oozie/share/lib/sqoop
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/tmp/oracle2hdfs

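II. Serial scheduled jobs in Oozie

When several tables have to be imported or exported, or several operations have to run, add one action per operation. Putting several commands into one <command> element, or several <command> elements into one action, both fail with an error.

workflow.xml: a shell script first obtains the current date, and the value is then passed to the Sqoop commands so that the target directory is named after the current day.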

<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.5" name="testshell-wmz">
<start to="shell-node"/>
	<action name="shell-node">
		<shell xmlns="uri:oozie:shell-action:0.1">
			<job-tracker>${jobTracker}</job-tracker>
			<name-node>${nameNode}</name-node>
			<configuration>
				<property>
					<name>mapred.job.queue.name</name>
					<value>${queueName}</value>
				</property>
			</configuration>
			<exec>${shell}</exec>
			<file>${nameNode}/tmp/oracle2hdfs/${shell}#${shell}</file>
			<capture-output/>
		</shell>
		<ok to="sqoop-node"/>
		<error to="fail"/>
	</action>
    <action name="sqoop-node">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <command>import --connect jdbc:oracle:thin:@***.***.***.**:1521:orcl --username ** --password ** --table *** --target-dir /yss/guzhi/**/${wf:actionData('shell-node')['day']}/LSETLIST/ --delete-target-dir --m 1</command>
        </sqoop>
        <ok to="sqoop-node2"/>
        <error to="fail"/>
    </action>

    <action name="sqoop-node2">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
          
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <command>import --connect jdbc:oracle:thin:@***.***.***.**:1521:orcl --username ** --password ** --table *** --target-dir /yss/**/**/${wf:actionData('shell-node')['day']}/CSGDZH --delete-target-dir --m 1</command>
        </sqoop>
        <ok to="sqoop-node3"/>
        <error to="fail"/>
    </action>

    <action name="sqoop-node3">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
          
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <command>import --connect jdbc:oracle:thin:@***.***.***.**:1521:orcl --username ** --password ** --table *** --target-dir /yss/**/**/${wf:actionData('shell-node')['day']}/CSQSXW --delete-target-dir --m 1</command>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
   <end name='end'/>
</workflow-app>
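
The Sqoop actions in this workflow differ only in their name, the table, and the node they hand over to on success. As an illustration, the chain could be generated from a list of tables. This is a rough sketch, not part of the original setup: the sqoop-<TABLE> naming scheme is made up here, and CONNECT_OPTS and TARGET_ROOT are placeholders for the redacted connection options and target path.

```shell
#!/bin/sh
# Sketch: emit one <action> per table, each handing over to the next on success.
# The action naming scheme and the CONNECT_OPTS / TARGET_ROOT placeholders are
# assumptions for illustration only.
gen_actions() {
    # $@ = table names, in execution order
    while [ "$#" -gt 0 ]; do
        table=$1
        shift
        # The last action transitions to the end node.
        if [ "$#" -gt 0 ]; then next="sqoop-$1"; else next="end"; fi
        cat <<EOF
    <action name="sqoop-$table">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>\${jobTracker}</job-tracker>
            <name-node>\${nameNode}</name-node>
            <command>import CONNECT_OPTS --table $table --target-dir TARGET_ROOT/\${wf:actionData('shell-node')['day']}/$table --delete-target-dir --m 1</command>
        </sqoop>
        <ok to="$next"/>
        <error to="fail"/>
    </action>
EOF
    done
}

gen_actions LSETLIST CSGDZH CSQSXW
```

The output is meant to be pasted between the shell action and the kill node; the shell action's ok transition then has to point at the first generated action.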

coordinator.xml: here the job is set to run once every 12 hours

<coordinator-app name="oracleToHdfsBySqoop-wmz" frequency="${coord:hours(12)}" start="${start}" end="${end}" timezone="GMT+0800" xmlns="uri:oozie:coordinator:0.5">
    <action>
        <workflow>
            <app-path>${nameNode}/tmp/oracle2hdfs/workflow.xml</app-path>
            <configuration>
                <property>
                    <name>jobTracker</name>
                    <value>${jobTracker}</value>
                </property>
                <property>
                    <name>nameNode</name>
                    <value>${nameNode}</value>
                </property>
                <property>
                    <name>queueName</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>

Shell script: get the current date

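#!/bin/sh
# Print the current date as a key/value pair; <capture-output/> makes it
# available to later actions as wf:actionData('shell-node')['day'].
day=$(date '+%Y%m%d')
echo "day:$day"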
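
Oozie reads the captured output as Java properties, so the script should print nothing except key/value lines. A small local check can catch stray output before the workflow is deployed. This is a sketch: print_day repeats the logic of the script above, and check_output is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: verify locally that the date script prints a single key/value line.
# print_day mirrors the date script; check_output is a hypothetical helper.
print_day() {
    day=$(date '+%Y%m%d')
    echo "day:$day"
}

check_output() {
    # Accept "day:YYYYMMDD" or "day=YYYYMMDD"; both separators are valid
    # in Java properties syntax.
    echo "$1" | grep -Eq '^day[:=][0-9]{8}$'
}

if check_output "$(print_day)"; then
    echo "output format ok"
else
    echo "unexpected output format" >&2
fi
```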

job.properties

nameNode=hdfs://bj-rack001-hadoop002:8020
jobTracker=bj-rack001-hadoop003:8050
queueName=default
examplesRoot=examples


oozie.service.coord.check.maximum.frequency=false
oozie.coord.application.path=${nameNode}/tmp/oozietest/
start=2018-09-11T16:00Z
end=2018-09-11T16:00Z
workflowAppUri=${oozie.coord.application.path}

Because start and end are given in GMT (UTC), the values must be Beijing time minus 8 hours.
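
To avoid doing the subtraction by hand, the conversion can be scripted. A minimal sketch, assuming GNU date (the -d option is not available in every date implementation); to_oozie_utc is a made-up helper name.

```shell
#!/bin/sh
# Sketch: convert a Beijing wall-clock time to the UTC format used for start/end.
# Assumes GNU date; to_oozie_utc is a hypothetical helper name.
to_oozie_utc() {
    # $1 = "YYYY-MM-DD HH:MM" in Beijing time (UTC+8)
    date -u -d "$1 +0800" '+%Y-%m-%dT%H:%MZ'
}

to_oozie_utc "2018-09-12 00:00"
```

Midnight on 2018-09-12 in Beijing corresponds to 2018-09-11T16:00Z, which is the start value used in job.properties above.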

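III. Parallel scheduled jobs in Oozie

With too many serial actions the whole run becomes slow; in that case the actions can be run in parallel.

Parallel execution here is done with the bundle component.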

workflow1.xml 

<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.5" name="bundle-wmz">
<start to="shell-node"/>
        <action name="shell-node">
                <shell xmlns="uri:oozie:shell-action:0.1">
                        <job-tracker>${jobTracker}</job-tracker>
                        <name-node>${nameNode}</name-node>
                        <configuration>
                                <property>
                                        <name>mapred.job.queue.name</name>
                                        <value>${queueName}</value>
                                </property>
                        </configuration>
                        <exec>${shell}</exec>
                        <file>${nameNode}/tmp/oracle2hdfs/${shell}#${shell}</file>
                        <capture-output/>
                </shell>
                <ok to="sqoop-node"/>
                <error to="fail"/>
        </action>
         <action name="sqoop-node">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <command>import --connect jdbc:oracle:thin:@***.***.***.**:1521:orcl --username *** --password *** --table LSETLIST --target-dir /yss/guzhi/***/${wf:actionData('shell-node')['day']}/LSETLIST/ --delete-target-dir --m 1</command>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>
     <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
   <end name='end'/>
</workflow-app>

workflow2.xml

<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.5" name="bundle2-wmz">
<start to="shell-node"/>
        <action name="shell-node">
                <shell xmlns="uri:oozie:shell-action:0.1">
                        <job-tracker>${jobTracker}</job-tracker>
                        <name-node>${nameNode}</name-node>
                        <configuration>
                                <property>
                                        <name>mapred.job.queue.name</name>
                                        <value>${queueName}</value>
                                </property>
                        </configuration>
                        <exec>${shell}</exec>
                        <file>${nameNode}/tmp/oracle2hdfs/${shell}#${shell}</file>
                        <capture-output/>
                </shell>
                <ok to="sqoop-node2"/>
                <error to="fail"/>
        </action>
        <action name="sqoop-node2">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>

            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <command>import --connect jdbc:oracle:thin:@***.***.***.**:1521:orcl --username *** --password *** --table CSGDZH --target-dir /yss/guzhi/***/${wf:actionData('shell-node')['day']}/CSGDZH --delete-target-dir --m 1</command>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>

     <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
   <end name='end'/>
</workflow-app>

workflow3.xml and so on follow the same pattern

 

coordinator1.xml

<coordinator-app name="oracleToHdfsBySqoop-wmz" frequency="${coord:hours(12)}" start="${start}" end="${end}" timezone="GMT+0800" xmlns="uri:oozie:coordinator:0.5">
    <action>
        <workflow>
            <app-path>${workflowAppUri1}</app-path>
            <configuration>
                <property>
                    <name>jobTracker</name>
                    <value>${jobTracker}</value>
                </property>
                <property>
                    <name>nameNode</name>
                    <value>${nameNode}</value>
                </property>
                <property>
                    <name>queueName</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>

coordinator2.xml

<coordinator-app name="oracleToHdfsBySqoop-wmz" frequency="${coord:hours(12)}" start="${start}" end="${end}" timezone="GMT+0800" xmlns="uri:oozie:coordinator:0.5">
    <action>
        <workflow>
            <app-path>${workflowAppUri2}</app-path>
            <configuration>
                <property>
                    <name>jobTracker</name>
                    <value>${jobTracker}</value>
                </property>
                <property>
                    <name>nameNode</name>
                    <value>${nameNode}</value>
                </property>
                <property>
                    <name>queueName</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>

coordinator3.xml and so on follow the same pattern
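
Since the coordinator files differ only in the workflowAppUri property they reference, they can be produced from one template instead of being copied by hand. A sketch under assumptions: OUT_DIR and the count of three files are illustrative, and each generated file still needs a matching workflowAppUriN and coordinatorN entry in job.properties and a <coordinator> entry in bundle.xml.

```shell
#!/bin/sh
# Sketch: write coordinator<N>.xml files that differ only in the workflow path
# property. OUT_DIR and the number of files are illustrative assumptions.
gen_coordinator() {
    # $1 = index N; the result references ${workflowAppUriN}
    cat <<EOF
<coordinator-app name="oracleToHdfsBySqoop-wmz" frequency="\${coord:hours(12)}" start="\${start}" end="\${end}" timezone="GMT+0800" xmlns="uri:oozie:coordinator:0.5">
    <action>
        <workflow>
            <app-path>\${workflowAppUri$1}</app-path>
            <configuration>
                <property><name>jobTracker</name><value>\${jobTracker}</value></property>
                <property><name>nameNode</name><value>\${nameNode}</value></property>
                <property><name>queueName</name><value>\${queueName}</value></property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
EOF
}

OUT_DIR="${OUT_DIR:-$(mktemp -d)}"
for n in 1 2 3; do
    gen_coordinator "$n" > "$OUT_DIR/coordinator$n.xml"
done
echo "wrote coordinator files to $OUT_DIR"
```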

bundle.xml

<bundle-app name='bundle-app' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns='uri:oozie:bundle:0.1'>

          <coordinator name='cron-bundle1'>
                 <app-path>${coordinator1}</app-path>
          </coordinator>

          <coordinator name='cron-bundle2'>
                 <app-path>${coordinator2}</app-path>
          </coordinator>
</bundle-app>

job.properties

nameNode=hdfs://bj-rack001-hadoop002:8020
jobTracker=bj-rack001-hadoop003:8050
queueName=default
examplesRoot=wmz_test
oozie.libpath=hdfs://bj-rack001-hadoop002:8020/user/oozie/share/lib/sqoop
oozie.use.system.libpath=true
#oozie.wf.application.path=${nameNode}/tmp/oracle2hdfs
shell=getDate.sh

oozie.bundle.application.path=${nameNode}/tmp/oracle2hdfs/bundle.xml

oozie.service.coord.check.maximum.frequency=false
#oozie.coord.application.path=${nameNode}/tmp/bundleTest
start=2018-09-10T16:00Z
end=2028-09-10T16:00Z

workflowAppUri1=${nameNode}/tmp/oracle2hdfs/workflow1.xml
workflowAppUri2=${nameNode}/tmp/oracle2hdfs/workflow2.xml

coordinator1=${nameNode}/tmp/oracle2hdfs/coordinator1.xml
coordinator2=${nameNode}/tmp/oracle2hdfs/coordinator2.xml

Submit and run the job:

oozie job -oozie http://***.***.***.***:11000/oozie -config /data/temp/wmz/shelltest/job.properties -run
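
After submission, the same CLI can be used to inspect or stop the job. The wrapper below is only a sketch: OOZIE_URL is a placeholder for the real server address, the function names are made up, and by default it prints the commands instead of running them (set DRY_RUN=0 to execute).

```shell
#!/bin/sh
# Sketch: thin wrappers around common oozie CLI calls.
# OOZIE_URL is a placeholder; the function names are hypothetical.
OOZIE_URL="${OOZIE_URL:-http://localhost:11000/oozie}"
DRY_RUN="${DRY_RUN:-1}"

oozie_cmd() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = "1" ]; then
        echo "oozie $*"
    else
        oozie "$@"
    fi
}

submit_job() { oozie_cmd job -oozie "$OOZIE_URL" -config "$1" -run; }
job_info()   { oozie_cmd job -oozie "$OOZIE_URL" -info "$1"; }
job_log()    { oozie_cmd job -oozie "$OOZIE_URL" -log "$1"; }
kill_job()   { oozie_cmd job -oozie "$OOZIE_URL" -kill "$1"; }

submit_job job.properties
```

The job id printed by -run is the argument expected by job_info, job_log and kill_job.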

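IV. Problems encountered

1. Script shebang: if the script fails to run with #!/bin/bash, write #!/bin/sh instead.

2. I previously tried putting the Sqoop operations into a shell script and having Oozie run that script. Inside the shell action, however, Sqoop could only run list-tables and list-databases; every import command failed, and I still do not know why. The script runs fine when executed on its own, and running a shell action alone or a Sqoop import action alone through Oozie also works. Only the combination fails, which is puzzling.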
