Scheduling Pig and Hive jobs in Tez mode on Oozie
1. Background
For details on Tez itself, see:
http://dongxicheng.org/mapreduce-nextgen/apache-tez-optimizations/
http://dongxicheng.org/mapreduce-nextgen/apache-tez-newest-progress/
2. Pig with Tez
2.1 Local submission (the cluster must support Tez)
pig -x tez t.pig
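Here `t.pig` is any ordinary Pig Latin script; a hypothetical minimal example (the file name, paths, and schema are placeholders, not from the original):

```pig
-- hypothetical minimal script; input/output paths and schema are placeholders
a = LOAD '/tmp/input' USING PigStorage('\t') AS (name:chararray, cnt:int);
b = FILTER a BY cnt > 0;
STORE b INTO '/tmp/output';
```

The `-x tez` flag selects the Tez execution engine instead of the default MapReduce engine; the script itself needs no changes.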
2.2 Oozie scheduling
(1) Configure the workflow (the Tez-related properties are the key additions)
<workflow-app name="PKL_REPORT_WF" xmlns="uri:oozie:workflow:0.4">
    <start to="report1"/>
    <action name="report1" retry-max="${retry_max}" retry-interval="${retry_interval}">
        <pig>
            <job-tracker>${job_tracker}</job-tracker>
            <name-node>${name_node}</name-node>
            <job-xml>${oozie_app_path}/workflow/job.xml</job-xml>
            <configuration>
                <property>
                    <name>exectype</name>
                    <value>tez</value>
                </property>
                <property>
                    <name>tez.lib.uris</name>
                    <value>${name_node}/user/tez/tez-0.7.0_base_hadoop2.7.1.tar.gz</value>
                </property>
                <property>
                    <name>tez.use.cluster.hadoop-libs</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapreduce.job.queuename</name>
                    <value>${queue_name}</value>
                </property>
            </configuration>
            <script>script/report_monitor.pig</script>
            <param>input1=${input1}</param>
            <param>input12=${input12}</param>
            <param>house_type=1</param>
            <file>lib/udf-1.0.0.jar</file>
            <file>conf/hive-site.xml</file>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Job failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
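The EL variables in the workflow (${job_tracker}, ${name_node}, ${queue_name}, ${retry_max}, ${retry_interval}, ${oozie_app_path}, ${input1}, ${input12}) are supplied through the job.properties file passed to `oozie job -run`. A sketch of such a file (every host name and path below is a placeholder, adjust to your cluster):

```properties
# all host names and paths below are placeholders
name_node=hdfs://nn1.example.com:8020
job_tracker=rm1.example.com:8032
queue_name=default
retry_max=3
retry_interval=1
oozie_app_path=${name_node}/user/oozie/apps/pkl_report
oozie.wf.application.path=${oozie_app_path}/workflow
input1=${name_node}/user/data/input1
input12=${name_node}/user/data/input12
```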
(2) Copy the jars the job needs into the lib directory under the workflow directory (at minimum, the following):
commons-collections4-4.0.jar tez-common-0.7.0.jar tez-runtime-library-0.7.0.jar
tez-api-0.7.0.jar tez-mapreduce-0.7.0.jar
An alternative to shipping the jars with every workflow is to put all of the Tez dependencies into the cluster's Oozie sharelib once.
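The two options above might look like this on the command line (all local and HDFS paths are placeholders; the exact sharelib path depends on how Oozie was installed):

```shell
# Option 1: ship the jars inside the workflow's lib/ directory (path is a placeholder)
hdfs dfs -put commons-collections4-4.0.jar tez-api-0.7.0.jar tez-common-0.7.0.jar \
    tez-mapreduce-0.7.0.jar tez-runtime-library-0.7.0.jar \
    /user/oozie/apps/pkl_report/workflow/lib/

# Option 2: add the jars to the Oozie sharelib once, then tell the server to reload it
# (replace the lib_<timestamp> directory and the Oozie URL with your own)
hdfs dfs -put tez-*.jar commons-collections4-4.0.jar \
    /user/oozie/share/lib/lib_20160101000000/pig/
oozie admin -oozie http://oozie-host:11000/oozie -sharelibupdate
```

With option 2 the per-workflow lib directory no longer needs the Tez jars, at the cost of tying every workflow on the cluster to one Tez version.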
3. Hive with Tez
3.1 Local submission (the cluster must support Tez)
Add the following line to the Hive script to switch the execution engine to Tez:
set hive.execution.engine=tez;
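For example, a hypothetical script that runs its query on Tez (the table and column names are placeholders, not from the original):

```sql
-- hypothetical script; table and column names are placeholders
set hive.execution.engine=tez;

SELECT house_type, count(*) AS cnt
FROM report_monitor
GROUP BY house_type;
```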
3.2 Oozie scheduling
(1) Configure the workflow (the Hive action element and the Tez-related properties are the key differences)
<workflow-app name="PKL_REPORT_WF" xmlns="uri:oozie:workflow:0.4">
    <start to="report1"/>
    <action name="report1" retry-max="${retry_max}" retry-interval="${retry_interval}">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${job_tracker}</job-tracker>
            <name-node>${name_node}</name-node>
            <job-xml>${oozie_app_path}/workflow/job.xml</job-xml>
            <configuration>
                <property>
                    <name>hive.execution.engine</name>
                    <value>tez</value>
                </property>
                <property>
                    <name>tez.lib.uris</name>
                    <value>${name_node}/user/tez/tez-0.7.0_base_hadoop2.7.1.tar.gz</value>
                </property>
                <property>
                    <name>tez.use.cluster.hadoop-libs</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapreduce.job.queuename</name>
                    <value>${queue_name}</value>
                </property>
            </configuration>
            <script>script/report_monitor.sql</script>
            <param>input1=${input1}</param>
            <param>input12=${input12}</param>
            <param>house_type=1</param>
            <file>lib/udf-1.0.0.jar</file>
            <file>conf/hive-site.xml</file>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Job failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
(2) As with Pig, copy the jars the job needs into the lib directory under the workflow directory (at minimum, the following):
commons-collections4-4.0.jar tez-common-0.7.0.jar tez-runtime-library-0.7.0.jar
tez-api-0.7.0.jar tez-mapreduce-0.7.0.jar
Again, an alternative to shipping the jars with every workflow is to put all of the Tez dependencies into the cluster's Oozie sharelib once.