Scheduling Pig and Hive jobs in Tez mode on Oozie
1. Background
For details on Tez itself, see:
http://dongxicheng.org/mapreduce-nextgen/apache-tez-optimizations/
http://dongxicheng.org/mapreduce-nextgen/apache-tez-newest-progress/
2. Pig with Tez
2.1 Local submission (the cluster must support Tez)
pig -x tez t.pig
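Here `t.pig` is any ordinary Pig Latin script; a hypothetical minimal example (the file name, paths, and schema are placeholders, not from the original):

```pig
-- hypothetical minimal script; input/output paths and schema are placeholders
a = LOAD '/tmp/input' USING PigStorage('\t') AS (name:chararray, cnt:int);
b = FILTER a BY cnt > 0;
STORE b INTO '/tmp/output';
```

The `-x tez` flag selects the Tez execution engine instead of the default MapReduce engine; the script itself needs no changes.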
2.2 Oozie scheduling
(1) Configure the workflow (the Tez-related properties are the key additions)
<workflow-app name="PKL_REPORT_WF" xmlns="uri:oozie:workflow:0.4">
    <start to="report1"/>
    <action name="report1" retry-max="${retry_max}" retry-interval="${retry_interval}">
        <pig>
            <job-tracker>${job_tracker}</job-tracker>
            <name-node>${name_node}</name-node>
            <job-xml>${oozie_app_path}/workflow/job.xml</job-xml>
            <configuration>
                <property>
                    <name>exectype</name>
                    <value>tez</value>
                </property>
                <property>
                    <name>tez.lib.uris</name>
                    <value>${name_node}/user/tez/tez-0.7.0_base_hadoop2.7.1.tar.gz</value>
                </property>
                <property>
                    <name>tez.use.cluster.hadoop-libs</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapreduce.job.queuename</name>
                    <value>${queue_name}</value>
                </property>
            </configuration>
            <script>script/report_monitor.pig</script>
            <param>input1=${input1}</param>
            <param>input12=${input12}</param>
            <param>house_type=1</param>
            <file>lib/udf-1.0.0.jar</file>
            <file>conf/hive-site.xml</file>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Job failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
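The EL variables in the workflow (${job_tracker}, ${name_node}, ${queue_name}, ${retry_max}, ${retry_interval}, ${oozie_app_path}, ${input1}, ${input12}) are supplied through the job.properties file passed to `oozie job -run`. A sketch of such a file (every host name and path below is a placeholder, adjust to your cluster):

```properties
# all host names and paths below are placeholders
name_node=hdfs://nn1.example.com:8020
job_tracker=rm1.example.com:8032
queue_name=default
retry_max=3
retry_interval=1
oozie_app_path=${name_node}/user/oozie/apps/pkl_report
oozie.wf.application.path=${oozie_app_path}/workflow
input1=${name_node}/user/data/input1
input12=${name_node}/user/data/input12
```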
(2) Copy the jars the job needs into the lib directory under the workflow directory (at minimum, the following):
commons-collections4-4.0.jar tez-common-0.7.0.jar tez-runtime-library-0.7.0.jar
tez-api-0.7.0.jar tez-mapreduce-0.7.0.jar
An alternative to shipping the jars with every workflow is to put all of the Tez dependencies into the cluster's Oozie sharelib once.
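The two options above might look like this on the command line (all local and HDFS paths are placeholders; the exact sharelib path depends on how Oozie was installed):

```shell
# Option 1: ship the jars inside the workflow's lib/ directory (path is a placeholder)
hdfs dfs -put commons-collections4-4.0.jar tez-api-0.7.0.jar tez-common-0.7.0.jar \
    tez-mapreduce-0.7.0.jar tez-runtime-library-0.7.0.jar \
    /user/oozie/apps/pkl_report/workflow/lib/

# Option 2: add the jars to the Oozie sharelib once, then tell the server to reload it
# (replace the lib_<timestamp> directory and the Oozie URL with your own)
hdfs dfs -put tez-*.jar commons-collections4-4.0.jar \
    /user/oozie/share/lib/lib_20160101000000/pig/
oozie admin -oozie http://oozie-host:11000/oozie -sharelibupdate
```

With option 2 the per-workflow lib directory no longer needs the Tez jars, at the cost of tying every workflow on the cluster to one Tez version.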
3. Hive with Tez
3.1 Local submission (the cluster must support Tez)
Add the following line to the Hive script to switch the execution engine to Tez:
set hive.execution.engine=tez;
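For example, a hypothetical script that runs its query on Tez (the table and column names are placeholders, not from the original):

```sql
-- hypothetical script; table and column names are placeholders
set hive.execution.engine=tez;

SELECT house_type, count(*) AS cnt
FROM report_monitor
GROUP BY house_type;
```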
3.2 Oozie scheduling
(1) Configure the workflow (the Hive action element and the Tez-related properties are the key differences)
<workflow-app name="PKL_REPORT_WF" xmlns="uri:oozie:workflow:0.4">
    <start to="report1"/>
    <action name="report1" retry-max="${retry_max}" retry-interval="${retry_interval}">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${job_tracker}</job-tracker>
            <name-node>${name_node}</name-node>
            <job-xml>${oozie_app_path}/workflow/job.xml</job-xml>
            <configuration>
                <property>
                    <name>hive.execution.engine</name>
                    <value>tez</value>
                </property>
                <property>
                    <name>tez.lib.uris</name>
                    <value>${name_node}/user/tez/tez-0.7.0_base_hadoop2.7.1.tar.gz</value>
                </property>
                <property>
                    <name>tez.use.cluster.hadoop-libs</name>
                    <value>true</value>
                </property>
                <property>
                    <name>mapreduce.job.queuename</name>
                    <value>${queue_name}</value>
                </property>
            </configuration>
            <script>script/report_monitor.sql</script>
            <param>input1=${input1}</param>
            <param>input12=${input12}</param>
            <param>house_type=1</param>
            <file>lib/udf-1.0.0.jar</file>
            <file>conf/hive-site.xml</file>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Job failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
(2) As with Pig, copy the jars the job needs into the lib directory under the workflow directory (at minimum, the following):
commons-collections4-4.0.jar tez-common-0.7.0.jar tez-runtime-library-0.7.0.jar
tez-api-0.7.0.jar tez-mapreduce-0.7.0.jar
Again, an alternative to shipping the jars with every workflow is to put all of the Tez dependencies into the cluster's Oozie sharelib once.