一. Installation and Deployment
1. Version Notes
OS | Hadoop | Hive | Tez |
---|---|---|---|
CentOS Linux release 7.6.1810 (Core) | 2.9.2 | 2.3.7 | 0.9.2 |
2. Tez Configuration
1) Download the tarball:
wget https://mirror.bit.edu.cn/apache/tez/0.9.2/apache-tez-0.9.2-bin.tar.gz
2) Extract it:
# Extract to a target directory (e.g. /opt)
tar -zxvf apache-tez-0.9.2-bin.tar.gz -C /opt/
3) Upload the Tez archive to the HDFS cluster:
# Create the target directory on HDFS first
hdfs dfs -mkdir -p /user/tez
# Then upload the archive into that directory
hdfs dfs -put /opt/apache-tez-0.9.2-bin/share/tez.tar.gz /user/tez
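Before moving on, it is worth confirming the upload actually landed. A quick sanity check, assuming the commands above ran against a working HDFS:

```shell
# List the target directory to confirm tez.tar.gz is present
hdfs dfs -ls /user/tez
# Optionally compare the size against the local copy
hdfs dfs -du -h /user/tez/tez.tar.gz
```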
4) Create and configure a tez-site.xml file under $HADOOP_HOME/etc/hadoop/:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Point to the Tez archive uploaded to HDFS -->
  <property>
    <name>tez.lib.uris</name>
    <value>hdfs://node1:9000/user/tez/tez.tar.gz</value>
  </property>
</configuration>
Note: after saving, copy tez-site.xml to every other node in the Hadoop cluster, then restart HDFS and YARN!
5) Configure environment variables:
On the node that holds the extracted Tez directory (apache-tez-0.9.2-bin), edit /etc/profile:
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop/
export TEZ_CONF_DIR=$HADOOP_CONF_DIR
export TEZ_HOME=/opt/apache-tez-0.9.2-bin
export TEZ_JARS=$TEZ_HOME/*:$TEZ_HOME/lib/*
export HADOOP_CLASSPATH=$TEZ_CONF_DIR:$TEZ_JARS:$HADOOP_CLASSPATH
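After saving /etc/profile, you can verify that the Tez jars are actually visible to Hadoop. A quick check, assuming the hadoop command is on the PATH of this node:

```shell
# Reload the environment so the exports above take effect
source /etc/profile
# Print Hadoop's classpath one entry per line and look for Tez jars
hadoop classpath | tr ':' '\n' | grep -i tez
```

If nothing is printed, HADOOP_CLASSPATH was not picked up and Tez jobs will fail to launch.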
3. Hive Configuration
To run Hive on the Tez engine, configure it in either of the following two ways (the first is recommended):
1) Add the following to the hive-site.xml file under $HIVE_HOME/conf:
<property>
  <name>hive.execution.engine</name>
  <value>tez</value>
</property>
Note: this makes Tez the default execution engine for Hive!
2) Set it in the Hive client:
In the hive or beeline CLI, run this first:
set hive.execution.engine=tez;
From a shell command, pass it via the -hiveconf option:
hive -hiveconf hive.execution.engine=tez -e "select dt, count(user_id) as user_cnt from t_user group by dt;"
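The same one-off style works from beeline as well. A sketch, assuming HiveServer2 is listening at node1:10000 (the JDBC URL is an assumption of this setup, and t_user is the sample table from the query above):

```shell
# Run a one-off query on the Tez engine via beeline (JDBC endpoint is an assumption)
beeline -u jdbc:hive2://node1:10000 \
  --hiveconf hive.execution.engine=tez \
  -e "select dt, count(user_id) as user_cnt from t_user group by dt;"
```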
二. Troubleshooting
When testing Hive on Tez (using the hive CLI here), you may run into the following failure:
Query ID = root_20201029144417_71f0139b-99fe-43e2-800a-bc7ad66b982f
Total jobs = 1
Launching Job 1 out of 1
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
Digging further into the Hive log reveals this exception:
2020-10-29T14:58:54,224 ERROR [fda0b896-c9d7-4b73-bfc2-a3119382c59b main] exec.Task: Failed to execute tez graph.
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. Application application_1603954618953_0002 failed 2 times due to AM Container for appattempt_1603954618953_0002_000002 exited with exitCode: -103
Failing this attempt. Diagnostics: [2020-10-29 14:58:53.980]Container [pid=20868,containerID=container_1603954618953_0002_02_000001] is running beyond virtual memory limits.
Current usage: 168.9 MB of 1 GB physical memory used; 2.6 GB of 2.1 GB virtual memory used. Killing container.
In short, the container exceeded its virtual memory limit.
One workaround is to disable the virtual memory check by editing $HADOOP_HOME/etc/hadoop/yarn-site.xml:
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
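Alternatively, instead of disabling the check outright, you can raise the virtual-to-physical memory ratio. Its default of 2.1 is exactly where the "2.1 GB virtual" cap in the log comes from (1 GB physical × 2.1). A sketch of that option, also in yarn-site.xml, with 4 as an illustrative value:

```xml
<!-- Allow each container up to 4x its physical memory as virtual memory (default: 2.1) -->
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
</property>
```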
Then restart YARN and test again:
Query ID = root_20201029150856_0d1a43c2-1565-47f1-83e1-a15f7cefd1d7
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1603955107461_0002)
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 11.30 s
----------------------------------------------------------------------------------------------
OK
Everything runs normally — problem solved!