Tez is an execution engine for Hive that outperforms MapReduce (MR).
Tez can combine multiple dependent jobs into a single job, so intermediate results are written to HDFS only once and fewer intermediate stages are needed, which greatly improves job performance.
1) Copy the Tez tarball to the cluster and extract it
[louieliu@hadoop102 software]$ mkdir -p /opt/module/tez
[louieliu@hadoop102 software]$ tar -zxvf /opt/software/tez-0.10.1-SNAPSHOT-minimal.tar.gz -C /opt/module/tez
2) Upload the Tez dependencies to HDFS
[louieliu@hadoop102 software]$ hadoop fs -mkdir /tez
[louieliu@hadoop102 software]$ hadoop fs -put /opt/software/tez-0.10.1-SNAPSHOT.tar.gz /tez
3) Create tez-site.xml
[louieliu@hadoop102 software]$ vim $HADOOP_HOME/etc/hadoop/tez-site.xml
Add the following content:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>tez.lib.uris</name>
        <value>${fs.defaultFS}/tez/tez-0.10.1-SNAPSHOT.tar.gz</value>
    </property>
    <property>
        <name>tez.use.cluster.hadoop-libs</name>
        <value>true</value>
    </property>
    <property>
        <name>tez.am.resource.memory.mb</name>
        <value>1024</value>
    </property>
    <property>
        <name>tez.am.resource.cpu.vcores</name>
        <value>1</value>
    </property>
    <property>
        <name>tez.container.max.java.heap.fraction</name>
        <value>0.4</value>
    </property>
    <property>
        <name>tez.task.resource.memory.mb</name>
        <value>1024</value>
    </property>
    <property>
        <name>tez.task.resource.cpu.vcores</name>
        <value>1</value>
    </property>
</configuration>
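Note how the memory settings in tez-site.xml interact: tez.container.max.java.heap.fraction caps the JVM heap at a fraction of the container memory, so the 1024 MB containers configured above each get roughly 400 MB of heap. A quick sketch of the arithmetic (the 1024 and 0.4 values are the ones configured above):

```shell
# With tez.task.resource.memory.mb=1024 and
# tez.container.max.java.heap.fraction=0.4, the JVM heap (-Xmx) each Tez
# container receives is roughly container size * heap fraction.
container_mb=1024
heap_mb=$(awk -v c="$container_mb" 'BEGIN { printf "%d", c * 0.4 }')
echo "heap per ${container_mb} MB container: ${heap_mb} MB"   # ~409 MB
```

If Tez tasks later fail with heap OutOfMemoryError, raising the container size (not just the fraction) is usually the safer lever, since the rest of the container budget covers off-heap usage.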
4) Modify the Hadoop environment configuration
[louieliu@hadoop102 software]$ vim $HADOOP_HOME/etc/hadoop/shellprofile.d/tez.sh
Add the Tez jar classpath information:
hadoop_add_profile tez
function _tez_hadoop_classpath
{
    hadoop_add_classpath "$HADOOP_HOME/etc/hadoop" after
    hadoop_add_classpath "/opt/module/tez/*" after
    hadoop_add_classpath "/opt/module/tez/lib/*" after
}
5) Change Hive's execution engine
[louieliu@hadoop102 software]$ vim $HIVE_HOME/conf/hive-site.xml
Add:
<property>
    <name>hive.execution.engine</name>
    <value>tez</value>
</property>
<property>
    <name>hive.tez.container.size</name>
    <value>1024</value>
</property>
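The engine can also be switched per session without editing hive-site.xml; these are standard Hive session settings, handy for comparing the two engines or falling back when a Tez job misbehaves:

```sql
-- Inside a Hive session: override the execution engine for this session only.
set hive.execution.engine=tez;
-- Switch back to MapReduce for this session:
set hive.execution.engine=mr;
```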
6) If tasks hang after switching to the Tez engine, try adjusting the Capacity Scheduler's resource policy.
In $HADOOP_HOME/etc/hadoop/capacity-scheduler.xml, change
<property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.1</value>
    <description>
        Maximum percent of resources in the cluster which can be used to run
        application masters i.e. controls number of concurrent running
        applications.
    </description>
</property>
to
<property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>1</value>
    <description>
        Maximum percent of resources in the cluster which can be used to run
        application masters i.e. controls number of concurrent running
        applications.
    </description>
</property>
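The hang happens because the default caps ApplicationMaster memory at 10% of the cluster. A sketch of the arithmetic on a hypothetical small cluster (the 8192 MB total is an assumption for illustration, not a value from this guide):

```shell
# Hypothetical cluster with 8192 MB total and a 1024 MB Tez ApplicationMaster.
cluster_mb=8192
am_mb=1024
# With maximum-am-resource-percent=0.1, floor(8192 * 0.1 / 1024) = 0 AMs can
# ever start, so every submitted job waits forever; raising it to 1 allows 8.
before=$(awk -v c="$cluster_mb" -v a="$am_mb" 'BEGIN { printf "%d", (c * 0.1) / a }')
after=$(awk -v c="$cluster_mb" -v a="$am_mb" 'BEGIN { printf "%d", (c * 1.0) / a }')
echo "concurrent AMs: before=${before} after=${after}"   # before=0 after=8
```

On a real multi-node cluster a value between 0.1 and 1 is usually enough; 1 is a blunt fix suited to small test clusters.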
7) JVM heap out-of-memory errors (Tez needs a relatively large amount of memory, and YARN also enforces a virtual-memory check)
Fix: add the following to yarn-site.xml:
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
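For context on why disabling the check helps: the NodeManager kills any container whose virtual memory exceeds its physical allocation times yarn.nodemanager.vmem-pmem-ratio (default 2.1), and a Tez JVM's virtual footprint often exceeds that cap even when real memory usage is fine. The arithmetic for the 1024 MB containers configured earlier:

```shell
# Default yarn.nodemanager.vmem-pmem-ratio is 2.1, so a 1024 MB container may
# use at most 1024 * 2.1 ≈ 2150 MB of virtual memory before being killed.
pmem_mb=1024
vmem_cap_mb=$(awk -v p="$pmem_mb" 'BEGIN { printf "%.0f", p * 2.1 }')
echo "vmem cap for ${pmem_mb} MB container: ${vmem_cap_mb} MB"   # 2150 MB
```

An alternative to disabling the check entirely is raising yarn.nodemanager.vmem-pmem-ratio.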
8) Resolve the logging jar conflict
[louieliu@hadoop102 software]$ rm /opt/module/tez/lib/slf4j-log4j12-1.7.10.jar
2.6. Starting Hive
2.6.1 Initialize the metastore database
1) Log in to MySQL
[louieliu@hadoop102 software]$ mysql -uroot -p123456
2) Create the Hive metastore database
mysql> create database metastore;
mysql> quit;
3) Initialize the Hive metastore schema
[louieliu@hadoop102 software]$ schematool -initSchema -dbType mysql -verbose
1) Start the Beeline client
[louieliu@hadoop102 hive]$ bin/beeline -u jdbc:hive2://hadoop102:10000 -n louieliu
2) You should see output like:
Connecting to jdbc:hive2://hadoop102:10000
Connected to: Apache Hive (version 3.1.2)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.2 by Apache Hive
0: jdbc:hive2://hadoop102:10000>
To exit: !quit
2.6.4 Hive client access
1) Start the Hive client
[louieliu@hadoop102 hive]$ bin/hive
2) You should see output like:
which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/hadoop-3.1.3/bin:/opt/module/hadoop-3.1.3/sbin:/opt/module/hive/bin:/home/louieliu/.local/bin:/home/louieliu/bin)
Hive Session ID = 36f90830-2d91-469d-8823-9ee62b6d0c26
Logging initialized using configuration in jar:file:/opt/module/hive/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive Session ID = 14f96e4e-7009-4926-bb62-035be9178b02
hive>
3) Print the current database name and column headers
Add the following two settings to /opt/module/hive/conf/hive-site.xml:
<property>
    <name>hive.cli.print.header</name>
    <value>true</value>
    <description>Whether to print the names of the columns in query output.</description>
</property>
<property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
    <description>Whether to include the current database in the Hive prompt.</description>
</property>
2.7 Common Hive interactive commands
[louieliu@hadoop102 hive]$ bin/hive -help
usage: hive
-d,--define <key=value> Variable subsitution to apply to hive
commands. e.g. -d A=B or --define A=B
--database <databasename> Specify the database to use
-e <quoted-query-string> SQL from command line
-f <filename> SQL from files
-H,--help Print help information
--hiveconf <property=value> Use value for given property
--hivevar <key=value> Variable subsitution to apply to hive
commands. e.g. --hivevar A=B
-i <filename> Initialization SQL file
-S,--silent Silent mode in interactive shell
-v,--verbose Verbose mode (echo executed SQL to the console)
0) In the Hive CLI, create a table student and insert one row
hive (default)> create table student(id int,name string);
OK
Time taken: 1.291 seconds
hive (default)> insert into student values(1,"zhangsan");
hive (default)> select * from student;
OK
student.id student.name
1 zhangsan
Time taken: 0.144 seconds, Fetched: 1 row(s)
1) "-e": execute a SQL statement from the command line without entering the interactive shell
[louieliu@hadoop102 hive]$ bin/hive -e "select id from student;"
2) "-f": execute the SQL statements in a script file
(1) Create a datas directory under /opt/module/hive/ and create a hivef.sql file in it
[louieliu@hadoop102 datas]$ vim hivef.sql
(2) Write a valid SQL statement into the file
select * from student;
(3) Execute the SQL in the file
[louieliu@hadoop102 hive]$ bin/hive -f /opt/module/hive/datas/hivef.sql
(4) Execute the SQL in the file and write the results to an output file
[louieliu@hadoop102 hive]$ bin/hive -f /opt/module/hive/datas/hivef.sql > /opt/module/datas/hive_result.txt