Hive Installation
1. Install and Configure Hive
(1) Upload the Hive package apache-hive-3.1.2-bin.tar.gz to the /opt/download directory on the Linux VM, then extract it:
$ wget https://mirrors.bfsu.edu.cn/apache/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz
$ tar -zxvf apache-hive-3.1.2-bin.tar.gz
(2) Move the extracted folder to /opt/pkg and rename it hive:
$ mv apache-hive-3.1.2-bin /opt/pkg/hive
(3) Edit the /etc/profile.d/env.sh file and add the environment variables:
$ vim /etc/profile.d/env.sh
Add the following:
# HIVE_HOME 3.1.2
export HIVE_HOME=/opt/pkg/hive
export PATH=$PATH:$HIVE_HOME/bin
Run the following to make the variables take effect:
$ source /etc/profile.d/env.sh
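A common mistake in this step is dropping the `$` signs in the PATH line, which silently clobbers PATH. A quick sanity check after sourcing env.sh:

```shell
# Re-create the two exports and verify they expanded correctly: HIVE_HOME
# should be the install path, and PATH should now contain its bin directory.
export HIVE_HOME=/opt/pkg/hive
export PATH=$PATH:$HIVE_HOME/bin
echo "$HIVE_HOME"
command -v hive >/dev/null && echo "hive on PATH" || echo "hive not on PATH yet"
```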
(4) Change into /opt/pkg/hive/lib and run the following to resolve the logging jar conflict (both Hive and Hadoop ship an SLF4J binding):
$ cd /opt/pkg/hive/lib
$ mv log4j-slf4j-impl-2.10.0.jar log4j-slf4j-impl-2.10.0.jar.bak
(5) Fix Hive's guava jar being older than the one Hadoop ships:
$ cd /opt/pkg/hive/lib
$ mv guava-19.0.jar guava-19.0.jar.bak
$ ln -s /opt/pkg/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar ./
Rename (or delete) the older guava jar under lib, then copy or symlink the newer guava jar over from the Hadoop directory. If this step is skipped, initializing the MySQL metastore schema will fail.
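Hadoop 3.x bundles guava 27 while Hive 3.1.2 bundles guava 19, hence the swap. As a throwaway illustration (the helper below is hypothetical, not part of Hive), GNU sort's version ordering shows which of the two jar names is newer:

```shell
# Given two jar file names, print the one with the higher embedded version
# number. sort -V compares digit runs numerically, so 27.0 sorts after 19.0.
newer_jar() {
  printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1
}
newer_jar guava-19.0.jar guava-27.0-jre.jar   # prints guava-27.0-jre.jar
```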
2. Copy the JDBC Driver
(1) Copy the mysql-connector-java-5.1.27.jar driver to Hive's lib directory:
$ cp mysql-connector-java-5.1.27.jar /opt/pkg/hive/lib/
3. Configuration
3.1 Start DFS and YARN
# change to the Hadoop directory
cd /opt/pkg/hadoop
# start dfs and yarn
sbin/start-all.sh
3.2 Create the HDFS directories and grant permissions
hdfs dfs -mkdir -p /opt/pkg/hive/warehouse
hdfs dfs -mkdir -p /opt/pkg/hive/tmp
hdfs dfs -mkdir -p /opt/pkg/hive/log
hdfs dfs -chmod g+w /opt/pkg/hive/warehouse
hdfs dfs -chmod 777 /opt/pkg/hive/tmp
hdfs dfs -chmod g+w /opt/pkg/hive/log
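The three mkdir/chmod pairs above can also be driven from one small table. The sketch below is a dry run that only prints the hdfs commands (nothing touches a live cluster), which makes the directory-to-mode mapping easy to audit:

```shell
# Build the command list from a dir/mode table; drop the echo to execute
# the commands for real against a running HDFS.
cmds=$(while read -r dir mode; do
  echo "hdfs dfs -mkdir -p /opt/pkg/hive/$dir"
  echo "hdfs dfs -chmod $mode /opt/pkg/hive/$dir"
done <<'EOF'
warehouse g+w
tmp 777
log g+w
EOF
)
echo "$cmds"
```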
3.3 Configure hive-env.sh
cp hive-env.sh.template hive-env.sh
vi hive-env.sh
export JAVA_HOME=/opt/pkg/jdk1.8.0_211
export HADOOP_HOME=/opt/pkg/hadoop
export HIVE_HOME=/opt/pkg/hive
# HADOOP_HOME=${bin}/../../hadoop
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=$HIVE_HOME/conf
# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export HIVE_AUX_JARS_PATH=/opt/pkg/hive/lib/*
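Before moving on, it can help to confirm that the directories hive-env.sh points at actually exist. A small check, assuming this guide's /opt/pkg layout (adjust the paths to your own install locations):

```shell
# Print OK or MISSING for each path referenced by hive-env.sh.
for p in /opt/pkg/jdk1.8.0_211 /opt/pkg/hadoop /opt/pkg/hive; do
  [ -d "$p" ] && echo "$p OK" || echo "$p MISSING"
done
```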
3.4 Configure hive-site.xml
vim hive-site.xml
<configuration>
<!-- Hive Execution Parameters -->
<property>
<name>hive.exec.scratchdir</name>
<value>/opt/pkg/hive/tmp</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/opt/pkg/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>hive.querylog.location</name>
<value>/opt/pkg/hive/log</value>
<description>Location of Hive run time structured log file</description>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.220.129:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>root</value>
</property>
</configuration>
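The ConnectionURL must follow the shape jdbc:mysql://&lt;host&gt;:&lt;port&gt;/&lt;database&gt;?... (3306 is MySQL's default port; the host below is this guide's example MySQL machine). A pure-bash shape check:

```shell
# Validate the JDBC URL shape before running schematool; a malformed URL
# (e.g. a stray colon before the host) fails this regex.
url="jdbc:mysql://192.168.220.129:3306/hive?createDatabaseIfNotExist=true"
if [[ "$url" =~ ^jdbc:mysql://[^:/]+:[0-9]+/[A-Za-z0-9_]+ ]]; then
  echo "URL shape OK"
else
  echo "URL shape BAD"
fi
```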
4. Initialize the Metastore Database
Initialize the Hive metastore schema in MySQL:
schematool -dbType mysql -initSchema
5. Start Hive
(1) With Hive 2.x and later, the Metastore and HiveServer2 services must be started before the client; otherwise the client fails with:
FAILED: HiveException java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
(2) Create a Hive service startup script under /opt/pkg/hive/bin that starts the Metastore and HiveServer2 services:
[hadoop@hadoop100 bin]$ vi /opt/pkg/hive/bin/hive-services.sh
#!/bin/bash
HIVE_LOG_DIR=$HIVE_HOME/logs
mkdir -p $HIVE_LOG_DIR
# Check whether a service process is healthy; arg 1 is the process name, arg 2 is its listening port
function check_process()
{
pid=$(ps -ef 2>/dev/null | grep -v grep | grep -i $1 | awk '{print $2}')
ppid=$(netstat -nltp 2>/dev/null | grep $2 | awk '{print $7}' | cut -d '/' -f 1)
echo $pid
[[ "$pid" =~ "$ppid" ]] && [ "$ppid" ] && return 0 || return 1
}
function hive_start()
{
metapid=$(check_process HiveMetastore 9083)
cmd="nohup hive --service metastore >$HIVE_LOG_DIR/metastore.log 2>&1 &"
cmd=$cmd" sleep 4; hdfs dfsadmin -safemode wait >/dev/null 2>&1"
[ -z "$metapid" ] && eval $cmd || echo "Metastore service is already running"
server2pid=$(check_process HiveServer2 10000)
cmd="nohup hive --service hiveserver2 >$HIVE_LOG_DIR/hiveServer2.log 2>&1 &"
[ -z "$server2pid" ] && eval $cmd || echo "HiveServer2 service is already running"
}
function hive_stop()
{
metapid=$(check_process HiveMetastore 9083)
[ "$metapid" ] && kill $metapid || echo "Metastore service is not running"
server2pid=$(check_process HiveServer2 10000)
[ "$server2pid" ] && kill $server2pid || echo "HiveServer2 service is not running"
}
case $1 in
"start")
hive_start
;;
"stop")
hive_stop
;;
"restart")
hive_stop
sleep 2
hive_start
;;
"status")
check_process HiveMetastore 9083 >/dev/null && echo "Metastore service is running normally" || echo "Metastore service is not running normally"
check_process HiveServer2 10000 >/dev/null && echo "HiveServer2 service is running normally" || echo "HiveServer2 service is not running normally"
;;
*)
echo Invalid Args!
echo 'Usage: '$(basename $0)' start|stop|restart|status'
;;
esac
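The check_process helper above treats a service as healthy only when the PID list from ps contains the PID that owns the listening port (from netstat). A minimal illustration of that bash substring-style regex match, with made-up PIDs:

```shell
# When the regex pattern is quoted, [[ =~ ]] performs a literal substring
# match, so "5678" is found inside the ps PID list "1234 5678".
pid="1234 5678"
ppid="5678"
if [[ "$pid" =~ "$ppid" ]] && [ -n "$ppid" ]; then
  echo "process healthy"
else
  echo "process missing or port not bound"
fi
```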
(3) Make the script executable.
$ sudo chmod +x hive-services.sh
(4) Start the Hive background services.
$ hive-services.sh start
(5) Check the status of the Hive background services.
You may need to retry a few times: it takes roughly a minute after startup before the processes appear.
$ hive-services.sh status
Metastore service is running normally
HiveServer2 service is running normally
(6) Start the Hive client.
[hadoop@master bin]$ hive
which: no hbase in (/opt/rh/llvm-toolset-7.0/root/usr/bin:/opt/pkg/hive/bin:/opt/pkg/flume/bin:/opt/pkg/kafka/bin:/opt/pkg/zookeeper/bin:/opt/pkg/hadoop/bin:/opt/pkg/hadoop/sbin:/opt/pkg/maven/bin:/sbin:/opt/pkg/java/bin:/opt/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/hadoop/.local/bin:/home/hadoop/bin)
Hive Session ID = 2b898c67-6677-4e98-8e23-0d70d358d206
Logging initialized using configuration in jar:file:/opt/pkg/hive/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Hive Session ID = 8aa334ad-253c-4974-bdb2-971b22bd3afb
hive (default)>
6. Logs
By default the hive and beeline tools write their logs under /tmp/<username>. To customize logging, create conf/hive-log4j2.properties and conf/beeline-log4j2.properties (templates are provided alongside them).
The log location can be changed there:
property.hive.log.dir = /opt/pkg/hive/logs/
property.hive.log.file = hive.log
The default log level is INFO, which makes the Hive client print a lot of unnecessary output; note that in this setup, changing the level did not appear to take effect.
The HiveServer2 server log can be followed with:
$ tail -F /opt/pkg/hive/logs/hiveServer2.log
7. Troubleshooting
Starting hiveserver2 and then connecting fails with:
Error: Could not open client transport with JDBC Uri: jdbc:hive2://node1:10000/hive_metadata;user=hadoop: java.net.ConnectException: Connection refused (state=08S01,code=0) Beeline version 2.3.3 by Apache Hive
Cause: HiveServer2 enforces impersonation checks, so the connecting user must be configured as a proxy user in Hadoop.
Fix:
Stop dfs and yarn:
$ stop-dfs.sh
$ stop-yarn.sh
Add the following to Hadoop's core-site.xml (here hadoop is the user and group allowed to act as a proxy user):
<property>
  <name>hadoop.proxyuser.hadoop.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hadoop.groups</name>
  <value>*</value>
</property>
Restart dfs:
$ start-dfs.sh
Restart yarn:
$ start-yarn.sh
Reconnect to Hive:
beeline -u jdbc:hive2://hadoop100:10000 -n hadoop -p