HiveServer2 High Availability
HiveMetaStore High Availability
The official documentation describes Metastore High Availability as follows:
The Metastore service is stateless. This allows you to start multiple instances of the service to provide for high availability. It also allows you to configure some clients to embed the metastore (e.g. HiveServer2) while still running a Metastore service for other clients. If you are running multiple Metastore services you can put all their URIs into your client's metastore.thrift.uris value and then set metastore.thrift.uri.selection (in Hive 2, hive.metastore.uri.selection) to RANDOM or SEQUENTIAL. RANDOM will cause your client to randomly select one of the servers in the list, while SEQUENTIAL will cause it to start at the beginning of the list and attempt to connect to each server in order.
Two selection modes exist: RANDOM and SEQUENTIAL. RANDOM makes the client pick one metastore at random from the configured URI list to serve its metadata requests, while SEQUENTIAL starts at the head of the list and tries each server in order, using the first one that connects successfully.
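The two strategies can be sketched in Python. This is a conceptual sketch of the client-side selection logic only, not Hive's actual implementation; the `is_reachable` callback is a hypothetical stand-in for a Thrift connection attempt.

```python
import random

def select_metastore(uris, mode, is_reachable):
    """Pick a metastore URI from `uris` using the given selection mode.

    is_reachable(uri) stands in for an actual Thrift connection attempt.
    """
    if mode == "RANDOM":
        # Try the servers in a random order until one accepts the connection.
        candidates = random.sample(uris, len(uris))
    elif mode == "SEQUENTIAL":
        # Always start at the head of the list and walk it in order.
        candidates = list(uris)
    else:
        raise ValueError(f"unknown selection mode: {mode}")
    for uri in candidates:
        if is_reachable(uri):
            return uri
    raise ConnectionError("no metastore reachable")

uris = ["thrift://ms1:9083", "thrift://ms2:9083", "thrift://ms3:9083"]
# Simulate ms1 being down: SEQUENTIAL falls through to the next server.
up = lambda uri: "ms1" not in uri
print(select_metastore(uris, "SEQUENTIAL", up))  # thrift://ms2:9083
```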
Deploying Hive
Before installing Hive, set up the external metastore database. MySQL is the usual choice (deployment steps omitted here); create the Hive database and a matching user, and place the mysql-connector-java.jar driver into ${HIVE_HOME}/lib.
Per the cluster role plan, mapper-node1 and mapper-node3 serve as the HiveServer2 and MetaStore nodes. Download the desired Hive release on mapper-node1 and extract it.
# After extraction the directory is /opt/soft/hive; configure environment variables.
# Quote the heredoc delimiter so $HIVE_HOME and $PATH expand at login time,
# not when the file is written (an unquoted EOF would expand them immediately).
cat > /etc/profile.d/hive.sh << 'EOF'
export HIVE_HOME=/opt/soft/hive
export HIVE_CONF_DIR=$HIVE_HOME/conf
export PATH=$PATH:$HIVE_HOME/bin
EOF
Modify the Hive configuration files
hive-site.xml
hive.exec.local.scratchdir = /tmp/hive
hive.downloaded.resources.dir = /tmp/hive/${hive.session.id}_resources
hive.exec.dynamic.partition = true # enable dynamic partitioning
hive.exec.dynamic.partition.mode = nonstrict # non-strict mode
hive.metastore.uris = thrift://mapper-node1:9083,thrift://mapper-node3:9083 # multiple metastore URIs
javax.jdo.option.ConnectionURL = jdbc:mysql://${MySQL host}:3306/hive?createDatabaseIfNotExist=true&useSSL=false&useUnicode=true&characterEncoding=UTF-8&serverTimezone=Asia/Shanghai
javax.jdo.option.ConnectionPassword = ${MySQL password}
javax.jdo.option.ConnectionDriverName = com.mysql.jdbc.Driver # MySQL JDBC driver
javax.jdo.option.ConnectionUserName = hive # MySQL username
hive.querylog.location = /tmp/hive/querylog
hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.support.concurrency = true
hive.compactor.worker.threads = 1
# HiveServer2 high availability via ZooKeeper
hive.server2.zookeeper.namespace = hiveserver2
hive.zookeeper.quorum = mapper-node1:2181,mapper-node2:2181,mapper-node3:2181
hive.zookeeper.client.port = 2181
hive.server2.support.dynamic.service.discovery = true
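In the actual hive-site.xml these settings are written as `<property>` blocks, and the `&` separators in the JDBC URL must be escaped as `&amp;` in XML. A fragment for two of the settings above (`mysql-host` is a placeholder for your MySQL address):

```xml
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://mapper-node1:9083,thrift://mapper-node3:9083</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://mysql-host:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false&amp;useUnicode=true&amp;characterEncoding=UTF-8&amp;serverTimezone=Asia/Shanghai</value>
</property>
```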
hive-env.sh
export HADOOP_HEAPSIZE=4096
export HADOOP_HOME=/opt/soft/hadoop
export HIVE_CONF_DIR=/opt/soft/hive/conf
hive-log4j2.properties
property.hive.log.dir = /data/bigdata/logs/hive
property.hive.log.file = hiveserver2.log
core-site.xml
Add proxy-user host and group permissions:
<property>
<name>hadoop.proxyuser.hdfs.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hdfs.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.groups</name>
<value>*</value>
</property>
After core-site.xml is changed, the HDFS cluster must be restarted for the change to take effect.
Initialize the Hive metastore schema
schematool -dbType mysql -initSchema
Copy the configured Hive directory from mapper-node1 to mapper-node3, then start hiveserver2 and metastore on both nodes:
nohup hive --service hiveserver2 &
nohup hive --service metastore &
If startup fails, check the corresponding service logs to locate the problem. Once both services are up, confirm that the hiveserver2 znode exists in ZooKeeper:
echo 'ls /hiveserver2' | zkCli.sh
Verify HiveServer2 High Availability
Connect to HiveServer2 through ZooKeeper: HiveConnection parses the serviceDiscoveryMode parameter and, via ZooKeeperHiveClientHelper, randomly selects one registered HiveServer2 address to connect to.
beeline -u "jdbc:hive2://mapper-node1:2181,mapper-node2:2181,mapper-node3:2181/default;serviceDiscoveryMode=zookeeper;zookeeperNamespace=hiveserver2" -n hive -e "select version();"
+--------------------------------------------------+
| _c0 |
+--------------------------------------------------+
| 2.3.9 r92dd0159f440ca7863be3232f3a683a510a62b9d |
+--------------------------------------------------+
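The discovery step can be sketched as follows. This is a hedged illustration: the znode naming shown (`serverUri=…;version=…;sequence=…` under /hiveserver2) follows the pattern HiveServer2 uses when registering for dynamic service discovery, but treat the exact format as an assumption.

```python
import random

# Example znode names as they might appear under /hiveserver2 in ZooKeeper.
znodes = [
    "serverUri=mapper-node1:10000;version=2.3.9;sequence=0000000000",
    "serverUri=mapper-node3:10000;version=2.3.9;sequence=0000000001",
]

def pick_server(znodes):
    """Randomly pick one registered HiveServer2 and return its host:port."""
    node = random.choice(znodes)
    fields = dict(kv.split("=", 1) for kv in node.split(";"))
    return fields["serverUri"]

print(pick_server(znodes))
```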
To verify failover, kill the HiveServer2 process on either node and reconnect:
ps -ef | grep HiveServer2 | grep -v grep | awk '{print $2}' | xargs -i kill -15 {}
# Reconnect with beeline; it should now report a connection to the other node's hiveserver2
If the connection succeeds and Hive operations work normally, HiveServer2 high availability is in place.
Verify Metastore High Availability
Kill the metastore service on either mapper-node1 or mapper-node3, then run queries through beeline and watch the hiveserver2 log for messages about connecting to the metastore:
ps -ef | grep HiveMetaStore | grep -v grep | awk '{print $2}' | xargs -i kill -15 {}
HiveServer2 first tries to connect to mapper-node1 (whose metastore has just been killed); once that connection fails, it falls back to the metastore service on the other node.
Reference: https://www.studytime.xin/article/hive-knowledge-install-ha.html