Deploying the Hive Component
This guide uses Hadoop 3.2.2 and Hive 3.1.2 as the example versions.
Unless otherwise noted, run the commands below on all nodes.
I. System Resources and Component Planning
Node Name | Hostname | CPU/Memory | NIC | Disk | IP Address | OS |
---|---|---|---|---|---|---|
NameNode | namenode | 2C/4G | ens33 | 128G | 192.168.0.11 | CentOS7 |
Secondary NameNode | secondarynamenode | 2C/4G | ens33 | 128G | 192.168.0.12 | CentOS7 |
ResourceManager | resourcemanager | 2C/4G | ens33 | 128G | 192.168.0.13 | CentOS7 |
Worker1 | worker1 | 2C/4G | ens33 | 128G | 192.168.0.21 | CentOS7 |
Worker2 | worker2 | 2C/4G | ens33 | 128G | 192.168.0.22 | CentOS7 |
Worker3 | worker3 | 2C/4G | ens33 | 128G | 192.168.0.23 | CentOS7 |
The Hive component is deployed on the Worker nodes.
II. Building the Hadoop Cluster
The setup of the fully distributed Hadoop cluster itself is omitted here; see:
https://blog.csdn.net/mengshicheng1992/article/details/116757775
III. Deploying the Hive Component
1. Installing the Metastore
This walkthrough hosts the Metastore database on the NameNode node, using MySQL as an example; the MySQL installation itself is omitted here, see:
https://blog.csdn.net/mengshicheng1992/article/details/115158378
Grant MySQL privileges on the NameNode node:
mysql -uroot -pPassWord5.7!
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'PassWord5.7!' WITH GRANT OPTION;
FLUSH PRIVILEGES;
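The GRANT ... IDENTIFIED BY form above is MySQL 5.7 syntax; MySQL 8.0 removed it. If you are on 8.0 instead, a rough equivalent (a sketch, adjust user and password to your setup) is to create the user first and then grant:
CREATE USER IF NOT EXISTS 'root'@'%' IDENTIFIED BY 'PassWord5.7!';
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;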
2. Installing the Hive Component
Download the Hive release:
Reference: https://hive.apache.org/downloads.html
Extract the Hive archive on the Worker nodes (the Hive nodes):
tar -xf /root/apache-hive-3.1.2-bin.tar.gz -C /usr/local/
Set the environment variable for the current shell:
export PATH=$PATH:/usr/local/apache-hive-3.1.2-bin/bin/
Also append it to /etc/profile so it persists across logins:
PATH=$PATH:/usr/local/apache-hive-3.1.2-bin/bin/
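A quick sanity check that the shell now resolves the Hive launcher (nothing Hive-specific runs yet):
source /etc/profile
which hive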
3. Installing the JDBC Driver
Download the MySQL JDBC driver (Connector/J):
Reference: https://www.mysql.com/cn/products/connector/
On the Worker nodes (Hive nodes), extract the JDBC driver and copy the jar into Hive's lib directory:
tar -xf /root/mysql-connector-java-8.0.25.tar.gz
cp /root/mysql-connector-java-8.0.25/mysql-connector-java-8.0.25.jar /usr/local/apache-hive-3.1.2-bin/lib/
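Confirm the driver jar landed where Hive will look for it:
ls /usr/local/apache-hive-3.1.2-bin/lib/ | grep mysql-connector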
4. Configuring Hive and HiveServer2
On the Worker nodes (Hive nodes), replace Hive's bundled guava jar with the one shipped by Hadoop; Hive 3.1.2 bundles the older guava-19.0, which conflicts with Hadoop 3.2.2's guava-27.0 and otherwise breaks schematool with a NoSuchMethodError:
cp /usr/local/hadoop-3.2.2/share/hadoop/common/lib/guava-27.0-jre.jar /usr/local/apache-hive-3.1.2-bin/lib/
rm -rf /usr/local/apache-hive-3.1.2-bin/lib/guava-19.0.jar
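Verify that only the newer jar remains (expect guava-27.0-jre.jar and no guava-19.0.jar):
ls /usr/local/apache-hive-3.1.2-bin/lib/ | grep guava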
On the Worker nodes (Hive nodes), write hive-env.sh:
cat > /usr/local/apache-hive-3.1.2-bin/conf/hive-env.sh << EOF
# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/usr/local/hadoop-3.2.2/
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/usr/local/apache-hive-3.1.2-bin/conf/
# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export HIVE_AUX_JARS_PATH=/usr/local/apache-hive-3.1.2-bin/lib/
EOF
On the Worker nodes (Hive nodes), write hive-site.xml. Two details matter here: the & in the JDBC URL must be escaped as &amp; inside XML, and with Connector/J 8.x the driver class is com.mysql.cj.jdbc.Driver (the old com.mysql.jdbc.Driver name still works but is deprecated):
cat > /usr/local/apache-hive-3.1.2-bin/conf/hive-site.xml << EOF
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://namenode:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.cj.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>PassWord5.7!</value>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<property>
<name>hive.server2.thrift.bind.host</name>
<value>0.0.0.0</value>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
</property>
</configuration>
EOF
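A single malformed character in this file makes Hive fail at startup with an XML parse error, so it is worth a quick syntax check (assuming xmllint from libxml2 is available, as it usually is on CentOS 7):
xmllint --noout /usr/local/apache-hive-3.1.2-bin/conf/hive-site.xml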
Write core-site.xml; the two hadoop.proxyuser.root.* properties allow HiveServer2, which runs as root here, to impersonate the users that connect through it:
cat > /usr/local/hadoop-3.2.2/etc/hadoop/core-site.xml << EOF
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://namenode:9000</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
</configuration>
EOF
After changing core-site.xml, Hadoop must be restarted, as follows:
Run stop-dfs.sh on the NameNode
Run stop-yarn.sh on the ResourceManager
Run start-dfs.sh on the NameNode
Run start-yarn.sh on the ResourceManager
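Once restarted, confirm the daemons came back by running jps on each node; expect NameNode on the NameNode host, ResourceManager on the ResourceManager host, and DataNode plus NodeManager on the Workers:
jps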
5. Starting the Hive Component
Initialize the Metastore schema from any one Worker node (Hive node); since the schema lives in the shared MySQL database, this only needs to run once:
schematool -dbType mysql -initSchema
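If initialization succeeded, schematool can report the schema version it just created:
schematool -dbType mysql -info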
Start the Hive Shell on any Worker node (Hive node):
hive
The Hive Shell on each node runs independently, but because all nodes share the same Metastore and HDFS warehouse, they see the same data.
Start HiveServer2 on any Worker node (Hive node):
nohup hiveserver2 >/dev/null 2>/dev/null &
Each node's HiveServer2 process is likewise independent and listens on its own host; the shared Metastore again keeps the data consistent across nodes.
Open the HiveServer2 web UI (port 10002):
http://192.168.0.21:10002
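HiveServer2 can take a while to come up; before connecting, check that the Thrift port (10000) and the web UI port (10002) are listening (ss ships with CentOS 7's iproute package):
ss -lnt | grep -E '10000|10002'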
6. Hive Component Feature Demo
1. Hive Shell Demo
Create a database:
CREATE DATABASE db;
Create a table:
USE db;
CREATE TABLE tb (
id int,
name string,
age int
) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
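To double-check the definition that Hive recorded in the Metastore (standard HiveQL):
DESCRIBE FORMATTED tb;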
2. HiveServer2 Demo
Connect to HiveServer2 with Beeline:
beeline
!connect jdbc:hive2://worker1:10000
Use root as the username and leave the password empty.
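The same connection can also be made in one shot from the command line (equivalent to the interactive !connect above):
beeline -u jdbc:hive2://worker1:10000 -n root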
List the databases:
SHOW DATABASES;
List the tables:
USE db;
SHOW TABLES;
Create a test data file (run these from /root, since the upload below references /root/hivefile):
echo 1,user1,21 > hivefile
echo 2,user2,22 >> hivefile
echo 3,user3,23 >> hivefile
echo 4,user4,24 >> hivefile
echo 5,user5,25 >> hivefile
Upload the file into the table's warehouse directory:
hadoop fs -put /root/hivefile /user/hive/warehouse/db.db/tb
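Confirm that the file landed where Hive expects the table's data:
hadoop fs -ls /user/hive/warehouse/db.db/tb
Alternatively, the same data can be loaded from inside the Hive Shell without hand-picking the warehouse path (standard HiveQL; the path assumes the file created above):
LOAD DATA LOCAL INPATH '/root/hivefile' INTO TABLE db.tb;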
Query the uploaded data from both the Hive Shell and Beeline; each should return the five rows written above:
SELECT * FROM tb;