Apache Hive 3.x Standalone Deployment

Introduction to Hive

Hive is a data warehouse framework built on top of Hadoop. It maps structured data files to database tables and provides SQL-like query capabilities: Hive translates SQL into MapReduce jobs for execution, with HDFS providing the underlying data storage. Hive was originally developed at Facebook and was later handed over to the Apache Software Foundation, where it continues as an Apache open source project.

Hive depends on Hadoop: internally it uses the MapReduce framework as the actual engine for executing queries and stores its data on HDFS. When an external database is used to hold the metadata, it additionally depends on a database such as MySQL.

All of the steps below deploy Hive on a single server node.

Installing the Java Environment

This example installs OpenJDK 8 from a binary archive. Note that the current Hive release, 3.1.2, likely does not support Java 11 or later.

wget https://mirrors.tuna.tsinghua.edu.cn/AdoptOpenJDK/8/jdk/x64/linux/OpenJDK8U-jdk_x64_linux_hotspot_8u292b10.tar.gz

Extract and install:

mkdir /opt/openjdk
tar -zxvf OpenJDK8U-jdk_x64_linux_hotspot_8u292b10.tar.gz -C /opt/openjdk --strip=1

Configure environment variables:

cat > /etc/profile.d/openjdk.sh <<'EOF'
export JAVA_HOME=/opt/openjdk
export PATH=$PATH:$JAVA_HOME/bin
EOF

source /etc/profile

Verify the installation:

java -version

Installing Hadoop

This example sets up a single-node pseudo-distributed Hadoop cluster. Download the Hadoop binary:

wget https://mirrors.aliyun.com/apache/hadoop/common/hadoop-3.3.0/hadoop-3.3.0.tar.gz

Extract and install:

tar -zxvf hadoop-3.3.0.tar.gz -C /opt
mv /opt/hadoop-3.3.0 /opt/hadoop

Configure environment variables:

cat > /etc/profile.d/hadoop.sh <<'EOF'
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF

source /etc/profile

Check the Hadoop version:

hadoop version

Configure passwordless SSH:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys

Edit hadoop-env.sh: set the JAVA_HOME environment variable to an absolute path and set root as the user for the HDFS daemons:

cat >> /opt/hadoop/etc/hadoop/hadoop-env.sh <<EOF
export JAVA_HOME=$JAVA_HOME
export HDFS_NAMENODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_DATANODE_USER=root
EOF

Edit yarn-env.sh to run the YARN daemons as root:

cat >> /opt/hadoop/etc/hadoop/yarn-env.sh <<EOF
export YARN_REGISTRYDNS_SECURE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
EOF

Modifying the Hadoop Configuration Files

Edit the core-site.xml configuration file:

cat > /opt/hadoop/etc/hadoop/core-site.xml <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://localhost:9000</value>
    </property>
    
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/opt/hadoop/tmp</value>
    </property>
	
    <property>
      <name>hadoop.proxyuser.root.hosts</name>
      <value>*</value>
    </property>
    
    <property>
      <name>hadoop.proxyuser.root.groups</name>
      <value>*</value>
    </property>	
</configuration>
EOF

Edit the hdfs-site.xml configuration file:

cat > /opt/hadoop/etc/hadoop/hdfs-site.xml <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
EOF

Modifying the YARN Configuration Files

Edit the mapred-site.xml configuration file:

cat > $HADOOP_HOME/etc/hadoop/mapred-site.xml <<'EOF'
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>
EOF

Edit the yarn-site.xml configuration file:

cat > $HADOOP_HOME/etc/hadoop/yarn-site.xml <<'EOF'
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>
EOF

Format the HDFS filesystem:

hdfs namenode -format

Managing the Hadoop Service with systemd

cat > /usr/lib/systemd/system/hadoop.service <<EOF  
[Unit]  
Description=hadoop
After=syslog.target network.target remote-fs.target nss-lookup.target network-online.target
Requires=network-online.target
  
[Service]  
User=root
Group=root
Type=forking
ExecStart=/opt/hadoop/sbin/start-all.sh  
ExecStop=/opt/hadoop/sbin/stop-all.sh  
WorkingDirectory=/opt/hadoop/
TimeoutStartSec=1min
Restart=no
RestartSec=30
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOF
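
After creating or editing a unit file, reload systemd so it picks up the change (the same applies to the Hive unit files created later):

systemctl daemon-reload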

Start the Hadoop service and enable it at boot:

systemctl enable --now hadoop

Check the Hadoop service status:

[root@master ~]# systemctl status hadoop
● hadoop.service - hadoop
   Loaded: loaded (/usr/lib/systemd/system/hadoop.service; enabled; vendor preset: disabled)
   Active: active (exited) since Wed 2021-06-23 11:31:50 CST; 1h 17min ago
  Process: 309739 ExecStop=/opt/hadoop/sbin/stop-all.sh (code=exited, status=0/SUCCESS)
  Process: 318250 ExecStart=/opt/hadoop/sbin/start-all.sh (code=exited, status=0/SUCCESS)
 Main PID: 318250 (code=exited, status=0/SUCCESS)
    Tasks: 0 (limit: 49791)
   Memory: 0B
   CGroup: /system.slice/hadoop.service

Jun 23 11:31:39 master start-all.sh[318250]: Starting resourcemanager
Jun 23 11:31:39 master su[319099]: (to root) root on none
Jun 23 11:31:39 master su[319099]: pam_unix(su-l:session): session opened for user root by (uid=0)
Jun 23 11:31:39 master start-all.sh[318250]: Last login: Wed Jun 23 11:31:23 CST 2021
Jun 23 11:31:41 master su[319099]: pam_unix(su-l:session): session closed for user root
Jun 23 11:31:41 master start-all.sh[318250]: Starting nodemanagers
Jun 23 11:31:42 master su[319186]: (to root) root on none
Jun 23 11:31:42 master su[319186]: pam_unix(su-l:session): session opened for user root by (uid=0)
Jun 23 11:31:42 master start-all.sh[318250]: Last login: Wed Jun 23 11:31:39 CST 2021
Jun 23 11:31:50 master systemd[1]: Started hadoop.

Check the running processes:

[root@master ~]# jps
318613 DataNode
318470 NameNode
319159 ResourceManager
319293 NodeManager
782663 Jps
318826 SecondaryNameNode

Browse the web UI to access the NameNode:

http://localhost:9870/
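
If the server has no browser, the web UIs can also be probed from the shell. As a quick sanity check (the YARN ResourceManager UI listens on port 8088 by default), both commands should print 200:

curl -s -o /dev/null -w '%{http_code}\n' http://localhost:9870/
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8088/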

Installing MySQL

Hive stores its metadata in a database, so before installing Hive a database must be installed and its connection details configured in hive-site.xml. Common choices include Derby, MySQL, and MariaDB. This document deploys MySQL with Docker; other databases are set up similarly.

Install Docker:

curl -fsSL https://get.docker.com | bash -s docker --mirror Aliyun

# Start the Docker service
systemctl enable --now docker

# Check the Docker service status
systemctl status docker

Start the MySQL database:

docker run -d --name mysql \
    --restart always \
    -p 3306:3306 \
    -e MYSQL_ROOT_PASSWORD=123456 \
    -v mysql:/var/lib/mysql \
    mysql

To connect to the database by hostname, add a hosts entry that resolves to this machine's IP address:

echo '172.31.112.2 dbserver' >> /etc/hosts
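
As a quick sanity check, confirm the hostname resolves and MySQL accepts connections (using the client bundled in the container; adjust the password if you changed it):

getent hosts dbserver
docker exec mysql mysql -uroot -p123456 -e 'SELECT VERSION();'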

Installing Hive

Download Hive:

wget https://archive.apache.org/dist/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz

Extract and install:

tar -zxvf apache-hive-3.1.2-bin.tar.gz -C /opt
mv /opt/apache-hive-3.1.2-bin /opt/hive

Configure environment variables:

cat >> /etc/profile.d/hive.sh <<'EOF'
export HIVE_HOME=/opt/hive
export PATH=$HIVE_HOME/bin:$PATH
EOF

source /etc/profile

Modifying the Hive Configuration Files

Edit the hive-env.sh configuration file:

cp /opt/hive/conf/{hive-env.sh.template,hive-env.sh}

cat > /opt/hive/conf/hive-env.sh <<EOF
export JAVA_HOME=/opt/openjdk
export HADOOP_HOME=/opt/hadoop
export HIVE_CONF_DIR=/opt/hive/conf
EOF

Copy hive-site.xml from the template:

cp /opt/hive/conf/{hive-default.xml.template,hive-site.xml}

Set the relevant parameter values in hive-site.xml (the heredoc below replaces the template copy with a minimal configuration):

cat >/opt/hive/conf/hive-site.xml<<'EOF'
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://dbserver:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
  </property>
  
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>

  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/tmp/hive</value>
  </property>

  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/tmp/${hive.session.id}_resources</value>
  </property>

  <property>
    <name>hive.querylog.location</name>
    <value>/tmp/hive</value>
  </property>

  <property>
    <name>hive.server2.logging.operation.log.location</name>
    <value>/tmp/hive/operation_logs</value>
  </property>

  <property>
    <name>hive.server2.enable.doAs</name>
    <value>FALSE</value>
  </property>  
</configuration>
EOF

Configuration file reference: https://github.com/apache/hive/blob/master/data/conf/hive-site.xml

Starting and Verifying Hive

1. Preparation before starting Hive: download the matching JDBC driver and place it in the /opt/hive/lib directory.

MySQL Connector/J download page: https://dev.mysql.com/downloads/connector/j/

wget https://cdn.mysql.com//Downloads/Connector-J/mysql-connector-java-8.0.25.tar.gz
tar -zxvf mysql-connector-java-8.0.25.tar.gz
cp mysql-connector-java-8.0.25/mysql-connector-java-8.0.25.jar $HIVE_HOME/lib

Create the Hive data directories on HDFS:

hadoop fs -mkdir /tmp
hadoop fs -mkdir -p /user/hive/warehouse
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse
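
Verify the directories and their group-write permission (-d lists the directories themselves rather than their contents):

hadoop fs -ls -d /tmp /user/hive/warehouse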

2. Initialize Hive. First resolve the guava version conflict by replacing Hive's bundled guava with the newer version shipped with Hadoop:

mv /opt/hive/lib/guava-19.0.jar{,.bak}
cp /opt/hadoop/share/hadoop/hdfs/lib/guava-27.0-jre.jar /opt/hive/lib/

Run the initialization:

schematool -dbType mysql -initSchema
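
If initialization succeeds, schematool can also report the schema version it recorded (this assumes the metastore connection settings configured in hive-site.xml above):

schematool -dbType mysql -info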

Interacting via the hive Command

Run the hive command directly:

[root@master ~]# hive
which: no hbase in (/opt/hive/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/opt/hadoop/bin:/opt/hadoop/sbin:/opt/openjdk/bin:/root/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = 2b569685-2897-45ff-b66a-cb6a530fc28d

Logging initialized using configuration in jar:file:/opt/hive/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Hive Session ID = 71a5cbbc-3947-4c3c-aba8-3d92a5dce41a
hive>
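
As a quick smoke test from the prompt, create a scratch table, write one row, and read it back (smoke_test is a hypothetical name; the INSERT runs as a MapReduce job, so expect some delay):

hive> CREATE TABLE smoke_test (id INT, msg STRING);
hive> INSERT INTO smoke_test VALUES (1, 'hello');
hive> SELECT * FROM smoke_test;
hive> DROP TABLE smoke_test;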

The database Hive created and the tables holding its metadata are also visible in MySQL:

[root@master ~]# docker exec -it mysql bash
root@2fb8b0ccd87c:/# mysql -uroot -p123456
mysql> show databases;
mysql> use hive;
mysql> show tables;
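
The listing should include standard metastore tables such as DBS, TBLS, and VERSION. For example, the recorded schema version can be read directly (assuming the stock Hive 3 metastore schema):

mysql> SELECT * FROM VERSION;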

Interacting via the Beeline Client

Hive ships with the HiveServer and HiveServer2 services. Both allow clients to connect using a variety of programming languages, but HiveServer cannot handle concurrent requests from multiple clients, which is why HiveServer2 was introduced. HiveServer2 (HS2) lets remote clients submit requests to Hive and retrieve results in various programming languages, and it supports concurrent access by multiple clients as well as authentication. HS2 is a single process composed of several services, including a Thrift-based Hive service (TCP or HTTP) and a Jetty web server for the Web UI.
HiveServer2 has its own CLI tool, Beeline, a JDBC client based on SQLLine. Since HiveServer2 is now the focus of Hive development and maintenance, the project recommends Beeline over the Hive CLI. The following explains how to set up Beeline.
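
For reference, the Beeline JDBC URL takes the form jdbc:hive2://<host>:<port>/<database>. Two illustrative invocations (the HTTP variant only applies if hive.server2.transport.mode is set to http, which this document does not do):

# default binary (Thrift/TCP) transport mode
beeline -u 'jdbc:hive2://localhost:10000/default'
# HTTP transport mode
beeline -u 'jdbc:hive2://localhost:10001/default;transportMode=http;httpPath=cliservice'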

1. Modify the Hadoop configuration

Edit the Hadoop cluster's core-site.xml configuration file so that it contains the last two property blocks below, which allow Hadoop's root user to act as a proxy for all users on this host. (The core-site.xml written earlier already includes them; the full file is shown again here for completeness.)

cat > /opt/hadoop/etc/hadoop/core-site.xml <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://localhost:9000</value>
    </property>
    
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/opt/hadoop/tmp</value>
    </property>
	
    <property>
      <name>hadoop.proxyuser.root.hosts</name>
      <value>*</value>
    </property>
    
    <property>
      <name>hadoop.proxyuser.root.groups</name>
      <value>*</value>
    </property>	
</configuration>
EOF

This step is necessary because Hadoop 2.0 introduced an impersonation mechanism: Hadoop does not allow upper-layer systems such as Hive to pass the actual end user straight through to the Hadoop layer. Instead, the actual user must be handed to a designated superuser (proxy), which performs the operations on Hadoop on their behalf, preventing arbitrary clients from manipulating Hadoop at will. Without this configuration, later connections may throw an AuthorizationException.
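
After changing core-site.xml, restart Hadoop so the new configuration takes effect:

systemctl restart hadoop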

2. Manage the metastore service with systemd

cat > /etc/systemd/system/hive-meta.service <<EOF
[Unit] 
Description=Hive metastore 
After=network.target 
 
[Service] 
User=root
Group=root
ExecStart=/opt/hive/bin/hive --service metastore 
 
[Install] 
WantedBy=multi-user.target
EOF
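
As with the Hadoop unit earlier, reload systemd to register the new unit:

systemctl daemon-reload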

Start the Hive metastore service and enable it at boot:

systemctl enable --now hive-meta

Check the metastore service status:

[root@master ~]# systemctl status hive-meta
● hive-meta.service - Hive metastore
   Loaded: loaded (/etc/systemd/system/hive-meta.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2021-06-23 15:29:15 CST; 38min ago
 Main PID: 1008705 (java)
    Tasks: 17 (limit: 49791)
   Memory: 225.6M
   CGroup: /system.slice/hive-meta.service
           └─1008705 /opt/openjdk/bin/java -Dproc_jar -Dproc_metastore -Dlog4j.configurationFile=hive-log4j2.properties -Djava.util.logging.config.fi>

Jun 23 15:29:15 master systemd[1]: hive-meta.service: Main process exited, code=exited, status=143/n/a
Jun 23 15:29:15 master systemd[1]: hive-meta.service: Failed with result 'exit-code'.
Jun 23 15:29:15 master systemd[1]: Stopped Hive metastore.
Jun 23 15:29:15 master systemd[1]: Started Hive metastore.
Jun 23 15:29:17 master hive[1008705]: 2021-06-23 15:29:17: Starting Hive Metastore Server
Jun 23 15:29:17 master hive[1008705]: SLF4J: Class path contains multiple SLF4J bindings.
Jun 23 15:29:17 master hive[1008705]: SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.>
Jun 23 15:29:17 master hive[1008705]: SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/>
Jun 23 15:29:17 master hive[1008705]: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Jun 23 15:29:17 master hive[1008705]: SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Check the listening port; output like the following means the service started successfully:

# netstat -anp|grep 9083
tcp6       0      0 :::9083                 :::*                    LISTEN      1008705/java

3. Manage the hive-server2 service with systemd

cat > /etc/systemd/system/hive-server2.service <<EOF
[Unit] 
Description=hive-server2
After=network.target 
 
[Service] 
User=root
Group=root
ExecStart=/opt/hive/bin/hive --service hiveserver2
 
[Install] 
WantedBy=multi-user.target
EOF
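
Again, reload systemd to pick up the new unit file:

systemctl daemon-reload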

Start the hive-server2 service and enable it at boot:

systemctl enable --now hive-server2

Check the hive-server2 service status:

[root@master ~]# systemctl status hive-server2.service 
● hive-server2.service - hive-server2
   Loaded: loaded (/etc/systemd/system/hive-server2.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2021-06-23 15:29:22 CST; 39min ago
 Main PID: 1008846 (java)
    Tasks: 36 (limit: 49791)
   Memory: 359.0M
   CGroup: /system.slice/hive-server2.service
           └─1008846 /opt/openjdk/bin/java -Dproc_jar -Dproc_hiveserver2 -Dlog4j.configurationFile=hive-log4j2.properties -Djava.util.logging.config.>

Jun 23 15:29:27 master hive[1008846]: SLF4J: Class path contains multiple SLF4J bindings.
Jun 23 15:29:27 master hive[1008846]: SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.>
Jun 23 15:29:27 master hive[1008846]: SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/>
Jun 23 15:29:27 master hive[1008846]: SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Jun 23 15:29:27 master hive[1008846]: SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Jun 23 15:29:39 master hive[1008846]: Hive Session ID = 252cab7f-5eff-4401-848e-8822642869b3
Jun 23 15:29:59 master hive[1008846]: Hive Session ID = aebd77dd-7d13-4d35-a8d6-10ee4ee55504
Jun 23 15:31:01 master hive[1008846]: Hive Session ID = 2200bb4e-9556-40bd-abf3-377c344ac9fe
Jun 23 15:31:02 master hive[1008846]: Hive Session ID = 77eb124b-af8f-449e-b60d-29710cb39f1f
Jun 23 15:34:40 master hive[1008846]: OK

Check the listening port; output like the following means the service started successfully:

# netstat -anp|grep 10000
tcp6       0      0 :::10000                :::*                    LISTEN      27800/java

4. Connect with beeline; the session output looks like this:

[root@master ~]# beeline -u jdbc:hive2://dbserver:10000
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://dbserver:10000
Connected to: Apache Hive (version 3.1.2)
Driver: Hive JDBC (version 3.1.2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 3.1.2 by Apache Hive
0: jdbc:hive2://dbserver:10000> 

List the existing databases; output like the following confirms that everything works.

0: jdbc:hive2://dbserver:10000> show databases;
INFO  : Compiling command(queryId=root_20210623174333_193ee409-d927-4c46-a899-47217d128b4e): show databases
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=root_20210623174333_193ee409-d927-4c46-a899-47217d128b4e); Time taken: 0.059 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=root_20210623174333_193ee409-d927-4c46-a899-47217d128b4e): show databases
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=root_20210623174333_193ee409-d927-4c46-a899-47217d128b4e); Time taken: 0.09 seconds
INFO  : OK
INFO  : Concurrency mode is disabled, not creating a lock manager
+----------------+
| database_name  |
+----------------+
| default        |
+----------------+
1 row selected (0.459 seconds)
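
Optionally run a statement through HiveServer2 as well, confirming the full Beeline path end to end (test_db is a hypothetical scratch name):

0: jdbc:hive2://dbserver:10000> CREATE DATABASE test_db;
0: jdbc:hive2://dbserver:10000> SHOW DATABASES;
0: jdbc:hive2://dbserver:10000> DROP DATABASE test_db;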

5. Exit the session. In the Hive CLI:

quit;

In Beeline, use !quit instead.