Preface
This article belongs to my column 《100个问题搞定大数据安装部署》. The column is original work; please credit the source when quoting. Corrections and suggestions are welcome in the comments — thanks!
For the column's table of contents and references, see 100个问题搞定大数据安装部署.
Main Text
1. Prepare the Hive Installation Package
Hive 3.x must be paired with a Hadoop 3.x environment.
Download the Hive package yourself following the steps below, or use the package I provide — Hive3.1.2安装包.
1.1 Download Hive
Open the download link and click apache-hive-3.1.2-bin.tar.gz to download, as shown below.
1.2 Upload the Hive Installation Package
I use node2 of the pseudo-distributed three-node cluster as the upload target for the Hive package.
On macOS I recommend Termius as the SSH client; on Windows, Xshell/Xftp.
(Screenshot: drag the file into the right-hand pane)
(Screenshot: package uploaded successfully)
1.3 Extract the Hive Installation Package
1.3.1 Extract the Hive archive into /opt/bigdata/
[root@node2 ~]# tar -zxvf apache-hive-3.1.2-bin.tar -C /opt/bigdata/
gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error is not recoverable: exiting now
tar -zxvf failed, so we check the file type with the file command:
[root@node2 ~]# file apache-hive-3.1.2-bin.tar
apache-hive-3.1.2-bin.tar: POSIX tar archive
The file is not gzip-compressed, so we switch to tar -xvf:
[root@node2 ~]# tar -xvf apache-hive-3.1.2-bin.tar -C /opt/bigdata/
apache-hive-3.1.2-bin/LICENSE
apache-hive-3.1.2-bin/NOTICE
apache-hive-3.1.2-bin/RELEASE_NOTES.txt
apache-hive-3.1.2-bin/binary-package-licenses/asm-LICENSE
...
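The two-step dance above (try `-z`, fall back on error) can be automated: a small sketch that asks `file` which flavor the archive is before extracting. The `file -b` output prefixes below are the usual ones on GNU/Linux; verify them on your system.

```shell
#!/bin/sh
# Pick tar flags based on what `file` reports, then extract.
# A sketch: the `file -b` prefixes are the common GNU/Linux ones.
extract_archive() {
  archive="$1"; dest="$2"
  case "$(file -b "$archive")" in
    gzip*)       tar -zxf "$archive" -C "$dest" ;;  # .tar.gz / .tgz
    POSIX\ tar*) tar -xf  "$archive" -C "$dest" ;;  # plain .tar
    *) echo "unrecognized archive type: $archive" >&2; return 1 ;;
  esac
}
```

For example, `extract_archive apache-hive-3.1.2-bin.tar /opt/bigdata/` would have avoided the gzip error above.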
1.3.2 Change to the bigdata directory
[root@node2 ~]# cd /opt/bigdata/
[root@node2 bigdata]#
1.3.3 Change the owner and group of the Hive install directory to hadoop:hadoop
[root@node2 bigdata]# chown -R hadoop:hadoop apache-hive-3.1.2-bin/
[root@node2 bigdata]# ll
total 0
drwxr-xr-x. 10 hadoop hadoop 184 Jun 19 10:14 apache-hive-3.1.2-bin
drwxr-xr-x. 9 hadoop hadoop 149 Feb 25 01:17 hadoop-2.7.3
drwxr-xr-x. 12 hadoop hadoop 190 Feb 26 00:28 hadoop-3.2.2
drwxr-xr-x. 14 hadoop hadoop 226 Mar 3 00:28 spark-3.2.0
1.3.4 Adjust the read/write permissions on the Hive install directory
[root@node2 bigdata]# chmod -R 755 apache-hive-3.1.2-bin/
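This tutorial invokes Hive with a relative path (`./bin/hive`) throughout. Optionally, you can put Hive on the hadoop user's PATH; a minimal sketch (the profile path and install directory in the example call are this tutorial's — adjust to your setup):

```shell
#!/bin/sh
# Append HIVE_HOME/PATH exports to a shell profile. A convenience
# sketch only; the tutorial itself keeps using ./bin/hive directly.
add_hive_home() {
  profile="$1"; hive_home="$2"
  printf 'export HIVE_HOME=%s\n' "$hive_home" >> "$profile"
  printf 'export PATH=$PATH:$HIVE_HOME/bin\n' >> "$profile"
}

# e.g. add_hive_home /home/hadoop/.bashrc /opt/bigdata/apache-hive-3.1.2-bin
```

Remember to `source` the profile (or log in again) for the change to take effect.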
2. Embedded Mode
In embedded mode, metadata lives in the bundled Derby database, which allows only a single session; attempting a second concurrent session fails with an error.
2.1 Start Hive
[root@node2 apache-hive-3.1.2-bin]# ll
total 56
drwxr-xr-x. 3 hadoop hadoop 157 Jun 19 10:14 bin
drwxr-xr-x. 2 hadoop hadoop 4096 Jun 19 10:14 binary-package-licenses
drwxr-xr-x. 2 hadoop hadoop 4096 Jun 19 10:14 conf
drwxr-xr-x. 4 hadoop hadoop 34 Jun 19 10:14 examples
drwxr-xr-x. 7 hadoop hadoop 68 Jun 19 10:14 hcatalog
drwxr-xr-x. 2 hadoop hadoop 44 Jun 19 10:14 jdbc
drwxr-xr-x. 4 hadoop hadoop 12288 Jun 19 10:14 lib
-rwxr-xr-x. 1 hadoop hadoop 20798 Aug 23 2019 LICENSE
-rwxr-xr-x. 1 hadoop hadoop 230 Aug 23 2019 NOTICE
-rwxr-xr-x. 1 hadoop hadoop 2469 Aug 23 2019 RELEASE_NOTES.txt
drwxr-xr-x. 4 hadoop hadoop 35 Jun 19 10:14 scripts
[root@node2 apache-hive-3.1.2-bin]# ./bin/hive
which: no hbase in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/opt/java/bin:/opt/maven/bin:/opt/bigdata/hadoop-3.2.2/bin:/opt/bigdata/hadoop-3.2.2/sbin:/opt/bigdata/spark-3.2.0/bin:/opt/bigdata/spark-3.2.0/sbin:/opt/java/bin:/opt/maven/bin:/root/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/bigdata/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/bigdata/hadoop-3.2.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338)
at org.apache.hadoop.mapred.JobConf.setJar(JobConf.java:536)
at org.apache.hadoop.mapred.JobConf.setJarByClass(JobConf.java:554)
at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:448)
at org.apache.hadoop.hive.conf.HiveConf.initialize(HiveConf.java:5141)
at org.apache.hadoop.hive.conf.HiveConf.<init>(HiveConf.java:5099)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jCommon(LogUtils.java:97)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4j(LogUtils.java:81)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:699)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
[root@node2 apache-hive-3.1.2-bin]#
Another error -_-|| The stack trace makes the cause clear: a guava jar conflict. (Skip this part if you did not hit it.)
For the fix, see my post — Hive 3.1.2启动报错 guava 版本冲突问题解决.
With the conflict resolved, Hive's CLI starts successfully:
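The fix, in short: Hive 3.1.2 bundles an old guava (guava-19.0.jar) while Hadoop 3.2.2 ships a newer one (guava-27.0-jre.jar); deleting Hive's copy and borrowing Hadoop's resolves the NoSuchMethodError. A sketch — jar version numbers vary by release, so check with `ls` before deleting anything:

```shell
#!/bin/sh
# Replace Hive's bundled guava jar with Hadoop's newer one.
# A sketch of the common fix; verify actual jar names with `ls` first.
swap_guava() {
  hive_lib="$1"; hadoop_lib="$2"
  old=$(ls "$hive_lib"/guava-*.jar)
  new=$(ls "$hadoop_lib"/guava-*.jar)
  rm "$old"
  cp "$new" "$hive_lib"/
}

# e.g. swap_guava /opt/bigdata/apache-hive-3.1.2-bin/lib \
#                /opt/bigdata/hadoop-3.2.2/share/hadoop/common/lib
```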
[hadoop@node2 apache-hive-3.1.2-bin]$ ./bin/hive
which: no hbase in (/home/hadoop/.local/bin:/home/hadoop/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/java/bin:/opt/maven/bin:/opt/bigdata/hadoop-3.2.2/bin:/opt/bigdata/hadoop-3.2.2/sbin:/opt/bigdata/spark-3.2.0/bin:/opt/bigdata/spark-3.2.0/sbin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/bigdata/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/bigdata/hadoop-3.2.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = 901d74e6-fa5d-45b5-b949-bebdacb582ed
Logging initialized using configuration in jar:file:/opt/bigdata/apache-hive-3.1.2-bin/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
On a fresh install, first list the databases Hive ships with:
hive> show databases;
FAILED: HiveException java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Another error. Don't panic — keep following the steps below.
2.2 Set Up MySQL
2.2.1 Modify the Hive configuration file
For installing MySQL 8 on CentOS 8, see my post — centos 8 安装 mysql 8.
#1. In the conf directory, copy hive-default.xml.template to a new hive-site.xml
#2. Configure the MySQL connection settings in hive-site.xml
[hadoop@node2 conf]$ cp hive-default.xml.template hive-site.xml
Modify hive-site.xml as shown below.
Edit the file with vi; to display line numbers, type :set number and press Enter.
<!-- Default warehouse location, where Hive stores managed table data (not the metastore itself). -->
450 <property>
451 <name>hive.metastore.warehouse.dir</name>
452 <value>/user/hive/warehouse</value>
453 <description>location of default database for the warehouse</description>
454 </property>
<!-- JDBC connection string; default jdbc:derby:;databaseName=metastore_db;create=true -->
582 <property>
583 <name>javax.jdo.option.ConnectionURL</name>
584 <value>jdbc:mysql://localhost/hive_remote?createDatabaseIfNotExist=true</value>
585 <description>
586 JDBC connect string for a JDBC metastore.
587 To use SSL to encrypt/authenticate the connection, provide database-specific SSL flag in the connection URL.
588 For example, jdbc:postgresql://myhost/db?ssl=true for postgres database.
589 </description>
590 </property>
<!-- JDBC driver class; default org.apache.derby.jdbc.EmbeddedDriver. With MySQL Connector/J 8.x, com.mysql.jdbc.Driver is deprecated in favor of com.mysql.cj.jdbc.Driver but still works. -->
1100 <property>
1101 <name>javax.jdo.option.ConnectionDriverName</name>
1102 <value>com.mysql.jdbc.Driver</value>
1103 <description>Driver class name for a JDBC metastore</description>
1104 </property>
<!-- Metastore database username; default APP -->
1125 <property>
1126 <name>javax.jdo.option.ConnectionUserName</name>
1127 <value>root</value>
1128 <description>Username to use against metastore database</description>
1129 </property>
<!-- Metastore database password; default mine -->
567 <property>
568 <name>javax.jdo.option.ConnectionPassword</name>
569 <value>xxxxxxxx</value>
570 <description>password to use against metastore database</description>
571 </property>
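After editing, a quick grep can confirm that all four metastore connection properties made it into hive-site.xml. A small sanity-check sketch (the config path is whatever you pass in):

```shell
#!/bin/sh
# Verify the four JDBC metastore properties exist in a hive-site.xml.
check_metastore_conf() {
  conf="$1"
  for key in javax.jdo.option.ConnectionURL \
             javax.jdo.option.ConnectionDriverName \
             javax.jdo.option.ConnectionUserName \
             javax.jdo.option.ConnectionPassword; do
    grep -q "<name>$key</name>" "$conf" || { echo "missing: $key" >&2; return 1; }
  done
}

# e.g. check_metastore_conf /opt/bigdata/apache-hive-3.1.2-bin/conf/hive-site.xml
```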
2.2.2 Download the MySQL driver
2.2.3 Upload the MySQL driver jar to node2 with Termius and copy it into Hive's lib directory
[root@node2 bigdata]# cd /home/hadoop/
[root@node2 hadoop]# cp mysql-connector-java-8.0.24.jar /opt/bigdata/apache-hive-3.1.2-bin/lib/
[root@node2 hadoop]# chmod 755 /opt/bigdata/apache-hive-3.1.2-bin/lib/mysql-connector-java-8.0.24.jar
[root@node2 hadoop]# chown hadoop:hadoop /opt/bigdata/apache-hive-3.1.2-bin/lib/mysql-connector-java-8.0.24.jar
3. Start Hive
First, initialize the metastore schema with the schematool utility:
[root@node2 bin]# ./schematool -dbType mysql -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/bigdata/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/bigdata/hadoop-3.2.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://localhost/hive_remote?createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: root
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
Starting metastore schema initialization to 3.1.0
Initialization script hive-schema-3.1.0.mysql.sql
# Wait here; schematool prints a long stretch of output
Initialization script completed
schemaTool completed
[root@node2 bin]#
Now start Hive again:
[hadoop@node2 apache-hive-3.1.2-bin]$ ./bin/hive
which: no hbase in (/home/hadoop/.local/bin:/home/hadoop/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/java/bin:/opt/maven/bin:/opt/bigdata/hadoop-3.2.2/bin:/opt/bigdata/hadoop-3.2.2/sbin:/opt/bigdata/spark-3.2.0/bin:/opt/bigdata/spark-3.2.0/sbin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/bigdata/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/bigdata/hadoop-3.2.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = 50ff8ea3-7040-4266-bbd6-ed1459e03a36
Logging initialized using configuration in jar:file:/opt/bigdata/apache-hive-3.1.2-bin/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> show databases;
FAILED: HiveException java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
hive> show databases;
OK
default
Time taken: 0.672 seconds, Fetched: 1 row(s)
hive>
Now log in to MySQL and inspect Hive's metadata database:
[root@node2 bin]# mysql -uroot -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 39
Server version: 8.0.24 MySQL Community Server - GPL
Copyright (c) 2000, 2021, Oracle and/or its affiliates.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| hive_remote |
| information_schema |
| mysql |
| performance_schema |
| sys |
+--------------------+
5 rows in set (0.00 sec)
mysql>
Hive is now installed successfully.
Troubleshooting Notes
The following error appeared when starting Hive:
[hadoop@node2 apache-hive-3.1.2-bin]$ ./bin/hive
which: no hbase in (/home/hadoop/.local/bin:/home/hadoop/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/java/bin:/opt/maven/bin:/opt/bigdata/hadoop-3.2.2/bin:/opt/bigdata/hadoop-3.2.2/sbin:/opt/bigdata/spark-3.2.0/bin:/opt/bigdata/spark-3.2.0/sbin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/bigdata/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/bigdata/hadoop-3.2.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Exception in thread "main" java.lang.RuntimeException: com.ctc.wstx.exc.WstxParsingException: Illegal character entity: expansion character (code 0x8
at [row,col,system-id]: [3215,96,"file:/opt/bigdata/apache-hive-3.1.2-bin/conf/hive-site.xml"]
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:3040)
at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2989)
at org.apache.hadoop.conf.Configuration.loadProps(Configuration.java:2864)
at org.apache.hadoop.conf.Configuration.addResourceObject(Configuration.java:1012)
at org.apache.hadoop.conf.Configuration.addResource(Configuration.java:917)
at org.apache.hadoop.hive.conf.HiveConf.initialize(HiveConf.java:5151)
at org.apache.hadoop.hive.conf.HiveConf.<init>(HiveConf.java:5099)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4jCommon(LogUtils.java:97)
at org.apache.hadoop.hive.common.LogUtils.initHiveLog4j(LogUtils.java:81)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:699)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: com.ctc.wstx.exc.WstxParsingException: Illegal character entity: expansion character (code 0x8
at [row,col,system-id]: [3215,96,"file:/opt/bigdata/apache-hive-3.1.2-bin/conf/hive-site.xml"]
at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:621)
at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:491)
at com.ctc.wstx.sr.StreamScanner.reportIllegalChar(StreamScanner.java:2456)
at com.ctc.wstx.sr.StreamScanner.validateChar(StreamScanner.java:2403)
at com.ctc.wstx.sr.StreamScanner.resolveCharEnt(StreamScanner.java:2369)
at com.ctc.wstx.sr.StreamScanner.fullyResolveEntity(StreamScanner.java:1515)
at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2828)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1123)
at org.apache.hadoop.conf.Configuration$Parser.parseNext(Configuration.java:3336)
at org.apache.hadoop.conf.Configuration$Parser.parse(Configuration.java:3130)
at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:3023)
... 16 more
This one is obvious: line 3215 of hive-site.xml contains an illegal character (code 0x8). The line is only description text, so simply delete it.
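The same fix can be done from the shell: print the offending line first to confirm it carries only description text (so the XML stays well-formed), then delete it by number. A sketch — 3215 is the line number from this error message, yours may differ, and GNU sed's `-i` is assumed:

```shell
#!/bin/sh
# Show, then delete, a single line of a file by line number (GNU sed).
show_line() { sed -n "${2}p" "$1"; }
drop_line() { sed -i "${2}d" "$1"; }

# e.g. show_line conf/hive-site.xml 3215
#      drop_line conf/hive-site.xml 3215
```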
[hadoop@node2 apache-hive-3.1.2-bin]$ ./bin/hive
which: no hbase in (/home/hadoop/.local/bin:/home/hadoop/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/java/bin:/opt/maven/bin:/opt/bigdata/hadoop-3.2.2/bin:/opt/bigdata/hadoop-3.2.2/sbin:/opt/bigdata/spark-3.2.0/bin:/opt/bigdata/spark-3.2.0/sbin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/bigdata/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/bigdata/hadoop-3.2.2/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = edd7bcb2-dcbe-4dce-a4ca-fefcabf442f2
Logging initialized using configuration in jar:file:/opt/bigdata/apache-hive-3.1.2-bin/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
at org.apache.hadoop.fs.Path.initialize(Path.java:263)
at org.apache.hadoop.fs.Path.<init>(Path.java:221)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:710)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:627)
at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:591)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:747)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
at java.net.URI.checkPath(URI.java:1823)
at java.net.URI.<init>(URI.java:745)
at org.apache.hadoop.fs.Path.initialize(Path.java:260)
... 12 more
This happens because a few parameters in hive-site.xml were left at their defaults: search the whole file for "system:java.io.tmpdir" and replace every occurrence with a path of your own.
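That replacement can be scripted with sed. A sketch: it rewrites every ${system:java.io.tmpdir} to a concrete directory of your choosing and ${system:user.name} to plain ${user.name}, which is the commonly cited fix; create the target directory yourself and make sure the hadoop user can write to it (GNU sed's `-i` is assumed).

```shell
#!/bin/sh
# Replace the unresolvable ${system:...} placeholders in hive-site.xml
# (GNU sed). The target tmp dir is your choice; create it first.
fix_tmpdir() {
  conf="$1"; tmpdir="$2"
  sed -i "s|\${system:java.io.tmpdir}|$tmpdir|g; s|\${system:user.name}|\${user.name}|g" "$conf"
}

# e.g. fix_tmpdir conf/hive-site.xml /opt/bigdata/apache-hive-3.1.2-bin/tmp
```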
This was my first time installing Hive 3.x. I ran into plenty of problems along the way, but fortunately solved them all, and wrote the fixes up in separate posts.