Overview: a fairly detailed tutorial on installing Hive, covering the Hadoop configuration Hive needs, the Hive installation itself, the follow-up MySQL installation, pointing Hive's metastore at MySQL, and common Hive property settings.
Note: before installing Hive, make sure your Hadoop cluster is fully installed and configured.
Table of contents
Official Hive links
Hive homepage: click here
Hive downloads: click here
Hive documentation: click here
Hive on GitHub: click here
Hadoop configuration before installing
P.S. If your Hadoop is not yet installed and configured, see my other post: a detailed walkthrough of building a fully distributed Hadoop system on a minimal Linux install.
1. Add the following to Hadoop's core-site.xml
<!-- Hosts from which the user lqs (superUser) is allowed to proxy -->
<property>
<name>hadoop.proxyuser.lqs.hosts</name>
<value>*</value>
</property>
<!-- Groups whose members lqs (superUser) is allowed to impersonate -->
<property>
<name>hadoop.proxyuser.lqs.groups</name>
<value>*</value>
</property>
<!-- Users lqs (superUser) is allowed to impersonate -->
<property>
<name>hadoop.proxyuser.lqs.users</name>
<value>*</value>
</property>
2. Add the following to Hadoop's yarn-site.xml
<!-- NodeManager memory; the default is 8 GB, lowered here to 4 GB -->
<property>
<description>Amount of physical memory, in MB, that can be allocated for containers. If set to -1 and yarn.nodemanager.resource.detect-hardware-capabilities is true, it is automatically calculated (in case of Windows and Linux). In other cases, the default is 8192 MB.</description>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>4096</value>
</property>
<!-- Minimum container memory; the default is 512 MB -->
<property>
<description>The minimum allocation for every container request at the RM in MBs. Memory requests lower than this will be set to the value of this property. Additionally, a node manager that is configured to have less memory than this value will be shut down by the resource manager.</description>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>512</value>
</property>
<!-- Maximum container memory; the default is 8 GB, lowered here to 4 GB -->
<property>
<description>The maximum allocation for every container request at the RM in MBs. Memory requests higher than this will throw an InvalidResourceRequestException.</description>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>4096</value>
</property>
<!-- Virtual memory check; on by default, disabled here -->
<property>
<description>Whether virtual memory limits will be enforced for containers.</description>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
Note: after modifying both files, remember to distribute them to every node and then restart the cluster.
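As a quick sanity check on the three memory values above: a NodeManager offering 4096 MB can host at most 4096 / 512 = 8 minimum-size containers, and no single container can be granted more than the 4096 MB ceiling. A small arithmetic sketch (values hardcoded from the yarn-site.xml above, not read from a live cluster):

```shell
#!/usr/bin/env bash
# Values copied from the yarn-site.xml settings above.
nm_memory_mb=4096    # yarn.nodemanager.resource.memory-mb
min_alloc_mb=512     # yarn.scheduler.minimum-allocation-mb
max_alloc_mb=4096    # yarn.scheduler.maximum-allocation-mb

# Most minimum-size containers a single NodeManager can hold.
max_containers=$(( nm_memory_mb / min_alloc_mb ))
echo "minimum-size containers per NodeManager: $max_containers"

# A container request can never exceed the NodeManager's total memory,
# so the real per-container ceiling is the smaller of the two values.
effective_max=$(( max_alloc_mb < nm_memory_mb ? max_alloc_mb : nm_memory_mb ))
echo "per-container ceiling: ${effective_max} MB"
```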
Installing Hive
1. Extract apache-hive-3.1.2-bin.tar.gz into /home/lqs/module/
[lqs@bdc112 module]$ tar -zxvf apache-hive-3.1.2-bin.tar.gz -C /home/lqs/module/
2. Rename the extracted apache-hive-3.1.2-bin directory to hive
[lqs@bdc112 module]$ mv apache-hive-3.1.2-bin/ hive
3. Edit /etc/profile.d/my_env.sh and add Hive's environment variables
[lqs@bdc112 module]$ sudo vim /etc/profile.d/my_env.sh
Add the following:
#HIVE_HOME
export HIVE_HOME=/home/lqs/module/hive
export PATH=$PATH:$HIVE_HOME/bin
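To confirm the two exports behave as intended, you can replay them in a scratch shell with a placeholder directory standing in for the real Hive install (/tmp/hive-demo is made up for the illustration):

```shell
#!/usr/bin/env bash
# Same shape as the my_env.sh entries, with a placeholder install dir.
export HIVE_HOME=/tmp/hive-demo
export PATH=$PATH:$HIVE_HOME/bin

# Any executable dropped into $HIVE_HOME/bin is now found via PATH.
case ":$PATH:" in
  *":/tmp/hive-demo/bin:"*) echo "HIVE_HOME/bin is on PATH" ;;
  *)                        echo "PATH was not extended" ;;
esac
```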
4. Resolve the logging jar conflict (Hadoop already ships an SLF4J binding, so Hive's copy is renamed out of the way)
[lqs@bdc112 module]$ cd hive/lib/
[lqs@bdc112 lib]$ pwd
/home/lqs/module/hive/lib
[lqs@bdc112 lib]$ mv log4j-slf4j-impl-2.10.0.jar log4j-slf4j-impl-2.10.0.jar.bak
5. Initialize the Hive metastore database (embedded Derby for now)
[lqs@bdc112 hive]$ bin/schematool -dbType derby -initSchema
Starting and using Hive
1. Start Hive
[lqs@bdc112 hive]$ bin/hive
2. Use Hive
#List the databases
hive> show databases;
OK
default
Time taken: 0.545 seconds, Fetched: 1 row(s)
#List the tables
hive> show tables;
OK
Time taken: 0.048 seconds
#Create a table named test
hive> create table test(id int);
OK
Time taken: 0.627 seconds
#List the tables again
hive> show tables;
OK
test
Time taken: 0.034 seconds, Fetched: 1 row(s)
# Insert the value 666 into table test
hive> insert into test values(666);
Query ID = lqs_20211219191533_25b60265-76a8-486f-9bbe-1cd0aa3d6fca
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1639912074286_0001, Tracking URL = http://bdc113:8088/proxy/application_1639912074286_0001/
Kill Command = /home/lqs/module/hadoop-3.1.3/bin/mapred job -kill job_1639912074286_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2021-12-19 19:15:48,311 Stage-1 map = 0%, reduce = 0%
2021-12-19 19:15:56,670 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.61 sec
2021-12-19 19:16:02,860 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.5 sec
MapReduce Total cumulative CPU time: 7 seconds 500 msec
Ended Job = job_1639912074286_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://bdc112:8020/user/hive/warehouse/test/.hive-staging_hive_2021-12-19_19-15-33_239_8332044782395566127-1/-ext-10000
Loading data to table default.test
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 7.5 sec HDFS Read: 12690 HDFS Write: 205 SUCCESS
Total MapReduce CPU Time Spent: 7 seconds 500 msec
OK
Time taken: 31.221 seconds
#Select all rows from test
hive> select * from test;
OK
666
Time taken: 0.177 seconds, Fetched: 1 row(s)
3. Open a second Xshell window, start another Hive session there, and monitor the hive.log file under /tmp/lqs
See my other blog post for the detailed steps. Before switching the metastore to MySQL, clean up the embedded Derby files and the HDFS data:
[lqs@bdc112 hive]$ rm -rf derby.log metastore_db
[lqs@bdc112 hive]$ hadoop fs -rm -r /user
Installing MySQL
1. Check whether MySQL/MariaDB is already installed on the system
[lqs@bdc112 hive]$ rpm -qa | grep mariadb
mariadb-libs-5.5.56-2.el7.x86_64
# If it is, remove it with:
[lqs@bdc112 hive]$ sudo rpm -e --nodeps mariadb-libs
#Checking again shows it is gone
[lqs@bdc112 hive]$ rpm -qa | grep mariadb
[lqs@bdc112 hive]$
2. Extract the MySQL bundle
[lqs@bdc112 software]$ mkdir mysql-lib-sources
[lqs@bdc112 software]$ tar -xf mysql-5.7.28-1.el7.x86_64.rpm-bundle.tar -C mysql-lib-sources
3. Install the rpm packages from the mysql-lib-sources directory
Note: install the MySQL packages in exactly the order shown below!
[lqs@bdc112 mysql-lib-sources]$ sudo rpm -ivh mysql-community-common-5.7.28-1.el7.x86_64.rpm
warning: mysql-community-common-5.7.28-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing... ################################# [100%]
Updating / installing...
1:mysql-community-common-5.7.28-1.e################################# [100%]
[lqs@bdc112 mysql-lib-sources]$ sudo rpm -ivh mysql-community-libs-5.7.28-1.el7.x86_64.rpm
warning: mysql-community-libs-5.7.28-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing... ################################# [100%]
Updating / installing...
1:mysql-community-libs-5.7.28-1.el7################################# [100%]
[lqs@bdc112 mysql-lib-sources]$ sudo rpm -ivh mysql-community-libs-compat-5.7.28-1.el7.x86_64.rpm
warning: mysql-community-libs-compat-5.7.28-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing... ################################# [100%]
Updating / installing...
1:mysql-community-libs-compat-5.7.2################################# [100%]
[lqs@bdc112 mysql-lib-sources]$ sudo rpm -ivh mysql-community-client-5.7.28-1.el7.x86_64.rpm
warning: mysql-community-client-5.7.28-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing... ################################# [100%]
Updating / installing...
1:mysql-community-client-5.7.28-1.e################################# [100%]
[lqs@bdc112 mysql-lib-sources]$ sudo rpm -ivh mysql-community-server-5.7.28-1.el7.x86_64.rpm
warning: mysql-community-server-5.7.28-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
error: Failed dependencies:
libaio.so.1()(64bit) is needed by mysql-community-server-5.7.28-1.el7.x86_64
libaio.so.1(LIBAIO_0.1)(64bit) is needed by mysql-community-server-5.7.28-1.el7.x86_64
libaio.so.1(LIBAIO_0.4)(64bit) is needed by mysql-community-server-5.7.28-1.el7.x86_64
#This fails because this Linux box is a minimal install; install the missing dependency and retry
[lqs@bdc112 mysql-lib-sources]$ sudo yum install -y libaio
Loaded plugins: fastestmirror
Determining fastest mirrors
epel/x86_64/metalink | 5.2 kB 00:00:00
* base: mirrors.aliyun.com
* epel: mirror.sjtu.edu.cn
* extras: mirrors.aliyun.com
* updates: mirrors.aliyun.com
base | 3.6 kB 00:00:00
epel | 4.7 kB 00:00:00
extras | 2.9 kB 00:00:00
updates | 2.9 kB 00:00:00
epel/x86_64/primary_db FAILED B/s | 1.5 MB --:--:-- ETA
http://ftp.riken.jp/Linux/fedora/epel/7/x86_64/repodata/f3f0a7c647d09319af31f75720ff0ac5d209036393c09581c0bc281b3c7bf80c-primary.sqlite.bz2: [Errno 14] HTTP Error 404 - Not Found
Trying other mirror.
To address this issue please refer to the below wiki article
https://wiki.centos.org/yum-errors
If above article doesn't help to resolve this issue please use https://bugs.centos.org/.
(1/3): epel/x86_64/updateinfo | 1.0 MB 00:00:00
(2/3): updates/7/x86_64/primary_db | 13 MB 00:00:02
epel/x86_64/primary_db FAILED
http://ftp.jaist.ac.jp/pub/Linux/Fedora/epel/7/x86_64/repodata/f3f0a7c647d09319af31f75720ff0ac5d209036393c09581c0bc281b3c7bf80c-primary.sqlite.bz2: [Errno 14] HTTP Error 404 - Not Found
Trying other mirror.
(3/3): epel/x86_64/primary_db | 7.0 MB 00:00:00
Resolving Dependencies
--> Running transaction check
---> Package libaio.x86_64 0:0.3.109-13.el7 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
===========================================================================================================
Package Arch Version Repository Size
===========================================================================================================
Installing:
libaio x86_64 0.3.109-13.el7 base 24 k
Transaction Summary
===========================================================================================================
Install 1 Package
Total download size: 24 k
Installed size: 38 k
Downloading packages:
libaio-0.3.109-13.el7.x86_64.rpm | 24 kB 00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Warning: RPMDB altered outside of yum.
Installing : libaio-0.3.109-13.el7.x86_64 1/1
Verifying : libaio-0.3.109-13.el7.x86_64 1/1
Installed:
libaio.x86_64 0:0.3.109-13.el7
Complete!
[lqs@bdc112 mysql-lib-sources]$ sudo rpm -ivh mysql-community-server-5.7.28-1.el7.x86_64.rpm
warning: mysql-community-server-5.7.28-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
Preparing... ################################# [100%]
Updating / installing...
1:mysql-community-server-5.7.28-1.e################################# [100%]
4. If the directory that datadir points to in /etc/my.cnf has any contents, delete them all
First check the value of datadir:
[lqs@bdc112 mysql-lib-sources]$ cat /etc/my.cnf
# For advice on how to change settings please see
# http://dev.mysql.com/doc/refman/5.7/en/server-configuration-defaults.html
[mysqld]
#
# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
# innodb_buffer_pool_size = 128M
#
# Remove leading # to turn on a very important data integrity option: logging
# changes to the binary log between backups.
# log_bin
#
# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
# join_buffer_size = 128M
# sort_buffer_size = 2M
# read_rnd_buffer_size = 2M
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
Delete everything under /var/lib/mysql
[lqs@bdc112 mysql-lib-sources]$ cd /var/lib/mysql
[lqs@bdc112 mysql]$ sudo rm -rf *
5. Initialize the database
[lqs@bdc112 ~]$ sudo mysqld --initialize --user=mysql
6. Look up the temporary password generated for the root user
[lqs@bdc112 ~]$ sudo cat /var/log/mysqld.log
7. Start the MySQL service
[lqs@bdc112 ~]$ sudo systemctl start mysqld
#Check the service status
[lqs@bdc112 ~]$ sudo systemctl status mysqld
● mysqld.service - MySQL Server
Loaded: loaded (/usr/lib/systemd/system/mysqld.service; enabled; vendor preset: disabled)
Active: active (running) since Sun 2021-12-19 20:26:52 CST; 8s ago
Docs: man:mysqld(8)
http://dev.mysql.com/doc/refman/en/using-systemd.html
Process: 5292 ExecStart=/usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid $MYSQLD_OPTS (code=exited, status=0/SUCCESS)
Process: 5274 ExecStartPre=/usr/bin/mysqld_pre_systemd (code=exited, status=0/SUCCESS)
Main PID: 5295 (mysqld)
CGroup: /system.slice/mysqld.service
└─5295 /usr/sbin/mysqld --daemonize --pid-file=/var/run/mysqld/mysqld.pid
Dec 19 20:26:51 bdc112 systemd[1]: Starting MySQL Server...
Dec 19 20:26:52 bdc112 systemd[1]: Started MySQL Server.
8. Log in to MySQL
[lqs@bdc112 ~]$ mysql -u root -p
Enter password: #enter the temporary password from the log here
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.28
Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql>
9. You must change the root password first; any other statement will fail until you do
mysql> set password = password("912811");
Query OK, 0 rows affected, 1 warning (0.00 sec)
10. Update the root row in the mysql.user table so that root may connect from any IP
mysql> update mysql.user set host='%' where user='root';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
Configuring the Hive metastore to use MySQL
Copy the driver
#Copy MySQL's JDBC driver into Hive's lib directory
[lqs@bdc112 lib]$ cp /home/lqs/software/mysql-connector-java-5.1.37.jar ./
Configure the metastore to use MySQL
1. Create a hive-site.xml file under $HIVE_HOME/conf
[lqs@bdc112 conf]$ vim hive-site.xml
Add the following:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<!-- JDBC connection URL -->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://bdc112:3306/metastore?useSSL=false</value>
</property>
<!-- JDBC driver class -->
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<!-- JDBC username -->
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<!-- JDBC password -->
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>912811</value>
</property>
<!-- Hive's default working directory on HDFS -->
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
</configuration>
2. Log in to MySQL
[lqs@bdc112 conf]$ mysql -u root -p
3. Create the Hive metastore database
mysql> create database metastore;
Query OK, 1 row affected (0.00 sec)
mysql> quit;
4. Initialize the Hive metastore schema
[lqs@bdc112 conf]$ schematool -initSchema -dbType mysql -verbose
If the run ends with "schemaTool completed", initialization succeeded.
Starting Hive again
1. Start Hive
[lqs@bdc112 hive]$ bin/hive
2. Use Hive
hive> show databases;
OK
default
Time taken: 0.578 seconds, Fetched: 1 row(s)
hive> show tables;
OK
Time taken: 0.045 seconds
hive> create table test (id int);
OK
Time taken: 0.563 seconds
hive> show tables;
OK
test
Time taken: 0.048 seconds, Fetched: 1 row(s)
#Insert the value 1 into test
hive> insert into test values(1);
Query ID = lqs_20211219204400_e606cdc4-700f-4811-99af-20cbfcbd2d86
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1639912074286_0002, Tracking URL = http://bdc113:8088/proxy/application_1639912074286_0002/
Kill Command = /home/lqs/module/hadoop-3.1.3/bin/mapred job -kill job_1639912074286_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2021-12-19 20:44:13,205 Stage-1 map = 0%, reduce = 0%
2021-12-19 20:44:24,640 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 12.23 sec
2021-12-19 20:44:29,823 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 14.57 sec
MapReduce Total cumulative CPU time: 14 seconds 570 msec
Ended Job = job_1639912074286_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://bdc112:8020/user/hive/warehouse/test/.hive-staging_hive_2021-12-19_20-44-00_691_7023649240777723976-1/-ext-10000
Loading data to table default.test
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 14.57 sec HDFS Read: 12686 HDFS Write: 199 SUCCESS
Total MapReduce CPU Time Spent: 14 seconds 570 msec
OK
Time taken: 30.778 seconds
#Query the row just inserted
hive> select * from test;
OK
1
Time taken: 0.19 seconds, Fetched: 1 row(s)
Accessing Hive through the metastore service
1. Add the following to hive-site.xml
<!-- Address of the metastore service to connect to -->
<property>
<name>hive.metastore.uris</name>
<value>thrift://bdc112:9083</value>
</property>
2. Start the metastore service
[lqs@bdc112 hive]$ hive --service metastore
Note: this runs in the foreground and ties up the window; open a new shell window for anything else
3. Start Hive
[lqs@bdc112 hive]$ bin/hive
Accessing Hive over JDBC
1. Add the following to hive-site.xml
<!-- Host for hiveserver2 to bind to -->
<property>
<name>hive.server2.thrift.bind.host</name>
<value>bdc112</value>
</property>
<!-- Port for hiveserver2 to listen on -->
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
</property>
<!-- hiveserver2 high-availability parameter; enabling it speeds up hiveserver2 startup -->
<property>
<name>hive.server2.active.passive.ha.enable</name>
<value>true</value>
</property>
2. Start hiveserver2
[lqs@bdc112 hive]$ bin/hive --service hiveserver2
3. Start the beeline client
[lqs@bdc112 hive]$ bin/beeline -u jdbc:hive2://bdc112:10000 -n lqs
If you hit this error:
Connecting to jdbc:hive2://bdc112:10000
21/12/19 21:05:23 [main]: WARN jdbc.HiveConnection: Failed to connect to bdc112:10000
Could not open connection to the HS2 server. Please check the server URI and if the URI is correct, then ask the administrator to check the server status.
Error: Could not open client transport with JDBC Uri: jdbc:hive2://bdc112:10000: java.net.ConnectException: Connection refused (state=08S01,code=0)
Beeline version 3.1.2 by Apache Hive
see my other blog post: Connecting to jdbc:hive2://bdc112:10000 21/12/19 21:05:23 [main]: WARN jdbc.HiveConnection: Failed t
A successful connection drops you at the beeline prompt, which means the configuration works.
Writing a Hive service management script
1. What the pieces of the command mean:
(1) Starting the services in the foreground means opening a shell window for each one; instead, start them in the background like this:
nohup: placed at the start of a command; keeps the process running even after its terminal closes
/dev/null: a file in the Linux filesystem, known as the black hole; anything written to it is silently discarded
2>&1: redirects stderr to stdout
&: placed at the end of a command; runs it in the background
These are usually combined as nohup [command] > file 2>&1 &, which writes the command's output to file and keeps the process running in the background.
(You don't have to memorize this.)
[lqs@bdc112 hive]$ nohup hive --service metastore 2>&1 &
[lqs@bdc112 hive]$ nohup hive --service hiveserver2 2>&1 &
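The redirection mechanics can be verified without Hive at all, using a throwaway command in place of hive --service ... (the log path below is a temp file created just for the demo):

```shell
#!/usr/bin/env bash
# Stand-in for: nohup hive --service metastore > $LOG 2>&1 &
# Shows that stdout and stderr both land in the same log file and that
# the command is detached into the background.
logfile=$(mktemp)

nohup bash -c 'echo "normal output"; echo "error output" >&2' \
  > "$logfile" 2>&1 &
wait   # let the background job finish so the log is complete

cat "$logfile"   # both lines end up in the one file
```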
2. Write a script to start and stop the Hive services
[lqs@bdc112 conf]$ cd /home/lqs/bin/
[lqs@bdc112 bin]$ touch myhiveservices.sh
[lqs@bdc112 bin]$ vim myhiveservices.sh
Then enter the following:
#!/bin/bash
HIVE_LOG_DIR=$HIVE_HOME/logs
if [ ! -d $HIVE_LOG_DIR ]
then
mkdir -p $HIVE_LOG_DIR
fi
#Check whether a process is healthy; arg 1 is the process name, arg 2 is its port
function check_process()
{
pid=$(ps -ef 2>/dev/null | grep -v grep | grep -i $1 | awk '{print $2}')
ppid=$(netstat -nltp 2>/dev/null | grep $2 | awk '{print $7}' | cut -d '/' -f 1)
echo $pid
[[ "$pid" =~ "$ppid" ]] && [ "$ppid" ] && return 0 || return 1
}
function hive_start()
{
metapid=$(check_process HiveMetastore 9083)
#2>&1 merges the error log into the normal log
cmd="nohup hive --service metastore >$HIVE_LOG_DIR/metastore.log 2>&1 &"
[ -z "$metapid" ] && eval $cmd || echo "Metastore service already started"
server2pid=$(check_process HiveServer2 10000)
cmd="nohup hive --service hiveserver2 >$HIVE_LOG_DIR/hiveServer2.log 2>&1 &"
[ -z "$server2pid" ] && eval $cmd || echo "HiveServer2 service already started"
}
function hive_stop()
{
metapid=$(check_process HiveMetastore 9083)
[ "$metapid" ] && kill $metapid || echo "Metastore service not running"
server2pid=$(check_process HiveServer2 10000)
[ "$server2pid" ] && kill $server2pid || echo "HiveServer2 service not running"
}
case $1 in
"start")
hive_start
;;
"stop")
hive_stop
;;
"restart")
hive_stop
sleep 2
hive_start
;;
"status")
check_process HiveMetastore 9083 >/dev/null && echo "Metastore service is healthy" || echo "Metastore service is down"
check_process HiveServer2 10000 >/dev/null && echo "HiveServer2 service is healthy" || echo "HiveServer2 service is down"
;;
*)
echo Invalid Args!
echo 'Usage: '$(basename $0)' start|stop|restart|status'
;;
esac
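The heart of check_process is its last line: the service counts as healthy only when the port has a non-empty owner and that owner appears among the PIDs found by ps. That logic can be exercised in isolation with made-up PID values (everything below is invented sample data, not live process state):

```shell
#!/usr/bin/env bash
# Same test as check_process, but fed fake pid/ppid strings instead of
# live ps/netstat output.
is_healthy() {
    local pid=$1 ppid=$2
    # Healthy only if the port owner is non-empty AND is contained in
    # the list of PIDs matched by name.
    [[ "$pid" =~ "$ppid" ]] && [ "$ppid" ] && return 0 || return 1
}

is_healthy "5295" "5295" && echo "healthy"            # name and port agree
is_healthy "5295" ""     || echo "port not listening" # nobody owns the port
is_healthy ""     "4321" || echo "process not found"  # stale port owner
```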
3. Make the script executable
[lqs@bdc112 bin]$ chmod +x myhiveservices.sh
4. Start the Hive services in the background
[lqs@bdc112 bin]$ myhiveservices.sh start
Common Hive interactive commands
1. View Hive's command-line options
[lqs@bdc112 hive]$ bin/hive -help
which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/lqs/module/jdk1.8.0_212/bin:/home/lqs/module/hadoop-3.1.3/bin:/home/lqs/module/hadoop-3.1.3/sbin:/home/lqs/module/apache-maven-3.6.3/bin:/home/lqs/module/zookeeper-3.5.7/bin:/home/lqs/module/hive/bin:/home/lqs/.local/bin:/home/lqs/bin)
Hive Session ID = 86edfbff-6711-4ea8-b5b0-c27900d604bc
usage: hive
-d,--define <key=value> Variable substitution to apply to Hive
commands. e.g. -d A=B or --define A=B
--database <databasename> Specify the database to use
-e <quoted-query-string> SQL from command line
-f <filename> SQL from files
-H,--help Print help information
--hiveconf <property=value> Use value for given property
--hivevar <key=value> Variable substitution to apply to Hive
commands. e.g. --hivevar A=B
-i <filename> Initialization SQL file
-S,--silent Silent mode in interactive shell
-v,--verbose Verbose mode (echo executed SQL to the
console)
2. In the Hive CLI, switch to the default database and insert one row into the user1 table
hive> use default;
hive> insert into table user1 values(666,"法外狂徒");
Query ID = lqs_20211220115945_ccf3809c-d9a5-440b-8916-3b76f4d2d659
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1639965104167_0001, Tracking URL = http://bdc113:8088/proxy/application_1639965104167_0001/
Kill Command = /home/lqs/module/hadoop-3.1.3/bin/mapred job -kill job_1639965104167_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2021-12-20 12:00:01,118 Stage-1 map = 0%, reduce = 0%
2021-12-20 12:00:09,606 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 5.77 sec
2021-12-20 12:00:25,035 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 24.43 sec
MapReduce Total cumulative CPU time: 24 seconds 430 msec
Ended Job = job_1639965104167_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://bdc112:8020/user/hive/warehouse/user1/.hive-staging_hive_2021-12-20_11-59-45_450_3726745333681390326-1/-ext-10000
Loading data to table default.user1
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 24.43 sec HDFS Read: 15211 HDFS Write: 253 SUCCESS
Total MapReduce CPU Time Spent: 24 seconds 430 msec
OK
Time taken: 43.157 seconds
3. "-e": run a SQL statement without entering the interactive shell
[lqs@bdc112 hive]$ bin/hive -e "select * from user1;"
4. "-f": run the SQL statements in a script file
[1] Create a data directory under /home/lqs/module/hive/ and a hivetest.sql file inside it
[lqs@bdc112 hive]$ vim data/hivetest.sql
Put the following in hivetest.sql:
select * from user1;
[2] Run the SQL in the file
[lqs@bdc112 hive]$ bin/hive -f /home/lqs/module/hive/data/hivetest.sql
[3] Run the SQL in the file and write the result to a file
[lqs@bdc112 hive]$ bin/hive -f /home/lqs/module/hive/data/hivetest.sql > /home/lqs/module/hive/data/hivetest_result.txt
Other Hive commands
1. Exiting the Hive shell
#In recent Hive versions the two are identical; in older versions:
#exit: implicitly commits data first, then exits
#quit: exits without committing
hive>exit;
hive>quit;
2. Browsing HDFS from inside the Hive CLI
hive> dfs -ls /;
Found 2 items
drwxrwx--- - lqs supergroup 0 2021-12-20 11:59 /tmp
drwxr-xr-x - lqs supergroup 0 2021-12-20 11:59 /user
3. View every command ever typed into Hive
#Remember to go to your own home directory first
[lqs@bdc112 hive]$ cd /home/lqs/
[lqs@bdc112 ~]$ cat .hivehistory
use default
;
create table user(id int,name string);
create table user1(id int,name string);
insert into table user1 values(666,"法外狂徒");
exit
;
dfs -ls /;
exit
;
4. View every command ever typed into beeline
#Remember to go to your own home directory first
[lqs@bdc112 hive]$ cd /home/lqs/
[lqs@bdc112 ~]$ cat .beeline/history
Common Hive property configuration (stop both Hive services before changing any of it)
Hive log configuration
Note: be sure to stop every Hive process before making this change!!!
1. By default Hive writes its log to /tmp/<username>/hive.log (here, /tmp/lqs/hive.log)
2. Move Hive's log to /home/lqs/module/hive/logs
Steps:
Step 1: rename $HIVE_HOME/conf/hive-log4j2.properties.template to hive-log4j2.properties
[lqs@bdc112 conf]$ pwd
/home/lqs/module/hive/conf
[lqs@bdc112 conf]$ mv hive-log4j2.properties.template hive-log4j2.properties
Step 2: change the log location in hive-log4j2.properties
[lqs@bdc112 conf]$ vim hive-log4j2.properties
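The setting to edit is property.hive.log.dir, which the template points at the system temp directory. A sketch of the edited line (the path matches this tutorial's layout; adjust it to yours):

```properties
# hive-log4j2.properties — was: ${sys:java.io.tmpdir}/${sys:user.name}
property.hive.log.dir = /home/lqs/module/hive/logs
```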
After the change, restart Hive and the log is written under the new directory.
JVM heap size for Hive
By default, newer Hive versions start with a JVM heap of only 256 MB. That is too small: once local mode is in use, complex SQL frequently fails with java.lang.OutOfMemoryError: Java heap space, so it is best to raise the HADOOP_HEAPSIZE parameter up front.
1. Rename hive-env.sh.template under $HIVE_HOME/conf to hive-env.sh
[lqs@bdc112 conf]$ pwd
/home/lqs/module/hive/conf
[lqs@bdc112 conf]$ mv hive-env.sh.template hive-env.sh
2. In hive-env.sh, uncomment the line export HADOOP_HEAPSIZE=1024.
Remember to restart Hive afterwards.
Note:
If yarn-site.xml does not yet contain the virtual-memory check setting, add it now; if it is already there, skip this.
[lqs@bdc112 conf]$ vim /home/lqs/module/hadoop-3.1.3/etc/hadoop/yarn-site.xml
If it is missing, add:
<!-- Virtual memory check; on by default, disabled here -->
<property>
<description>Whether virtual memory limits will be enforced for containers.</description>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
Printing the current database and column headers
Add these two settings to hive-site.xml:
<!-- Print column headers -->
<property>
<name>hive.cli.print.header</name>
<value>true</value>
</property>
<!-- Print the current database in the prompt -->
<property>
<name>hive.cli.print.current.db</name>
<value>true</value>
</property>
Ways to set parameters
1. View all current configuration settings
hive (default)> set;
2. The three ways to set a parameter
[1] Configuration files
Default configuration file: hive-default.xml
User-defined configuration file: hive-site.xml
Note: user-defined settings override the defaults. Hive also reads Hadoop's configuration, since Hive starts as a Hadoop client, and Hive's settings override Hadoop's. Settings made in configuration files apply to every Hive process started on this machine.
[2] Command-line parameters
When starting Hive, add -hiveconf param=value on the command line.
For example:
[lqs@bdc112 hive]$ bin/hive -hiveconf mapreduce.job.reduces=12;
or
[lqs@bdc112 hive]$ bin/beeline -u jdbc:hive2://bdc112:10000 -n lqs -hiveconf mapreduce.job.reduces=12;
Note: this lasts only for that Hive session.
Check the setting:
hive (default)> set mapreduce.job.reduces;
[3] SET statements
Parameters can be set inside HQL with the SET keyword.
For example:
hive (default)> set mapreduce.job.reduces=12;
Note: this, too, lasts only for the current Hive session.
Check the setting:
hive (default)> set mapreduce.job.reduces;
Note:
The three methods take precedence in increasing order: configuration file < command-line parameter < SET statement. Some system-level parameters, such as the log4j settings, must be set by the first two methods, because they are read before the session is established.