第11章 Hive:SQL on Hadoop
11.2 Hive安装与配置
11.2.1 安装元数据库
Hive的元数据和数据是分开存放的,数据存放在HDFS上,而元数据库默认是存储在Hive自带的Derby数据库。由于Derby只支持同时一个用户访问Hive,所以我们将换成MySQL/MariaDB作为Hive元数据库。
(1)安装数据库
CentOS 6.x系统可以安装MySQL,CentOS 7.x系统可以直接安装MariaDB
[root@node3 ~]# yum install -y mariadb mariadb-server
Loaded plugins: fastestmirror
base | 3.6 kB 00:00:00
extras | 3.4 kB 00:00:00
updates | 3.4 kB 00:00:00
updates/7/x86_64/primary_db | 7.8 MB 00:00:07
Determining fastest mirrors
* base: mirrors.aliyun.com
* extras: mirrors.aliyun.com
* updates: mirrors.aliyun.com
Resolving Dependencies
--> Running transaction check
---> Package mariadb.x86_64 1:5.5.52-1.el7 will be installed
---> Package mariadb-server.x86_64 1:5.5.52-1.el7 will be installed
--> Processing Dependency: perl-DBI for package: 1:mariadb-server-5.5.52-1.el7.x86_64
--> Processing Dependency: perl-DBD-MySQL for package: 1:mariadb-server-5.5.52-1.el7.x86_64
--> Processing Dependency: perl(Data::Dumper) for package: 1:mariadb-server-5.5.52-1.el7.x86_64
--> Processing Dependency: perl(DBI) for package: 1:mariadb-server-5.5.52-1.el7.x86_64
--> Running transaction check
---> Package perl-DBD-MySQL.x86_64 0:4.023-5.el7 will be installed
---> Package perl-DBI.x86_64 0:1.627-4.el7 will be installed
--> Processing Dependency: perl(RPC::PlServer) >= 0.2001 for package: perl-DBI-1.627-4.el7.x86_64
--> Processing Dependency: perl(RPC::PlClient) >= 0.2000 for package: perl-DBI-1.627-4.el7.x86_64
---> Package perl-Data-Dumper.x86_64 0:2.145-3.el7 will be installed
--> Running transaction check
---> Package perl-PlRPC.noarch 0:0.2020-14.el7 will be installed
--> Processing Dependency: perl(Net::Daemon) >= 0.13 for package: perl-PlRPC-0.2020-14.el7.noarch
--> Processing Dependency: perl(Net::Daemon::Test) for package: perl-PlRPC-0.2020-14.el7.noarch
--> Processing Dependency: perl(Net::Daemon::Log) for package: perl-PlRPC-0.2020-14.el7.noarch
--> Processing Dependency: perl(Compress::Zlib) for package: perl-PlRPC-0.2020-14.el7.noarch
--> Running transaction check
---> Package perl-IO-Compress.noarch 0:2.061-2.el7 will be installed
--> Processing Dependency: perl(Compress::Raw::Zlib) >= 2.061 for package: perl-IO-Compress-2.061-2.el7.noarch
--> Processing Dependency: perl(Compress::Raw::Bzip2) >= 2.061 for package: perl-IO-Compress-2.061-2.el7.noarch
---> Package perl-Net-Daemon.noarch 0:0.48-5.el7 will be installed
--> Running transaction check
---> Package perl-Compress-Raw-Bzip2.x86_64 0:2.061-3.el7 will be installed
---> Package perl-Compress-Raw-Zlib.x86_64 1:2.061-4.el7 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
=============================================================================================================================================================================================
Package Arch Version Repository Size
=============================================================================================================================================================================================
Installing:
mariadb x86_64 1:5.5.52-1.el7 base 8.7 M
mariadb-server x86_64 1:5.5.52-1.el7 base 11 M
Installing for dependencies:
perl-Compress-Raw-Bzip2 x86_64 2.061-3.el7 base 32 k
perl-Compress-Raw-Zlib x86_64 1:2.061-4.el7 base 57 k
perl-DBD-MySQL x86_64 4.023-5.el7 base 140 k
perl-DBI x86_64 1.627-4.el7 base 802 k
perl-Data-Dumper x86_64 2.145-3.el7 base 47 k
perl-IO-Compress noarch 2.061-2.el7 base 260 k
perl-Net-Daemon noarch 0.48-5.el7 base 51 k
perl-PlRPC noarch 0.2020-14.el7 base 36 k
Transaction Summary
=============================================================================================================================================================================================
Install 2 Packages (+8 Dependent packages)
Total download size: 21 M
Installed size: 107 M
Downloading packages:
(1/10): perl-Compress-Raw-Bzip2-2.061-3.el7.x86_64.rpm | 32 kB 00:00:00
(2/10): perl-Compress-Raw-Zlib-2.061-4.el7.x86_64.rpm | 57 kB 00:00:00
(3/10): perl-DBD-MySQL-4.023-5.el7.x86_64.rpm | 140 kB 00:00:00
(4/10): perl-DBI-1.627-4.el7.x86_64.rpm | 802 kB 00:00:00
(5/10): perl-Data-Dumper-2.145-3.el7.x86_64.rpm | 47 kB 00:00:00
(6/10): perl-IO-Compress-2.061-2.el7.noarch.rpm | 260 kB 00:00:00
(7/10): perl-Net-Daemon-0.48-5.el7.noarch.rpm | 51 kB 00:00:00
(8/10): perl-PlRPC-0.2020-14.el7.noarch.rpm | 36 kB 00:00:00
(9/10): mariadb-5.5.52-1.el7.x86_64.rpm | 8.7 MB 00:00:09
mariadb-server-5.5.52-1.el7.x8 FAILED
http://mirrors.aliyuncs.com/centos/7/os/x86_64/Packages/mariadb-server-5.5.52-1.el7.x86_64.rpm: [Errno 14] curl#7 - "Failed connect to mirrors.aliyuncs.com:80; Connection refused"00:10 ETA
Trying other mirror.
(10/10): mariadb-server-5.5.52-1.el7.x86_64.rpm | 11 MB 00:00:11
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total 662 kB/s | 21 MB 00:00:32
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : perl-Data-Dumper-2.145-3.el7.x86_64 1/10
Installing : perl-Net-Daemon-0.48-5.el7.noarch 2/10
Installing : 1:perl-Compress-Raw-Zlib-2.061-4.el7.x86_64 3/10
Installing : 1:mariadb-5.5.52-1.el7.x86_64 4/10
Installing : perl-Compress-Raw-Bzip2-2.061-3.el7.x86_64 5/10
Installing : perl-IO-Compress-2.061-2.el7.noarch 6/10
Installing : perl-PlRPC-0.2020-14.el7.noarch 7/10
Installing : perl-DBI-1.627-4.el7.x86_64 8/10
Installing : perl-DBD-MySQL-4.023-5.el7.x86_64 9/10
Installing : 1:mariadb-server-5.5.52-1.el7.x86_64 10/10
Verifying : perl-Compress-Raw-Bzip2-2.061-3.el7.x86_64 1/10
Verifying : 1:mariadb-5.5.52-1.el7.x86_64 2/10
Verifying : perl-Data-Dumper-2.145-3.el7.x86_64 3/10
Verifying : 1:mariadb-server-5.5.52-1.el7.x86_64 4/10
Verifying : perl-PlRPC-0.2020-14.el7.noarch 5/10
Verifying : 1:perl-Compress-Raw-Zlib-2.061-4.el7.x86_64 6/10
Verifying : perl-Net-Daemon-0.48-5.el7.noarch 7/10
Verifying : perl-DBI-1.627-4.el7.x86_64 8/10
Verifying : perl-IO-Compress-2.061-2.el7.noarch 9/10
Verifying : perl-DBD-MySQL-4.023-5.el7.x86_64 10/10
Installed:
mariadb.x86_64 1:5.5.52-1.el7 mariadb-server.x86_64 1:5.5.52-1.el7
Dependency Installed:
perl-Compress-Raw-Bzip2.x86_64 0:2.061-3.el7 perl-Compress-Raw-Zlib.x86_64 1:2.061-4.el7 perl-DBD-MySQL.x86_64 0:4.023-5.el7 perl-DBI.x86_64 0:1.627-4.el7
perl-Data-Dumper.x86_64 0:2.145-3.el7 perl-IO-Compress.noarch 0:2.061-2.el7 perl-Net-Daemon.noarch 0:0.48-5.el7 perl-PlRPC.noarch 0:0.2020-14.el7
Complete!
[root@node3 ~]#
(2)数据库配置
启动MariaDB
[root@node3 ~]# systemctl start mariadb
设置开机启动
[root@node3 ~]# systemctl enable mariadb
Created symlink from /etc/systemd/system/multi-user.target.wants/mariadb.service to /usr/lib/systemd/system/mariadb.service.
运行脚本mysql_secure_installation
[root@node3 ~]# mysql_secure_installation
NOTE: RUNNING ALL PARTS OF THIS SCRIPT IS RECOMMENDED FOR ALL MariaDB
SERVERS IN PRODUCTION USE! PLEASE READ EACH STEP CAREFULLY!
In order to log into MariaDB to secure it, we'll need the current
password for the root user. If you've just installed MariaDB, and
you haven't set the root password yet, the password will be blank,
so you should just press enter here.
Enter current password for root (enter for none):
OK, successfully used password, moving on...
Setting the root password ensures that nobody can log into the MariaDB
root user without the proper authorisation.
Set root password? [Y/n] y
New password:
Re-enter new password:
Password updated successfully!
Reloading privilege tables..
... Success!
By default, a MariaDB installation has an anonymous user, allowing anyone
to log into MariaDB without having to have a user account created for
them. This is intended only for testing, and to make the installation
go a bit smoother. You should remove them before moving into a
production environment.
Remove anonymous users? [Y/n] y
... Success!
Normally, root should only be allowed to connect from 'localhost'. This
ensures that someone cannot guess at the root password from the network.
Disallow root login remotely? [Y/n] n
... skipping.
By default, MariaDB comes with a database named 'test' that anyone can
access. This is also intended only for testing, and should be removed
before moving into a production environment.
Remove test database and access to it? [Y/n] n
... skipping.
Reloading the privilege tables will ensure that all changes made so far
will take effect immediately.
Reload privilege tables now? [Y/n] y
... Success!
Cleaning up...
All done! If you've completed all of the above steps, your MariaDB
installation should now be secure.
Thanks for using MariaDB!
[root@node3 ~]#
登录MySQL,查询mysql.user表。如果存在用户名空或密码空的记录,请删除。
[root@node3 ~]# mysql -uroot -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 7
Server version: 5.5.52-MariaDB MariaDB Server
Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> select Host,User,Password from mysql.user;
+-----------+------+-------------------------------------------+
| Host | User | Password |
+-----------+------+-------------------------------------------+
| localhost | root | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
| node3 | root | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
| 127.0.0.1 | root | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
| ::1 | root | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
+-----------+------+-------------------------------------------+
4 rows in set (0.00 sec)
授权,可以远程访问
MariaDB [(none)]> grant all on *.* to root@'%' identified by '123456' with grant option;
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> flush privileges;
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> quit
Bye
[root@node3 ~]#
11.2.2 下载Hive软件包
Downloads–>Download a release now!
http://www.apache.org/dyn/closer.cgi/hive/
选择stable-2 –> 下载apache-hive-2.1.1-bin.tar.gz
由于Hive是Hadoop的一个客户端,可以独立于集群之外。由于我们的Linux虚拟机只有3台,我们将刚才下载的Hive软件包通过XShell上传到node3节点。
11.2.3 Hive安装与配置
(1)解压缩Hive软件包到/opt目录
[root@node3 ~]# tar -zxvf apache-hive-2.1.1-bin.tar.gz -C /opt
(2)为了与其他组件保持一致,重命名Hive根目录
[root@node3 ~]# cd /opt
[root@node3 opt]# ls
apache-hive-2.1.1-bin data hadoop-2.7.3 hbase-1.2.6 jdk1.8.0_112 zookeeper-3.4.10
[root@node3 opt]# mv apache-hive-2.1.1-bin/ hive-2.1.1
[root@node3 opt]# ll
total 4
drwxr-xr-x 3 root root 25 Jul 12 10:13 data
drwxr-xr-x 10 root root 161 May 14 09:14 hadoop-2.7.3
drwxr-xr-x 8 root root 172 Jul 23 10:37 hbase-1.2.6
drwxr-xr-x 9 root root 171 Aug 6 03:56 hive-2.1.1
drwxr-xr-x. 8 10 143 255 Sep 23 2016 jdk1.8.0_112
drwxr-xr-x 10 root root 4096 Jul 7 10:17 zookeeper-3.4.10
[root@node3 opt]#
(3)设置Hive环境变量
编辑文件
[root@node3 opt]# vi /etc/profile.d/custom.sh
增加下面内容
#Hive path
export HIVE_HOME=/opt/hive-2.1.1
export PATH=$PATH:$HIVE_HOME/bin
source一下
[root@node3 opt]# source /etc/profile.d/custom.sh
(4)将MySQL驱动程序复制到Hive的lib目录下面
MySQL的JDBC驱动程序可以从MySQL官网下载
https://dev.mysql.com/downloads/connector/j/
选择一个熟悉的压缩方式,单击Download按钮
找到“No thanks, just start my download.”超链接,单击该链接即可下载。这里我们右键复制该链接的地址https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.43.tar.gz
下面可以直接在node3节点直接下载JDBC驱动程序了
[root@node3 ~]# wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.43.tar.gz
--2017-08-06 04:22:47-- https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.43.tar.gz
Resolving dev.mysql.com (dev.mysql.com)... 137.254.60.11
Connecting to dev.mysql.com (dev.mysql.com)|137.254.60.11|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn.mysql.com//Downloads/Connector-J/mysql-connector-java-5.1.43.tar.gz [following]
--2017-08-06 04:22:49-- https://cdn.mysql.com//Downloads/Connector-J/mysql-connector-java-5.1.43.tar.gz
Resolving cdn.mysql.com (cdn.mysql.com)... 184.85.114.192
Connecting to cdn.mysql.com (cdn.mysql.com)|184.85.114.192|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3462610 (3.3M) [application/x-tar-gz]
Saving to: ‘mysql-connector-java-5.1.43.tar.gz’
100%[===================================================================================================================================================>] 3,462,610 19.8KB/s in 2m 39s
2017-08-06 04:25:29 (21.2 KB/s) - ‘mysql-connector-java-5.1.43.tar.gz’ saved [3462610/3462610]
[root@node3 ~]#
然后解压缩
[root@node3 ~]# tar -zxvf mysql-connector-java-5.1.43.tar.gz
[root@node3 ~]# cd mysql-connector-java-5.1.43
[root@node3 mysql-connector-java-5.1.43]# ls
build.xml CHANGES COPYING mysql-connector-java-5.1.43-bin.jar README README.txt src
其中,mysql-connector-java-5.1.43-bin.jar文件就是我们需要jar包,直接将该jar移动到hive的lib目录下即可。
[root@node3 mysql-connector-java-5.1.43]# mv mysql-connector-java-5.1.43-bin.jar /opt/hive-2.1.1/lib/
(5)编辑配置文件
[root@node3 ~]# cd /opt/hive-2.1.1/conf/
[root@node3 conf]# vi hive-site.xml
[root@node3 conf]# cat hive-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.metastore.local</name>
<!--本地存储,MySQL和hive安装在同一节点-->
<value>true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://node3:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<!--mysql数据库用户名-->
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
</property>
</configuration>
[root@node3 conf]#
(6)元数据初始化
从官方文档https://cwiki.apache.org/confluence/display/Hive/GettingStarted可知”Starting from Hive 2.1, we need to run the schematool command below as an initialization step.”,也就是说从Hive 2.1开始,我们需要在下面运行schematool命令作为初始化步骤。
[root@node3 ~]# schematool -dbType mysql -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://node3:3306/hive?createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: root
Starting metastore schema initialization to 2.1.0
Initialization script hive-schema-2.1.0.mysql.sql
Initialization script completed
schemaTool completed
[root@node3 ~]#
登录MySQL/MariaDB即可看到生成的hive数据库
[root@node3 ~]# mysql -uroot -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 11
Server version: 5.5.52-MariaDB MariaDB Server
Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> use hive;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
MariaDB [hive]> show tables;
+---------------------------+
| Tables_in_hive |
+---------------------------+
| AUX_TABLE |
| BUCKETING_COLS |
| CDS |
| COLUMNS_V2 |
| COMPACTION_QUEUE |
| COMPLETED_COMPACTIONS |
| COMPLETED_TXN_COMPONENTS |
| DATABASE_PARAMS |
| DBS |
| DB_PRIVS |
| DELEGATION_TOKENS |
| FUNCS |
| FUNC_RU |
| GLOBAL_PRIVS |
| HIVE_LOCKS |
| IDXS |
| INDEX_PARAMS |
| KEY_CONSTRAINTS |
| MASTER_KEYS |
| NEXT_COMPACTION_QUEUE_ID |
| NEXT_LOCK_ID |
| NEXT_TXN_ID |
| NOTIFICATION_LOG |
| NOTIFICATION_SEQUENCE |
| NUCLEUS_TABLES |
| PARTITIONS |
| PARTITION_EVENTS |
| PARTITION_KEYS |
| PARTITION_KEY_VALS |
| PARTITION_PARAMS |
| PART_COL_PRIVS |
| PART_COL_STATS |
| PART_PRIVS |
| ROLES |
| ROLE_MAP |
| SDS |
| SD_PARAMS |
| SEQUENCE_TABLE |
| SERDES |
| SERDE_PARAMS |
| SKEWED_COL_NAMES |
| SKEWED_COL_VALUE_LOC_MAP |
| SKEWED_STRING_LIST |
| SKEWED_STRING_LIST_VALUES |
| SKEWED_VALUES |
| SORT_COLS |
| TABLE_PARAMS |
| TAB_COL_STATS |
| TBLS |
| TBL_COL_PRIVS |
| TBL_PRIVS |
| TXNS |
| TXN_COMPONENTS |
| TYPES |
| TYPE_FIELDS |
| VERSION |
| WRITE_SET |
+---------------------------+
57 rows in set (0.00 sec)
MariaDB [hive]>