Hive configuration: setting up the prerequisite environment on a host with Hadoop and MySQL (MariaDB) installed

Latest recommended article published on 2024-07-17 09:12:06

uyi31

Tags: hive hadoop mysql

Article link: https://blog.csdn.net/LawrenceUyi31/article/details/127866544


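Task description:

This exercise must be completed as the root user. Hadoop and MySQL (MariaDB) are assumed to be installed already, and the prerequisite environment for Hive has to be configured. The deployment requirements are:

1. Extract the Hive package to the path "/opt/".
2. Set the Hive environment variables.
3. Create and configure hive-site.xml so that Hive stores its metadata in the MySQL (MariaDB) database.
4. Initialize the Hive metadata (copy the MySQL JDBC driver into the lib directory of the Hive installation).
5. Start Hive and check that the installation succeeded.
6. Create any database and table.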

Answer:

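Step 1: Install MariaDB

Installing MariaDB requires the machine to have Internet access.
Install MariaDB with the following command:
yum -y install mariadb*
If the installer asks for confirmation, enter y.
Next, change MariaDB's character set. Open the configuration file /etc/my.cnf and add character-set-server=utf8 under the [mysqld] section:
vim /etc/my.cnf
The configuration file also shows that the data directory is /var/lib/mysql.
Start the MariaDB service and enable it at boot with the following commands:

# Start the MariaDB service
[root@hadoop mysql]# systemctl start mariadb.service

# Enable the MariaDB service at boot
[root@hadoop mysql]# systemctl enable mariadb.service
Created symlink from /etc/systemd/system/multi-user.target.wants/mariadb.service to /usr/lib/systemd/system/mariadb.service.

The following command runs MariaDB's initial setup. Several of its prompts need attention, so read each one before answering:
mysql_secure_installation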
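
The character-set edit from this step can also be scripted instead of made by hand in vim. The following is a minimal sketch, assuming the file already contains a [mysqld] section header. The function name set_mysqld_charset is hypothetical, and the demo runs against a temporary file rather than the real /etc/my.cnf.

```shell
# Hypothetical helper: add character-set-server=utf8 right after the
# [mysqld] header of the file given as $1, unless it is already set.
set_mysqld_charset() {
    cnf="$1"
    # Skip the edit if the option is already present (keeps it idempotent).
    if grep -q '^character-set-server' "$cnf"; then
        return 0
    fi
    # Copy every line and emit the option directly after [mysqld].
    awk '{ print } /^\[mysqld\]/ { print "character-set-server=utf8" }' "$cnf" > "$cnf.tmp" &&
        mv "$cnf.tmp" "$cnf"
}

# Demo on a throwaway file instead of /etc/my.cnf.
demo_cnf="$(mktemp)"
printf '[mysqld]\ndatadir=/var/lib/mysql\n' > "$demo_cnf"
set_mysqld_charset "$demo_cnf"
set_mysqld_charset "$demo_cnf"   # the second call changes nothing
cat "$demo_cnf"
```

On the real host the call would be set_mysqld_charset /etc/my.cnf, followed by a restart of the MariaDB service so that the setting takes effect.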

Once MariaDB has been initialized, you can log in. MariaDB is a fork of MySQL, so most MySQL commands work unchanged, and logging in uses the same command as MySQL:
[root@hadoop ~]# mysql -u root -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 10
Server version: 5.5.68-MariaDB MariaDB Server
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
+--------------------+
3 rows in set (0.00 sec)

So that Hive can store its metadata in MariaDB later, grant the required privileges now. After logging in, run the following statement so that the root user can log in from the host hadoop with the password root:
grant all on *.* to root@'hadoop' identified by 'root';
Then reload the privilege tables:
flush privileges;
Finally, check the host, user and password columns of the user table:
select host,user,password from mysql.user;

Step 2: Install Hive

Use xftp to upload the Hive package to the /opt/ directory on the Linux host.
Extract the Hive package apache-hive-2.3.5-bin.tar.gz with the following command:
[root@hadoop opt]# tar -zxf apache-hive-2.3.5-bin.tar.gz -C /opt/
[root@hadoop opt]# ls
apache-hive-2.3.5-bin hadoop-2.7.3 jdk1.8.0_11
Rename the extracted directory to hive-2.3.5:
[root@hadoop opt]# mv apache-hive-2.3.5-bin/ hive-2.3.5
[root@hadoop opt]# ls
hadoop-2.7.3 hive-2.3.5 jdk1.8.0_11
Add the environment variables for Hive.
Open the custom environment file /etc/profile.d/my_env.sh:
vim /etc/profile.d/my_env.sh
Edit the file so that it reads as follows:

# JDK environment variable
export JAVA_HOME=/opt/jdk1.8.0_11

# Hadoop environment variable
export HADOOP_HOME=/opt/hadoop-2.7.3

# Hive environment variable
export HIVE_HOME=/opt/hive-2.3.5

# Append the bin directories to the system PATH
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin

Make the configuration take effect immediately:
source /etc/profile

Step 3: Configure Hive

Hive's configuration files live in the $HIVE_HOME/conf directory. Change into that directory first:
cd /opt/hive-2.3.5/conf/
Open the configuration file hive-env.sh:
vim hive-env.sh
Add the following lines to the file:

export HADOOP_HOME=/opt/hadoop-2.7.3
export JAVA_HOME=/opt/jdk1.8.0_11

Save and exit.
Then open the configuration file with vim hive-site.xml and add the following content:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- JDBC URL, driver class, user name and password of the metastore database -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
  </property>
  <!-- Hive's default warehouse directory -->
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/hive/warehouse</value>
  </property>
  <!-- Metastore notification API authorization -->
  <property>
    <name>hive.metastore.event.db.notification.api.auth</name>
    <value>false</value>
  </property>
  <!-- Metastore schema version verification -->
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
  <!-- Show column names in query results; by default they appear as table_name.column_name -->
  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
  </property>
  <!-- Drop the table name prefix from the column names -->
  <property>
    <name>hive.resultset.use.unique.column.names</name>
    <value>false</value>
  </property>
</configuration>

Step 4: Add the JDBC driver

Copy the MySQL connector jar into the ${HIVE_HOME}/lib directory. The driver class named in hive-site.xml is the one from the MySQL 8.x connector, so an 8.x jar is needed here.
Uploading mysql-connector-java-8.0.15.jar with xftp to /opt/hive-2.3.5/lib is enough.

Step 5: Resolve the duplicate SLF4J binding

Both ${HIVE_HOME}/lib/ and Hadoop ship their own logging binding jars, in different versions. Because both end up on the classpath, the following messages appear whenever Hive starts or runs a statement:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/hive-2.3.5/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.7.3/share/hadoop/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

These messages do not affect Hive's normal operation. If you would rather not see them, remove Hive's copy of the binding with the following command:
rm -f /opt/hive-2.3.5/lib/log4j-slf4j-impl-2.17.1.jar

Step 6: Initialize the Hive metastore database

Before starting Hive for the first time, initialize the metastore database:
schematool -dbType mysql -initSchema
When the command finishes, it prints output like the following (the schema version shown depends on the Hive release):
Metastore connection URL: jdbc:mysql://hadoop:3306/hive?createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.cj.jdbc.Driver
Metastore connection User: root
Starting metastore schema initialization to 3.1.0
Initialization script hive-schema-3.1.0.mysql.sql
Initialization script completed
schemaTool completed
Now log in to MariaDB and run show databases; a hive database has been created automatically:
MariaDB [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hive               |
| mysql              |
| performance_schema |
+--------------------+
4 rows in set (0.00 sec)
The hive database holds the many tables in which Hive keeps its metadata.
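
If schematool cannot connect, the cause is usually one of the values set in Step 3 or the driver jar from Step 4. The following is a minimal sketch for checking both before rerunning the initialization. The helper names get_hive_property and has_mysql_driver are hypothetical, and the text-based lookup assumes that each <value> sits on the single line directly after its <name>. The demo works on temporary files; on the real host the arguments would be /opt/hive-2.3.5/conf/hive-site.xml and /opt/hive-2.3.5/lib.

```shell
# Hypothetical helper: print the <value> on the line after <name>$2</name> in file $1.
get_hive_property() {
    awk -v key="$2" '
        found { gsub(/^[ \t]*<value>|<\/value>[ \t\r]*$/, ""); print; exit }
        index($0, "<name>" key "</name>") { found = 1 }
    ' "$1"
}

# Hypothetical helper: succeed if directory $1 holds a MySQL connector jar.
has_mysql_driver() {
    for jar in "$1"/mysql-connector-*.jar; do
        [ -e "$jar" ] && return 0
    done
    return 1
}

# Demo with a cut-down hive-site.xml and an empty stand-in for the jar.
demo_dir="$(mktemp -d)"
cat > "$demo_dir/hive-site.xml" <<'EOF'
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
  </property>
</configuration>
EOF
mkdir "$demo_dir/lib"
: > "$demo_dir/lib/mysql-connector-java-8.0.15.jar"

get_hive_property "$demo_dir/hive-site.xml" javax.jdo.option.ConnectionURL
get_hive_property "$demo_dir/hive-site.xml" javax.jdo.option.ConnectionDriverName
if has_mysql_driver "$demo_dir/lib"; then
    echo "driver jar found"
else
    echo "driver jar missing"
fi
```

A URL that comes back truncated or empty suggests the value was wrapped across two lines in the file, which would also break the JDBC connection.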

Step 7: Start Hive

Run the hive command to start Hive's metastore service and open a client:

[root@hadoop ~]# hive
which: no hbase in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/opt/jdk1.8.0_11/bin:/opt/hadoop-2.7.3/bin:/opt/hadoop-2.7.3/sbin:/opt/jdk1.8.0_11/bin:/opt/hadoop-2.7.3/bin:/opt/hadoop-2.7.3/sbin:/opt/hive-2.3.5/bin)
Hive Session ID = 6b78f086-50be-44d0-a378-4a9259c85a49
Logging initialized using configuration in jar:file:/opt/hive-2.3.5/lib/hive-common-2.3.5.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Hive Session ID = 845cff54-eff5-445b-8e6a-f218f774640c
hive>

Now open a new terminal window and run jps. It lists processes like the following:

# HDFS processes
NameNode
DataNode
SecondaryNameNode

# YARN processes
NodeManager
ResourceManager

# History server process
JobHistoryServer

# Java process
Jps

# Hive process
RunJar

This shows that Hive has started successfully.
To stop Hive, press Ctrl + C to exit from the window it is running in.
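
The jps check can also be scripted. The following is a minimal sketch, assuming the process names listed above. The function name missing_processes is hypothetical, and it reads jps-style output on standard input so that it can be tried without a running cluster.

```shell
# Hypothetical helper: read jps-style lines ("<pid> <name>") on stdin and
# print every expected process name that does not appear in the input.
missing_processes() {
    awk '
        { seen[$2] = 1 }
        END {
            n = split("NameNode DataNode SecondaryNameNode NodeManager ResourceManager JobHistoryServer RunJar", want, " ")
            for (i = 1; i <= n; i++)
                if (!(want[i] in seen)) print want[i]
        }
    '
}

# Demo with canned input (the PIDs are made up): the history server and
# Hive are not running here, so their names are reported as missing.
printf '%s\n' '2101 NameNode' '2245 DataNode' '2478 SecondaryNameNode' \
    '2690 ResourceManager' '2811 NodeManager' '3502 Jps' | missing_processes
```

On the real host the call would be jps | missing_processes, and empty output means every expected process was found.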

Step 8: Test Hive

Prepare the file /root/stu.txt on the Linux system with the following contents (the columns are id, name, sex and score; 男 means male and 女 means female):
1,张三,男,90
2,李四,男,99
3,王五,女,89
4,赵六,男,90
5,田七,女,88

Then, in Hive, create a test database named test with create database test; and list the databases with show databases;

hive> create database test;
OK
Time taken: 0.596 seconds
hive> show databases;
OK
default
test
Time taken: 0.273 seconds, Fetched: 2 row(s)

Create the stu table in the test database:
create table test.stu(
id int,
name string,
sex string,
score int
)row format delimited fields terminated by ',';

Check the new table: show tables in test; lists the tables in the test database, and desc test.stu; shows the structure of the stu table.
hive> show tables in test;
OK
stu
Time taken: 0.052 seconds, Fetched: 1 row(s)
hive> desc test.stu;
OK
id int
name string
sex string
score int
Time taken: 0.146 seconds, Fetched: 4 row(s)

Load the data from /root/stu.txt into the stu table with load data local inpath '/root/stu.txt' into table test.stu;
hive> load data local inpath '/root/stu.txt' into table test.stu;
Loading data to table test.stu
OK
Time taken: 2.101 seconds

Query the data in the table with select * from test.stu;
hive> select * from test.stu;
OK
id name sex score
1 张三 男 90
2 李四 男 99
3 王五 女 89
4 赵六 男 90
5 田七 女 88
Time taken: 2.803 seconds, Fetched: 5 row(s)

Finally, use Hive for a calculation: count how many male and how many female students the table contains.

select sex, count(1) from test.stu group by sex;

As the output shows, Hive converts the HQL (Hive SQL) statement into a MapReduce program and runs it:

hive> select sex, count(1) from test.stu group by sex;
Query ID = root_20220607014915_f18ca704-4ea2-4c93-b1d4-f34dfe45647e
Total jobs = 1
Launching Job 1 out of 1

Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1654537721645_0001, Tracking URL = http://hadoop:8088/proxy/application_1654537721645_0001/
Kill Command = /opt/hadoop-2.7.3/bin/mapred job -kill job_1654537721645_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2022-06-07 01:49:42,087 Stage-1 map = 0%, reduce = 0%
2022-06-07 01:49:56,202 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.46 sec
2022-06-07 01:50:05,847 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.33 sec
MapReduce Total cumulative CPU time: 4 seconds 330 msec
Ended Job = job_1654537721645_0001
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.33 sec HDFS Read: 12944 HDFS Write: 129 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 330 msec
OK
sex _c1
女 2
男 3
Time taken: 53.638 seconds, Fetched: 2 row(s)
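
Because the sample file is tiny, the group-by result can be cross-checked locally without Hive. The following is a minimal sketch, assuming the comma-separated layout of stu.txt used in this step. The function name count_by_sex is hypothetical, and the demo writes the rows to a temporary file rather than to /root/stu.txt.

```shell
# Hypothetical helper: count the rows per value of the third comma-separated
# field (sex) in file $1 and print "<sex><TAB><count>", sorted by byte order.
count_by_sex() {
    awk -F',' '{ count[$3]++ } END { for (s in count) printf "%s\t%d\n", s, count[s] }' "$1" |
        LC_ALL=C sort
}

# Demo with the same five rows as the tutorial's stu.txt.
demo_file="$(mktemp)"
cat > "$demo_file" <<'EOF'
1,张三,男,90
2,李四,男,99
3,王五,女,89
4,赵六,男,90
5,田七,女,88
EOF
count_by_sex "$demo_file"
```

The counts should agree with the Hive result above: 2 rows for 女 and 3 rows for 男.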