Install MySQL
sudo apt-get update
sudo apt-get install mysql-server
Check the initial password, as shown below:
sudo cat /etc/mysql/debian.cnf
[client]
host = localhost
user = debian-sys-maint
password = <password value>
socket = /var/run/mysqld/mysqld.sock
[mysql_upgrade]
host = localhost
user = debian-sys-maint
password = <password value>
socket = /var/run/mysqld/mysqld.sock
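If you only want the password field from that file, a small helper can pull it out (a sketch; `get_password` is a hypothetical name, not part of MySQL):

```shell
# Hypothetical helper: print the first "password" value from a my.cnf-style file
get_password() {
  awk -F' *= *' '/^password/ {print $2; exit}' "$1"
}

# usage (the file is readable by root only, so run via sudo):
# sudo sh -c "awk -F' *= *' '/^password/ {print \$2; exit}' /etc/mysql/debian.cnf"
```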
Log in to MySQL with the initial password (the default password from the previous step)
sudo mysql -u debian-sys-maint -p
Change the root password
mysql> use mysql;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> Alter user 'root'@'localhost' IDENTIFIED BY 'root';
Query OK, 0 rows affected (0.02 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
After typing exit, log back in with the new password
sudo mysql -u root -p
Create a new user hive and grant it privileges
mysql> create user 'hive'@'localhost' identified by 'hive';
Query OK, 0 rows affected (0.01 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.01 sec)
mysql> grant all privileges on *.* to 'hive'@'localhost' with grant option;
Query OK, 0 rows affected (0.01 sec)
mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)
Exit, log back in as the hive user (mysql -u hive -p), and create the hive database
mysql> create database hive default character set utf8 collate utf8_general_ci;
Check the hive database:
mysql> show databases like 'hive';
+------------------+
| Database (hive)  |
+------------------+
| hive             |
+------------------+
1 row in set (0.00 sec)
Install and configure Hive
Extract the Hive archive and adjust permissions (the target directory can be changed); run the commands one at a time
sudo tar -zxvf apache-hive-3.1.3-bin.tar.gz -C /usr/local
cd /usr/local
sudo mv apache-hive-3.1.3-bin/ hive/
sudo chgrp -R root ./hive
sudo chown -R root ./hive
sudo chmod -R 755 ./hive
Configure Hive environment variables
vim ~/.bashrc
# hive
export HIVE_HOME=/usr/local/hive
export PATH=${PATH}:${HIVE_HOME}/bin
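After saving, reload the shell configuration with `source ~/.bashrc` and confirm the variables took effect (assuming Hive was extracted to /usr/local/hive as above):

```shell
# Reproduce the two exports and verify them in the current shell
export HIVE_HOME=/usr/local/hive
export PATH=${PATH}:${HIVE_HOME}/bin
echo "$HIVE_HOME"        # prints /usr/local/hive
# hive --version         # works once PATH has been reloaded
```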
Configure the conf files
Make copies of the templates and create a new file
cd /usr/local/hive/conf
sudo cp ./hive-default.xml.template ./hive-default.xml
sudo cp ./hive-env.sh.template ./hive-env.sh
sudo cp ./hive-log4j2.properties.template ./hive-log4j2.properties
sudo touch ./hive-site.xml
Edit hive-env.sh
sudo vim ./hive-env.sh
# Set HADOOP_HOME to point to a specific hadoop install directory
# HADOOP_HOME=${bin}/../../hadoop
# hadoop path
export HADOOP_HOME=/usr/local/hadoop
# Hive Configuration Directory can be controlled by:
# export HIVE_CONF_DIR=
export HIVE_CONF_DIR=/usr/local/hive/conf
# Folder containing extra libraries required for hive compilation/execution can be controlled by:
# export HIVE_AUX_JARS_PATH=
export HIVE_AUX_JARS_PATH=/usr/local/hive/lib
Edit hive-log4j2.properties to set Hive's log directory
sudo vim ./hive-log4j2.properties
Set the following property:
property.hive.log.dir=/usr/local/hive/logs
Create a tmp directory under /usr/local/hive, and a hive directory inside it
cd /usr/local/hive
sudo mkdir tmp
sudo mkdir tmp/hive
Edit hive-site.xml. Drawing on other guides, the following configuration works for a first-time setup:
<configuration>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.cj.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
</property>
</configuration>
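A stray character in hive-site.xml is a common source of startup failures, so a quick well-formedness check can save time (a sketch; `check_xml` is a hypothetical helper using only Python's standard library):

```shell
# Hypothetical helper: fail if the given file is not well-formed XML
check_xml() {
  python3 - "$1" <<'PY'
import sys
import xml.dom.minidom

xml.dom.minidom.parse(sys.argv[1])  # raises on malformed XML
print("well-formed:", sys.argv[1])
PY
}

# usage: check_xml /usr/local/hive/conf/hive-site.xml
```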
Pre-run setup
Create the Hive paths in HDFS
hdfs dfs -mkdir -p /user/hive/warehouse
You can verify with:
hdfs dfs -ls -R / | grep 'hive'
Download mysql-connector-java and copy its jar into Hive's lib directory. Download from: https://dev.mysql.com/downloads/
# extract
tar -zxvf mysql-connector-java_8.0.30.tar.gz -C ./
# enter the directory
cd mysql-connector-java_8.0.30/
# copy the jar into Hive's lib directory
sudo cp ./mysql-connector-java-8.0.30.jar /usr/local/hive/lib/
# enter the Hive lib directory
cd /usr/local/hive/lib/
Hadoop and Hive ship conflicting versions of Guava, so Hive's older jar must be replaced with Hadoop's newer one
# check Hive's guava version
ls -al | grep 'guava'
# check Hadoop's guava version
ls -al /usr/local/hadoop/share/hadoop/common/lib/*.jar | grep 'guava'
# copy Hadoop's newer jar into Hive's lib
sudo cp /usr/local/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar ./
# remove the older version
sudo rm -f guava-19.0.jar
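To compare the two Guava versions without eyeballing `ls` output, you can parse them out of the jar filenames (a sketch; `guava_version` is a hypothetical helper, and the filenames below are the ones from this setup):

```shell
# Hypothetical helper: extract the version number from a guava jar filename
guava_version() {
  basename "$1" | sed -E 's/^guava-([0-9.]+).*\.jar$/\1/'
}

guava_version guava-19.0.jar       # -> 19.0
guava_version guava-27.0-jre.jar   # -> 27.0
```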
In the same way, sideline Hive's log4j-slf4j jar (renaming it to a backup is enough, since Hadoop already provides an SLF4J binding)
sudo mv log4j-slf4j-impl-2.17.1.jar log4j-slf4j-impl-2.17.1.jar-bck
Initialize the Hive schema
(Pitfall 1: if you hit the error "Could not create directory /usr/local/hive/logs", see the fix at the end)
sudo /usr/local/hive/bin/schematool -dbType mysql -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/hive/lib/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&characterEncoding=UTF-8
Metastore Connection Driver : com.mysql.cj.jdbc.Driver
Metastore connection User: hive
Starting metastore schema initialization to 3.1.0
Initialization script hive-schema-3.1.0.mysql.sql
Run Hive
:~$ hive
Hive Session ID = f7d8e77c-ceb4-439f-8942-7149974dd882
Logging initialized using configuration in file:/usr/local/hive/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Create the database and table, then insert data
hive> insert overwrite table hive_table select cast(split(t2.l, '[,]')[0] as int) as id, split(t2.l, '[,]')[1] as name, split(t2.l, '[,]')[2] as ver, split(t2.l, '[,]')[3] as package, split(t2.l, '[,]')[4] as path from (select '1,hadoop,3.3.0,hadoop-3.3.0.tar.gz,/opt/hadoop:2,hive,3.2.1,apache-hive-3.1.2-bin.tar.gz,/opt/apache-hive-bin:3,mysql,8.0.20,mysql-server,/usr/local/mysql' as l) t1 lateral view explode(split(t1.l, '[:]')) t2 as l where coalesce(t2.l, '') <> '';
Query ID = biqiangwang_20220921231025_a2d296c2-b74f-41f1-b01f-2fa771d915ea
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1663767802943_0001, Tracking URL = http://master:8088/proxy/application_1663767802943_0001/
Kill Command = /usr/local/hadoop/bin/mapred job -kill job_1663767802943_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2022-09-21 23:13:40,683 Stage-1 map = 0%, reduce = 0%
2022-09-21 23:14:28,980 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 8.73 sec
2022-09-21 23:15:29,204 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 8.73 sec
2022-09-21 23:16:29,379 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 8.73 sec
2022-09-21 23:17:30,281 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 8.73 sec
2022-09-21 23:18:30,455 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 8.73 sec
2022-09-21 23:19:10,298 Stage-1 map = 100%, reduce = 67%, Cumulative CPU 12.7 sec
2022-09-21 23:19:25,624 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 20.23 sec
MapReduce Total cumulative CPU time: 20 seconds 230 msec
Ended Job = job_1663767802943_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://master:9000/user/hive/warehouse/hive.db/hive_table/.hive-staging_hive_2022-09-21_23-10-25_538_3474724099283121188-1/-ext-10000
Loading data to table hive.hive_table
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 20.23 sec HDFS Read: 22172 HDFS Write: 1363 SUCCESS
Total MapReduce CPU Time Spent: 20 seconds 230 msec
OK
Time taken: 547.47 seconds
A bit slow... probably because too little memory is allocated; give it more if your machine allows.
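For reference, the INSERT above assumes the `hive` database and `hive_table` already exist; the session does not show that step, so here is a possible DDL inferred from the query's column list (a sketch, not the original statements):

```sql
-- Inferred sketch: columns match the SELECT list of the INSERT above
CREATE DATABASE IF NOT EXISTS hive;
USE hive;
CREATE TABLE IF NOT EXISTS hive_table (
  id      INT,
  name    STRING,
  ver     STRING,
  package STRING,
  path    STRING
);
```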
Inspect the data Hive wrote
:~$ hdfs dfs -ls /user/hive/warehouse
Found 1 items
drwxr-xr-x - username supergroup 0 2022-09-21 23:09 /user/hive/warehouse/hive.db
Pitfall 1: error "Could not create directory /usr/local/hive/logs"
There are many possible causes: a bad configuration file, a user mismatch, and so on.
Here the cause was that the Hive files were assigned to root during extraction; changing the directory's owner with the chown command fixes it.
# switch to root
su root
# change ownership
chown -R username fullpath