The Most Detailed Guide on the Web to Building a Hadoop Big Data Cluster and Running a Project Analysis (Fully Distributed Mode), Part 2

Yang三少喜欢撸铁

Last modified 2022-03-25 17:22:47

Column: Hadoop Cluster Setup and Usage. Tags: mysql hadoop hive hbase spark

First published 2022-03-25 17:18:05

Article link: https://blog.csdn.net/yygabcd/article/details/123739321

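##All required materials have been uploaded to Baidu Netdisk; please download them yourself.##

Part 2: Building the Fully Distributed Big Data Cluster (four parts in total)

Chapter 7: Installing and Configuring MySQL

1. Uninstall the mariadb that ships with CentOS 7:

# List the installed mariadb packages, then remove the one that is reported
rpm -qa | grep mariadb
# Example output: mariadb-libs-5.5.56-2.el7.x86_64
rpm -e mariadb-libs-5.5.56-2.el7.x86_64 --nodeps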
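
The package name above is tied to one specific mariadb build, so it may differ on another CentOS 7 image. A minimal sketch of the same step that removes whatever mariadb packages are installed could look like this. The function name and the RPM override variable are assumptions added for illustration (the override only exists so the logic can be dry-run with a stub).

```shell
# Sketch: remove every installed mariadb package, whatever its version.
# RPM is a hypothetical override hook; by default the real rpm is used.
remove_mariadb() {
  rpm_cmd="${RPM:-rpm}"
  # rpm -qa prints one package per line; keep only names starting with mariadb
  "$rpm_cmd" -qa | grep -i '^mariadb' | while read -r pkg; do
    echo "removing $pkg"
    "$rpm_cmd" -e --nodeps "$pkg" </dev/null
  done
}

# Usage (as root):
#   remove_mariadb
```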

2. Create a directory to hold the MySQL installation packages:

mkdir -p /export/software/mysql

3. Upload the mysql-5.7.29 bundle to that directory and extract it:

cd /export/software/mysql
tar xvf mysql-5.7.29-1.el7.x86_64.rpm-bundle.tar

4. Run the installation:

yum -y install libaio
yum -y install net-tools
rpm -ivh mysql-community-common-5.7.29-1.el7.x86_64.rpm mysql-community-libs-5.7.29-1.el7.x86_64.rpm mysql-community-client-5.7.29-1.el7.x86_64.rpm mysql-community-server-5.7.29-1.el7.x86_64.rpm

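5. Initialize MySQL:

mysqld --initialize

6. Change the owner and group of the data directory to mysql:

chown -R mysql:mysql /var/lib/mysql

7. Start MySQL:

systemctl start mysqld.service

8. View the generated temporary root password:

cat /var/log/mysqld.log

9. The randomly generated temporary password is at the end of this log line:

[Note] A temporary password is generated for root@localhost: /JOFe7,c&jj0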
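
Reading the whole log to find that one line is easy to get wrong on a busy server. As a small sketch, the password can be pulled out directly, assuming it contains no spaces so that it is the last whitespace-separated field of the line. The function name is made up for this example.

```shell
# Sketch: print the temporary root password from the MySQL log.
# Assumes the password is the last whitespace-separated field of the line.
temp_root_password() {
  log_file="${1:-/var/log/mysqld.log}"
  # If the log contains several such lines, use the most recent one
  grep 'temporary password' "$log_file" | tail -n 1 | awk '{print $NF}'
}

# Usage:
#   temp_root_password
#   temp_root_password /path/to/another/mysqld.log
```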

10. Log in to change the MySQL root password and grant remote access:

mysql -u root -p
Enter password:     # enter the temporary password from the log here

11. Change the root password to hadoop:

mysql> alter user user() identified by "hadoop";
Query OK, 0 rows affected (0.00 sec)

12. Grant privileges so that root can connect from any host:

mysql> use mysql;
mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'hadoop' WITH GRANT OPTION;
mysql> FLUSH PRIVILEGES;

13. Stop MySQL, check its status, and start it again:

systemctl stop mysqld
systemctl status mysqld
systemctl start mysqld

14. Enabling the service at boot is recommended:

systemctl enable mysqld

15. Check that the service is enabled at boot:

systemctl list-unit-files | grep mysqld

Chapter 8: Installing and Configuring Hive

1. Extract the Hive archive:

# Extract the archive
tar zxvf apache-hive-3.1.2-bin.tar.gz -C /export/server
# Rename the directory
mv /export/server/apache-hive-3.1.2-bin /export/server/hive

2. Resolve the guava version mismatch between Hadoop and Hive:

cd /export/server/hive
rm -rf lib/guava-19.0.jar
cp /export/server/hadoop-3.1.4/share/hadoop/common/lib/guava-27.0-jre.jar ./lib/
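
The jar names above are specific to Hive 3.1.2 and Hadoop 3.1.4. As a rough sketch, the same swap can be written so that it finds the guava jars by pattern and refuses to delete Hive's copy when Hadoop has none to offer. The function name is an assumption for this example only.

```shell
# Sketch: replace Hive's bundled guava jar with the one Hadoop ships.
# Jar names are discovered with a glob instead of being hardcoded.
sync_guava() {
  hive_lib="$1"      # e.g. /export/server/hive/lib
  hadoop_lib="$2"    # e.g. /export/server/hadoop-3.1.4/share/hadoop/common/lib
  # Load the matching Hadoop jars into the positional parameters
  set -- "$hadoop_lib"/guava-*.jar
  # If the glob matched nothing, $1 is the literal pattern and does not exist
  [ -e "$1" ] || { echo "no guava jar found in $hadoop_lib" >&2; return 1; }
  rm -f "$hive_lib"/guava-*.jar
  cp "$@" "$hive_lib"/
}

# Usage:
#   sync_guava /export/server/hive/lib /export/server/hadoop-3.1.4/share/hadoop/common/lib
```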

3. Add the MySQL JDBC driver to the lib/ directory of the Hive installation:

mysql-connector-java-5.1.32.jar

Get the MySQL JDBC driver jar, mysql-connector-java-5.1.32.jar; extraction code: dsj8

4. Edit the Hive environment file and add HADOOP_HOME:

cd /export/server/hive/conf/
mv hive-env.sh.template hive-env.sh
vim hive-env.sh
# Add the following lines to hive-env.sh
export HADOOP_HOME=/export/server/hadoop-3.1.4
export HIVE_CONF_DIR=/export/server/hive/conf
export HIVE_AUX_JARS_PATH=/export/server/hive/lib
5. Create hive-site.xml and configure MySQL and the related settings:

vim hive-site.xml, then add the following:
<configuration>
    <!-- MySQL settings for metadata storage -->
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://node1:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false&amp;useUnicode=true&amp;characterEncoding=UTF-8</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hadoop</value>
    </property>
    <!-- Host that HiveServer2 (HS2) binds to -->
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>node1</value>
    </property>
    <!-- Address of the metastore service for remote-mode deployment -->
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://node1:9083</value>
    </property>
    <!-- Disable authorization for the metastore event DB notification API -->
    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
    <!-- Disable metastore schema version verification -->
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
</configuration>

6. Initialize the metastore schema:

cd /export/server/hive
bin/schematool -initSchema -dbType mysql -verbose

7. Install and configure Hive on node3:

1. On node3, extract the Hive archive:
cd /export/software/
tar zxvf apache-hive-3.1.2-bin.tar.gz -C /export/server
mv /export/server/apache-hive-3.1.2-bin /export/server/hive
2. Resolve the guava version mismatch between Hadoop and Hive:
rm -rf /export/server/hive/lib/guava-19.0.jar
cp /export/server/hadoop-3.1.4/share/hadoop/common/lib/guava-27.0-jre.jar /export/server/hive/lib/

3. Add the MySQL JDBC driver to the lib/ directory of the Hive installation:
mysql-connector-java-5.1.32.jar

4. Edit the Hive environment file and add HADOOP_HOME:
cd /export/server/hive/conf/
mv hive-env.sh.template hive-env.sh
vim hive-env.sh
# Add the following lines to hive-env.sh
export HADOOP_HOME=/export/server/hadoop-3.1.4
export HIVE_CONF_DIR=/export/server/hive/conf
export HIVE_AUX_JARS_PATH=/export/server/hive/lib
5. Create hive-site.xml; on node3 it only needs the address of the metastore service:
vim hive-site.xml, then add the following:
<configuration>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://node1:9083</value>
    </property>
</configuration>

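8. Create the logs directory and start the Hive services:

1. Create a logs folder under /export (mkdir logs), then run the following commands from inside it:
nohup /export/server/hive/bin/hive --service metastore > ./metastore.log 2>&1 &
nohup /export/server/hive/bin/hive --service hiveserver2 > ./hiveserver2.log 2>&1 &
2. With metastore and hiveserver2 running in the background, beeline can connect without errors.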
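
Because both services are started by hand with nohup, it can be handy to record their process IDs so they are easy to stop later. The following is only a sketch of that idea: the helper name, the LOG_DIR variable and the .pid files are assumptions that do not appear in the original steps.

```shell
# Sketch: start a command in the background with nohup, writing its output
# to <LOG_DIR>/<name>.log and its PID to <LOG_DIR>/<name>.pid.
# LOG_DIR defaults to /export/logs, the folder created in the step above.
LOG_DIR="${LOG_DIR:-/export/logs}"

start_bg() {
  name="$1"; shift
  mkdir -p "$LOG_DIR"
  nohup "$@" > "$LOG_DIR/$name.log" 2>&1 &
  # $! is the PID of the background job that was just started
  echo $! > "$LOG_DIR/$name.pid"
}

# Usage:
#   start_bg metastore   /export/server/hive/bin/hive --service metastore
#   start_bg hiveserver2 /export/server/hive/bin/hive --service hiveserver2
#   kill "$(cat /export/logs/hiveserver2.pid)"   # stop a service later
```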

9. Fix the beeline connection error:

1. beeline fails to connect with: root is not allowed to impersonate root (state=08S01,code=0)
Edit the Hadoop configuration file /export/server/hadoop-3.1.4/etc/hadoop/core-site.xml and add the following properties:
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
2. Distribute the updated core-site.xml to node2 and node3:
scp /export/server/hadoop-3.1.4/etc/hadoop/core-site.xml root@node2:/export/server/hadoop-3.1.4/etc/hadoop/
scp /export/server/hadoop-3.1.4/etc/hadoop/core-site.xml root@node3:/export/server/hadoop-3.1.4/etc/hadoop/
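
The two copy commands differ only in the node name, so the distribution step can be sketched as a loop. The function name and the DRY_RUN switch are assumptions for illustration; with DRY_RUN=1 the commands are printed rather than executed.

```shell
# Sketch: copy one file to the same path on several nodes.
# Set DRY_RUN=1 to print the scp commands instead of running them.
distribute() {
  file="$1"; shift
  for node in "$@"; do
    # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is non-empty
    ${DRY_RUN:+echo} scp "$file" "root@${node}:${file}"
  done
}

# Usage:
#   distribute /export/server/hadoop-3.1.4/etc/hadoop/core-site.xml node2 node3
```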

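10. Start the cluster and Hive:

start-all.sh
Change to /export/server/hive/bin and run ./beeline; at the beeline prompt, enter
!connect jdbc:hive2://node1:10000
then enter root as the username and press Enter at the remaining prompts.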
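
For quick checks it may be easier to skip the interactive prompt. Beeline accepts the connection URL, user name and a statement on the command line (the -u, -n and -e options), so a thin wrapper could look like the sketch below. The function name and the BEELINE, HS2_URL and HS2_USER variables are assumptions; their defaults mirror the values used in this article.

```shell
# Sketch: run one HiveQL statement through beeline without the interactive prompt.
# BEELINE, HS2_URL and HS2_USER are assumed variable names with defaults
# taken from this setup.
hive_query() {
  "${BEELINE:-/export/server/hive/bin/beeline}" \
    -u "${HS2_URL:-jdbc:hive2://node1:10000}" \
    -n "${HS2_USER:-root}" \
    -e "$1"
}

# Usage:
#   hive_query 'show databases;'
```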

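11. Hive database and table commands:

1. List databases: show databases;
2. Create a database: create database if not exists myhive;
3. Switch to the database: use myhive;
4. List the tables in the database: show tables;
5. Databases are stored on HDFS under /user/hive/warehouse
6. Drop a database: drop database myhive; this fails if the database is not empty
7. Force-drop a database together with all the tables in it:
drop database myhive2 cascade;
8. Show a table's structure: desc stu1;
9. Show a table's contents: select * from stu1;
10. Insert rows into a table:
insert into stu values(1,'zhangsan');
insert into stu values(2,'lisi');
create table if not exists stu4(id int ,name string) row format delimited fields terminated by '\t' ;
11. Download the stu4 file on Windows and upload it into the data directory with rz -E
12. Create a directory on HDFS: hadoop fs -mkdir -p /mytest
13. From the data directory, upload the file straight to the table's path on HDFS:
hadoop fs -put stu4.txt /user/hive/warehouse/mytest.db/stu4/
14. Load data: load data inpath '/hivedatas/stu.txt' into table stu4;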

15. Create the student table:
create external table student (sid string,sname string,sbirth string , ssex string) row format delimited fields terminated by '\t' location '/hive_table/student';
Load data into the student table:
load data local inpath '/export/data/student.txt' into table student;

16. Create the teacher table:
create external table teacher (tid string,tname string) row format delimited fields terminated by '\t' location '/hive_table/teacher';
Load data into the teacher table, overwriting any existing data:
load data local inpath '/export/data/teacher.txt' overwrite into table teacher;

17. Create the score table, partitioned by month:
create table score(sid string,cid string, sscore int) partitioned by (month string) row format delimited fields terminated by '\t';
Load data:
load data local inpath '/export/data/score.txt' into table score partition (month='202006');

18. Create a second score table, partitioned by year, month, and day:
create table score2(sid string,cid string, sscore int) partitioned by (year string,month string, day string)
row format delimited fields terminated by '\t';
Load data:
load data local inpath '/export/data/score.txt' into table score2 partition(year='2020',month='06',day='01');

19. Query and partition commands:
select * from score2 where year = '2020' and month = '06' and day = '01';
show partitions score;
alter table score add partition(month='202008');
alter table score add partition(month='202009') partition(month = '202010');
alter table score drop partition(month = '202010');

20. Create the hive_array table:
create external table hive_array(name string, work_locations array<string>) row format delimited fields terminated by '\t'
collection items terminated by ',';
Load data:
load data local inpath '/export/data/array_data.txt' overwrite into table hive_array;
-- Get the first element of the work_locations array
select name, work_locations[0] location from hive_array;
-- Get the number of elements in the work_locations array
select name, size(work_locations) location from hive_array;
-- Find the rows whose work_locations array contains tianjin
select * from hive_array where array_contains(work_locations,'tianjin');
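
The load statements above only work if the input files use the same delimiters as the table definitions: a tab between fields and, for hive_array, a comma between array items. As an illustration, the sketch below writes two small files in that layout. The rows are made-up sample data, and the function name is an assumption.

```shell
# Sketch: write small sample input files whose delimiters match the table
# definitions (tab between fields, comma between array items).
# The rows are invented illustration data, not the article's real files.
make_sample_data() {
  dir="${1:-/export/data}"
  mkdir -p "$dir"
  # stu4(id int, name string), fields terminated by '\t'
  printf '1\tzhangsan\n2\tlisi\n' > "$dir/stu4.txt"
  # hive_array(name string, work_locations array<string>)
  printf 'zhangsan\tbeijing,shanghai,tianjin\nlisi\tchangchun,chengdu\n' > "$dir/array_data.txt"
}

# Usage:
#   make_sample_data /export/data
```

With these rows loaded, the array_contains query for 'tianjin' would match only the zhangsan row.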

PS: This is Part 2 of the project. The remaining parts can be found on my profile page. Apologies for any shortcomings!
