Hadoop Fully Distributed Cluster Setup
I. Preparation: Create a Base Virtual Machine
1. Basic Configuration
Configure the network with a static IP:
vi /etc/sysconfig/network-scripts/ifcfg-ens33
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=static
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=ens33
UUID=e03b9c84-1624-432a-935a-c8c67420d34b
DEVICE=ens33
ONBOOT=yes
IPADDR=192.168.1.10
NETMASK=255.255.255.0
GATEWAY=192.168.1.2
DNS1=144.144.144.144
DNS2=8.8.8.8
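As a quick sanity check, the fields that matter for a static address can be verified with a short script. This is only a sketch: it writes a copy of the settings to a temp file so it can run anywhere, while the real file lives at /etc/sysconfig/network-scripts/ifcfg-ens33.

```shell
# Write the key fields to a temp file and verify them (sketch; the
# values mirror the ifcfg-ens33 file above).
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.1.10
NETMASK=255.255.255.0
GATEWAY=192.168.1.2
EOF
# A static setup must not use DHCP and must come up on boot.
grep -q '^BOOTPROTO=static' "$cfg"
grep -q '^ONBOOT=yes' "$cfg"
# The gateway should sit in the same /24 as the address.
ip_net=$(grep '^IPADDR=' "$cfg" | cut -d= -f2 | cut -d. -f1-3)
gw_net=$(grep '^GATEWAY=' "$cfg" | cut -d= -f2 | cut -d. -f1-3)
[ "$ip_net" = "$gw_net" ] && echo "static config looks consistent"
```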
Restart the network service so the changes take effect:
systemctl restart network
Stop the firewall (run systemctl disable firewalld as well if you want it to stay off across reboots):
systemctl stop firewalld
Check the firewall status:
systemctl status firewalld
Disable SELinux:
vim /etc/selinux/config
In the file, change SELINUX=enforcing to SELINUX=disabled (takes effect after a reboot).
Create the hadoop user.
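A sketch of the user creation. The password here is a placeholder, and `passwd --stdin` is specific to CentOS/RHEL; both commands must run as root.

```shell
# Create the hadoop user and set a password (run as root).
# "hadoop123" is a placeholder; choose your own password.
useradd hadoop
echo 'hadoop123' | passwd --stdin hadoop
```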
2. Clone Three Virtual Machines (2_master, 2_slave1, 2_slave2)
Shut down the base machine, then clone it three times.
Assign each clone its own static IP.
Set the hostnames:
hostnamectl set-hostname master
hostnamectl set-hostname slave1
hostnamectl set-hostname slave2
Add hostname mappings:
vim /etc/hosts
Add a line for each node mapping its static IP to its hostname; master is 192.168.1.10, and slave1/slave2 use the addresses you assigned when reconfiguring the clones.
Set up passwordless login among the three hosts (the commands below run on master; repeat them on slave1 and slave2 so every host can reach every other):
[root@master ~]# ssh-keygen -t rsa -P ''
[root@master ~]# ssh-copy-id -i .ssh/id_rsa.pub root@master
[root@master ~]# ssh-copy-id -i .ssh/id_rsa.pub root@slave1
[root@master ~]# ssh-copy-id -i .ssh/id_rsa.pub root@slave2
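The three `ssh-copy-id` calls can be collapsed into a loop; the same sequence is then repeated on slave1 and slave2. This is a sketch that assumes the hostnames already resolve via /etc/hosts; the `-f` flag makes `ssh-keygen` noninteractive, unlike the prompt-driven form above.

```shell
# Generate a key once, then push it to all three hosts.
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
for host in master slave1 slave2; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub "root@${host}"
done
```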
Verify the keys:
After copying, each host's ~/.ssh/authorized_keys contains the public keys of all three hosts.
II. Configure Hadoop
1. Install Hadoop
Extract the Hadoop and JDK archives:
[root@master ~]# tar zxvf hadoop-2.7.1.tar.gz -C /usr/local/src
[root@master ~]# tar zxvf jdk-8u231-linux-x64.tar.gz -C /usr/local/src
Rename the extracted directories:
[root@master ~]# mv /usr/local/src/hadoop-2.7.1/ /usr/local/src/hadoop
[root@master ~]# mv /usr/local/src/jdk1.8.0_231/ /usr/local/src/jdk1.8
Configure the Hadoop and JDK environment variables:
[root@master ~]# vim /etc/profile
Append the following to the end of the file (paths match the directories created above):
export JAVA_HOME=/usr/local/src/jdk1.8
export HADOOP_HOME=/usr/local/src/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Apply the environment variables:
[root@master hadoop]# source /etc/profile
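The profile additions can be exercised in isolation to confirm the variables expand correctly. In this sketch they are written to a temp file just for verification; on the real system the same lines go at the end of /etc/profile.

```shell
# Source the export lines from a temp file and confirm they resolve.
profile=$(mktemp)
cat > "$profile" <<'EOF'
export JAVA_HOME=/usr/local/src/jdk1.8
export HADOOP_HOME=/usr/local/src/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
EOF
source "$profile"
echo "JAVA_HOME=$JAVA_HOME"
echo "HADOOP_HOME=$HADOOP_HOME"
```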
2. Edit the Configuration Files
Edit hadoop-env.sh to tell Hadoop where the JDK is installed:
[root@master hadoop]# vim etc/hadoop/hadoop-env.sh
Comment out the existing JAVA_HOME line (line 25) and set the path explicitly:
export JAVA_HOME=/usr/local/src/jdk1.8
Change to /usr/local/src/hadoop/etc/hadoop and edit the configuration files there:
[root@master hadoop]# vim hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/src/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/src/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
[root@master hadoop]# vim core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.1.10:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/usr/local/src/hadoop/tmp</value>
  </property>
</configuration>
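Hand-edited site files are easy to break; a quick structural check catches most slips. This sketch runs against a temp copy so it works anywhere; point `f` at the real files under /usr/local/src/hadoop/etc/hadoop to check them in place.

```shell
# Sanity-check a site file: open/close tags must balance and the
# property we set must actually be present.
f=$(mktemp)
cat > "$f" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.1.10:9000</value>
  </property>
</configuration>
EOF
opens=$(grep -c '<property>' "$f")
closes=$(grep -c '</property>' "$f")
[ "$opens" -eq "$closes" ] && echo "property tags balanced"
grep -q '<name>fs.defaultFS</name>' "$f" && echo "fs.defaultFS present"
```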
The directory /usr/local/src/hadoop/etc/hadoop contains a template file named mapred-site.xml.template. Copy it to mapred-site.xml, then edit the copy:
[root@master hadoop]# cd /usr/local/src/hadoop/etc/hadoop/
[root@master hadoop]# cp mapred-site.xml.template mapred-site.xml
[root@master hadoop]# vim mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>
[root@master hadoop]# vim yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
3. Other Hadoop Configuration Files
Edit the masters file:
[root@master hadoop]# vim masters
The file should contain the master node's hostname (master); this determines where the SecondaryNameNode runs.
Edit the slaves file:
[root@master hadoop]# vim slaves
List the DataNode hosts, one per line: slave1 and slave2.
Create the directories referenced in hdfs-site.xml and core-site.xml:
[root@master hadoop]# mkdir /usr/local/src/hadoop/tmp
[root@master hadoop]# mkdir /usr/local/src/hadoop/dfs/name -p
[root@master hadoop]# mkdir /usr/local/src/hadoop/dfs/data -p
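The three `mkdir` calls can be expressed as one loop with a verification step. This sketch uses a temp directory as the base so it is safe to run anywhere; on the cluster the base is /usr/local/src/hadoop.

```shell
# Create tmp, dfs/name, and dfs/data under one base directory.
# On the cluster, base=/usr/local/src/hadoop; a temp dir is used here.
base=$(mktemp -d)
for d in tmp dfs/name dfs/data; do
    mkdir -p "$base/$d"
done
ls -R "$base"
```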
Synchronize the files to the slave nodes
First, set up passwordless SSH between the hadoop users on the three hosts:
[root@master hadoop]# su hadoop
[hadoop@master hadoop]$ cd
[hadoop@master ~]$ ssh-keygen -t rsa -P ''
[hadoop@master ~]$ ssh-copy-id -i .ssh/id_rsa.pub hadoop@master
[hadoop@master ~]$ ssh-copy-id -i .ssh/id_rsa.pub hadoop@slave1
[hadoop@master ~]$ ssh-copy-id -i .ssh/id_rsa.pub hadoop@slave2
Log out of the hadoop user:
[hadoop@master ~]$ exit
(1) Copy the Hadoop installation from master to slave1 and slave2:
[root@master ~]# scp -r /usr/local/src/ root@slave1:/usr/local/
[root@master ~]# scp -r /usr/local/src/ root@slave2:/usr/local/
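The two copies can be written as a loop, which also scales if more slaves are added later. A sketch; it assumes root passwordless SSH to the slaves is already in place, and it copies the whole src tree (both hadoop and jdk1.8) in one pass.

```shell
# Push the whole /usr/local/src tree (hadoop + jdk) to each slave.
for host in slave1 slave2; do
    scp -r /usr/local/src/ "root@${host}:/usr/local/"
done
```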
(2) Configure the Hadoop and JDK environment variables on each slave node:
[root@slave1 ~]# vim /etc/profile
Append the same export lines used on master (JAVA_HOME, HADOOP_HOME, and PATH) to /etc/profile on each slave.
(3) Apply the environment variables on each slave node:
[root@slave1 ~]# source /etc/profile
(4) On master and on every slave node, change the ownership of /usr/local/src/hadoop:
[root@slave1 ~]# chown -R hadoop:hadoop /usr/local/src/hadoop/
(5) On each slave node, switch to the hadoop user:
[root@slave1 ~]# su - hadoop
III. Running the Hadoop Cluster
Format the NameNode (first run only; reformatting an existing cluster destroys its HDFS metadata):
[hadoop@master hadoop]$ cd /usr/local/src/hadoop/
[hadoop@master hadoop]$ bin/hdfs namenode -format
A line reading "Exiting with status 0" near the end of the output indicates the format succeeded.
Start the HDFS daemons (NameNode, DataNode, SecondaryNameNode) and the YARN daemons (ResourceManager, NodeManager):
[hadoop@master hadoop]$ start-all.sh
Run jps on each node to confirm the daemons are running.
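After `start-all.sh`, `jps` on each node should show roughly the following daemons. This sketch only encodes the expectation (SecondaryNameNode placement follows the masters file above); the commented line shows how to check it against live jps output.

```shell
# Expected daemons per node after start-all.sh (HDFS + YARN).
expected_daemons() {
    case "$1" in
        master)        echo "NameNode SecondaryNameNode ResourceManager" ;;
        slave1|slave2) echo "DataNode NodeManager" ;;
    esac
}
# On a live node, each expected daemon should appear in jps output:
# for d in $(expected_daemons master); do jps | grep -q "$d"; done
echo "master: $(expected_daemons master)"
echo "slave1: $(expected_daemons slave1)"
echo "slave2: $(expected_daemons slave2)"
```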