After setting up Hadoop in pseudo-distributed mode, convert it to fully distributed mode, or build the fully distributed cluster directly.
1. Clone the virtual machines
Shut down hadoop1, then in the VMware library right-click hadoop1 and choose Manage -> Clone (three virtual machines are needed in total).
Power on all three virtual machines and change their hostnames to hadoop1, hadoop2 and hadoop3 respectively.
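On CentOS 7 or any other systemd-based distribution, the hostname can be changed with hostnamectl (a sketch; run it on each VM with that VM's own name, shown here for the second clone):
hostnamectl set-hostname hadoop2
Log out and back in (or reboot) so the new name shows up in the shell prompt.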
2. Configuration files
2.1.core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop1:9000</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-3.1.3/tmp</value>
<description>A base for other temporary directories.</description>
</property>
2.2.hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>3</value>
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop2:50090</value>
<description>
The secondary namenode http server address and port.
</description>
</property>
2.3.yarn-site.xml
<property>
<description>A comma separated list of services where service name should only
contain a-zA-Z0-9_ and can not start with numbers</description>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<description>The hostname of the RM.</description>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop1</value>
</property>
2.4.mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>The runtime framework for executing MapReduce jobs.
Can be one of local, classic or yarn.
</description>
</property>
2.5.hadoop-env.sh (path to the Java installation)
export JAVA_HOME=/opt/module/jdk1.8.0_212
2.6.workers (configures which nodes run DataNodes)
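For a three-node cluster like this one, the workers file (/opt/module/hadoop-3.1.3/etc/hadoop/workers) would typically list all three hosts, one per line:
hadoop1
hadoop2
hadoop3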
3. Distribute the files
3.1. Because the Hadoop configuration files on hadoop1 have to be synced to the other two machines, the three nodes need passwordless SSH between them (configure SSH passwordless login on hadoop1, hadoop2 and hadoop3; each machine copies its key to all three, three times per machine, nine times in total).
vim /etc/hosts
On each of the three VMs, add the IP address and hostname of all three machines (use ifconfig if you don't know an IP), then save and quit with :wq.
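For example, assuming the three VMs received the addresses 192.168.10.101-103 (your addresses will differ), the entries would look like this:
192.168.10.101 hadoop1
192.168.10.102 hadoop2
192.168.10.103 hadoop3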
Set up passwordless login (starting with the machine itself):
ssh-keygen -t rsa (press Enter three times)
ssh-copy-id <hostname>
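Concretely, every machine copies its key to all three hosts, so on each node you run (nine runs in total across the cluster):
ssh-copy-id hadoop1
ssh-copy-id hadoop2
ssh-copy-id hadoop3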
Verify that passwordless login works by running, in turn:
ssh hadoop2 (then exit to log out)
ssh hadoop3
3.2. Distribution (in short, the environment and configuration files on the three VMs must stay identical)
Because I set up pseudo-distributed mode first, I only changed hdfs-site.xml and yarn-site.xml, so those are the only two files I distribute here (adjust the path to match your own installation). The command below copies to hadoop2; for hadoop3, just change the 2 to a 3.
scp /opt/module/hadoop-3.1.3/etc/hadoop/hdfs-site.xml hadoop2:/opt/module/hadoop-3.1.3/etc/hadoop/hdfs-site.xml
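The same pattern covers yarn-site.xml:
scp /opt/module/hadoop-3.1.3/etc/hadoop/yarn-site.xml hadoop2:/opt/module/hadoop-3.1.3/etc/hadoop/yarn-site.xml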
If you are building the fully distributed cluster directly, you also need to distribute the JDK, the Hadoop installation and the configured environment variables. For example:
scp -r /opt/module/hadoop-3.1.3/ hadoop2:/opt/module/
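A sketch for the JDK and the environment variables, assuming they were configured in /etc/profile (adjust the path if yours are set elsewhere, e.g. in a file under /etc/profile.d/):
scp -r /opt/module/jdk1.8.0_212/ hadoop2:/opt/module/
scp /etc/profile hadoop2:/etc/profile
Afterwards run source /etc/profile on the target machine so the variables take effect.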
4. Format (since pseudo-distributed mode was set up first, delete the hadoop.tmp.dir directory configured in core-site.xml; do this on all three machines), then re-format on hadoop1 (hdfs namenode -format)
rm -rf /opt/module/hadoop-3.1.3/tmp/
5. Start
start-all.sh
Check the processes with jps
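Assuming workers lists all three hosts as suggested above, and given the placement configured earlier (NameNode and ResourceManager on hadoop1, SecondaryNameNode on hadoop2), jps should roughly show:
hadoop1: NameNode, DataNode, ResourceManager, NodeManager
hadoop2: SecondaryNameNode, DataNode, NodeManager
hadoop3: DataNode, NodeManager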
6. Disable the firewall (do this on all three VMs)
Check the status:
systemctl status firewalld
Stop it:
systemctl stop firewalld
Disable it permanently:
systemctl disable firewalld
That completes the Hadoop fully distributed setup.
start-all.sh : start all daemons
start-dfs.sh : start the HDFS daemons: NameNode, SecondaryNameNode, DataNode
stop-dfs.sh : stop the HDFS daemons
start-yarn.sh : start the YARN daemons: ResourceManager, NodeManager
stop-yarn.sh : stop the YARN daemons
stop-all.sh : stop all daemons