Preface:
This post is a review exercise for the fall semester of 2020: partly to consolidate what we learned that semester, and partly to show the teacher that his effort was not wasted. My sincere thanks to Mr. Xu for his careful instruction…
Continuing from task 2 of https://blog.csdn.net/m0_52080234/article/details/112602778
Overall approach
Prerequisites:
Note: if the original Hadoop pseudo-distributed cluster is running, be sure to shut it down first.
On the Mymaster node, delete the old Hadoop installation along with the directories holding its metadata:
/opt/hadoop
/opt/hadoop-repo
/tmp/hadoop-root/*
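The cleanup above can be sketched as a small loop. This is a dry run that only prints the commands; drop the leading `echo` to actually delete (the three paths are the ones listed above):

```shell
# Dry-run cleanup of the old pseudo-distributed install:
# prints the rm commands instead of running them.
for d in /opt/hadoop /opt/hadoop-repo /tmp/hadoop-root; do
  echo rm -rf "$d"
done
```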
① Prepare three virtual machines
Option 1: copy the VM files
Option 2: clone → use VMware's built-in clone feature (recommended)
② Fix the key settings on the cloned VMs:
IP address, hostname
③ Passwordless login
root on Mymaster → Mymaster, Myslave01, Myslave02
Steps:
The rough idea → configure Hadoop on the master node first, then copy it across to the other two nodes.
Cluster layout
Details:
Configure Hadoop on the Mymaster node:
① JDK and Hadoop environment variables
② Edit the core configuration files:
hdfs → core-site.xml, hdfs-site.xml
yarn → yarn-site.xml, mapred-site.xml
③ Edit the two core shell scripts for hdfs and yarn (can be skipped if JAVA_HOME is already set):
hadoop-env.sh
yarn-env.sh
④ Edit the slaves file, which decides the nodes the DataNode daemon runs on:
Mymaster
Myslave01
Myslave02
⑤ Copy the configured Hadoop from the master node to the other two VMs:
scp -r <source> <destination>
e.g.: scp -r /opt/hadoop root@slave01:/opt
⑥ Format the namenode on the Mymaster node
⑦ Start the cluster
⑧ Verify:
a) the process count on each node
b) that hdfs works
c) that yarn works
Hands-on
Delete the data left over from the pseudo-distributed setup
Delete the relevant files under the current user's hidden ~/.ssh directory
Shut down
poweroff
Here we simply clone two virtual machines
Once cloning is done, start only Myslave01 first
Since the VM is a clone, we can connect to Myslave01 in FinalShell using Mymaster's connection profile for now
Modify:
/etc/hostname
/etc/hosts
/etc/sysconfig/network
/etc/sysconfig/network-scripts/ifcfg-ens33
the Windows hosts file
These steps were covered in detail in the CentOS setup post, so I'll go through them quickly here
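For reference, a hypothetical static-IP `ifcfg-ens33` for Myslave01 might look like the fragment below. The IP matches the hosts file shown later in this post; the gateway and DNS values are placeholders for your own network, not taken from the original post:

```
# /etc/sysconfig/network-scripts/ifcfg-ens33 (sketch for Myslave01)
TYPE=Ethernet
BOOTPROTO=static
NAME=ens33
DEVICE=ens33
ONBOOT=yes
IPADDR=192.168.8.202
NETMASK=255.255.255.0
GATEWAY=192.168.8.1   # placeholder: use your VMware NAT gateway
DNS1=192.168.8.1      # placeholder
```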
↓Myslave01↓
Shut down
poweroff
Start up Myslave02
By the same logic we can connect to Myslave02 using Mymaster's profile
Modify:
/etc/hostname
/etc/hosts
/etc/sysconfig/network
/etc/sysconfig/network-scripts/ifcfg-ens33
the Windows hosts file
These steps were covered in detail in the CentOS setup post, so I'll go through them quickly here
Shut down
Edit the VM settings (adjust to your own machine; my host has 16 GB of RAM, and the following is my configuration)
Mymaster
Myslave01
Myslave02
Everything is ready, power on!!
Configure FinalShell
and verify that all three machines can be reached
and check whether the hostnames have changed
OK, here it has indeed become Myslave01
Edit the Linux hosts mapping file so that every VM in the cluster knows about the other members of the team
[root@Mymaster ~]# vim /etc/hosts
[root@Mymaster ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.8.201 Mymaster
192.168.8.202 Myslave01
192.168.8.203 Myslave02
[root@Mymaster ~]# scp -r /etc/hosts root@Myslave01:/etc/
The authenticity of host 'myslave01 (192.168.8.202)' can't be established.
ECDSA key fingerprint is SHA256:S18Xnq5jGlaByGMauuqmae8WCIN88kze704KfHa40jY.
ECDSA key fingerprint is MD5:e6:c6:37:60:2d:dd:d3:e5:bd:8d:00:cb:32:38:00:26.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'myslave01,192.168.8.202' (ECDSA) to the list of known hosts.
root@myslave01's password:
hosts 100% 229 296.7KB/s 00:00
[root@Mymaster ~]# scp -r /etc/hosts root@Myslave02:/etc/
The authenticity of host 'myslave02 (192.168.8.203)' can't be established.
ECDSA key fingerprint is SHA256:S18Xnq5jGlaByGMauuqmae8WCIN88kze704KfHa40jY.
ECDSA key fingerprint is MD5:e6:c6:37:60:2d:dd:d3:e5:bd:8d:00:cb:32:38:00:26.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'myslave02,192.168.8.203' (ECDSA) to the list of known hosts.
root@myslave02's password:
hosts 100% 229 204.0KB/s 00:00
[root@Mymaster ~]#
Done. At this point, communication between the nodes
is clearly working!!
Configure passwordless login
See my earlier post for the details of how passwordless login works
Plan → root on Mymaster → Mymaster, Myslave01, Myslave02
Setting up just this much passwordless access feels sufficient here
Configure passwordless login from root on Mymaster to Mymaster
[root@Mymaster ~]# ll ~/.ssh/
total 4
-rw-r--r-- 1 root root 524 Jan 13 20:24 known_hosts
[root@Mymaster ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:6bIMsRYtZ3lmaDTLLdcu07i/sz7OMLuL7WQnvhprYuM root@Mymaster
The key's randomart image is:
+---[RSA 2048]----+
| |
| |
| o |
| + * o |
| + @ S . |
| O B + |
| + o @ + |
| .+o.@ @o |
| oE+*oXBB= |
+----[SHA256]-----+
[root@Mymaster ~]# ssh-copy-id -i root@Mymaster
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@mymaster's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'root@Mymaster'"
and check to make sure that only the key(s) you wanted were added.
[root@Mymaster ~]# ll ~/.ssh/
total 16
-rw------- 1 root root 395 Jan 14 16:48 authorized_keys
-rw------- 1 root root 1679 Jan 14 16:47 id_rsa
-rw-r--r-- 1 root root 395 Jan 14 16:47 id_rsa.pub
-rw-r--r-- 1 root root 524 Jan 13 20:24 known_hosts
[root@Mymaster ~]# ssh Mymaster
Last login: Thu Jan 14 16:36:11 2021 from 192.168.8.1
[root@Mymaster ~]# exit
logout
Connection to mymaster closed.
[root@Mymaster ~]#
配置Mymaster节点的用户root到Myslave01的免密码登录
[root@Mymaster ~]# ssh-copy-id -i root@Myslave01
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@myslave01's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'root@Myslave01'"
and check to make sure that only the key(s) you wanted were added.
[root@Mymaster ~]# ssh Myslave01
Last login: Thu Jan 14 16:38:09 2021 from 192.168.8.1
[root@Myslave01 ~]# exit
logout
Connection to myslave01 closed.
[root@Mymaster ~]#
配置Mymaster节点的用户root到Myslave02的免密码登录
[root@Mymaster ~]# ssh-copy-id -i root@Myslave02
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@myslave02's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'root@Myslave02'"
and check to make sure that only the key(s) you wanted were added.
[root@Mymaster ~]# ssh Myslave02
Last login: Thu Jan 14 16:38:16 2021 from 192.168.8.1
[root@Myslave02 ~]# exit
logout
Connection to myslave02 closed.
[root@Mymaster ~]#
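The three `ssh-copy-id` runs above can be collapsed into one loop. This is a dry run that only prints the commands (hostnames taken from the hosts file above); drop the leading `echo` to actually push the key:

```shell
# Push root's public key to every node in the cluster (dry run).
for host in Mymaster Myslave01 Myslave02; do
  echo ssh-copy-id -i ~/.ssh/id_rsa.pub "root@${host}"
done
```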
Done!!
Now we officially begin installing the Hadoop distributed cluster
Our mantra → upload, extract, rename, source to take effect, edit the config files
Set up a single machine first
Mymaster
[root@Mymaster soft]# tar -zxvf hadoop-2.7.6.tar.gz -C ../
[root@Mymaster soft]# cd ..
[root@Mymaster opt]# mv hadoop-2.7.6/ hadoop
[root@Mymaster opt]# cat /etc/profile.d/bigdata-etc.sh
# JDK environment variables
export JAVA_HOME=/opt/jdk
export PATH=$PATH:$JAVA_HOME/bin
# Hadoop environment variables
export HADOOP_HOME=/opt/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
[root@Mymaster opt]# source /etc/profile.d/bigdata-etc.sh
[root@Mymaster opt]#
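A quick way to sanity-check that sourcing the profile script worked is to look for the Hadoop bin directory on PATH. A minimal sketch, using the same paths as the script above:

```shell
# Simulate sourcing bigdata-etc.sh and verify PATH picked up the bin dirs.
export JAVA_HOME=/opt/jdk
export HADOOP_HOME=/opt/hadoop
export PATH="$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) echo "hadoop bin is on PATH" ;;
  *) echo "hadoop bin is missing" ;;
esac
```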
.......
Configuration files related to HDFS
See my earlier post for a detailed walkthrough of these files
↓core-site.xml↓
<configuration>
<!-- hdfs address: scheme, ip, port -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://Mymaster:8020</value>
</property>
<!-- Base path for hdfs, on which other properties depend -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/hadoop/tmp</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<!-- Where the namenode daemon stores the fsimage metadata files -->
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///opt/hadoop/dfs/name</value>
</property>
<!-- Where DFS data nodes store their blocks on the local file system -->
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///opt/hadoop/dfs/data</value>
</property>
<!-- Replication factor: counting the original, three copies in total -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!--
Block size: one block is 128M. A 200M file has 2 blocks: the first takes 128M, the second holds the remaining 200M-128M.
In big data work, resources stored on hdfs are usually compressed; the compression ratio varies with the codec (e.g. 100M may shrink to roughly 100K).
The block size below is in bytes:
1KB = 1024B
1M = 1024KB
-->
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
<!-- http address (host and port) of the secondarynamenode daemon; see the daemon layout -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>Myslave01:50090</value>
</property>
</configuration>
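The `dfs.blocksize` value above is just 128 MB expressed in bytes, and the 200 MB example from the comment works out as follows:

```shell
# 128 MB in bytes, matching the dfs.blocksize value above.
blocksize=$((128 * 1024 * 1024))
echo "$blocksize"    # prints 134217728

# A 200 MB file splits into ceil(200/128) = 2 blocks (128 MB + 72 MB).
filesize=$((200 * 1024 * 1024))
echo $(( (filesize + blocksize - 1) / blocksize ))    # prints 2
```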
Configuration files related to Yarn
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Hostname of the resourcemanager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>Mymaster</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<!--
Tell mapreduce to use the yarn resource manager (Yet Another Resource Negotiator).
spark and flink can also schedule their compute resources through yarn.
-->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!--
Address of the job history server.
History server: lets you inspect the trajectory and status of past jobs.
-->
<property>
<name>mapreduce.jobhistory.address</name>
<value>Mymaster:10020</value>
</property>
<!-- http address of the job history server -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>Mymaster:19888</value>
</property>
</configuration>
slaves
[root@Mymaster ~]# vim /opt/hadoop/etc/hadoop/slaves
[root@Mymaster ~]# cat /opt/hadoop/etc/hadoop/slaves
Mymaster
Myslave01
Myslave02
[root@Mymaster ~]#
So far we have only configured the Mymaster node, so
sync the configured Hadoop from Mymaster to the other nodes
[root@Mymaster ~]# scp -r /opt/hadoop/ root@Myslave01:/opt
.....
There's quite a bit of data, so this may take tens of seconds
Then confirm on Myslave01
[root@Mymaster ~]# ssh Myslave01
Last login: Thu Jan 14 16:57:13 2021 from mymaster
[root@Myslave01 ~]# cd /opt
[root@Myslave01 opt]# ll
total 0
drwxr-xr-x 9 root root 149 Jan 14 17:23 hadoop
drwxr-xr-x 8 10 143 255 Sep 23 2016 jdk
drwxr-xr-x 2 root root 67 Jan 13 16:59 soft
[root@Myslave01 opt]# exit
logout
Connection to myslave01 closed.
[root@Mymaster ~]#
OK, now the same for Myslave02
[root@Mymaster ~]# scp -r /opt/hadoop/ root@Myslave02:/opt
.......
[root@Mymaster ~]# ssh Myslave02
Last login: Thu Jan 14 16:59:20 2021 from mymaster
[root@Myslave02 ~]# cd /opt/
[root@Myslave02 opt]# ll
total 0
drwxr-xr-x 9 root root 149 Jan 14 17:27 hadoop
drwxr-xr-x 8 10 143 255 Sep 23 2016 jdk
drwxr-xr-x 2 root root 67 Jan 13 16:59 soft
[root@Myslave02 opt]# exit
logout
Connection to myslave02 closed.
[root@Mymaster ~]#
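The two `scp` runs above can be collapsed into a single loop. This dry run only prints the commands (same source and destination as above); drop the leading `echo` to actually copy:

```shell
# Sync the configured /opt/hadoop tree to both slave nodes (dry run).
for host in Myslave01 Myslave02; do
  echo scp -r /opt/hadoop/ "root@${host}:/opt"
done
```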
Good, that takes care of the configuration files
Next comes formatting the NN (see my earlier post for details)
Since by design the NN is a process on Mymaster, the formatting naturally happens on Mymaster as well
[root@Mymaster ~]# hadoop namenode -format
......
Start the cluster
To be safe, start the daemons separately the first time
Check the hdfs monitoring page (which is essentially talking to the NameNode process)
Check the yarn monitoring page (which is essentially talking to the ResourceManager process)
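As a quick reference, here is the daemon layout each node should show in `jps` after startup, derived from the configuration above (ResourceManager on Mymaster, SecondaryNameNode on Myslave01, and a DataNode plus NodeManager on every node listed in slaves), printed as a small table:

```shell
# Expected jps daemons per node, per the layout configured in this post.
cat <<'EOF'
Mymaster  : NameNode DataNode ResourceManager NodeManager
Myslave01 : SecondaryNameNode DataNode NodeManager
Myslave02 : DataNode NodeManager
EOF
```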
Verification (same as before: upload → view → download → run a yarn job)
[root@Mymaster input]# yarn jar /opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar wordcount hdfs://Mymaster:8020/input hdfs://Mymaster:8020/output
.........
[root@Mymaster input]# hdfs dfs -cat /output/*
aaa 1
ddasf 1
dfsa 1
dfsaa 1
dfsf 1
sdfs 1
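What the wordcount job computed above can be sketched locally with coreutils, which makes the result easy to reason about (the three-word input below is a hypothetical sample, not the post's actual input file):

```shell
# Local word count: one word per line, counted and printed as "word<TAB>count".
printf '%s\n' aaa sdfs dfsa | sort | uniq -c | awk '{print $2 "\t" $1}'
```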
At this point our machines check out fine!!!
Finally!!!! Shut down the cluster before shutting down the machines!!!!
And that wraps up this review session!!!
Written on 2021-01-14