Environment preparation
As the company's business grows, the volume of data keeps increasing, and the capacity of the existing DataNodes can no longer meet the storage demand, so new DataNodes need to be added dynamically to the existing cluster.
(1) Clone a virtual machine (clone cslave1 as cslave2).
(2) Change the IP address and hostname (hostname: cslave2; IP: 192.168.1.104). Files that need to be modified:
vi /etc/udev/rules.d/70-persistent-net.rules
# Comment out the eth0 line, change eth1 to eth0, and copy its MAC address (the ATTR{address} value)
vi /etc/sysconfig/network-scripts/ifcfg-eth0
# Change IPADDR, and change HWADDR to the MAC address copied in the previous step
vi /etc/sysconfig/network
# Change HOSTNAME to cslave2
vi /etc/hosts
# Add the IP-to-hostname mappings
192.168.1.101 cmaster0
192.168.1.102 cslave0
192.168.1.103 cslave1
192.168.1.104 cslave2
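The mapping can be sanity-checked with a short script. This is a self-contained sketch: the file below is a temporary stand-in for /etc/hosts, and the node list is written inline.

```shell
#!/bin/sh
# Sketch: verify that every cluster node has the expected IP mapping in a
# hosts file. The temp file stands in for /etc/hosts so the sketch runs
# anywhere; on a real node point HOSTS_FILE at /etc/hosts instead.
HOSTS_FILE=$(mktemp)
cat > "$HOSTS_FILE" <<'EOF'
192.168.1.101 cmaster0
192.168.1.102 cslave0
192.168.1.103 cslave1
192.168.1.104 cslave2
EOF

missing=0
for entry in "192.168.1.101 cmaster0" "192.168.1.102 cslave0" \
             "192.168.1.103 cslave1" "192.168.1.104 cslave2"; do
  # -x: match the whole line, -F: treat the entry as a fixed string
  if ! grep -qxF "$entry" "$HOSTS_FILE"; then
    echo "missing mapping: $entry"
    missing=1
  fi
done
[ "$missing" -eq 0 ] && echo "all mappings present"
rm -f "$HOSTS_FILE"
```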
(3) On cslave2, delete the files left over from the old HDFS filesystem (cslave2 was cloned from cslave1, so the whole /opt/module directory already exists)
[hadoop@cslave2 tmp]$ pwd
/opt/module/hadoop-2.7.2/data/tmp
[hadoop@cslave2 tmp]$ rm -rf *
Configure passwordless SSH
[hadoop@cslave2 ~]$ cd
[hadoop@cslave2 ~]$ rm -rf .ssh/
[hadoop@cslave2 ~]$ ssh-keygen -t rsa
[hadoop@cslave2 ~]$ cd ~/.ssh
[hadoop@cslave2 .ssh]$ cp id_rsa.pub authorized_keys
## Distribute the SSH public keys; this must be done on every node in the cluster.
## Append the contents of each node's authorized_keys file to the same file on every other node; after that, all nodes can ssh to one another without a password.
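The merge the comments describe amounts to: concatenate every node's id_rsa.pub into one authorized_keys file and install that file (mode 600) on every node. A self-contained sketch using temporary directories to stand in for the four nodes' home directories — on a real cluster the files would be moved with scp, or you would simply run ssh-copy-id from each node:

```shell
#!/bin/sh
# Sketch of the authorized_keys merge. The directories below simulate the
# four nodes' ~/.ssh directories; the key contents are fakes standing in
# for real ssh-keygen output.
WORK=$(mktemp -d)
for node in cmaster0 cslave0 cslave1 cslave2; do
  mkdir -p "$WORK/$node/.ssh"
  # stand-in public key; real files come from ssh-keygen -t rsa
  echo "ssh-rsa AAAA...fake hadoop@$node" > "$WORK/$node/.ssh/id_rsa.pub"
done

# merge all public keys into one authorized_keys ...
cat "$WORK"/*/.ssh/id_rsa.pub > "$WORK/authorized_keys"
# ... and install it on every (simulated) node with the right permissions
for node in cmaster0 cslave0 cslave1 cslave2; do
  cp "$WORK/authorized_keys" "$WORK/$node/.ssh/authorized_keys"
  chmod 600 "$WORK/$node/.ssh/authorized_keys"
done

KEY_COUNT=$(grep -c 'ssh-rsa' "$WORK/cslave2/.ssh/authorized_keys")
echo "cslave2 now trusts $KEY_COUNT keys"
rm -rf "$WORK"
```

The chmod 600 matters: sshd refuses an authorized_keys file that is writable by other users.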
Test SSH (on every node)
# ssh cmaster0 date
# ssh cslave0 date
# ssh cslave1 date
# ssh cslave2 date
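The four tests above can be wrapped in a small loop. This is a sketch with the node list written inline; BatchMode=yes makes ssh fail instead of prompting for a password when key-based login is not yet working.

```shell
#!/bin/sh
# Sketch: check passwordless ssh to every node in the cluster.
# BatchMode=yes disables interactive prompts, so a node without the key
# installed shows up as a failure instead of hanging on a password prompt.
NODES="cmaster0 cslave0 cslave1 cslave2"
ok=0
failed=0
for node in $NODES; do
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" date 2>/dev/null; then
    ok=$((ok + 1))
  else
    echo "passwordless ssh to $node FAILED"
    failed=$((failed + 1))
  fi
done
echo "$ok succeeded, $failed failed"
```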
Cluster deployment plan
        cmaster0      cslave0          cslave1            cslave2
HDFS    DataNode      DataNode         DataNode           DataNode
HDFS    NameNode      /                SecondaryNameNode  /
YARN    NodeManager   NodeManager      NodeManager        NodeManager
YARN    /             ResourceManager  /                  /
Commissioning a new DataNode
Current DataNode report (NameNode web UI):
Node                                 Last contact  Admin State  Capacity  Used    Non DFS Used  Remaining  Blocks  Block pool used  Failed Volumes  Version
cmaster0:50010 (192.168.1.101:50010) 2             In Service   27.01 GB  256 KB  5.72 GB       21.29 GB   9       256 KB (0%)      0               2.7.2
cslave0:50010 (192.168.1.102:50010)  2             In Service   27.01 GB  256 KB  5.38 GB       21.63 GB   9       256 KB (0%)      0               2.7.2
cslave1:50010 (192.168.1.103:50010)  2             In Service   27.01 GB  256 KB  5.38 GB       21.63 GB   9       256 KB (0%)      0               2.7.2
(1) In the /opt/module/hadoop-2.7.2/etc/hadoop directory on the NameNode, create a dfs.hosts file listing all nodes that are in service
[hadoop@cmaster0 hadoop]$ pwd
/opt/module/hadoop-2.7.2/etc/hadoop
[hadoop@cmaster0 hadoop]$ vi dfs.hosts
cmaster0
cslave0
cslave1
cslave2
(2) Add the dfs.hosts property to the hdfs-site.xml configuration file on the NameNode
<property>
<name>dfs.hosts</name>
<value>/opt/module/hadoop-2.7.2/etc/hadoop/dfs.hosts</value>
</property>
(3) Refresh the NameNode
[hadoop@cmaster0 hadoop]$ hdfs dfsadmin -refreshNodes
Refresh nodes successful
After the refresh, cslave2 appears in the DataNode report but is still Dead, because its DataNode daemon has not been started yet:
Node                                 Last contact                       Admin State  Capacity  Used    Non DFS Used  Remaining  Blocks  Block pool used  Failed Volumes  Version
cmaster0:50010 (192.168.1.101:50010) 2                                  In Service   27.01 GB  256 KB  5.72 GB       21.29 GB   9       256 KB (0%)      0               2.7.2
cslave0:50010 (192.168.1.102:50010)  2                                  In Service   27.01 GB  256 KB  5.38 GB       21.63 GB   9       256 KB (0%)      0               2.7.2
cslave1:50010 (192.168.1.103:50010)  2                                  In Service   27.01 GB  256 KB  5.38 GB       21.63 GB   9       256 KB (0%)      0               2.7.2
cslave2:50010 (192.168.1.104:50010)  Wed Dec 26 19:32:45 UTC+0800 2018  Dead         -         -       -             -          -       -                -               -
(4) Refresh the ResourceManager
[hadoop@cmaster0 hadoop]$ yarn rmadmin -refreshNodes
18/12/27 04:47:16 INFO client.RMProxy: Connecting to ResourceManager at cslave0/192.168.1.102:8033
(5) Add the new hostname to the slaves file on the NameNode (this file does not need to be distributed). The slaves file is read by the cluster start scripts, so the new DataNode cslave2 will only be picked up by them the next time the cluster is started.
[hadoop@cmaster0 hadoop]$ vi slaves
cmaster0
cslave0
cslave1
cslave2
(6) Start the DataNode and NodeManager daemons directly on the new node cslave2
[hadoop@cslave2 hadoop]$ hadoop-daemon.sh start datanode
starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-hadoop-datanode-cslave2.out
[hadoop@cslave2 hadoop]$ yarn-daemon.sh start nodemanager
starting nodemanager, logging to /opt/module/hadoop-2.7.2/logs/yarn-hadoop-nodemanager-cslave2.out
[hadoop@cslave2 hadoop]$ jps
3035 Jps
3003 NodeManager
2907 DataNode
DataNode report after starting the daemons; cslave2 is now In Service:
Node                                 Last contact  Admin State  Capacity  Used    Non DFS Used  Remaining  Blocks  Block pool used  Failed Volumes  Version
cmaster0:50010 (192.168.1.101:50010) 1             In Service   27.01 GB  256 KB  5.72 GB       21.29 GB   9       256 KB (0%)      0               2.7.2
cslave0:50010 (192.168.1.102:50010)  1             In Service   27.01 GB  256 KB  5.38 GB       21.63 GB   9       256 KB (0%)      0               2.7.2
cslave1:50010 (192.168.1.103:50010)  1             In Service   27.01 GB  256 KB  5.38 GB       21.63 GB   9       256 KB (0%)      0               2.7.2
cslave2:50010 (192.168.1.104:50010)  1             In Service   27.01 GB  24 KB   5.38 GB       21.63 GB   0       24 KB (0%)       0               2.7.2
(7) If the data across the cluster is unbalanced, rebalance it with the balancer
[hadoop@cslave2 hadoop-2.7.2]$ pwd
/opt/module/hadoop-2.7.2
[hadoop@cslave2 hadoop-2.7.2]$ ./sbin/start-balancer.sh
starting balancer, logging to /opt/module/hadoop-2.7.2/logs/hadoop-hadoop-balancer-cslave2.out
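start-balancer.sh also accepts a -threshold argument: the allowed deviation, in percentage points, of each DataNode's disk utilization from the cluster average (the default is 10). A sketch, guarded so it is a no-op on a machine without Hadoop on the PATH:

```shell
#!/bin/sh
# Sketch: run the HDFS balancer with an explicit threshold. The cluster is
# considered balanced once every DataNode's utilization is within THRESHOLD
# percentage points of the average.
THRESHOLD=10
if command -v start-balancer.sh >/dev/null 2>&1; then
  start-balancer.sh -threshold "$THRESHOLD"
else
  echo "start-balancer.sh not on PATH; run this on a cluster node"
fi
```

A smaller threshold gives a more even distribution but makes the balancer move more blocks and run longer.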
Decommissioning an old DataNode
1) In the /opt/module/hadoop-2.7.2/etc/hadoop directory on the NameNode, create a dfs.hosts.exclude file containing only the nodes to be decommissioned
[hadoop@cmaster0 hadoop]$ vi dfs.hosts.exclude
cslave2
2) Add the dfs.hosts.exclude property to the hdfs-site.xml configuration file on the NameNode
[hadoop@cmaster0 hadoop]$ vi hdfs-site.xml
<property>
<name>dfs.hosts.exclude</name>
<value>/opt/module/hadoop-2.7.2/etc/hadoop/dfs.hosts.exclude</value>
</property>
3) Refresh the NameNode and the ResourceManager
[hadoop@cmaster0 hadoop]$ hdfs dfsadmin -refreshNodes
Refresh nodes successful
[hadoop@cmaster0 hadoop]$ yarn rmadmin -refreshNodes
18/12/27 05:24:08 INFO client.RMProxy: Connecting to ResourceManager at cslave0/192.168.1.102:8033
DataNode report after the refresh; cslave2 enters Decommission In Progress while its blocks are re-replicated to the remaining nodes (their block counts have risen from 9 to 10):
Node                                 Last contact  Admin State               Capacity  Used    Non DFS Used  Remaining  Blocks  Block pool used  Failed Volumes  Version
cmaster0:50010 (192.168.1.101:50010) 1             In Service                27.01 GB  256 KB  5.72 GB       21.29 GB   10      256 KB (0%)      0               2.7.2
cslave0:50010 (192.168.1.102:50010)  1             In Service                27.01 GB  256 KB  5.38 GB       21.63 GB   10      256 KB (0%)      0               2.7.2
cslave1:50010 (192.168.1.103:50010)  1             In Service                27.01 GB  256 KB  5.38 GB       21.63 GB   10      256 KB (0%)      0               2.7.2
cslave2:50010 (192.168.1.104:50010)  1             Decommission In Progress  27.01 GB  24 KB   5.38 GB       21.63 GB   1       24 KB (0%)       0               2.7.2
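The decommission can also be watched from the shell instead of the web UI. In Hadoop 2.7, hdfs dfsadmin -report prints a per-node "Decommission Status" line, so a polling loop can wait for "Decommission in progress" to disappear. A sketch: the 30-second interval is an arbitrary choice, and the loop is guarded so it is skipped where Hadoop is not installed.

```shell
#!/bin/sh
# Sketch: poll until no DataNode reports "Decommission in progress".
# Once the report is clean, the node's daemons can be stopped safely.
POLL_SECONDS=30
if command -v hdfs >/dev/null 2>&1; then
  while hdfs dfsadmin -report 2>/dev/null | grep -q "Decommission in progress"; do
    echo "still decommissioning, checking again in ${POLL_SECONDS}s"
    sleep "$POLL_SECONDS"
  done
  echo "no node is decommissioning; safe to stop the daemons"
else
  echo "hdfs not on PATH; run this on a cluster node"
fi
```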
4) When the node's Admin State changes to Decommissioned, stop the DataNode and NodeManager daemons on cslave2
[hadoop@cslave2 ~]$ jps
3003 NodeManager
2907 DataNode
3344 Jps
[hadoop@cslave2 ~]$ hadoop-daemon.sh stop datanode
stopping datanode
[hadoop@cslave2 ~]$ yarn-daemon.sh stop nodemanager
stopping nodemanager
[hadoop@cslave2 ~]$ jps
3450 Jps
5) Remove the decommissioned node from the include file, then run the refresh commands again
(1) Remove the decommissioned node cslave2 from the dfs.hosts file on the NameNode
[hadoop@cmaster0 hadoop]$ vi dfs.hosts
cmaster0
cslave0
cslave1
(2) Refresh the NameNode and the ResourceManager
[hadoop@cmaster0 hadoop]$ hdfs dfsadmin -refreshNodes
Refresh nodes successful
[hadoop@cmaster0 hadoop]$ yarn rmadmin -refreshNodes
18/12/27 05:37:21 INFO client.RMProxy: Connecting to ResourceManager at cslave0/192.168.1.102:8033
DataNode report after the final refresh; cslave2 no longer appears:
Node                                 Last contact  Admin State  Capacity  Used    Non DFS Used  Remaining  Blocks  Block pool used  Failed Volumes  Version
cmaster0:50010 (192.168.1.101:50010) 1             In Service   27.01 GB  256 KB  5.72 GB       21.29 GB   10      256 KB (0%)      0               2.7.2
cslave0:50010 (192.168.1.102:50010)  1             In Service   27.01 GB  256 KB  5.38 GB       21.63 GB   10      256 KB (0%)      0               2.7.2
cslave1:50010 (192.168.1.103:50010)  1             In Service   27.01 GB  256 KB  5.38 GB       21.63 GB   10      256 KB (0%)      0               2.7.2
6) Remove the decommissioned node cslave2 from the slaves file on the NameNode
[hadoop@cmaster0 hadoop]$ vi slaves
cmaster0
cslave0
cslave1
7) If the remaining data is unbalanced, rebalance the cluster with the balancer
[hadoop@cslave2 hadoop-2.7.2]$ pwd
/opt/module/hadoop-2.7.2
[hadoop@cslave2 hadoop-2.7.2]$ ./sbin/start-balancer.sh
starting balancer, logging to /opt/module/hadoop-2.7.2/logs/hadoop-hadoop-balancer-cslave2.out