I. Current environment
My cluster currently has 2 namenodes and 7 datanodes, and now I want to add more machines (really just a pile of old, beat-up boxes that could give up the ghost at any moment, ha ha).
The screenshot above shows my current 7 datanodes; a machine has become available, so the goal is to add one more node, storage155.
II. Preliminary work
1. Make sure the new node and the namenodes can ssh to each other
Use ssh-keygen to generate a local RSA key pair (or create a .ssh directory under the home directory by hand; I used ssh-keygen here). Then create an authorized_keys file in it and copy in the public keys from the namenodes. Since the cluster runs in HA mode, the public keys of both namenodes have to be copied over. For example:
hadoop@storage155:~/.ssh$ ls
authorized_keys id_rsa id_rsa.pub
hadoop@storage155:~/.ssh$ more authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCXVq1gdtV2Z1A0UpNIqoY7yxAvdA9UP7kdZMXNNtZEoeP/2NG2yX9rd2gbV7qRypYoWVvE5kahGZ6uTOc03RS/GHmwHPJE0+U1f6UPS4PRbnHlOr4xvaFLEA
TnBkKanYOklq4tM39Q6LF5PTL5r08aC+govMXMwZgHfKC76N35pFXU+iuZhhrLpQ+PN2ScUJvo/SYMyX3SCGJzw+c3p0XrYyw+mYaRE0eC+/OUOPvZnclZ+J0d80FiSuktYSl7u5ZFiqqoDkEjdFL+lCl0Oni0
Mz2/5S1BV81wHLiyPmZSCTMunwi6QgqrLMo38pNXdYGLstFKHfiO5jYgE94D91hZ hadoop@storage14
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCq5Sjzg2RkmLbW/8q267LP78D/WhwCMxfM4CJ8BLsbZmCx0aJuFi8ol/0Nr5UL4ES32Nf/j+zrfqZMl1RAAk3O5uf/asKp1NByuCxvs7jae0Ox02648iDKoK
mJpL+CGGN+1kvIZ7e8qDRhvAqLbBrFjcAk5QOo47lXuSf1TnX7hiJYj8X34xBNJHehUP61vzM6t4uftAnnmwnc9EGsnGNp77OzOs+Ua+q6awZwMIog2re0XndfB+8ft8iczQ24ekY8UagHixnYsaAeyZgE88/7
DaXTHbY2Tp27tm8efwanIS5qwFz81m8GUIhjinr7aNEZR48/0FG5Xrf/bpDXQ9lD hadoop@storage60
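Instead of pasting keys by hand as above, ssh-copy-id does the same thing in one step, which is less error-prone:

```shell
# Run this once on each namenode (storage14 and storage60). It appends
# the local ~/.ssh/id_rsa.pub to storage155's ~/.ssh/authorized_keys
# and sets the permissions of .ssh and authorized_keys correctly.
ssh-copy-id hadoop@storage155

# Verify: this should print "storage155" without a password prompt.
ssh hadoop@storage155 hostname
```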
2. Configure hosts on every node
Add an entry for the new storage155 to the hosts files of the original 9 nodes, and add the entries for those 9 nodes to storage155's hosts file. (This is the /etc/hosts file; mentioned only for readers new to this.)
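An /etc/hosts entry is just an IP address followed by a hostname. The entry for the new node would look like this (the address is made up for illustration; substitute storage155's real IP):

```text
192.168.10.155   storage155
```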
3. Install hadoop
Install the hadoop package on storage155; simply copying the install directory over from a namenode or an existing datanode is enough:
scp -r hadoop hadoop@storage155:/data/hadoop_home
4. Add storage155 to hadoop's slaves file
Add storage155 to hadoop-2.7.3/etc/hadoop/slaves, then sync the file to every node to keep the configuration identical across the cluster.
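Since every node needs the same slaves file, a small loop over the hostnames in the file can push it everywhere. The helper below is a sketch: the install path is the one used in step 3, and passwordless ssh from step 1 is assumed.

```shell
# Hypothetical helpers: read worker hostnames from the slaves file,
# then push the updated file to each of them over scp.

# Print the hostnames in a slaves file, skipping comments and blank lines.
list_nodes() {
  grep -v -e '^#' -e '^[[:space:]]*$' "$1"
}

# usage: sync_slaves hadoop-2.7.3/etc/hadoop/slaves
sync_slaves() {
  for node in $(list_nodes "$1"); do
    scp "$1" "hadoop@$node:/data/hadoop_home/hadoop-2.7.3/etc/hadoop/"
  done
}
```

Strictly, only the node where you run the start scripts reads slaves, but syncing it everywhere keeps the configuration consistent, as the step above notes.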
III. Starting the new datanode
Running start-all.sh on a namenode, or start-dfs.sh on storage155, will bring the new datanode up; both scripts skip daemons that are already running. To start only the new datanode without touching the rest of the cluster, running sbin/hadoop-daemon.sh start datanode on storage155 also works.
IV. Problems hit along the way
1. Multiple data directories
After the configuration was done, starting the datanode failed with this error:
java.io.IOException: All directories in dfs.datanode.data.dir are invalid: "/data1/deeplearn/dfs/data" "/data2/deeplearn/dfs/data" "/data3/deeplearn/dfs/data" "/data4/deeplearn/dfs/data" "/data5/deeplearn/dfs/data" "/data6/deeplearn/dfs/data" "/data7/deeplearn/dfs/data" "/data8/deeplearn/dfs/data" "/data9/deeplearn/dfs/data" "/data10/deeplearn/dfs/data" "/data11/deeplearn/dfs/data" "/data12/deeplearn/dfs/data"
at org.apache.hadoop.hdfs.server.datanode.DataNode.checkStorageLocations(DataNode.java:2396)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2369)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2261)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2308)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2485)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:2509)
2019-09-30 16:15:27,813 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
Because the data is spread across multiple disks, the dfs.datanode.data.dir property in my hdfs-site.xml is configured like this:
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///data1/deeplearn/dfs/data,file:///data2/deeplearn/dfs/data,file:///data3/deeplearn/dfs/data,file:///data4/deeplearn/dfs/data,file:///data5/deeplearn/dfs/data,file:///data6/deeplearn/dfs/data,file:///data7/deeplearn/dfs/data,file:///data8/deeplearn/dfs/data,file:///data9/deeplearn/dfs/data,file:///data10/deeplearn/dfs/data,file:///data11/deeplearn/dfs/data,file:///data12/deeplearn/dfs/data</value>
</property>
So all of these directories have to exist and be readable and writable by the hadoop user:
root@ubuntu:/home/linxiaojie1# mkdir /data1/deeplearn /data2/deeplearn /data3/deeplearn /data4/deeplearn /data5/deeplearn /data6/deeplearn /data7/deeplearn /data8/deeplearn /data9/deeplearn /data10/deeplearn /data11/deeplearn /data12/deeplearn
root@ubuntu:/home/linxiaojie1# chown -R hadoop:hadoop /data1/deeplearn /data2/deeplearn /data3/deeplearn /data4/deeplearn /data5/deeplearn /data6/deeplearn /data7/deeplearn /data8/deeplearn /data9/deeplearn /data10/deeplearn /data11/deeplearn /data12/deeplearn
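The two long commands above can also be written as a loop. The sketch below wraps it in a function that takes a path prefix so it can be tried safely outside the real machine; on storage155 itself you would run it against / as root and then chown as above.

```shell
# Create dataN/deeplearn/dfs/data for N = 1..12 under the given prefix.
# On the real node: make_data_dirs "" (i.e. directly under /), then
# chown -R hadoop:hadoop the resulting /data*/deeplearn directories.
make_data_dirs() {
  prefix=$1
  for n in $(seq 1 12); do
    mkdir -p "$prefix/data$n/deeplearn/dfs/data"
  done
}
```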
2. File permissions
Startup also threw another error:
java.io.IOException: the path component: '/data/hadoop_home/hadoop' is group-writable, and the group is not root. Its permissions are 0775, and it is owned by gid 1000. Please fix this or select a different socket path.
Tightening the directory's permissions so it is no longer group-writable fixes it; run this in /data/hadoop_home: chmod 755 hadoop/
V. Confirming the running state
And that's it: everything is healthy and data is being written successfully.
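For the record, a couple of standard HDFS commands make the check concrete (run as the hadoop user from any node; the /tmp path is just an example):

```shell
# The new node should appear among the live datanodes in the report.
hdfs dfsadmin -report | grep storage155

# Smoke test: write a small file into HDFS and list it back.
hdfs dfs -put /etc/hostname /tmp/storage155-smoke-test
hdfs dfs -ls /tmp/storage155-smoke-test
```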