Note: at the start, all 3 of my machines had essentially the same configuration as the first one — the initial pseudo-distributed setup (running on a single node). The configuration process mainly followed a blog post, with the video as a secondary reference.
1. Configure hdfs-site.xml
<!-- Logical name of the nameservice -->
<property>
  <name>dfs.nameservices</name>
  <value>ns1</value>
</property>
<!-- The two NameNodes within nameservice ns1 -->
<property>
  <name>dfs.ha.namenodes.ns1</name>
  <value>nn1,nn2</value>
</property>
<!-- RPC addresses of the two NameNodes -->
<property>
  <name>dfs.namenode.rpc-address.ns1.nn1</name>
  <value>bd1:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.ns1.nn2</name>
  <value>bd4:8020</value>
</property>
<!-- HTTP (web UI) addresses of the two NameNodes -->
<property>
  <name>dfs.namenode.http-address.ns1.nn1</name>
  <value>bd1:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.ns1.nn2</name>
  <value>bd4:50070</value>
</property>
<!-- JournalNode quorum that stores the shared edit log -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://bd1:8485;bd4:8485;bd5:8485/ns1</value>
</property>
<!-- Local directory where each JournalNode keeps its edits -->
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/home/bigdata01/app/hadoop-2.7.3/tmp/data/dfs/jn</value>
</property>
<!-- How HDFS clients locate the active NameNode -->
<property>
  <name>dfs.client.failover.proxy.provider.ns1</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fence the old active NameNode over SSH during failover -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/bigdata01/.ssh/id_rsa</value>
</property>
2. Create the directory /home/bigdata01/app/hadoop-2.7.3/tmp/data/dfs/jn
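The JournalNode directory has to exist on every machine that runs a JournalNode, not just the first. A sketch assuming passwordless SSH between the nodes (which the sshfence configuration above already requires):

```shell
# Create the JournalNode edits directory on all three JournalNode hosts.
for host in bd1 bd4 bd5; do
  ssh "$host" "mkdir -p /home/bigdata01/app/hadoop-2.7.3/tmp/data/dfs/jn"
done
```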
3. Configure core-site.xml
<!-- Clients address HDFS by the nameservice, not a single NameNode host -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://ns1</value>
</property>
4. Configure the slaves file
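The slaves file lists the hosts that run DataNodes, one per line. The notes do not record its exact contents; assuming all three machines act as DataNodes, it would look like:

```
bd1
bd4
bd5
```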
5. Distribute to the other machines: scp -r hadoop-2.7.3/ bd4:/home/bigdata01/app/
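Step 5 shows only the copy to bd4; bd5 needs the same copy. As a loop:

```shell
# Copy the configured Hadoop installation to the other two machines.
for host in bd4 bd5; do
  scp -r hadoop-2.7.3/ "$host":/home/bigdata01/app/
done
```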
6. Start a JournalNode on each of the three machines:
hadoop-daemon.sh start journalnode
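After starting, each node should show a JournalNode process in jps; this can be checked remotely in one pass (again assuming passwordless SSH):

```shell
# Print the JournalNode process (if any) on each of the three nodes.
for host in bd1 bd4 bd5; do
  echo "$host: $(ssh "$host" jps | grep JournalNode)"
done
```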
7. Start ZooKeeper, from its bin directory:
./zkServer.sh start
8. Format the NameNode:
hadoop namenode -format
On the second NameNode:
hdfs namenode -bootstrapStandby
This failed with: FATAL ha.BootstrapStandby: Unable to fetch namespace information
from active NN at bd1/192.168.132.100:8020: Call From
bd4/192.168.132.103 to bd1:8020 failed on connection exception:
java.net.ConnectException: Connection refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused
(Connection refused likely means the first NameNode was not running yet.) As a workaround, copy the metadata directory over directly: scp -r data/ bd4:/home/bigdata01/app/hadoop-2.7.3/
9. Start the NameNode on the first and second machines:
hadoop-daemon.sh start namenode
Check the HDFS web UI:
http://bd1:50070 — both NameNodes show standby.
Switch the first one to active:
hdfs haadmin -transitionToActive nn1
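The NameNode states can also be checked from the command line instead of the web UI:

```shell
hdfs haadmin -getServiceState nn1   # prints "active" after the transition
hdfs haadmin -getServiceState nn2   # prints "standby"
```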
10. Use the ZooKeeper cluster to implement automatic failover. Before configuring automatic failover, shut the cluster down first — this cannot be configured while HDFS is running.
Stop the NameNodes, DataNodes, JournalNodes, and ZooKeeper:
hadoop-daemon.sh stop namenode
hadoop-daemon.sh stop datanode
hadoop-daemon.sh stop journalnode
./zkServer.sh stop
11. Modify hdfs-site.xml:
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
12. Modify core-site.xml:
<!-- Specify the ZooKeeper cluster address -->
<property>
  <name>ha.zookeeper.quorum</name>
  <value>bd1:2181,bd4:2181,bd5:2181</value>
</property>
13. Distribute hdfs-site.xml and core-site.xml to the other machines:
scp hdfs-site.xml bd4:/home/bigdata01/app/hadoop-2.7.3/etc/hadoop/
14. Start ZooKeeper.
15. Format ZKFC:
hdfs zkfc -formatZK
Success is indicated by: Successfully created /hadoop-ha/ns1 in ZK
16. Simply run start-dfs.sh, which starts the NameNodes, DataNodes, JournalNodes, and ZKFCs.
17. Test
Upload a file, kill the active NameNode process, and check whether the upload still succeeds.
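The test in step 17 can be sketched as a few commands run on the active NameNode's host (the file name and the kill-by-jps lookup are illustrative, not from the original notes):

```shell
hdfs dfs -put test.txt /                             # upload a file (hypothetical name)
kill -9 "$(jps | awk '$2 == "NameNode" {print $1}')" # kill the active NameNode on this host
hdfs haadmin -getServiceState nn2                    # the standby should now report "active"
hdfs dfs -ls /                                       # the uploaded file should still be listed
```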
18. The ResourceManager also needs HA to guarantee high availability (YARN's HA is still less mature than HDFS's).
18.1 Modify yarn-site.xml and distribute it to the other machines:
<!-- Shuffle service required by MapReduce -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<!-- Aggregate container logs and how long to keep them (seconds) -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>106800</value>
</property>
<!-- Enable ResourceManager HA -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>yarn-cluster</value>
</property>
<!-- The two ResourceManagers and their hosts -->
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm12,rm13</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm12</name>
  <value>bd4</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm13</name>
  <value>bd5</value>
</property>
<!-- ZooKeeper quorum used for RM leader election and state -->
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>bd1:2181,bd4:2181,bd5:2181</value>
</property>
<!-- Recover running applications from state stored in ZooKeeper -->
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
18.2 On the first machine, run start-yarn.sh.
On the second and third machines, run yarn-daemon.sh start resourcemanager.
The web UI of the ResourceManager on bd4 is reachable; it is in the active state:
http://192.168.132.103:8088/cluster
Accessing the other ResourceManager, which is standby, automatically redirects to the active one: http://192.168.132.104:8088/cluster (nothing is shown while it is standby).
18.3 Test YARN HA:
hadoop jar hadoop-mapreduce-examples-2.7.3.jar pi 5 5
While the job is running, kill the active ResourceManager process.
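A command-line version of the same check (run the kill on whichever host holds the active ResourceManager):

```shell
yarn rmadmin -getServiceState rm12                          # "active" before the kill
kill -9 "$(jps | awk '$2 == "ResourceManager" {print $1}')" # kill the active RM on this host
yarn rmadmin -getServiceState rm13                          # should switch to "active"; the pi job should still complete
```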
Startup order from now on:
1. Start ZooKeeper on all 3 machines.
2. On the first machine, run start-dfs.sh and start-yarn.sh.
3. On the second and third machines, run yarn-daemon.sh start resourcemanager.
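The startup order above can be collected into one script run from the first machine (a sketch assuming passwordless SSH and that the ZooKeeper and Hadoop scripts are on each host's PATH):

```shell
#!/bin/bash
# 1. ZooKeeper on all three machines.
for host in bd1 bd4 bd5; do
  ssh "$host" "zkServer.sh start"
done
# 2. HDFS and YARN from the first machine.
start-dfs.sh
start-yarn.sh
# 3. Standby ResourceManagers on the other two machines.
for host in bd4 bd5; do
  ssh "$host" "yarn-daemon.sh start resourcemanager"
done
```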