Hadoop cluster high availability (NameNode && ResourceManager)

Hostname | IP            | Software                | Running processes
master   | 192.168.1.115 | jdk, hadoop             | NameNode, DFSZKFailoverController (zkfc)
slave1   | 192.168.1.116 | jdk, hadoop             | NameNode, DFSZKFailoverController (zkfc)
slave2   | 192.168.1.117 | jdk, hadoop             | ResourceManager
slave3   | 192.168.1.118 | jdk, hadoop             | ResourceManager
slave4   | 192.168.1.119 | jdk, hadoop, zookeeper  | DataNode, NodeManager, JournalNode, QuorumPeerMain
slave5   | 192.168.1.120 | jdk, hadoop, zookeeper  | DataNode, NodeManager, JournalNode, QuorumPeerMain
slave6   | 192.168.1.121 | jdk, hadoop, zookeeper  | DataNode, NodeManager, JournalNode, QuorumPeerMain

Create the data directories. For convenience, run the following on every node (some of these directories go unused on nodes whose roles don't need them):

mkdir -p /home/qun/data/hadoop-2.8/name 
mkdir -p /home/qun/data/hadoop-2.8/data 
mkdir -p /home/qun/data/hadoop-2.8/tmp 
mkdir -p /home/qun/data/hadoop-2.8/namesecondary 
mkdir -p /home/qun/data/hadoop-2.8/journal 
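
If passwordless SSH (configured further below) is already in place, the same directories can be created on every node from master in one go; a minimal sketch, assuming bash and the qun account on each host:

# create all five data directories on every node at once
for h in master slave1 slave2 slave3 slave4 slave5 slave6; do
    ssh qun@$h "mkdir -p /home/qun/data/hadoop-2.8/{name,data,tmp,namesecondary,journal}"
done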

Configure the ZooKeeper cluster

On slave4, slave5, and slave6, edit $ZOOKEEPER_HOME/conf/zoo.cfg:

# basic time unit in milliseconds
tickTime=2000
# ticks a follower may take to connect and sync with the leader at startup
initLimit=10
# ticks a follower may lag behind the leader before being dropped
syncLimit=5
# where snapshots and the myid file live
# (note: /tmp may be wiped on reboot; a persistent path is safer)
dataDir=/tmp/zookeeper
clientPort=2181
#maxClientCnxns=60
#autopurge.snapRetainCount=3
#autopurge.purgeInterval=1
# server.N=host:quorumPort:leaderElectionPort
server.1=slave4:2888:3888
server.2=slave5:2888:3888
server.3=slave6:2888:3888

Set the ZooKeeper id on slave4, slave5, and slave6; each id must match the corresponding server.N entry in zoo.cfg:

# create the dataDir first so the myid file has somewhere to live
mkdir -p /tmp/zookeeper

# slave4
echo 1 > /tmp/zookeeper/myid
# slave5
echo 2 > /tmp/zookeeper/myid
# slave6
echo 3 > /tmp/zookeeper/myid

Start ZooKeeper and check its status:

./bin/zkServer.sh start

[qun@slave5 zookeeper-3.4.6]$ ./bin/zkServer.sh status
JMX enabled by default
Using config: /home/qun/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: leader
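
The same can be done for all three nodes from one shell; a quick sketch, assuming passwordless SSH to slave4-6 (configured below) and the install path shown in the output above:

# start ZooKeeper on all three quorum nodes
for h in slave4 slave5 slave6; do
    ssh qun@$h "/home/qun/zookeeper-3.4.6/bin/zkServer.sh start"
done
# one node should report "Mode: leader", the other two "Mode: follower"
for h in slave4 slave5 slave6; do
    echo -n "$h: "
    ssh qun@$h "/home/qun/zookeeper-3.4.6/bin/zkServer.sh status 2>/dev/null | grep Mode"
done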

Configure the Hadoop cluster

core-site.xml

<configuration>
    <!-- default filesystem: the logical HA nameservice, not a single host -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ns1</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/qun/data/hadoop-2.8/tmp</value>
    </property>

    <!-- legacy SecondaryNameNode checkpoint settings; effectively unused
         once HA is enabled, since the standby NameNode does checkpointing -->
    <property>
        <name>fs.checkpoint.period</name>
        <value>3600</value>
    </property>
    <property>
        <name>fs.checkpoint.size</name>
        <value>67108864</value>
    </property>
    <property>
        <name>fs.checkpoint.dir</name>
        <value>/home/qun/data/hadoop-2.8/namesecondary</value>
    </property>

    <!-- ZooKeeper ensemble used by the ZKFCs for automatic failover -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>slave4:2181,slave5:2181,slave6:2181</value>
    </property>

</configuration>

hdfs-site.xml

<configuration>

    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/qun/data/hadoop-2.8/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/qun/data/hadoop-2.8/data</value>
    </property>

    <!-- the logical nameservice and the two NameNodes behind it -->
    <property>
        <name>dfs.nameservices</name>
        <value>ns1</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.ns1</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn1</name>
        <value>master:9000</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.ns1.nn1</name>
        <value>master:50070</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn2</name>
        <value>slave1:9000</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.ns1.nn2</name>
        <value>slave1:50070</value>
    </property>
    <!-- shared edit log: the JournalNode quorum -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://slave4:8485;slave5:8485;slave6:8485/ns1</value>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/qun/data/hadoop-2.8/journal</value>
    </property>
    <!-- let the ZKFCs fail over automatically -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <!-- how clients locate the active NameNode behind ns1 -->
    <property>
        <name>dfs.client.failover.proxy.provider.ns1</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <!-- fence the old active NameNode over SSH; fall back to shell(/bin/true)
         so failover can still proceed if the old host is unreachable -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
        sshfence
        shell(/bin/true)
        </value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/qun/.ssh/id_rsa</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>

</configuration>

yarn-site.xml

<configuration>

    <!-- ResourceManager HA: two RMs coordinated through ZooKeeper -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yrc</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>slave2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>slave3</value>
    </property>
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>slave4:2181,slave5:2181,slave6:2181</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

</configuration>

mapred-site.xml

<configuration>

    <!-- run MapReduce jobs on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

</configuration>

slaves

slave4
slave5
slave6
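
Every node needs the same configuration. A minimal sketch for pushing the edited files out from master (HADOOP_HOME is an assumed path here, adjust it to the real install location; this also relies on the passwordless SSH configured next):

HADOOP_HOME=/home/qun/hadoop-2.8.3   # assumed install path
for h in slave1 slave2 slave3 slave4 slave5 slave6; do
    rsync -av $HADOOP_HOME/etc/hadoop/ qun@$h:$HADOOP_HOME/etc/hadoop/
done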

Passwordless SSH setup

The two NameNode hosts must be able to log in to each other without a password (sshfence depends on this), and they also need passwordless access to the DataNodes.
On master, generate a key pair (just press Enter at every prompt):

ssh-keygen -t rsa

Copy the public key to every node, including master itself:

ssh-copy-id -i qun@master
ssh-copy-id -i qun@slave1
ssh-copy-id -i qun@slave2
ssh-copy-id -i qun@slave3
ssh-copy-id -i qun@slave4
ssh-copy-id -i qun@slave5
ssh-copy-id -i qun@slave6

On slave1, generate a key pair the same way:

ssh-keygen -t rsa

Copy the public key to the other nodes, including slave1 itself:

ssh-copy-id -i qun@slave1
ssh-copy-id -i qun@master
ssh-copy-id -i qun@slave4
ssh-copy-id -i qun@slave5
ssh-copy-id -i qun@slave6
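
To verify, run a remote command from master (and likewise from slave1); each line should print without a password prompt:

for h in master slave1 slave2 slave3 slave4 slave5 slave6; do
    ssh qun@$h hostname
done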

On master, format the HA state znode in ZooKeeper (the ZooKeeper quorum must already be running):

hdfs zkfc -formatZK

Start the JournalNodes on slave4, slave5, and slave6:

hadoop-daemon.sh start journalnode

On master, format HDFS and start the first NameNode (it will become the active one):

hdfs namenode -format
hadoop-daemon.sh start namenode

On slave1, sync the metadata from the formatted NameNode and start the second one (it will come up standby):

hdfs namenode -bootstrapStandby
hadoop-daemon.sh start namenode

Start all the DataNodes by running the following on master (processes that are already running, such as the NameNodes and JournalNodes, are left alone):

start-dfs.sh

Check the NameNode states:

[qun@master hadoop]$ hdfs haadmin -getServiceState nn1
active

[qun@master hadoop]$ hdfs haadmin -getServiceState nn2
standby

Start YARN on slave2:

start-yarn.sh

start-yarn.sh only starts a ResourceManager on the local host, so start the standby RM on slave3 by hand:

yarn-daemon.sh start resourcemanager

Check the ResourceManager states:

[qun@slave3 hadoop]$ yarn rmadmin -getServiceState rm1
active

[qun@slave3 hadoop]$ yarn rmadmin -getServiceState rm2
standby
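
The ResourceManager web UI tells the same story: the standby RM redirects to the active one. A quick check, assuming the default webapp port 8088 hasn't been overridden:

curl -sI http://slave2:8088/cluster | head -n 1
curl -sI http://slave3:8088/cluster | head -n 1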

Test HDFS: upload a file, then kill the active NameNode process on master and access HDFS again; the NameNode on slave1 should become active. If the NameNode on master is then restarted, it comes back in the standby state.

Upload a file:

[qun@master hadoop]$ hadoop fs -put core-site.xml /
[qun@master hadoop]$ hadoop fs -ls /
Found 1 items
-rw-r--r--   2 qun supergroup       1348 2018-08-06 22:00 /core-site.xml
[qun@master hadoop]$ hadoop fs -ls hdfs://ns1/
Found 1 items
-rw-r--r--   2 qun supergroup       1348 2018-08-06 22:00 hdfs://ns1/core-site.xml

Kill the NameNode on master:

[qun@master hadoop]$ jps
5507 Jps
4663 NameNode
5149 DFSZKFailoverController
You have new mail in /var/spool/mail/qun
[qun@master hadoop]$ kill -9 4663

Access HDFS again:

[qun@master hadoop]$ hadoop fs -ls /
18/08/06 22:06:15 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/08/06 22:06:17 WARN ipc.Client: Failed to connect to server: master/192.168.1.115:9000: try once and fail.
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:685)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:788)
    at org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:410)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1550)
    at org.apache.hadoop.ipc.Client.call(Client.java:1381)
    at org.apache.hadoop.ipc.Client.call(Client.java:1345)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
    at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:796)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346)
    at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1717)
    at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1437)
    at org.apache.hadoop.hdfs.DistributedFileSystem$27.doCall(DistributedFileSystem.java:1434)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1434)
    at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:64)
    at org.apache.hadoop.fs.Globber.doGlob(Globber.java:282)
    at org.apache.hadoop.fs.Globber.glob(Globber.java:148)
    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1686)
    at org.apache.hadoop.fs.shell.PathData.expandAsGlob(PathData.java:326)
    at org.apache.hadoop.fs.shell.Command.expandArgument(Command.java:245)
    at org.apache.hadoop.fs.shell.Command.expandArguments(Command.java:228)
    at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:103)
    at org.apache.hadoop.fs.shell.Command.run(Command.java:175)
    at org.apache.hadoop.fs.FsShell.run(FsShell.java:315)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
    at org.apache.hadoop.fs.FsShell.main(FsShell.java:378)
Found 1 items
-rw-r--r--   2 qun supergroup       1348 2018-08-06 22:00 /core-site.xml

As the output shows, the client first tries the NameNode on master by default, fails, and only then fails over to the NameNode on slave1. The listing still succeeds, but the retry noise is a bit puzzling, and I wasn't sure how to optimize it; if you know how, please share, thanks!
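
As far as I can tell, this is the expected behavior of ConfiguredFailoverProxyProvider: it tries the configured namenodes in a fixed order and only fails over on error, so the WARN itself is harmless. One possible improvement is RequestHedgingProxyProvider (shipped with Hadoop 2.8, as far as I know), which sends the first call to both namenodes concurrently and keeps whichever answers, hiding the retry. A hedged, untested sketch of the client-side change in hdfs-site.xml:

    <property>
        <name>dfs.client.failover.proxy.provider.ns1</name>
        <!-- hedge requests across both namenodes instead of trying them in order -->
        <value>org.apache.hadoop.hdfs.server.namenode.ha.RequestHedgingProxyProvider</value>
    </property>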

Test YARN:

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.3.jar wordcount /input /output2

Since the ResourceManager is also highly available, you can kill one of the RMs and rerun the WordCount job to confirm it still completes. I won't walk through it in detail; a rough sketch follows.
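
A rough sketch of that test, assuming rm1 on slave2 is currently the active RM:

# on slave2: stop the active ResourceManager
yarn-daemon.sh stop resourcemanager

# rm2 on slave3 should take over
yarn rmadmin -getServiceState rm2

# rerun the job; it should still succeed (note the fresh output dir)
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.3.jar wordcount /input /output3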

Problems
  • ssh-copy-id: command not found. Fix: yum -y install openssh-clients
