Environment preparation
Linux: CentOS 6.5, three machines (node1, node2, node3)
jdk-8u73-linux-x64.tar.gz
hadoop-2.5.2.tar.gz
zookeeper-3.4.6.tar.gz

After the cluster is started, each machine should be running the following processes:
node1: NameNode, DataNode, JournalNode, QuorumPeerMain, DFSZKFailoverController, ResourceManager, NodeManager
node2: NameNode, DataNode, JournalNode, QuorumPeerMain, DFSZKFailoverController, ResourceManager, NodeManager
node3: DataNode, JournalNode, QuorumPeerMain, NodeManager
[Configure passwordless SSH login](http://my.oschina.net/aiguozhe/blog/33994)
Disable the firewall:
service iptables stop
Install the JDK and configure its environment variables (any standard guide covers this).
Install ZooKeeper
Extract the archive: tar -zxvf zookeeper-3.4.6.tar.gz
Edit the configuration: rename conf/zoo_sample.cfg to zoo.cfg, with the following content:
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888
Create the dataDir directory, and inside it on each machine create a myid file: its content is 1 on node1, 2 on node2, and 3 on node3.
Copy the entire zookeeper directory to the other machines.
Configure the ZooKeeper environment variables.
Start ZooKeeper on every node: ***zkServer.sh start***
Install Hadoop
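The myid file is the most common source of ensemble start-up failures, so here is a minimal sketch of what each node ends up with. The local path is a placeholder so the snippet can be run anywhere; on a real node, DATADIR is the dataDir from zoo.cfg (/var/lib/zookeeper) and MYID is that node's number.

```shell
#!/bin/sh
# Sketch only: the per-node ZooKeeper bookkeeping described above.
# node1..node3 and the id scheme come from the setup above.
MYID=1                    # 1 on node1, 2 on node2, 3 on node3
DATADIR=./zk-data         # local stand-in for /var/lib/zookeeper

mkdir -p "$DATADIR"
echo "$MYID" > "$DATADIR/myid"

# The server.N lines that must appear identically in every node's zoo.cfg:
for i in 1 2 3; do
  echo "server.$i=node$i:2888:3888"
done
```

The id in myid must match the N of that host's own server.N entry, or the node will refuse to join the quorum.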
解压安装包:tar -zxvf hadoop-2.5.2.tar.gz
修改配置文件:
目录:hadoop-2.5.2/etc/hadoop
core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>node1:2181,node2:2181,node3:2181</value>
</property>
hdfs-site.xml
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/var/lib/hadoop-hdfs/cache/hdfs/dfs/name</value>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>node1:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>node1:50070</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>node2:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>node2:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://node1:8485;node2:8485;node3:8485/mycluster</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/hadoop/journal</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_dsa</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
yarn-site.xml
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>RM_HA_ID</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>node1</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>node2</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>node1:2181,node2:2181,node3:2181</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
mapred-site.xml (if the directory only contains mapred-site.xml.template, copy it to mapred-site.xml first)
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
Edit slaves so that it contains:
node1
node2
node3
After editing the configuration files, copy the entire hadoop directory to the other machines, so that the configuration is identical everywhere.
Configure the Hadoop environment variables on every machine, and on every machine create the directories referenced by the configuration.
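The copy-and-create-directories step can be scripted. The sketch below only prints the commands (DRY_RUN=echo), so it is safe to run anywhere; remove the echo to execute for real over the passwordless SSH configured earlier. Hostnames and directory paths follow the configuration above.

```shell
#!/bin/sh
# Sketch only: distribute the configured hadoop directory and create the
# directories referenced by core-site.xml / hdfs-site.xml on each worker.
# DRY_RUN=echo makes this print the commands instead of running them.
DRY_RUN=echo
for host in node2 node3; do
  $DRY_RUN scp -r hadoop-2.5.2 "$host:~/"
  $DRY_RUN ssh "$host" mkdir -p /home/hadoop/tmp /home/hadoop/journal
done
```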
Format HDFS
With HA and the quorum journal, the order matters: the JournalNodes must be running before the first NameNode is formatted, and the second NameNode is bootstrapped from the first rather than formatted.
On every node, start the JournalNode: hadoop-daemon.sh start journalnode
On node1: hdfs namenode -format, then hadoop-daemon.sh start namenode
On node2: hdfs namenode -bootstrapStandby
Format ZKFC (on one NameNode host): hdfs zkfc -formatZK
Start Hadoop: start-all.sh (deprecated in Hadoop 2.x; start-dfs.sh followed by start-yarn.sh is equivalent)
Start the standby ResourceManager on node2:
yarn-daemon.sh start resourcemanager
Use jps on each machine to verify that the processes match the list at the top; hdfs haadmin -getServiceState nn1 (and nn2) reports which NameNode is active.
Web UIs (use the machine's IP address):
http://<ip>:50070 -- HDFS
http://<ip>:8088 -- YARN