1. Pre-installation preparation
① Cluster plan:
Hostname | User | IP address | Software | Processes
-------- | ---- | ---------- | -------- | ---------
centos71 | hzq | 192.168.1.201 | jdk, hadoop | NameNode, DFSZKFailoverController (zkfc)
centos72 | hzq | 192.168.1.202 | jdk, hadoop | NameNode, DFSZKFailoverController (zkfc)
centos73 | hzq | 192.168.1.203 | jdk, hadoop | ResourceManager
centos74 | hzq | 192.168.1.204 | jdk, hadoop | ResourceManager
centos75 | hzq | 192.168.1.205 | jdk, hadoop | DataNode, NodeManager, JournalNode
centos76 | hzq | 192.168.1.206 | jdk, hadoop | DataNode, NodeManager, JournalNode
centos77 | hzq | 192.168.1.207 | jdk, hadoop | DataNode, NodeManager, JournalNode
centos78 | hzq | 192.168.1.208 | jdk, zookeeper | QuorumPeerMain
centos79 | hzq | 192.168.1.209 | jdk, zookeeper | QuorumPeerMain
centos710 | hzq | 192.168.1.210 | jdk, zookeeper | QuorumPeerMain
② Set up passwordless SSH login between all hosts; see "Passwordless SSH Login".
③ Install jdk1.8.0_131 on every host; for installation and configuration see "Installing the JDK on Linux".
④ Build the Zookeeper cluster; for the steps see "zookeeper-3.4.10 Installation Tutorial (Distributed Configuration)".
⑤ Edit the "/etc/hosts" file on each host as follows:
192.168.31.128 centos71
192.168.31.129 centos72
192.168.31.130 centos73
192.168.31.131 centos74
192.168.31.132 centos76
192.168.31.133 centos75
192.168.31.137 centos77
192.168.31.134 centos78
192.168.31.135 centos79
192.168.31.136 centos710
⑥ Get the Hadoop package: hadoop-2.8.0.tar.gz
⑦ Turn off the firewall
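On CentOS 7 the default firewall service is firewalld (an assumption about this setup; older releases use the iptables service instead). A minimal way to turn it off on each host:

```shell
# Stop firewalld for the current session and keep it from starting on boot.
# Assumes CentOS 7 with firewalld; run as root or via sudo.
systemctl stop firewalld
systemctl disable firewalld
```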
2、Hadoop安装:
①、在"/home/hzq/software/"下创建"hadoop"文件夹
②、在"hadoop"目录下创建"data"文件夹,用于存放hadoop运行时文件
③、将"hadoop-2.8.0.tar.gz"解压到hadoop目录下
tar -zxvf ../package/hadoop-2.8.0.tar.gz -C /home/hzq/software/hadoop/
④ Delete the doc directory under "hadoop-2.8.0/share" to speed up the later scp copies
rm -rf hadoop-2.8.0/share/doc
3. Configure Hadoop:
① Edit hadoop-env.sh and set JAVA_HOME
export JAVA_HOME=/home/hzq/software/jdk1.8.0_131
② Edit core-site.xml
<!-- Default file system: the HA nameservice defined in hdfs-site.xml -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://hzqnns/</value>
</property>
<!-- Base directory for Hadoop's runtime files -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hzq/software/hadoop/data</value>
</property>
<!-- Zookeeper quorum used by the HA failover controllers -->
<property>
<name>ha.zookeeper.quorum</name>
<value>centos78:2181,centos79:2181,centos710:2181</value>
</property>
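Note that Hadoop only reads properties that sit inside the single `<configuration>` root element; assembled, the file has this shape (abbreviated to the first property from above):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hzqnns/</value>
  </property>
  <!-- ...the remaining <property> elements shown above go here... -->
</configuration>
```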
③ Edit hdfs-site.xml
<!-- Number of block replicas -->
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<!-- Block size; in Hadoop 2.x the property is dfs.blocksize ("dfs.block.size" is deprecated) -->
<property>
<name>dfs.blocksize</name>
<value>64m</value>
</property>
<!-- Logical name of the HA nameservice (referenced by fs.defaultFS) -->
<property>
<name>dfs.nameservices</name>
<value>hzqnns</value>
</property>
<!-- The two NameNode ids under this nameservice -->
<property>
<name>dfs.ha.namenodes.hzqnns</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.hzqnns.nn1</name>
<value>centos71:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.hzqnns.nn1</name>
<value>centos71:50070</value>
</property>
<property>
<name>dfs.namenode.rpc-address.hzqnns.nn2</name>
<value>centos72:9000</value>
</property>
<property>
<name>dfs.namenode.http-address.hzqnns.nn2</name>
<value>centos72:50070</value>
</property>
<!-- Shared edits directory: the JournalNode quorum -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://centos75:8485;centos76:8485;centos77:8485/hzqnns</value>
</property>
<!-- Local directory where each JournalNode stores its data -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/hzq/software/hadoop/data/journaldata</value>
</property>
<!-- Enable automatic NameNode failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Proxy provider HDFS clients use to locate the active NameNode -->
<property>
<name>dfs.client.failover.proxy.provider.hzqnns</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing methods, one per line. shell(/bin/true) (replace with your own shell script
if needed) is a fallback so failover still proceeds when sshfence cannot reach the failed node -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<!-- sshfence needs passwordless SSH; the private key to use -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hzq/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
④ mapred-site.xml
Rename "mapred-site.xml.template":
mv mapred-site.xml.template mapred-site.xml
Then edit mapred-site.xml:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
⑤ Edit yarn-site.xml
<!-- Enable ResourceManager HA -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- Logical id of the RM cluster -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yrc</value>
</property>
<!-- Ids of the two ResourceManagers -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>centos73</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>centos74</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<!-- Zookeeper quorum the ResourceManagers use for leader election and state storage -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>centos78:2181,centos79:2181,centos710:2181</value>
</property>
⑥ Configure the DataNode hosts: edit slaves
centos75
centos76
centos77
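The slaves file is plain text with one DataNode hostname per line. A quick sketch of writing and checking it from the shell (in the real setup the file lives under the Hadoop install's etc/hadoop/ directory; it is written to the current directory here):

```shell
# Write the three DataNode hostnames, one per line, into a slaves file.
# start-dfs.sh reads this list to decide where worker daemons run.
printf '%s\n' centos75 centos76 centos77 > slaves
# Show the result.
cat slaves
```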
⑦ Copy the configured Hadoop directory to the other six hosts
scp -r hadoop/ centos72:/home/hzq/software/
scp -r hadoop/ centos73:/home/hzq/software/
scp -r hadoop/ centos74:/home/hzq/software/
scp -r hadoop/ centos75:/home/hzq/software/
scp -r hadoop/ centos76:/home/hzq/software/
scp -r hadoop/ centos77:/home/hzq/software/
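The six copies above can also be written as a loop (same hosts and target path):

```shell
# Push the configured hadoop/ directory to the other six machines.
for host in centos72 centos73 centos74 centos75 centos76 centos77; do
  scp -r hadoop/ "${host}:/home/hzq/software/"
done
```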
4. Start Hadoop (on first startup the steps must be executed in this order)
① Check that the Zookeeper cluster is running; if not, start it first.
- Start zookeeper on centos78, centos79 and centos710:
zkServer.sh start
- Check the status: there should be one leader and two followers
zkServer.sh status
② Start the journalnodes (run on centos75, centos76 and centos77):
hadoop-daemon.sh start journalnode
Note: run jps to check the result; on success, centos75, centos76
and centos77 each show an additional JournalNode process
③ Format HDFS on centos71
hdfs namenode -format
④ Bring the two NameNodes' metadata in sync: copy the data directory on centos71 into the data directory on centos72.
scp -r data/ centos72:/home/hzq/software/hadoop/data
⑤ Format ZKFC on centos71
hdfs zkfc -formatZK
⑥ Start HDFS from centos71
start-dfs.sh
⑦ Start the ResourceManager and the NodeManagers from centos73
start-yarn.sh
⑧ Start the standby ResourceManager on centos74
yarn-daemon.sh start resourcemanager
5. Verify the startup:
① Run jps on every host to check its processes.
② HDFS web UI: http://centos71:50070 or http://centos72:50070
③ MR (YARN) web UI: http://centos73:8088 or http://centos74:8088
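Which NameNode is active can also be checked from the shell instead of the web UI. A sketch assuming the JMX servlet that Hadoop 2.x exposes on the NameNode HTTP port (hostnames as configured above; needs curl and a reachable cluster):

```shell
# Ask each NameNode's JMX servlet for its HA state ("active" or "standby").
for nn in centos71 centos72; do
  printf '%s: ' "$nn"
  curl -s "http://${nn}:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus" \
    | grep -o '"State" : "[a-z]*"'
done
```

The same information is available via `hdfs haadmin -getServiceState nn1` (see the commands in section 6).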
6. Common commands:
- Show the status of all HDFS nodes
hdfs dfsadmin -report
- Get the HA state of a namenode
hdfs haadmin -getServiceState nn1
- Start a single namenode process
hadoop-daemon.sh start namenode
- Start a single zkfc process
hadoop-daemon.sh start zkfc
- Start a single ResourceManager process
yarn-daemon.sh start resourcemanager
7. Summary
1. This cluster was built purely for learning; no tuning or hardening has been done.
2. Corrections and suggestions are welcome.