Prerequisites:
[1] A CentOS virtual machine template has been built.
[2] Oracle JDK is installed in the template, with JAVA_HOME set to /usr/java/jdk1.8.0_171-amd64/jre.
Prepare Three Virtual Machines
[1] A fully distributed Hadoop cluster with HA requires at least three servers. Assume the three servers have the following IP addresses:
192.168.159.200 hadoop01
192.168.159.201 hadoop02
192.168.159.202 hadoop03
[2] Hardware requirements: each virtual machine should have at least 2 cores and 4 GB of RAM; if host memory is genuinely limited, 2 cores and 3 GB will also work.
Deployment Architecture
Configuration Files
These fall into three categories:
[1] Read-only default configuration files, including:
hadoop-2.7.3/share/doc/hadoop/hadoop-project-dist/hadoop-common/core-default.xml
hadoop-2.7.3/share/doc/hadoop/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
hadoop-2.7.3/share/doc/hadoop/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
hadoop-2.7.3/share/doc/hadoop/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
[2] Site-specific configuration files, including:
hadoop-2.7.3/etc/hadoop/core-site.xml
hadoop-2.7.3/etc/hadoop/hdfs-site.xml
hadoop-2.7.3/etc/hadoop/yarn-site.xml
hadoop-2.7.3/etc/hadoop/mapred-site.xml
[3] Environment scripts, located at hadoop-2.7.3/etc/hadoop/*-env.sh
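The split matters because keys set in the site files override the same keys in the read-only defaults at runtime. A self-contained sketch of that precedence (the .properties files below are illustrative stand-ins created in /tmp, not real Hadoop files):

```shell
# Stand-in demo of Hadoop's configuration layering: a key set in a site
# file overrides the same key in the read-only defaults.
mkdir -p /tmp/conf-demo
echo 'dfs.replication=3' > /tmp/conf-demo/defaults.properties
echo 'dfs.replication=2' > /tmp/conf-demo/site.properties
# awk keeps the last value seen per key, so the site file wins,
# mirroring how Hadoop layers *-site.xml on top of *-default.xml.
awk -F= '{v[$1]=$2} END {for (k in v) print k "=" v[k]}' \
    /tmp/conf-demo/defaults.properties /tmp/conf-demo/site.properties
```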
Note: the following operations are performed on hadoop01.
Step 1: Operating System Configuration
[1] Change the content of /etc/hostname to hadoop01
[2] Change the content of /etc/hosts to:
127.0.0.1 localhost
192.168.159.200 hadoop01
192.168.159.201 hadoop02
192.168.159.202 hadoop03
[3] Reboot the operating system
[root@centos7 ~]# init 6
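The three sub-steps above can be scripted. A minimal sketch that stages the new /etc/hosts in /tmp first, so it can be reviewed before being copied into place (the final copy and reboot are left as comments):

```shell
# Stage the new /etc/hosts rather than overwriting it directly.
cat > /tmp/hosts.new <<'EOF'
127.0.0.1 localhost
192.168.159.200 hadoop01
192.168.159.201 hadoop02
192.168.159.202 hadoop03
EOF
# Sanity check: all three cluster hosts are mapped.
grep -c '^192\.168\.159\.' /tmp/hosts.new
# Then, as root on hadoop01:
#   echo hadoop01 > /etc/hostname
#   cp /tmp/hosts.new /etc/hosts
#   init 6
```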
Step 2: Download the Hadoop and ZooKeeper Packages
[1] Hadoop version 2.7.3 is used.
Download from the official archive: https://archive.apache.org/dist/hadoop/core/hadoop-2.7.3/hadoop-2.7.3.tar.gz or from my Baidu Netdisk: https://pan.baidu.com/s/1I351UowJLfkClf6v0iRytA (access code: 5c9d)
[2] ZooKeeper version 3.4.6 is used.
Download from the official archive: https://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz or from my Baidu Netdisk: https://pan.baidu.com/s/1QQ6WKKI2fdg4JOvJ2tlQeg (access code: lgkx)
Step 3: Extract the Hadoop and ZooKeeper Packages
[1] Upload hadoop-2.7.3.tar.gz and zookeeper-3.4.6.tar.gz to the /root directory.
[root@hadoop01 ~]# cd /opt/
[root@hadoop01 opt]# tar zxf ~/hadoop-2.7.3.tar.gz
[root@hadoop01 opt]# tar zxf ~/zookeeper-3.4.6.tar.gz
[2] Create the directories Hadoop needs
[root@hadoop01 opt]# mkdir -p /opt/hadoop-2.7.3/data/namenode
[root@hadoop01 opt]# mkdir -p /opt/hadoop-2.7.3/data/datanode
[3] Create the directory ZooKeeper needs
[root@hadoop01 opt]# mkdir -p /opt/zookeeper-3.4.6/data
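The directory layout from [2] and [3] can be created and verified in one pass. PREFIX below is a stand-in so the sketch can be tried anywhere; on the real nodes it would be /opt:

```shell
# Create the NameNode, DataNode, and ZooKeeper data directories under a
# configurable prefix (use PREFIX=/opt on the actual servers).
PREFIX=${PREFIX:-/tmp/ha-demo}
mkdir -p "$PREFIX/hadoop-2.7.3/data/namenode" \
         "$PREFIX/hadoop-2.7.3/data/datanode" \
         "$PREFIX/zookeeper-3.4.6/data"
# Confirm all three directories now exist.
ls -d "$PREFIX/hadoop-2.7.3/data/namenode" \
      "$PREFIX/hadoop-2.7.3/data/datanode" \
      "$PREFIX/zookeeper-3.4.6/data"
```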
Step 4: Configure Hadoop
[1] Configure hadoop-env.sh
[root@hadoop01 hadoop-2.7.3]# vim etc/hadoop/hadoop-env.sh
Edit etc/hadoop/hadoop-env.sh and change the value of JAVA_HOME as follows:
# The java implementation to use.
export JAVA_HOME=/usr/java/jdk1.8.0_171-amd64/jre
[2] Configure core-site.xml
Edit etc/hadoop/core-site.xml and change it as follows:
<configuration>
  <property>
    <!-- The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. Default value: file:/// -->
    <name>fs.defaultFS</name>
    <value>hdfs://ha-cluster</value>
  </property>
  <property>
    <!-- A base for other temporary directories. Default value: /tmp/hadoop-${user.name} -->
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop-2.7.3/hadoop-tmp</value>
  </property>
  <property>
    <!-- A list of ZooKeeper server addresses, separated by commas, that are to be used by the ZKFailoverController in automatic failover. -->
    <name>ha.zookeeper.quorum</name>
    <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
  </property>
</configuration>
Note: the default value of hadoop.tmp.dir is "/tmp/hadoop-${user.name}". This directory is cleared when the Linux operating system reboots, which can lead to data loss, so the value must be changed.
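To confirm that the ha.zookeeper.quorum value really lists three ZooKeeper servers, a self-contained check (the XML fragment is inlined into /tmp here for illustration; on a real node you would point sed at etc/hadoop/core-site.xml instead):

```shell
# Extract ha.zookeeper.quorum from a core-site.xml fragment and count the
# comma-separated ZooKeeper servers; this cluster should list three.
cat > /tmp/core-site-demo.xml <<'EOF'
<property>
  <name>ha.zookeeper.quorum</name>
  <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
</property>
EOF
quorum=$(sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p' /tmp/core-site-demo.xml)
echo "$quorum" | tr ',' '\n' | wc -l
```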
[3] Configure hdfs-site.xml
Edit etc/hadoop/hdfs-site.xml and change it as follows:
<configuration>
  <property>
    <!-- Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time. Default value: 3 -->
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <!-- Determines where on the local filesystem the DFS name node should store the name table (fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy. Default value: file://${hadoop.tmp.dir}/dfs/name -->
    <name>dfs.namenode.name.dir</name>
    <value>/opt/hadoop-2.7.3/data/namenode</value>
  </property>
  <property>
    <!-- Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Default value: file://${hadoop.tmp.dir}/dfs/data -->
    <name>dfs.datanode.data.dir</name>
    <value>/opt/hadoop-2.7.3/data/datanode</value>
  </property>
  <property>
    <!-- The path where the JournalNode daemon will store its local state. -->
<name>dfs.journalnode.e