I found many articles online about setting up a Hadoop HA environment, but none of them were complete, so I am writing my own record. It is not a hand-holding walkthrough, but it touches on essentially every point.
Prepare three servers:
192.168.11.70
192.168.11.71
192.168.11.72
1. Download the components
First, download the Hadoop tarball from Cloudera's CDH archive:
http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.16.1.tar.gz
2. Environment configuration
Give each machine a static IP and map the hostnames in /etc/hosts:
vi /etc/hosts
192.168.11.70 hnode1
192.168.11.71 hnode2
192.168.11.72 hnode3
Set the hostname on each machine (note: `hostname` only takes effect for the current session; on CentOS 6, persist it in /etc/sysconfig/network):
hostname hnode1
hostname hnode2
hostname hnode3
Disable the firewall:
service iptables stop
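Stopping the service only lasts until the next reboot. A sketch of keeping it off permanently (CentOS 6-era commands, matching the `service iptables` style used above; the block is a no-op on machines without `chkconfig`):

```shell
# Firewall service name on CentOS 6, this article's apparent platform.
FW_SERVICE=iptables
if command -v chkconfig >/dev/null 2>&1; then
  service "$FW_SERVICE" stop    # stop it now
  chkconfig "$FW_SERVICE" off   # keep it off across reboots
fi
```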
Configure the JDK and Hadoop environment variables:
vim ~/.bash_profile
export JAVA_HOME=/opt/modules/jdk1.8.0_171
export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$CLASSPATH
export HADOOP_HOME=/opt/modules/hadoop-2.6.0-cdh5.16.1
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
source ~/.bash_profile
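A quick sanity check that the variables took effect; the expected versions are the ones installed above, and the checks are skipped on machines without the JDK or Hadoop on the PATH:

```shell
# Each command should report the versions configured above.
EXPECTED_JDK=jdk1.8.0_171
echo "expecting JDK $EXPECTED_JDK under /opt/modules"
if command -v java >/dev/null 2>&1; then java -version; fi      # expect 1.8.0_171
if command -v hadoop >/dev/null 2>&1; then hadoop version; fi   # expect 2.6.0-cdh5.16.1
echo "HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-unset}"
```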
3. Configure Hadoop
3.1 Create a hadoop user and transfer ownership of /opt/modules from root (run on every node):
useradd -m hadoop -s /bin/bash
passwd hadoop
chown -R hadoop /opt/modules
3.2 Grant the hadoop user administrative (sudo) privileges; add the hadoop line below root's entry:
visudo
## Next comes the main part: which users can run what software on
## which machines (the sudoers file can be shared between multiple
## systems).
## Syntax:
##
## user MACHINE=COMMANDS
##
## The COMMANDS section may have other options added to it.
##
## Allow root to run any commands anywhere
root ALL=(ALL) ALL
hadoop  ALL=(ALL)       ALL
3.3 Installing Hadoop under /home/hadoop is not recommended; install it under /opt instead. Switch to the hadoop user, extract the tarball, and move it to /opt/modules.
3.4 As the hadoop user, set up passwordless SSH login.
# Generate an SSH key pair; run on each of the three machines
ssh-keygen -t rsa
# Create the local authorized_keys file (run inside ~/.ssh on each node)
cat *.pub > authorized_keys
# Gather every node's authorized_keys into one combined file; run on 192.168.11.70
scp -r hadoop@192.168.11.71:/home/hadoop/.ssh/authorized_keys authorized_keys_node2
scp -r hadoop@192.168.11.72:/home/hadoop/.ssh/authorized_keys authorized_keys_node3
# Merge them
cat authorized_keys_node2 >> authorized_keys
cat authorized_keys_node3 >> authorized_keys
# Distribute: run on 192.168.11.71
scp -r hadoop@192.168.11.70:/home/hadoop/.ssh/authorized_keys /home/hadoop/.ssh
# Distribute: run on 192.168.11.72
scp -r hadoop@192.168.11.70:/home/hadoop/.ssh/authorized_keys /home/hadoop/.ssh
# Fix the permissions; without this, SSH still asks for a password despite the keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
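Before moving on, it is worth verifying that each node can reach the others without a password. A sketch using the hostnames mapped in /etc/hosts above; BatchMode makes a misconfigured login fail with an error instead of falling back to a prompt:

```shell
# Run on each node as the hadoop user; every iteration should print the
# remote hostname with no password prompt.
NODES="hnode1 hnode2 hnode3"
for h in $NODES; do
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$h" hostname \
    || echo "passwordless login to $h failed"
done
```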
4. Install and configure ZooKeeper on all three nodes.
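The article leaves the ZooKeeper details out. A minimal sketch, assuming a ZooKeeper 3.4-series tarball unpacked under /opt/modules on all three nodes; the dataDir path is this sketch's assumption, not from the article:

```
# conf/zoo.cfg, identical on all three nodes
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data/zookeeper
clientPort=2181
server.1=hnode1:2888:3888
server.2=hnode2:2888:3888
server.3=hnode3:2888:3888
```

Each node also needs a myid file in dataDir holding its own id (1 on hnode1, 2 on hnode2, 3 on hnode3). Then start the quorum with `bin/zkServer.sh start` on every node and confirm with `bin/zkServer.sh status`.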
5. Create the DataNode and NameNode directories
sudo mkdir /data
sudo mkdir /data/dfs
sudo mkdir /data/dfs/tmp
sudo mkdir /data/dfs/dn
sudo mkdir /data/dfs/nn
sudo mkdir /data/dfs/jn
sudo mkdir /data/hadoop-yarn
sudo chown -R hadoop.hadoop /data
sudo mkdir /var/log/hadoop-yarn
sudo chown -R hadoop.hadoop /var/log/hadoop-yarn
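The same layout can be built in one re-runnable command, since `mkdir -p` creates parents and tolerates existing directories. The sketch below uses a scratch root so it can be tried anywhere; on the real nodes the root is /data, created with sudo and chown'ed as above:

```shell
# Scratch root stands in for /data in this sketch.
ROOT=$(mktemp -d)
mkdir -p "$ROOT"/dfs/tmp "$ROOT"/dfs/dn "$ROOT"/dfs/nn "$ROOT"/dfs/jn \
         "$ROOT"/hadoop-yarn
ls "$ROOT/dfs"    # dn  jn  nn  tmp
```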
6. Edit the configuration files
The configuration files live in $HADOOP_CONF_DIR (set to $HADOOP_HOME/etc/hadoop above). The main files:
File | Type | Description |
---|---|---|
yarn-site.xml | XML | Settings for the YARN daemons: ResourceManager, NodeManager, etc. |
yarn-env.sh | Bash script | YARN runtime environment variables |
slaves | Plain text | Machines that run a DataNode and NodeManager, one per line |
mapred-site.xml | XML | Settings for the MapReduce framework |
mapred-queues.xml | XML | MapReduce queue settings |
log4j.properties | Java properties | Properties for the system log, the NameNode audit log, and the task logs of DataNode child processes |
hdfs-site.xml | XML | Settings for the HDFS daemons: NameNode, SecondaryNameNode, DataNode, JournalNode, etc. |
hadoop-metrics.properties | Java properties | Controls how metrics are published in Hadoop |
hadoop-metrics2.properties | Java properties | Controls how metrics are published in Hadoop |
hadoop-env.sh | Bash script | Hadoop runtime environment variables |
exclude | Plain text | List of DataNodes to decommission |
core-site.xml | XML | Hadoop-wide settings, e.g. I/O settings shared by HDFS and MapReduce |
container-executor.cfg | Cfg | YARN container-executor settings |
capacity-scheduler.xml | XML | Capacity Scheduler settings |
6.1 Edit hadoop-env.sh
export JAVA_HOME=/opt/modules/jdk1.8.0_171
6.2 Edit core-site.xml
<configuration>
<!-- The default filesystem: the HDFS nameservice "erph"; must match the nameservice defined in hdfs-site.xml -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://erph</value>
</property>
<!-- Hadoop scratch directory; a plain local path (no file: scheme) -->
<property>
<name>hadoop.tmp.dir</name>
<value>/data/dfs/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<!-- ZooKeeper quorum used for automatic failover -->
<property>
<name>ha.zookeeper.quorum</name>
<value>hnode1:2181,hnode2:2181,hnode3:2181</value>
</property>
</configuration>
6.3 Edit hdfs-site.xml
<configuration>
<!-- Enable WebHDFS -->
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<!-- Not used in an HA setup (there is no SecondaryNameNode) -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hnode1:50090</value>
</property>
<!-- Number of replicas per block (3 is the default) -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- Local directory where the NameNode stores the namespace image (fsimage) -->
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/data/dfs/nn</value>
</property>
<!-- Local directory where the NameNode stores the transaction log (edits) -->
<property>
<name>dfs.namenode.edits.dir</name>
<value>${dfs.namenode.name.dir}</value>
</property>
<!-- Local directory where the DataNode stores blocks -->
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/data/dfs/dn</value>
</property>
<!-- Block size: 134217728 bytes = 128 MB (the default) -->
<property>
<name>dfs.blocksize</name>
<value>134217728</value>
</property>
<!--======================================================================= -->
<!-- HDFS high-availability settings -->
<!-- Logical name of the nameservice -->
<property>
<name>dfs.nameservices</name>
<value>erph</value>
</property>
<!-- The NameNode IDs -->
<property>
<name>dfs.ha.namenodes.erph</name>
<value>hnode1,hnode2</value>
</property>
<!-- HDFS HA: dfs.namenode.rpc-address.[nameservice ID].[NameNode ID], the RPC address of each NameNode -->
<property>
<name>dfs.namenode.rpc-address.erph.hnode1</name>
<value>hnode1:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.erph.hnode2</name>
<value>hnode2:8020</value>
</property>
<!-- HDFS HA: dfs.namenode.http-address.[nameservice ID].[NameNode ID], the HTTP address of each NameNode -->
<property>
<name>dfs.namenode.http-address.erph.hnode1</name>
<value>hnode1:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.erph.hnode2</name>
<value>hnode2:50070</value>
</property>
<!--================== NameNode edit-log synchronization ============================================ -->
<!-- JournalNode servers; the QuorumJournalManager stores the edit log on them -->
<!-- Format: qjournal://<host1:port1>;<host2:port2>;<host3:port3>/<journalId>, port as in dfs.journalnode.rpc-address -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hnode1:8485;hnode2:8485;hnode3:8485/erph</value>
</property>
<!-- Where the JournalNodes keep their data -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/data/dfs/jn</value>
</property>
<!--================== Client failover ============================================ -->
<property>
<!-- How DataNodes and clients determine which NameNode is currently active -->
<name>dfs.client.failover.proxy.provider.erph</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!--================== NameNode fencing =============================================== -->
<!-- After a failover, keep the previously active NameNode from serving again and producing two active services; shell(/bin/true) skips real fencing and relies on the journal quorum's single-writer guarantee -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>shell(/bin/true)</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
<!-- Milliseconds after which fencing is considered to have failed -->
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
<!--================== Automatic NameNode failover via ZKFC and ZooKeeper ====================== -->
<!-- Enable automatic failover: the ZKFC processes watch whether a NameNode has died -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>hnode1:2181,hnode2:2181,hnode3:2181</value>
</property>
<property>
<!-- ZooKeeper session timeout, in milliseconds -->
<name>ha.zookeeper.session-timeout.ms</name>
<value>2000</value>
</property>
<property>
<name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
<value>true</value>
</property>
</configuration>
6.4 Edit mapred-site.xml
The directory ships only a mapred-site.xml.template; cp it to mapred-site.xml first.
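A guarded sketch of that copy; the conf path falls back to this article's install location when $HADOOP_CONF_DIR is unset:

```shell
CONF_DIR=${HADOOP_CONF_DIR:-/opt/modules/hadoop-2.6.0-cdh5.16.1/etc/hadoop}
# Only copy where the template actually exists (no-op elsewhere).
if [ -f "$CONF_DIR/mapred-site.xml.template" ]; then
  cp "$CONF_DIR/mapred-site.xml.template" "$CONF_DIR/mapred-site.xml"
fi
```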
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>
$HADOOP_CONF_DIR,
$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*,
$HADOOP_COMMON_HOME/share/hadoop/common/*,
$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,
$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,
$HADOOP_YARN_HOME/share/hadoop/yarn/*,
$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
</value>
</property>
</configuration>
6.5 Edit yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>file:///data/hadoop-yarn/cache/${user.name}/nm-local-dir</value>
<description>List of directories to store localized files in.</description>
</property>
<property>
<description>Where to store container logs.</description>
<name>yarn.nodemanager.log-dirs</name>
<value>file:///var/log/hadoop-yarn/containers</value>
</property>
<property>
<description>Classpath for typical applications.</description>
<name>yarn.application.classpath</name>
<value>
$HADOOP_CONF_DIR,
$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*,
$HADOOP_COMMON_HOME/share/hadoop/common/*,
$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,
$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*,
$HADOOP_YARN_HOME/share/hadoop/yarn/*,
$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
</value>
</property>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>erph-yarn</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>hnode3</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>hnode2</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>hnode1:2181,hnode2:2181,hnode3:2181</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>hnode3:8089</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>hnode2:8089</value>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>5</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>32</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>hdfs://erph/var/log/hadoop-yarn/apps</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>8</value>
</property>
<property>
<name>yarn.nodemanager.delete.debug-delay-sec</name>
<value>86400</value>
</property>
<property>
<name>yarn.nodemanager.log.retain-seconds</name>
<value>86400</value>
</property>
</configuration>
6.6 Edit the slaves file
Every host listed here runs a DataNode (and NodeManager):
hnode1
hnode2
hnode3
7. Start-up commands
7.1 First-time start-up
On one of the NameNode nodes, initialize the HA state znode in ZooKeeper:
hdfs zkfc -formatZK
Start the JournalNodes; run on all three machines:
sbin/hadoop-daemon.sh start journalnode
On the primary NameNode, format the NameNode and JournalNode directories:
hdfs namenode -format -clusterId erph
Start the NameNode process on the primary node:
sbin/hadoop-daemon.sh start namenode
On the standby NameNode, run the first command below. It formats the standby's directories and copies the metadata over from the primary NameNode, without re-formatting the JournalNode directories. Then start the standby NameNode with the second command:
hdfs namenode -bootstrapStandby
sbin/hadoop-daemon.sh start namenode
Run the following on both NameNode nodes to enable automatic failover; the ZKFC processes monitor the active and standby NameNodes:
sbin/hadoop-daemon.sh start zkfc
Start the DataNode on every DataNode host:
sbin/hadoop-daemon.sh start datanode
7.2 Routine start/stop
sbin/start-dfs.sh
sbin/stop-dfs.sh
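The commands above start only HDFS; the YARN HA pair configured in yarn-site.xml has its own start-up. A sketch, guarded so it is a no-op without a cluster; the rm1/rm2 ids and their hosts come from the yarn-site.xml above, and each command must run on the host named in its comment:

```shell
RM1_HOST=hnode3   # yarn.resourcemanager.hostname.rm1
RM2_HOST=hnode2   # yarn.resourcemanager.hostname.rm2
if [ -n "$HADOOP_HOME" ] && [ -x "$HADOOP_HOME/sbin/start-yarn.sh" ]; then
  "$HADOOP_HOME/sbin/start-yarn.sh"                         # on hnode3: RM plus all NodeManagers
  "$HADOOP_HOME/sbin/yarn-daemon.sh" start resourcemanager  # on hnode2: start-yarn.sh does not start the standby RM
  "$HADOOP_HOME/bin/yarn" rmadmin -getServiceState rm1      # expect: active
  "$HADOOP_HOME/bin/yarn" rmadmin -getServiceState rm2      # expect: standby
fi
```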
8. Add the native libraries
Check the native libraries with hadoop checknative -a:
[hadoop@hnode1 sbin]$ hadoop checknative -a
19/01/04 13:55:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Native library checking:
hadoop: false
zlib: false
snappy: false
lz4: false
bzip2: false
openssl: false
19/01/04 13:55:10 INFO util.ExitUtil: Exiting with status 1
Create a native directory:
mkdir native
cd native
Download the RPM that matches this Hadoop version:
http://archive.cloudera.com/cdh5/redhat/6/x86_64/cdh/5.16.1/RPMS/x86_64/hadoop-2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3.el6.x86_64.rpm
Extract the RPM contents:
rpm2cpio hadoop-2.6.0+cdh5.16.1+2848-1.cdh5.16.1.p0.3.el6.x86_64.rpm | cpio -idmv
Copy everything under usr/lib/hadoop/lib/native in the extracted tree into $HADOOP_HOME/lib/native.
Run the check again:
[hadoop@hnode1 lib]$ hadoop checknative -a
19/01/04 13:57:32 WARN bzip2.Bzip2Factory: Failed to load/initialize native-bzip2 library system-native, will use pure-Java version
19/01/04 13:57:32 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /opt/modules/hadoop-2.6.0-cdh5.16.1/lib/native/libhadoop.so.1.0.0
zlib: true /lib64/libz.so.1
snappy: false
lz4: true revision:99
bzip2: false
openssl: false Cannot load libcrypto.so (libcrypto.so: cannot open shared object file: No such file or directory)!
19/01/04 13:57:32 INFO util.ExitUtil: Exiting with status 1
Without the native libraries, every hdfs command prints this warning:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
9. Verification
Open http://192.168.11.70:50070 in a browser; the page should show that hnode1 is the active NameNode.
Open http://192.168.11.71:50070; it should show that hnode2 is the standby NameNode.
Then, on 192.168.11.70, find the NameNode process with jps and kill it; the NameNode that was standby becomes active.
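The same check can be scripted with `hdfs haadmin`; the NameNode ids hnode1/hnode2 are the dfs.ha.namenodes.erph values configured above, and the block is a no-op on machines without the hdfs client:

```shell
NN_IDS="hnode1 hnode2"   # dfs.ha.namenodes.erph
if command -v hdfs >/dev/null 2>&1; then
  for id in $NN_IDS; do
    # Prints "active" or "standby" for each NameNode id
    hdfs haadmin -getServiceState "$id"
  done
fi
```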