In the previous chapter, the fully distributed cluster provided a standby only for the namenode, not for the resourcemanager, and it could not fail over automatically, which is dangerous in a production environment. This chapter describes how to build an HA Spark distributed computing cluster.
1. Preparation
(1) jdk1.8.0_171.zip
(2) scala-2.11.1.tgz
(3) zookeeper-3.4.10.tar.gz
(4) hadoop-3.0.3.tar.gz
(5) spark-2.3.1-bin-hadoop2.7.tgz
Servers and role assignment:
10.10.10.1 spark01 namenode(active) resourcemanager(active) zookeeper
10.10.10.2 spark02 namenode(standby) zookeeper
10.10.10.3 spark03 resourcemanager(standby) zookeeper
10.10.10.4 spark04 datanode worker journalnode nodemanager
10.10.10.5 spark05 datanode worker journalnode nodemanager
10.10.10.6 spark06 datanode worker journalnode nodemanager
2. Configure hosts and passwordless SSH login
See the previous chapter.
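For reference, the /etc/hosts entries implied by the role table above look like this (a sketch; adjust to your own network):

```
10.10.10.1 spark01
10.10.10.2 spark02
10.10.10.3 spark03
10.10.10.4 spark04
10.10.10.5 spark05
10.10.10.6 spark06
```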
3. Install the JDK and Scala
See the previous chapter.
4. Set up the ZooKeeper cluster
Extract zookeeper to /usr/local/zookeeper, change into /usr/local/zookeeper/conf, and create the config file:
cp zoo_sample.cfg zoo.cfg
vim zoo.cfg
# add the following settings
# where zookeeper stores its data
dataDir=/usr/local/zookeeper/datadir
# where zookeeper stores its transaction logs
dataLogDir=/usr/local/zookeeper/datalogdir
# 2888 is the atomic-broadcast port, 3888 the leader-election port;
# add one server.N line per zookeeper node.
server.1=spark01:2888:3888
server.2=spark02:2888:3888
server.3=spark03:2888:3888
Create the zookeeper data directory and write the myid file:
mkdir -p /usr/local/zookeeper/datadir
cd /usr/local/zookeeper/datadir
vim myid
1
myid records this server's id within the zookeeper ensemble.
Copy zookeeper to the other two servers and adjust their myid files:
scp -r /usr/local/zookeeper root@spark02:/usr/local/
scp -r /usr/local/zookeeper root@spark03:/usr/local/
# set myid to 2 on spark02 and to 3 on spark03
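The scp-then-edit steps can also be scripted. A minimal sketch of the myid bookkeeping, written against a temporary directory so it can be dry-run locally (in practice each file would be pushed to /usr/local/zookeeper/datadir on its host, e.g. over ssh):

```shell
# Dry-run: generate one myid file per server; the id must match the
# server.N line for that host in zoo.cfg.
tmp=$(mktemp -d)
i=1
for host in spark01 spark02 spark03; do
  mkdir -p "$tmp/$host/datadir"
  echo "$i" > "$tmp/$host/datadir/myid"   # in practice: scp to the host
  i=$((i + 1))
done
cat "$tmp/spark02/datadir/myid"
```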
Then configure the ZOOKEEPER_HOME environment variable on each server:
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin
Start the zookeeper ensemble by running the following on each of the three servers:
/usr/local/zookeeper/bin/zkServer.sh start
# check the server's state with
/usr/local/zookeeper/bin/zkServer.sh status
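zkServer.sh status reports, among other lines, a Mode: line telling you whether the node is the leader or a follower. A sketch of pulling that field out, run here against sample output (the real command needs the running ensemble):

```shell
# Sample of what `zkServer.sh status` prints on the leader node.
sample="ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: leader"
# Extract just the role; one node should report leader, the others follower.
mode=$(printf '%s\n' "$sample" | awk -F': ' '/^Mode/ {print $2}')
echo "$mode"
```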
5. Set up the Hadoop HA cluster
Extract hadoop to /usr/local/hadoop and change into /usr/local/hadoop/etc/hadoop.
Edit hadoop-env.sh:
vim hadoop-env.sh
export JAVA_HOME=/usr/local/java
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_LIBEXEC_DIR=$HADOOP_HOME/libexec
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
export HDFS_DATANODE_USER=root
export HDFS_DATANODE_SECURE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_NAMENODE_USER=root
Edit core-site.xml:
vim core-site.xml
<configuration>
<!--ns is the nameservice alias defined in hdfs-site.xml-->
<property>
<name>fs.defaultFS</name>
<value>hdfs://ns</value>
</property>
<property>
<!--directory for files hadoop generates at runtime-->
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
</property>
<property>
<!--zookeeper ensemble addresses-->
<name>ha.zookeeper.quorum</name>
<value>spark01:2181,spark02:2181,spark03:2181</value>
</property>
</configuration>
Edit hdfs-site.xml:
vim hdfs-site.xml
<configuration>
<!--the hdfs nameservice is ns, matching core-site.xml-->
<property>
<name>dfs.nameservices</name>
<value>ns</value>
</property>
<!--ns contains two namenodes, nn1 and nn2-->
<property>
<name>dfs.ha.namenodes.ns</name>
<value>nn1,nn2</value>
</property>
<!--RPC address of nn1-->
<property>
<name>dfs.namenode.rpc-address.ns.nn1</name>
<value>spark01:9000</value>
</property>
<!--HTTP address of nn1-->
<property>
<name>dfs.namenode.http-address.ns.nn1</name>
<value>spark01:50070</value>
</property>
<!--RPC address of nn2-->
<property>
<name>dfs.namenode.rpc-address.ns.nn2</name>
<value>spark02:9000</value>
</property>
<!--HTTP address of nn2-->
<property>
<name>dfs.namenode.http-address.ns.nn2</name>
<value>spark02:50070</value>
</property>
<!--where the namenode edit log is stored on the JournalNodes; the standby
namenode pulls the latest state from the jn cluster to stay a hot standby-->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://spark04:8485;spark05:8485;spark06:8485/ns</value>
</property>
<!--where the JournalNodes store their data-->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/usr/local/hadoop/journaldata</value>
</property>
<!--enable automatic failover when the active namenode fails-->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!--failover proxy implementation-->
<property>
<name>dfs.client.failover.proxy.provider.ns</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!--fencing mechanism-->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<!--location of the ssh private key used for fencing-->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<!--namenode data directory; optional, defaults to the hadoop.tmp.dir
path configured in core-site.xml-->
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///usr/local/hadoop/tmp/namenode</value>
</property>
<!--datanode data directory; optional, defaults to the hadoop.tmp.dir
path configured in core-site.xml-->
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///usr/local/hadoop/tmp/datanode</value>
</property>
<!--number of block replicas-->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!--hdfs permission checking; false lets any user operate on hdfs files.
Leave this unset in production (the default is true)-->
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
Edit mapred-site.xml:
vim mapred-site.xml
<configuration>
<property>
<!--run mapreduce on yarn-->
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>
/usr/local/hadoop/etc/hadoop,
/usr/local/hadoop/share/hadoop/common/*,
/usr/local/hadoop/share/hadoop/common/lib/*,
/usr/local/hadoop/share/hadoop/hdfs/*,
/usr/local/hadoop/share/hadoop/hdfs/lib/*,
/usr/local/hadoop/share/hadoop/mapreduce/*,
/usr/local/hadoop/share/hadoop/mapreduce/lib/*,
/usr/local/hadoop/share/hadoop/yarn/*,
/usr/local/hadoop/share/hadoop/yarn/lib/*
</value>
</property>
</configuration>
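The classpath value above follows a regular pattern (conf dir, then the jars and lib jars for each subproject), so it can be generated rather than typed out. A sketch, assuming the standard layout under /usr/local/hadoop:

```shell
# Build the comma-separated mapreduce.application.classpath value.
base=/usr/local/hadoop
cpath="$base/etc/hadoop"
for sub in common hdfs mapreduce yarn; do
  cpath="$cpath,$base/share/hadoop/$sub/*,$base/share/hadoop/$sub/lib/*"
done
echo "$cpath"
```

On a working install, `hadoop classpath` prints a similar (colon-separated) list if you prefer to derive it from the live environment.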
Edit yarn-site.xml:
vim yarn-site.xml
<configuration>
<!-- enable YARN HA -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- ids of the two resourcemanagers -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!-- hosts for rm1 and rm2 -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>spark01</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>spark03</value>
</property>
<!--enable resourcemanager recovery-->
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<!--state-store implementation used for rm recovery-->
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<!-- zookeeper addresses -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>spark01:2181,spark02:2181,spark03:2181</value>
<description>For multiple zk services, separate them with comma</description>
</property>
<!-- cluster id for YARN HA -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn-ha</value>
</property>
<property>
<!--resourcemanager hostname-->
<name>yarn.resourcemanager.hostname</name>
<value>spark03</value>
</property>
<property>
<!--auxiliary service the NodeManagers run to serve shuffle data-->
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Edit the slaves file (named workers since Hadoop 3.0):
spark04
spark05
spark06
Configure the environment variables:
vim /etc/profile
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Create the directories referenced in the configuration files:
mkdir -p /usr/local/hadoop/tmp
mkdir -p /usr/local/hadoop/journaldata
mkdir -p /usr/local/hadoop/tmp/namenode
mkdir -p /usr/local/hadoop/tmp/datanode
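As a side note, bash brace expansion collapses those four commands into one; demonstrated here against a temp directory so it is easy to try (with base=/usr/local/hadoop in practice):

```shell
base=$(mktemp -d)   # stand-in for /usr/local/hadoop
mkdir -p "$base"/{journaldata,tmp/{namenode,datanode}}
ls "$base/tmp"
```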
Copy the hadoop directory to the other five servers, then start the cluster with the commands below.
# First start the zookeeper ensemble; run this on each of the three zookeeper servers
zkServer.sh start
# On the leader server, create the HA node on the zookeeper ensemble
hdfs zkfc -formatZK
# On any one of spark04, spark05 or spark06, start the journalnode cluster
hadoop-daemons.sh start journalnode
# On spark01, format the namenode and start it
hdfs namenode -format
hadoop-daemon.sh start namenode
# On spark02, bootstrap the standby namenode first, then start it
hdfs namenode -bootstrapStandby
hadoop-daemon.sh start namenode
# On spark04, spark05 and spark06, start the datanodes
hadoop-daemon.sh start datanode
# Start the zookeeper failover controller on each namenode host (spark01 and spark02)
hadoop-daemon.sh start zkfc
# On spark01, start the active resourcemanager
start-yarn.sh
# On spark03, start the standby resourcemanager
yarn-daemon.sh start resourcemanager
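Once everything is up, each node should show a predictable set of JVMs in jps; a checklist derived from the role table above (process names as the Hadoop and ZooKeeper daemons report them):

```shell
# Expected `jps` processes per node after a full start.
spark01="NameNode DFSZKFailoverController ResourceManager QuorumPeerMain"
spark02="NameNode DFSZKFailoverController QuorumPeerMain"
spark03="ResourceManager QuorumPeerMain"
workers="DataNode JournalNode NodeManager"   # spark04, spark05, spark06
echo "$spark01"
# HA state can also be queried directly:
#   hdfs haadmin -getServiceState nn1    # active / standby
#   yarn rmadmin -getServiceState rm1    # active / standby
```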
6. Set up the Spark cluster
Extract spark to /usr/local/spark and change into /usr/local/spark/conf.
Edit spark-env.sh:
export JAVA_HOME=/usr/local/java
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SPARK_HOME=/usr/local/spark
Edit the slaves file:
spark04
spark05
spark06
Copy hdfs-site.xml and core-site.xml from $HADOOP_CONF_DIR into $SPARK_HOME/conf.
The spark cluster is now complete; verify it with spark-shell --master yarn.
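For a fuller end-to-end check than the shell, you can submit the SparkPi example that ships with the distribution (jar name assumed from the spark-2.3.1 layout). The command is assembled as a string here so the sketch can be inspected without a live cluster:

```shell
# Smoke test: run SparkPi on YARN (execute cmd on a node with the client configured).
jar=/usr/local/spark/examples/jars/spark-examples_2.11-2.3.1.jar
cmd="spark-submit --master yarn --class org.apache.spark.examples.SparkPi $jar 100"
echo "$cmd"
```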
7. Restarting the cluster
Run zkServer.sh start on spark01, spark02 and spark03 to bring up the zookeeper ensemble,
then run start-all.sh on spark01 to start the rest of the cluster.