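Automated monitoring and restart of Spark processes. In production, a service process can die unexpectedly, so automatic monitoring and restart are very important.
1. Shell script for monitoring the master node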
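#!/bin/bash
#
# Look up the PID of the Spark Master process; grep -v grep drops any grep command from the result.
master=$(ps -ef | grep Master | grep spark | grep -v grep | awk '{print $2}')
echo "$master"
if [ -z "$master" ]; then
    # No Master process found: clean up any stale state, then start it again.
    echo "Spark Master is down, restarting!"
    /opt/modules/spark/sbin/stop-master.sh
    /opt/modules/spark/sbin/start-master.sh
else
    echo "Spark Master is alive!"
fi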
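The script above matches on the loose keywords "Master" and "spark", which could also match an unrelated process whose command line happens to contain both. A minimal sketch of a stricter check follows. It assumes the standalone Master runs with the JVM main class org.apache.spark.deploy.master.Master; the helper name pids_matching is made up for this illustration and is not part of Spark.

```shell
#!/bin/bash
# Sketch: identify the Master by its JVM main class instead of loose keywords.
# Assumption: the standalone Master is launched with this main class.
MASTER_CLASS="org.apache.spark.deploy.master.Master"

# Hypothetical helper: read `ps -ef` style output on stdin and print the
# PID column (field 2) of every line containing the literal string in $1.
pids_matching() {
    awk -v pat="$1" 'index($0, pat) > 0 { print $2 }'
}

# Take the process snapshot first, so the awk filter is not yet running
# and cannot appear in the list it filters. If ps fails, treat it as empty.
snapshot=$(ps -ef) || snapshot=""
master=$(printf '%s\n' "$snapshot" | pids_matching "$MASTER_CLASS")

if [ -z "$master" ]; then
    echo "Spark Master is not running"
else
    echo "Spark Master is alive (PID: $master)"
fi
```

The same helper could be pointed at the Worker by passing org.apache.spark.deploy.worker.Worker, under the same assumption about the main class name.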
2. Shell script for monitoring the slave (Worker) node
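#!/bin/bash
#
# Look up the PID of the Spark Worker that is attached to the master on port 7077.
slave=$(ps -ef | grep Worker | grep spark | grep 7077 | grep -v grep | awk '{print $2}')
echo "$slave"
if [ -z "$slave" ]; then
    # No Worker process found: restart it and point it at the master URL.
    echo "Spark Worker is down, restarting!"
    /opt/modules/spark/sbin/stop-slave.sh
    /opt/modules/spark/sbin/start-slave.sh spark://10.130.2.20:7077
else
    echo "Spark Worker is alive!"
fi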
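The Worker script hard-codes both the Spark installation path and the master address. One possible variation, shown as a sketch, keeps them in variables so the same script can be reused on other hosts. MASTER_HOST, MASTER_PORT, master_url, and restart_worker are illustrative names chosen here; the defaults are simply the values used in the script above.

```shell
#!/bin/bash
# Sketch: keep the install path and master address in one place.
# The defaults mirror the values from the Worker script; override them
# through the environment if your cluster differs.
SPARK_HOME="${SPARK_HOME:-/opt/modules/spark}"
MASTER_HOST="${MASTER_HOST:-10.130.2.20}"   # hypothetical variable name
MASTER_PORT="${MASTER_PORT:-7077}"          # hypothetical variable name

# Build the standalone master URL from a host and a port.
master_url() {
    printf 'spark://%s:%s\n' "$1" "$2"
}

# Stop the Worker, then start it again against the configured master.
restart_worker() {
    "$SPARK_HOME/sbin/stop-slave.sh"
    "$SPARK_HOME/sbin/start-slave.sh" "$(master_url "$MASTER_HOST" "$MASTER_PORT")"
}
```

With these definitions, the branch that handles a missing Worker would call restart_worker instead of the two hard-coded script paths.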
3. Add the script to crontab
*/1 * * * * /opt/bin/monitorSparkSlave.sh
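The check runs once a minute; if the service process does not exist, the service is restarted.
Note that when restarting a slave node, you must specify the master's IP, as in spark://10.130.2.20:7077 above.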
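Because cron runs the monitor unattended every minute, its echo output is usually mailed to the job's owner or lost, so a restart can go unnoticed. A minimal sketch of timestamped logging that the monitor scripts could use in place of plain echo is shown below. The log path and the function name log_msg are assumptions for illustration only.

```shell
#!/bin/bash
# Sketch: timestamped logging for the monitor scripts.
# Hypothetical log path; pick a location that suits your hosts.
LOG_FILE="${LOG_FILE:-/tmp/monitorSpark.log}"

# Append a message to LOG_FILE, prefixed with the current date and time.
log_msg() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$LOG_FILE"
}

log_msg "monitor check started"
```

Replacing echo "Spark Worker is down, restarting!" with log_msg "Spark Worker is down, restarting!" would leave a dated record of every restart, which makes it easier to see how often a process is dying.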