Periodically check the status of Spark Streaming jobs and save it to MySQL
1. Save data to MySQL with Python 3
vi rlt_log_job_dinc.py
import pymysql
import logging
import time
import sys

db_name = '*'
db_user = '*'
db_pass = '*'
db_ip = 'ip'
db_port = 3306

def writeDb(sql, db_data=()):
    """Execute a write statement against MySQL; return True on success."""
    try:
        conn = pymysql.connect(db=db_name, user=db_user, passwd=db_pass,
                               host=db_ip, port=int(db_port), charset="utf8")
        cursor = conn.cursor()
    except Exception as e:
        logging.error('error:%s' % e)
        return False
    try:
        cursor.execute(sql, db_data)
        conn.commit()
    except Exception as e:
        conn.rollback()
        logging.error('error:%s' % e)
        return False
    finally:
        cursor.close()
        conn.close()
    return True

jobName = sys.argv[1]
# The monitor script calls this without a job ID when a restart fails,
# so fall back to an empty string instead of raising IndexError.
jobId = sys.argv[2] if len(sys.argv) > 2 else ''
updateTime = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime())
sql = """REPLACE INTO rlt_log_job_dinc(job_name, job_id, bz, update_time) VALUES (%s, %s, 'sparkstreaming-job', %s)"""
data = (jobName, jobId, updateTime)
result = writeDb(sql, data)
Configure your own database connection, then test the script:
python3 rlt_log_job_dinc.py test 123
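The REPLACE INTO above only behaves as an upsert if rlt_log_job_dinc has a PRIMARY KEY or UNIQUE constraint on job_name; the table's schema is not shown here, so that is an assumption. SQLite happens to support the same REPLACE INTO syntax, so the intended behavior can be sketched in a self-contained way (in-memory database, `?` placeholders instead of PyMySQL's `%s`):

```python
import sqlite3

# In-memory stand-in for the MySQL table. The PRIMARY KEY on job_name is an
# assumption -- without a unique key, REPLACE INTO would keep inserting rows.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE rlt_log_job_dinc (
    job_name    TEXT PRIMARY KEY,
    job_id      TEXT,
    bz          TEXT,
    update_time TEXT)""")

sql = ("REPLACE INTO rlt_log_job_dinc(job_name, job_id, bz, update_time) "
       "VALUES (?, ?, 'sparkstreaming-job', ?)")
conn.execute(sql, ("test", "123", "2024-01-01 00:00:00"))
conn.execute(sql, ("test", "456", "2024-01-01 00:10:00"))  # same job_name: row replaced
conn.commit()

rows = conn.execute("SELECT job_name, job_id FROM rlt_log_job_dinc").fetchall()
print(rows)  # -> [('test', '456')]
```

Each monitored job therefore keeps exactly one row, updated in place on every check.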
2. Write the shell script that monitors the Spark Streaming jobs
vi monitor.sh
#!/bin/bash
echo $(date +%F%n%T)
# List the Spark Streaming job names to monitor here (placeholder values).
jobNames="jobA jobB"
for jobName in ${jobNames}
do
    counter=5
    echo ${jobName}
    while [ ${counter} -ne 0 ]
    do
        jobId=$(ps -ef | grep ${jobName} | grep -v grep | awk '{print $2}')
        echo ${jobId}
        if [ "${jobId}" = "" ]; then
            echo "spark job ${jobName} will restart!!!"
            sh /root/sparkstreaming_jar/monitor/run_job/${jobName}.sh
            sleep 2m
        else
            echo "spark job ${jobName} is running!"
            sh /root/sparkstreaming_jar/monitor/rlt_log_job_dinc.sh ${jobName} ${jobId}
            break
        fi
        sleep 1m
        counter=$((counter - 1))
    done
    if [ ${counter} -eq 0 ]; then
        run_date=$(date '+%F %T')
        echo "online ${jobName} restart failed!!!"
        message="Error! online ${jobName} restart failed!!! date:${run_date}"
        sh /root/sparkstreaming_jar/monitor/rlt_log_job_dinc.sh ${jobName}
        exit 1
    fi
done
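The retry logic above polls up to five times, restarting the job whenever no PID is found, and only reports failure once the counter is exhausted. A toy stand-in (no real processes, no sleeps) that mimics a job coming back on the third check:

```shell
#!/bin/sh
# Simplified model of the monitor's retry loop: "status" plays the role of
# the ps/grep check, and flips to "up" on the third poll.
attempts=0
counter=5
status="down"
while [ ${counter} -ne 0 ]; do
    attempts=$((attempts + 1))
    if [ ${attempts} -ge 3 ]; then
        status="up"          # simulate the job being visible in ps again
    fi
    if [ "${status}" = "up" ]; then
        echo "recovered after ${attempts} checks"
        break
    fi
    counter=$((counter - 1))
done
if [ ${counter} -eq 0 ]; then
    echo "gave up"
fi
```

Because the loop breaks as soon as the job is seen, `counter` only reaches zero when all five checks fail, which is exactly the condition the real script uses to report an unrecoverable restart.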
3. rlt_log_job_dinc.sh used in step 2
A shell wrapper that runs the Python script, for reference:
vi rlt_log_job_dinc.sh
#!/bin/bash
jobName=$1
jobId=$2
# Quote the arguments so an empty job ID is still passed through.
python3 /root/sparkstreaming_jar/monitor/rlt_log_job_dinc.py "${jobName}" "${jobId}"
4. run_job/${jobName}.sh used in step 2 is the restart script for each Spark Streaming job
Configure it for your own environment.
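The original post does not show a restart script, so as a reference only, one might look like the following. The class name, jar path, master, and resource settings are all placeholders to adapt to your own job and cluster:

```shell
#!/bin/bash
# Hypothetical restart script, e.g. run_job/jobA.sh.
# Class, jar path, and resources below are placeholders, not from the original post.
nohup spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.JobAStreaming \
  --driver-memory 1g \
  --executor-memory 2g \
  --num-executors 2 \
  /root/sparkstreaming_jar/jobA.jar \
  > /root/sparkstreaming_jar/monitor/run_job/jobA_submit.log 2>&1 &
```

Backgrounding with nohup matters here: monitor.sh only sleeps two minutes after a restart, so the submit must not block the monitor loop.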
5. Run the monitor manually:
sh monitor.sh
6. Schedule it with cron
vi /etc/crontab
*/10 * * * * root cd /root/sparkstreaming_jar/monitor; sh monitor.sh > /root/sparkstreaming_jar/monitor/monitor.log 2>&1
Restart the cron service:
systemctl restart crond.service
7. Configure a Kettle job to monitor the MySQL table and send email notifications