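In our production environment, HBase shares resources with the Hadoop cluster. When Hadoop jobs consume a lot of CPU, network bandwidth, or memory, RegionServers may crash, and once more than half of the cluster's RegionServers are down the cluster runs into trouble. An automatic recovery mechanism for crashed RegionServers is therefore badly needed. This article is an example of using Python and crontab to check periodically for dead RegionServer nodes and restart them.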
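I. How it works
The script checks node status through the HBase Master web UI on port 60010 (it reads the /jmx endpoint of that UI). When it detects a dead RegionServer, it connects to that RegionServer's host over SSH, runs the RegionServer start command, and sends an alert to WeCom (Enterprise WeChat).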
hbase:1.2.0-cdh5.7.0
python:3.7+
Scheduler: DolphinScheduler 3.0 (crontab also works)
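The detection step described above can be sketched as a small pure function. This is a minimal sketch, not the article's exact code: it assumes the JMX response carries a `tag.deadRegionServers` string whose entries are separated by `;` and start with the host name (which is what the script's `split(',')[0]` implies). The function name `parse_dead_region_servers` and the sample host names are hypothetical.

```python
import json


def parse_dead_region_servers(jmx_text):
    """Return the host names listed in tag.deadRegionServers."""
    beans = json.loads(jmx_text).get("beans", [])
    if not beans:
        return []
    raw = beans[0].get("tag.deadRegionServers", "")
    hosts = []
    for entry in raw.split(";"):
        entry = entry.strip()
        if entry:  # "".split(";") yields [""], so skip blank entries
            hosts.append(entry.split(",")[0])
    return hosts


# Hypothetical sample response, trimmed to the one field used here.
sample = json.dumps({
    "beans": [{
        "tag.deadRegionServers":
            "node01,60020,1650000000000;node02,60020,1650000000001"
    }]
})
print(parse_dead_region_servers(sample))
```

Keeping the parsing separate from the HTTP call makes it easy to check the empty-string case, where a plain `split(';')` would otherwise report one (empty) dead server.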
II. Implementation steps
1. Python script
Create a file named Hbase.py with the following code:
# -*- coding: utf-8 -*-
import json
import os
import time

import requests


def get_now_time():
    """
    Get the current date and time.
    :return: current date and time as a 'YYYY-MM-DD HH:MM:SS' string
    """
    now = time.localtime()
    now_time = time.strftime("%Y-%m-%d %H:%M:%S", now)
    return now_time


# Fetch the response body of a URL as text
def getHtml(url):
    r = requests.get(url, timeout=30)
    text = r.text
    # total_seconds() is the full request time; .microseconds would only be
    # the sub-second part of it
    print(r.elapsed.total_seconds())
    return text


if __name__ == '__main__':
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36',
        'cache-control': 'no-cache',
        'content-type': 'application/json',
    }
    # HBase Master JMX endpoint(s)
    urls = [
        'http://hbase-master001:60010/jmx?qry=Hadoop:service=HBase,name=Master,sub=Server']
    for url in urls:
        print(url)
        txt = getHtml(url)
        jsonObj = json.loads(str(txt))
        deadRegion = jsonObj['beans'][0]['tag.deadRegionServers']
        # Entries are separated by ';' and the host name is the first
        # ','-separated field. Skip empty entries (''.split(';') returns
        # ['']) and the value '00'.
        deads = [d.split(',')[0] for d in deadRegion.split(';')]
        deads = [name for name in deads if name and name != '00']
        if deads:
            jAlert = "**Alert notification: [" + str(len(deads)) + "]** \n"
            for regionname in deads:
                upload_time = get_now_time()
                # Start the RegionServer on the dead host, then list its JVMs
                cmd = 'ssh -n -p 22 ' + regionname + ' ' + "'/home/hadoop/hbase/bin/hbase-daemon.sh start regionserver'"
                jps = 'ssh -n -p 22 ' + regionname + ' ' + "'jps'"
                print(cmd)
                print(jps)
                f = os.popen(cmd)
                print(f.read())
                f = os.popen(jps)
                print(f.read())
                jAlert = jAlert + ">Problem: **<font color=\"warning\">" + "Hbase regionServer" + "</font>**\n" \
                    + ">Source: <font color=\"comment\">" + "Hbase" + "</font>\n" \
                    + ">Time: <font color=\"comment\">" + str(upload_time) + "</font>\n" \
                    + ">Details: <font color=\"comment\">" + regionname + " is dead" + "</font>\n" \
                    + ">Description: <font color=\"comment\">" + regionname + " is being restarted" + "</font>\n" \
                    + ">Status: <font color=\"comment\">" + "firing" + "</font>\n" \
                    + ">Severity: <font color=\"comment\">" + "critical" + "</font>\n"
                jAlert = jAlert + "————————————————\n"
            # Send one WeCom markdown message covering all dead servers
            data = {}
            markdown = {}
            markdown['content'] = jAlert
            data['msgtype'] = 'markdown'
            data['markdown'] = markdown
            jdata = json.dumps(data, sort_keys=False, indent=4, separators=(',', ': '))
            response = requests.post(
                'https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=******',
                headers=headers, data=jdata, timeout=30)
            print('response:' + response.text)
            response.close()
        print(txt)
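The restart step in the script above builds a shell string and runs it with `os.popen`. One possible alternative, shown here as a sketch under stated assumptions rather than as the article's method, is to pass `ssh` an argument list through `subprocess.run` with a timeout, so a hung connection cannot block the scheduled job. The start command and port come from the script; the helper names `build_restart_command` and `restart_region_server`, the host-name pattern, and the 60-second timeout are assumptions made for illustration.

```python
import re
import subprocess

START_CMD = "/home/hadoop/hbase/bin/hbase-daemon.sh start regionserver"
# Conservative host-name pattern (an assumption); it rejects values that
# start with "-" so a malformed entry cannot be read by ssh as an option.
HOST_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9.-]*")


def build_restart_command(host, port=22):
    """Build the ssh argument list that starts a RegionServer on host."""
    if not HOST_RE.fullmatch(host):
        raise ValueError("unexpected host name: %r" % host)
    return ["ssh", "-n", "-p", str(port), host, START_CMD]


def restart_region_server(host, port=22, timeout=60):
    """Run the restart over ssh; return (exit code, combined output)."""
    result = subprocess.run(
        build_restart_command(host, port),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired if ssh hangs
    )
    return result.returncode, result.stdout


# Only builds the command; nothing is executed here.
print(build_restart_command("node01"))
```

The exit code returned by `restart_region_server` could be included in the alert text, so the message distinguishes a restart that was issued from one that failed.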
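2. Notes on the code
1) HBase Master host name: change it to the host name of your own HBase Master.
2) WeCom robot address: change it to your own webhook URL.
3) SSH port 22: change it to the SSH port used in your own cluster.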
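The three site-specific values listed above could also be read from the environment instead of being edited in the script. The following is only a sketch of that idea: the environment variable names (`HBASE_MASTER_HOST`, `WECOM_WEBHOOK_URL`, `SSH_PORT`) and the `Settings` and `load_settings` names are hypothetical, and the defaults simply mirror the values used in the script.

```python
import os
from typing import NamedTuple

JMX_QUERY = "/jmx?qry=Hadoop:service=HBase,name=Master,sub=Server"


class Settings(NamedTuple):
    master_url: str
    webhook_url: str
    ssh_port: int


def load_settings(env=os.environ):
    """Read the three site-specific values from a mapping of variables."""
    master_host = env.get("HBASE_MASTER_HOST", "hbase-master001")
    return Settings(
        master_url="http://%s:60010%s" % (master_host, JMX_QUERY),
        webhook_url=env["WECOM_WEBHOOK_URL"],  # no default: must be set
        ssh_port=int(env.get("SSH_PORT", "22")),
    )


# Example with an explicit mapping instead of the real environment.
demo = load_settings({
    "WECOM_WEBHOOK_URL": "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=******",
    "SSH_PORT": "2222",
})
print(demo.master_url, demo.ssh_port)
```

Leaving the webhook without a default means a missing value fails immediately with a `KeyError` instead of silently posting to a placeholder address.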
3. Add the job to the scheduler
3.1 crontab definition
sudo vi /etc/crontab
# For details see man 4 crontabs
# Example of job definition:
# .---------------- minute (0 - 59)
# | .------------- hour (0 - 23)
# | | .---------- day of month (1 - 31)
# | | | .------- month (1 - 12) OR jan,feb,mar,apr ...
# | | | | .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# | | | | |
# * * * * * user-name command to be executed
*/5 * * * * hadoop /home/hadoop/job/alert/Hbase.sh
3.2 Restart crond:
CentOS 6
Restart the crond service:
sudo /etc/init.d/crond restart
CentOS 7
sudo systemctl restart crond.service
Summary
That is the method for automatically restarting an HBase RegionServer after it goes down.