关注身边无家可归的流浪猫咪
经过半个多月的努力,小花狸监控v1.0终于完成了.
写这个工具,起源于年初 三集中 的一个想法.集中复制,集中备份,集中监控.
最开始的时候,MySQL实例很少,有Slave复制,本地备份,服务器间互为异地备份,本地还有一个Shell写的监控脚本,一旦出现问题可以发送邮件.
后来,服务器慢慢多了起来.现在已经有将近10个MySQL实例,2套Mongodb副本集群,20+的redis实例.
这些不同的实例,散布在不同的机房,不同的业务使用,情况慢慢变得复杂起来.
我已经在第一时间想不起来某个实例的异地备份在哪里,或者30001端口的redis是干什么用的,是否是集群.
更要命的是,服务器间异地备份使用的是scp推,也就是说,一旦服务器被黑客攻陷,异地备份并不会幸免,因为异地备份服务器的密码就写在备份的脚本里.
由于服务器间的关系已经越来越模糊,攻陷了A服务器,就可能攻陷B服务器,乃至C,D,E服务器。
每每想到这里,总是提心吊胆.经常感觉压力很大.
面对越来越复杂的服务器关系,必须推动集中化管理.
经过上述改造,每个服务器都有如下保护措施
1.复制
2.本机备份
3.异地备份
并且黑客攻陷任意一台服务器,都不能使其成为跳板,继续攻陷别的服务器.进而保证了生产服务器数据的安全.
但是被攻陷的那台数据库,数据没有任何手段恢复.
所以不能从生产服务器推送备份到集中备份服务器,而必须从集中备份服务器上拉数据。
但是集中备份服务器怎么知道生产服务器的备份已经完成呢?
另外,随着服务器的增加,每周巡检也要花费大量的时间.每次检查至少花费2个小时.并且一周一次的检查频率...
这种巡检感觉没有什么意义,主要是确认复制是否正常.
小花狸监控(集中监控)主要解决上述的问题.他提供的主要功能
1.监控主机(磁盘空间,CPU,内存)
2.MySQL(复制,线程,每秒压力)
3.Mongodb(副本集,内存映射情况,每秒压力)
4.Redis(单实例,集群,Sentinel,复制,内存使用情况,每秒压力)
5.抓取备份
小花狸监控主要由两部分组成,Agent和Server
Agent部署在需要监控的服务器上,定期获取数据,将其发送到Server端。
Server端部署在集中备份服务器上.
Agent和Server在编译的时候,设置了一个共同的 AES密钥
Agent发送的内容加盐后(当前时间),经过Base64编码,再由空格填充到指定的位数,再由AES加密,最后再次经过Base64编码,发送至Server端.
这样可以保证服务器内部的信息不被泄露.
生产服务器备份完成之后,由Agent发送生产服务器的IP,账号,密码,备份的位置到Server端.
Server端会将这些信息作为参数,传递到一个expect脚本,再由expect脚本调用rsync命令,将生产服务器的备份拉回来.
生产服务器的账号信息,不会存储在Server端,并且网络传输由AES加密.
这样就实现了异地备份,并且由拉取备份代替了推送备份.增加了数据的安全性.
小花狸监控由golang编写(遵守GPL协议)
源码如下,点击下载
golang.zip
backup.sh需要和Server命令在同一级目录.
备份的时候,在备份脚本的最后一行,嵌入
/home/mysqlbackup/Agent --type=backup --user=mysqlbackup --password=123456 --localAddr=192.168.1.70 --path=/data/dbbak/mysql/
采集使用crontab
5 * * * * /home/mysql/Agent --type=mysql --user=user --password=123456 --port=3306
5 * * * * /home/mongodb/Agent --type mongodb --user=admin --password 123456 --port 27017
5 * * * * /home/mongodb/Agent --type redis --password=123456 --port 30000
5 * * * * /home/mongodb/Agent --type redis --port 6379
Server端初始化SQL
DROP TABLE IF EXISTS `cat_backup_probe_history`;
CREATE TABLE `cat_backup_probe_history` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`host` varchar(20) NOT NULL comment '主机IP',
`starttime` datetime DEFAULT NULL comment '备份开始时间',
`endtime` datetime DEFAULT NULL comment '备份结束时间',
`source` varchar(40) DEFAULT NULL comment '生产服务器路径',
`target` varchar(40) DEFAULT NULL comment '集中备份服务器路径',
`size` bigint(20) DEFAULT NULL comment '备份大小',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
--登记服务器信息,需要手工填写服务器信息和相关阈值
--没有录入界面,作为DBA肯定是Insert into ...
DROP TABLE IF EXISTS `cat_host_probe`;
CREATE TABLE `cat_host_probe` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`host` varchar(20) NOT NULL,
`generator_room` varchar(10) DEFAULT NULL,
`cpu_target` float DEFAULT NULL,
`mem_target` int(11) DEFAULT NULL,
`space_target` int(11) DEFAULT NULL,
`status` int(11) NOT NULL DEFAULT '1',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
--Agent发送的监控信息,其中指定了三个挂载点的信息 / /data /dbdata
DROP TABLE IF EXISTS `cat_host_probe_history`;
CREATE TABLE `cat_host_probe_history` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`host` varchar(20) NOT NULL,
`cpu_idle` float DEFAULT NULL comment 'CPU空闲',
`mem_used` bigint(20) DEFAULT NULL comment '内存使用M',
`mem_idle` bigint(20) DEFAULT NULL comment '内存空闲M',
`root_idle_space` bigint(20) DEFAULT NULL comment '根分区空闲空间',
`root_used_space` bigint(20) DEFAULT NULL,
`data_idle_space` bigint(20) DEFAULT NULL comment '/data挂载点空闲空间',
`data_used_space` bigint(20) DEFAULT NULL,
`dbdata_idle_space` bigint(20) DEFAULT NULL comment '/dbdata挂载点空闲空间',
`dbdata_used_space` bigint(20) DEFAULT NULL,
`ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
DROP TABLE IF EXISTS `cat_mongodb_probe`;
CREATE TABLE `cat_mongodb_probe` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`host` varchar(20) NOT NULL,
`port` int(11) NOT NULL,
`admin` varchar(20) DEFAULT NULL comment '登记该实例的使用者',
`msg` varchar(200) DEFAULT NULL comment '登记该实例所属的业务系统',
`status` int(11) NOT NULL DEFAULT '1',
`isbackup` int(11) DEFAULT '1',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
DROP TABLE IF EXISTS `cat_mongodb_probe_history`;
CREATE TABLE `cat_mongodb_probe_history` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`host` varchar(20) NOT NULL,
`port` int(11) NOT NULL,
`member1_name` varchar(30) DEFAULT NULL,
`member1_health` varchar(10) DEFAULT NULL,
`member1_StateStr` varchar(20) DEFAULT NULL,
`member1_OptimeDate` datetime DEFAULT NULL,
`member2_name` varchar(30) DEFAULT NULL,
`member2_health` varchar(10) DEFAULT NULL,
`member2_StateStr` varchar(20) DEFAULT NULL,
`member2_OptimeDate` datetime DEFAULT NULL,
`member3_name` varchar(30) DEFAULT NULL,
`member3_health` varchar(10) DEFAULT NULL,
`member3_StateStr` varchar(20) DEFAULT NULL,
`member3_OptimeDate` datetime DEFAULT NULL,
`insert_per_second` int(11) DEFAULT NULL,
`query_per_second` int(11) DEFAULT NULL,
`update_per_second` int(11) DEFAULT NULL,
`delete_per_second` int(11) DEFAULT NULL,
`getmore_per_second` int(11) DEFAULT NULL,
`command_per_second` int(11) DEFAULT NULL,
`res` int(11) DEFAULT NULL,
`Virtual` int(11) DEFAULT NULL,
`Mapped` int(11) DEFAULT NULL,
`CurrentConnections` int(11) DEFAULT NULL,
`BytesIn_per_second` int(11) DEFAULT NULL,
`BytesOut_per_second` int(11) DEFAULT NULL,
`NumRequests_per_second` int(11) DEFAULT NULL,
`ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
DROP TABLE IF EXISTS `cat_mysql_probe`;
CREATE TABLE `cat_mysql_probe` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`host` varchar(20) NOT NULL,
`port` int(11) NOT NULL,
`admin` varchar(20) DEFAULT NULL,
`msg` varchar(200) DEFAULT NULL,
`status` int(11) NOT NULL DEFAULT '1',
`isbackup` int(11) DEFAULT '1',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
DROP TABLE IF EXISTS `cat_mysql_probe_history`;
CREATE TABLE `cat_mysql_probe_history` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`host` varchar(20) NOT NULL,
`port` int(11) NOT NULL,
`Bytes_received` int(11) DEFAULT NULL,
`Bytes_sent` int(11) DEFAULT NULL,
`Com_commit` int(11) DEFAULT NULL,
`Com_delete` int(11) DEFAULT NULL,
`Com_select` int(11) DEFAULT NULL,
`Com_rollback` int(11) DEFAULT NULL,
`Com_update` int(11) DEFAULT NULL,
`Com_insert` int(11) DEFAULT NULL,
`Com_execute_sql` int(11) DEFAULT NULL,
`Created_tmp_disk_tables` int(11) DEFAULT NULL,
`Slow_queries` int(11) DEFAULT NULL,
`Slave_running` varchar(10) DEFAULT NULL,
`Threads_connected` int(11) DEFAULT NULL,
`Threads_running` int(11) DEFAULT NULL,
`Master_Host` varchar(20) DEFAULT NULL,
`Master_Log_File` varchar(30) DEFAULT NULL,
`Read_Master_Log_Pos` bigint(20) DEFAULT NULL,
`Exec_Master_Log_Pos` bigint(20) DEFAULT NULL,
`Slave_IO_Running` varchar(10) DEFAULT NULL,
`Slave_SQL_Running` varchar(10) DEFAULT NULL,
`Seconds_Behind_Master` varchar(10) DEFAULT NULL,
`ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
DROP TABLE IF EXISTS `cat_redis_probe`;
CREATE TABLE `cat_redis_probe` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`host` varchar(20) NOT NULL,
`port` int(11) NOT NULL,
`redis_type` enum('SingleInstance','HAInstance','Sentinel') DEFAULT NULL,
`max_mem` int(11) DEFAULT NULL comment 'Redis配置的最大内存阈值',
`cpu_target` int(11) DEFAULT NULL comment 'Redis最大CPU使用阈值',
`admin` varchar(20) DEFAULT NULL,
`msg` varchar(200) DEFAULT NULL,
`status` int(11) NOT NULL DEFAULT '1',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
DROP TABLE IF EXISTS `cat_redis_probe_history`;
CREATE TABLE `cat_redis_probe_history` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`host` varchar(20) NOT NULL,
`port` int(11) NOT NULL,
`connected_clients` int(11) DEFAULT NULL,
`blocked_clients` int(11) DEFAULT NULL,
`used_memory` bigint(20) DEFAULT NULL,
`role` varchar(10) DEFAULT NULL,
`instantaneous_ops_per_sec` int(11) DEFAULT NULL,
`rejected_connections` int(11) DEFAULT NULL,
`cpu_ratio` float DEFAULT NULL,
`master_host` varchar(20) DEFAULT NULL,
`master_port` int(11) DEFAULT NULL,
`master_link_status` varchar(20) DEFAULT NULL,
`master_last_io_seconds_ago` int(11) DEFAULT NULL,
`connected_slaves` int(11) DEFAULT NULL,
`SentinelInfo` varchar(100) DEFAULT NULL,
`ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
小花狸监控v1.0由于时间仓促,没有自带Web展现界面.
主要查询SQL如下,可以使用任意的报表工具展现
查询MySQL
select g.*,
case
when g.Slave_IO_Running='NO' then 'IO线程停止'
when g.Slave_SQL_Running='NO' then 'SQL线程停止'
when g.mysqlts when g.hostts when g.mem_ratio_value>g.mem_target then '内存超过阈值'
when g.cpuratio>g.cpu_target then 'cpu超过阈值'
when g.rootratio_value>g.space_target then 'root空间超过阈值'
when g.dataratio_value>g.space_target then 'data空间超过阈值'
when g.dbdataratio_value>g.space_target then 'dbdata空间超过阈值'
when g.hostts is null then '采集超时'
else
'正常'
end info
from
(
select
round(100-cpu_idle,2) cpuratio,a.ts mysqlts,c.ts hostts,
concat(round(mem_used/1024),'G') mem_used,
concat(round(mem_idle/1024),'G') mem_idle,
concat(round(mem_used/(mem_used+mem_idle),2)*100,'%') mem_ratio,
round(mem_used/(mem_used+mem_idle),2)*100 mem_ratio_value,
concat(round(root_used_space/1024/1024/1024,2),'G') root使用,
concat(round(root_idle_space/1024/1024/1024,2),'G') root空闲,
concat(round(root_used_space/(root_used_space+root_idle_space),2)*100,'%') rootratio,
round(root_used_space/(root_used_space+root_idle_space),2)*100 rootratio_value,
concat(round(data_used_space/1024/1024/1024,2),'G') data使用,
concat(round(data_idle_space/1024/1024/1024,2),'G') data空闲,
concat(round(data_used_space/(data_used_space+data_idle_space),2)*100,'%') dataratio,
round(data_used_space/(data_used_space+data_idle_space),2)*100 dataratio_value,
concat(round(dbdata_used_space/1024/1024/1024,2),'G') dbdata使用,
concat(round(dbdata_idle_space/1024/1024/1024,2),'G') dbdata空闲,
concat(round(dbdata_used_space/(dbdata_used_space+dbdata_idle_space),2)*100,'%') dbdataratio,
round(dbdata_used_space/(dbdata_used_space+dbdata_idle_space),2)*100 dbdataratio_value,
a.*,f.generator_room,f.cpu_target,f.mem_target,f.space_target,
e.admin,e.msg,e.port p,e.host h
from
cat_mysql_probe e
left join cat_host_probe f on (e.host=f.host)
left join
(
select a.* from cat_mysql_probe_history a
inner join
(select host,port,max(id) id from cat_mysql_probe_history group by host,port) b
on(a.host=b.host and a.port=b.port and a.id=b.id)
) a on (e.host=a.host and e.port=a.port)
left join
(
select c.* from cat_host_probe_history c
inner join (select host,max(id) id from cat_host_probe_history group by host) d
on (c.host=d.host and c.id=d.id)
)c on(c.host=e.host)
where e.status=1 and f.status=1
order by f.generator_room,a.host,a.port
) g
查询MongoDB
select g.*,
case
when g.member1_health!='1' then 'member1异常'
when g.member2_health!='1' then 'member2异常'
when g.member3_health!='1' then 'member3异常'
when g.mongodbts when g.hostts when g.mem_ratio_value>g.mem_target then '内存超过阈值'
when g.cpuratio>g.cpu_target then 'cpu超过阈值'
when g.rootratio_value>g.space_target then 'root空间超过阈值'
when g.dataratio_value>g.space_target then 'data空间超过阈值'
when g.dbdataratio_value>g.space_target then 'dbdata空间超过阈值'
when g.hostts is null then '采集超时'
else
'正常'
end info
from
(
select
round(100-cpu_idle,2) cpuratio,a.ts mongodbts,c.ts hostts,
concat(round(mem_used/1024),'G') mem_used,
concat(round(mem_idle/1024),'G') mem_idle,
concat(round(mem_used/(mem_used+mem_idle),2)*100,'%') mem_ratio,
round(mem_used/(mem_used+mem_idle),2)*100 mem_ratio_value,
concat(round(root_used_space/1024/1024/1024,2),'G') root使用,
concat(round(root_idle_space/1024/1024/1024,2),'G') root空闲,
concat(round(root_used_space/(root_used_space+root_idle_space),2)*100,'%') rootratio,
round(root_used_space/(root_used_space+root_idle_space),2)*100 rootratio_value,
concat(round(data_used_space/1024/1024/1024,2),'G') data使用,
concat(round(data_idle_space/1024/1024/1024,2),'G') data空闲,
concat(round(data_used_space/(data_used_space+data_idle_space),2)*100,'%') dataratio,
round(data_used_space/(data_used_space+data_idle_space),2)*100 dataratio_value,
concat(round(dbdata_used_space/1024/1024/1024,2),'G') dbdata使用,
concat(round(dbdata_idle_space/1024/1024/1024,2),'G') dbdata空闲,
concat(round(dbdata_used_space/(dbdata_used_space+dbdata_idle_space),2)*100,'%') dbdataratio,
round(dbdata_used_space/(dbdata_used_space+dbdata_idle_space),2)*100 dbdataratio_value
, e.host
, e.port
, a.member1_name
, a.member1_health
, a.member1_StateStr
, a.member1_OptimeDate
, a.member2_name
, a.member2_health
, a.member2_StateStr
, a.member2_OptimeDate
, a.member3_name
, a.member3_health
, a.member3_StateStr
, a.member3_OptimeDate
, a.insert_per_second
, a.query_per_second
, a.update_per_second
, a.delete_per_second
, a.getmore_per_second
, a.command_per_second
, concat(round(a.res/1024),'G') res
, concat(round(a.Virtual/1024),'G') Virtual
, concat(round(a.Mapped/1024),'G') Mapped
, a.CurrentConnections
, a.BytesIn_per_second
, a.BytesOut_per_second
, a.NumRequests_per_second
, a.ts
,f.generator_room,f.cpu_target,f.mem_target,f.space_target,
e.admin,e.msg
from
cat_mongodb_probe e
left join cat_host_probe f on (e.host=f.host)
left join
(
select a.* from cat_mongodb_probe_history a
inner join (select host,port,max(id) id from cat_mongodb_probe_history group by host,port) b on(a.host=b.host and a.port=b.port and a.id=b.id)
)a on(a.port=e.port and a.host=e.host)
left join
(
select c.* from cat_host_probe_history c
inner join (select host,max(id) id from cat_host_probe_history group by host) d on (c.host=d.host and c.id=d.id)
) c on (c.host=e.host)
where e.status=1 and f.status=1
order by f.generator_room,a.host,a.port
) g
查询Redis
select g.*,
case
when g.redists when g.redis_type='HAInstance' and g.role='master' and g.connected_slaves!='2' then '复制异常'
when g.used_memory_value>redis_max_mem then 'redis内存超过阈值'
when g.redis_cpu_ratio>redis_max_cpu then 'redisCPU超过阈值'
when g.hostts when g.mem_ratio_value>g.mem_target then '内存超过阈值'
when g.cpuratio>g.cpu_target then 'cpu超过阈值'
when g.rootratio_value>g.space_target then 'root空间超过阈值'
when g.dataratio_value>g.space_target then 'data空间超过阈值'
when g.dbdataratio_value>g.space_target then 'dbdata空间超过阈值'
when g.hostts is null then '采集超时'
else
'正常'
end info
from
(
select
round(100-cpu_idle,2) cpuratio,a.ts redists,c.ts hostts,
concat(round(mem_used/1024),'G') mem_used,
concat(round(mem_idle/1024),'G') mem_idle,
concat(round(mem_used/(mem_used+mem_idle),2)*100,'%') mem_ratio,
round(mem_used/(mem_used+mem_idle),2)*100 mem_ratio_value,
concat(round(root_used_space/1024/1024/1024,2),'G') root使用,
concat(round(root_idle_space/1024/1024/1024,2),'G') root空闲,
concat(round(root_used_space/(root_used_space+root_idle_space),2)*100,'%') rootratio,
round(root_used_space/(root_used_space+root_idle_space),2)*100 rootratio_value,
concat(round(data_used_space/1024/1024/1024,2),'G') data使用,
concat(round(data_idle_space/1024/1024/1024,2),'G') data空闲,
concat(round(data_used_space/(data_used_space+data_idle_space),2)*100,'%') dataratio,
round(data_used_space/(data_used_space+data_idle_space),2)*100 dataratio_value,
concat(round(dbdata_used_space/1024/1024/1024,2),'G') dbdata使用,
concat(round(dbdata_idle_space/1024/1024/1024,2),'G') dbdata空闲,
concat(round(dbdata_used_space/(dbdata_used_space+dbdata_idle_space),2)*100,'%') dbdataratio,
round(dbdata_used_space/(dbdata_used_space+dbdata_idle_space),2)*100 dbdataratio_value,
e.host
, e.port
, a.connected_clients
, a.blocked_clients
, concat(round(a.used_memory/1024/1024,0),'M') used_memory
,round(a.used_memory/1024/1024,0) used_memory_value
, a.role
, a.instantaneous_ops_per_sec
, a.rejected_connections
, a.cpu_ratio redis_cpu_ratio
, a.master_host
, a.master_port
, a.master_link_status
, a.master_last_io_seconds_ago
, a.connected_slaves
, a.SentinelInfo
, a.ts
,f.generator_room,f.cpu_target,f.mem_target,f.space_target,
e.admin,e.msg,e.redis_type,e.max_mem redis_max_mem,e.cpu_target redis_max_cpu
from
cat_redis_probe e
left join cat_host_probe f on (e.host=f.host)
left join
(
select a.* from cat_redis_probe_history a
inner join (select host,port,max(id) id from cat_redis_probe_history group by host,port) b
on(a.host=b.host and a.port=b.port and a.id=b.id)
) a on (a.host=e.host and a.port=e.port)
left join
(
select c.* from cat_host_probe_history c
inner join (select host,max(id) id from cat_host_probe_history group by host) d
on (c.host=d.host and c.id=d.id)
) c on (c.host=e.host)
where e.status=1 and f.status=1
order by f.generator_room,redis_type,a.host,a.port
) g order by g.generator_room,g.redis_type,g.host,g.port;
使用报表之后的相关效果
清明放假三天,加了两天班,最后还熬了一宿夜,终于发布了.
我们家的小花狸,小乖.他的微笑提供给我源源不断的努力动力.
写这个工具,起源于年初 三集中 的一个想法.集中复制,集中备份,集中监控.
最开始的时候,MySQL实例很少,有Slave复制,本地备份,服务器间互为异地备份,本地还有一个Shell写的监控脚本,一旦出现问题可以发送邮件.
后来,服务器慢慢多了起来.现在已经有将近10个MySQL实例,2套Mongodb副本集群,20+的redis实例.
这些不同的实例,散布在不同的机房,不同的业务使用,情况慢慢变得复杂起来.
我已经在第一时间想不起来某个实例的异地备份在哪里,或者30001端口的redis是干什么用的,是否是集群.
更要命的是,服务器间异地备份使用的是scp推,也就是说,一旦服务器被黑客攻陷,异地备份并不会幸免,因为异地备份服务器的密码就写在备份的脚本里.
由于服务器间的关系已经越来越模糊,攻陷了A服务器,就可能攻陷B服务器,乃至C,D,E服务器。
每每想到这里,总是提心吊胆.经常感觉压力很大.
面对越来越复杂的服务器关系,必须推动集中化管理.
经过上述改造,每个服务器都有如下保护措施
1.复制
2.本机备份
3.异地备份
并且黑客攻陷任意一台服务器,都不能使其成为跳板,继续攻陷别的服务器.进而保证了生产服务器数据的安全.
但是被攻陷的那台数据库,数据没有任何手段恢复.
所以不能从生产服务器推送备份到集中备份服务器,而必须从集中备份服务器上拉数据。
但是集中备份服务器怎么知道生产服务器的备份已经完成呢?
另外,随着服务器的增加,每周巡检也要花费大量的时间.每次检查至少花费2个小时.并且一周一次的检查频率...
这种巡检感觉没有什么意义,主要是确认复制是否正常.
小花狸监控(集中监控)主要解决上述的问题.他提供的主要功能
1.监控主机(磁盘空间,CPU,内存)
2.MySQL(复制,线程,每秒压力)
3.Mongodb(副本集,内存映射情况,每秒压力)
4.Redis(单实例,集群,Sentinel,复制,内存使用情况,每秒压力)
5.抓取备份
小花狸监控主要由两部分组成,Agent和Server
Agent部署在需要监控的服务器上,定期获取数据,将其发送到Server端。
Server端部署在集中备份服务器上.
Agent和Server在编译的时候,设置了一个共同的 AES密钥
Agent发送的内容加盐后(当前时间),经过Base64编码,再由空格填充到指定的位数,再由AES加密,最后再次经过Base64编码,发送至Server端.
这样可以保证服务器内部的信息不被泄露.
生产服务器备份完成之后,由Agent发送生产服务器的IP,账号,密码,备份的位置到Server端.
Server端会将这些信息作为参数,传递到一个expect脚本,再由expect脚本调用rsync命令,将生产服务器的备份拉回来.
生产服务器的账号信息,不会存储在Server端,并且网络传输由AES加密.
这样就实现了异地备份,并且由拉取备份代替了推送备份.增加了数据的安全性.
小花狸监控由golang编写(遵守GPL协议)
源码如下,点击下载
golang.zip
backup.sh需要和Server命令在同一级目录.
备份的时候,在备份脚本的最后一行,嵌入
/home/mysqlbackup/Agent --type=backup --user=mysqlbackup --password=123456 --localAddr=192.168.1.70 --path=/data/dbbak/mysql/
采集使用crontab
5 * * * * /home/mysql/Agent --type=mysql --user=user --password=123456 --port=3306
5 * * * * /home/mongodb/Agent --type mongodb --user=admin --password 123456 --port 27017
5 * * * * /home/mongodb/Agent --type redis --password=123456 --port 30000
5 * * * * /home/mongodb/Agent --type redis --port 6379
Server端初始化SQL
DROP TABLE IF EXISTS `cat_backup_probe_history`;
CREATE TABLE `cat_backup_probe_history` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`host` varchar(20) NOT NULL comment '主机IP',
`starttime` datetime DEFAULT NULL comment '备份开始时间',
`endtime` datetime DEFAULT NULL comment '备份结束时间',
`source` varchar(40) DEFAULT NULL comment '生产服务器路径',
`target` varchar(40) DEFAULT NULL comment '集中备份服务器路径',
`size` bigint(20) DEFAULT NULL comment '备份大小',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
--登记服务器信息,需要手工填写服务器信息和相关阈值
--没有录入界面,作为DBA肯定是Insert into ...
DROP TABLE IF EXISTS `cat_host_probe`;
CREATE TABLE `cat_host_probe` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`host` varchar(20) NOT NULL,
`generator_room` varchar(10) DEFAULT NULL,
`cpu_target` float DEFAULT NULL,
`mem_target` int(11) DEFAULT NULL,
`space_target` int(11) DEFAULT NULL,
`status` int(11) NOT NULL DEFAULT '1',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
--Agent发送的监控信息,其中指定了三个挂载点的信息 / /data /dbdata
DROP TABLE IF EXISTS `cat_host_probe_history`;
CREATE TABLE `cat_host_probe_history` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`host` varchar(20) NOT NULL,
`cpu_idle` float DEFAULT NULL comment 'CPU空闲',
`mem_used` bigint(20) DEFAULT NULL comment '内存使用M',
`mem_idle` bigint(20) DEFAULT NULL comment '内存空闲M',
`root_idle_space` bigint(20) DEFAULT NULL comment '根分区空闲空间',
`root_used_space` bigint(20) DEFAULT NULL,
`data_idle_space` bigint(20) DEFAULT NULL comment '/data挂载点空闲空间',
`data_used_space` bigint(20) DEFAULT NULL,
`dbdata_idle_space` bigint(20) DEFAULT NULL comment '/dbdata挂载点空闲空间',
`dbdata_used_space` bigint(20) DEFAULT NULL,
`ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
DROP TABLE IF EXISTS `cat_mongodb_probe`;
CREATE TABLE `cat_mongodb_probe` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`host` varchar(20) NOT NULL,
`port` int(11) NOT NULL,
`admin` varchar(20) DEFAULT NULL comment '登记该实例的使用者',
`msg` varchar(200) DEFAULT NULL comment '登记该实例所属的业务系统',
`status` int(11) NOT NULL DEFAULT '1',
`isbackup` int(11) DEFAULT '1',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
DROP TABLE IF EXISTS `cat_mongodb_probe_history`;
CREATE TABLE `cat_mongodb_probe_history` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`host` varchar(20) NOT NULL,
`port` int(11) NOT NULL,
`member1_name` varchar(30) DEFAULT NULL,
`member1_health` varchar(10) DEFAULT NULL,
`member1_StateStr` varchar(20) DEFAULT NULL,
`member1_OptimeDate` datetime DEFAULT NULL,
`member2_name` varchar(30) DEFAULT NULL,
`member2_health` varchar(10) DEFAULT NULL,
`member2_StateStr` varchar(20) DEFAULT NULL,
`member2_OptimeDate` datetime DEFAULT NULL,
`member3_name` varchar(30) DEFAULT NULL,
`member3_health` varchar(10) DEFAULT NULL,
`member3_StateStr` varchar(20) DEFAULT NULL,
`member3_OptimeDate` datetime DEFAULT NULL,
`insert_per_second` int(11) DEFAULT NULL,
`query_per_second` int(11) DEFAULT NULL,
`update_per_second` int(11) DEFAULT NULL,
`delete_per_second` int(11) DEFAULT NULL,
`getmore_per_second` int(11) DEFAULT NULL,
`command_per_second` int(11) DEFAULT NULL,
`res` int(11) DEFAULT NULL,
`Virtual` int(11) DEFAULT NULL,
`Mapped` int(11) DEFAULT NULL,
`CurrentConnections` int(11) DEFAULT NULL,
`BytesIn_per_second` int(11) DEFAULT NULL,
`BytesOut_per_second` int(11) DEFAULT NULL,
`NumRequests_per_second` int(11) DEFAULT NULL,
`ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
DROP TABLE IF EXISTS `cat_mysql_probe`;
CREATE TABLE `cat_mysql_probe` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`host` varchar(20) NOT NULL,
`port` int(11) NOT NULL,
`admin` varchar(20) DEFAULT NULL,
`msg` varchar(200) DEFAULT NULL,
`status` int(11) NOT NULL DEFAULT '1',
`isbackup` int(11) DEFAULT '1',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
DROP TABLE IF EXISTS `cat_mysql_probe_history`;
CREATE TABLE `cat_mysql_probe_history` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`host` varchar(20) NOT NULL,
`port` int(11) NOT NULL,
`Bytes_received` int(11) DEFAULT NULL,
`Bytes_sent` int(11) DEFAULT NULL,
`Com_commit` int(11) DEFAULT NULL,
`Com_delete` int(11) DEFAULT NULL,
`Com_select` int(11) DEFAULT NULL,
`Com_rollback` int(11) DEFAULT NULL,
`Com_update` int(11) DEFAULT NULL,
`Com_insert` int(11) DEFAULT NULL,
`Com_execute_sql` int(11) DEFAULT NULL,
`Created_tmp_disk_tables` int(11) DEFAULT NULL,
`Slow_queries` int(11) DEFAULT NULL,
`Slave_running` varchar(10) DEFAULT NULL,
`Threads_connected` int(11) DEFAULT NULL,
`Threads_running` int(11) DEFAULT NULL,
`Master_Host` varchar(20) DEFAULT NULL,
`Master_Log_File` varchar(30) DEFAULT NULL,
`Read_Master_Log_Pos` bigint(20) DEFAULT NULL,
`Exec_Master_Log_Pos` bigint(20) DEFAULT NULL,
`Slave_IO_Running` varchar(10) DEFAULT NULL,
`Slave_SQL_Running` varchar(10) DEFAULT NULL,
`Seconds_Behind_Master` varchar(10) DEFAULT NULL,
`ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
DROP TABLE IF EXISTS `cat_redis_probe`;
CREATE TABLE `cat_redis_probe` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`host` varchar(20) NOT NULL,
`port` int(11) NOT NULL,
`redis_type` enum('SingleInstance','HAInstance','Sentinel') DEFAULT NULL,
`max_mem` int(11) DEFAULT NULL comment 'Redis配置的最大内存阈值',
`cpu_target` int(11) DEFAULT NULL comment 'Redis最大CPU使用阈值',
`admin` varchar(20) DEFAULT NULL,
`msg` varchar(200) DEFAULT NULL,
`status` int(11) NOT NULL DEFAULT '1',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
DROP TABLE IF EXISTS `cat_redis_probe_history`;
CREATE TABLE `cat_redis_probe_history` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`host` varchar(20) NOT NULL,
`port` int(11) NOT NULL,
`connected_clients` int(11) DEFAULT NULL,
`blocked_clients` int(11) DEFAULT NULL,
`used_memory` bigint(20) DEFAULT NULL,
`role` varchar(10) DEFAULT NULL,
`instantaneous_ops_per_sec` int(11) DEFAULT NULL,
`rejected_connections` int(11) DEFAULT NULL,
`cpu_ratio` float DEFAULT NULL,
`master_host` varchar(20) DEFAULT NULL,
`master_port` int(11) DEFAULT NULL,
`master_link_status` varchar(20) DEFAULT NULL,
`master_last_io_seconds_ago` int(11) DEFAULT NULL,
`connected_slaves` int(11) DEFAULT NULL,
`SentinelInfo` varchar(100) DEFAULT NULL,
`ts` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
主要查询SQL如下,可以使用任意的报表工具展现
查询MySQL
select g.*,
case
when g.Slave_IO_Running='NO' then 'IO线程停止'
when g.Slave_SQL_Running='NO' then 'SQL线程停止'
when g.mysqlts when g.hostts when g.mem_ratio_value>g.mem_target then '内存超过阈值'
when g.cpuratio>g.cpu_target then 'cpu超过阈值'
when g.rootratio_value>g.space_target then 'root空间超过阈值'
when g.dataratio_value>g.space_target then 'data空间超过阈值'
when g.dbdataratio_value>g.space_target then 'dbdata空间超过阈值'
when g.hostts is null then '采集超时'
else
'正常'
end info
from
(
select
round(100-cpu_idle,2) cpuratio,a.ts mysqlts,c.ts hostts,
concat(round(mem_used/1024),'G') mem_used,
concat(round(mem_idle/1024),'G') mem_idle,
concat(round(mem_used/(mem_used+mem_idle),2)*100,'%') mem_ratio,
round(mem_used/(mem_used+mem_idle),2)*100 mem_ratio_value,
concat(round(root_used_space/1024/1024/1024,2),'G') root使用,
concat(round(root_idle_space/1024/1024/1024,2),'G') root空闲,
concat(round(root_used_space/(root_used_space+root_idle_space),2)*100,'%') rootratio,
round(root_used_space/(root_used_space+root_idle_space),2)*100 rootratio_value,
concat(round(data_used_space/1024/1024/1024,2),'G') data使用,
concat(round(data_idle_space/1024/1024/1024,2),'G') data空闲,
concat(round(data_used_space/(data_used_space+data_idle_space),2)*100,'%') dataratio,
round(data_used_space/(data_used_space+data_idle_space),2)*100 dataratio_value,
concat(round(dbdata_used_space/1024/1024/1024,2),'G') dbdata使用,
concat(round(dbdata_idle_space/1024/1024/1024,2),'G') dbdata空闲,
concat(round(dbdata_used_space/(dbdata_used_space+dbdata_idle_space),2)*100,'%') dbdataratio,
round(dbdata_used_space/(dbdata_used_space+dbdata_idle_space),2)*100 dbdataratio_value,
a.*,f.generator_room,f.cpu_target,f.mem_target,f.space_target,
e.admin,e.msg,e.port p,e.host h
from
cat_mysql_probe e
left join cat_host_probe f on (e.host=f.host)
left join
(
select a.* from cat_mysql_probe_history a
inner join
(select host,port,max(id) id from cat_mysql_probe_history group by host,port) b
on(a.host=b.host and a.port=b.port and a.id=b.id)
) a on (e.host=a.host and e.port=a.port)
left join
(
select c.* from cat_host_probe_history c
inner join (select host,max(id) id from cat_host_probe_history group by host) d
on (c.host=d.host and c.id=d.id)
)c on(c.host=e.host)
where e.status=1 and f.status=1
order by f.generator_room,a.host,a.port
) g
select g.*,
case
when g.member1_health!='1' then 'member1异常'
when g.member2_health!='1' then 'member2异常'
when g.member3_health!='1' then 'member3异常'
when g.mongodbts when g.hostts when g.mem_ratio_value>g.mem_target then '内存超过阈值'
when g.cpuratio>g.cpu_target then 'cpu超过阈值'
when g.rootratio_value>g.space_target then 'root空间超过阈值'
when g.dataratio_value>g.space_target then 'data空间超过阈值'
when g.dbdataratio_value>g.space_target then 'dbdata空间超过阈值'
when g.hostts is null then '采集超时'
else
'正常'
end info
from
(
select
round(100-cpu_idle,2) cpuratio,a.ts mongodbts,c.ts hostts,
concat(round(mem_used/1024),'G') mem_used,
concat(round(mem_idle/1024),'G') mem_idle,
concat(round(mem_used/(mem_used+mem_idle),2)*100,'%') mem_ratio,
round(mem_used/(mem_used+mem_idle),2)*100 mem_ratio_value,
concat(round(root_used_space/1024/1024/1024,2),'G') root使用,
concat(round(root_idle_space/1024/1024/1024,2),'G') root空闲,
concat(round(root_used_space/(root_used_space+root_idle_space),2)*100,'%') rootratio,
round(root_used_space/(root_used_space+root_idle_space),2)*100 rootratio_value,
concat(round(data_used_space/1024/1024/1024,2),'G') data使用,
concat(round(data_idle_space/1024/1024/1024,2),'G') data空闲,
concat(round(data_used_space/(data_used_space+data_idle_space),2)*100,'%') dataratio,
round(data_used_space/(data_used_space+data_idle_space),2)*100 dataratio_value,
concat(round(dbdata_used_space/1024/1024/1024,2),'G') dbdata使用,
concat(round(dbdata_idle_space/1024/1024/1024,2),'G') dbdata空闲,
concat(round(dbdata_used_space/(dbdata_used_space+dbdata_idle_space),2)*100,'%') dbdataratio,
round(dbdata_used_space/(dbdata_used_space+dbdata_idle_space),2)*100 dbdataratio_value
, e.host
, e.port
, a.member1_name
, a.member1_health
, a.member1_StateStr
, a.member1_OptimeDate
, a.member2_name
, a.member2_health
, a.member2_StateStr
, a.member2_OptimeDate
, a.member3_name
, a.member3_health
, a.member3_StateStr
, a.member3_OptimeDate
, a.insert_per_second
, a.query_per_second
, a.update_per_second
, a.delete_per_second
, a.getmore_per_second
, a.command_per_second
, concat(round(a.res/1024),'G') res
, concat(round(a.Virtual/1024),'G') Virtual
, concat(round(a.Mapped/1024),'G') Mapped
, a.CurrentConnections
, a.BytesIn_per_second
, a.BytesOut_per_second
, a.NumRequests_per_second
, a.ts
,f.generator_room,f.cpu_target,f.mem_target,f.space_target,
e.admin,e.msg
from
cat_mongodb_probe e
left join cat_host_probe f on (e.host=f.host)
left join
(
select a.* from cat_mongodb_probe_history a
inner join (select host,port,max(id) id from cat_mongodb_probe_history group by host,port) b on(a.host=b.host and a.port=b.port and a.id=b.id)
)a on(a.port=e.port and a.host=e.host)
left join
(
select c.* from cat_host_probe_history c
inner join (select host,max(id) id from cat_host_probe_history group by host) d on (c.host=d.host and c.id=d.id)
) c on (c.host=e.host)
where e.status=1 and f.status=1
order by f.generator_room,a.host,a.port
) g
查询Redis
select g.*,
case
when g.redists when g.redis_type='HAInstance' and g.role='master' and g.connected_slaves!='2' then '复制异常'
when g.used_memory_value>redis_max_mem then 'redis内存超过阈值'
when g.redis_cpu_ratio>redis_max_cpu then 'redisCPU超过阈值'
when g.hostts when g.mem_ratio_value>g.mem_target then '内存超过阈值'
when g.cpuratio>g.cpu_target then 'cpu超过阈值'
when g.rootratio_value>g.space_target then 'root空间超过阈值'
when g.dataratio_value>g.space_target then 'data空间超过阈值'
when g.dbdataratio_value>g.space_target then 'dbdata空间超过阈值'
when g.hostts is null then '采集超时'
else
'正常'
end info
from
(
select
round(100-cpu_idle,2) cpuratio,a.ts redists,c.ts hostts,
concat(round(mem_used/1024),'G') mem_used,
concat(round(mem_idle/1024),'G') mem_idle,
concat(round(mem_used/(mem_used+mem_idle),2)*100,'%') mem_ratio,
round(mem_used/(mem_used+mem_idle),2)*100 mem_ratio_value,
concat(round(root_used_space/1024/1024/1024,2),'G') root使用,
concat(round(root_idle_space/1024/1024/1024,2),'G') root空闲,
concat(round(root_used_space/(root_used_space+root_idle_space),2)*100,'%') rootratio,
round(root_used_space/(root_used_space+root_idle_space),2)*100 rootratio_value,
concat(round(data_used_space/1024/1024/1024,2),'G') data使用,
concat(round(data_idle_space/1024/1024/1024,2),'G') data空闲,
concat(round(data_used_space/(data_used_space+data_idle_space),2)*100,'%') dataratio,
round(data_used_space/(data_used_space+data_idle_space),2)*100 dataratio_value,
concat(round(dbdata_used_space/1024/1024/1024,2),'G') dbdata使用,
concat(round(dbdata_idle_space/1024/1024/1024,2),'G') dbdata空闲,
concat(round(dbdata_used_space/(dbdata_used_space+dbdata_idle_space),2)*100,'%') dbdataratio,
round(dbdata_used_space/(dbdata_used_space+dbdata_idle_space),2)*100 dbdataratio_value,
e.host
, e.port
, a.connected_clients
, a.blocked_clients
, concat(round(a.used_memory/1024/1024,0),'M') used_memory
,round(a.used_memory/1024/1024,0) used_memory_value
, a.role
, a.instantaneous_ops_per_sec
, a.rejected_connections
, a.cpu_ratio redis_cpu_ratio
, a.master_host
, a.master_port
, a.master_link_status
, a.master_last_io_seconds_ago
, a.connected_slaves
, a.SentinelInfo
, a.ts
,f.generator_room,f.cpu_target,f.mem_target,f.space_target,
e.admin,e.msg,e.redis_type,e.max_mem redis_max_mem,e.cpu_target redis_max_cpu
from
cat_redis_probe e
left join cat_host_probe f on (e.host=f.host)
left join
(
select a.* from cat_redis_probe_history a
inner join (select host,port,max(id) id from cat_redis_probe_history group by host,port) b
on(a.host=b.host and a.port=b.port and a.id=b.id)
) a on (a.host=e.host and a.port=e.port)
left join
(
select c.* from cat_host_probe_history c
inner join (select host,max(id) id from cat_host_probe_history group by host) d
on (c.host=d.host and c.id=d.id)
) c on (c.host=e.host)
where e.status=1 and f.status=1
order by f.generator_room,redis_type,a.host,a.port
) g order by g.generator_room,g.redis_type,g.host,g.port;
使用报表之后的相关效果
清明放假三天,加了两天班,最后还熬了一宿夜,终于发布了.
我们家的小花狸,小乖.他的微笑提供给我源源不断的努力动力.
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/29254281/viewspace-1505354/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/29254281/viewspace-1505354/