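This article is reposted from: http://www.ttlsa.com/mongodb/nagios-check_mongodb-plugin-to-monitor-mongodb/
When a service runs in production, it needs proper monitoring in place, both to verify that the service is healthy and to collect relevant information about it.
This article describes how to monitor a MongoDB database with nagios-plugin-mongodb: https://github.com/mzupan/nagios-plugin-mongodb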
1. Download the check_mongodb Nagios plugin

# cd /usr/local/nagios/libexec/
# wget --no-check-certificate https://github.com/mzupan/nagios-plugin-mongodb/archive/master.zip
# unzip master.zip
# mv nagios-plugin-mongodb-master nagios-plugin-mongodb
# chown -R nagios:nagios nagios-plugin-mongodb/
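2. Install the Mongo Python driver
The EPEL repository must be installed first. See the article "CentOS / RHCE 可供使用的yum" (yum repositories available for CentOS / RHCE).

# yum install pymongo.x86_64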
Alternatively, download the source package and build it yourself.

# wget --no-check-certificate -O mongo-python-driver-master.zip https://github.com/mongodb/mongo-python-driver/archive/master.zip
# unzip mongo-python-driver-master.zip
# cd mongo-python-driver-master
# python setup.py install
Or install it with Python's easy_install.

# easy_install pymongo
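Whichever installation route is used, it can help to confirm that the Python interpreter Nagios runs the plugin with can actually import the driver. The following is a minimal sketch, not part of the plugin; the helper name `pymongo_version` is hypothetical, and it relies only on the module-level `pymongo.version` string.

```python
import importlib


def pymongo_version():
    """Return the installed pymongo version string, or None if it cannot be imported."""
    try:
        module = importlib.import_module("pymongo")
    except ImportError:
        return None
    # pymongo exposes its version as a module-level string attribute.
    return getattr(module, "version", None)


version = pymongo_version()
if version is None:
    print("pymongo is NOT importable by this interpreter")
else:
    print("pymongo %s is importable" % version)
```

Run it with the same `python` binary that will execute check_mongodb.py; a different interpreter may have a different set of installed packages.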
3. check_mongodb.py usage

# ./check_mongodb.py --help
usage: check_mongodb.py [options]

This Nagios plugin checks the health of mongodb.

options:
  -h, --help            show this help message and exit
  -H HOST, --host=HOST  The hostname you want to connect to
  -P PORT, --port=PORT  The port mongodb is runnung on
  -u USER, --user=USER  The username you want to login as
  -p PASSWD, --pass=PASSWD
                        The password you want to use for that user
  -W WARNING, --warning=WARNING
                        The warning threshold we want to set
  -C CRITICAL, --critical=CRITICAL
                        The critical threshold we want to set
  -A ACTION, --action=ACTION
                        The action you want to take
  --max-lag             Get max replication lag (for replication_lag action
                        only)
  --mapped-memory       Get mapped memory instead of resident (if resident
                        memory can not be read)
  -D, --perf-data       Enable output of Nagios performance data
  -d DATABASE, --database=DATABASE
                        Specify the database to check
  --all-databases       Check all databases (action database_size)
  -s, --ssl             Connect using SSL
  -r, --replicaset      Connect to replicaset
  -q QUERY_TYPE, --querytype=QUERY_TYPE
                        The query type to check
                        [query|insert|update|delete|getmore|command] from
                        queries_per_second
  -c COLLECTION, --collection=COLLECTION
                        Specify the collection to check
  -T SAMPLE_TIME, --time=SAMPLE_TIME
                        Time used to sample number of pages faults
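Before wiring the plugin into Nagios, it is worth running it by hand and looking at its exit status, since Nagios plugins report their result through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below is an illustration under assumptions: `run_check` is a hypothetical helper, and the command line in the trailing comment reuses the path and host from this article without credentials.

```python
import subprocess

# Exit codes defined by the Nagios plugin convention.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(argv):
    """Run a plugin command line and return (state_name, first_output_line)."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    output, _ = proc.communicate()
    text = output.decode("utf-8", "replace").strip()
    first_line = text.splitlines()[0] if text else ""
    # Any exit code outside 0-3 is reported as UNKNOWN in this sketch.
    return NAGIOS_STATES.get(proc.returncode, "UNKNOWN"), first_line


# Example call (not executed here; adjust the path, host and port):
# run_check(["/usr/local/nagios/libexec/nagios-plugin-mongodb/check_mongodb.py",
#            "-H", "10.1.11.155", "-P", "27017", "-A", "connect", "-W", "2", "-C", "4"])
```

Only the first line of output is kept because that is the line Nagios shows as the service status text.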
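All actions of the Nagios MongoDB monitoring plugin:
Pass any one of the following actions with the -A option: 'connect', 'connections', 'replication_lag', 'replication_lag_percent', 'replset_state', 'memory', 'memory_mapped', 'lock', 'flushing', 'last_flush_time', 'index_miss_ratio', 'databases', 'collections', 'database_size', 'database_indexes', 'collection_indexes', 'queues', 'oplog', 'journal_commits_in_wl', 'write_data_files', 'journaled', 'opcounters', 'current_lock', 'replica_primary', 'page_faults', 'asserts', 'queries_per_second', 'chunks_balance', 'connect_primary', 'collection_state', 'row_count'

connect: the default action; checks that a connection can be made
connections: checks the percentage of open database connections
memory: checks memory usage
memory_mapped: checks mapped memory usage
lock: checks the percentage of time spent locked
flushing: checks the average flush time (in milliseconds)
last_flush_time: checks the last flush time (in milliseconds)
index_miss_ratio: checks the index miss ratio
databases: checks the total number of databases
collections: checks the total number of collections
database_size: checks the size of a specific database
database_indexes: checks the index size of a specific database
collection_indexes: checks the index size of a collection
replication_lag: checks the replication lag (in seconds)
replication_lag_percent: checks the replication lag (as a percentage)
replset_state: checks the state of the replica set
replica_primary: checks the primary server of the replica set
queries_per_second: checks the number of queries per second
connect_primary: checks the connection to the primary server of a set
collection_state: checks the state of a specific collection in a database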
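Most of these actions compare one measured number against the -W and -C thresholds. The sketch below shows that comparison for a "higher is worse" metric; it is an assumption about the general pattern, not a copy of the plugin's code, and `check_levels` is a name chosen for illustration.

```python
# Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_levels(value, warning, critical):
    """Map a measured value to a Nagios exit code.

    Assumes "higher is worse": the critical threshold is tested first,
    then the warning threshold.
    """
    if value >= critical:
        return CRITICAL
    if value >= warning:
        return WARNING
    return OK
```

With this ordering the warning threshold has to be lower than the critical one; if it is higher, the WARNING state can never be reached because every value that passes it has already passed the critical threshold.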
4. Define the Nagios commands

# vim /usr/local/nagios/etc/objects/commands.cfg
define command {
    command_name    check_mongodb
    command_line    $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $ARG1$ -P $ARG2$ -u $ARG3$ -p $ARG4$ -A $ARG5$ -W $ARG6$ -C $ARG7$
}
define command {
    command_name    check_mongodb_database
    command_line    $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $ARG1$ -P $ARG2$ -u $ARG3$ -p $ARG4$ -A $ARG5$ -W $ARG6$ -C $ARG7$ -d $ARG8$
}
define command {
    command_name    check_mongodb_collection
    command_line    $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $ARG1$ -P $ARG2$ -u $ARG3$ -p $ARG4$ -A $ARG5$ -W $ARG6$ -C $ARG7$ -d $ARG8$ -c $ARG9$
}
define command {
    command_name    check_mongodb_replicaset
    command_line    $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $ARG1$ -P $ARG2$ -u $ARG3$ -p $ARG4$ -A $ARG5$ -W $ARG6$ -C $ARG7$ -r $ARG8$
}
define command {
    command_name    check_mongodb_query
    command_line    $USER1$/nagios-plugin-mongodb/check_mongodb.py -H $ARG1$ -P $ARG2$ -u $ARG3$ -p $ARG4$ -A $ARG5$ -W $ARG6$ -C $ARG7$ -q $ARG8$
}
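A service's check_command is the command name followed by `!`-separated arguments, which Nagios substitutes into $ARG1$, $ARG2$ and so on. To see which plugin command line a given check_command produces, the substitution can be sketched as follows. The function name is hypothetical, the default for $USER1$ assumes it points at /usr/local/nagios/libexec as in this article, and treating missing arguments as empty strings is a simplification of this sketch.

```python
import re


def expand_check_command(check_command, command_line,
                         user1="/usr/local/nagios/libexec"):
    """Substitute $ARGn$ and $USER1$ macros into a command_line template."""
    parts = check_command.split("!")
    args = parts[1:]  # parts[0] is the command_name

    def replace(match):
        index = int(match.group(1))
        # Arguments that were not supplied become empty strings here.
        return args[index - 1] if 1 <= index <= len(args) else ""

    expanded = re.sub(r"\$ARG(\d+)\$", replace, command_line)
    return expanded.replace("$USER1$", user1)


template = ("$USER1$/nagios-plugin-mongodb/check_mongodb.py -H $ARG1$ -P $ARG2$ "
            "-u $ARG3$ -p $ARG4$ -A $ARG5$ -W $ARG6$ -C $ARG7$")
print(expand_check_command(
    "check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!connect!2!4",
    template))
```

Counting the `!` fields this way makes it easy to spot a check_command that passes a database or collection name into the -W or -C slot by mistake.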
5. Create the service checks
5.1 Check Connection. Every MongoDB instance in the cluster should be monitored.

define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Connect Check
    check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!connect!2!4
}
5.2 Check Percentage of Open Connections. Checks what percentage of the available connections are open.

define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Free Connections
    check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!connections!70!80
}
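The percentage being compared against 70 and 80 can be derived from the `connections` section of MongoDB's serverStatus output, which reports `current` and `available` counters. The following is a rough sketch of that arithmetic, assuming the percentage is current / (current + available); the function name is hypothetical and the input is a plain dict rather than a live server response.

```python
def connections_used_percent(server_status):
    """Return the share of connections in use, as a percentage.

    `server_status` stands in for the document returned by serverStatus;
    only its "connections" sub-document is read.
    """
    conns = server_status["connections"]
    current = float(conns["current"])
    available = float(conns["available"])
    total = current + available
    if total == 0:
        return 0.0
    return 100.0 * current / total


sample = {"connections": {"current": 200, "available": 600}}
print(connections_used_percent(sample))
```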
5.3 Check Replication Lag. Checks the replication lag, in seconds.

define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Replication Lag
    check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!replication_lag!15!30
}
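Replication lag is commonly computed as the difference between the primary's last applied operation time and each secondary's. The sketch below illustrates that calculation on data shaped like the `members` array of replSetGetStatus (`name`, `stateStr`, `optimeDate`); it is an assumption about the general approach rather than the plugin's exact logic, and the member names are made up.

```python
from datetime import datetime


def replication_lag_seconds(members):
    """Return {secondary_name: lag_in_seconds} relative to the primary.

    Raises StopIteration if no member is in the PRIMARY state.
    """
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    lags = {}
    for member in members:
        if member["stateStr"] != "SECONDARY":
            continue
        delta = primary["optimeDate"] - member["optimeDate"]
        # Clamp at zero: a secondary can briefly report a newer optime.
        lags[member["name"]] = max(delta.total_seconds(), 0.0)
    return lags


# Hypothetical two-member replica set status.
example = [
    {"name": "db1:27017", "stateStr": "PRIMARY",
     "optimeDate": datetime(2014, 1, 1, 12, 0, 30)},
    {"name": "db2:27017", "stateStr": "SECONDARY",
     "optimeDate": datetime(2014, 1, 1, 12, 0, 10)},
]
print(replication_lag_seconds(example))
```

With the thresholds used in the service definition above, a secondary that is 20 seconds behind would fall between the warning (15) and critical (30) levels.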
5.4 Check Replication Lag Percentage. Checks the replication lag as a percentage. If the value reaches 100%, a full resync is required.

define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Replication Lag Percentage
    check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!replication_lag_percent!50!75
}
5.5 Check Memory Usage. Checks memory usage.

define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Memory Usage
    check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!memory!20!28
}
5.6 Check Mapped Memory Usage. Checks MongoDB's mapped memory usage.

define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Mapped Memory Usage
    check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!memory_mapped!20!28
}
5.7 Check Lock Time Percentage. Checks the percentage of time spent locked. Noticeable lock time usually means the database is overloaded.

define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Lock Percentage
    check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!lock!5!10
}
5.8 Check Average Flush Time. Checks the average flush time. A high average flush time means the database is handling a heavy write load.

define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Flush Average
    check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!flushing!100!200
}
5.9 Check Last Flush Time. Checks the last flush time. A high last flush time suggests the server is under I/O pressure and may need faster disks.

define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Last Flush Time
    check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!last_flush_time!200!400
}
5.10 Check status of mongodb replicaset. Checks the state of the MongoDB replica set.

define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     MongoDB state
    check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!replset_state!0!0
}
5.11 Check status of index miss ratio. Checks the index miss ratio. If this value is high, consider adding indexes.

define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     MongoDB Index Miss Ratio
    check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!index_miss_ratio!.005!.01
}
5.12 Check number of databases and number of collections

define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     MongoDB Number of databases
    check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!databases!300!500
}
define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     MongoDB Number of collections
    check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!collections!300!500
}
5.13 Check size of a database. Checks the size of a database, which is useful for tracking data growth.

define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     MongoDB Database size db_ttlsa_posts
    check_command           check_mongodb_database!10.1.11.155!27017!check_mongodb!www.ttlsa.com!database_size!300!500!db_ttlsa_posts
}
5.14 Check index size of a database. Checks the index size of a database.

define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     MongoDB Database index size db_ttlsa_posts
    check_command           check_mongodb_database!10.1.11.155!27017!check_mongodb!www.ttlsa.com!database_indexes!50!100!db_ttlsa_posts
}
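The database size and index size checks rest on numbers that MongoDB's dbStats command reports in bytes (`dataSize`, `storageSize`, `indexSize`). As a rough illustration of where such values come from, the sketch below converts a dbStats-shaped dict to megabytes. Treating the thresholds as megabytes is an assumption here, the function name is hypothetical, and the sample figures are invented.

```python
BYTES_PER_MB = 1024.0 * 1024.0


def db_sizes_mb(db_stats):
    """Convert the byte counters of a dbStats document to megabytes."""
    return {
        "data_mb": db_stats["dataSize"] / BYTES_PER_MB,
        "storage_mb": db_stats["storageSize"] / BYTES_PER_MB,
        "index_mb": db_stats["indexSize"] / BYTES_PER_MB,
    }


# Invented sample: 256 MB of data, 320 MB on disk, 64 MB of indexes.
sample_stats = {
    "dataSize": 256 * 1024 * 1024,
    "storageSize": 320 * 1024 * 1024,
    "indexSize": 64 * 1024 * 1024,
}
print(db_sizes_mb(sample_stats))
```

Recording these values over time (for example through the plugin's -D performance data option) is what makes the growth trend visible.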
5.15 Check index size of a collection. Checks the index size of a single collection.

define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     MongoDB Collection index size db_ttlsa_posts.posts
    check_command           check_mongodb_collection!10.1.11.155!27017!check_mongodb!www.ttlsa.com!collection_indexes!50!100!db_ttlsa_posts!posts
}
5.16 Check the primary server of replicaset. Checks which server is the primary of the replica set.

define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     MongoDB Replicaset Master Monitor: replset_ttlsa
    check_command           check_mongodb_replicaset!10.1.11.155!27017!check_mongodb!www.ttlsa.com!replica_primary!0!1!replset_ttlsa
}
5.17 Check the number of queries per second. Checks the number of queries per second on the server. The query type is one of: query|insert|update|delete|getmore|command

define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     MongoDB Updates per Second
    check_command           check_mongodb_query!10.1.11.155!27017!check_mongodb!www.ttlsa.com!queries_per_second!150!200!update
}
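A per-second rate like this is typically derived from two samples of the cumulative `opcounters` section of serverStatus (fields `insert`, `query`, `update`, `delete`, `getmore`, `command`). The sketch below shows that calculation on plain dicts; it illustrates the idea under that assumption and does not reproduce how the plugin stores its previous sample. The function name is hypothetical.

```python
def ops_per_second(first, second, elapsed_seconds, query_type="update"):
    """Rate of one opcounter between two serverStatus-shaped samples."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    delta = second["opcounters"][query_type] - first["opcounters"][query_type]
    # Counters start again from zero when mongod restarts; report 0 then.
    if delta < 0:
        return 0.0
    return delta / float(elapsed_seconds)


earlier = {"opcounters": {"update": 1000}}
later = {"opcounters": {"update": 1900}}
print(ops_per_second(earlier, later, 6))
```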
5.18 Check Primary Connection

define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Connect Primary Check
    check_command           check_mongodb!10.1.11.155!27017!check_mongodb!www.ttlsa.com!connect_primary!2!4
}
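5.19 Check Collection State. Checks the state of a specific collection. The database and collection are passed through the check_mongodb_collection command; the 0!0 values only fill the -W and -C slots of that command.

define service {
    use                     generic-service
    hostgroup_name          Mongo Servers
    service_description     Mongo Collection State
    check_command           check_mongodb_collection!10.1.11.155!27017!check_mongodb!www.ttlsa.com!collection_state!0!0!db_ttlsa_posts!posts
}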
When reposting, please credit 运维生存时间 (ttlsa): http://www.ttlsa.com/html/4188.html