References:

http://heylinux.com/archives/3063.html

https://mmonit.com/wiki/Monit/ConfigurationExamples

https://mmonit.com/wiki/Monit/Gmail

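monit's most notable feature is its simple, readable configuration file. It monitors both processes and system state, offers flexible check types and check intervals, and can raise alerts and respond automatically (restart a service, run a command, and so on).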

Installation and configuration:

monit is shipped in the EPEL repository, so you must set up the EPEL repository first, then run:

yum -y install monit
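
If the EPEL repository is not set up yet, the two steps can be sketched as below. This assumes a CentOS/RHEL system where the `epel-release` package is available from the default repositories. `PKG_CMD` and `install_monit` are names made up for this sketch; they are not part of yum or monit.

```shell
#!/bin/sh
# Sketch only: PKG_CMD and install_monit are illustrative names.
# Override PKG_CMD (for example PKG_CMD=echo) to dry-run the steps.
PKG_CMD="${PKG_CMD:-yum -y install}"

install_monit() {
    # epel-release adds the EPEL repository definition;
    # once it is present, the monit package can be installed.
    $PKG_CMD epel-release && $PKG_CMD monit
}
```

Calling `install_monit` as root performs the installation; running `monit -V` afterwards prints the installed version.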

Here is the configuration file after my changes:

[root@aliyun_test ~]# cat /etc/monit.conf

###############################################################################

## Monit control file

###############################################################################

##

## Comments begin with a '#' and extend through the end of the line. Keywords

## are case insensitive. All paths MUST BE FULLY QUALIFIED, starting with '/'.

##

## Below you will find examples of some frequently used statements. For 

## information about the control file and a complete list of statements and 

## options, please have a look in the Monit manual.

##

##

###############################################################################

## Global section

###############################################################################

##

## Start Monit in the background (run as a daemon):

#

set daemon  30              # check services at 30 seconds intervals

#   with start delay 240    # optional: delay the first check by 4-minutes (by 

#                           # default Monit check immediately after Monit start)

#

#

## Set syslog logging. If you want to log to a standalone log file instead,

## specify the full path to the log file

#

set logfile syslog

#

#

## Set the location of the Monit lock file which stores the process id of the

## running Monit instance. By default this file is stored in $HOME/.monit.pid

#

set pidfile /var/run/monit.pid

#

## Set the location of the Monit id file which stores the unique id for the

## Monit instance. The id is generated and stored on first Monit start. By 

## default the file is placed in $HOME/.monit.id.

#

set idfile /var/.monit.id

#

## Set the location of the Monit state file which saves monitoring states

## on each cycle. By default the file is placed in $HOME/.monit.state. If

## the state file is stored on a persistent filesystem, Monit will recover

## the monitoring state across reboots. If it is on temporary filesystem, the

## state will be lost on reboot which may be convenient in some situations.

#

set statefile /var/.monit.state

#

## Set the list of mail servers for alert delivery. Multiple servers may be 

## specified using a comma separator. If the first mail server fails, Monit 

# will use the second mail server in the list and so on. By default Monit uses 

# port 25 - it is possible to override this with the PORT option.

#

set mailserver localhost               # primary mailserver

#                backup.bar.baz port 10025,  # backup mailserver on port 10025

#                localhost                   # fallback relay

#

#

## By default Monit will drop alert events if no mail servers are available. 

## If you want to keep the alerts for later delivery retry, you can use the 

## EVENTQUEUE statement. The base directory where undelivered alerts will be 

## stored is specified by the BASEDIR option. You can limit the queue size 

## by using the SLOTS option (if omitted, the queue is limited by space

## available in the back end filesystem).

#

set eventqueue

     basedir /var/monit  # set the base directory where events will be stored

#     slots 100           # optionally limit the queue size

#

#

## Send status and events to M/Monit (for more informations about M/Monit 

## see http://mmonit.com/). By default Monit registers credentials with 

## M/Monit so M/Monit can smoothly communicate back to Monit and you don't

## have to register Monit credentials manually in M/Monit. It is possible to

## disable credential registration using the commented out option below. 

## Though, if safety is a concern we recommend instead using https when

## communicating with M/Monit and send credentials encrypted.

#

# set mmonit http://monit:monit@192.168.1.10:8080/collector

#     # and register without credentials     # Don't register credentials

#

#

## Monit by default uses the following format for alerts if the mail-format

## statement is missing:

## --8<--

set mail-format {

      from: monit@$HOST

   subject: monit alert --  $EVENT $SERVICE

   message: $EVENT Service $SERVICE

                 Date:        $DATE

                 Action:      $ACTION

                 Host:        $HOST

                 Description: $DESCRIPTION


            Your faithful employee,

            Monit

 }

## --8<--

##

## You can override this message format or parts of it, such as subject

## or sender using the MAIL-FORMAT statement. Macros such as $DATE, etc.

## are expanded at runtime. For example, to override the sender, use:

#

# set mail-format { from: monit@foo.bar }

#

#

## You can set alert recipients whom will receive alerts if/when a 

## service defined in this file has errors. Alerts may be restricted on 

## events by using a filter as in the second example below.

#

set alert 13817419446@139.com                       

#

## Do not alert when Monit starts, stops or performs a user initiated action.

## This filter is recommended to avoid getting alerts for trivial cases.

#

# set alert your-name@your.domain not on { instance, action }

#

#

## Monit has an embedded HTTP interface which can be used to view status of 

## services monitored and manage services from a web interface. The HTTP 

## interface is also required if you want to issue Monit commands from the

## command line, such as 'monit status' or 'monit restart service' The reason

## for this is that the Monit client uses the HTTP interface to send these

## commands to a running Monit daemon. See the Monit Wiki if you want to 

## enable SSL for the HTTP interface. 

#

set httpd port 2812 and

    use address localhost  # only accept connection from localhost

    allow localhost        # allow localhost to connect to the server and

    allow admin:monit      # require user 'admin' with password 'monit'


###############################################################################

## Services

###############################################################################

##

## Check general system resources such as load average, cpu and memory

## usage. Each test specifies a resource, conditions and the action to be

## performed should a test fail.

#

#  check system $HOST

#    if loadavg (1min) > 4 then alert

#    if loadavg (5min) > 2 then alert

#    if cpu usage > 95% for 10 cycles then alert

#    if memory usage > 75% then alert

#    if swap usage > 25% then alert

#

#    

## Check if a file exists, checksum, permissions, uid and gid. In addition

## to alert recipients in the global section, customized alert can be sent to 

## additional recipients by specifying a local alert handler. The service may 

## be grouped using the GROUP option. More than one group can be specified by

## repeating the 'group name' statement.

#    

#  check file apache_bin with path /usr/local/apache/bin/httpd

#    if failed checksum and 

#       expect the sum 8f7f419955cefa0b33a2ba316cba3659 then unmonitor

#    if failed permission 755 then unmonitor

#    if failed uid root then unmonitor

#    if failed gid root then unmonitor

#    alert security@foo.bar on {

#           checksum, permission, uid, gid, unmonitor

#        } with the mail-format { subject: Alarm! }

#    group server

#

#    

## Check that a process is running, in this case Apache, and that it respond

## to HTTP and HTTPS requests. Check its resource usage such as cpu and memory,

## and number of children. If the process is not running, Monit will restart 

## it by default. In case the service is restarted very often and the 

## problem remains, it is possible to disable monitoring using the TIMEOUT

## statement. This service depends on another service (apache_bin) which

## is defined above.

#    

#  check process apache with pidfile /usr/local/apache/logs/httpd.pid

#    start program = "/etc/init.d/httpd start" with timeout 60 seconds

#    stop program  = "/etc/init.d/httpd stop"

#    if cpu > 60% for 2 cycles then alert

#    if cpu > 80% for 5 cycles then restart

#    if totalmem > 200.0 MB for 5 cycles then restart

#    if children > 250 then restart

#    if loadavg(5min) greater than 10 for 8 cycles then stop

#    if failed host www.tildeslash.com port 80 protocol http 

#       and request "/somefile.html"

#    then restart

#    if failed port 443 type tcpssl protocol http

#       with timeout 15 seconds

#    then restart

#    if 3 restarts within 5 cycles then unmonitor

#    depends on apache_bin

#    group server

#    

#    

## Check filesystem permissions, uid, gid, space and inode usage. Other services,

## such as databases, may depend on this resource and an automatically graceful

## stop may be cascaded to them before the filesystem will become full and data

## lost.

#

#  check filesystem datafs with path /dev/sdb1

#    start program  = "/bin/mount /data"

#    stop program  = "/bin/umount /data"

#    if failed permission 660 then unmonitor

#    if failed uid root then unmonitor

#    if failed gid disk then unmonitor

#    if space usage > 80% for 5 times within 15 cycles then alert

#    if space usage > 99% then stop

#    if inode usage > 30000 then alert

#    if inode usage > 99% then stop

#    group server

#

#

## Check a file's timestamp. In this example, we test if a file is older 

## than 15 minutes and assume something is wrong if its not updated. Also,

## if the file size exceed a given limit, execute a script

#

#  check file database with path /data/mydatabase.db

#    if failed permission 700 then alert

#    if failed uid data then alert

#    if failed gid data then alert

#    if timestamp > 15 minutes then alert

#    if size > 100 MB then exec "/my/cleanup/script" as uid dba and gid dba

#

#

## Check directory permission, uid and gid.  An event is triggered if the 

## directory does not belong to the user with uid 0 and gid 0.  In addition, 

## the permissions have to match the octal description of 755 (see chmod(1)).

#

#  check directory bin with path /bin

#    if failed permission 755 then unmonitor

#    if failed uid 0 then unmonitor

#    if failed gid 0 then unmonitor

#

#

## Check a remote host availability by issuing a ping test and check the 

## content of a response from a web server. Up to three pings are sent and 

## connection to a port and an application level network check is performed.

#

#  check host myserver with address 192.168.1.1

#    if failed ping then alert

#    if failed port 3306 protocol mysql with timeout 15 seconds then alert

#    if failed port 80 protocol http

#       and request /some/path with content = "a string"

#    then alert

#

#

## Check a network link status (up/down), link capacity changes, saturation

## and bandwidth usage.

#

#  check network public with interface eth0

#    if failed link then alert

#    if changed link then alert

#    if saturation > 90% then alert

#    if download > 10 MB/s then alert

#    if total upload > 1 GB in last hour then alert

#

#

## Check custom program status output.

#

#  check program myscript with path /usr/local/bin/myscript.sh

#    if status != 0 then alert

#

#

###############################################################################

## Includes

###############################################################################

##

## It is possible to include additional configuration parts from other files or

## directories.

#

#  include /etc/monit.d/*

#

# set daemon mode timeout to 1 minute

set daemon 60

# Include all files from /etc/monit.d/

include /etc/monit.d/*

Note: for email alerts to work, a mail server must be listening on port 25 on localhost.
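
One way to confirm the mail server requirement is to look for a listening socket on port 25. The helper below is a minimal sketch, not part of monit: `has_listener` is a hypothetical name, and it assumes the column layout printed by `ss -ltn` (local address and port in the fourth column).

```shell
#!/bin/sh
# Sketch only: has_listener is an illustrative helper.
# It reads `ss -ltn` style output on stdin and succeeds (exit 0)
# if any listening socket uses the port given as the first argument.
has_listener() {
    awk -v port="$1" '
        NR > 1 {                      # skip the header line
            n = split($4, parts, ":") # $4 is "Local Address:Port"
            if (parts[n] == port) found = 1
        }
        END { exit !found }
    '
}
```

Typical use would be `ss -ltn | has_listener 25 && echo "SMTP is listening"`.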


Next, set up monitoring for MySQL:

vim /etc/monit.d/mysql

check process mysql with pidfile /mydata/data/aliyun_test.pid

    start program = "/etc/init.d/mysqld start" with timeout 10 seconds

    stop program  = "/etc/init.d/mysqld stop"

    if failed port 3306 protocol mysql

       with timeout 10 seconds

    then restart

    if 3 restarts within 5 cycles then unmonitor

    group server
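
Before starting the service it can help to let monit parse the control file, so that a typo in `/etc/monit.d/mysql` is caught early. The following is a hedged sketch: `monit -t` is monit's syntax check, while `check_then_start`, `MONIT_BIN` and `SERVICE_BIN` are names invented here so the steps can be dry-run.

```shell
#!/bin/sh
# Sketch only: check_then_start, MONIT_BIN and SERVICE_BIN are illustrative names.
MONIT_BIN="${MONIT_BIN:-monit}"
SERVICE_BIN="${SERVICE_BIN:-service}"

check_then_start() {
    # "monit -t" parses the control file, including the files pulled in
    # by "include", and reports syntax errors without starting anything.
    if $MONIT_BIN -t; then
        $SERVICE_BIN monit start
    else
        echo "monit control file has syntax errors; not starting" >&2
        return 1
    fi
}
```

Run as root, `check_then_start` starts monit only when the syntax check passes.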

Start the monit service:

service monit start

[root@aliyun_test ~]# tail -f /var/log/monit 

[CST Aug 12 13:01:00] info     : Monit daemon with pid [20358] stopped

[CST Aug 12 13:01:00] info     : 'aliyun_test' Monit 5.14 stopped

[CST Aug 12 13:01:00] info     : Starting Monit 5.14 daemon with http interface at [localhost]:2812

[CST Aug 12 13:01:00] info     : Starting Monit HTTP server at [localhost]:2812

[CST Aug 12 13:01:00] info     : Monit HTTP server started

[CST Aug 12 13:01:00] info     : 'aliyun_test' Monit 5.14 started

Manually simulate a MySQL process failure:

service mysqld stop

Watch the log:

[CST Aug 12 13:51:04] error    : 'mysql' process is not running

[CST Aug 12 13:51:04] info     : 'mysql' trying to restart

[CST Aug 12 13:51:04] info     : 'mysql' start: /etc/init.d/mysqld

[CST Aug 12 13:52:05] info     : 'mysql' process is running with pid 21246
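
On a busy host the log mixes entries for many services. The sketch below filters it down to one service, relying only on the log format shown above (service name in single quotes, level after the timestamp). `service_events` and `count_errors` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: service_events and count_errors are illustrative helpers.
# Both read monit log lines on stdin.

# Print only the lines that mention the given service, e.g. 'mysql'.
service_events() {
    grep -F "'$1'"
}

# Count the error-level lines for the given service.
count_errors() {
    service_events "$1" | grep -c '\] error'
}
```

For example, `service_events mysql < /var/log/monit` prints the restart sequence for mysql only.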

Screenshots of the alert emails:

wKioL1etZNKCeCUpAACU9ivRkc0825.jpg

The screenshot above shows the "does not exist" alert for mysql: the service has failed and monit is about to restart it.

wKiom1etZQqQCjbfAACNHHTdzro603.jpg

The screenshot above shows that the mysql service is back online and the failure has been resolved.

Note: monit also has a web interface for viewing status, which is enabled in the configuration file. Find this section:

set httpd port 2812 and

    use address localhost  # only accept connection from localhost

    allow localhost        # allow localhost to connect to the server and

    allow admin:monit      # require user 'admin' with password 'monit'

Change it to:

set httpd port 2812 and

    use address PUBLIC_IP  # listen on the server's public IP (replace PUBLIC_IP)

    allow PUBLIC_IP        # allow this address to connect to the server and

    allow admin:monit      # require user 'admin' with password 'monit'

    allow @monit           # allow users of group 'monit' to connect (rw)

    allow @users readonly  # allow users of group 'users' to connect readonly

Open it in a browser to view the monitoring status:

wKiom1etZ2zRLlyiAACrQDOdjNc261.jpg
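
The same status can also be fetched from the command line with `curl`. This is a sketch under assumptions: the `admin:monit` credentials and port 2812 come from the configuration above, the `/_status?format=xml` path is what I understand monit's HTTP interface to expose and should be checked against your monit version, and `monit_status_url` and `fetch_monit_status` are names made up here.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative; the _status path is an
# assumption to verify against the installed monit version.
MONIT_USER="${MONIT_USER:-admin}"
MONIT_PASS="${MONIT_PASS:-monit}"

# Build the status URL for a host and an optional port (default 2812).
monit_status_url() {
    printf 'http://%s:%s/_status?format=xml\n' "$1" "${2:-2812}"
}

# Fetch the status document using HTTP basic authentication.
fetch_monit_status() {
    curl --silent --fail --user "$MONIT_USER:$MONIT_PASS" "$(monit_status_url "$@")"
}
```

For example, `fetch_monit_status PUBLIC_IP` (with the real address substituted) prints the status document, or fails with a non-zero exit code if the address or credentials are not allowed.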

To learn more about monit, see the references at the top of this article.