采用Shell完成对Linux&Oracle系统环境的监控


1.环境介绍

         目前数据库环境如下表:

主机名

网络ip地址

数据库名

unique name

数据库实例名

数据文件位置

zyx.test.com

192.168.5.231

orcl

orcl

test

/u01/app/oracle/oradata/orcl/

 

    系统为redhat linux 6.5,此次部署需要完成监控的是对数据库告警日志、数据库连接是否正常、表空间使用情况、系统剩余空间、系统负载、内存使用率等进行监控分析,发现告警并立即发送告警邮件。

 

 

2.前期注意事项

2.1 时间同步设置

[root@zyx ~]# echo $LANG

en_US.UTF-8

[root@zyx ~]# mkdir -p /opt/scripts/time/

[root@zyx ~]# vim /opt/scripts/time/time_sync.sh

#!/bin/bash

 

#time

node1_time=`date +"%Y-%m-%d %H:%M:%S"`

echo "before time sync: $node1_time"

 

/usr/sbin/ntpdate asia.pool.ntp.org

 

#time

node1_time=`date +"%Y-%m-%d %H:%M:%S"`

echo "after time sync: $node1_time"

 

[root@zyx ~]# chmod a+x /opt/scripts/time/time_sync.sh

[root@zyx ~]# sh /opt/scripts/time/time_sync.sh

before time sync: 2016-04-23 01:32:35

21 Jun 09:33:34 ntpdate[5087]: step time server 139.59.193.168 offset 5126456.496962 sec

after time sync: 2016-06-21 09:33:35

 

[root@zyx ~]# crontab -l

# 同步时间

55 03 * * 6 /opt/scripts/time/time_sync.sh >> /opt/scripts/time/time_sync.log

 

 

----后面oracle用户下监控脚本也创建在该目录的,先将该目录属组改为oracle

[root@zyx ~]# chown -R oracle. /opt/scripts/

 

2.2 字符集问题(监控和被监控)

         mail发送邮件时,如果有中文,一定得注意字符集问题,如果字符集设置不对,那么在发送的告警邮件中如果有中文,会发送二进制附件:

[oracle@zyx ~]$ export LANG=en_US.UTF-8

[oracle@zyx ~]$ echo $LANG

en_US.UTF-8

[oracle@zyx ~]$ mail -s "test_mail" 923451558@qq.com

MY test mail!!!!!

我的测试邮箱额!!!

EOT

 

----更改后发送没有报错,查看日志是否发送成功(注意,如果发送的邮件的服务器和接收服务器的时间不一致,接收服务器可能会拒绝接收邮件,从而导致发送失败)

[oracle@zyx ~]$ /usr/lib/sendmail -bp

Mail queue is empty

 

[oracle@orcl ~]$ vim /var/spool/mail/oracle

[root@orcl ~]# vim /var/log/maillog

 

----如果没有问题后,记得在.bash_profile中设置

 

----登录我的邮箱查看

 

2.3 ssh互信

----如果需要将本机的日志拷贝到其他主机进行check报警,需要建立互信机制

----orcl.test.com(备监控库)

[oracle@orcl ~]$ ll .ssh/

[oracle@orcl ~]$ ssh-keygen -t rsa

[oracle@orcl ~]$ ssh-copy-id 192.168.5.232

 

---- dg2.orcl.com(监控库)

[oracle@dg2 ~]$ ll .ssh/

[oracle@dg2 ~]$ ssh-keygen -t rsa

[oracle@dg2 ~]$ ssh-copy-id 192.168.5.231

 

[oracle@dg2 ~]$ ssh 192.168.5.231 hostname

orcl.test.com

[oracle@dg2 ~]$ ssh 192.168.5.231 date

Thu Jun 16 10:23:38 CST 2016

 

[oracle@orcl ~]$ ssh 192.168.5.232 hostname

dg2.orcl.com

[oracle@orcl ~]$ ssh 192.168.5.232 date

Thu Jun 16 10:23:57 CST 2016

 

2.4 时时关注邮件报错

[oracle@orcl ~]$ /usr/lib/sendmail -bp

Mail queue is empty

[oracle@orcl ~]$ vim /var/spool/mail/oracle

[root@orcl ~]# vim /var/log/maillog

 

 

3.定义监控环境

----准备

------创建用于存放log数据

[oracle@zyx ~]$ mkdir -p /opt/scripts/mail/data

------创建用于check发送告警邮件

[oracle@zyx ~]$ mkdir -p /opt/scripts/monitor

3.1 监控脚本(api_monitor_iface.sh)

----主库中主要用于将log传送或check触发发送告警邮件

[oracle@zyx ~]$ vim /opt/scripts/monitor/api_moniter_iface.sh

#!/bin/sh

#--------------------------

#creator: shall

#create time: 2016-6-16

#function:

#usage: call monitor's interface

#--------------------------

 
if [ $# -ge 1 ]; then

        if [ $1 == 'mobile' ]; then

                echo "send mobile message"

                if [ $# -ne 3 ]; then

                echo "Usage: `basename $0`(mobile msg_title msg_file msg_group)"

                exit 1

                fi

        else

                echo "Usage: `basename $0`(mobile msg_title msg_file msg_group)"

                echo "or Usage: `basename $0`(mail mail_title mail_file mail_group)"

                echo "or Usage: `basename $0`(scp source_file dest_file)"

                exit 1

        fi

fi

 
SENDMSG=/opt/scripts/monitor/api_send_dbmobile.sh

 

#监控层地址

admin="oracle"

monitor="192.168.5.231"

 

if [ $1 == 'mobile' ]; then

         msg_title=$2

         msg_file=$3

         msg_group=$4

         echo "para 1: $1"

    "$SENDMSG '$msg_title' '$msg_file' '$msg_group'"

fi

 

#得到当前时间戳

node_time=`date +"%Y-%m-%d %H:%M:%S"`

 

#邮件发送结束

echo "$node_time:" `basename $0` " exit code 0"

 

3.2 发送短信脚本(api_send_dbmobile.sh)

----定义发送邮件短信脚本

[oracle@zyx ~]$ vim /opt/scripts/monitor/api_send_dbmobile.sh

#!/bin/sh

. ~/.bash_profile

#--------------------------

#creator: shall

#create time: 2016-6-16

#function:

#usage: send mobile

#--------------------------

 #这时表示至少需要符合条件的三个参数,否则,会提示脚本的用法

if [ $# -ne 3 ]; then

         echo "Usage2: `basename $0`(title file group1|group2)"

         exit 1

fi

 

#邮件发送的分组列表

#组1

#组2

msg_group1="18580131502@163.com","923451558@qq.com"

msg_group2="923451558@qq.com"

 

#获得短信标题、正文

msg_title=$1

msg_file=$2

 

#得到当前时间戳

node_time=`date +"%Y-%m-%d %H:%M:%S"`

 

#发送短信子过程

case $3 in

group1)echo "$node_time: send mobile to dba-user-list"

         mail -s "$msg_title" $msg_group1 < $msg_file

;;

group2)echo "$node_time: send mobile to operation-user-list"

        mail -s "$msg_title" $msg_group2 < $msg_file

;;

*)echo "Usage: `basename $0`(title file group1|group2)"

;;

esac

 

#得到当前时间戳

node_time=`date +"%Y-%m-%d %H:%M:%S"`

 

#邮件发送结束

echo "$node_time:" `basename $0` " exit code 0"

 

3.监控部署

3.1 数据库告警日志(monitor_alertlog_check.sh)

----调用monitor,check告警日志并发送邮件

[oracle@zyx ~]$ vim /opt/scripts/mail/monitor_alertlog_check.sh

#!/bin/sh

. ~/.bash_profile

#--------------------------

#creator: shall

#create time: 2016-6-16

#function: copy alert log file to monitor

#usage: crontab on host

#--------------------------

 

#得到主机名,SID

dbname=`hostname`

#ORACLE_SID=$1

ORACLE_SID=test

ORACLE_BASE=/u01/app/oracle

oracle_alert_log="$ORACLE_BASE"/diag/rdbms/orcl/"$ORACLE_SID"/trace/alert_"$ORACLE_SID".log

dest_oracle_alert_log=/opt/scripts/mail/data/"$dbname"_"$ORACLE_SID"_alert.log

 

#变量打印测试

echo $oracle_alert_log

echo $dbname

 

#告警日志文件,传至存放数据目录,等待分析

cp "$oracle_alert_log" "$dest_oracle_alert_log"

 

#得到当前时间戳

node_time=`date +"%Y-%m-%d %H:%M:%S"`

 

#copy结束

echo "$node_time:" `basename $0` " exit code 0"

 

echo "--------------------------------------------------------------------------------"

 

#等待采集端主机采集负载信息

#sleep 30

 

#告警文件所在地址

dest_dir=/opt/scripts/mail/data/

SENDMSG=/opt/scripts/monitor/api_send_dbmobile.sh

dest_servers="zyx.test.com"

dest_sid="test"

for dest_server in $dest_servers

do

         echo "---------- $dest_server ----------";

         new_file="$dest_dir""$dest_server"_"$dest_sid"_alert.log

         old_file="$dest_dir""$dest_server"_"$dest_sid"_alert.old

         diff_file="$dest_dir""$dest_server"_"$dest_sid"_alert.diff

        

         #通过diff命令来比较原始文件与目标文件,并过滤所有的ORA-信息,

         #也就是错误信息,写入到差异文件

         echo $new_file $old_file $diff_file

         diff $new_file $old_file | grep "ORA-" | sed 's/^<//g' > $diff_file

         #value_alert_err=`cat $diff_file 2 > /dev/null`

         cp -rf $new_file $old_file

        

         #如果差异文件内容存在且不为空,则表示有错误信息

         if [ -s "$diff_file" ]; then

                   msg_title="$dest_server""_alert_error"

                   msg_file=$diff_file

                   msg_group="group1"

                   echo "$msg_file"

                   #msg_log="$dest_server"":""$value_alert_err"

                   $SENDMSG $msg_title $msg_file $msg_group

         fi

done

 

#得到当前时间戳

node_time=`date +"%Y-%m-%d %H:%M:%S"`

 

#脚本执行结束

echo "$node_time:" `basename $0` "exit code 0"

 

4.2系统剩余空间(monitor_disk_check.sh)

----调用monitor,发送告警邮件

[oracle@zyx ~]$ vim /opt/scripts/mail/monitor_disk_check.sh

#!/bin/sh

. ~/.bash_profile

#--------------------------

#creator: shall

#create time: 2016-6-16

#function: copy disk info into to monitor

#usage: crontab on host

#--------------------------

 

#得到主机名,磁盘日志文件

dbname=`hostname`

disk_log=/opt/scripts/mail/data/"$dbname"_disk.log

 

#变量打印测试

echo $disk_log

echo $dbname

 

#磁盘信息快照

df -P | awk 'FNR>1{print $6,$5}' > "$disk_log"

 

#得到当前时间戳

node_time=`date +"%Y-%m-%d %H:%M:%S"`

echo "$node_time:" `basename $0` " exit code 0"

 

echo "--------------------------------------------------------------------------------"


#
磁盘负载文件所在地址

dest_dir=/opt/scripts/mail/data

 

#磁盘负载的报警阙值

diskoff=98

SENDMSG=/opt/scripts/monitor/api_send_dbmobile.sh

node_time=`date +"%Y-%m-%d %H:%M:%S"`


dest_servers="zyx.test.com"

for dest_server in $dest_servers

do

         #echo -e "\033[32;1m================= $dest_server =================\033[0m";

         #获得最新的磁盘负载信息

         #disk日志文件与disk消息文件

         disk_info="$dest_dir"/"$dest_server"_disk.log

         disk_warning="$dest_dir"/"$dest_server"_disk.out

        

         echo $disk_info

         echo $disk_warning        

 

         cat /dev/null > $disk_warning

         #遍历disk日志文件,逐个比较并写入disk消息文件

         while read LINE

         do

                   value_disk_per=0

                   value_disk_per=`echo $LINE | awk '{print $2}' | sed s/'%'//g`

                   value_disk_name=`echo $LINE | awk '{print $1}'`

                   echo "$value_disk_name" : "$value_disk_per"%

                   if [ $value_disk_per -ge  $diskoff ]; then

                            echo ""$dest_server"'s "$value_disk_name" 磁盘使用空间超过 "$diskoff"%, 目前为 "$value_disk_per"%" >> $disk_warning

                   fi

         done < $disk_info

        

         #如果文件内容存在且不为空,则表示有错误信息

        if [ -s "$disk_warning" ]; then

                   msg_title="$dest_server""_disk_warning"

                msg_file=$disk_warning

                msg_group="group1"

                $SENDMSG "$msg_title" "$msg_file" "$msg_group"            

         fi

done

 

#得到当前时间戳

node_time=`date +"%Y-%m-%d %H:%M:%S"`

 

#脚本执行结束

echo "$node_time:" `basename $0` "exit code 0"

 

4.3 TNS通讯(monitor_tnsping_check.sh)

----调用monitor,用于发送告警邮件(监控监听)

[oracle@zyx ~]$ vim /opt/scripts/mail/monitor_tnsping_check.sh

#!/bin/sh

. ~/.bash_profile

#--------------------------

#creator: shall

#create time: 2016-6-16

#function: copy tnsping result file to monitor

#usage: crontab on host

#--------------------------

 

#得到主机名,SID

dbname=`hostname`

tnsping_log=/opt/scripts/mail/data/"$dbname"_tnsping.log

 

#清空tnsping日志文件

cat /dev/null > $tnsping_log

 

#循环tnspnig各个节点,如果PING不通,则写入日志文件

dest_tnsping_nodes="orcl orclps test1 test2"

 

for dest_tnsping_node in $dest_tnsping_nodes

do

         value_tnsping=`tnsping "$dest_tnsping_node" | tail -1 | grep OK`

         echo "$dest_tnsping_node"": ""$value_tnsping"

         #如果value_tnsping为空,则说明有问题,打印至日志

         if [ ! -n "$value_tnsping" ]; then

                    echo "$dest_tnsping_node"" TNS连接失败!" >> $tnsping_log

         fi

done

 

#得到当前时间戳

node_time=`date +"%Y-%m-%d %H:%M:%S"`

echo "$node_time:" `basename $0` "exit code 0"

 

echo "--------------------------------------------------------------------------------"

#等待采集端主机采集负载信息

#sleep 60

 

#告警文件所在地址

dest_dir=/opt/scripts/mail/data/

 

SENDMSG=/opt/scripts/monitor/api_send_dbmobile.sh

 

#这个列表也可以写在配置文件中

dest_servers="zyx.test.com"

for dest_server in $dest_servers

do

         #通过手机发送报错信息

         value_tnsping="$dest_dir""$dest_server"_tnsping.log

         echo "$dest_server"": ""$value_tnsping"

         #如果文件内容存在且不为空,则表示有错误信息

         if [ -s "$value_tnsping" ]; then

                   msg_title="$dest_server""_tnsping_error"

                   msg_file=$value_tnsping

                   msg_group="group1"

                   $SENDMSG $msg_title $msg_file $msg_group

         fi

done

 

#得到当前时间戳

node_time=`date +"%Y-%m-%d %H:%M:%S"`

#脚本执行结束

echo "$node_time:" `basename $0` "exit code 0"

 

4.4系统负载(monitor_load_check.sh)

----调用monitor,用于发送告警邮件

[oracle@zyx mail]$ vim /opt/scripts/mail/monitor_load_check.sh

#!/bin/sh

. ~/.bash_profile

#--------------------------

#creator: shall

#create time: 2016-6-16

#--------------------------

# 监控系统负载与CPU、内存,超出警戒值则发邮件告警

node_time=`date +"%Y-%m-%d %H:%M:%S"`

# 得到主机名

dbname=`hostname`

echo "node_time is $node_time"

echo "db_name is $dbname"

 

#告警文件所在地址

dest_dir=/opt/scripts/mail/data/

load_log="$dest_dir""$dbname"_load.log

SENDMSG=/opt/scripts/monitor/api_send_dbmobile.sh


# 监控系统负载的变化情况,超出时发邮件告警:

# 抓取cpu的总核数

cpu_num=`grep -c 'model name' /proc/cpuinfo`

 

# 抓取当前系统15分钟的平均负载值

load_15=`uptime | sed 's/.*average://g'|awk '{print $3}'`

 

# 计算当前系统单个核心15分钟的平均负载值,结果小于1.0时前面个位数补0

average_load=`echo "scale=2;a=$load_15/$cpu_num;if(length(a)==scale(a)) print 0;print a" | bc`

echo " average_load  is $average_load"

 

# 取上面平均负载值的个位整数

average_int=`echo $average_load | cut -f 1 -d "."`

echo " average_int  is $average_int"

 

# 设置系统单个核心15分钟的平均负载的告警值为0.70(即使用超过70%的时候告警)

load_warn=0.70

# 当单个核心15分钟的平均负载值大于等于1.0(即个位整数大于0) ,直接发邮件告警;如果小于1.0则进行二次比较

if (($average_int > 0)); then

      echo "$node_time  $dbname服务器15分钟的系统平均负载为$average_load,超过警戒值1.0,请立即处理!!!"  >$load_log

           msg_title="$dbname服务器系统负载严重告警!!!"

           msg_file=$load_log

      msg_group="group1"

          echo "SENDMSG $msg_title $msg_file $msg_group"

     $SENDMSG $msg_title $msg_file $msg_group

         else

# 当前系统15分钟平均负载值与告警值进行比较(当大于告警值0.70时会返回1,小于时会返回0 )

       load_now=`expr $average_load \> $load_warn`

# 如果系统单个核心15分钟的平均负载值大于告警值0.70(返回值为1),则发邮件给管理员

           if (($load_now == 1)); then

         echo "$node_time  $dbname服务器15分钟的系统平均负载达到 $average_load,超过警戒值0.70,请及时处理。"  >$load_log

              msg_title="$dbname服务器系统负载告警!!!"

              msg_file=$load_log

         msg_group="group1"

             echo "SENDMSG $msg_title $msg_file $msg_group"

        $SENDMSG $msg_title $msg_file $msg_group

      fi

fi


4.5 CPU使用率 (monitor_cpu_check.sh)

[oracle@zyx mail]$ vim /opt/scripts/mail/monitor_cpu_check.sh

#!/bin/sh

. ~/.bash_profile

#--------------------------

#creator: shall

#create time: 2016-6-16

#--------------------------

# 监控系统CPU,超出警戒值则发邮件告警

node_time=`date +"%Y-%m-%d %H:%M:%S"`

# 得到主机名

dbname=`hostname`

echo "node_time is $node_time"

echo "db_name is $dbname"

 

#告警文件所在地址

dest_dir=/opt/scripts/mail/data/

cpu_log="$dest_dir""$dbname"_cpu.log


SENDMSG=/opt/scripts/monitor/api_send_dbmobile.sh

 

# 监控系统cpu的情况,当使用超过80%的时候发告警邮件

# 取当前空闲cpu百份比值(只取整数部分)

cpu_idle=`top -b -n 1 | grep Cpu | awk '{print $5}' | cut -f 1 -d "."`

echo " cpu_idle  is $cpu_idle "

 

# 设置空闲cpu的告警值为20%,如果当前cpu使用超过80%(即剩余小于20%),立即发邮件告警

if (($cpu_idle < 20)); then

         echo "$node_time   $dbname服务器cpu剩余$cpu_idle%,使用率已经超过80%,请及时处理。" >$cpu_log

              msg_title="$dbname服务器CPU告警"

              msg_file=$cpu_log

         msg_group="group1"

             echo "SENDMSG $msg_title $msg_file $msg_group"

        $SENDMSG $msg_title $msg_file $msg_group

fi

 

4.6 swap使用率(monitor_swap_check.sh)

[oracle@zyx mail]$ vim /opt/scripts/mail/monitor_swap_check.sh

#!/bin/sh

. ~/.bash_profile

#--------------------------

#creator: shall

#create time: 2016-6-16

#--------------------------

# 监控系统CPU,超出警戒值则发邮件告警

node_time=`date +"%Y-%m-%d %H:%M:%S"`

# 得到主机名

dbname=`hostname`

echo "node_time is $node_time"

echo "db_name is $dbname"

 

#告警文件所在地址

dest_dir=/opt/scripts/mail/data/

swap_log="$dest_dir""$dbname"_swap.log


SENDMSG=/opt/scripts/monitor/api_send_dbmobile.sh

 

# 监控系统交换分区swap的情况,当使用超过80%的时候发告警邮件

# 系统分配的交换分区总量

swap_total=`free -m | grep Swap | awk '{print $2}'`

 

# 当前剩余的交换分区free大小

swap_free=`free -m | grep Swap | awk '{print $4}'`

 

# 当前已使用的交换分区used大小

swap_used=`free -m | grep Swap | awk '{print $3}'`

 

if (($swap_used != 0)); then

    swap_per=0`echo "scale=2;$swap_free/$swap_total" | bc`

    echo "swap_pre is $swap_per"

 

# 设置交换分区的告警值为20%(即使用超过80%的时候告警)。

    swap_warn=0.20

 

    swap_now=`expr $swap_per \> $swap_warn`

 

    if (($swap_now == 0)); then

       echo "$node_time     $dbname服务器swap交换分区只剩下 $swap_free M 未使用,剩余不足20%,使用率已经超过80%,请及时处理。"  >$swap_log

              msg_title="$dbname服务器内存告警"

              msg_file=$swap_log

         msg_group="group1"

             echo "SENDMSG $msg_title $msg_file $msg_group"

        $SENDMSG $msg_title $msg_file $msg_group

    fi

fi

 

 

4.7 判断数据库是否open(monitor_db_connect_info.sh)

----判断是否能连接数据库,如果不能连接,发送告警邮件,建议由其他主机进行监控

------配置脚本之前,需要在判断的数据库中创建scott用户,密码设置为111111,并且给予连接权限

SQL> create user scott identified by 111111;

create user scott identified by 111111

            *

ERROR at line 1:

ORA-01920: user name 'SCOTT' conflicts with another user or role name

SQL> alter user scott account unlock identified by 111111;

 

#####################################################################

#####################################################################

[oracle@zyx mail]$ vim /opt/scripts/mail/monitor_db_connect_info.sh

#!/bin/sh

. ~/.bash_profile

#--------------------------

#creator: shall

#create time: 2016-6-16

#function: TNSPING各个数据库节点,然后每个节点单独写入一个日志文件

#usage: crontab on monitor

#--------------------------

 

#当前数据库环境变量

ORACLE_BASE=/u01/app/oracle

ORACLE_HOME=$ORACLE_BASE/product/11.2.0/db_1

SQLPLUS=$ORACLE_HOME/bin/sqlplus

 

#设置主机变量

dbname=`hostname`

base_dir=/opt/scripts/mail/data/

 

MONITOR=/opt/scripts/monitor/api_monitor_iface.sh

SENDMSG=/opt/scripts/monitor/api_send_dbmobile.sh

 

#监管主机节点逐个探测,格式列表为: 主机TNS@分管运维邮件组

dest_connect_nodes="orcl@group1 orclps@group1 test2@group1"

 

for value_dest_connect_node in $dest_connect_nodes

do

 

#解析主机TNS名称、所属邮件组

dest_connect_node=`echo $value_dest_connect_node | awk -F@ '{print $1}'`

dest_connect_node_group=`echo $value_dest_connect_node | awk -F@ '{print $2}'`

dest_connect_log="$base_dir""$dest_connect_node"_connect.log

 

#数据库连接状态返回值

value_dest_connect=`$SQLPLUS -S scott/111111@$dest_connect_node << EOF

set colsep ','

set echo off

set feedback off

set heading off

set pagesize 0

set linesize 1000

set trimspool on

set timing off

set long 1000

 

Select 1 from dual;

 

exit

EOF`

 

         #取SQL输出的首行,去除TAB、空格

         value_dest_connect=`echo "$value_dest_connect" | sed -n '1p' | tr -d "[\011] "`

         echo "$dest_connect_node"@"$dest_connect_node_group"": |""$value_dest_connect""|"

       

         #如果value_dest_connect不等于"1",则说明有问题,打印至日志

         #字符串相等比较

         #使用单个等号、等号两边各有一个空格,$var旁边的x是为了预防空值比较的处理

         if [ "$value_dest_connect"x != "1"x ]; then

                echo "$dest_connect_node"" 数据库连接失败!" > $dest_connect_log

                msg_title="$dest_connect_node""_connect_error"

                msg_file=$dest_connect_log

                msg_group="$dest_connect_node_group"

                $SENDMSG "$msg_title" "$msg_file" "$msg_group"

        fi

done

 

 

#得到当前时间戳

node_time=`date +"%Y-%m-%d %H:%M:%S"`

 

#脚本执行结束

echo "$node_time:" `basename $0` "exit code 0"

 

4.8 数据库表空间告警(monitor_db_ts_check.sh)

---需要用到system用户登录(安全性??)

[oracle@zyx mail]$ vim /opt/scripts/mail/monitor_db_ts_check.sh

#!/bin/sh

. ~/.bash_profile

#--------------------------

#creator: shall

#create time: 2016-6-16

#function: copy tablespace info into to monitor

#usage: crontab on host

#--------------------------

 

#数据库环境变量

ORACLE_BASE=/u01/app/oracle

ORACLE_HOME=$ORACLE_BASE/product/11.2.0/db_1

SQLPLUS=$ORACLE_HOME/bin/sqlplus

 

 

#得到主机名,表空间日志文件

dbname=`hostname`

ts_log=/opt/scripts/mail/data/"$dbname"_ts.log

 

#表空间信息快照

$SQLPLUS -S system/sys << EOF

set colsep ',';

set echo off

set feedback off

set heading off

set pagesize 0

set linesize 1000

set trimspool on

set timing off

set long 1000

spool $ts_log

 

Select a.Tablespace_Name "ts_name",

       Round(a.Bytes / 1024 / 1024 ) "total(M)",

       --Round((a.Bytes - b.Bytes) / 1024 / 1024 ) "used",

       Round(b.Bytes / 1024 / 1024 ) "free(M)",

       Round(((a.Bytes - b.Bytes) / a.Bytes) * 100, 2)||'%' "used_percent"

  From (Select Tablespace_Name, Sum(Bytes) Bytes

          From Dba_Data_Files

         Group By Tablespace_Name) a,

       (Select Tablespace_Name, Sum(Bytes) Bytes, Max(Bytes) Largest

          From Dba_Free_Space

         Group By Tablespace_Name) b

 Where a.Tablespace_Name = b.Tablespace_Name

   And Round(((a.Bytes - b.Bytes) / a.Bytes) * 100, 2) >= 93

   Union All

   SELECT a.tablespace_name "ts_name",

          a.BYTES/ 1024 / 1024  "total(M)",

          (a.bytes - nvl(b.bytes, 0))/ 1024 / 1024 "free(M)",

          ROUND(nvl(b.bytes, 0)/a.BYTES*100,2) || '%' "used_percent"

  FROM (SELECT   tablespace_name, SUM (bytes) bytes FROM dba_temp_files GROUP BY tablespace_name) a,

       (SELECT   tablespace_name, SUM (bytes_cached) bytes FROM v\$temp_extent_pool GROUP BY tablespace_name) b

WHERE a.tablespace_name = b.tablespace_name(+)

  And a.tablespace_name='TEMP'

  And ROUND(nvl(b.bytes, 0)/a.BYTES*100,2) >=85;

 

spool off

exit

EOF


#得到当前时间戳

node_time=`date +"%Y-%m-%d %H:%M:%S"`

#邮件发送结束

echo "$node_time:" `basename $0` " exit code 0"

 

echo "--------------------------------------------------------------------------------"

#表空间负载文件所在地址

dest_dir=/opt/scripts/mail/data

SENDMSG=/opt/scripts/monitor/api_send_dbmobile.sh

 

#得到当前时间戳

node_time=`date +"%Y-%m-%d %H:%M:%S"`

 

#这个列表也可以写在配置文件中

dest_servers="zyx.test.com orcl.com"

for dest_server in $dest_servers

do

         disk_info="$dest_dir"/"$dest_server"_ts.log

         echo $disk_info

        #如果文件内容存在且不为空,则表示有错误信息

        if [ -s "$disk_info" ]; then

                msg_title="$dest_server""_tablespace_warning"

                msg_file=$disk_info

                msg_group="group1"

                $SENDMSG "$msg_title" "$msg_file" "$msg_group"            

         fi

                mv "$disk_info"  "$disk_info".old

done

 

#得到当前时间戳

node_time=`date +"%Y-%m-%d %H:%M:%S"`

 

#脚本执行结束

echo "$node_time:" `basename $0` "exit code 0"

 

5.主库脚本定时任务

5.1 授予执行权限,并执行测试

[oracle@zyx time]$ cd /opt/scripts/monitor/

[oracle@zyx monitor]$ chmod a+x api_*

[oracle@zyx monitor]$ cd /opt/scripts/mail/

[oracle@zyx mail]$ chmod a+x monitor*

 

------在授予执行权限后,可以执行下脚本,看是否有报错

[oracle@zyx mail]$ sh monitor_alertlog_check.sh

[oracle@zyx mail]$ sh monitor_disk_check.sh

[oracle@zyx mail]$ sh monitor_tnsping_check.sh

[oracle@zyx mail]$ sh monitor_cpu_check.sh

[oracle@zyx mail]$ sh monitor_load_check.sh

[oracle@zyx mail]$ sh monitor_swap_check.sh

.........

 

------执行完后,可以看到生成一些文件

[oracle@zyx mail]$ ll data/

total 124

-rw-r--r-- 1 oracle oinstall    58 Jun 21 10:48 zyx.test.com_disk.log

-rw-r--r-- 1 oracle oinstall    76 Jun 21 10:48 zyx.test.com_disk.out

-rw-r--r-- 1 oracle oinstall  4156 Jun 21 10:48 zyx.test.com_test_alert.diff

-rw-r----- 1 oracle oinstall 50828 Jun 21 10:48 zyx.test.com_test_alert.log

-rw-r----- 1 oracle oinstall 50828 Jun 21 10:48 zyx.test.com_test_alert.old

-rw-r--r-- 1 oracle oinstall    76 Jun 21 10:53 zyx.test.com_tnsping.log

------登录邮箱查看告警邮件

 

5.2 添加计划任务,自动运行

....................

[oracle@orcl mail]$ crontab -e

*/15 * * * *        /opt/scripts/mail/db_oracle_alert_info.sh > /opt/scripts/mail/db_oracle_alert_info.log

30 07 * * *         /opt/scripts/mail/db_disk_info.sh > /opt/scripts/mail/db_disk_info.log

*/30 * * * *        /opt/scripts/mail/db_tnsping_info.sh >/opt/scripts/mail/db_tnsping_info.log

*/10 * * * *        /opt/scripts/mail/sys_warning.sh > /opt/scripts/mail/sys_warning.log

 

 

来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/30130773/viewspace-2121157/,如需转载,请注明出处,否则将追究法律责任。

转载于:http://blog.itpub.net/30130773/viewspace-2121157/

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值