1.环境介绍
目前数据库环境如下表:
主机名 | 网络ip地址 | 数据库名 | unique name | 数据库实例名 | 数据文件位置 |
zyx.test.com | 192.168.5.231 | orcl | orcl | test | /u01/app/oracle/oradata/orcl/ |
系统为redhat linux 6.5,此次部署需要完成监控的是对数据库告警日志、数据库连接是否正常、表空间使用情况、系统剩余空间、系统负载、内存使用率等进行监控分析,发现告警并立即发送告警邮件。
2.前期注意事项
2.1 时间同步设置
[root@zyx ~]# echo $LANG
en_US.UTF-8
[root@zyx ~]# mkdir -p /opt/scripts/time/
[root@zyx ~]# vim /opt/scripts/time/time_sync.sh
#!/bin/bash
#time
node1_time=`date +"%Y-%m-%d %H:%M:%S"`
echo "before time sync: $node1_time"
/usr/sbin/ntpdate asia.pool.ntp.org
#time
node1_time=`date +"%Y-%m-%d %H:%M:%S"`
echo "after time sync: $node1_time"
[root@zyx ~]# chmod a+x /opt/scripts/time/time_sync.sh
[root@zyx ~]# sh /opt/scripts/time/time_sync.sh
before time sync: 2016-04-23 01:32:35
21 Jun 09:33:34 ntpdate[5087]: step time server 139.59.193.168 offset 5126456.496962 sec
after time sync: 2016-06-21 09:33:35
[root@zyx ~]# crontab -l
# 同步时间
55 03 * * 6 /opt/scripts/time/time_sync.sh >> /opt/scripts/time/time_sync.log
----后面oracle用户下监控脚本也创建在该目录的,先将该目录属组改为oracle
[root@zyx ~]# chown -R oracle. /opt/scripts/
2.2 字符集问题(监控和被监控)
mail发送邮件时,如果有中文,一定得注意字符集问题,如果字符集设置不对,那么在发送的告警邮件中如果有中文,会发送二进制附件:
[oracle@zyx ~]$ export LANG=en_US.UTF-8
[oracle@zyx ~]$ echo $LANG
en_US.UTF-8
[oracle@zyx ~]$ mail -s "test_mail" 923451558@qq.com
MY test mail!!!!!
我的测试邮箱额!!!
EOT
----更改后发送没有报错,查看日志是否发送成功(注意,如果发送的邮件的服务器和接收服务器的时间不一致,接收服务器可能会拒绝接收邮件,从而导致发送失败)
[oracle@zyx ~]$ /usr/lib/sendmail -bp
Mail queue is empty
[oracle@orcl ~]$ vim /var/spool/mail/oracle
[root@orcl ~]# vim /var/log/maillog
----如果没有问题后,记得在.bash_profile中设置
----登录我的邮箱查看
2.3 ssh互信
----如果需要将本机的日志拷贝到其他主机进行check报警,需要建立互信机制
----orcl.test.com(备监控库)
[oracle@orcl ~]$ ll .ssh/
[oracle@orcl ~]$ ssh-keygen -t rsa
[oracle@orcl ~]$ ssh-copy-id 192.168.5.232
---- dg2.orcl.com(监控库)
[oracle@dg2 ~]$ ll .ssh/
[oracle@dg2 ~]$ ssh-keygen -t rsa
[oracle@dg2 ~]$ ssh-copy-id 192.168.5.231
[oracle@dg2 ~]$ ssh 192.168.5.231 hostname
orcl.test.com
[oracle@dg2 ~]$ ssh 192.168.5.231 date
Thu Jun 16 10:23:38 CST 2016
[oracle@orcl ~]$ ssh 192.168.5.232 hostname
dg2.orcl.com
[oracle@orcl ~]$ ssh 192.168.5.232 date
Thu Jun 16 10:23:57 CST 2016
2.4 时时关注邮件报错
[oracle@orcl ~]$ /usr/lib/sendmail -bp
Mail queue is empty
[oracle@orcl ~]$ vim /var/spool/mail/oracle
[root@orcl ~]# vim /var/log/maillog
3.定义监控环境
----准备
------创建用于存放log数据
[oracle@zyx ~]$ mkdir -p /opt/scripts/mail/data
------创建用于check发送告警邮件
[oracle@zyx ~]$ mkdir -p /opt/scripts/monitor
3.1 监控脚本(api_monitor_iface.sh)
----主库中主要用于将log传送或check触发发送告警邮件
[oracle@zyx ~]$ vim /opt/scripts/monitor/api_moniter_iface.sh
#!/bin/sh
#--------------------------
#creator: shall
#create time: 2016-6-16
#function:
#usage: call monitor's interface
#--------------------------
if [ $# -ge 1 ]; then
if [ $1 == 'mobile' ]; then
echo "send mobile message"
if [ $# -ne 3 ]; then
echo "Usage: `basename $0`(mobile msg_title msg_file msg_group)"
exit 1
fi
else
echo "Usage: `basename $0`(mobile msg_title msg_file msg_group)"
echo "or Usage: `basename $0`(mail mail_title mail_file mail_group)"
echo "or Usage: `basename $0`(scp source_file dest_file)"
exit 1
fi
fi
SENDMSG=/opt/scripts/monitor/api_send_dbmobile.sh
#监控层地址
admin="oracle"
monitor="192.168.5.231"
if [ $1 == 'mobile' ]; then
msg_title=$2
msg_file=$3
msg_group=$4
echo "para 1: $1"
"$SENDMSG '$msg_title' '$msg_file' '$msg_group'"
fi
#得到当前时间戳
node_time=`date +"%Y-%m-%d %H:%M:%S"`
#邮件发送结束
echo "$node_time:" `basename $0` " exit code 0"
3.2 发送短信脚本(api_send_dbmobile.sh)
----定义发送邮件短信脚本
[oracle@zyx ~]$ vim /opt/scripts/monitor/api_send_dbmobile.sh
#!/bin/sh
. ~/.bash_profile
#--------------------------
#creator: shall
#create time: 2016-6-16
#function:
#usage: send mobile
#--------------------------
#这时表示至少需要符合条件的三个参数,否则,会提示脚本的用法
if [ $# -ne 3 ]; then
echo "Usage2: `basename $0`(title file group1|group2)"
exit 1
fi
#邮件发送的分组列表
#组1
#组2
msg_group1="18580131502@163.com","923451558@qq.com"
msg_group2="923451558@qq.com"
#获得短信标题、正文
msg_title=$1
msg_file=$2
#得到当前时间戳
node_time=`date +"%Y-%m-%d %H:%M:%S"`
#发送短信子过程
case $3 in
group1)echo "$node_time: send mobile to dba-user-list"
mail -s "$msg_title" $msg_group1 < $msg_file
;;
group2)echo "$node_time: send mobile to operation-user-list"
mail -s "$msg_title" $msg_group2 < $msg_file
;;
*)echo "Usage: `basename $0`(title file group1|group2)"
;;
esac
#得到当前时间戳
node_time=`date +"%Y-%m-%d %H:%M:%S"`
#邮件发送结束
echo "$node_time:" `basename $0` " exit code 0"
3.监控部署
3.1 数据库告警日志(monitor_alertlog_check.sh)
----调用monitor,check告警日志并发送邮件
[oracle@zyx ~]$ vim /opt/scripts/mail/monitor_alertlog_check.sh
#!/bin/sh
. ~/.bash_profile
#--------------------------
#creator: shall
#create time: 2016-6-16
#function: copy alert log file to monitor
#usage: crontab on host
#--------------------------
#得到主机名,SID
dbname=`hostname`
#ORACLE_SID=$1
ORACLE_SID=test
ORACLE_BASE=/u01/app/oracle
oracle_alert_log="$ORACLE_BASE"/diag/rdbms/orcl/"$ORACLE_SID"/trace/alert_"$ORACLE_SID".log
dest_oracle_alert_log=/opt/scripts/mail/data/"$dbname"_"$ORACLE_SID"_alert.log
#变量打印测试
echo $oracle_alert_log
echo $dbname
#告警日志文件,传至存放数据目录,等待分析
cp "$oracle_alert_log" "$dest_oracle_alert_log"
#得到当前时间戳
node_time=`date +"%Y-%m-%d %H:%M:%S"`
#copy结束
echo "$node_time:" `basename $0` " exit code 0"
echo "--------------------------------------------------------------------------------"
#等待采集端主机采集负载信息
#sleep 30
#告警文件所在地址
dest_dir=/opt/scripts/mail/data/
SENDMSG=/opt/scripts/monitor/api_send_dbmobile.sh
dest_servers="zyx.test.com"
dest_sid="test"
for dest_server in $dest_servers
do
echo "---------- $dest_server ----------";
new_file="$dest_dir""$dest_server"_"$dest_sid"_alert.log
old_file="$dest_dir""$dest_server"_"$dest_sid"_alert.old
diff_file="$dest_dir""$dest_server"_"$dest_sid"_alert.diff
#通过diff命令来比较原始文件与目标文件,并过滤所有的ORA-信息,
#也就是错误信息,写入到差异文件
echo $new_file $old_file $diff_file
diff $new_file $old_file | grep "ORA-" | sed 's/^<//g' > $diff_file
#value_alert_err=`cat $diff_file 2 > /dev/null`
cp -rf $new_file $old_file
#如果差异文件内容存在且不为空,则表示有错误信息
if [ -s "$diff_file" ]; then
msg_title="$dest_server""_alert_error"
msg_file=$diff_file
msg_group="group1"
echo "$msg_file"
#msg_log="$dest_server"":""$value_alert_err"
$SENDMSG $msg_title $msg_file $msg_group
fi
done
#得到当前时间戳
node_time=`date +"%Y-%m-%d %H:%M:%S"`
#脚本执行结束
echo "$node_time:" `basename $0` "exit code 0"
4.2系统剩余空间(monitor_disk_check.sh)
----调用monitor,发送告警邮件
[oracle@zyx ~]$ vim /opt/scripts/mail/monitor_disk_check.sh
#!/bin/sh
. ~/.bash_profile
#--------------------------
#creator: shall
#create time: 2016-6-16
#function: copy disk info into to monitor
#usage: crontab on host
#--------------------------
#得到主机名,磁盘日志文件
dbname=`hostname`
disk_log=/opt/scripts/mail/data/"$dbname"_disk.log
#变量打印测试
echo $disk_log
echo $dbname
#磁盘信息快照
df -P | awk 'FNR>1{print $6,$5}' > "$disk_log"
#得到当前时间戳
node_time=`date +"%Y-%m-%d %H:%M:%S"`
echo "$node_time:" `basename $0` " exit code 0"
echo "--------------------------------------------------------------------------------"
#磁盘负载文件所在地址
dest_dir=/opt/scripts/mail/data
#磁盘负载的报警阙值
diskoff=98
SENDMSG=/opt/scripts/monitor/api_send_dbmobile.sh
node_time=`date +"%Y-%m-%d %H:%M:%S"`
dest_servers="zyx.test.com"
for dest_server in $dest_servers
do
#echo -e "\033[32;1m================= $dest_server =================\033[0m";
#获得最新的磁盘负载信息
#disk日志文件与disk消息文件
disk_info="$dest_dir"/"$dest_server"_disk.log
disk_warning="$dest_dir"/"$dest_server"_disk.out
echo $disk_info
echo $disk_warning
cat /dev/null > $disk_warning
#遍历disk日志文件,逐个比较并写入disk消息文件
while read LINE
do
value_disk_per=0
value_disk_per=`echo $LINE | awk '{print $2}' | sed s/'%'//g`
value_disk_name=`echo $LINE | awk '{print $1}'`
echo "$value_disk_name" : "$value_disk_per"%
if [ $value_disk_per -ge $diskoff ]; then
echo ""$dest_server"'s "$value_disk_name" 磁盘使用空间超过 "$diskoff"%, 目前为 "$value_disk_per"%" >> $disk_warning
fi
done < $disk_info
#如果文件内容存在且不为空,则表示有错误信息
if [ -s "$disk_warning" ]; then
msg_title="$dest_server""_disk_warning"
msg_file=$disk_warning
msg_group="group1"
$SENDMSG "$msg_title" "$msg_file" "$msg_group"
fi
done
#得到当前时间戳
node_time=`date +"%Y-%m-%d %H:%M:%S"`
#脚本执行结束
echo "$node_time:" `basename $0` "exit code 0"
4.3 TNS通讯(monitor_tnsping_check.sh)
----调用monitor,用于发送告警邮件(监控监听)
[oracle@zyx ~]$ vim /opt/scripts/mail/monitor_tnsping_check.sh
#!/bin/sh
. ~/.bash_profile
#--------------------------
#creator: shall
#create time: 2016-6-16
#function: copy tnsping result file to monitor
#usage: crontab on host
#--------------------------
#得到主机名,SID
dbname=`hostname`
tnsping_log=/opt/scripts/mail/data/"$dbname"_tnsping.log
#清空tnsping日志文件
cat /dev/null > $tnsping_log
#循环tnspnig各个节点,如果PING不通,则写入日志文件
dest_tnsping_nodes="orcl orclps test1 test2"
for dest_tnsping_node in $dest_tnsping_nodes
do
value_tnsping=`tnsping "$dest_tnsping_node" | tail -1 | grep OK`
echo "$dest_tnsping_node"": ""$value_tnsping"
#如果value_tnsping为空,则说明有问题,打印至日志
if [ ! -n "$value_tnsping" ]; then
echo "$dest_tnsping_node"" TNS连接失败!" >> $tnsping_log
fi
done
#得到当前时间戳
node_time=`date +"%Y-%m-%d %H:%M:%S"`
echo "$node_time:" `basename $0` "exit code 0"
echo "--------------------------------------------------------------------------------"
#等待采集端主机采集负载信息
#sleep 60
#告警文件所在地址
dest_dir=/opt/scripts/mail/data/
SENDMSG=/opt/scripts/monitor/api_send_dbmobile.sh
#这个列表也可以写在配置文件中
dest_servers="zyx.test.com"
for dest_server in $dest_servers
do
#通过手机发送报错信息
value_tnsping="$dest_dir""$dest_server"_tnsping.log
echo "$dest_server"": ""$value_tnsping"
#如果文件内容存在且不为空,则表示有错误信息
if [ -s "$value_tnsping" ]; then
msg_title="$dest_server""_tnsping_error"
msg_file=$value_tnsping
msg_group="group1"
$SENDMSG $msg_title $msg_file $msg_group
fi
done
#得到当前时间戳
node_time=`date +"%Y-%m-%d %H:%M:%S"`
#脚本执行结束
echo "$node_time:" `basename $0` "exit code 0"
4.4系统负载(monitor_load_check.sh)
----调用monitor,用于发送告警邮件
[oracle@zyx mail]$ vim /opt/scripts/mail/monitor_load_check.sh
#!/bin/sh
. ~/.bash_profile
#--------------------------
#creator: shall
#create time: 2016-6-16
#--------------------------
# 监控系统负载与CPU、内存,超出警戒值则发邮件告警
node_time=`date +"%Y-%m-%d %H:%M:%S"`
# 得到主机名
dbname=`hostname`
echo "node_time is $node_time"
echo "db_name is $dbname"
#告警文件所在地址
dest_dir=/opt/scripts/mail/data/
load_log="$dest_dir""$dbname"_load.log
SENDMSG=/opt/scripts/monitor/api_send_dbmobile.sh
# 监控系统负载的变化情况,超出时发邮件告警:
# 抓取cpu的总核数
cpu_num=`grep -c 'model name' /proc/cpuinfo`
# 抓取当前系统15分钟的平均负载值
load_15=`uptime | sed 's/.*average://g'|awk '{print $3}'`
# 计算当前系统单个核心15分钟的平均负载值,结果小于1.0时前面个位数补0
average_load=`echo "scale=2;a=$load_15/$cpu_num;if(length(a)==scale(a)) print 0;print a" | bc`
echo " average_load is $average_load"
# 取上面平均负载值的个位整数
average_int=`echo $average_load | cut -f 1 -d "."`
echo " average_int is $average_int"
# 设置系统单个核心15分钟的平均负载的告警值为0.70(即使用超过70%的时候告警)
load_warn=0.70
# 当单个核心15分钟的平均负载值大于等于1.0(即个位整数大于0) ,直接发邮件告警;如果小于1.0则进行二次比较
if (($average_int > 0)); then
echo "$node_time $dbname服务器15分钟的系统平均负载为$average_load,超过警戒值1.0,请立即处理!!!" >$load_log
msg_title="$dbname服务器系统负载严重告警!!!"
msg_file=$load_log
msg_group="group1"
echo "SENDMSG $msg_title $msg_file $msg_group"
$SENDMSG $msg_title $msg_file $msg_group
else
# 当前系统15分钟平均负载值与告警值进行比较(当大于告警值0.70时会返回1,小于时会返回0 )
load_now=`expr $average_load \> $load_warn`
# 如果系统单个核心15分钟的平均负载值大于告警值0.70(返回值为1),则发邮件给管理员
if (($load_now == 1)); then
echo "$node_time $dbname服务器15分钟的系统平均负载达到 $average_load,超过警戒值0.70,请及时处理。" >$load_log
msg_title="$dbname服务器系统负载告警!!!"
msg_file=$load_log
msg_group="group1"
echo "SENDMSG $msg_title $msg_file $msg_group"
$SENDMSG $msg_title $msg_file $msg_group
fi
fi
4.5 CPU使用率 (monitor_cpu_check.sh)
[oracle@zyx mail]$ vim /opt/scripts/mail/monitor_cpu_check.sh
#!/bin/sh
. ~/.bash_profile
#--------------------------
#creator: shall
#create time: 2016-6-16
#--------------------------
# 监控系统CPU,超出警戒值则发邮件告警
node_time=`date +"%Y-%m-%d %H:%M:%S"`
# 得到主机名
dbname=`hostname`
echo "node_time is $node_time"
echo "db_name is $dbname"
#告警文件所在地址
dest_dir=/opt/scripts/mail/data/
cpu_log="$dest_dir""$dbname"_cpu.log
SENDMSG=/opt/scripts/monitor/api_send_dbmobile.sh
# 监控系统cpu的情况,当使用超过80%的时候发告警邮件
# 取当前空闲cpu百份比值(只取整数部分)
cpu_idle=`top -b -n 1 | grep Cpu | awk '{print $5}' | cut -f 1 -d "."`
echo " cpu_idle is $cpu_idle "
# 设置空闲cpu的告警值为20%,如果当前cpu使用超过80%(即剩余小于20%),立即发邮件告警
if (($cpu_idle < 20)); then
echo "$node_time $dbname服务器cpu剩余$cpu_idle%,使用率已经超过80%,请及时处理。" >$cpu_log
msg_title="$dbname服务器CPU告警"
msg_file=$cpu_log
msg_group="group1"
echo "SENDMSG $msg_title $msg_file $msg_group"
$SENDMSG $msg_title $msg_file $msg_group
fi
4.6 swap使用率(monitor_swap_check.sh)
[oracle@zyx mail]$ vim /opt/scripts/mail/monitor_swap_check.sh
#!/bin/sh
. ~/.bash_profile
#--------------------------
#creator: shall
#create time: 2016-6-16
#--------------------------
# 监控系统CPU,超出警戒值则发邮件告警
node_time=`date +"%Y-%m-%d %H:%M:%S"`
# 得到主机名
dbname=`hostname`
echo "node_time is $node_time"
echo "db_name is $dbname"
#告警文件所在地址
dest_dir=/opt/scripts/mail/data/
swap_log="$dest_dir""$dbname"_swap.log
SENDMSG=/opt/scripts/monitor/api_send_dbmobile.sh
# 监控系统交换分区swap的情况,当使用超过80%的时候发告警邮件
# 系统分配的交换分区总量
swap_total=`free -m | grep Swap | awk '{print $2}'`
# 当前剩余的交换分区free大小
swap_free=`free -m | grep Swap | awk '{print $4}'`
# 当前已使用的交换分区used大小
swap_used=`free -m | grep Swap | awk '{print $3}'`
if (($swap_used != 0)); then
swap_per=0`echo "scale=2;$swap_free/$swap_total" | bc`
echo "swap_pre is $swap_per"
# 设置交换分区的告警值为20%(即使用超过80%的时候告警)。
swap_warn=0.20
swap_now=`expr $swap_per \> $swap_warn`
if (($swap_now == 0)); then
echo "$node_time $dbname服务器swap交换分区只剩下 $swap_free M 未使用,剩余不足20%,使用率已经超过80%,请及时处理。" >$swap_log
msg_title="$dbname服务器内存告警"
msg_file=$swap_log
msg_group="group1"
echo "SENDMSG $msg_title $msg_file $msg_group"
$SENDMSG $msg_title $msg_file $msg_group
fi
fi
4.7 判断数据库是否open(monitor_db_connect_info.sh)
----判断是否能连接数据库,如果不能连接,发送告警邮件,建议由其他主机进行监控
------配置脚本之前,需要在判断的数据库中创建scott用户,密码设置为111111,并且给予连接权限
SQL> create user scott identified by 111111;
create user scott identified by 111111
*
ERROR at line 1:
ORA-01920: user name 'SCOTT' conflicts with another user or role name
SQL> alter user scott account unlock identified by 111111;
#####################################################################
#####################################################################
[oracle@zyx mail]$ vim /opt/scripts/mail/monitor_db_connect_info.sh
#!/bin/sh
. ~/.bash_profile
#--------------------------
#creator: shall
#create time: 2016-6-16
#function: TNSPING各个数据库节点,然后每个节点单独写入一个日志文件
#usage: crontab on monitor
#--------------------------
#当前数据库环境变量
ORACLE_BASE=/u01/app/oracle
ORACLE_HOME=$ORACLE_BASE/product/11.2.0/db_1
SQLPLUS=$ORACLE_HOME/bin/sqlplus
#设置主机变量
dbname=`hostname`
base_dir=/opt/scripts/mail/data/
MONITOR=/opt/scripts/monitor/api_monitor_iface.sh
SENDMSG=/opt/scripts/monitor/api_send_dbmobile.sh
#监管主机节点逐个探测,格式列表为: 主机TNS名@分管运维邮件组
dest_connect_nodes="orcl@group1 orclps@group1 test2@group1"
for value_dest_connect_node in $dest_connect_nodes
do
#解析主机TNS名称、所属邮件组
dest_connect_node=`echo $value_dest_connect_node | awk -F@ '{print $1}'`
dest_connect_node_group=`echo $value_dest_connect_node | awk -F@ '{print $2}'`
dest_connect_log="$base_dir""$dest_connect_node"_connect.log
#数据库连接状态返回值
value_dest_connect=`$SQLPLUS -S scott/111111@$dest_connect_node << EOF
set colsep ','
set echo off
set feedback off
set heading off
set pagesize 0
set linesize 1000
set trimspool on
set timing off
set long 1000
Select 1 from dual;
exit
EOF`
#取SQL输出的首行,去除TAB、空格
value_dest_connect=`echo "$value_dest_connect" | sed -n '1p' | tr -d "[\011] "`
echo "$dest_connect_node"@"$dest_connect_node_group"": |""$value_dest_connect""|"
#如果value_dest_connect不等于"1",则说明有问题,打印至日志
#字符串相等比较
#使用单个等号、等号两边各有一个空格,$var旁边的x是为了预防空值比较的处理
if [ "$value_dest_connect"x != "1"x ]; then
echo "$dest_connect_node"" 数据库连接失败!" > $dest_connect_log
msg_title="$dest_connect_node""_connect_error"
msg_file=$dest_connect_log
msg_group="$dest_connect_node_group"
$SENDMSG "$msg_title" "$msg_file" "$msg_group"
fi
done
#得到当前时间戳
node_time=`date +"%Y-%m-%d %H:%M:%S"`
#脚本执行结束
echo "$node_time:" `basename $0` "exit code 0"
4.8 数据库表空间告警(monitor_db_ts_check.sh)
---需要用到system用户登录(安全性??)
[oracle@zyx mail]$ vim /opt/scripts/mail/monitor_db_ts_check.sh
#!/bin/sh
. ~/.bash_profile
#--------------------------
#creator: shall
#create time: 2016-6-16
#function: copy tablespace info into to monitor
#usage: crontab on host
#--------------------------
#数据库环境变量
ORACLE_BASE=/u01/app/oracle
ORACLE_HOME=$ORACLE_BASE/product/11.2.0/db_1
SQLPLUS=$ORACLE_HOME/bin/sqlplus
#得到主机名,表空间日志文件
dbname=`hostname`
ts_log=/opt/scripts/mail/data/"$dbname"_ts.log
#表空间信息快照
$SQLPLUS -S system/sys << EOF
set colsep ',';
set echo off
set feedback off
set heading off
set pagesize 0
set linesize 1000
set trimspool on
set timing off
set long 1000
spool $ts_log
Select a.Tablespace_Name "ts_name",
Round(a.Bytes / 1024 / 1024 ) "total(M)",
--Round((a.Bytes - b.Bytes) / 1024 / 1024 ) "used",
Round(b.Bytes / 1024 / 1024 ) "free(M)",
Round(((a.Bytes - b.Bytes) / a.Bytes) * 100, 2)||'%' "used_percent"
From (Select Tablespace_Name, Sum(Bytes) Bytes
From Dba_Data_Files
Group By Tablespace_Name) a,
(Select Tablespace_Name, Sum(Bytes) Bytes, Max(Bytes) Largest
From Dba_Free_Space
Group By Tablespace_Name) b
Where a.Tablespace_Name = b.Tablespace_Name
And Round(((a.Bytes - b.Bytes) / a.Bytes) * 100, 2) >= 93
Union All
SELECT a.tablespace_name "ts_name",
a.BYTES/ 1024 / 1024 "total(M)",
(a.bytes - nvl(b.bytes, 0))/ 1024 / 1024 "free(M)",
ROUND(nvl(b.bytes, 0)/a.BYTES*100,2) || '%' "used_percent"
FROM (SELECT tablespace_name, SUM (bytes) bytes FROM dba_temp_files GROUP BY tablespace_name) a,
(SELECT tablespace_name, SUM (bytes_cached) bytes FROM v\$temp_extent_pool GROUP BY tablespace_name) b
WHERE a.tablespace_name = b.tablespace_name(+)
And a.tablespace_name='TEMP'
And ROUND(nvl(b.bytes, 0)/a.BYTES*100,2) >=85;
spool off
exit
EOF
#得到当前时间戳
node_time=`date +"%Y-%m-%d %H:%M:%S"`
#邮件发送结束
echo "$node_time:" `basename $0` " exit code 0"
echo "--------------------------------------------------------------------------------"
#表空间负载文件所在地址
dest_dir=/opt/scripts/mail/data
SENDMSG=/opt/scripts/monitor/api_send_dbmobile.sh
#得到当前时间戳
node_time=`date +"%Y-%m-%d %H:%M:%S"`
#这个列表也可以写在配置文件中
dest_servers="zyx.test.com orcl.com"
for dest_server in $dest_servers
do
disk_info="$dest_dir"/"$dest_server"_ts.log
echo $disk_info
#如果文件内容存在且不为空,则表示有错误信息
if [ -s "$disk_info" ]; then
msg_title="$dest_server""_tablespace_warning"
msg_file=$disk_info
msg_group="group1"
$SENDMSG "$msg_title" "$msg_file" "$msg_group"
fi
mv "$disk_info" "$disk_info".old
done
#得到当前时间戳
node_time=`date +"%Y-%m-%d %H:%M:%S"`
#脚本执行结束
echo "$node_time:" `basename $0` "exit code 0"
5.主库脚本定时任务
5.1 授予执行权限,并执行测试
[oracle@zyx time]$ cd /opt/scripts/monitor/
[oracle@zyx monitor]$ chmod a+x api_*
[oracle@zyx monitor]$ cd /opt/scripts/mail/
[oracle@zyx mail]$ chmod a+x monitor*
------在授予执行权限后,可以执行下脚本,看是否有报错
[oracle@zyx mail]$ sh monitor_alertlog_check.sh
[oracle@zyx mail]$ sh monitor_disk_check.sh
[oracle@zyx mail]$ sh monitor_tnsping_check.sh
[oracle@zyx mail]$ sh monitor_cpu_check.sh
[oracle@zyx mail]$ sh monitor_load_check.sh
[oracle@zyx mail]$ sh monitor_swap_check.sh
.........
------执行完后,可以看到生成一些文件
[oracle@zyx mail]$ ll data/
total 124
-rw-r--r-- 1 oracle oinstall 58 Jun 21 10:48 zyx.test.com_disk.log
-rw-r--r-- 1 oracle oinstall 76 Jun 21 10:48 zyx.test.com_disk.out
-rw-r--r-- 1 oracle oinstall 4156 Jun 21 10:48 zyx.test.com_test_alert.diff
-rw-r----- 1 oracle oinstall 50828 Jun 21 10:48 zyx.test.com_test_alert.log
-rw-r----- 1 oracle oinstall 50828 Jun 21 10:48 zyx.test.com_test_alert.old
-rw-r--r-- 1 oracle oinstall 76 Jun 21 10:53 zyx.test.com_tnsping.log
------登录邮箱查看告警邮件
5.2 添加计划任务,自动运行
....................
[oracle@orcl mail]$ crontab -e
*/15 * * * * /opt/scripts/mail/db_oracle_alert_info.sh > /opt/scripts/mail/db_oracle_alert_info.log
30 07 * * * /opt/scripts/mail/db_disk_info.sh > /opt/scripts/mail/db_disk_info.log
*/30 * * * * /opt/scripts/mail/db_tnsping_info.sh >/opt/scripts/mail/db_tnsping_info.log
*/10 * * * * /opt/scripts/mail/sys_warning.sh > /opt/scripts/mail/sys_warning.log
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/30130773/viewspace-2121157/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/30130773/viewspace-2121157/