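Shell Programming Notes: Learning from Production Scripts

Preface

Over the past couple of days I studied basic shell programming and got a first grip on the three classic text-processing tools: awk, sed and grep. Today I am working through a few scripts written by experienced engineers to consolidate that knowledge, and recording them here so I can look them up later.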

Statistics Scripts in Production

1. Script to collect device asset details

The script's output can be captured by a Python or Go wrapper and exposed through a backend API in a better-structured format, such as JSON.
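
If you prefer to keep that wrapper thin, the shell script can also print JSON itself, so the Python or Go layer only has to pass it through. A minimal sketch, assuming flat string values only: `to_json` is a hypothetical helper, not part of the original script, and it escapes nothing but backslashes and double quotes.

```bash
#!/bin/bash
# Hypothetical helper: print key=value arguments as one flat JSON object.
# Every value becomes a JSON string; only \ and " are escaped, so values
# must not contain newlines or other control characters.
to_json() {
  local pair key val sep="" out="{"
  for pair in "$@"; do
    key=${pair%%=*}          # text before the first "="
    val=${pair#*=}           # text after the first "="
    val=${val//\\/\\\\}      # escape backslashes first
    val=${val//\"/\\\"}      # then double quotes
    out+="${sep}\"${key}\":\"${val}\""
    sep=","
  done
  printf '%s}\n' "$out"
}
```

For example, `to_json ip=10.0.0.1 host=web01` prints `{"ip":"10.0.0.1","host":"web01"}`. Numbers stay JSON strings in this sketch; converting them is left to the API layer.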

Script:

#!/bin/bash
# Print a one-line asset summary: IP, hostname, CPU, memory, disks.
# Needs root (dmidecode, fdisk). Assumes the primary NIC is eth0.

#### get cpu info ####
# physical CPUs (sockets), logical processors, CPU model frequency
cpu_num=`cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l`
cpu_sum=`cat /proc/cpuinfo | grep processor | wc -l`
cpu_hz=`cat /proc/cpuinfo | grep "model name" | uniq -c | awk '{print $NF}'`

#### get mem info ####
# assumes dmidecode prints each DIMM as "Size: <n> MB"
mem_m=0
for i in `dmidecode -t memory | grep Size: | grep -v "No Module Installed" | awk '{print $2}'`
do
  mem_m=`expr $mem_m + $i`
done
mem_sum=`echo "$mem_m / 1024" | bc`

#### get nic info ####
# count 10-Gigabit-class Ethernet controllers (collected but not printed below)
wan_num=`lspci | grep Ethernet | grep -E "0-Gigabit |10 Gigabi" | wc -l`

#### get disk num ####
# /sys/block/<dev>/queue/rotational: 0 = SSD, 1 = spinning disk
B=`date +%s`
ssd_num=0
sata_num=0
for i in `lsblk | grep "disk" | awk '{print $1}' | grep -Ev "ram" | sort`
do
  code=`cat /sys/block/$i/queue/rotational`
  if [ "$code" = "0" ]; then
    ssd_num=`expr $ssd_num + 1` && echo $i >> /tmp/$B.ssd
  else
    sata_num=`expr $sata_num + 1` && echo $i >> /tmp/$B.sata
  fi
done

#### get disk sum ####
# fdisk prints e.g. "Disk /dev/sda: 480.1 GB, ..."; assumes the size is in GB,
# keeps its integer part and adds 1 to round up
ssd_sum=0
sata_sum=0
if [ -f /tmp/$B.ssd ]; then
  for n in `cat /tmp/$B.ssd`
  do
    u=`fdisk -l /dev/$n 2>&1 | grep "^Disk /dev" | awk '{print $3}' | cut -d. -f1`
    ssd_sum=`expr $ssd_sum + $u + 1`
  done
fi
if [ -f /tmp/$B.sata ]; then
  for m in `cat /tmp/$B.sata`
  do
    v=`fdisk -l /dev/$m 2>&1 | grep "^Disk /dev" | awk '{print $3}' | cut -d. -f1`
    sata_sum=`expr $sata_sum + $v + 1`
  done
fi
rm -f /tmp/$B.ssd /tmp/$B.sata

#### ip info ####
# grep -w skips the inet6 line; older net-tools print "inet addr:x.x.x.x" instead
ip=`ifconfig eth0 | grep -w inet | awk '{print $2}'`

#### show dev info ####
echo -n "$ip `hostname` "
echo -n "CPU(physical CPUs, logical cores, frequency): $cpu_num $cpu_sum $cpu_hz "
echo -n "Memory(GB): $mem_sum "
echo "SSD count:${ssd_num} SSD capacity:${ssd_sum}GB SATA count:${sata_num} SATA capacity:${sata_sum}GB"
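
The memory loop above sums every line containing Size:. Depending on the dmidecode version, that can also pick up lines such as Volatile Size:, and DIMM sizes may be printed in GB rather than MB. A minimal, more defensive sketch, assuming `dmidecode -t memory` text arrives on stdin; `sum_mem_gb` is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: read `dmidecode -t memory` text on stdin and print the
# installed memory in whole GB (truncated, like the bc division above).
# Only lines whose first field is exactly "Size:" are counted, so lines such
# as "Volatile Size:" are skipped; "No Module Installed" slots are ignored
# because their second field is not a number. Both MB and GB are accepted.
sum_mem_gb() {
  awk '$1 == "Size:" && $2 ~ /^[0-9]+$/ {
         if ($3 == "GB") mb += $2 * 1024
         else if ($3 == "MB") mb += $2
       }
       END { printf "%d\n", mb / 1024 }'
}
```

On a real host (as root) this would be used as `mem_sum=$(dmidecode -t memory | sum_mem_gb)`.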

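2. Checking that an important business process is running

Check that exactly one instance of the business process is running. Important services can also be handed over to the Supervisord process manager for supervision.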

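#!/bin/bash
# Exit codes follow the Nagios convention: 0 = OK, 2 = CRITICAL
sync_redis_status=`ps aux | grep sync_redis.py | grep -v grep | wc -l`
if [ "${sync_redis_status}" -ne 1 ]; then
	echo "Critical! sync_redis process count is ${sync_redis_status}, expected 1"
	exit 2
else
	echo "OK! sync_redis is alive"
	exit 0
fi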
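
The grep -v grep filter is needed because ps aux | grep also sees the grep process itself. A minimal alternative sketch, assuming procps `pgrep` is installed (`pgrep -f` matches the full command line and never reports itself). The decision is moved into a hypothetical `proc_status` function so it can be tested without a running process.

```bash
#!/bin/bash
# Hypothetical helper: turn a process count into a Nagios-style message and
# return code. Exactly one process is OK (0); anything else is CRITICAL (2),
# matching the script above.
proc_status() {
  local name=$1 count=$2
  if [ "$count" -eq 1 ]; then
    echo "OK! $name is alive"
    return 0
  fi
  echo "Critical! $name has $count running processes, expected 1"
  return 2
}
```

A check script could then end with `proc_status sync_redis "$(pgrep -fc 'sync_redis\.py')"; exit $?`.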

3. Counting the machine's TCP connections

#!/bin/bash
# $1 = warning threshold, $2 = critical threshold; tune both to the actual
# workload, e.g. $1=5 $2=10
if [[ $# -ne 2 ]]; then
	echo "Usage: $0 <warning> <critical>"
	exit 3
fi
# number of ESTABLISHED TCP connections
ip_conns=`netstat -an | grep tcp | grep ESTABLISHED | wc -l`
# per-state counts, joined as "STATE count,STATE count,..."
messages=`netstat -ant | awk '/^tcp/ {++S[$NF]} END {for (a in S) print a,S[a]}' | tr -s '\n' ',' | sed -r 's/(.*),/\1\n/g'`

if [[ $ip_conns -lt $1 ]]; then
	echo "$messages, OK - connect counts is $ip_conns"
	exit 0
elif [[ $ip_conns -lt $2 ]]; then
	echo "$messages, Warning - connect counts is $ip_conns"
	exit 1
else
	echo "$messages, Critical - connect counts is $ip_conns"
	exit 2
fi
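
The messages line relies on an awk associative array: `++S[$NF]` counts each TCP state (the last column of `netstat -ant`), and the END block prints the totals. A minimal sketch of the same idea as a reusable function that reads netstat-style lines on stdin; `tcp_state_summary` is a hypothetical name, and here the result is sorted and comma-joined with `paste` instead of `tr`/`sed`.

```bash
#!/bin/bash
# Hypothetical helper: count TCP connection states in `netstat -ant`-style
# input (stdin) and print "STATE=count" pairs, sorted and comma-separated.
# /^tcp/ matches both tcp and tcp6 lines.
tcp_state_summary() {
  awk '/^tcp/ { ++n[$NF] } END { for (s in n) print s "=" n[s] }' |
    sort | paste -sd, -
}
```

On a live host: `netstat -ant | tcp_state_summary`.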

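Monitoring Scripts in Production

1. Monitoring the Nginx process on an Nginx load balancer

The system uses an Nginx + keepalived architecture. The script checks Nginx every 5 seconds. If Nginx is down, it tries to start it once; if it is still down, the script stops the local keepalived so that the VIP fails over to the backup Nginx load balancer.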

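#!/bin/bash
while :
do
	nginxpid=`ps -C nginx --no-header | wc -l`
	if [[ $nginxpid -eq 0 ]]; then
		# Nginx is down: raise the open-file limit and try to start it once
		ulimit -SHn 65535
		/usr/local/nginx/sbin/nginx
		sleep 5
		# re-count after the restart attempt; the old value is stale
		nginxpid=`ps -C nginx --no-header | wc -l`
		if [[ $nginxpid -eq 0 ]]; then
			# still down: stop keepalived so the VIP moves to the backup
			/etc/init.d/keepalived stop
		fi
	fi
	sleep 5
done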
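
The important detail in this loop is re-checking Nginx after the restart attempt instead of reusing the old count. A minimal sketch that isolates this "check, restart once, check again" step; `ensure_running` is a hypothetical helper, and passing the check and start commands in as arguments is an assumption made here so the logic can be tested without Nginx.

```bash
#!/bin/bash
# Hypothetical helper: run the check command $1; if it fails, run the start
# command ($2 ...) once, wait RESTART_WAIT seconds (default 5), check again.
# Returns 0 if the service is (or came back) up and non-zero otherwise, so the
# caller can decide to fail over.
ensure_running() {
  local check=$1
  shift
  "$check" && return 0
  "$@"
  sleep "${RESTART_WAIT:-5}"
  "$check"
}
```

Inside the monitor loop this could become `ensure_running nginx_up /usr/local/nginx/sbin/nginx || /etc/init.d/keepalived stop`, where `nginx_up` is a hypothetical function wrapping `pgrep -x nginx >/dev/null`.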

2. Checking open-file limits

Show the maximum number of open files allowed for each Nginx process:

#!/bin/bash
for pid in `ps aux | grep nginx | grep -v grep | awk '{print $2}'`
do
	echo "PID ${pid}: `grep 'Max open files' /proc/${pid}/limits`"
done
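
In /proc/<pid>/limits the Max open files row lists the soft and the hard limit, which are the 4th and 5th whitespace-separated fields. A minimal sketch (Linux only) that prints just those two values; `max_open_files` is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper (Linux only): print "<soft> <hard>" for the
# "Max open files" row of /proc/<pid>/limits, which reads
#   Max open files   <soft>   <hard>   files
max_open_files() {
  awk '/^Max open files/ { print $4, $5 }' "/proc/$1/limits"
}
```

For all Nginx processes: `for pid in $(pgrep nginx); do echo "$pid $(max_open_files "$pid")"; done`.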

3. Checking the machine's CPU utilization

The script reports the user, system, iowait and idle percentages, in the form of a Nagios plugin.
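
These values come from `iostat -c`, which prints an avg-cpu: header followed by one value line per report, with columns %user %nice %system %iowait %steal %idle; the first report covers the time since boot. A minimal sketch, assuming that column order, which extracts user, system, iowait and idle from the last report; `parse_iostat_cpu` is a hypothetical helper name.

```bash
#!/bin/bash
# Hypothetical helper: read `iostat -c` output on stdin and print
# "user system iowait idle" from the LAST avg-cpu report. Assumes the column
# order %user %nice %system %iowait %steal %idle.
parse_iostat_cpu() {
  awk '/avg-cpu:/ { getline; u = $1; s = $3; w = $4; i = $6 }
       END { if (u != "") print u, s, w, i }'
}
```

For example, `LC_ALL=C iostat -c 1 3 | parse_iostat_cpu` takes three one-second reports and keeps the last one.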

Usage

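  • Usage hint: run the script with -h to print its help text.

  • Example: set the warning thresholds with -w:

    The script prints the CPU statistics; an exit status of 1 means CPU usage has reached the warning level.

    Set the critical thresholds with -c:

    The script prints the CPU statistics; an exit status of 2 means CPU usage has reached the critical level.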

    #!/bin/bash
    # CPU Utilization Statistics plugin for Nagios
    #
    # USAGE : ./check_cpu_utili.sh [-w <user,system,iowait>] [-c <user,system,iowait>] ([-i <intervals in second>] [-n <report number>])
    #
    # Example: ./check_cpu_utili.sh
    # ./check_cpu_utili.sh -w 70,40,30 -c 90,60,40
    # ./check_cpu_utili.sh -w 70,40,30 -c 90,60,40 -i 3 -n 5
    # Paths to commands used in this script. These may have to be modified to match your system setup.
    IOSTAT="/usr/bin/iostat"
    
    # Nagios return codes
    STATE_OK=0
    STATE_WARNING=1
    STATE_CRITICAL=2
    STATE_UNKNOWN=3
    
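    # Default plugin parameters, used when not given on the command line
    LIST_WARNING_THRESHOLD="70,40,30"
    LIST_CRITICAL_THRESHOLD="90,60,40"
    INTERVAL_SEC=1    # seconds between iostat reports
    NUM_REPORT=3      # number of iostat reports; the first is the since-boot average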
    
    # Plugin variable description
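    PROGNAME=$(basename $0)
    RELEASE="1.0"    # version string shown by -v; set it to your own release
    
    print_release() {
    	echo "$PROGNAME $RELEASE"
    }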
    
    if [[ ! -x $IOSTAT ]];then
    	echo "UNKNOWN: iostat not found or is not executable by the nagios user."
    	exit $STATE_UNKNOWN
    fi
    
    print_usage() {
    	echo ""
    	echo "$PROGNAME $RELEASE - CPU Utilization check script for Nagios"
    	echo ""
    	echo "Usage: check_cpu_utili.sh -w -c (-i -n)"
    	echo ""
    	echo " -w Warning threshold in % for warn_user,warn_system,warn_iowait CPU (default: 70,40,30)"
    	echo " Exit with WARNING status if cpu exceeds warn_n"
    	echo " -c Critical threshold in % for crit_user,crit_system,crit_iowait CPU (default: 90,60,40)"
    	echo " Exit with CRITICAL status if cpu exceeds crit_n"
    	echo " -i Interval in seconds for iostat (default: 1)"
    	echo " -n Number of reports for iostat (default: 3)"
    	echo " -h Show this page"
    	echo ""
    	echo "Usage: $PROGNAME"
    	echo "Usage: $PROGNAME --help"
    	echo ""
    	exit 0
    }
    
    print_help() {
    	print_usage
    	echo ""
    	echo "This plugin will check cpu utilization (user, system, iowait in %)"
    	echo ""
    	exit 0
    }
    
    # Parse parameters
    while [[ "$#" -gt "0" ]]
    do
    	case "$1" in
    		-h | --help)
    			print_help
    			exit $STATE_OK
    			;;
    		-v | --version)
    			print_release
    			exit $STATE_OK
    			;;
    		-w | --warning)
    			shift
    			LIST_WARNING_THRESHOLD=$1
    			;;
    		-c | --critical)
    			shift
    			LIST_CRITICAL_THRESHOLD=$1
    			;;
    		-i | --interval)
    			shift
    			INTERVAL_SEC=$1
    			;;
    		-n | --number)
    			shift
    			NUM_REPORT=$1
    			;;
    		*) echo "Unknown argument: $1"
    			print_usage
    			exit $STATE_UNKNOWN
    			;;
    	esac
    	shift
    done
    
    # Split the comma-separated warning thresholds into an array
    TAB_WARNING_THRESHOLD=(`echo $LIST_WARNING_THRESHOLD | sed 's/,/ /g'`)
    if [[ "${#TAB_WARNING_THRESHOLD[@]}" -ne "3" ]];then
    	echo "ERROR: Bad count parameter in Warning threshold"
    	exit $STATE_UNKNOWN
    else
    	USER_WARNING_THRESHOLD=${TAB_WARNING_THRESHOLD[0]}
    	SYSTEM_WARNING_THRESHOLD=${TAB_WARNING_THRESHOLD[1]}
    	IOWAIT_WARNING_THRESHOLD=${TAB_WARNING_THRESHOLD[2]}
    fi
    
    # Split the comma-separated critical thresholds into an array
    TAB_CRITICAL_THRESHOLD=(`echo $LIST_CRITICAL_THRESHOLD | sed 's/,/ /g'`)
    if [[ "${#TAB_CRITICAL_THRESHOLD[@]}" -ne "3" ]];then
    	echo "ERROR: Bad count parameter in Critical threshold"
    	exit $STATE_UNKNOWN
    else
    	USER_CRITICAL_THRESHOLD=${TAB_CRITICAL_THRESHOLD[0]}
    	SYSTEM_CRITICAL_THRESHOLD=${TAB_CRITICAL_THRESHOLD[1]}
    	IOWAIT_CRITICAL_THRESHOLD=${TAB_CRITICAL_THRESHOLD[2]}
    fi
    
    # every critical threshold must be higher than its warning threshold
    if [[ "${TAB_WARNING_THRESHOLD[0]}" -ge "${TAB_CRITICAL_THRESHOLD[0]}" || "${TAB_WARNING_THRESHOLD[1]}" -ge "${TAB_CRITICAL_THRESHOLD[1]}" || "${TAB_WARNING_THRESHOLD[2]}" -ge "${TAB_CRITICAL_THRESHOLD[2]}" ]];then
    	echo "ERROR: Critical CPU threshold must be higher than Warning CPU threshold"
    	exit $STATE_UNKNOWN
    fi
    
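    # Field layout taken from Alibaba Cloud hosts; adjust the extraction to yours.
    # iostat prints an "avg-cpu" header plus one value line per report; the first
    # report is the average since boot, so keep the value line of the last one.
    # LC_ALL=C keeps "." as the decimal separator.
    CPU_REPORT=`LC_ALL=C ${IOSTAT} -c ${INTERVAL_SEC} ${NUM_REPORT} | awk '/avg-cpu/ {getline; line=$0} END {print line}'`
    CPU_USER=`echo $CPU_REPORT | awk '{print $1}'`
    CPU_SYSTEM=`echo $CPU_REPORT | awk '{print $3}'`
    CPU_IOWAIT=`echo $CPU_REPORT | awk '{print $4}'`
    CPU_STEAL=`echo $CPU_REPORT | awk '{print $5}'`
    CPU_IDLE=`echo $CPU_REPORT | awk '{print $6}'`
    NAGIOS_STATUS="user=${CPU_USER}%, system=${CPU_SYSTEM}%, iowait=${CPU_IOWAIT}%, idle=${CPU_IDLE}%"
    NAGIOS_DATA="CpuUser=${CPU_USER}%;${USER_WARNING_THRESHOLD};${USER_CRITICAL_THRESHOLD};0;100"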
    
    CPU_USER_MAJOR=`echo $CPU_USER | cut -d "." -f 1`
    CPU_SYSTEM_MAJOR=`echo $CPU_SYSTEM | cut -d "." -f 1`
    CPU_IOWAIT_MAJOR=`echo $CPU_IOWAIT | cut -d "." -f 1`
    CPU_IDLE_MAJOR=`echo $CPU_IDLE | cut -d "." -f 1`
    
    # return: any critical metric takes precedence over any warning metric
    if [[ "${CPU_USER_MAJOR}" -ge "${USER_CRITICAL_THRESHOLD}" ]];then
    	echo "CPU STATISTICS CRITICAL: ${NAGIOS_STATUS} | ${NAGIOS_DATA}"
    	exit $STATE_CRITICAL
    elif [[ "${CPU_SYSTEM_MAJOR}" -ge "${SYSTEM_CRITICAL_THRESHOLD}" ]];then
    	echo "CPU STATISTICS CRITICAL: ${NAGIOS_STATUS} | ${NAGIOS_DATA}"
    	exit $STATE_CRITICAL
    elif [[ "${CPU_IOWAIT_MAJOR}" -ge "${IOWAIT_CRITICAL_THRESHOLD}" ]];then
    	echo "CPU STATISTICS CRITICAL: ${NAGIOS_STATUS} | ${NAGIOS_DATA}"
    	exit $STATE_CRITICAL
    elif [[ "${CPU_USER_MAJOR}" -ge "${USER_WARNING_THRESHOLD}" ]];then
    	echo "CPU STATISTICS WARNING: ${NAGIOS_STATUS} | ${NAGIOS_DATA}"
    	exit $STATE_WARNING
    elif [[ "${CPU_SYSTEM_MAJOR}" -ge "${SYSTEM_WARNING_THRESHOLD}" ]];then
    	echo "CPU STATISTICS WARNING: ${NAGIOS_STATUS} | ${NAGIOS_DATA}"
    	exit $STATE_WARNING
    elif [[ "${CPU_IOWAIT_MAJOR}" -ge "${IOWAIT_WARNING_THRESHOLD}" ]];then
    	echo "CPU STATISTICS WARNING: ${NAGIOS_STATUS} | ${NAGIOS_DATA}"
    	exit $STATE_WARNING
    else
    	echo "CPU STATISTICS OK: ${NAGIOS_STATUS} | ${NAGIOS_DATA}"
    	exit $STATE_OK
    fi
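
The plugin's if/elif chain boils down to one rule per metric: compare the integer part of the percentage with the critical threshold first, then with the warning threshold. A minimal sketch of that rule for a single metric; `cpu_state` is a hypothetical helper, and taking the worst result across user, system and iowait gives the same outcome as the chain.

```bash
#!/bin/bash
# Hypothetical helper: classify one CPU percentage (e.g. "85.40") against
# integer warning/critical thresholds, using only the integer part of the
# value, as the plugin does with `cut -d "." -f 1`.
# Prints OK/WARNING/CRITICAL and returns the Nagios code 0/1/2.
cpu_state() {
  local value=${1%%.*} warn=$2 crit=$3
  if (( value >= crit )); then
    echo CRITICAL
    return 2
  elif (( value >= warn )); then
    echo WARNING
    return 1
  fi
  echo OK
  return 0
}
```

With the default thresholds, `cpu_state "$CPU_USER" 70 90`, `cpu_state "$CPU_SYSTEM" 40 60` and `cpu_state "$CPU_IOWAIT" 30 40` would be evaluated and the highest return code reported.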
    

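Operations Development Scripts in Production

1. Limiting the number of processes started from a shell script

The script starts run.py and keeps the number of running instances at 8.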

#!/bin/bash
# Run this script every 5 minutes (e.g. from cron)

CE_HOME="/data/ContentEngine"
LOG_PATH="/data/logs"

# target number of run.py processes
MAX_SPIDER_COUNT=8

# current number of run.py processes
count=`ps -ef | grep -v grep | grep run.py | wc -l`
# Keep run.py at 8 processes to make full use of the machine. The loop is
# capped by try_time, so a run.py that exits immediately cannot make it
# loop forever.
try_time=0
cd "$CE_HOME" || exit 1
while [[ "$count" -lt "$MAX_SPIDER_COUNT" && "$try_time" -lt "$MAX_SPIDER_COUNT" ]]
do
	let try_time+=1
	nohup python run.py >> ${LOG_PATH}/spider.log 2>&1 &
	count=`ps -ef | grep -v grep | grep run.py | wc -l`
done
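
The bounded top-up loop can be tested without Python or cron if counting and starting are passed in as commands. A minimal sketch under that assumption; `top_up_workers` and the commands handed to it are hypothetical names, not part of the original script.

```bash
#!/bin/bash
# Hypothetical helper: start workers until the count command reports at least
# MAX, making at most MAX start attempts so that a worker which dies at once
# cannot cause an endless loop. Prints the number of attempts made.
#   $1 = MAX, $2 = command that prints the current count, $3... = start command
top_up_workers() {
  local max=$1 count_cmd=$2 tries=0 count
  shift 2
  count=$("$count_cmd")
  while (( count < max && tries < max )); do
    tries=$((tries + 1))
    "$@"
    count=$("$count_cmd")
  done
  echo "$tries"
}
```

For run.py this might be `top_up_workers 8 count_run_py start_run_py`, with hypothetical helpers `count_run_py() { pgrep -fc 'run\.py' || true; }` and `start_run_py() { nohup python run.py >> "$LOG_PATH/spider.log" 2>&1 & }`, scheduled with a `*/5 * * * *` cron entry.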
Author: 一切如来心秘密