Monitoring Clusters With SNMP

Monitoring the performance of the hosts in a cluster is a rather boring task. I've seen various programs that collect system usage information with handmade methods, such as filtering the output of commands or reading fields from files under the '/proc' directory. Recently I had to write a program to retrieve information such as the CPU load, disk usage and memory usage from a bunch of hosts, and I found SNMP to be a good candidate for the task.

The Simple Network Management Protocol (SNMP) is an application layer protocol that facilitates the exchange of management information between network devices. It is part of the Transmission Control Protocol/Internet Protocol (TCP/IP) protocol suite. SNMP enables network administrators to manage network performance, find and solve network problems, and plan for network growth.

In the following sections I'll introduce what I've found about SNMP: the installation, the configuration of the agent, and how to use the manager to pull information from the agent.

1. Installation

On my Ubuntu box a single command installed everything:

$ sudo aptitude install snmp snmpd

and on other platforms this should be easy, too.


2. Configuration

Edit the configuration file of the SNMP agent:

$ sudo vi /etc/snmp/snmpd.conf

Change the first step of the access control section from:

#       sec.name source          community
com2sec paranoid default         public
#com2sec readonly default         public
#com2sec readwrite default         private

to:

#       sec.name source          community
com2sec paranoid localhost         public
com2sec readonly 10.0.0.0/16       public
#com2sec readwrite default         private

This allows hosts on the local network 10.0.0.0/16 to access this box's information in read-only mode.

Then restart the service:

$ sudo /etc/init.d/snmpd restart

and you can test it with:

$ snmpwalk -v 2c -c public localhost system
SNMPv2-MIB::sysDescr.0 = STRING: Linux linkup 2.6.22-14-generic #1 SMP Sun Oct 14 23:05:12 GMT 2007 i686
SNMPv2-MIB::sysObjectID.0 = OID: NET-SNMP-MIB::netSnmpAgentOIDs.10
DISMAN-EVENT-MIB::sysUpTimeInstance = Timeticks: (1098742) 3:03:07.42
SNMPv2-MIB::sysContact.0 = STRING: Root <root@localhost> (configure /etc/snmp/snmpd.local.conf)
...

However, we're not finished yet. If you issue the following command, which looks equivalent to the previous one, you may be surprised by the result (substitute your own IP address for 10.0.12.4):

$ snmpwalk -v 2c -c public 10.0.12.4 system
Timeout: No Response from 10.0.12.4

This is because we didn't specify an IP address to listen on. By default (at least with this Ubuntu package), snmpd listens only on the loopback interface. To change this, append the following line to the configuration file:

agentaddress 10.0.12.4:161

Then restart the service and the problem is solved.
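
Once every node has been reconfigured, it can be handy to confirm that each agent answers on its network address. The following is only a minimal sketch: check_agent is a hypothetical helper name, and it assumes the net-snmp 'snmpget' tool with its -t (timeout in seconds) and -r (retries) options. It asks for sysUpTime (.1.3.6.1.2.1.1.3.0) and reports whether a reply came back.

```sh
# check_agent HOST [COMMUNITY]: hypothetical helper.
# Prints "HOST: ok" when the agent answers, "HOST: no response" otherwise
# (the latter is also printed when snmpget itself is missing or fails).
check_agent() {
  host=$1
  community=${2:-public}
  # Ask for sysUpTime.0; wait at most 1 second and do not retry.
  if snmpget -v 2c -c "$community" -t 1 -r 0 "$host" .1.3.6.1.2.1.1.3.0 \
      >/dev/null 2>&1; then
    echo "$host: ok"
  else
    echo "$host: no response"
  fi
}

# Example: check_agent 10.0.12.4 public
```

Running it in a loop over the node addresses gives a quick list of the agents that still listen only on the loopback interface.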


3. Reading information

Once the agent is set up on all your nodes, you'll want to gather information from them. The 'snmp' package contains a bunch of tools for sending SNMP requests to an agent and receiving the responses. For example:

'snmpwalk' retrieves a subtree of management values using SNMP GETNEXT requests. This is useful for exploring the tree and for debugging your scripts.

'snmpget' communicates with a network entity using SNMP GET requests. This is the command to use for retrieving the individual values you need.

All of these commands need an OID argument. Search for 'SNMP MIB' with your favourite search engine to learn more about OIDs.

To retrieve the CPU load, memory usage and disk usage from hosts, you need to find the corresponding OIDs. I found that my Linux box doesn't return anything for the CPU load OID (.1.3.6.1.2.1.25.3.3.1.2, hrProcessorLoad), so I had to get it from the ucdavis MIB instead. Another thing to note is that the hrStorageTable (.1.3.6.1.2.1.25.2.3) gives you both the memory and the disk usage. Issue the following command and check which indexes correspond to your physical and virtual memory:

$ snmpwalk -v 2c -c public 10.0.12.4 .1.3.6.1.2.1.25.2.3.1.3
HOST-RESOURCES-MIB::hrStorageDescr.1 = STRING: Memory Buffers
HOST-RESOURCES-MIB::hrStorageDescr.2 = STRING: Real Memory
HOST-RESOURCES-MIB::hrStorageDescr.3 = STRING: Swap Space
HOST-RESOURCES-MIB::hrStorageDescr.4 = STRING: /
HOST-RESOURCES-MIB::hrStorageDescr.5 = STRING: /sys
HOST-RESOURCES-MIB::hrStorageDescr.6 = STRING: /boot
HOST-RESOURCES-MIB::hrStorageDescr.7 = STRING: /home
HOST-RESOURCES-MIB::hrStorageDescr.8 = STRING: /media/sda1
HOST-RESOURCES-MIB::hrStorageDescr.9 = STRING: /media/sda5
HOST-RESOURCES-MIB::hrStorageDescr.10 = STRING: /media/sda6
HOST-RESOURCES-MIB::hrStorageDescr.11 = STRING: /sys/kernel/security

The result indicates that index 2 is my physical memory and index 3 is my swap space.
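
Reading the indexes off by eye gets tedious with many hosts. As a rough sketch, the lookup can be done with awk on the snmpwalk output; storage_index is a hypothetical helper name, and the sample lines are copied from the walk above so the snippet runs without an agent.

```sh
# storage_index DESCR: read snmpwalk output for hrStorageDescr on stdin and
# print the index of the row whose description equals DESCR exactly.
storage_index() {
  awk -v want="$1" -F ' = STRING: ' '
    $2 == want {
      idx = $1
      sub(/^.*\./, "", idx)   # keep only what follows the last dot
      print idx
    }'
}

# Sample lines from the walk above. In practice, pipe snmpwalk in instead:
#   snmpwalk -v 2c -c public 10.0.12.4 .1.3.6.1.2.1.25.2.3.1.3 | storage_index 'Real Memory'
sample='HOST-RESOURCES-MIB::hrStorageDescr.2 = STRING: Real Memory
HOST-RESOURCES-MIB::hrStorageDescr.3 = STRING: Swap Space
HOST-RESOURCES-MIB::hrStorageDescr.4 = STRING: /
HOST-RESOURCES-MIB::hrStorageDescr.5 = STRING: /sys'

printf '%s\n' "$sample" | storage_index 'Real Memory'   # prints 2
printf '%s\n' "$sample" | storage_index '/'             # prints 4
```

The match is exact on purpose, so that '/' does not also select '/sys' or '/boot'.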

The following is an example script that retrieves the CPU load, memory usage and disk usage from hosts:

#!/bin/sh
# Usage: pass one or more hosts as arguments. Each host needs a
# configuration file named after it in the same directory as this script.

work_dir=$(dirname "$0")

# hrProcessorLoad and the hrStorageTable columns (HOST-RESOURCES-MIB),
# plus laLoadInt.1 from the ucdavis MIB (1-minute load average times 100).
cpu_load_mib=.1.3.6.1.2.1.25.3.3.1.2
storage_unit_mib=.1.3.6.1.2.1.25.2.3.1.4
storage_size_mib=.1.3.6.1.2.1.25.2.3.1.5
storage_used_mib=.1.3.6.1.2.1.25.2.3.1.6
cpu_load_ucd_mib=.1.3.6.1.4.1.2021.10.1.5.1

for host; do
  # Load platform, snmp_version, snmp_community and the index lists.
  . "$work_dir/$host"

  echo "cpu usage"
  cpu_load=0
  if [ "$platform" = 'win32' ]; then
    # Average the per-processor load over the configured indexes.
    cpu_count=0
    for i in $cpu_indexes; do
      load=$(snmpget -v "$snmp_version" -c "$snmp_community" "$host" \
        "$cpu_load_mib.$i" | cut -d ' ' -f 4)
      cpu_count=$((cpu_count + 1))
      cpu_load=$((cpu_load + load))
    done
    cpu_load=$((cpu_load / cpu_count))
  else
    cpu_load=$(snmpget -v "$snmp_version" -c "$snmp_community" "$host" \
      "$cpu_load_ucd_mib" | cut -d ' ' -f 4)
  fi
  echo "$cpu_load"

  echo "memory usage"
  mem_size=0
  mem_used=0
  for i in $memory_indexes; do
    unit=$(snmpget -v "$snmp_version" -c "$snmp_community" "$host" \
      "$storage_unit_mib.$i" | cut -d ' ' -f 4)
    size=$(snmpget -v "$snmp_version" -c "$snmp_community" "$host" \
      "$storage_size_mib.$i" | cut -d ' ' -f 4)
    used=$(snmpget -v "$snmp_version" -c "$snmp_community" "$host" \
      "$storage_used_mib.$i" | cut -d ' ' -f 4)
    # Sizes are counted in allocation units; convert to KB. Multiply first
    # so that units smaller than 1024 bytes are not truncated to 0.
    mem_size=$((mem_size + unit * size / 1024))
    mem_used=$((mem_used + unit * used / 1024))
  done
  echo "$mem_size $mem_used"

  echo "disk usage"
  disk_size=0
  disk_used=0
  for i in $disk_indexes; do
    unit=$(snmpget -v "$snmp_version" -c "$snmp_community" "$host" \
      "$storage_unit_mib.$i" | cut -d ' ' -f 4)
    size=$(snmpget -v "$snmp_version" -c "$snmp_community" "$host" \
      "$storage_size_mib.$i" | cut -d ' ' -f 4)
    used=$(snmpget -v "$snmp_version" -c "$snmp_community" "$host" \
      "$storage_used_mib.$i" | cut -d ' ' -f 4)
    disk_size=$((disk_size + unit * size / 1024))
    disk_used=$((disk_used + unit * used / 1024))
  done
  echo "$disk_size $disk_used"

done
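
The arithmetic in the script can be checked by hand. hrStorageSize and hrStorageUsed are counted in allocation units, so a value in KB is the count times the bytes per unit, divided by 1024. The value read from the ucdavis MIB (laLoadInt) is the 1-minute load average multiplied by 100, whereas hrProcessorLoad on the Windows branch is a percentage, so the two "cpu usage" figures are not directly comparable. Here is a small sketch with hypothetical helper names (storage_kb and format_load are not part of the script, and the numbers are made up for illustration):

```sh
# storage_kb UNIT COUNT: convert COUNT allocation units of UNIT bytes to KB.
# Multiplying before dividing keeps units smaller than 1024 bytes from
# being truncated to 0 by integer division.
storage_kb() {
  echo $(( $1 * $2 / 1024 ))
}

# format_load N: turn a laLoadInt value (load average times 100) into a decimal.
format_load() {
  printf '%d.%02d\n' $(( $1 / 100 )) $(( $1 % 100 ))
}

storage_kb 4096 2621440   # 2621440 units of 4096 bytes: prints 10485760
storage_kb 512 2048       # 2048 units of 512 bytes: prints 1024
format_load 125           # prints 1.25
```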


The following are two sample configuration files for the script above:
$ cat 10.0.15.141
platform=win32
snmp_version=2c
snmp_community=public

cpu_indexes='2 3 4 5'
memory_indexes='8'
disk_indexes='2 3 4'

$ cat 10.0.15.99
platform=linux
snmp_version=2c
snmp_community=public

cpu_indexes=''
memory_indexes='2'
disk_indexes='4'
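
Because the script sources these files directly, a missing variable only shows up later as a confusing snmpget error. The sketch below is one way to check a file up front. check_config is a hypothetical helper, not part of the original script, and it only verifies that the variables used on every platform are non-empty (cpu_indexes is left out because it is legitimately empty for Linux hosts).

```sh
# check_config FILE: succeed if FILE sets every variable the script needs.
# The body runs in a subshell so the sourced variables do not leak out.
# FILE should contain a slash (e.g. ./10.0.15.99), because "." searches
# PATH for names without one.
check_config() (
  unset platform snmp_version snmp_community memory_indexes disk_indexes
  . "$1" || exit 1
  for name in platform snmp_version snmp_community memory_indexes disk_indexes; do
    eval "value=\$$name"
    if [ -z "$value" ]; then
      echo "$1: $name is not set" >&2
      exit 1
    fi
  done
)

# Example: check_config ./10.0.15.99 && echo "config looks complete"
```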


4. Other tools

tkmib is useful for inspecting the MIB tree. On Ubuntu you can install it with aptitude.

MRTG monitors SNMP network devices and draws pretty pictures showing how much traffic has passed through each interface.

Cacti is often regarded as one of the best graphing front-ends for SNMP data and as a replacement for MRTG.