SNMP Informant How-To

Reposted from http://www.opennms.org/index.php/SNMP_Informant_How-To
From OpenNMS

People using OpenNMS often wonder how to get SNMP information, such as traps and OIDs for data collection, into OpenNMS.

Recently, I did a rather complete examination of the SNMP-Informant standard MIB for a client, so I thought I would share the process on this wiki page.

The first step in adding MIB information to OpenNMS is to find the MIB (grin). For SNMP-Informant, there is a MIBs directory in the folder that comes with the distribution. It contains both version 1 and version 2 MIBs; it really doesn't matter which one we use.

There are two MIBs for the standard SNMP Informant agent: INFORMANT-STD.MIB and WTCS.MIB.

The second step is to determine exactly what you want to get out of the MIB. There are two distinctly different things: traps to convert to events, and OIDs to collect for performance data.

A quick search for TRAP-TYPE and NOTIFICATION-TYPE in these two MIBs shows that neither contains traps, so we can ignore that here. If you want to get trap information into OpenNMS, you need to use the mib2opennms tool, which is discussed elsewhere.
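The following bash sketch repeats that search from a shell. The function name count_traps is made up for this example. It only counts lines that mention either macro, so a SMIv2 MIB that merely lists NOTIFICATION-TYPE in its IMPORTS clause will show a non-zero count and deserves a closer look.

```bash
#!/usr/bin/env bash
# Count lines mentioning TRAP-TYPE (SMIv1) or NOTIFICATION-TYPE (SMIv2)
# in each MIB file named on the command line.
# Usage: count_traps INFORMANT-STD.MIB WTCS.MIB
count_traps() {
  local mib
  for mib in "$@"; do
    # grep -c prints 0 and exits with status 1 when nothing matches;
    # "|| true" keeps that from being treated as a failure.
    printf '%s: %s\n' "$mib" \
      "$(grep -c -E 'TRAP-TYPE|NOTIFICATION-TYPE' "$mib" || true)"
  done
}
```

A count of 0 for both files means there is nothing for mib2opennms to convert.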

For data collection, there is another tool called the mibparser that will convert the OIDs in a MIB to a format that can be placed in the datacollection-config.xml file. There is even a convenient wrapper script to run it:

(Note: This wrapper works with Java 1.4.x ONLY)

$OPENNMS_HOME/contrib/mibparser/dist/parseMib.sh INFORMANT-STD.MIB

This gives me the error:

 ERROR: can't find parent 'informant' for textOid 'standard'
 Find which MIB the parent is defined in and add that to the command line
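To answer that prompt, grep the MIB files for the line that defines the missing name. This bash sketch uses a hypothetical helper, find_oid_definition. It assumes the definition starts a line in the form "informant OBJECT IDENTIFIER", "informant MODULE-IDENTITY" or "informant OBJECT-IDENTITY". A MIB laid out differently may need a looser pattern.

```bash
#!/usr/bin/env bash
# Print the names of the MIB files that define the given OID name.
#   $1 = name to look for (e.g. informant), remaining args = MIB files
# Assumes the definition begins a line as "<name> OBJECT IDENTIFIER",
# "<name> MODULE-IDENTITY" or "<name> OBJECT-IDENTITY".
find_oid_definition() {
  local name=$1
  shift
  grep -l -E "^[[:space:]]*${name}[[:space:]]+(OBJECT[[:space:]]+IDENTIFIER|MODULE-IDENTITY|OBJECT-IDENTITY)" "$@"
}
# Usage: find_oid_definition informant *.MIB
```

If WTCS.MIB follows one of those forms, running the helper in the MIBs directory should list it as the file defining "informant".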

Since "informant" is defined in the WTCS.MIB file, I need to add that to my command:

$OPENNMS_HOME/contrib/mibparser/dist/parseMib.sh WTCS.MIB INFORMANT-STD.MIB

This returns a lot of output in a format that can be used in the datacollection-config.xml file.

Rather than post it here (I'm going to pretty much post it all anyway), I'll break it out later in the document.

Once I have successfully produced output from a MIB, I check it out to see how easy it will be to add it to OpenNMS. The things to look for are whether or not the data is in a table, and whether or not the data type is numeric.

This MIB provides five main areas of information: disk, memory, network, processes/threads and CPU.

Since the memory and process information is not stored in a table, it's really easy to configure, and it is already included in the basic datacollection-config.xml file.

For example, the mibparser output for the memory objects looks like this:

 <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.1" instance="0" alias="memoryAvailableBytesTOOLONG" type="Gauge32" />
 <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.2" instance="0" alias="memoryAvailableKBytesTOOLONG" type="Gauge32" />
 <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.3" instance="0" alias="memoryAvailableMBytesTOOLONG" type="Gauge32" />
 <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.4" instance="0" alias="memoryCommittedBytesTOOLONG" type="Gauge32" />
 <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.5" instance="0" alias="memoryCacheBytes" type="Gauge32" />
 <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.6" instance="0" alias="memoryCacheBytesPeakTOOLONG" type="Gauge32" />
 <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.7" instance="0" alias="memoryPageFaultsPerSecTOOLONG" type="Gauge32" />
 <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.8" instance="0" alias="memoryPagesInputPerSecTOOLONG" type="Gauge32" />
 <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.9" instance="0" alias="memoryPagesOutputPerSecTOOLONG" type="Gauge32" />
 <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.10" instance="0" alias="memoryPagesPerSec" type="Gauge32" />
 <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.11" instance="0" alias="memoryPoolNonpagedBytesTOOLONG" type="Gauge32" />
 <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.12" instance="0" alias="memoryPoolPagedBytesTOOLONG" type="Gauge32" />
 <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.13" instance="0" alias="memoryPoolPagedResidentBytesTOOLONG" type="Gauge32" />
 <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.14" instance="0" alias="memorySystemCacheResidentBytesTOOLONG" type="Gauge32" />
 <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.15" instance="0" alias="memorySystemCodeResidentBytesTOOLONG" type="Gauge32" />
 <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.16" instance="0" alias="memorySystemCodeTotalBytesTOOLONG" type="Gauge32" />
 <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.17" instance="0" alias="memorySystemDriverResidentBytesTOOLONG" type="Gauge32" />
 <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.18" instance="0" alias="memorySystemDriverTotalBytesTOOLONG" type="Gauge32" />

Note that the instance is numeric ("0"), which means the data is not in a table. Since RRDTool/jRobin can only store numeric data, it also helps that the data type on all of these values is "Gauge32".
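Both checks (scalar versus table, numeric versus not) can be run over the whole mibparser output at once. The bash sketch below is a hypothetical helper and not part of mibparser. It assumes an all-digit instance means a scalar and anything else means a named table index. It treats gauge, counter, integer and timeticks types as numeric.

```bash
#!/usr/bin/env bash
# Read mibparser output on stdin and print one line per <mibObj>:
#   <alias> <scalar|table> <numeric|non-numeric>
# Assumption: an all-digit instance (like "0") means a scalar; anything
# else (like "lDiskInstance") means a named table index.
classify_mibobjs() {
  local line alias inst type shape kind
  local re_alias='alias="([^"]*)"'
  local re_inst='instance="([^"]*)"'
  local re_type='type="([^"]*)"'
  while IFS= read -r line; do
    # Skip anything that is not a complete mibObj line
    [[ $line =~ $re_alias ]] || continue; alias=${BASH_REMATCH[1]}
    [[ $line =~ $re_inst ]] || continue; inst=${BASH_REMATCH[1]}
    [[ $line =~ $re_type ]] || continue; type=${BASH_REMATCH[1]}
    if [[ $inst =~ ^[0-9]+$ ]]; then shape=scalar; else shape=table; fi
    # Gauge/Counter/Integer/TimeTicks count as numeric; InstanceName etc. do not
    case $type in
      [Gg]auge*|[Cc]ounter*|[Ii]nteger*|INTEGER*|[Tt]ime[Tt]icks) kind=numeric ;;
      *) kind=non-numeric ;;
    esac
    printf '%s %s %s\n' "$alias" "$shape" "$kind"
  done
}
```

For example, save the parser output to a file (named, say, informant-mibobjs.txt for this example) and run classify_mibobjs < informant-mibobjs.txt. The memory objects come out as "scalar numeric", while lDiskInstance comes out as "table non-numeric".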

You'll note that the alias for most of these OIDs has the letters "TOOLONG" in it. RRDTool limits data source names to 19 characters, and this is the parser's way of indicating that the alias needs to be shortened. I also like to indicate in the alias name what device/MIB the data is from, so this ends up in datacollection-config.xml as:

<group  name="snmpinformant-memory" ifType="ignore">
        <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.3" instance="0" alias="sinfMemAvailMB" type="Gauge" />
        <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.4" instance="0" alias="sinfMemComBytes" type="Gauge" />
        <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.5" instance="0" alias="sinfMemCacheBytes" type="Gauge" />
        <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.6" instance="0" alias="sinfMemCacheBytesPk" type="Gauge" />
        <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.7" instance="0" alias="sinfMemPageFaultsPS" type="Gauge" />
        <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.8" instance="0" alias="sinfMemPagesInputPS" type="Gauge" />
        <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.9" instance="0" alias="sinfMemPagesOutPS" type="Gauge" />
        <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.10" instance="0" alias="sinfMemPagesPerSec" type="Gauge" />
        <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.11" instance="0" alias="sinfMemPNonpagedByt" type="Gauge" />
        <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.12" instance="0" alias="sinfMemPPagedBytes" type="Gauge" />
        <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.13" instance="0" alias="sinfMemPPagedResByt" type="Gauge" />
        <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.14" instance="0" alias="sinfMemSysCacheResB" type="Gauge" />
        <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.15" instance="0" alias="sinfMemSysCodeResB" type="Gauge" />
        <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.16" instance="0" alias="sinfMemSysCodeTotB" type="Gauge" />
        <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.17" instance="0" alias="sinfMemSysDrvResB" type="Gauge" />
        <mibObj oid=".1.3.6.1.4.1.9600.1.1.2.18" instance="0" alias="sinfMemSysDrvTotB" type="Gauge" />
</group>

Note that each alias is 19 characters or less, and that "sinf" for SNMP-Informant has been prefixed to each one.
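Hand-shortened names are easy to get wrong, so it can be worth checking the edited file before restarting OpenNMS. This bash sketch uses a hypothetical helper, check_aliases. It pulls every alias="..." value out of the given files and flags any alias longer than 19 characters. It also flags any alias that uses characters other than letters, digits and underscore, the only characters RRDtool allows in data source names.

```bash
#!/usr/bin/env bash
# Flag every alias="..." value in the given file(s) that RRDtool would
# reject as a data source name. Returns 1 if anything was flagged.
# Usage: check_aliases datacollection-config.xml
check_aliases() {
  local alias status=0
  while IFS= read -r alias; do
    if (( ${#alias} > 19 )); then
      printf 'too long (%d chars): %s\n' "${#alias}" "$alias"
      status=1
    elif [[ ! $alias =~ ^[A-Za-z0-9_]+$ ]]; then
      printf 'bad characters: %s\n' "$alias"
      status=1
    fi
  done < <(grep -h -o 'alias="[^"]*"' "$@" | sed -e 's/^alias="//' -e 's/"$//')
  return "$status"
}
```

The helper checks every alias in the file, not just the SNMP-Informant groups. On the raw parser output it would flag each TOOLONG entry, for example "memoryAvailableBytesTOOLONG" at 27 characters.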

The other three groups in this MIB, which reside in tables, are not so easy. The problem lies with how SNMP-Informant uses instances. For example, this is the available information for disks (output from the mibparser):

<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.1" instance="lDiskInstance" alias="lDiskInstance" type="InstanceName" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.2" instance="lDiskInstance" alias="lDiskPercentDiskReadTimeTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.3" instance="lDiskInstance" alias="lDiskPercentDiskTimeTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.4" instance="lDiskInstance" alias="lDiskPercentDiskWriteTimeTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.5" instance="lDiskInstance" alias="lDiskPercentFreeSpaceTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.6" instance="lDiskInstance" alias="lDiskPercentIdleTimeTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.7" instance="lDiskInstance" alias="lDiskAvgDiskQueueLengthTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.8" instance="lDiskInstance" alias="lDiskAvgDiskReadQueueLengthTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.9" instance="lDiskInstance" alias="lDiskAvgDiskWriteQueueLengthTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.10" instance="lDiskInstance" alias="lDiskAvgDiskSecPerReadTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.11" instance="lDiskInstance" alias="lDiskAvgDiskSecPerTransferTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.12" instance="lDiskInstance" alias="lDiskAvgDiskSecPerWriteTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.13" instance="lDiskInstance" alias="lDiskCurrentDiskQueueLengthTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.14" instance="lDiskInstance" alias="lDiskDiskBytesPerSecTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.15" instance="lDiskInstance" alias="lDiskDiskReadBytesPerSecTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.16" instance="lDiskInstance" alias="lDiskDiskReadsPerSecTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.17" instance="lDiskInstance" alias="lDiskDiskTransfersPerSecTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.18" instance="lDiskInstance" alias="lDiskDiskWriteBytesPerSecTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.19" instance="lDiskInstance" alias="lDiskDiskWritesPerSecTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.20" instance="lDiskInstance" alias="lDiskFreeMegabytes" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.21" instance="lDiskInstance" alias="lDiskSplitIOPerSec" type="Gauge32" />

You'll see that "lDiskInstance" is the index into the table. This is where things get really weird.

First, you'll need to run "diskperf -y" as an Administrator on the command line of the target Windows boxes and then reboot, or you won't get any information about disks at all. On my lone Windows box, I have two disk drives, C: and D:. If I run:

 $ snmpwalk -v 1 -c public butters.opennms.com .1.3.6.1.4.1.9600.1.1.1.1.1
 SNMPv2-SMI::enterprises.9600.1.1.1.1.1.2.67.58 = STRING: "C:"
 SNMPv2-SMI::enterprises.9600.1.1.1.1.1.2.68.58 = STRING: "D:"
 SNMPv2-SMI::enterprises.9600.1.1.1.1.1.6.95.84.111.116.97.108 = STRING: "_Total"

You'll see that there are three instances listed: C:, D: and _Total.

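Here's the weird part. Note that the index for the first one is "2.67.58". SNMP encodes a variable-length string index as the string's length followed by the ASCII code of each character. So the 2 is the length of the string, 67.58 is "C:" and 68.58 is "D:". Likewise, in "6.95.84.111.116.97.108" the 6 is the length and the rest spells "_Total". This makes it pretty easy to work out which instance you'll need to collect, but it could get weird for oddly named drives.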
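The same rule can be written as a pair of bash functions. This is a sketch for plain ASCII labels, and the names encode_index and decode_index are made up for this example. The encoder produces the suffix seen in the walk: the string length followed by one ASCII code per character. The decoder goes the other way and skips the leading length.

```bash
#!/usr/bin/env bash
# encode_index: "C:" -> "2.67.58" (length, then one ASCII code per character)
encode_index() {
  local s=$1 out=${#1} i
  for (( i = 0; i < ${#s}; i++ )); do
    # printf %d with a leading quote prints the character's numeric code
    out+=".$(printf '%d' "'${s:i:1}")"
  done
  printf '%s\n' "$out"
}

# decode_index: "2.68.58" -> "D:" (the leading length is skipped)
decode_index() {
  local -a parts
  local code out=''
  IFS=. read -r -a parts <<< "$1"
  for code in "${parts[@]:1}"; do
    # Turn the decimal code into an octal escape and let printf expand it
    out+=$(printf "\\$(printf '%03o' "$code")")
  done
  printf '%s\n' "$out"
}
```

For example, encode_index 'E:' gives 2.69.58. That is the suffix behind the E: entries in the disk group, with .2.69 on the oid and 58 as the instance.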

Now it becomes an exercise in cut and paste. Rather than paste the whole disk group for SNMP-Informant, let's take a look at one OID.

I looked at "lDiskPercentFreeSpace" and figured that would be a good place to start, since many people want to know when their disks are full.

 $ snmpwalk -v 1 -c public butters.opennms.com .1.3.6.1.4.1.9600.1.1.1.1.5
 SNMPv2-SMI::enterprises.9600.1.1.1.1.5.2.67.58 = Gauge32: 3
 SNMPv2-SMI::enterprises.9600.1.1.1.1.5.2.68.58 = Gauge32: 98
 SNMPv2-SMI::enterprises.9600.1.1.1.1.5.6.95.84.111.116.97.108 = Gauge32: 75

It is pretty dead on. My C: drive is full while my D: drive is pretty empty. Note that the total drive space percentage is also available (although I am not sure how that is calculated).

If I wanted to collect this information, I would need to edit datacollection-config.xml and add something like:

<group  name="snmpinformant-disk" ifType="ignore">
       <mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.5.2.67" instance="58" alias="sinfDskPtFreeSpcC" type="Gauge32" />
       <mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.5.2.68" instance="58" alias="sinfDskPtFreeSpcD" type="Gauge32" />
       <mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.5.2.69" instance="58" alias="sinfDskPtFreeSpcE" type="Gauge32" />
       <mibObj oid=".1.3.6.1.4.1.9600.1.1.1.1.5.6.95.84.111.116.97" instance="108" alias="sinfDskPtFreeSpcTl" type="Gauge32" />
</group>

Then add the "snmpinformant-disk" group to the system definitions at the bottom of the file. Note that I changed the alias names to reflect SNMP-Informant and fit within 19 characters.

Now, adding this to datacollection-config.xml and restarting OpenNMS will (should) start data collection.
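Only the encoded index changes from drive to drive, so the cut and paste can be scripted. This bash sketch uses a hypothetical helper, mibobj_for, which builds one <mibObj> line from a column OID, an instance label and an alias. It puts the last number of the encoded index in instance and appends the rest to oid, the same split used in the snmpinformant-disk group. It also assumes the Gauge32 type used there.

```bash
#!/usr/bin/env bash
# Print one datacollection-config.xml <mibObj> line for a row of a
# string-indexed SNMP-Informant table.
#   $1 = column OID, $2 = instance label (e.g. "C:"), $3 = alias
mibobj_for() {
  local column=$1 label=$2 alias=$3 index=${#2} i
  # Encode the label: its length, then one ASCII code per character
  for (( i = 0; i < ${#label}; i++ )); do
    index+=".$(printf '%d' "'${label:i:1}")"
  done
  # Last sub-identifier goes in instance; the rest is appended to oid
  printf '<mibObj oid="%s.%s" instance="%s" alias="%s" type="Gauge32" />\n' \
    "$column" "${index%.*}" "${index##*.}" "$alias"
}
```

mibobj_for .1.3.6.1.4.1.9600.1.1.1.1.5 'C:' sinfDskPtFreeSpcC reproduces the first line of the disk group. The same helper with column .1.3.6.1.4.1.9600.1.1.5.1.5 and label '0' yields the oid ".1.3.6.1.4.1.9600.1.1.5.1.5.1" with instance "48" used for CPU 0.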

The next step is to add reports for these variables. I edited the snmp-graph.properties file, found the SNMP Informant section and added the following report:

report.sinf.diskfreeC.name=Available Disk Space (Drive C) (SNMP-Inf)
report.sinf.diskfreeC.columns=sinfDskPtFreeSpcC
report.sinf.diskfreeC.type=node
report.sinf.diskfreeC.command=--title="Windows Available Space Disk Drive C (SNMP-Informant)" \
 DEF:availspace={rrd1}:sinfDskPtFreeSpcC:AVERAGE \
 LINE2:availspace#ff0000:"% Avail." \
 GPRINT:availspace:AVERAGE:"Avg \\: %10.2lf %s" \
 GPRINT:availspace:MIN:"Min \\: %10.2lf %s" \
 GPRINT:availspace:MAX:"Max \\: %10.2lf %s\\n"

This will need to be repeated for each of the other disks, and each report must also be added to the reports= line at the top of the file.
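Repeating the report for each drive is mechanical, so here is a bash sketch that prints the stanza for any drive letter. disk_report is a hypothetical name, and its output matches the sinf.diskfreeC report with only the letter changed. The heredoc delimiter is quoted so the shell leaves the backslashes alone, and sed fills in the letter.

```bash
#!/usr/bin/env bash
# Print the snmp-graph.properties report for one drive letter ($1),
# modelled on the sinf.diskfreeC report. The quoted heredoc keeps the
# backslashes literal; sed replaces the @D@ placeholder with the letter.
disk_report() {
  sed "s/@D@/$1/g" <<'EOF'
report.sinf.diskfree@D@.name=Available Disk Space (Drive @D@) (SNMP-Inf)
report.sinf.diskfree@D@.columns=sinfDskPtFreeSpc@D@
report.sinf.diskfree@D@.type=node
report.sinf.diskfree@D@.command=--title="Windows Available Space Disk Drive @D@ (SNMP-Informant)" \
 DEF:availspace={rrd1}:sinfDskPtFreeSpc@D@:AVERAGE \
 LINE2:availspace#ff0000:"% Avail." \
 GPRINT:availspace:AVERAGE:"Avg \\: %10.2lf %s" \
 GPRINT:availspace:MIN:"Min \\: %10.2lf %s" \
 GPRINT:availspace:MAX:"Max \\: %10.2lf %s\\n"
EOF
}
# Usage: for d in D E; do disk_report "$d"; echo; done
```

The report names (sinf.diskfreeD, sinf.diskfreeE and so on) still have to be added to the reports= line by hand.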

Finally, we want to know when the available disk gets to, say, 5%, so edit the thresholds.xml file and add:

<!-- SNMP Informant thresholds -->
<threshold type="low" ds-name="sinfDskPtFreeSpcC"  ds-type="node" value="5" rearm="10" trigger="1"/>
<threshold type="low" ds-name="sinfDskPtFreeSpcD"  ds-type="node" value="5" rearm="10" trigger="1"/>
<threshold type="low" ds-name="sinfDskPtFreeSpcE"  ds-type="node" value="5" rearm="10" trigger="1"/>
<threshold type="low" ds-name="sinfDskPtFreeSpcTl"  ds-type="node" value="5" rearm="10" trigger="1"/>

The next thing to look at is the CPU statistics:

<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.1" instance="cpuInstance" alias="cpuInstance" type="InstanceName" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.2" instance="cpuInstance" alias="cpuPercentDPCTime" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.3" instance="cpuInstance" alias="cpuPercentInterruptTimeTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.4" instance="cpuInstance" alias="cpuPercentPrivilegedTimeTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.5" instance="cpuInstance" alias="cpuPercentProcessorTimeTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.6" instance="cpuInstance" alias="cpuPercentUserTime" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.7" instance="cpuInstance" alias="cpuAPCBypassesPerSecTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.8" instance="cpuInstance" alias="cpuDPCBypassesPerSecTOOLONG" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.9" instance="cpuInstance" alias="cpuDPCRate" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.10" instance="cpuInstance" alias="cpuDPCsQueuedPerSec" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.11" instance="cpuInstance" alias="cpuInterruptsPerSec" type="Gauge32" />

I only have one CPU on my machine, but I get:

 $ snmpwalk -v 1 -c public butters.opennms.com .1.3.6.1.4.1.9600.1.1.5.1.1
 SNMPv2-SMI::enterprises.9600.1.1.5.1.1.1.48 = STRING: "0"
 SNMPv2-SMI::enterprises.9600.1.1.5.1.1.6.95.84.111.116.97.108 = STRING: "_Total"

and as you can see, we get both a single CPU and _Total.
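Here the CPU index is "1.48": a one-character string whose only character is ASCII 48, "0". Rather than decoding each row by eye, you can turn the walk into a table of label, OID suffix and instance. The bash function below is a sketch with a made-up name, walk_to_instances. It assumes snmpwalk prints rows in the SNMPv2-SMI::enterprises... form shown above, with STRING values. Other output formats, such as numeric OIDs from snmpwalk -On, would need a different prefix.

```bash
#!/usr/bin/env bash
# Read snmpwalk output for an SNMP-Informant "instance" column on stdin and
# print "<label> <oid-suffix> <instance>" per row: oid-suffix is what to
# append to a column OID, instance is the last sub-identifier.
#   $1 = column OID as snmpwalk prints it,
#        e.g. SNMPv2-SMI::enterprises.9600.1.1.5.1.1
walk_to_instances() {
  local prefix=$1 line idx label
  while IFS= read -r line; do
    # Only handle rows under the requested column
    case $line in
      "$prefix".*) ;;
      *) continue ;;
    esac
    idx=${line#"$prefix".}   # e.g. '1.48 = STRING: "0"'
    idx=${idx%% *}           # e.g. '1.48'
    label=${line#*STRING: }  # e.g. '"0"'
    label=${label//\"/}      # drop the quotes
    printf '%s %s %s\n' "$label" "${idx%.*}" "${idx##*.}"
  done
}
```

Piping the walk above through walk_to_instances SNMPv2-SMI::enterprises.9600.1.1.5.1.1 prints "0 1 48" and "_Total 6.95.84.111.116.97 108". The middle column goes after the column OID in oid=, and the last column goes in instance=.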

The statistic that most people are interested in is how busy the CPU is. From the SNMP Informant MIB:

 cpuPercentProcessorTime OBJECT-TYPE
    SYNTAX     Gauge32
    MAX-ACCESS read-only
    STATUS     current
    DESCRIPTION
            "% Processor Time is the percentage of time
            that the processor is executing a non-Idle
            thread.  This counter was designed as a primary
            indicator of processor activity.  It is
            calculated by measuring the time that the
            processor spends executing the thread of the
            Idle process in each sample interval, and
            subtracting that value from 100%.  (Each
            processor has an Idle thread which consumes
            cycles when no other threads are ready to run).
            It can be viewed as the percentage of the
            sample interval spent doing useful work.  This
            counter displays the average percentage of busy
            time observed during the sample interval.  It
            is calculated by monitoring the time the
            service was inactive, and then subtracting that
            value from 100%."
    ::= { processorEntry 5 }

I especially liked "This counter was designed as a primary indicator of processor activity" since that is what we are looking for. So a value of 100% would be bad if sustained.

So off to modify datacollection-config.xml again:

<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.5.1" instance="48" alias="sinfCpuPtProcTime0" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.5.1" instance="49" alias="sinfCpuPtProcTime1" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.5.1" instance="50" alias="sinfCpuPtProcTime2" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.5.1" instance="51" alias="sinfCpuPtProcTime3" type="Gauge32" />
<mibObj oid=".1.3.6.1.4.1.9600.1.1.5.1.5.6.95.84.111.116.97" instance="108" alias="sinfCpuPtProcTimeTl" type="Gauge32" />

This will collect the values we want.

And now for a sample graph to place in snmp-graph.properties:

 report.sinf.cpu0percent.name=Windows CPU 0 Percent Processor Time (SNMP-Inf)
 report.sinf.cpu0percent.columns=sinfCpuPtProcTime0
 report.sinf.cpu0percent.type=node
 report.sinf.cpu0percent.command=--title="Windows CPU 0 Utilization (SNMP-Informant)" \
  DEF:utilization={rrd1}:sinfCpuPtProcTime0:AVERAGE \
  LINE2:utilization#ff0000:"% util." \
  GPRINT:utilization:AVERAGE:"Avg \\: %10.2lf %s" \
  GPRINT:utilization:MIN:"Min \\: %10.2lf %s" \
  GPRINT:utilization:MAX:"Max \\: %10.2lf %s\\n"

Remember to add it to the "reports=" line at the top of the file.

For thresholds, it's similar to above:

<threshold type="high" ds-name="sinfCpuPtProcTime0"  ds-type="node" value="100" rearm="90" trigger="3"/>
<threshold type="high" ds-name="sinfCpuPtProcTime1"  ds-type="node" value="100" rearm="90" trigger="3"/>
<threshold type="high" ds-name="sinfCpuPtProcTime2"  ds-type="node" value="100" rearm="90" trigger="3"/>
<threshold type="high" ds-name="sinfCpuPtProcTime3"  ds-type="node" value="100" rearm="90" trigger="3"/>
<threshold type="high" ds-name="sinfCpuPtProcTimeTl"  ds-type="node" value="100" rearm="90" trigger="3"/>

This will require three consecutive polls where the CPU is at 100% before the alarm is raised.
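To see how value, rearm and trigger interact, here is a simplified model in bash. It is not OpenNMS code, and simulate_threshold is a hypothetical helper. It assumes a poll counts toward the trigger when the sample reaches the value (>= for high, <= for low), and that the threshold re-arms once a sample gets back to the rearm level. The >= assumption matters for the CPU entries: a percentage can never be strictly greater than 100, so check how your OpenNMS version compares before relying on value="100".

```bash
#!/usr/bin/env bash
# Simplified threshold model (integer samples only).
# Usage: simulate_threshold high|low VALUE REARM TRIGGER SAMPLE...
simulate_threshold() {
  local type=$1 value=$2 rearm=$3 trigger=$4
  shift 4
  local armed=1 count=0 n=0 s hit back
  for s in "$@"; do
    n=$(( n + 1 ))
    if [ "$type" = high ]; then
      hit=$(( s >= value )); back=$(( s <= rearm ))
    else
      hit=$(( s <= value )); back=$(( s >= rearm ))
    fi
    if (( armed )); then
      if (( hit )); then
        count=$(( count + 1 ))
        if (( count >= trigger )); then
          echo "poll $n: $type threshold exceeded ($s)"
          armed=0
          count=0
        fi
      else
        count=0   # a good poll breaks the run of consecutive bad ones
      fi
    elif (( back )); then
      echo "poll $n: $type threshold rearmed ($s)"
      armed=1
    fi
  done
}
```

With the CPU settings, simulate_threshold high 100 90 3 100 100 95 100 100 100 100 85 reports the threshold exceeded at poll 6 and rearmed at poll 8. The 95 at poll 3 resets the count. With the disk settings, simulate_threshold low 5 10 1 20 4 7 12 fires at poll 2 and re-arms at poll 4.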

----------------------------------------------------------------------------------

Comment:

1.There are two, distinctly different things: traps to convert to events and OIDs to collection for performance data.

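Collecting SNMP data and receiving SNMP traps are the two things OpenNMS can do over the SNMP protocol. Two tools are involved, mib2opennms and mibparser. mib2opennms converts the SNMP traps in a MIB into a format OpenNMS understands, while mibparser converts a MIB into OpenNMS's data collection format to help administrators configure the datacollection-config.xml file. This article focuses on using OpenNMS's mibparser.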
