[Sample graphs: CPU, Disk, Network]
You probably know nmon for Linux and AIX if you have come to this page. It is a simple and effective system monitoring and reporting tool developed by IBM engineer Nigel Griffiths. Recently (July 2009), nmon was released to the open source community.
For reporting, NMON has many tools to represent the captured data. The main one is "nmon analyzer", which can be downloaded from http://www.ibm.com/developerworks/aix/library/au-nmon_analyser/. This Excel macro loads a raw nmon file and generates graphs. I find Excel a perfect tool for manipulating the captured data and rendering it as I wish.
For more information on this tool and its creator Nigel:
- NMON at IBM DeveloperWorks http://www.ibm.com/developerworks/aix/library/au-analyze_aix/
- NMON Wiki http://www.ibm.com/developerworks/wikis/display/WikiPtype/nmon+Manual
- NMON at SourceForge http://nmon.sourceforge.net
Working on Solaris from time to time, I could not find an equivalent for reporting purposes, in particular a tool that is easy to set up and that delivers numerous raw OS measurements and graphs in Excel (as opposed to PDF or a custom graphing tool).
So I decided to write such a tool, and I found the easiest way was to start from the SAR tool (http://docs.sun.com/app/docs/doc/816-5165/sar-1?a=view) and add a few hooks to render system activity in the NMON file format.
Sarmon also fully supports RRD output.
[Sample graphs: CPU, IOStat Service Time, Processes Wait Times]
No warranty given or implied when using sarmon.
Architecture
Sadc, the sar daemon that captures OS activity, has been modified to also output an nmon file. For example, if sadc generates a file called sa17, then another file called sa17.hostname_yymmdd_hhmm.nmon is generated as well.
The native sadc output file format is unchanged.
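As an illustration of the naming pattern, here is a minimal Python sketch that derives the companion file name. The function name `nmon_filename` is hypothetical, and the sketch assumes the timestamp is the moment the capture starts, which the text above does not state.

```python
from datetime import datetime


def nmon_filename(sadc_file: str, hostname: str, started: datetime) -> str:
    """Build the companion nmon file name for a sadc output file.

    Pattern taken from the text: <sadc file>.<hostname>_<yymmdd>_<hhmm>.nmon
    Assumption: the timestamp is the capture start time.
    """
    stamp = started.strftime("%y%m%d_%H%M")
    return f"{sadc_file}.{hostname}_{stamp}.nmon"


# Hypothetical host name and start time, for illustration only.
print(nmon_filename("sa17", "myhost", datetime(2010, 2, 17, 0, 0)))
# prints sa17.myhost_100217_0000.nmon
```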
All sarmon code has been placed into two separate files (sarmon.c and sarmon.h) with most of its methods and variables being static. Any hook method placed in sadc.c will have its name prefixed by sarmon_ to avoid any confusion. There are currently 5 hooks (init, snap, close, sleep and one to capture usage per CPU) in sadc.c.
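The hook arrangement can be pictured with a toy model. The Python below is not sarmon code (sarmon is written in C). The class `NmonHooks`, the function `collect`, and the order in which the hooks are called are all assumptions made for illustration. Only the five hook roles (init, snap, close, sleep, per-CPU capture) come from the description above.

```python
class NmonHooks:
    """Toy stand-in for the sarmon_* hooks: records calls instead of writing a file."""

    def __init__(self):
        self.calls = []

    def init(self):
        self.calls.append("init")

    def cpu(self, cpu_id):
        self.calls.append(f"cpu{cpu_id}")

    def snap(self):
        self.calls.append("snap")

    def sleep(self, seconds):
        # A real collector would wait here; the toy model only records the call.
        self.calls.append("sleep")

    def close(self):
        self.calls.append("close")


def collect(hooks, interval, count, ncpus):
    """Hypothetical sampling loop: the host program stays small and only calls hooks."""
    hooks.init()
    for i in range(count):
        for cpu_id in range(ncpus):
            hooks.cpu(cpu_id)
        hooks.snap()
        if i < count - 1:
            hooks.sleep(interval)
    hooks.close()


hooks = NmonHooks()
collect(hooks, interval=5, count=2, ncpus=1)
print(hooks.calls)
```

The point of the pattern is that the host loop never needs to know how the nmon output is produced, which keeps changes to the original program small.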
Additionally, the prstat project code has been used without any change to log statistics per process and for accounting per zone or project. At the end of prstat.c, some code has been added to output statistics in nmon format.
Part of the iostat code has also been used to map mount points and NFS names to raw block device names.
"Linux" OS is recognized by the analyzer via the "AAA,Linux" line inside the nmon file.
Project Ground Rules
The project follows these rules for its design and implementation:
- Minimal change to the original SAR project code. Only a few hooks shall be added, so that nmon features are processed outside the original code. One key reason is that any change to the sar project can then be merged in minutes
- sarmon is an extension to sar, so any command parameter, feature and output shall remain unchanged
- sadc output raw file format shall not be changed. This means any data structure required for extending sar (e.g. to monitor each CPU) shall be carried within sarmon code, and shall not be placed in raw sadc files
- sarmon can provide more monitoring features; their output shall be part of the nmon report
- sarmon nmon reporting shall be compatible with the nmon file format (which is not formally documented, though), so that tools such as "nmon analyzer" can process the file. Currently it has been tested with versions 33D and 33e
- sarmon does not need to run as root
Source Code
Original SAR source code has been downloaded from OpenSolaris, under "Common Development and Distribution License" license. Base code version is build 130. Original source code locations can be found at:
- http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/sa/
- http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/prstat/
- http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/stat/c...
I have attached the current release here. I will move to SourceForge as soon as the project matures a bit, so that external parties can contribute to this project.
SourceForge project at http://sourceforge.net/projects/sarmon/
Download SARMON
Source Code | Download from SourceForge |
Binaries (i386 and SPARC) | |
Sample Excel Output |
Fields
Worksheet | Column | Description |
---|---|---|
CPU_ALL CPUnn | User% Sys% Wait% Idle% | Average CPU time % |
CPU_ALL CPUnn | CPU% | User% + Sys% |
CPU_ALL CPUnn | CPUs | Number of CPUs |
MEM | memtotal | (in MB) total usable physical memory |
MEM | swaptotal | = swapfree + swapused |
MEM | memfree | (in MB) free physical memory. On Solaris, the file system cache (FSCache) is located inside this area. Same as 'vmstat.memory.free' value in MB |
MEM | swapfree | (in MB) free swap space. Same as 'swap -s.available'. Same as 'sar -r.freeswap / 2' (/2 since unit is block size). Close to 'vmstat.memory.swap' value in MB (which does not include reserved space) |
MEM | swapused | (in MB) used swap (reserved + allocated). Same as 'swap -s.allocated' |
MEMNEW | Not Used | - |
MEMUSE | %rcache %wcache | Cache hit ratio. Same as 'sar -b' |
MEMUSE | lread lwrite | (/s) accesses of system buffers. Same as 'sar -b' |
MEMUSE | pread pwrite | (/s) transfers using raw (physical) device mechanism. Same as 'sar -b' |
MEMUSE | %comp | Ignore (negative value) |
MEMUSE | bread bwrite | (/s) transfer of data between system buffers and disk or other block device. Same as 'sar -b' |
VM | minfaults | (pages/s) minor faults (hat and as minor faults). Same as 'vmstat.mf' |
VM | majfaults | (pages/s) major faults |
VM | pgin pgout | (pages/s) pageins and outs |
VM | scans | (pages/s) pages examined by pageout daemon. Same as 'vmstat.sr' |
VM | reclaims | (pages/s) pages freed by daemon or auto. Same as 'vmstat.re' |
VM | pgpgin pgpgout | (KB/s) pages paged in and out. Same as 'vmstat.pi and po' |
VM | pswpin pswpout | (KB/s) pages swapped in and out. Same as 'vmstat.si and so' |
VM | pgfree | (KB/s) pages freed by daemon or auto. Same as 'vmstat.fr' |
DISKREAD IOSTATREAD | device name | (KB/s) read from block device (disk or other [nfs, partition, iopath, tape]). Same as 'iostat.kr/s'. For iostat, a disk is referred to as a device. |
DISKWRITE IOSTATWRITE | device name | (KB/s) written to block device (disk or other [nfs, partition, iopath, tape]). Same as 'iostat.kw/s'. For iostat, a disk is referred to as a device. |
DISKXFER IOSTATXFER | device name | (ops/s) read + write (disk or other [nfs, partition, iopath, tape]). Same as 'iostat.r/s+w/s'. For iostat, a disk is referred to as a device. Same as 'sar -d.r/s+w/s' |
DISKBSIZE IOSTATBSIZE | device name | (KB/xfer) average data size per block device transfer (disk or other [nfs, partition, iopath, tape]). Same as 'iostat.(kr/s+kw/s)/(r/s+w/s)'. For iostat, a disk is referred to as a device. |
DISKBUSY IOSTATBUSY | device name | (%) percent of time the block device is busy (transactions in progress) (disk or other [nfs, partition, iopath, tape]). Same as 'iostat.%b'. For iostat, a disk is referred to as a device. Same as 'sar -d.%busy' |
DISKSVCTM IOSTATSVCTM | device name | (ms) average service time. Same as 'sar -d.avserv'. Same as 'iostat -xn.asvc_t'. For iostat, a disk is referred to as a device. |
DISKWAITTM IOSTATWAITTM | device name | (ms) average wait time. Same as 'sar -d.avwait'. Same as 'iostat -xn.wsvc_t'. For iostat, a disk is referred to as a device. |
DISK_SUM | Disk Read KB/sec | (KB/s) total of all disk reads |
DISK_SUM | Disk Write KB/sec | (KB/s) total of all disk writes |
DISK_SUM | IO/sec | (ops/s) total of all disk transfers |
NET | if-read | (KB/s) KB read on this interface |
NET | if-write | (KB/s) KB written to this interface |
NET | if-total | (KB/s) KB read + written for this interface |
NET | total-read | (KB/s) KB read for all interfaces |
NET | total-write | (KB/s) KB written for all interfaces |
NETPACKET | if-reads/s | (packets/s) packets read on this interface |
NETPACKET | if-writes/s | (packets/s) packets written to this interface |
NETERROR | if-ierrs | (packets/s) incoming packets with error |
NETERROR | if-oerrs | (packets/s) outgoing packets with error |
NETERROR | if-collisions | (col/s) collisions per second |
FILE | iget | (/s) translations of i-node numbers to pointers to the i-node structure of a file or device. Calls to iget occur when a call to namei has failed to find a pointer in the i-node cache. This figure should therefore be as close to 0 as possible. Same as 'sar -a.iget/s' |
FILE | namei | (/s) calls to the directory search routine that finds the address of a v-node given a path name. Same as 'sar -a.lookuppn/s' |
FILE | dirblk | (/s) number of 512-byte blocks read by the directory search routine to locate a directory entry for a specific file. Same as 'sar -a.dirblk/s' |
FILE | readch | (bytes/s) characters transferred by read system call. Same as 'sar -c.rchar/s' |
FILE | writech | (bytes/s) characters transferred by write system call. Same as 'sar -c.wchar/s' |
FILE | ttyrawch | (bytes/s) tty input queue characters. Same as 'sar -y.rawch/s' |
FILE | ttycanch | (bytes/s) tty canonical input queue characters. Same as 'sar -y.canch/s' |
FILE | ttyoutch | (bytes/s) tty output queue characters. Same as 'sar -y.outch/s' |
PROC | RunQueue | the average number of kernel threads in the run queue. This is reported as RunQueue on the nmon Kernel Internal Statistics panel. A value that exceeds 3x the number of CPUs may indicate CPU constraint. Same as 'sar -q.runq-sz' |
PROC | Swap-in | the average number of kernel threads waiting to be paged in. Same as 'sar -q.swpq-sz' |
PROC | pswitch | (/s) the number of context switches. Same as 'sar -w.pswch/s' |
PROC | syscall | (/s) the total number of system calls. Same as 'sar -c.scall/s' |
PROC | read | (/s) the number of read system calls. Same as 'sar -c.sread/s' |
PROC | write | (/s) the number of write system calls. Same as 'sar -c.swrit/s' |
PROC | fork | (/s) the number of fork system calls. Same as 'sar -c.fork/s' |
PROC | exec | (/s) the number of exec system calls. Same as 'sar -c.exec/s' |
PROC | sem | (/s) the number of IPC semaphore primitives (creating, using and destroying). Same as 'sar -m.sema/s' |
PROC | msg | (/s) the number of IPC message primitives (sending and receiving). Same as 'sar -m.msg/s' |
PROCSOL | USR | (%) the percentage of time all processes have spent in user mode (an estimate, since terminated processes are not accounted for) |
PROCSOL | SYS | (%) the percentage of time all processes have spent in system mode (an estimate, since terminated processes are not accounted for) |
PROCSOL | TRP | (%) the percentage of time all processes have spent processing system traps (an estimate, since terminated processes are not accounted for) |
PROCSOL | TFL | (%) the percentage of time all processes have spent processing text page faults (an estimate, since terminated processes are not accounted for) |
PROCSOL | DFL | (%) the percentage of time all processes have spent processing data page faults (an estimate, since terminated processes are not accounted for) |
PROCSOL | LAT | (%) the percentage of time all processes have spent waiting for CPU (an estimate, since terminated processes are not accounted for) |
WLMPROJECTCPU WLMZONECPU WLMTASKCPU WLMUSERCPU | project name zone name task id username | CPU% for this project or zone or task. This value is approximate, since processes that terminated during the previous interval cannot be accounted for. Same as 'prstat -J.CPU' (or -Z or -T or -a) |
WLMPROJECTMEM WLMZONEMEM WLMTASKMEM WLMUSERMEM | project name zone name task id username | MEM% for this project or zone or task. Close to 'prstat -J.MEMORY' (or -Z or -T or -a). Since getvmusage (used in prstat to accurately calculate memory usage) is a Solaris 10 private API, it is not used; the memory usage is the sum of the memory of all processes |
TOP | PID | process id. Only processes with %CPU >= .1% are listed |
TOP | %CPU | (%) average amount of CPU used by this process. Same as 'prstat.CPU' |
TOP | %Usr | (%) average amount of user-mode CPU used by this process. Equal to 'prstat.CPU * prstat -v.USR / (prstat -v.USR + prstat -v.SYS)' |
TOP | %Sys | (%) average amount of kernel-mode CPU used by this process. Equal to 'prstat.CPU * prstat -v.SYS / (prstat -v.USR + prstat -v.SYS)' |
TOP | Threads | number of LWPs of this process. Same as 'prstat.NLWP' |
TOP | Size | (KB) total virtual memory size of this process. Same as 'prstat.SIZE' |
TOP | ResSize | (KB) resident set size of the process. Same as 'prstat.RSS' |
TOP | ResData | =0 |
TOP | CharIO | (bytes/s) count of bytes/sec being passed via the read and write system calls |
TOP | %RAM | (%) = 100 * ResSize / total physical memory |
TOP | Paging | (/s) sum of all page faults for this process |
TOP | Command | name of the process. Same as 'prstat.PROCESS' |
TOP | Username | the real user (login) name or real user ID. Same as 'prstat.USERNAME' |
TOP | Project | project name |
TOP | Zone | zone name |
TOP | USR | (%) of time the process has spent in user mode. Same as 'prstat -v.USR' |
TOP | SYS | (%) of time the process has spent in system mode. Same as 'prstat -v.SYS' |
TOP | TRP | (%) of time the process has spent processing system traps. Same as 'prstat -v.TRP' |
TOP | TFL | (%) of time the process has spent processing text page faults. Same as 'prstat -v.TFL' |
TOP | DFL | (%) of time the process has spent processing data page faults. Same as 'prstat -v.DFL' |
TOP | LCK | (%) of time the process has spent waiting for user locks. Same as 'prstat -v.LCK' |
TOP | SLP | (%) of time the process has spent sleeping. Same as 'prstat -v.SLP' |
TOP | LAT | (%) of time the process has spent waiting for CPU. Same as 'prstat -v.LAT' |
JFSFILE | mount point | (%) of used disk space. Same as 'df.capacity' |
JFSINODE | mount point | (%) of used inode space. Same as 'df -o i.%iused' |
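Several columns in the table are derived from other values by simple arithmetic. The Python sketch below restates those formulas. The function names are hypothetical and the zero-division guards are an assumption, since the table does not say how sarmon handles idle devices or processes.

```python
def cpu_busy(user_pct, sys_pct):
    """CPU% column: User% + Sys%."""
    return user_pct + sys_pct


def disk_bsize(kr_s, kw_s, r_s, w_s):
    """DISKBSIZE (KB/xfer): (kr/s + kw/s) / (r/s + w/s).

    Assumption: report 0.0 when the device performed no transfers.
    """
    ops = r_s + w_s
    return (kr_s + kw_s) / ops if ops else 0.0


def split_cpu(cpu_pct, usr, sys_t):
    """TOP %Usr and %Sys: share prstat.CPU in proportion to prstat -v USR and SYS.

    Assumption: return (0.0, 0.0) when both USR and SYS are zero.
    """
    total = usr + sys_t
    if total == 0:
        return (0.0, 0.0)
    return (cpu_pct * usr / total, cpu_pct * sys_t / total)


def ram_pct(res_size_kb, total_mem_kb):
    """TOP %RAM: 100 * ResSize / total physical memory (same unit for both)."""
    return 100.0 * res_size_kb / total_mem_kb


# Made-up sample values, for illustration only.
print(cpu_busy(12.5, 7.5))           # 20.0
print(disk_bsize(300, 100, 30, 20))  # 8.0
print(split_cpu(10.0, 30.0, 10.0))   # (7.5, 2.5)
print(ram_pct(512, 2048))            # 25.0
```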
Environment Variables
Since sarmon follows the sadc syntax, there is no room to alter sarmon behavior from the command line. Environment variables are the mechanism chosen instead.
Name | Description |
NMONRRDDIR | If set, sarmon will generate RRD graphs (see below) |
NMONDEBUG | If set, sarmon will output debug information on the console |
NMONDEVICEINCLUDE NMONDEVICEEXCLUDE | Use either one to reduce the number of devices shown in DISK* or IOSTAT* graphs. INCLUDE will only include the devices specified, while EXCLUDE will include all devices except those specified. The device name is the one shown in the sar report. Use a blank (space) as delimiter. For example: export NMONDEVICEINCLUDE="sd0 sd0,a sd0,h nfs1" |
TIMESTAMP NMON_START NMON_SNAP NMON_END NMON_ONE_IN | Allows external data collectors. Please read nmon wiki for more information |
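The include/exclude behavior can be sketched as a small filter. This Python is an illustration, not sarmon's implementation. The function name `select_devices` is hypothetical, names are assumed to match exactly, and giving INCLUDE precedence when both variables are set is an assumption (the table only says to use either one).

```python
import os


def select_devices(devices, include=None, exclude=None):
    """Filter device names the way the two variables are described.

    include / exclude are blank-delimited strings, as in the example
    NMONDEVICEINCLUDE="sd0 sd0,a sd0,h nfs1".
    """
    if include:
        wanted = set(include.split())
        return [d for d in devices if d in wanted]
    if exclude:
        unwanted = set(exclude.split())
        return [d for d in devices if d not in unwanted]
    return list(devices)


# Hypothetical device list, for illustration only.
devices = ["sd0", "sd0,a", "sd0,h", "sd1", "nfs1"]
print(select_devices(devices,
                     os.environ.get("NMONDEVICEINCLUDE"),
                     os.environ.get("NMONDEVICEEXCLUDE")))
```

With neither variable set, the full device list is returned unchanged.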
RRD Support
Since v1.02, sarmon supports RRD output (tested with v1.2.19). To enable this feature, set the environment variable NMONRRDDIR to an existing directory before starting sarmon. For example:
export NMONRRDDIR=/var/adm/sa/sa12rrd
Sarmon will then output 5 files in append mode, so if the files already exist, new lines are added at the end:
- genall: script which executes the 3 rrd_ scripts (create, update and graph). Execute this script to generate the graphs
- rrd_create: to create the RRD databases
- rrd_update: to insert new values to the databases
- rrd_graph: to generate graphs
- index.html: load with your browser to view graphs
RRD files can be processed in real time with the FIFO file approach, for example:
mkfifo /var/adm/sa/sa12rrd/rrd_update
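A consumer for such a FIFO might look like the Python sketch below. It assumes sarmon writes line-oriented text to rrd_update. The function name `follow_fifo` and the handler are hypothetical, and what to do with each line (for example, passing it to a shell) is left to the reader.

```python
import os


def follow_fifo(path, handle):
    """Create the FIFO if needed, then pass each line written to it to handle().

    open() blocks until a writer opens the FIFO; the loop ends when the
    writer closes its end.
    """
    if not os.path.exists(path):
        os.mkfifo(path)
    with open(path, "r") as fifo:
        for line in fifo:
            handle(line.rstrip("\n"))


# Example use (blocks until a writer such as sarmon opens the FIFO):
# follow_fifo("/var/adm/sa/sa12rrd/rrd_update", print)
```

The FIFO has to exist before sarmon starts, so that sarmon appends to the pipe rather than creating a regular file.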
Test SARMON
To test, just download the binaries and place sadc in any location. Then run the command './sadc 5 4 tst1', which takes 20 seconds to run (4 snapshots, 5 seconds apart). This outputs 2 files, tst1 and tst1.xxx.nmon. You can then process the nmon file with the nmon analyzer Excel macro.
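Before opening Excel, a quick sanity check of the generated file can help. The Python sketch below assumes the usual nmon layout, where every line is comma separated, the first field is the section name, and each snapshot is announced by a `ZZZZ,Tnnnn,...` line. That layout is an assumption here, not something taken from the sarmon source. The function name `summarize_nmon` and the sample lines are hypothetical.

```python
def summarize_nmon(lines):
    """Count snapshots and list section names found in nmon-style lines."""
    snapshots = 0
    sections = set()
    linux_marker = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        tag = line.split(",", 1)[0]
        sections.add(tag)
        if tag == "ZZZZ":  # assumed snapshot marker
            snapshots += 1
        if line.startswith("AAA,Linux"):  # marker mentioned under Architecture
            linux_marker = True
    return {
        "snapshots": snapshots,
        "sections": sorted(sections),
        "linux_marker": linux_marker,
    }


# Made-up sample content, for illustration only.
sample = [
    "AAA,Linux",
    "ZZZZ,T0001,00:00:05,17-FEB-2010",
    "CPU_ALL,T0001,1.0,2.0,0.0,97.0",
    "ZZZZ,T0002,00:00:10,17-FEB-2010",
    "CPU_ALL,T0002,3.0,1.0,0.0,96.0",
]
print(summarize_nmon(sample))
```

For the './sadc 5 4 tst1' run, one would expect the snapshot count to be 4 if the assumed layout holds. Pass the open file object of tst1.xxx.nmon to the function to check.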
How to Install SARMON
Once sarmon has been tested successfully, there are two ways to install SARMON:
- Place the entire content of the bin/ directory at any location, for example under a standard UNIX user home directory or /usr/local/sarmon, and modify the sa1 script with the correct path. Then set up crontab for that user to run sa1 daily. Refer to /usr/cmd/sa/README or the UNIX manual of sar for instructions. For example, to run sarmon daily with snapshots every 5 minutes (300 seconds, 288 times), add the following entry to the crontab of that standard UNIX user (avoid using root)
0 0 * * * /usr/local/sarmon/sa1 300 288 &
- Replace /usr/lib/sa/sadc, /usr/bin/sar and timex by the ones inside the bin/ directory. Make sure you take a backup of the original executables!
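The two numbers passed to sadc and sa1 are the interval in seconds and the number of snapshots, so the length of a capture is their product. A small Python check (the function name `capture_window` is hypothetical) confirms that the test run lasts 20 seconds and that the crontab entry covers exactly one day:

```python
from datetime import timedelta


def capture_window(interval_s, count):
    """Total capture time for `count` snapshots taken `interval_s` seconds apart."""
    return timedelta(seconds=interval_s * count)


print(capture_window(5, 4))      # 0:00:20
print(capture_window(300, 288))  # 1 day, 0:00:00
```

When choosing another interval, keep interval times count equal to 86400 so that each daily cron run ends as the next one starts.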
How to compile SARMON
SARMON is currently being developed and tested with GCC. Makefile.master has been updated at 2 locations accordingly, search for keyword 'SARMON' to locate the changes.
- Install gcc if not present. Binary can be downloaded from http://sunfreeware.com/. Code has been tested with gcc v3.4.6 (i386) and v3.4.3 (sparc). According to ON documentation, one needs to build a higher version, which makes the task hard. Hence step 5 is required to support an old gcc version
- Install ON build tools SUNWonbld-DATE.PLATFORM.tar.bz2. Binary can be downloaded from http://hub.opensolaris.org/bin/view/downloads/on
- Place source code, for example /a/b/sa
- Set up the environment variables as below (ksh syntax)
export PATH=/usr/bin:/usr/openwin/bin:/usr/ucb:/usr/ccs/bin
export MACH=`uname -p`
export CLOSED_IS_PRESENT=no
export CW_NO_SHADOW=Y
export SRC=/a/b/sa/src/usr
- If building for SPARC (to support old gcc 3.4.3)
export CW_GCC_DIR=/a/b/sa/sparcgcc
- Due to some incorrect inclusion, modify file /usr/include/sys/scsi/adapters/scsi_vhci.h and comment out lines that include mpapi_impl.h and mpapi_scsi_vhci.h
- Go to the correct directory
cd /a/b/sa/src/usr/cmd/sa
- Build the code
make
Report Issues or Request Enhancements
Just click on the "Contact" link inside the top left box. If you hit an issue, I will be glad to track down what went wrong and get sarmon fixed ASAP.
Up-Coming Enhancements
RRD Support:
- TOP graph
- NET: summary of all interfaces (r & w kb/s)
- System summary: CPU %busy and summary disk IO /s
- IO Summary: summary disk IO r+w kb/s and summary disk IO /s
Version History
Version | Date | Notes |
---|---|---|
0.01 | 21-nov-2009 | Initial release. |
0.02 | 29-nov-2009 | Added memory related graphs. nmon filename changed. More output on BBBP tab. |
0.03 | 11-dec-2009 | Added memory related graphs. Added disk related graphs. Added network related graphs. |
0.04 | 20-dec-2009 | Added VM graphs. Added disk related graphs for TAPE (shows only if tape is available). Added SRM related graphs. |
1.00 | 31-dec-2009 | Support for SPARC (gcc 3.4.3). List all links inside /dev/dsk, /dev/vx/dsk, /dev/md/dsk. Align source code on ON build 130, which includes removing sag (bug 6905472). For device name, use kstat name instead of module name. An interface is found in the kstat when type is net, name is not mac, and has 3 properties defined: ifspeed, rbytes (or rbytes64) and obytes (or obytes64). Fix: Interface i/o errors output correctly. Show mount points and nfs path. |
1.01 | 07-jan-2010 | SAR in version 130 changes iodev time internal from kios.wlastupdate to be ks.ks_snaptime. Need to apply the same change on sarmon. Fix: AAA,date value was incorrect. Support nmon environment variables (debug, call external scripts, etc): NMONDEBUG, TIMESTAMP, NMON_START, NMON_SNAP, NMON_END, NMON_ONE_IN. nmon consolidator is working fine now. Tested with nmon analyzer v. 33e. Add project list (projects -l) to BBBP sheet. Added to TOP stats: CharIO, Faults, Project and Zone. Added JFS related graphs. |
1.02 | 08-feb-2010 | Added devices related graphs. Code cleanup, removed string length limitations, minor optimizations. Fix: CPUnn T0001 was missing. Validated memory use with Solaris Memory Debuggers (watchmalloc.so.1 and libumem.so.1). Fix: DISK / IOSTAT BSIZE and BUSY were still using old sa time range. Moved to v130 (c.f. v1.01). Disk descriptions have been fixed so that disk summary graph title appears correctly on Excel. No code hard limit in CPU, IODEV, Network Interface, Projects and Zones. Added process stats. Support of RRD. This required a full rewrite of the output mechanism. Sleep time is exact so that there is no time drift. |
1.03 | 12-may-2010 | Fix: negative and NaN values were improperly nullified. Fix: MEM.memtotal showed 0 for large values. Enhancement: mount points are also shown on DISK stats (not limited to partitions, i.e. for devices such as mdnnn). Removed limitation of 99 CPUs. Increase MAX_VARIABLES to 255 (number of columns in Excel - 1 for date time column). Some users faced "exceeded number of variables per line" with a high number of disks attached. Added task related graphs. |
1.04 | TBD | Remove completely limit of columns (MAX_VARIABLES). Ability to select only a subset of devices to be part of nmon report via environment variables NMONDEVICEINCLUDE and NMONDEVICEEXCLUDE. |