More detailed thresholds to monitor on UNIX systems


cjysdw2000




This is an example based on Sun Solaris data; the output may differ slightly on other platforms.

FYI:


The iostat output contains summary information for all devices.

Field	Description
r/s	Shows the number of reads/second
w/s	Shows the number of writes/second
kr/s	Shows the number of kilobytes read/second
kw/s	Shows the number of kilobytes written/second
wait	Average number of transactions waiting for service (queue length)
actv	Average number of transactions actively being serviced
wsvc_t	Average service time in wait queue, in milliseconds
asvc_t	Average service time of active transactions, in milliseconds
%w	Percent of time there are transactions waiting for service
%b	Percent of time the disk is busy
device	Device name

What to look for

Average service times (asvc_t) greater than 20 ms over a sustained period.

High average wait times (wsvc_t).
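The two checks above can be automated along these lines. This is a minimal Python sketch, assuming the column order of Solaris `iostat -xn` output shown in the table (r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device). The 20 ms limit comes from the text; `WSVC_T_LIMIT_MS` is a hypothetical value, since no number is given for "high" wait times, and "long duration" is interpreted as "every sample in the capture". The sample output is made up for illustration.

```python
from collections import defaultdict

ASVC_T_LIMIT_MS = 20.0  # from the guideline: average service time > 20 ms
WSVC_T_LIMIT_MS = 10.0  # hypothetical: the text gives no number for "high"

HEADER = ["r/s", "w/s", "kr/s", "kw/s", "wait", "actv",
          "wsvc_t", "asvc_t", "%w", "%b", "device"]


def parse_iostat(text):
    """Yield one dict per device row of iostat -xn style output."""
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, banner lines and repeated column headers.
        if len(fields) != len(HEADER) or fields[0] == "r/s":
            continue
        row = dict(zip(HEADER, fields))
        for key in HEADER[:-1]:
            row[key] = float(row[key])
        yield row


def flag_slow_devices(text):
    """Return devices whose asvc_t or wsvc_t exceeded its limit in every sample."""
    samples = defaultdict(list)
    for row in parse_iostat(text):
        samples[row["device"]].append(row)
    flagged = {}
    for device, rows in samples.items():
        reasons = []
        if all(r["asvc_t"] > ASVC_T_LIMIT_MS for r in rows):
            reasons.append("asvc_t")
        if all(r["wsvc_t"] > WSVC_T_LIMIT_MS for r in rows):
            reasons.append("wsvc_t")
        if reasons:
            flagged[device] = reasons
    return flagged


# Made-up output: two intervals for two disks.
sample = """
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    1.0   50.0    8.0  400.0  2.0  1.5   15.0   30.0  10  90 c1t0d0
    0.5    1.0    4.0    8.0  0.0  0.0    0.0    2.0   0   1 c1t1d0
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    2.0   60.0   16.0  480.0  3.0  1.8   12.0   25.0  12  95 c1t0d0
    0.4    1.2    3.2    9.6  0.0  0.0    0.0    1.5   0   1 c1t1d0
"""
print(flag_slow_devices(sample))  # {'c1t0d0': ['asvc_t', 'wsvc_t']}
```

Requiring every sample to exceed the limit is one simple reading of "for long duration"; a sliding window or a percentage of samples would be equally reasonable.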

mpstat Field Descriptions

Field	Description
cpu	Processor ID
minf	Minor faults
mjf	Major faults
xcal	Processor cross-calls (when one CPU wakes up another by interrupting it).
intr	Interrupts
ithr	Interrupts as threads (except clock)
csw	Context switches
icsw	Involuntary context switches
migr	Thread migrations to another processor
smtx	Number of times a CPU failed to obtain a mutex
srw	Number of times a CPU failed to obtain a read/write lock on the first try
syscl	Number of system calls
usr	Percentage of CPU cycles spent on user processes
sys	Percentage of CPU cycles spent on system processes
wt	Percentage of CPU cycles spent waiting on an event
idl	Percentage of unused CPU cycles or idle time when the CPU is basically doing nothing

What to look for

icsw (involuntary context switches): this is probably the more relevant context-switch statistic when examining performance issues.

smtx: the number of times a CPU failed to obtain a mutex. Values consistently greater than 200 per CPU cause system time to increase.

xcal is very important: it shows processor cross-calls (thread migrations are reported separately in migr).
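A minimal sketch of the smtx check in Python, assuming the 16-column layout Solaris mpstat prints (CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl). "Consistently" is interpreted here as "in every interval captured"; that interpretation, the helper names, and the sample numbers are assumptions for illustration.

```python
SMTX_LIMIT = 200  # per CPU, from the guideline above

COLUMNS = ["cpu", "minf", "mjf", "xcal", "intr", "ithr", "csw", "icsw",
           "migr", "smtx", "srw", "syscl", "usr", "sys", "wt", "idl"]


def parse_mpstat(text):
    """Yield one dict of integers per CPU row of mpstat output."""
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 16 numeric columns; header rows start with "CPU".
        if len(fields) == len(COLUMNS) and fields[0].isdigit():
            yield dict(zip(COLUMNS, map(int, fields)))


def cpus_with_mutex_contention(text):
    """Return IDs of CPUs whose smtx exceeded SMTX_LIMIT in every interval."""
    per_cpu = {}
    for row in parse_mpstat(text):
        per_cpu.setdefault(row["cpu"], []).append(row["smtx"])
    return sorted(cpu for cpu, values in per_cpu.items()
                  if all(v > SMTX_LIMIT for v in values))


# Made-up output: two intervals on a 2-CPU machine.
sample = """
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0   10   0  120   300  100  400   40   20  250    0  1500   40  35   0  25
  1    8   0   90   200   80  350   30   15   50    0  1200   45  20   0  35
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0   12   0  130   310  110  420   45   22  260    0  1550   42  36   0  22
  1    9   0   95   210   85  360   32   16   60    0  1250   44  21   0  35
"""
print(cpus_with_mutex_contention(sample))  # [0]
```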

Section 1: netstat -ain

Field	Description
Name	Device name of interface
Mtu	Maximum transmission unit
Net	Network segment address
Address	Network address of the device
Ipkts	Input packets
Ierrs	Input errors
Opkts	Output packets
Oerrs	Output errors
Collis	Collisions
Queue	Number in the queue

The information in Section 1 will help diagnose network problems when there is connectivity but response is slow.

Values to look at:

Collisions (Collis)

Output packets (Opkts)

Input errors (Ierrs)

Input packets (Ipkts)

Network collision rate = Collisions (Collis) / Output packets (Opkts)

On a switched network, collisions should be 0.1 percent or less of output packets (see the Cisco web site as a reference).
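The collision-rate formula and the 0.1 percent guideline can be expressed as a small function. This sketch assumes you have already read the Opkts and Collis counters for each interface from `netstat -ain`; the interface names and counter values are made up.

```python
COLLISION_LIMIT = 0.001  # 0.1 percent of output packets, for a switched network


def collision_rate(collis, opkts):
    """Network collision rate = Collis / Opkts (0.0 when nothing was sent)."""
    return collis / opkts if opkts else 0.0


def interfaces_over_limit(counters):
    """counters maps interface name -> (opkts, collis) read from netstat -ain."""
    return {name: collision_rate(collis, opkts)
            for name, (opkts, collis) in counters.items()
            if collision_rate(collis, opkts) > COLLISION_LIMIT}


# Made-up counters for two hypothetical interfaces.
counters = {"hme0": (1_000_000, 5_000), "qfe0": (2_000_000, 100)}
print(interfaces_over_limit(counters))  # {'hme0': 0.005}
```

Because netstat counters are cumulative, comparing the difference between two readings gives the rate for a specific interval rather than since the counters were last reset.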

vmstat output is actually broken up into six sections: procs, memory, page, disk, faults and CPU. Each section is outlined in the following table.

Field	Description
PROCS
r	Number of processes in the run queue, waiting for a CPU
b	Number of processes blocked waiting for resources such as I/O or paging
w	Number of processes that have been swapped out by mm and vm subsystems and have yet to run
MEMORY
swap	The amount of swap space currently available
free	The size of the free list
PAGE
re	page reclaims
mf	minor faults
pi	kilobytes paged in
po	kilobytes paged out
fr	kilobytes freed
de	anticipated short-term memory shortfall (Kbytes)
sr	pages scanned by clock algorithm
DISK
Bi	Disk blocks sent to disk devices in blocks per second
FAULTS
in	Interrupts per second, including the CPU clocks
sy	System calls per second
cs	Context switches per second within the kernel
CPU
us	Percentage of CPU cycles spent on user processes
sy	Percentage of CPU cycles spent on system processes
id	Percentage of unused CPU cycles or idle time when the CPU is basically doing nothing

What to look for

The following information should be used as a guideline, not as hard and fast rules. It comes from Adrian Cockcroft's book, Sun Performance Tuning. Other operating systems, such as HP-UX and Linux, may have different thresholds.

Large run queue. Adrian Cockcroft defines anything over 4 processes per CPU on the run queue as the threshold for CPU saturation. This is certainly a problem if it lasts for any long period of time.

CPU utilization. The amount of time spent running system code should not exceed 30%, especially if idle time is close to 0%.

A combination of large run queue with no idle CPU is an indication the system has insufficient CPU capacity.
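The three CPU rules above can be combined into one check per vmstat sample. This is a sketch under stated assumptions: the CPU count is supplied by the caller, "idle close to 0%" is taken to mean 5% or less (a hypothetical cut-off), and the "long period of time" condition is left to the caller, who would apply the function to a series of samples.

```python
RUNQ_PER_CPU_LIMIT = 4   # Cockcroft: more than 4 runnable processes per CPU
SYS_PCT_LIMIT = 30       # system (sy) time should not exceed 30%
IDLE_NEAR_ZERO_PCT = 5   # hypothetical cut-off for "idle close to 0%"


def cpu_findings(run_queue, ncpu, sys_pct, idle_pct):
    """Apply the vmstat CPU guidelines to one sample (r, sy and id columns)."""
    findings = []
    saturated = run_queue / ncpu > RUNQ_PER_CPU_LIMIT
    no_idle = idle_pct <= IDLE_NEAR_ZERO_PCT
    if saturated:
        findings.append("run queue above 4 per CPU")
    if sys_pct > SYS_PCT_LIMIT:
        findings.append("system time above 30%"
                        + (" with idle near 0%" if no_idle else ""))
    if saturated and no_idle:
        findings.append("insufficient CPU capacity")
    return findings


# Made-up sample: 20 runnable processes on a 4-CPU machine, 35% sys, 0% idle.
print(cpu_findings(run_queue=20, ncpu=4, sys_pct=35, idle_pct=0))
```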

Memory bottlenecks are determined by the scan rate (sr), the number of pages scanned by the clock algorithm per second. If the scan rate is continuously over 200 pages per second, there is a memory shortage.
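A sketch of the scan-rate rule, assuming Solaris-style vmstat output in which a column-header line names the `sr` column. The parser locates `sr` by name rather than by a fixed position, so it does not depend on how many disk columns are shown. "Continuously" is read as "in every sample"; the helper names and the sample output are invented for illustration.

```python
SR_LIMIT = 200  # pages scanned per second by the clock algorithm


def scan_rates(vmstat_text):
    """Extract the sr column from Solaris-style vmstat output."""
    rates, sr_index = [], None
    for line in vmstat_text.splitlines():
        fields = line.split()
        if "sr" in fields:  # column-header line: remember where sr is
            sr_index = fields.index("sr")
        elif sr_index is not None and fields and fields[0].isdigit():
            rates.append(int(fields[sr_index]))
    return rates


def memory_shortage(rates):
    """True when the scan rate stayed above SR_LIMIT in every sample."""
    return bool(rates) and all(sr > SR_LIMIT for sr in rates)


# Made-up output with two samples.
sample = """
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de  sr s0 s1 s2 s3   in   sy   cs us sy id
 1 0 0 1024000 51200 0  10  0  0  0  0 350  1  0  0  0  400 1500  300 40 20 40
 2 0 0 1020000 48000 0  12  8  4  6  0 420  2  0  0  0  410 1600  320 42 22 36
"""
print(scan_rates(sample), memory_shortage(scan_rates(sample)))  # [350, 420] True
```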

Disk problems may be indicated when the number of blocked processes (b) exceeds the number of processes on the run queue (r).
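The blocked-versus-run-queue comparison can be sketched the same way, again assuming Solaris-style vmstat output with a header line naming the `r` and `b` columns. It reports which samples meet the condition rather than diagnosing a disk problem outright, since the text presents this only as a possible indicator. The helper names and sample values are made up.

```python
def run_and_blocked(vmstat_text):
    """Return (r, b) pairs, i.e. run queue and blocked counts, per vmstat sample."""
    pairs, idx = [], None
    for line in vmstat_text.splitlines():
        fields = line.split()
        if "r" in fields and "b" in fields:  # column-header line
            idx = (fields.index("r"), fields.index("b"))
        elif idx and fields and fields[0].isdigit():
            pairs.append((int(fields[idx[0]]), int(fields[idx[1]])))
    return pairs


def disk_suspect_samples(pairs):
    """Indices of samples where blocked processes outnumber the run queue."""
    return [i for i, (r, b) in enumerate(pairs) if b > r]


# Made-up output: the first sample has more blocked than runnable processes.
sample = """
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de  sr s0 s1 s2 s3   in   sy   cs us sy id
 1 4 0 1024000 51200 0  10  0  0  0  0   0 90  0  0  0  400 1500  300 10  5 85
 2 1 0 1020000 48000 0  12  0  0  0  0   0 20  0  0  0  410 1600  320 42 22 36
"""
print(disk_suspect_samples(run_and_blocked(sample)))  # [0]
```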

From "ITPUB Blog", link: http://blog.itpub.net/67/viewspace-1002812/. If you reprint this article, please credit the source; otherwise legal liability will be pursued.
