The meaning of the mpstat output columns

http://www.c0t0d0s0.org/~3/k3bizFzXMq0/6546-Meet-the-stats-today-mpstat.html
Meet the stats - today: mpstat

Monday, May 10. 2010

In this installment of the "Meet the Stats" series I want to talk with you about mpstat. In my opinion, mpstat is one of the most useful tools to find out what your processors are really doing.

Using mpstat

Let's execute mpstat on a system. I used my fileserver for this task on a Saturday morning; it's a system with four cores, so mpstat reports four lines per interval. Keep in mind that the first block of output shows the averages since boot, while the following blocks show just the last interval.
$ mpstat 1
CPU minf mjf xcal intr ithr  csw icsw migr smtx srw syscl usr sys  wt idl
  0    4   0   28  679  154  573    3   12    8   0   767   1   2   0  96
  1    4   0   22  504  145  485    2    9    4   0   661   1   2   0  97
  2    4   0   30  579   81  425    3   12    6   0   519   1   2   0  97
  3    5   0   25  505  250  517    3   12    5   0   758   1   3   0  96
CPU minf mjf xcal intr ithr  csw icsw migr smtx srw syscl usr sys  wt idl
  0    0   0    9  567  182  372    0    2    0   0   338   0   1   0  99
  1    0   0   21  454  174  468    1    1    2   0   468   1   1   0  98
  2    0   0   12  480   15  304    1    4    1   0   249   2   1   0  97
  3   27   0   15  157   68  147    1    2    0   0   422   0   1   0  99
jmoekamp@hivemind:~$


Internals

You need some knowledge about the inner workings of an operating system to really understand the output of this command, but a basic understanding is relatively easy to reach.
  • All those operations are normal occurrences in a Solaris system. You can't say "oh ... I have too many events of this kind" just by looking at these numbers, because the observed pattern is possibly normal for your load. So it's very reasonable to run mpstat from time to time during load just to get a baseline (see the sketch after this list). Debugging in the event of a major fsckup is much easier with such historical data, because otherwise you may chase a pattern that looks pathological but is just the way things go in your application.
  • Forget about the wt column. It's the "wait time" column, but it isn't computed anymore; it's simply set to zero. The reason for keeping this column is the binary compatibility guarantee: you can't leave it out, as one column less could break programs, and you can't fill it with a dash, as programs may expect a number there.
  • Unless stated otherwise, the numbers are events per second. Exceptions are the last four columns, which are percentages, and obviously the first one, which is the CPU id.
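
A minimal sketch of such a baseline run, assuming a plain Bourne shell and a dated logfile under /var/tmp (path, interval and sample count are my choice, not gospel):

    # one sample every 60 seconds for an hour, appended to a dated logfile
    mpstat 60 60 >> /var/tmp/mpstat.`date +%Y%m%d`.log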


Recommended reading

In the following description I sacrificed complete correctness for understandability, as I simplified some of the dependencies. To understand the full implications of all the numbers presented by all the *stat commands, you should start to gather some knowledge about the internals of Solaris. There is an excellent book about it: "Solaris Internals: Solaris 10 and OpenSolaris Kernel Architecture", written by Richard McDougall and Jim Mauro. The ISBN of this great book is 0-13-148209-2.

DTrace examples

The DTrace examples are from the standard cheat sheet I use at customer sites. However, they aren't mine; I gathered them from Prefetch.net's DTrace Cookbook.

The numbers and their meaning

usr, sys and idl have obvious meanings: they tell you the percentage of time the CPU spends in userland, in kernelland or idling. When you really want to know something about the load on your system, look at these values and forget about the load average.
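
If you then want to know which processes those usr and sys cycles belong to, a sketch with the DTrace profile provider can help (997 Hz is an arbitrary sampling rate I picked; in the profile probes arg0 is the kernel program counter, nonzero only while the CPU executes kernel code):

    # dtrace -n 'profile-997 {@[execname, arg0 ? "sys" : "usr"] = count();}'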

The other values are a little more difficult to explain. Basically you can divide columns 2 to 11 into four groups: the virtual memory part (columns 2-3), the interrupt part (columns 4-6), the scheduling part (columns 7-9) and the locks part (columns 10-11). Column 12, syscl, simply counts the system calls per second.

The virtual memory part
To understand the meaning of the two columns regarding the virtual memory subsystem you should have some knowledge about the concept of virtual memory. But I will try to give you some insight into this part without getting overly complex.

At first it's important to know that memory is organized in pages; those pages are chunks of memory. The possible page sizes are hardware dependent, but let's assume that we have pages with a size of 8 kilobytes.
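
You can check this on your own machine with the pagesize command (the values below are what you would typically see on an UltraSPARC system; an x86 box would report 4096 as the default):

    $ pagesize
    8192
    $ pagesize -a
    8192
    65536
    524288
    4194304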

As you may know, a modern operating system doesn't allocate physical memory when your application requests memory. Instead it allocates virtual memory.

When you access a memory page for the first time, a page fault occurs. This page fault leads to the mapping of a physical memory page to the virtual memory page.

The mapping is done by adding an entry to the Hash Page Table. And here minor and major page faults differ:
  • minf:
    When the memory subsystem doesn't find a mapping in the Hash Page Table, but the page with the requested content is already in physical memory (for example on the list of free pages), a minor fault occurs. The mapping is just inserted into the Hash Page Table and the system works with the data already in memory. You can measure which applications cause minor page faults with a short DTrace one-liner:
    # dtrace -n 'vminfo:::as_fault{@execs[execname]=count()}'
    dtrace: description 'vminfo:::as_fault' matched 1 probe
    ^C

    dtrace 104
    jmoekamp@hivemind:~#
  • majf:
    A major fault has much more severe consequences. It occurs when there is no mapping to a physical page in the Hash Page Table and the content of the page has been migrated out to swap space, so the page has to be read back from disk first. That obviously takes some time.
    # dtrace -n 'vminfo:::maj_fault{@execs[execname] = count() }'
    dtrace: description 'vminfo:::maj_fault' matched 1 probe
    ^C

Obviously major faults have a bigger impact on system performance than minor faults, as the latter don't need to access a rotating-rust device aka hard disk. However, even minor faults can have a significant impact on performance. But that's enough material for an article of its own, or for an evening with the book mentioned above.
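
If you want the cumulative numbers of minor and major faults since boot rather than per-second rates, vmstat -s prints them; a minimal sketch (output omitted, as it varies per system):

    $ vmstat -s | grep fault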

The interrupt part

  • xcal:
    xcals or cross calls are a special kind of interrupt. Whenever a processor needs another processor to do something for it, a so-called cross call is issued. There are several reasons to issue cross calls, such as updating certain tables on the other processors.
    # dtrace -n 'sysinfo:::xcalls{@execs[execname] = count();}'
    dtrace: description 'sysinfo:::xcalls' matched 1 probe
    ^C

    firefox-bin 6
    thunderbird-bin 6
    VBoxHeadless 9
    dtrace 24
    pageout 1660
    sched 1834
    jmoekamp@hivemind:~#
  • intr:
    An interrupt preempts the current work on the processor and forces it to execute the code needed to handle the interrupt, for example to process incoming network packets. To get some insight into the drivers generating interrupts, it's interesting to use the intrstat command:
    jmoekamp@hivemind:~# intrstat
          device |      cpu0 %tim       cpu1 %tim       cpu2 %tim       cpu3 %tim
    -------------+------------------------------------------------------------
    [...]
        e1000g#0 |         0  0,0          0  0,0          0  0,0          0  0,0
        e1000g#1 |         0  0,0          0  0,0          0  0,0          0  0,0
          ehci#0 |         0  0,0          0  0,0          0  0,0          0  0,0
          ehci#1 |         0  0,0          0  0,0          0  0,0          0  0,0
       hci1394#0 |         0  0,0          0  0,0        123  0,0          0  0,0
    [...]
       pci-ide#0 |         0  0,0          0  0,0        247  0,3          0  0,0
           rge#0 |         0  0,0          0  0,0          0  0,0         11  0,0
  • ithr:
    ithr or "interrupts as threads" refers to a special mechanism for handling interrupts: many interrupts are handled in threads that are triggered by the interrupt. This column counts the interrupts handled by such threads.

Interrupts are important for the operation of the system, but they interrupt (they are called "interrupts" for a reason) the application running on that processor. Thus a high number of interrupts can slow down an application significantly.

There are some tricks to reduce this interruption. For example, you can confine interrupt handling to a subset of all processors by declaring most of the CPUs as "non-interrupt".
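
On Solaris you do this with psradm; a minimal sketch (the CPU numbers are illustrative, and at least one CPU must keep handling interrupts):

    # stop CPUs 1 to 3 from handling interrupts, leaving them to CPU 0
    psradm -i 1 2 3
    # verify: the processors should now report "no-intr"
    psrinfo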

The scheduling part

  • csw:
    A context switch takes place when the currently running thread has nothing left to compute on the processor, for example because it waits for data from the disk. The thread gives the processor back to the scheduler and a different thread is scheduled onto the processor. As the new thread has, among other things, a totally different set of register contents, the OS has to switch from the context of the old thread to that of the new one; this is called a context switch. Obviously there is a performance penalty bound to this event, as the switching takes some time. When you want to know which processes cause the context switches, the sysinfo:::pswitch probe helps you:
    # dtrace -n 'sysinfo:::pswitch{@execs[execname] = count(); }'
    dtrace: description 'sysinfo:::pswitch' matched 3 probes
    ^C

    fmd 1
    [...]
    VBoxHeadless 2054
    sched 9657
    jmoekamp@hivemind:~#
  • icsw:
    Involuntary context switches are the forced variant of a context switch. Whenever a thread has consumed its time slice or a higher-priority thread is ready for execution, an involuntary context switch takes place: the thread is simply forced off the processor.
    # dtrace -n 'sysinfo::preempt:inv_swtch{@execs[execname] = count();}'
    dtrace: description 'sysinfo::preempt:inv_swtch' matched 1 probe
    ^C

    VBoxHeadless 1
    VBoxSVC 1
    gam_server 1
    thunderbird-bin 2
    firefox-bin 3
    gnome-netstatus- 3

    Obviously a large number of involuntary context switches should be avoided.
  • migr:
    A thread migration is counted when a thread is scheduled on a different processor than the one it ran on last time. This can have a big performance impact, as the caches of the new processor aren't warmed up for the thread, which leads to more cache misses and thus to more accesses to the slow main memory instead of the caches.
    # dtrace -n 'sched:::off-cpu { self->cpu = cpu; }
      sched:::on-cpu /self->cpu != cpu/
      {
          printf("%s migrated from cpu %d to cpu %d\n", execname, self->cpu, cpu);
          self->cpu = 0;
      }'
    dtrace: description 'sched:::off-cpu' matched 6 probes
    ^C
    CPU ID FUNCTION:NAME
    2 10067 resume:on-cpu firefox-bin migrated from cpu 0 to cpu 2
    2 10067 resume:on-cpu thunderbird-bin migrated from cpu 0 to cpu 2
    2 10067 resume:on-cpu nskernd migrated from cpu 0 to cpu 2
    2 10067 resume:on-cpu nskernd migrated from cpu 0 to cpu 2
    2 10067 resume:on-cpu VBoxHeadless migrated from cpu 0 to cpu 2
    2 10067 resume:on-cpu sched migrated from cpu 0 to cpu 2
    2 10067 resume:on-cpu gnome-netstatus- migrated from cpu 0 to cpu 2
    2 10067 resume:on-cpu gnome-netstatus- migrated from cpu 0 to cpu 2
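
When migrations really hurt a particular workload, you can nail a process to one processor with pbind, so its threads stop wandering; a sketch (PID and CPU number are illustrative):

    # bind process 4711 to processor 2
    pbind -b 2 4711
    # query the binding
    pbind -q 4711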

The locks part

  • smtx:
    smtx or "spins on mutexes" reports how often the code flow on the processor wasn't able to acquire a mutex lock right away. Mutex is shorthand for "mutual exclusion"; a mutex lock provides exclusive read and write access to the thread owning it.
    # dtrace -n 'lockstat:::adaptive-spin, lockstat:::adaptive-block
    > {
    >     @execs[execname, probename] = count();
    > }'
    dtrace: description 'lockstat:::adaptive-spin, lockstat:::adaptive-block' matched 2 probes
    ^C

    gnome-netstatus- adaptive-spin 2
    sched adaptive-block 5
    zpool-datapool adaptive-block 5
    VBoxHeadless adaptive-spin 21
    zpool-datapool adaptive-spin 78
    sched adaptive-spin 262
    #
    The number of spins is an interesting number because of the nature of spin locks. Imagine this lock like a lock at a lavatory door where the "Busy/Vacant" indicator is out of order. You have two ways to get into the lavatory: rattling at the door every few seconds to check if it's still locked, or leaving, doing something else, waiting a few minutes and then checking again. The spin lock is the equivalent of this very annoying person rattling at the door. Being annoying is no problem in a computer, but rechecking again and again burns clock cycles you could use better. However, it has a big advantage: you get the lavatory immediately when it's free, and you don't have to do a context switch between making a phone call and going to the lavatory.

    Of course reality is a little more complex than the lavatory door, but it should give you the picture. It's the same with spin locks: there is a tight loop that tries to acquire the lock again and again until it gets the lock, and it stays on a CPU until the time quantum of the thread is used up or a thread with a higher priority preempts it. So counting the "spins on mutexes" events is a good indication of how often your computer rattles at the lavatory door and burns CPU cycles while doing so. Furthermore, it's a good indication of whether there are highly contended locks in the code path you are using, as it gets more probable that a lock has to spin when many threads want to use the same code path synchronized by this lock (to see which kernel locks are contended, there is a lockstat sketch after this list). However, this is just a vastly simplified description; the handling of locks in Solaris is an article of its own, too, or another evening with the book already mentioned.
  • srw:
    srw or "spins on reader/writer locks" counts the spins on reader/writer locks. rwlocks are another kind of lock in Solaris: only one thread may own the write lock, but several threads that just read can hold the read lock at the same time. When you want to know which processes are responsible for the srw events, a DTrace one-liner can help you:
    # dtrace -n 'lockstat:::rw-block
    {
        @execs[execname] = count();
    }'
    dtrace: description 'lockstat:::rw-block' matched 1 probe
    ^C
    ^C

    VBoxHeadless 8
    #
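
When you want a kernel-wide picture of lock contention, covering both the mutex spins behind smtx and the reader/writer events behind srw, the lockstat command is the tool of choice; a minimal sketch:

    # report lock contention events gathered while sleep runs for 10 seconds
    lockstat sleep 10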


Do you want to learn more?


man pages
docs.sun.com: intrstat
docs.sun.com: mpstat

Misc
Prefetch.net: DTrace Cookbook

Reposted from: http://blog.itpub.net/228190/viewspace-662358/