NAME

      collectl - Collects data that describes the current system status.

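In short: it collects data describing the current system state and displays it.


Collectl is a system metrics collection tool. It can run as a daemon or interactively, and it can gather data from a range of subsystems. It also includes a Graphite interface, so the data can easily be passed to Graphite for storage.


The official introduction follows: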

There are a number of times in which you find yourself needing performance data. These can include benchmarking, monitoring a system's general health or trying to determine what your system was doing at some time in the past. Sometimes you just want to know what the system is doing right now. Depending on what you're doing, you often end up using different tools, each designed for that specific situation. Unlike most monitoring tools that either focus on a small set of statistics, format their output in only one way, or run either interactively or as a daemon but not both, collectl tries to do it all. You can choose to monitor any of a broad set of subsystems which currently include buddyinfo, cpu, disk, inodes, infiniband, lustre, memory, network, nfs, processes, quadrics, slabs, sockets and tcp.


Download: http://sourceforge.net/projects/collectl/files/


Installation is simple, so it is not covered in detail here: install from the RPM package or build from source.


Usage


collectl has three run modes:

1. Interactive Mode: this is the default. Data is read from /proc, analyzed, and displayed on the terminal.


2. Record Mode: read data from the live system and write it to a file or display it on the terminal

Syntax: collectl [-f file] [options]


3. Playback Mode: read data from one or more raw data files and display it on the terminal

Syntax: collectl -p file1 [file2 ...] [options]
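
The two syntax lines above can be sketched as a pair of small Python helpers that assemble the argument list for each mode. This is only an illustration: the function names and the example paths are made up for this post, and only the switches listed in the collectl --help output further down (-f, -p, -s, -i, -c) are used. The playback helper takes a single file pattern to keep the sketch simple.

```python
from typing import List, Optional


def record_command(outfile: Optional[str] = None,
                   subsystems: Optional[str] = None,
                   interval: Optional[float] = None,
                   count: Optional[int] = None) -> List[str]:
    """Build argv for live collection: collectl [-f file] [options]."""
    argv = ["collectl"]
    if outfile is not None:
        argv += ["-f", outfile]        # write to a file/directory instead of the terminal
    if subsystems is not None:
        argv += ["-s", subsystems]     # subsystem letters, e.g. "cdm"
    if interval is not None:
        argv += ["-i", str(interval)]  # sampling interval in seconds
    if count is not None:
        argv += ["-c", str(count)]     # stop after this many samples
    return argv


def playback_command(pattern: str, subsystems: Optional[str] = None) -> List[str]:
    """Build argv for playback: collectl -p file [options]."""
    if not pattern:
        raise ValueError("playback needs a file name or pattern")
    # Passed as one argv element, so the shell never expands a wildcard.
    argv = ["collectl", "-p", pattern]
    if subsystems is not None:
        argv += ["-s", subsystems]
    return argv


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(record_command(outfile="/tmp/perf", subsystems="cdm", interval=0.25, count=10))
    print(playback_command("/tmp/perf*.raw.gz", subsystems="c"))
```

The resulting lists can be handed to subprocess.run() on a machine where collectl is installed.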


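Among monitoring tools, collectl probably covers one of the broadest ranges of performance data. The subsystems it can monitor are:

SUMMARY SUBSYSTEMS: these show brief, summarized data.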


             b - buddy info (memory fragmentation)

             c - CPU

             d - Disk

             f - NFS V3 Data

             i - Inode and File System

             j - Interrupts

             l - Lustre

             m - Memory

             n - Networks

             s - Sockets

             t - TCP

             x - Interconnect

             y - Slabs (system object caches)


DETAIL SUBSYSTEMS: these show more detailed information.

             C - CPU

             D - Disk

             E - Environmental data (fan, power, temp),  via ipmitool

             F - NFS Data

             J - Interrupts

             L - Lustre OST detail OR client Filesystem detail

             M - Memory node data, which is also known as numa data

             N - Networks

             T - 65 TCP counters only available in plot format

             X - Interconnect

             Y - Slabs (system object caches)

             Z - Processes
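
As a quick reference, the two lists above can be turned into a lookup table. The sketch below is not part of collectl; the describe() helper is a made-up name that simply expands a -s string such as "cD" into readable entries, using the letters and descriptions listed above.

```python
from typing import List, Tuple

# Letters and names copied from the two lists above.
SUMMARY = {
    "b": "buddy info (memory fragmentation)", "c": "CPU", "d": "Disk",
    "f": "NFS V3 Data", "i": "Inode and File System", "j": "Interrupts",
    "l": "Lustre", "m": "Memory", "n": "Networks", "s": "Sockets",
    "t": "TCP", "x": "Interconnect", "y": "Slabs (system object caches)",
}
DETAIL = {
    "C": "CPU", "D": "Disk", "E": "Environmental data (fan, power, temp)",
    "F": "NFS Data", "J": "Interrupts",
    "L": "Lustre OST detail OR client Filesystem detail",
    "M": "Memory node data (numa)", "N": "Networks", "T": "TCP counters",
    "X": "Interconnect", "Y": "Slabs (system object caches)", "Z": "Processes",
}


def describe(subsys: str) -> List[Tuple[str, str, str]]:
    """Expand a -s string into (letter, kind, name) tuples."""
    result = []
    for letter in subsys:
        if letter in SUMMARY:
            result.append((letter, "summary", SUMMARY[letter]))
        elif letter in DETAIL:
            result.append((letter, "detail", DETAIL[letter]))
        else:
            raise ValueError(f"unknown subsystem letter: {letter!r}")
    return result


if __name__ == "__main__":
    for letter, kind, name in describe("cDm"):
        print(f"{letter}  {kind:7}  {name}")
```

Lowercase letters select the summary view and uppercase letters the detail view, which is why "c" and "C" are both valid but produce different output.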


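The subsystems above are selected with the -s switch, for example collectl -ss (sockets). The switch can be used when watching the live system as well as in playback mode.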



Common options and what they do:

Run without any arguments, collectl displays the following:

[root@twexdb1 qzhijun]# collectl

waiting for 1 second sample...

#<----CPU[HYPER]-----><----------Disks-----------><----------Network---------->

#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut

  0   0  1032    439      0      0      0      0      2     23      6      21

  0   0  1049    345      8     16    265     10      0      3      1       6

  0   0  1074    229      0      0      0      0      3     25      6      23

  0   0  1091    226      0      0      0      0      2     19      3      16


The output covers CPU, Disks, and Network, in a fairly brief form.
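
If this brief output needs to be post-processed, a few lines of Python are enough. The following is a minimal sketch, assuming the layout shown in the sample above (a "#<...>" group line, a "#" column-header line, then whitespace-separated rows). parse_brief is a name invented for this post, not something collectl provides.

```python
from typing import Dict, List, Union

Value = Union[int, str]

SAMPLE = """\
waiting for 1 second sample...
#<----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut
  0   0  1032    439      0      0      0      0      2     23      6      21
  0   0  1049    345      8     16    265     10      0      3      1       6
  0   0  1074    229      0      0      0      0      3     25      6      23
  0   0  1091    226      0      0      0      0      2     19      3      16
Ouch!
"""


def _convert(field: str) -> Value:
    """Return an int when possible, otherwise the raw string (e.g. '118M')."""
    try:
        return int(field)
    except ValueError:
        return field


def parse_brief(text: str) -> List[Dict[str, Value]]:
    """Turn brief-format output into one dict per sample row."""
    columns: List[str] = []
    rows: List[Dict[str, Value]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#<"):
            continue                      # blank line or subsystem group banner
        if line.startswith("#"):
            columns = line[1:].split()    # column names, without the leading '#'
            continue
        fields = line.split()
        if not columns or len(fields) != len(columns):
            continue                      # "waiting for ...", "Ouch!", etc.
        rows.append({name: _convert(f) for name, f in zip(columns, fields)})
    return rows


if __name__ == "__main__":
    for row in parse_brief(SAMPLE):
        print(row["ctxsw"], row["KBWrit"])
```

Rows whose field count does not match the header are skipped, which is how the "waiting for ..." and "Ouch!" lines are filtered out.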


-s: select the subsystems to display

1. Displaying selected summary subsystems:

Examples:

1) Show only the brief CPU data

[root@twexdb1 qzhijun]# collectl -sc

waiting for 1 second sample...

#<----CPU[HYPER]----->

#cpu sys inter  ctxsw

  0   0  1099    342

  0   0  1060    355

  0   0  1115    266

  0   0  1032    147

Ouch!


2) Show the brief memory and disk data together

[root@twexdb1 qzhijun]# collectl -sdm

waiting for 1 second sample...

#<-----------Memory-----------><----------Disks----------->

#Free Buff Cach Inac Slab  Map KBRead  Reads KBWrit Writes

118M 270M   5G   5G 223M   1G      0      0    264      8

118M 270M   5G   5G 223M   1G      0      0      0      0

118M 270M   5G   5G 223M   1G      0      0     52     10

119M 270M   5G   5G 223M   1G      8     16   1157     52

119M 270M   5G   5G 223M   1G      0      0      0      0

Ouch!



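The -s switch can also add subsystems to, or remove them from, the default set that collectl shows when run without arguments, using + and -.

3) Add memory to the default display: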

[root@twexdb1 qzhijun]# collectl -s+m

waiting for 1 second sample...

#<----CPU[HYPER]-----><-----------Memory-----------><----------Disks-----------><----------Network---------->

#cpu sys inter  ctxsw Free Buff Cach Inac Slab  Map KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut

  0   0  2348   1851 116M 270M   5G   5G 223M   1G      0      0      0      0      2     22      4      19

  1   0  3513   3354 116M 270M   5G   5G 223M   1G      0      0    316     18     78    777    120     701

  0   0  1108    304 116M 270M   5G   5G 223M   1G      8     16      1      1    142   1605    184    1368

  0   0  1151    683 115M 270M   5G   5G 223M   1G      0      0     28      4      9     65     31      60

Ouch!


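4) Add both memory and network to the default display (network is already part of the default set, so the columns match the previous example):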

[root@twexdb1 qzhijun]# collectl -s+mn

waiting for 1 second sample...

#<----CPU[HYPER]-----><-----------Memory-----------><----------Disks-----------><----------Network---------->

#cpu sys inter  ctxsw Free Buff Cach Inac Slab  Map KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut

  0   0  1032    554 116M 270M   5G   5G 224M   1G      0      0    352      9      4     40     11      35

  0   0  1032    180 116M 270M   5G   5G 224M   1G      0      0      0      0      1     11      2      12

  0   0  1026    174 116M 270M   5G   5G 224M   1G      8     16      1      1      1      4      1       6

  0   0  1032    177 116M 270M   5G   5G 224M   1G      0      0      0      0      1      4      1       7

Ouch!


5) Remove CPU from the default display:

[root@twexdb1 qzhijun]# collectl -s-c

waiting for 1 second sample...

#<----------Disks-----------><----------Network---------->

#KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut

     8     16      1      1     29    278     52     230

     0      0      0      0     50    556     69     463

     0      0     20      3      6     49     14      46

     0      0   1516     81     74    675    235     603

     8     16    337      8      2     18      8      21

     0      0      0      0      1      4      1       6

Ouch!
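
The +/- behaviour in examples 3) to 5) can be modelled in a few lines. This is only a sketch of the rule as the examples show it, assuming the default set is "cdn" (as reported by collectl --help below) and handling a single leading sign. effective_subsystems is a made-up helper name, and the order of the returned letters says nothing about the order in which collectl prints the columns.

```python
DEFAULT_SUBSYS = "cdn"   # default reported by `collectl --help`


def effective_subsystems(arg: str, default: str = DEFAULT_SUBSYS) -> str:
    """Resolve the value given to -s against the default subsystem set."""
    if arg.startswith("+"):
        result = default
        for letter in arg[1:]:
            if letter not in result:      # adding an existing subsystem is a no-op
                result += letter
        return result
    if arg.startswith("-"):
        removed = set(arg[1:])
        return "".join(letter for letter in default if letter not in removed)
    return arg                            # plain list replaces the default


if __name__ == "__main__":
    for value in ("dm", "+m", "+mn", "-c"):
        print(f"-s{value:4} -> {effective_subsystems(value)}")
```

This also shows why -s+mn and -s+m produce the same columns: "n" is already in the default set.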


2. Displaying selected detail subsystems:

[root@twexdb1 qzhijun]# collectl -sD

waiting for 1 second sample...


# DISK STATISTICS (/sec)

#          <---------reads---------><---------writes---------><--------averages--------> Pct

#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util

c0d0             0      0    0    0       0      0    0    0       0     0     0      0    0

sda              8      0   16    1       0      0    1    1       0     1     0      0    0

sdb              0      0    0    0      44      5    6    7       7     2     0      0    0

sdc              0      0    0    0       0      0    0    0       0     0     0      0    0

dm-0             8      0   16    1       0      0    1    1       0     1     0      0    0

dm-1             0      0    0    0      44      0   11    4       4     4     0      0    0

dm-2             0      0    0    0       0      0    0    0       0     0     0      0    0

dm-3             0      0    0    0       0      0    0    0       0     0     0      0    0

c0d0             0      0    0    0       0      0    0    0       0     0     0      0    0
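
The detail rows are just as easy to pick apart. The sketch below assumes exactly the 14-column layout shown above (device name plus 13 integer counters); the layout can differ between collectl versions and options, so treat the field names as labels invented for this post rather than anything official.

```python
from typing import List, NamedTuple

SAMPLE = """\
# DISK STATISTICS (/sec)
#          <---------reads---------><---------writes---------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
c0d0             0      0    0    0       0      0    0    0       0     0     0      0    0
sda              8      0   16    1       0      0    1    1       0     1     0      0    0
sdb              0      0    0    0      44      5    6    7       7     2     0      0    0
dm-1             0      0    0    0      44      0   11    4       4     4     0      0    0
"""


class DiskSample(NamedTuple):
    name: str
    r_kbytes: int
    r_merged: int
    r_ios: int
    r_size: int
    w_kbytes: int
    w_merged: int
    w_ios: int
    w_size: int
    rwsize: int
    qlen: int
    wait: int
    svctim: int
    util: int


def parse_disk_detail(text: str) -> List[DiskSample]:
    """Parse -sD rows; header, banner and malformed lines are ignored."""
    samples: List[DiskSample] = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(DiskSample._fields) or fields[0].startswith("#"):
            continue
        try:
            numbers = [int(f) for f in fields[1:]]
        except ValueError:
            continue
        samples.append(DiskSample(fields[0], *numbers))
    return samples


if __name__ == "__main__":
    writers = [s for s in parse_disk_detail(SAMPLE) if s.w_kbytes > 0]
    for s in writers:
        print(f"{s.name}: {s.w_kbytes} KB written in {s.w_ios} I/Os")
```

Filtering on w_kbytes or util is a quick way to spot the devices that are actually busy in a long capture.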


A specific disk can be selected with --dskfilt:

[root@twexdb1 qzhijun]# collectl -sD --dskfilt sdb

waiting for 1 second sample...


# DISK STATISTICS (/sec)

#          <---------reads---------><---------writes---------><--------averages--------> Pct

#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util

sdb              0      0    0    0       0      0    0    0       0     0     0      0    0

sdb              0      0    0    0       0      0    0    0       0     0     0      0    0

sdb              0      0    0    0       0      0    0    0       0     0     0      0    0

sdb              0      0    0    0       0      0    0    0       0     0     0      0    0

sdb              0      0    0    0       0      0    0    0       0     0     0      0    0

Ouch!



Monitoring a specific process:

[root@twexdb1 qzhijun]# collectl -sZ --procfilt Cmysql --procopts c

waiting for 60 second sample...


# PROCESS SUMMARY (counters are /sec)

# PID  User     PR  PPID THRD S   VSZ   RSS CP  SysT  UsrT Pct  AccuTime MajF MinF Command

6839  root     18     1    0 S   10M    1M  3  0.00  0.00   0  00:00.09    0    0 /bin/sh

7002  mysql    14  6839  300 S    2G    1G 15  0.18  3.96   6 728:25:39    0    0 /usr/local/mysql/bin/mysqld

Ouch!


--procfilt Process Filters

             c - substring of the command being executed as explicitly read from /proc/pid/stat. Note that this can actually be a perl expression, so if you want a command that ends in a particular string all you need to do is append a $ to the end of the string. Otherwise it would match any commands containing that string.

             C - any command that starts with the specified string

             f - full path of the command, including arguments, as read from /proc/pid/cmdline. Like the c modifier this too can be a perl expression.

             p - pid

             P - parent pid

             u - any process owned by this user's UID or in the range specified by uxxx-yyy

             U - any process owned by this username
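
To make the difference between the filter letters concrete, here is a rough Python emulation of the matching rules described above. It is not collectl's own code: the Proc record, its field values (including the UID 27) and the matches() helper are all invented for illustration, and Python regular expressions stand in for perl expressions.

```python
import re
from typing import NamedTuple


class Proc(NamedTuple):
    pid: int
    ppid: int
    uid: int
    user: str
    command: str   # short command name, as in /proc/pid/stat
    cmdline: str   # full command line, as in /proc/pid/cmdline


def matches(proc: Proc, filt: str) -> bool:
    """Apply one --procfilt style filter (letter + value) to a process."""
    kind, value = filt[0], filt[1:]
    if kind == "c":                       # regex anywhere in the command name
        return re.search(value, proc.command) is not None
    if kind == "C":                       # command name starts with the string
        return proc.command.startswith(value)
    if kind == "f":                       # regex against the full command line
        return re.search(value, proc.cmdline) is not None
    if kind == "p":
        return proc.pid == int(value)
    if kind == "P":
        return proc.ppid == int(value)
    if kind == "u":                       # single UID or range xxx-yyy
        low, _, high = value.partition("-")
        return int(low) <= proc.uid <= int(high or low)
    if kind == "U":
        return proc.user == value
    raise ValueError(f"unknown filter type: {kind!r}")


if __name__ == "__main__":
    # Hypothetical process record loosely based on the mysqld row above.
    mysqld = Proc(7002, 6839, 27, "mysql", "mysqld", "/usr/local/mysql/bin/mysqld")
    for filt in ("Cmysql", "cmysql$", "csqld$", "P6839", "u0-100"):
        print(filt, matches(mysqld, filt))
```

Note how "cmysql$" does not match mysqld while "Cmysql" does: the trailing $ anchors the expression to the end of the command name.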


--top: display data in real time, similar to the Linux top utility.

For example:

collectl -sCj --top


--iosize: show the average I/O size (adds the Size column)


Displaying timestamps:

-oT: show the time

-oD: show the date and time

-oDm: show the date, time, and milliseconds
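
When the timestamped output is fed into a script, the stamps need to be converted back into real time values. The sketch below assumes the stamps look like "12:34:56" for -oT, "20130101 12:34:56" for -oD and "20130101 12:34:56.250" for -oDm. That layout is an assumption, not something shown in this post, so check it against your own output first. parse_stamp is a made-up helper name.

```python
from datetime import datetime, time
from typing import Union


def parse_stamp(stamp: str) -> Union[datetime, time]:
    """Parse a timestamp in one of the assumed -oT / -oD / -oDm layouts."""
    stamp = stamp.strip()
    has_date = " " in stamp               # assumed: date and time separated by a space
    has_msec = "." in stamp               # assumed: fractional seconds after a dot
    time_fmt = "%H:%M:%S.%f" if has_msec else "%H:%M:%S"
    if has_date:
        return datetime.strptime(stamp, "%Y%m%d " + time_fmt)
    return datetime.strptime(stamp, time_fmt).time()


if __name__ == "__main__":
    for text in ("12:34:56", "20130101 12:34:56", "20130101 12:34:56.250"):
        print(repr(parse_stamp(text)))
```

With only -oT there is no date in the output, so the helper returns a plain time object in that case.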


-i: set the sampling interval (in seconds)

[root@twexdb1 qzhijun]# collectl -sm -i 2

waiting for 2 second sample...

#<-----------Memory----------->

#Free Buff Cach Inac Slab  Map

120M 276M   5G   5G 224M   1G

120M 276M   5G   5G 224M   1G

120M 276M   5G   5G 224M   1G

120M 276M   5G   5G 224M   1G

121M 276M   5G   5G 224M   1G

121M 276M   5G   5G 223M   1G




Example:

Sample system data every quarter of a second and save it to a log file:

collectl -i.25 -oDm --iosize > testPerf.log


collectl can also send data to a remote host; see the man page for details: man collectl



[root@twexdb1 qzhijun]# collectl --help

This is a subset of the most common switches and even the descriptions are

abbreviated.  To see all type 'collectl -x', to get started just type 'collectl'


usage: collectl [switches]

 -c, --count      count      collect this number of samples and exit

 -f, --filename   file       name of directory/file to write to

 -i, --interval   int        collection interval in seconds [default=1]

 -o, --options    options    misc formatting options, --showoptions for all

                               d|D - include date in output

                                 T - include time in output

                                 z - turn off compression of plot files

 -p, --playback   file       playback results from 'file' (be sure to quote

                             if wild carded) or the shell might mess it up

 -P, --plot                  generate output in 'plot' format

 -s, --subsys     subsys     specify one or more subsystems [default=cdn]

     --verbose               display output in verbose format (automatically

                             selected when brief doesn't make sense)


Various types of help

 -h, --help                  print this text

 -v, --version               print version

 -V, --showdefs              print operational defaults

 -x, --helpextend            extended help, more details descriptions too

 -X, --helpall               shows all help concatenated together


 --showoptions               show all the options

 --showsubsys                show all the subsystems

 --showsubopts               show all subsystem specific options

 --showtopopts               show --top options


 --showheader                show file header that 'would be' generated

 --showcolheaders            show column headers that 'would be' generated

 --showslabaliases           for SLUB allocator, show non-root aliases

 --showrootslabs             same as --showslabaliases but use 'root' names