The AIX topas Command Explained

Latest recommended article published 2022-09-21 19:27:20

达摩院扫地僧



Category: linux  Tags: topas, AIX load

Article link: https://blog.csdn.net/yougou_sully/article/details/84900528


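This article describes each region of the topas display on AIX, covering CPU, network, disk, memory, and process monitoring, to help identify system performance bottlenecks.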


Overview

For a description of the topas command, run man topas, or read IBM's original documentation at: https://www.ibm.com/support/knowledgecenter/en/ssw_aix_71/com.ibm.aix.cmds5/topas.htm

Command details

We start with a screenshot taken after running topas on an AIX server; the numbered regions below refer to it.

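Region 1: CPU utilization and activity.

Kernel

Description: the percentage of CPU time spent in the operating system kernel.
As the base software layer, the operating system supports and serves applications, but it also needs CPU and memory for its own work (memory is mentioned here in passing and is not covered again below). The heavier the system load, the more CPU and memory the kernel consumes, memory especially. In general the kernel's share of CPU time is modest and lower than the applications' share.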

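User

Description: the percentage of CPU time spent in user processes.
This is the key CPU utilization figure. It is the combined share of CPU time consumed by all the software that users run on top of the operating system. As a rule of thumb, if User + Kernel stays above 70% continuously, the system may have a serious CPU performance problem.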
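The 70% rule of thumb is about sustained load, not a single spike. Here is a minimal Python sketch of that check, assuming the User and Kernel percentages have already been collected as (user, kernel) pairs at a fixed interval. The function name, the sample format, and the choice of five consecutive samples are illustrative assumptions, not part of topas.

```python
def cpu_sustained_high(samples, threshold=70.0, min_consecutive=5):
    """Return True if User + Kernel exceeds `threshold` in at least
    `min_consecutive` samples in a row.

    samples: iterable of (user_pct, kernel_pct) pairs, oldest first.
    The threshold comes from the rule of thumb in the text; the streak
    length is an assumed stand-in for "continuously".
    """
    run = 0  # length of the current streak of busy samples
    for user_pct, kernel_pct in samples:
        if user_pct + kernel_pct > threshold:
            run += 1
            if run >= min_consecutive:
                return True
        else:
            run = 0  # streak broken by a quieter sample
    return False


if __name__ == "__main__":
    spike = [(20, 5), (85, 10), (25, 5), (30, 5), (20, 5), (22, 4)]
    busy = [(60, 15)] * 6
    print(cpu_sustained_high(spike))  # False: one spike only
    print(cpu_sustained_high(busy))   # True: six samples at 75%
```

How long a streak counts as "continuous" depends on the sampling interval, so `min_consecutive` should be tuned to the interval in use.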

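Wait

Description: the percentage of CPU time spent waiting.
The CPU is usually waiting for I/O to respond; I/O is the main bottleneck in most systems today. When an application needs to read or write data on disk or other external storage, the process issues an I/O request and waits for it to complete. The CPU time accounted to that waiting is Wait. A high value means the I/O subsystem cannot keep up with the volume of requests, and the fix has to come from the I/O side.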

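Idle

Description: the percentage of time the CPU is idle, that is, doing nothing.
The main likely causes of CPU utilization problems: a database server runs an SQL statement or stored procedure (a stored procedure is just a packaged set of SQL) that needs heavy computation, usually because of poor software design; or the application has a defect such as an infinite loop or another logic error. A faulty program typically consumes one whole CPU. The 20% utilization in the screenshot above, for example, came from one transaction program holding every time slice of one CPU for a long period (the system had 5 CPUs in total).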
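The 20% figure follows directly from the CPU count: one fully occupied CPU out of five is one fifth of total capacity. Here is a small sketch of that arithmetic in Python, which can also be run in reverse to estimate how many CPUs are saturated. Both function names are hypothetical.

```python
def overall_cpu_pct(saturated_cpus, total_cpus):
    """Overall utilization when `saturated_cpus` run at 100% and the rest idle."""
    return 100.0 * saturated_cpus / total_cpus


def saturated_cpu_estimate(overall_pct, total_cpus):
    """Inverse view: how many fully busy CPUs would explain `overall_pct`."""
    return overall_pct * total_cpus / 100.0


print(overall_cpu_pct(1, 5))            # 20.0
print(saturated_cpu_estimate(20.0, 5))  # 1.0
```

A steady reading close to a whole multiple of 100/N on an N-CPU system is only a hint that some number of threads are spinning; the process list still has to confirm it.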

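Region 2: network utilization.

Network

Lists the network interfaces. KBPS is the total throughput in kilobytes per second, I-Pack is the number of packets received per second, O-Pack is the number of packets sent per second, KB-In is the number of kilobytes received per second, and KB-Out is the number of kilobytes sent per second.
Suspect network congestion when the adapter reports transmission failures (it could not send packets) or when responses become noticeably slower while the CPU is fine; in that case check the network traffic. If one interface's KBPS stays in four or even five digits (how high is too high depends on whether the adapter is gigabit or 100 Mbit), find out which adapter it is and what traffic it carries. Run netstat -in to see the IP address of each en* interface, and use the address to tell whether the busy adapter is the out-of-band management adapter or the production service adapter. Then run netstat -v en* to see the adapter's detailed state: how many bad packets, collisions, and CRC errors occurred, whether the adapter was reset, and so on. Read that output closely. A large number of CRC errors or bad packets suggests a faulty or poorly seated network cable.
If all of that looks normal and the network is still slow, the switch may be congested.
Possible causes of network problems: loading a large amount of data over the 100 Mbit out-of-band management network (this has happened before), large queues of long-running ftp transfers, or cable or switch faults.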
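Whether a four- or five-digit KBPS value is alarming depends on the link speed, and a quick conversion makes that concrete. The sketch below is a rough estimate, not something topas reports. It assumes KBPS is in units of 1024 bytes and that the nominal link speed is in units of 10^6 bits per second; the function name is hypothetical.

```python
def link_utilization_pct(kbps, link_mbit_per_s):
    """Rough share of one direction's link capacity represented by `kbps`.

    kbps: throughput in KB/s, taken here as 1024-byte units (assumption).
    link_mbit_per_s: nominal link speed in megabits per second (10**6 bits).
    """
    if link_mbit_per_s <= 0:
        raise ValueError("link speed must be positive")
    capacity_bytes_per_s = link_mbit_per_s * 1_000_000 / 8
    return 100.0 * kbps * 1024 / capacity_bytes_per_s


# The same 10,000 KB/s reading on a 100 Mbit and on a gigabit adapter.
for speed in (100, 1000):
    print(speed, round(link_utilization_pct(10_000, speed), 1))
```

On these assumptions 10,000 KB/s is roughly 82% of a 100 Mbit link but only about 8% of a gigabit link. Because KBPS sums both directions, a full-duplex link can legitimately show more than 100% by this measure; comparing KB-In and KB-Out separately avoids that.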

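Region 3: disk utilization.

Disk Busy% is the percentage of time the disk is busy servicing I/O, which indicates how close the disk is to the limit of what it can handle. The other columns are self-explanatory and are not covered here.
Busy% is the main figure to watch. When it stays above 85%, the disk is very busy and problems are likely. If you already know that a job generating heavy I/O is running, there is no need to worry; just wait for it to finish.
Causes: on an application server, processes that write or search logs read and write them heavily and drive Busy% up, or some other program accesses the disk frequently. hdisk0 and hdisk1 are usually the system disks; internal SCSI disks have relatively low IOPS and easily run at full load.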
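To separate disks that are busy all the time from disks that were busy only once, several readings can be compared. The sketch below assumes Busy% values have been gathered per sample into dictionaries keyed by disk name. The function name and the sample data are made up for illustration; only the 85% threshold comes from the text.

```python
def persistently_busy_disks(samples, threshold=85.0):
    """Return the disks whose Busy% exceeded `threshold` in every sample.

    samples: list of {disk_name: busy_pct} dicts, one per reading.
    A disk missing from a reading is treated as not busy in that reading.
    """
    if not samples:
        return []
    candidates = set(samples[0])
    for sample in samples:
        candidates = {d for d in candidates if sample.get(d, 0.0) > threshold}
    return sorted(candidates)


readings = [
    {"hdisk0": 92.0, "hdisk1": 40.0, "hdisk2": 88.0},
    {"hdisk0": 95.5, "hdisk1": 91.0, "hdisk2": 60.0},
    {"hdisk0": 89.0, "hdisk1": 35.0, "hdisk2": 90.0},
]
print(persistently_busy_disks(readings))  # ['hdisk0']
```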

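Region 4: process information.

Name

The process name, that is, the name of the binary that was started when the process was launched.

PID

The process ID. It is unique within the system and is the key value for tracking a process.

It is needed to trace a process's CPU usage, disk I/O, memory and paging space usage, and so on.

CPU%

The percentage of CPU time the process uses.

PgSp

The amount of paging space the process uses.

Owner

The process owner, that is, the user who started the process.

By default topas lists the processes that use the most CPU. If the CPU utilization in region 1 stays high, look here to see which process is consuming the CPU and which user owns it. If it is one you started, kill it, or ask the project team to resolve it.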

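Region 5: memory paging and paging space activity.

Paging space is disk space that AIX uses as an extension of memory. The theory is not covered here; see operating system references for details. Disk is, of course, far more than 10 times slower than memory, so paging space is only a temporary home for memory pages, and only for pages that are rarely used. If heavy paging appears, there is trouble: the system is short of memory.

In this region, watch PageIn and PageOut. If both values are in three digits or more and stay there for a long time, the system is thrashing: it keeps moving memory pages out to disk and reading them back in, memory is used very inefficiently, and response time suffers.

Similar information is available from vmstat. Its pi and po columns count pages paged in from and out to paging space, so they match PgspIn and PgspOut most closely. If there is only page-out, only page-in, or just a short burst of paging in and out, there is no real problem; simply keep an eye on it.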
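The distinction drawn above (both directions high for a long time versus one direction or a short burst) can be written down as a simple classifier. This is a sketch only. The three labels, the function name, and the idea that the caller passes in a window long enough to count as "a long time" are assumptions; the three-digit floor comes from the text.

```python
def classify_paging(samples, floor=100):
    """Classify a window of paging readings.

    samples: list of (page_in, page_out) per-second values, oldest first.
    Returns "thrashing" when both values are at or above `floor` in every
    sample, "watch" when either value reaches `floor` at least once,
    "ok" otherwise, and "no data" for an empty window.
    """
    if not samples:
        return "no data"
    if all(pi >= floor and po >= floor for pi, po in samples):
        return "thrashing"
    if any(pi >= floor or po >= floor for pi, po in samples):
        return "watch"
    return "ok"


print(classify_paging([(450, 380), (520, 410), (610, 300)]))  # thrashing
print(classify_paging([(450, 0), (20, 0), (5, 0)]))           # watch
print(classify_paging([(12, 3), (0, 0), (8, 1)]))             # ok
```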

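Region 6: memory usage.

Real,MB

The total amount of physical memory the operating system has, in MB.

%Comp and %Noncomp

%Comp is the percentage of memory used as computational memory; %Noncomp is the percentage used as non-computational memory.

%Client

%Client is also non-computational memory: Noncomp includes Client memory. Memory used by jfs file systems is counted as noncomp; to tell them apart, memory used by jfs2 and nfs is counted as Client.

Computational memory is the memory a process actually works with: memory obtained with malloc, the stack used during a sort, and the values of the process's variables all live there (this description is incomplete and for reference only). Non-computational memory holds the I/O buffers the operating system needs for file reads and writes; when a program opens, reads, or writes a file, the work goes through the file cache. Raw devices are the exception. The CCCC database uses RAC and stores all data on raw devices, so on the database server the data files are cached in the data buffer of Oracle's SGA, which the system counts as computational memory, and no non-computational memory is used.

Memory problems have many possible causes. The first is that processes simply use more memory. For example, a large number of Oracle connections on the CCCC database server use a lot of memory, or an SQL script or stored procedure needs a great deal of memory to run. (This happened on a special-case database: one stored procedure exhausted the operating system's memory, paging space ran out with it, and the operating system automatically performed PGSP_KILL and killed the process. That was the first time I learned AIX had this feature.) The second main cause is a memory leak. Put simply, a leak is memory that was allocated, is no longer used, and was never released: malloc without free. The consequences are serious. As the program runs, available physical memory keeps shrinking until the program finally dies, and the only remedy is to restart the application periodically.

Because of the operating system's paging mechanism, the unused pages of such a program all end up in paging space, so paging space usage keeps growing. This is how an application causes a system-level problem.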

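Region 7: paging space utilization.

If paging space utilization keeps growing over a long period, the system is short of memory and has started using disk to back memory. If PG utilization keeps rising or exceeds 50%, be on guard (at 50% the monitoring platform already raises a major alarm) and ask the system administrator right away to analyze why memory use is growing. If the value keeps growing, the system will eventually go down.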
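The two warning signs named here, utilization above 50% and utilization that keeps rising, can be checked against a history of % Used readings. This is a minimal sketch. The function name, the warning strings, and the choice of three consecutive increases as the meaning of "keeps rising" are assumptions.

```python
def paging_space_warnings(used_pct_history, limit=50.0, rising_steps=3):
    """Return a list of warnings for a history of paging space % Used values.

    used_pct_history: readings, oldest first.
    "above limit" is raised when the latest reading exceeds `limit`;
    "rising" is raised when the last `rising_steps` changes were all increases.
    """
    warnings = []
    if not used_pct_history:
        return warnings
    if used_pct_history[-1] > limit:
        warnings.append("above limit")
    tail = used_pct_history[-(rising_steps + 1):]
    if len(tail) == rising_steps + 1 and all(
        later > earlier for earlier, later in zip(tail, tail[1:])
    ):
        warnings.append("rising")
    return warnings


print(paging_space_warnings([10, 12, 15, 19]))  # ['rising']
print(paging_space_warnings([60, 60, 60, 60]))  # ['above limit']
print(paging_space_warnings([40, 45, 51, 55]))  # ['above limit', 'rising']
```

Readings taken hours or days apart suit this check better than second-by-second samples, since the concern in the text is long-term growth.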

EVENTS/QUEUES

Displays the per-second frequency of selected system-global events and the average size of the thread run and wait queues.

Cswitch

The number of context switches per second over the monitoring interval.

Syscalls

The total number of system calls per second that are run over the monitoring interval.

Reads

The number of read system calls per second that are run over the monitoring interval.

Writes

The number of write system calls per second that are run over the monitoring interval.

Forks

The number of fork system calls per second that are run over the monitoring interval.

Execs

The number of exec system calls per second that are run over the monitoring interval.

Runqueue

The average number of threads that were ready to run but were waiting for a processor to become available.

Waitqueue

The average number of threads that were waiting for paging to complete.

FILE/TTY

Readch

The amount of bytes read per second through the read system call over the monitoring interval.

Writech

The amount of bytes written per second through the write system call over the monitoring interval.

Rawin

The amount of raw bytes read per second from TTYs over the monitoring interval.

Ttyout

The amount of bytes written to TTYs per second over the monitoring interval.

Igets

The number of calls per second to the inode lookup routines over the monitoring interval.

Namei

The number of calls per second to the path name lookup routines over the monitoring interval.

Dirblk

The number of directory blocks scanned per second by the directory search routine over the monitoring interval.

PAGING

Displays the per-second frequency of paging statistics. The following data is reported:

Faults

The total number of page faults taken per second over the monitoring interval. This includes page faults that do not cause paging activity.

Steals

The physical memory 4 K frames stolen per second by the virtual memory manager over the monitoring interval.

PgspIn

The number of 4 K pages read from paging space per second over the monitoring interval.

PgspOut

The number of 4 K pages written to paging space per second over the monitoring interval.

PageIn

The number of 4 K pages read per second over the monitoring interval. This includes paging activity associated with reading from file systems. Subtract PgspIn from this value to get the number of 4K pages read from file systems per second over the monitoring interval.

PageOut

The number of 4 K pages written per second over the monitoring interval. This includes paging activity associated with writing to file systems. Subtract PgspOut from this value to get the number of 4K pages written to file systems per second over the monitoring interval.
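The subtraction described for PageIn and PageOut can be sketched as follows. The function and key names are hypothetical; the 4 K page size and the subtraction itself come from the descriptions above.

```python
PAGE_SIZE_KB = 4  # PageIn/PageOut/PgspIn/PgspOut are counted in 4 K pages


def file_system_paging(page_in, page_out, pgsp_in, pgsp_out):
    """Split total paging into its file system share.

    All arguments are pages per second as shown in the PAGING section.
    Returns file system pages per second and the equivalent KB per second.
    """
    fs_pages_in = page_in - pgsp_in
    fs_pages_out = page_out - pgsp_out
    return {
        "fs_pages_in": fs_pages_in,
        "fs_pages_out": fs_pages_out,
        "fs_kb_in": fs_pages_in * PAGE_SIZE_KB,
        "fs_kb_out": fs_pages_out * PAGE_SIZE_KB,
    }


# Example: 150 pages/s read in total, 30 of them from paging space.
print(file_system_paging(page_in=150, page_out=90, pgsp_in=30, pgsp_out=10))
```

A large file system share with small PgspIn and PgspOut values points at ordinary file I/O, not at a memory shortage.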

Sios

The number of I/O requests per second issued by the virtual memory manager over the monitoring interval.

MEMORY

Displays the real memory size and the distribution of memory in use. The following data is reported:

Real,MB

The size of real memory in megabytes.

% Comp

The percentage of real memory currently allocated to computational page frames. Computational page frames are generally those that are backed by paging space.

% Noncomp

The percentage of real memory currently allocated to non-computational frames. Non-computational page frames are generally those that are backed by file space, either data files, executable files, or shared library files.

% Client

The percentage of real memory currently allocated to cache remotely mounted files.

PAGING SPACE

Displays the size and use of paging space. The following data is reported:

Size,MB

The sum of all paging spaces on the system, in megabytes.

% Used

The percentage of total paging space currently in use.

% Free

The percentage of total paging space currently free.

NFS

Displays the NFS statistics in calls per second. The following data is reported:

Server V2 calls/sec
Client V2 calls/sec
Server V3 calls/sec
Client V3 calls/sec

Total WPAR

Displays the total number of workload partitions that are defined in the system. The total amount of workload partitions can be in the following states: Defined, Active, Broken or Transition.