Monitoring Informix Dynamic Server for Higher Performance

最新推荐文章于 2024-02-23 15:47:01 发布

csdn20142

最新推荐文章于 2024-02-23 15:47:01 发布

阅读量435

点赞数

文章标签：数据库开发工具 shell

Monitoring Informix Dynamic Server for Higher Performance

from: http://www.informix.com.ua/articles/monitor/monitor.htm

http://www.ibm.com/developerworks/data/zones/informix/library/techarticle/0303fan/0303fan.html

Introduction

Many books and papers elaborate on Informix® Dynamic Server® (IDS), its architecture and performance tuning, but very few specialize on the topic of monitoring. Yet, effective monitoring is very critical in IDS administration. It helps us to collect valuable system and database performance statistics and to identify problems at very early stage, so that we can be proactive in troubleshooting and performance tuning. Once the Informix Dynamic Server is successfully installed and configured and Informix databases implemented, monitoring Informix Dynamic Server becomes the database administrator's top priority.

This article will discuss in detail how to effectively monitor Informix Dynamic Server at all levels and will provide some general tips on identifying Informix engine and database problems. It will cover troubleshooting as well as performance tuning.

Monitoring tools

Informix provides two major tools to monitor system and database performance:

The onstat utility.
Numerous system monitoring interface (SMI) tables in the sysmaster database, which is created automatically at the time of IDS first initialization.

Both the onstat utility and SMI tables monitor IDS performance by examining IDS shared memory activities, but there is a difference in the way of presenting those statistics. The onstat utility always presents statistics in a fixed way, whereas using SMI tables permits you to reorganize those statistics in a more meaningful, more readable format.

One thing we need to pay attention to is that statistics collected either by onstat or in the SMI tables are cumulative from the time of system reboot or IDS initialization. Therefore, we need to be very careful about those statistics, and always take into account IDS running time. For example, 100,000 bufwaits for a server that has been running over one month is a lot different that 100,000 bufwaits in a single day. To get current statistics, we need to do onstat -z to zero out the old values.

Informix also provides a graphical monitoring tool - onperf. Onperf collects IDS server performance statistics and plots it into metrics. It can also save those statistics into a text file for later analysis. Refer to the Performance Guide for Informix Dynamic Server for more details on the onperf utility.

IDS activities can be classified into three categories:

Instance activities
Database activities
Session activities

Using the tools discussed above, we can effectively monitor all those IDS activities.

Monitoring instance activity

An IDS instance refers to Informix shared memory, Informix processors, Informix databases and physical devices allocated to Informix. The following are some of the most important instance activities we need to monitor.

Operating mode

The first and most important instance activity is, of course, the operating mode of IDS. Is IDS running all right or is it having some problems, or is it down? The onstat -p command catches the current operating mode of IDS as follows:

 
Informix Dynamic Server 2000 Version 9.21.UC4     -- On-Line -- Up 01:01:17 --  
1654784 Kbytes 
 
Profile 
dskreads pagreads bufreads %cached dskwrits pagwrits bufwrits %cached 
86923    101304   3116565  97.21   1651     15022    26196    93.70   
 
isamtot  open     start    read     write    rewrite  delete   commit   rollbk 
2585879  118500   286631   1032967  1972     914      2        2        0 
 
gp_read  gp_write gp_rewrt gp_del   gp_alloc gp_free  gp_curs  
0        0        0        0        0        0        0        
 
ovlock   ovuserthread ovbuff   usercpu  syscpu   numckpts flushes  
0        0            0        478.11   71.63    13       26       
 
bufwaits lokwaits lockreqs deadlks  dltouts  ckpwaits compress seqscans 
3502     0        7065882  0        0        0        1266     11280    
 
ixda-RA  idx-RA   da-RA    RA-pgsused lchwaits 
10120    51       69387    79557      482

We can also query the sysprofile table in the sysmaster database to get same statistics.

The first line of the output displays the current IDS operating mode. In this case, Informix engine is "On-Line". There are six operating modes, of which three are particularly important: Off-Line, Quiescent, and On-Line. The Off-Line mode indicates that IDS is not running. The Quiescent mode indicates that IDS is running in a single user mode where only DBA can do administrative and maintenance work. The On-Line mode indicates that IDS is running normal and all users can connect to the database server and perform. any kind of database operations. In most circumstances, IDS should always be in the On-Line mode. If for some reason, IDS is down or in the Off-Line mode, the above command will display the following message:

 
Shared memory not initialized for INFORMIXSERVER 'cassprod_shm'

In this case, you will need to check the message log or the Informix online log to further identify the root cause of the problem (see Message Log).

In addition to the current operating mode, the above output also provides some important Informix instance performance statistics. The two %cache fields indicate how effectively IDS is using memory cache. The first %cache field shows the percentage of read cache rate, and the second one shows the write cache rate. Read cache rate and write cache rate can vary dramatically depending upon the applications and the type and size of the data being operated on. But, in general, both read cache rate and write cache rate should be in the 80 to 90th percentile. Those are fairly conservative numbers and should be adjusted according to specific environment. If these rates are consistently lower than 80%, you need to consider increasing the value of the BUFFERS parameter in your Informix configuration file to achieve higher read and write cache rates. Low read and write cache rates indicate IDS is doing a lot more disk reads and writes than it should, which will greatly slow down overall database engine performance.

The seqscan field of the output indicates how many sequential scans have been performed since the database has been up or online. If this number is large, say over 100,000, and ever increasing, it may indicate some performance problems, especially if your system is in OLTP environment. You then need to do some further investigations to find out the root cause of excessive sequential scans. We will discuss this in more detail later in this article.

The ovlock field indicates the number of times IDS attempted to exceed the maximum number of locks. If this number is non-zero, you may need to increase the value of the LOCKS parameter in the configuration file. The ovbuf field indicates the number of times IDS attempted to exceed the maximum number of buffers. If this number is huge, say more than 100,000, then you need to increase your BUFFERS parameter so that users do not have to wait for buffers when they need to access data from disk. This will reduce response time and, hence improve overall performance. We need also examine LRU related parameters and adjust their values to lower bufwaits. Refer to the Administrator's Guide for Informix Dynamic Server for more details.

Another group of important fields includes ixda-RA, idx-RA, da-RA, and RA-pgused. Together, these fields indicate how effectively IDS is using the Informix read-ahead mechanism. Read-ahead is the operation that reads ahead the number of data pages from disks into memory during sequential scans or index reads. The ideal situation is that the number of pages read ahead, that is the sum of ixda-RA, idx-RA and da-RA, is equal to the number of pages used during the sequential scans or index reads, that is RA-pgused. This indicates that the read-ahead pages are used 100% for sequential scans and index reads. If there is a significant difference between these two numbers, say more then 10,000, either positive or negative, IDS is not using read-ahead very efficiently, and you may need to tune your read-ahead parameters, that is RA_PAGES and RA_THRESHOLD for better performance. Refer to Administrator's Guide for Informix Dynamic Server (referred to as Administrator's Guide in this article) for details on how to tune these parameters.

Message log

The message log is also referred to as the online log. It contains a variety of information about critical instance activities, such as time and duration of checkpoints, instance startup and shutdown, backup and recovery status, logical log backup status, and changes to major configuration parameters. Message log also contains critical errors (referred to as assertion failures by Informix) such as disk I/O errors, mirroring errors, down chunks, data integrity errors, shared memory errors and so on. When assertion failures occur, message log will usually point us to the relative assertion failure ("af.xxx") file, which records more detailed information about instance activities when database engine went down and also will give us some suggestions on how to fix the problem. The following is an excerpt from message log:

 
00:57:53   
00:57:53  Assert Failed: Unexpected virtual processor termination, pid =586, exit = 0x9  
00:57:53   Who: Session(13709, omcadmin@nvlsys, 6538, 654709000) 
                Thread(13740, sqlexec, 2704a558, 1) 
00:57:53   Results: Fatal Internal Error requires system shutdown 
00:57:53   Action: Restart OnLine 
00:57:53   See Also: /var/tmp/af.35acfee1 
00:57:53  Stack for thread: 13740 sqlexec

The above output tells us that one of Informix virtual processors got terminated and corrupted database engine. This happened on the machine by the name of nvlsys when user "omcadmin" logged in and was performing some database operations, mostly improper execution of SQL queries. The file /var/tmp/af.35acfeel recorded the detailed statistics about the database engine status when the error occurred.

Chunk status

Chunks are the physical storage devices. They should always be online. If any chunks is down, it indicates data corruption and requires immediate attention. The onstat -d command monitors current chunk status and the following is the output of this command:

 
Informix Dynamic Server 2000 Version 9.21.UC4     -- On-Line -- Up 7 days 23:35:56 --  
1654784 Kbytes 
 
Dbspaces 
address  number   flags    fchunk     nchunks  flags    owner    name 
6510c7d0 1        0x1        1        1        N        informix rootdbs 
65866468 2        0x1        2        4        N        informix airgen_idx_dbs 
658665b0 3        0x1        3        3        N        informix spare 
658666f8 4        0x1        4        5        N        informix logs 
65866840 5        0x1        5        2        N        informix pm1 
65866988 6        0x1        7        1        N        informix pm_gen 
65866ad0 7        0x2001     8        1        N T      informix temp_dbspace2 
65866c18 8        0x1        10       2        N        informix pm2 
65866d60 9        0x1        11       3        N        informix airgen_main_dbs 
65866ea8 10       0x1        14       1        N        informix mso_meta.
65867018 11       0x1        16       2        N        informix pm3 
65867160 12       0x2001     18       1        N T      informix temp_dbspace3 
658672a8 13       0x2001     20       1        N T      informix temp_dbspace1 
658673f0 14       0x1        25       2        N        informix pm4 
65867538 15       0x2001     29       1        N T      informix temp_dbspace4 
 15 active, 2047 maximum 
 
Chunks 
address  chk/dbs offset   size     free     bpages   flags pathname 
6510c918 1   1   0        63069    51985             PO-   /usr/informix/dblink 
6514b5f0 2   2   65000    750000   1                 PO-   /usr/informix/dblink 
6514b760 3   3   815000   60000    59747             PO-   /usr/informix/dblink 
6514b8d0 4   4   875000   125000   4947              PO-   /usr/informix/dblink 
6514ba40 5   5   0        1000000  299290            PO-   /usr/informix/dblink1 
6514bbb0 6   2   0        1000000  207877            PO-   /usr/informix/dblink2 
6514bd20 7   6   0        200000   179043            PO-   /usr/informix/dblink3 
6514be90 8   7   200000   250000   249939            PO-   /usr/informix/dblink3 
6510ca88 9   3   450000   250000   249997            PO-   /usr/informix/dblink3 
6510cbf8 10  8   0        1000000  299086            PO-   /usr/informix/dblink4 
6510cd68 11  9   0        1000000  4                 PO-   /usr/informix/dblink5 
6513c830 12  9   0        500000   10                PO-   /usr/informix/dblink6 
6513c9a0 13  8   500000   300000   299997            PO-   /usr/informix/dblink6 
6513cb10 14  10  800000   200000   27596             PO-   /usr/informix/dblink6 
6513cc80 15  9   0        1000000  782331            PO-   /usr/informix/dblink7 
6513cdf0 16  11  0        1000000  296827            PO-   /usr/informix/dblink8 
65865018 17  4   0        400000   9997              PO-   /usr/informix/dblink9 
65865188 18  12  400000   250000   249947            PO-   /usr/informix/dblink9 
658652f8 19  5   0        300000   299997            PO-   /usr/informix/dblink10 
65865468 20  13  300000   250000   249947            PO-   /usr/informix/dblink10 
658655d8 21  4   550000   150000   14997             PO-   /usr/informix/dblink10 
65865748 22  4   0        350000   4997              PO-   /usr/informix/dblink11 
658658b8 23  11  350000   300000   299997            PO-   /usr/informix/dblink11 
65865a28 24  2   0        1000000  999997            PO-   /usr/informix/dblink12 
65865b98 25  14  0        1000000  299014            PO-   /usr/informix/dblink13 
65865d08 26  2   0        750000   749997            PO-   /usr/informix/dblink14 
65865e78 27  4   750000   250000   39997             PO-   /usr/informix/dblink14 
65866018 28  14  0        300000   299997            PO-   /usr/informix/dblink15 
65866188 29  15  300000   250000   249939            PO-   /usr/informix/dblink15 
658662f8 30  3   550000   50000    49997             PO-   /usr/informix/dblink15 
 30 active, 2047 maximum

The above output contains two sections. The first section lists all dbspaces, and the second lists all chunks. In the Chunk section, we need to pay close attention to flags field. The first character of this field indicates if the chunk is a primary (P) chunk or a mirrored (M) chunk. The second character indicates the current status of the chunk, whether it is online (O) or off-line (D). Since O and D look very much alike, especially when you are in a hurry, you may want to pipe the results to grep PD, that is onstat -d |grep PD, just to make sure that you don't miss any down chunks. If there are any down primary chunks, you need to perform. either a cold or warm recovery from backup tapes immediately to ensure data integrity. We can also query syschunks and sysdbspaces tables in the sysmaster database to get similar statistics.

Checkpoint

A checkpoint is the process of synchronizing pages on disk with pages in the shared memory buffer pool. During checkpoints, IDS prevents user threads form. entering critical session and blocks all transaction activities. Thus, if checkpoint duration is long, users may experience system hanging. This is especially true in OLTP environments where there are thousands of transactions and the response time is most critical. As noted above, you can monitor checkpoint duration by looking at the message log, but the better and faster way to do this is to use onstat -m command. Below is the sample output of this command:

 
15:25:10  Checkpoint Completed:  duration was 0 seconds. 
15:25:10  Checkpoint loguniq 231, logpos 0x1bb2018 
 
15:35:30  Checkpoint Completed:  duration was 19 seconds. 
15:35:30  Checkpoint loguniq 231, logpos 0x31b9018 
 
Fri Dec 20 11:48:02 2002 
 
11:48:02  Checkpoint Completed:  duration was 7 seconds. 
11:48:02  Checkpoint loguniq 231, logpos 0x32e5018 
 
14:27:37  Logical Log 231 Complete. 
14:27:40  Process exited with return code 142: /bin/sh /bin/sh -c  
/usr/informix/etc/log_full.sh 2 23 "Logical Log 231 Complete." "Logical Log 231 
Complete."  
14:28:24  Checkpoint Completed:  duration was 22 seconds. 
14:28:24  Checkpoint loguniq 232, logpos 0x458018 
 
14:38:46  Checkpoint Completed:  duration was 7 seconds. 
14:38:46  Checkpoint loguniq 232, logpos 0x10f5018

If the checkpoint duration is consistently longer than 10 seconds, you may need to reduce the values of LRU_MIN_DIRTY and LRU_MAX_DIRTY configuration parameters to achieve shorter checkpoint duration. Also, if the output of onstat -F shows extremely high chunk write, say more than 10,000 and this number is ever increasing, it may indicate one of two problems: either checkpoint intervals are too short and cleaners do not have enough time to write all modified buffers to disks between checkpoints, or there are too few AIO VPs to share the heavy disk writes during checkpoints. You then need to re-examine the settings of CKPINTVL, LRUS, CLEANERS, and NUMAIOVPS configuration parameters and increase their values accordingly. We may need also look at the output of onstat -F as a reference to determine the values of those parameters. For detailed information on how to tune those parameters, refer to Administrator's Guide.

Dbspace usage

It is very important for Informix database administrators to keep informed of the space in each dbspace. If one of the dbspaces is lacking or running out of space, then IDS will suffer. All kinds of problems may occur: we cannot import any databases, we can not create any tables and indexes, we can not even do insertion and update to any tables and indexes. This is very critical for production databases. We need to monitor the growth of each dbspaces so that we can take more proactive approaches to those problems. The script. below reports current space usage of each dbspace, and calculates their percentage.

 
select name dbspace, sum(chksize) allocated, sum(nfree) free, 
round(((sum(chksize) - sum(nfree))/sum(chksize))*100) pcused 
from sysdbspaces d, syschunks c 
where d.dbsnum = c.dbsnum 
group by name 
order by name

The output is shown as follows:

 
dbspace              allocated             free           pcused 
 
airgen_idx_dbs         1000000           763405               24 
airgen_main_dbs        1500000           295789               80 
llog                   1000000             9947               99 
rootdbs                  50000            36220               28 
temp1                   250000           249947                0 
temp2                   250000           249939                0

This output helps us to identify dbspaces that are running out of space. To be proactive, consider allocating additional disk space to the dbspace when its disk usage approaches 90%; in this case, we need to pay very close attention to llog dbspace and allocate more space to it if possible to prevent it running out of space.

Dbspace I/O

Dbsapce I/O is measured by disk reads and writes. If some dbspaces have heavy disk reads and writes while others scarcely have any, then the system may have some disk I/O bottlenecks. A well-balanced dbspace I/O will ease system disk I/O loads and, hence improve overall system performance. The following script. will display current I/O statistics for each dbspace:

 
select d.name, fname[15,25] path_name, sum(pagesread) diskreads, 
sum(pageswritten) diskwrites 
from syschkio c, syschunks k, sysdbspaces d 
where d.dbsnum = k.dbsnum 
and k.chknum = c.chknum 
group by 1, 2 
order by 1

The output is as follows:

 
name              path_name          diskreads       diskwrites 
 
airgen_idx_dbs   uild95/ltmp             3672             7964 
airgen_main_dbs  uild95/ltmp            13545            32903 
llog             uild95/ltmp               19            51633 
rootdbs          uild95/ltmp              211            43117 
temp1            uild95/ltmp             3015             3122 
temp2            uild95/ltmp             3218             3317

The goal is to achieve balanced disk reads and writes across all dbspaces. This is unrealistic in most cases, but the above output at least gives you an idea of how dbspace I/O is distributed and can help you to identify "hot" dbspaces - those dbapces with the most disk reads and writes. If disk reads and writes are extremely high for some dbspaces and extremely low for others, you may need to adjust or even rearrange your physical disk layout for Informix engine. We can get similar information using onstat -D, which displays disk reads and writes for each chunk, and onstat -g ioq which displays disk I/O waiting queue information.

You can further identify what tables have the most disk reads and writes by querying the sysptprof table in the sysmaster database:

 
select dbsname, tabname, (isreads + pagreads) diskreads, (iswrites + pagwrites)  
diskwrites 
from sysptprof 
order by 3 desc, 4 desc

The output looks like this:

 
dbsname         tabname                  diskreads    diskwrites 
 
airgen_10_0     fanout_param                 84567          3094 
airgen_cm_db    sysindices                   78381             0 
airgen_10_0     ne_nmo_i                     75819             5 
airgen_10_0     ne_nmo                       75440           297 
airgen_cm_db    sysprocbody                  62610         28322 
airgen_10_0     systables                    37342           466 
airgen_10_0     syscolumns                   34539          4609 
airgen_10_0      457_484                     32838            42 
airgen_10_0      453_480                     30009             1 
airgen_10_5_old syscolumns                   29531          4550 
airgen_10_5     syscolumns                   28824          4552 
airgen_10_0      456_483                     25448            14 
airgen_10_0      458_485                     23278           177 
airgen_10_5_old  452_483                     22412            31

Based upon the output you get from this query, you may need to move some tables among dbspaces to get better disk I/O balance.

Shared memory segments

Too many virtual share memory segments, usually more than three, indicate that the initial virtual share memory segment is too small and that the database engine has to constantly allocate additional virtual sheared memory segments. This adversely effects IDS performance and will eventually bring your system to its knees. The onstat -g seg command displays how many shared memory segments your Informix database engine currently has:

 
Informix Dynamic Server 2000 Version 9.21.UC4     -- On-Line -- Up 28 days  
15:49:33 -- 205824 Kbytes 
 
Segment Summary: 
id       key        addr     size       ovhd     class blkused  blkfree  
0        1381386241 a000000  177209344  220688   R     42984    280      
1        1381386242 14900000 8388608    856      V     2048     0        
2        1381386243 15100000 1048576    632      M     164      92       
3        1381386244 15200000 8388608    856      V     2048     0        
4        1381386245 15a00000 8388608    856      V     2008     40       
5        1381386246 16200000 8388608    856      V     50       1998     
Total:   -          -        211812352  -        -     49302    2410     
 
   (* segment locked in memory)

If the output shows more than three virtual shared memory segments, you need to increase the value of SHMVERSIZE parameter in your configuration file. The idea is to let IDS allocate enough virtual shared memory at initialization so that it doesn't need to allocate more when users are logging onto the system and doing database operations. You may also want to use UNIX® ipcs command to see sizes of Informix shared memory. For detailed information on how to calculate IDS virtual shared memory segment size, refer to the Administrator's Guide.

Overall operating system performance

Since the Informix database engine is always installed on an operating system, mostly UNIX, to accurately monitor or evaluate IDS performance, we need to take into account the behavior. of the operating system as a whole, especially if the database engine resides on a non-dedicated database server. If IDS takes too much RAM, for instance, if your system has 512MB RAM and IDS takes 400MB or more for its shared memory, your operating system may experience heavy swapping and hanging when users perform. memory-intense operations. When there is not enough memory, the system has to swap some data pages from memory to disk to leave more spaces for new data. And if the system is short of memory, the CPU may also suffer. There are many UNIX utilities to monitor overall operating system CPU and memory utilization. Below is the output from "top" utility:

 
load averages:  1.12,  1.02,  1.07                                     10:17:30 
123 processes: 120 sleeping, 1 zombie, 2 on cpu 
CPU states: 70.5% idle, 26.5% user,  2.8% kernel,  0.3% iowait,  0.0% swap 
Memory: 3072M real, 76M free, 588M swap in use, 440M swap free 
 
PID USERNAME THR PRI NICE  SIZE   RES STATE   TIME    CPU COMMAND 
28349 omcadmin   4   0    0   86M   55M cpu10 970:25  6.85% CS_App.prt 
17782 informix   5  30  -10 1631M 1594M sleep  50.0H  4.66% oninit.exe 
17784 informix   5  59  -10 1631M 1594M sleep 102.9H  4.12% oninit.exe 
17786 informix   5  59  -10 1631M 1591M sleep  25.5H  2.53% oninit.exe 
  571 root       1  58    0  361M  129M sleep  19.0H  1.36% em_mis 
17785 informix   5  59  -10 1631M 1592M sleep  57.8H  1.05% oninit.exe 
 5470 omcadmin   1   0    0 1960K 1408K cpu15   0:00  0.26% top

The output is has two sections. The first section gives you a summary of the overall operating system's CPU and memory usage, and the second one provides detailed information about each processor. Other utilities, such as vmstat, iostat, ps -ef and sar, are also useful in collecting current operation system performance statistics. Vmstat shows how much memory operating systems is currently swapped; iostat and sar display current disk I/O distribution among all physical disks; and ps -ef prints out detailed information about each current processor's login time, CPU, and memory usage. In addition, there are many graphic tools available that allow you to graph the dynamic movements of your operating system's resource utilization and performance.

Monitoring database activity

The purpose of monitoring database activities is to ensure that every database is performing at its peak ability at all time. This implies that you need to watch for potential performance problem, identifying their root cause and eliminating them from the very beginning. Here are few things to watch for.

Table extents

An extent is a block of physically contiguous pages. However, if a table has more than one extent, there is no guarantee that those extents are contiguous; extents may be scattered throughout the dbspace where the table resides. Contiguity of physical pages is important to performance. When pages of data are contiguous, the time used to access data on disk is minimized and the database can read rows sequentially. If tables have too many extents, it is very likely that those extents are interleaved. This harms performance a great deal, because when you retrieve data for a certain table, the disk head needs to seek multiple, noncontiguous extents belonging to this table instead of one large extent with contiguous physical pages. This slows down disk-seeking speed rather significantly. The following script. detects database tables with multiple extents:

 
select t.tabname, count(*) num_ext 
from sysmaster:sysextents e, airgen:systables t 
where e.tabname=t.tabname 
and dbsname = "airgen" 
and t.tabname not like "sys%" 
group by 1 
having count(*) > 1 
order by 2 desc

The output is as follows:

 
tabname                       num_ext 
 
nmoattrclassmap                    14 
attrclass                          11 
networkmoclass                      3 
fanout_param                        3 
fanout_comp                         2 
ne_nmo                              2 
nenmoclassmap                       2 
join_map                            2

If any tables has more than 10 extents, with the exception of large fragmented tables, you should consider rebuilding them to consolidate the extents. With larger databases, or poorly sized databases, we might also be concerned with maximum number of extents or the dreaded 32GB limit on indices. For details on how to estimate and allocate extent size for tables, refer to Performance Guide.

Index levels

The number of index levels may also adversely effects performance. The more index levels, the more probes IDS needs to get to index leaf nodes. Furthermore, if a leaf node gets split or merged, it may take more time for the whole index to adjust to this change. For example, if an index has only two levels, only two levels need to be adjusted, but if it has four levels, then all four levels need to be adjusted accordingly. The time used for this adjustment is, of course, much longer. This is especially true in an OLTP environment where there are constant inserts, deletes, and updates which will cause indexes to be constantly split or merged. The following script. identifies how many levels each index has:

 
select idxname, levels from sysindexes 
order by 2 desc

The output looks as follows:

 
idxname              levels 
 
objdesc                   3 
fanout_param_i            3 
 458_485                  3 
 457_484                  3 
idxname                   2 
tabgtor                   2 
tabgtee                   2

If any index has more than 4 levels, you may need to consider dropping and rebuilding it to consolidate its levels for better performance.

Index uniqueness

A highly duplicate index can severely impact performance for updates and deletes. Suppose you have an index on column customer_type on table customer, and there are only five possible customer_type codes. If the table has one million rows, there could potentially be 200,000 rows with the same customer_type code. The B-tree would store the key value, followed by a list of pointers to each of the physical rows. The problem occurs when you have to delete or update any of the key values. IDS must search through all the duplicates until it finds the correct key to delete or update!

The following is a script. to identify highly duplicate indexes:

 
select tabname, idxname, nrows, nunique 
from systables t, sysindexes I 
where t.tabid =i.tabid 
and t.tabid > 99 
and nrows > 0 
and nunique > 0

The output looks as follows:

 
tabname              idxname                    nrows     nunique 
 
bsc_dte              bscdte_i                       6           6 
omcgttready           231_413                       1           1 
systemrelease         451_478                       3           3 
neclass               452_479                      31          12 
sysrelneclassmap      453_480                      33           3 
proxynemgrmap         454_481                       1           1 
networkmoclass        455_482                     362         199 
nenmoclassmap         456_483                     492          12 
attrclass             457_484                    1191         924 
nmoattrclassmap       458_485                    2901         199 
fanout_comp          fanout_comp_i                915         199 
fanout_comp          fanout_comp_i2               915         199 
fanout_comp          fanout_comp_i3               915          82 
fanout_param         fanout_param_i              2894         196

Ideally, all values appearing in nunique column should equal all values in nrow column, meaning that every key in the index is unique. Based on the number of rows (nrows column above) and number of unique keys (nunique column above), we can calculate the percentage of uniqueness of each index:

 
(nunique/nrows)*100

The higher the percentage, the more unique the index is. To avoid the performance bottleneck of a highly duplicate index, you can replace the original index with a composite index that combines the highly duplicate column and a more unique column. Using the example above, you can add the primary key column customer_id to the original index and make it a composite index (for example, "create index index_name on customer (customer_type, customer_id)").

Sequential scans

Sequential access to a table is sometimes harmful to performance, since database engine has to scan the whole table to pick up the rows that satisfy query's conditions. If tables are small, say couple of hundred rows, it is ok; because when database engine first scan it, the table will be reside in the memory and next time when database engine scan it, all data in that table can be retrieved direct from memory. This is actually the efficient way of using sequential scans. But if the tables are huge, say tables with more than 100,000 rows, repeated sequential scans are deadly to performance. The following script. will identify tables with multiple sequential scans:

 
select dbsname, tabname, sum(seqscans) tot_scans 
from sysptprof 
where seqscans > 0 
and dbsname not like "sys%" 
group by 1,2 
order by 3 desc

The output is as follows:

 
dbsname              tabname                     tot_scans  
 
airgen_10_0          systemrelease                    2352 
airgen_10_5_old      systemrelease                    1596 
airgen_10_5          systemrelease                    1596 
airgen_10_0          fanout_comp                      1587 
airgen_10_5_old      sysusers                         1248 
airgen_10_0          sysusers                         1241 
airgen_10_5          sysusers                         1231 
airgen_10_0          join_map                         1036 
airgen_10_0          fanout_param                      958 
airgen_10_0          func_call                         770 
airgen_10_5          nenmoclassmap                     586 
airgen_10_5_old      nenmoclassmap                     586

From the output above, you can see that airgen_10_0 table has a high number of sequential scans. If this is a big table with thousands or even millions of rows, you may need to think of adding some indexes to this table or using program directives to force internal query optimizer choose indexes for accessing data in this table rather than sequential scans.

Monitoring session activity

Statistics about session activities are very useful in identifying potential performance problems and troubleshooting. What session activity statistics can we collect using the monitoring tools discussed earlier in the article?

General session statistics

The syssessions table in the sysmaster database stores general information about each session, such as login name, login time, host machine form. which the session was logged in, operation system's process ID, current state, and so on. You may query this table to get all this information using the following query:

 
select sid, username, hostname, connected logint_time,  
hex(state) s_state 
from syssessions

The output looks like this:

 
    sid username   hostname        logint_time s_state 
 
233989 omcadmin   localhost        1041348844 0x00080021 
233982 omcadmin   gcsys-e1         1041348608 0x00080001 
233981 omcadmin   nysys4-e1        1041348608 0x00080001 
233980 omcadmin   nysys5-e1        1041348608 0x00080001 
233979 omcadmin   ffsys-e1         1041348608 0x00080001 
233973 omcadmin   nysys1           1041348608 0x00080001 
233781 wsadmin2   gcmmi            1041346036 0x00080001 
233697 omcadmin   localhost        1041344008 0x00080001 
233694 wsadmin4   nymmi1           1041343932 0x00080001 
233693 wsadmin4   nymmi1           1041343932 0x00080001 
230550 omcadmin   nysys1           1041293396 0x00080001 
230476 omcadmin   ffsys-e1         1041292665 0x00080001 
230421 omcadmin   gcsys-e1         1041292365 0x00080001 
230278 omcadmin   nysys5-e1        1041291208 0x00080001

S_tate is a hex number that indicates session's current activity state, refer to page 27-38 of Administrator's Guide for a detailed description of each of these activity state. Login_time is an integer that indicates the time the session logged in, which can easily be converted to a regular time format using a C program. You may request this C program from Informix technical support.

The syssesprof table in the sysmaster database provides more detailed information about each session. With the following query, you can get a better idea how each session is interacting with the database:

 
select sid, (isreads+bufreads+bufwrites+pagreads+pagwrites) access, locksheld,  
seqscans, total_sorts, dsksorts 
from syssesprof

The output is as follows:

 
  Sid	   access       locksheld     seqscans	  tot_sorts	disksorts 
233982      246              0          2            0           0 
230421     7789             12        456         1000           0 
225679     9981            213        669          876           2 
247869    10098            440        578           98           2 
    78       70             45          6            0           0 
   447       46             89          6            0           0

The access field shows how often the session is hitting the database. Locksheld shows how many locks each session is using. Seqscans indicates how frequently each session is using sequential scans to access data; if this number is too high, say more than 100,000, you may want to question whether the session ever uses indexes to retrieve data, and examine its query execution plan more closely to see if it is optimal. Total_sorts and dsksorts indicate how efficiently each session uses memory to do sort operations. You may calculate the percentage of how each session using memory to sort with the following formula:

 
((total_sorts - dsksorts)/total_sorts)*100

The higher the percentage, the more efficient are your sort operations. You can join syssessions table with syssesprof table to further identify each session's username and host machine name so that you will know where potential database and system problems are coming from. The following query can retrieve all this information:

 
select username, hostname, 
(isreads+bufreads+bufwrites+pagreads+pagwrites) access, locksheld, seqscans,  
total_sorts, dsksorts 
from syssessions s, syssesprof f 
where s.id =f.sid

You can also join the syssessions table with syslocks table to get more detailed information about locks, such as which table in which database is currently locked by which session to help you identify potential lock conflicts among users:

 
select owner, username, hostname, dbsname, tabname, type 
from syssessions s, syslocks l 
where sid  = owner 
and tabname not like "sys%"

The output looks something like this:

 
Owner	username	hostname	dbsname		tabname		type 
 
1422	wsine		apple	        prod	       customer        S 
1567	jlinder	  	sys3524         dev            products        S 
2237	ejhonson	case            prod           orders          X 
6679    cjz020          sys4800         dev            shipment        S 
889654  jfjianing       omega           test           prices          X 
77622   hong            build50         test           items           S

If there are some conflicts with lock usage, for instance if one user needs exclusive access to a table that has been locked by others, you can easily identity the owner of that lock and, based on the priority of users, issue onmode -z sid command to kill the session and release the lock; sid is a number picked up from owner field in the above output; please note that only user "Informix" can execute this command.

Query statistics

Query statistics are critical in troubleshooting and query optimization. The onstat -g sql sid command catches queries and related statistics for current sessions; where sid is the session ID that can be plugged in either manually by hard code or dynamically by UNIX shell scripts. For example, if you want to know what query a session is executing, you may first use the command onstat -g ses to find out its session ID and then plug this ID into the above command. For instance, if the session you want to monitor has ID 28953, you can then use onstat -g sql 28953 to catch its current query.

The output looks as follows:

 
Informix Dynamic Server 2000 Version 9.21.UC4     -- On-Line -- Up 38 days 
11:26:22 -- 1654784 Kbytes 
 
session                                      #RSAM    total      used       
id       user     tty      pid      hostname threads  memory     memory     
134709   omcadmin 3        29580    localhos 1        65536      61120      
 
tid      name     rstcb    flags    curstk   status 
147311   sqlexec  6511e728 Y--P---  1648     6511e728 cond wait(netnorm) 
 
Memory pools    count 1 
name         class addr     totalsize freesize #allocfrag #freefrag  
134709       V     669b9020 65536     4416     162        6          
 
name           free       used           name           free       used       
overhead       0          1648           scb            0          96         
opentable      0          6000           filetable      0          920        
log            0          2152           temprec        0          1608       
keys           0          192            ralloc         0          20480      
gentcb         0          1256           ostcb          0          2520       
sort           0          56             sqscb          0          11960      
sql            0          40             rdahead        0          640        
hashfiletab    0          280            osenv          0          1584       
buft_buffer    0          4272           sqtcb          0          3240       
fragman        0          2176            
 
Sess  SQL            Current            Iso Lock       SQL  ISAM F.E. 
Id    Stmt type      Database           Lvl Mode       ERR  ERR  Vers 
134709 SELECT         mso_db             CR  Not Wait   0    0    9.03 
 
Current SQL statement : 
  SELECT ne_type, config_set_version FROM ne WHERE ne_inst = 46176 AND 
    msospace_id = 1 
 
Last parsed SQL statement : 
  SELECT ne_type, config_set_version FROM ne WHERE ne_inst = 46176 AND 
    msospace_id = 1

The first section of the output gives you some general statistics about the query being executed, such as which database the query is executed against and its isolation level and lock mode. Two fields most interesting are SQL error and ISAM error. If these fields are non-zero, it indicates that the query is having some problems and is not executed properly. You may use Informix finderr utility to find out exactly what the problem is and then, after correcting the problem, kill the session and executed the query again.

The "Current SQL statement" section displays the complete SQL syntax of the query being executed. This is very helpful in diagnosing a problem query and query optimization. If you find some problems with the query such as its response time is too long, and it consumes too much system CPU or memory, you can make a copy of the query as it is displayed for later investigation and analysis. You can then run the same query against Informix dbaccess utility and get more detailed statistics on the query's performance such as its execution plan and joint strategy to identify the root cause of the problem. Based on the statistics you collect from you study, you can further optimize your query for better performance. The "Last parsed SQL statement" section displays the query that has been parsed in the memory. Since the query that has been parsed in the memory is, in most cases, the query that is being executed, these two sections are the same most of the times.

Final tips

To take even more active approach in monitoring, you can modify the alarm program provided by Informix during IDS installation. The alarm program is actually a UNIX shell script. that is automatically called by IDS when certain errors occur. IDS classifies all instance errors into five severity levels; one being the lowest and five the highest. You can set up the alarm program so that it can email DBA or sends a message to his or her pager when instance assertion failures occur. For detailed information on how to modify the alarm program and sample programs, refer to Administrator's Guide and Performance Guide.

In addition, you should perform. some necessary maintenance routines to ensure that your databases are running healthy. Some basic routines are:

Verifying and repairing data and index integrity
Updating internal statistics for the query optimizer
Recycling unused shared memory segments

Conclusion

Monitoring Informix Dynamic Server's performance is an on-going task. The value of this task is not so much in collecting statistics, but in identifying and troubleshooting potential system and database problems.

来自 “ ITPUB博客 ” ，链接：http://blog.itpub.net/611609/viewspace-683459/，如需转载，请注明出处，否则将追究法律责任。

转载于:http://blog.itpub.net/611609/viewspace-683459/

csdn20142

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
Monitoring Informix Dynamic Server for Higher Performance

Monitoring Informix Dynamic Server for Higher Performancefrom: http://www.informix.com.ua/articles/mon...
复制链接

扫一扫