Many books and papers elaborate on Informix® Dynamic Server® (IDS), its architecture and performance tuning, but very few specialize on the topic of monitoring. Yet, effective monitoring is very critical in IDS administration. It helps us to collect valuable system and database performance statistics and to identify problems at very early stage, so that we can be proactive in troubleshooting and performance tuning. Once the Informix Dynamic Server is successfully installed and configured and Informix databases implemented, monitoring Informix Dynamic Server becomes the database administrator's top priority.
This article will discuss in detail how to effectively monitor Informix Dynamic Server at all levels and will provide some general tips on identifying Informix engine and database problems. It will cover troubleshooting as well as performance tuning.
Informix provides two major tools to monitor system and database performance:
- The onstat utility.
- Numerous system monitoring interface (SMI) tables in the sysmaster database, which is created automatically at the time of IDS first initialization.
Both the onstat utility and SMI tables monitor IDS performance by examining IDS shared memory activities, but there is a difference in the way of presenting those statistics. The onstat utility always presents statistics in a fixed way, whereas using SMI tables permits you to reorganize those statistics in a more meaningful, more readable format.
One thing we need to pay attention to is that statistics collected either by onstat or in the SMI tables are cumulative from the time of system reboot or IDS initialization. Therefore, we need to be very careful about those statistics, and always take into account IDS running time. For example, 100,000 bufwaits for a server that has been running over one month is a lot different that 100,000 bufwaits in a single day. To get current statistics, we need to do onstat -z to zero out the old values.
Informix also provides a graphical monitoring tool - onperf. Onperf collects IDS server performance statistics and plots it into metrics. It can also save those statistics into a text file for later analysis. Refer to the Performance Guide for Informix Dynamic Server for more details on the onperf utility.
IDS activities can be classified into three categories:
- Instance activities
- Database activities
- Session activities
Using the tools discussed above, we can effectively monitor all those IDS activities.
An IDS instance refers to Informix shared memory, Informix processors, Informix databases and physical devices allocated to Informix. The following are some of the most important instance activities we need to monitor.
The first and most important instance activity is, of course, the operating mode of IDS. Is IDS running all right or is it having some problems, or is it down? The onstat -p command catches the current operating mode of IDS as follows:
Informix Dynamic Server 2000 Version 9.21.UC4 -- On-Line -- Up 01:01:17 -- 1654784 Kbytes Profile dskreads pagreads bufreads %cached dskwrits pagwrits bufwrits %cached 86923 101304 3116565 97.21 1651 15022 26196 93.70 isamtot open start read write rewrite delete commit rollbk 2585879 118500 286631 1032967 1972 914 2 2 0 gp_read gp_write gp_rewrt gp_del gp_alloc gp_free gp_curs 0 0 0 0 0 0 0 ovlock ovuserthread ovbuff usercpu syscpu numckpts flushes 0 0 0 478.11 71.63 13 26 bufwaits lokwaits lockreqs deadlks dltouts ckpwaits compress seqscans 3502 0 7065882 0 0 0 1266 11280 ixda-RA idx-RA da-RA RA-pgsused lchwaits 10120 51 69387 79557 482 |
We can also query the sysprofile table in the sysmaster database to get same statistics.
The first line of the output displays the current IDS operating mode. In this case, Informix engine is "On-Line". There are six operating modes, of which three are particularly important: Off-Line, Quiescent, and On-Line. The Off-Line mode indicates that IDS is not running. The Quiescent mode indicates that IDS is running in a single user mode where only DBA can do administrative and maintenance work. The On-Line mode indicates that IDS is running normal and all users can connect to the database server and perform. any kind of database operations. In most circumstances, IDS should always be in the On-Line mode. If for some reason, IDS is down or in the Off-Line mode, the above command will display the following message:
Shared memory not initialized for INFORMIXSERVER 'cassprod_shm' |
In this case, you will need to check the message log or the Informix online log to further identify the root cause of the problem (see Message Log).
In addition to the current operating mode, the above output also provides some important Informix instance performance statistics. The two %cache fields indicate how effectively IDS is using memory cache. The first %cache field shows the percentage of read cache rate, and the second one shows the write cache rate. Read cache rate and write cache rate can vary dramatically depending upon the applications and the type and size of the data being operated on. But, in general, both read cache rate and write cache rate should be in the 80 to 90th percentile. Those are fairly conservative numbers and should be adjusted according to specific environment. If these rates are consistently lower than 80%, you need to consider increasing the value of the BUFFERS parameter in your Informix configuration file to achieve higher read and write cache rates. Low read and write cache rates indicate IDS is doing a lot more disk reads and writes than it should, which will greatly slow down overall database engine performance.
The seqscan field of the output indicates how many sequential scans have been performed since the database has been up or online. If this number is large, say over 100,000, and ever increasing, it may indicate some performance problems, especially if your system is in OLTP environment. You then need to do some further investigations to find out the root cause of excessive sequential scans. We will discuss this in more detail later in this article.
The ovlock field indicates the number of times IDS attempted to exceed the maximum number of locks. If this number is non-zero, you may need to increase the value of the LOCKS parameter in the configuration file. The ovbuf field indicates the number of times IDS attempted to exceed the maximum number of buffers. If this number is huge, say more than 100,000, then you need to increase your BUFFERS parameter so that users do not have to wait for buffers when they need to access data from disk. This will reduce response time and, hence improve overall performance. We need also examine LRU related parameters and adjust their values to lower bufwaits. Refer to the Administrator's Guide for Informix Dynamic Server for more details.
Another group of important fields includes ixda-RA, idx-RA, da-RA, and RA-pgused. Together, these fields indicate how effectively IDS is using the Informix read-ahead mechanism. Read-ahead is the operation that reads ahead the number of data pages from disks into memory during sequential scans or index reads. The ideal situation is that the number of pages read ahead, that is the sum of ixda-RA, idx-RA and da-RA, is equal to the number of pages used during the sequential scans or index reads, that is RA-pgused. This indicates that the read-ahead pages are used 100% for sequential scans and index reads. If there is a significant difference between these two numbers, say more then 10,000, either positive or negative, IDS is not using read-ahead very efficiently, and you may need to tune your read-ahead parameters, that is RA_PAGES and RA_THRESHOLD for better performance. Refer to Administrator's Guide for Informix Dynamic Server (referred to as Administrator's Guide in this article) for details on how to tune these parameters.
The message log is also referred to as the online log. It contains a variety of information about critical instance activities, such as time and duration of checkpoints, instance startup and shutdown, backup and recovery status, logical log backup status, and changes to major configuration parameters. Message log also contains critical errors (referred to as assertion failures by Informix) such as disk I/O errors, mirroring errors, down chunks, data integrity errors, shared memory errors and so on. When assertion failures occur, message log will usually point us to the relative assertion failure ("af.xxx") file, which records more detailed information about instance activities when database engine went down and also will give us some suggestions on how to fix the problem. The following is an excerpt from message log:
00:57:53 00:57:53 Assert Failed: Unexpected virtual processor termination, pid =586, exit = 0x9 00:57:53 Who: Session(13709, omcadmin@nvlsys, 6538, 654709000) Thread(13740, sqlexec, 2704a558, 1) 00:57:53 Results: Fatal Internal Error requires system shutdown 00:57:53 Action: Restart OnLine 00:57:53 See Also: /var/tmp/af.35acfee1 00:57:53 Stack for thread: 13740 sqlexec |
The above output tells us that one of Informix virtual processors got terminated and corrupted database engine. This happened on the machine by the name of nvlsys when user "omcadmin" logged in and was performing some database operations, mostly improper execution of SQL queries. The file /var/tmp/af.35acfeel recorded the detailed statistics about the database engine status when the error occurred.
Chunks are the physical storage devices. They should always be online. If any chunks is down, it indicates data corruption and requires immediate attention. The onstat -d command monitors current chunk status and the following is the output of this command:
Informix Dynamic Server 2000 Version 9.21.UC4 -- On-Line -- Up 7 days 23:35:56 -- 1654784 Kbytes Dbspaces address number flags fchunk nchunks flags owner name 6510c7d0 1 0x1 1 1 N informix rootdbs 65866468 2 0x1 2 4 N informix airgen_idx_dbs 658665b0 3 0x1 3 3 N informix spare 658666f8 4 0x1 4 5 N informix logs 65866840 5 0x1 5 2 N informix pm1 65866988 6 0x1 7 1 N informix pm_gen 65866ad0 7 0x2001 8 1 N T informix temp_dbspace2 65866c18 8 0x1 10 2 N informix pm2 65866d60 9 0x1 11 3 N informix airgen_main_dbs 65866ea8 10 0x1 14 1 N informix mso_meta. 65867018 11 0x1 16 2 N informix pm3 65867160 12 0x2001 18 1 N T informix temp_dbspace3 658672a8 13 0x2001 20 1 N T informix temp_dbspace1 658673f0 14 0x1 25 2 N informix pm4 65867538 15 0x2001 29 1 N T informix temp_dbspace4 15 active, 2047 maximum Chunks address chk/dbs offset size free bpages flags pathname 6510c918 1 1 0 63069 51985 PO- /usr/informix/dblink 6514b5f0 2 2 65000 750000 1 PO- /usr/informix/dblink 6514b760 3 3 815000 60000 59747 PO- /usr/informix/dblink 6514b8d0 4 4 875000 125000 4947 PO- /usr/informix/dblink 6514ba40 5 5 0 1000000 299290 PO- /usr/informix/dblink1 6514bbb0 6 2 0 1000000 207877 PO- /usr/informix/dblink2 6514bd20 7 6 0 200000 179043 PO- /usr/informix/dblink3 6514be90 8 7 200000 250000 249939 PO- /usr/informix/dblink3 6510ca88 9 3 450000 250000 249997 PO- /usr/informix/dblink3 6510cbf8 10 8 0 1000000 299086 PO- /usr/informix/dblink4 6510cd68 11 9 0 1000000 4 PO- /usr/informix/dblink5 6513c830 12 9 0 500000 10 PO- /usr/informix/dblink6 6513c9a0 13 8 500000 300000 299997 PO- /usr/informix/dblink6 6513cb10 14 10 800000 200000 27596 PO- /usr/informix/dblink6 6513cc80 15 9 0 1000000 782331 PO- /usr/informix/dblink7 6513cdf0 16 11 0 1000000 296827 PO- /usr/informix/dblink8 65865018 17 4 0 400000 9997 PO- /usr/informix/dblink9 65865188 18 12 400000 250000 249947 PO- /usr/informix/dblink9 658652f8 19 5 0 300000 299997 PO- /usr/informix/dblink10 65865468 20 13 300000 250000 249947 PO- /usr/informix/dblink10 658655d8 21 4 550000 150000 14997 PO- /usr/informix/dblink10 65865748 22 4 0 350000 4997 PO- /usr/informix/dblink11 658658b8 23 11 350000 300000 299997 PO- /usr/informix/dblink11 65865a28 24 2 0 1000000 999997 PO- /usr/informix/dblink12 65865b98 25 14 0 1000000 299014 PO- /usr/informix/dblink13 65865d08 26 2 0 750000 749997 PO- /usr/informix/dblink14 65865e78 27 4 750000 250000 39997 PO- /usr/informix/dblink14 65866018 28 14 0 300000 299997 PO- /usr/informix/dblink15 65866188 29 15 300000 250000 249939 PO- /usr/informix/dblink15 658662f8 30 3 550000 50000 49997 PO- /usr/informix/dblink15 30 active, 2047 maximum |
The above output contains two sections. The first section lists all dbspaces, and the second lists all chunks. In the Chunk section, we need to pay close attention to flags field. The first character of this field indicates if the chunk is a primary (P) chunk or a mirrored (M) chunk. The second character indicates the current status of the chunk, whether it is online (O) or off-line (D). Since O and D look very much alike, especially when you are in a hurry, you may want to pipe the results to grep PD, that is onstat -d |grep PD, just to make sure that you don't miss any down chunks. If there are any down primary chunks, you need to perform. either a cold or warm recovery from backup tapes immediately to ensure data integrity. We can also query syschunks and sysdbspaces tables in the sysmaster database to get similar statistics.
A checkpoint is the process of synchronizing pages on disk with pages in the shared memory buffer pool. During checkpoints, IDS prevents user threads form. entering critical session and blocks all transaction activities. Thus, if checkpoint duration is long, users may experience system hanging. This is especially true in OLTP environments where there are thousands of transactions and the response time is most critical. As noted above, you can monitor checkpoint duration by looking at the message log, but the better and faster way to do this is to use onstat -m command. Below is the sample output of this command:
15:25:10 Checkpoint Completed: duration was 0 seconds. 15:25:10 Checkpoint loguniq 231, logpos 0x1bb2018 15:35:30 Checkpoint Completed: duration was 19 seconds. 15:35:30 Checkpoint loguniq 231, logpos 0x31b9018 Fri Dec 20 11:48:02 2002 11:48:02 Checkpoint Completed: duration was 7 seconds. 11:48:02 Checkpoint loguniq 231, logpos 0x32e5018 14:27:37 Logical Log 231 Complete. 14:27:40 Process exited with return code 142: /bin/sh /bin/sh -c /usr/informix/etc/log_full.sh 2 23 "Logical Log 231 Complete." "Logical Log 231 Complete." 14:28:24 Checkpoint Completed: duration was 22 seconds. 14:28:24 Checkpoint loguniq 232, logpos 0x458018 14:38:46 Checkpoint Completed: duration was 7 seconds. 14:38:46 Checkpoint loguniq 232, logpos 0x10f5018 |
If the checkpoint duration is consistently longer than 10 seconds, you may need to reduce the values of LRU_MIN_DIRTY and LRU_MAX_DIRTY configuration parameters to achieve shorter checkpoint duration. Also, if the output of onstat -F shows extremely high chunk write, say more than 10,000 and this number is ever increasing, it may indicate one of two problems: either checkpoint intervals are too short and cleaners do not have enough time to write all modified buffers to disks between checkpoints, or there are too few AIO VPs to share the heavy disk writes during checkpoints. You then need to re-examine the settings of CKPINTVL, LRUS, CLEANERS, and NUMAIOVPS configuration parameters and increase their values accordingly. We may need also look at the output of onstat -F as a reference to determine the values of those parameters. For detailed information on how to tune those parameters, refer to Administrator's Guide.
It is very important for Informix database administrators to keep informed of the space in each dbspace. If one of the dbspaces is lacking or running out of space, then IDS will suffer. All kinds of problems may occur: we cannot import any databases, we can not create any tables and indexes, we can not even do insertion and update to any tables and indexes. This is very critical for production databases. We need to monitor the growth of each dbspaces so that we can take more proactive approaches to those problems. The script. below reports current space usage of each dbspace, and calculates their percentage.
select name dbspace, sum(chksize) allocated, sum(nfree) free, round(((sum(chksize) - sum(nfree))/sum(chksize))*100) pcused from sysdbspaces d, syschunks c where d.dbsnum = c.dbsnum group by name order by name |
The output is shown as follows:
dbspace allocated free pcused airgen_idx_dbs 1000000 763405 24 airgen_main_dbs 1500000 295789 80 llog 1000000 9947 99 rootdbs 50000 36220 28 temp1 250000 249947 0 temp2 250000 249939 0 |
This output helps us to identify dbspaces that are running out of space. To be proactive, consider allocating additional disk space to the dbspace when its disk usage approaches 90%; in this case, we need to pay very close attention to llog dbspace and allocate more space to it if possible to prevent it running out of space.
Dbsapce I/O is measured by disk reads and writes. If some dbspaces have heavy disk reads and writes while others scarcely have any, then the system may have some disk I/O bottlenecks. A well-balanced dbspace I/O will ease system disk I/O loads and, hence improve overall system performance. The following script. will display current I/O statistics for each dbspace:
select d.name, fname[15,25] path_name, sum(pagesread) diskreads, sum(pageswritten) diskwrites from syschkio c, syschunks k, sysdbspaces d where d.dbsnum = k.dbsnum and k.chknum = c.chknum group by 1, 2 order by 1 |
The output is as follows:
name path_name diskreads diskwrites airgen_idx_dbs uild95/ltmp 3672 7964 airgen_main_dbs uild95/ltmp 13545 32903 llog uild95/ltmp 19 51633 rootdbs uild95/ltmp 211 43117 temp1 uild95/ltmp 3015 3122 temp2 uild95/ltmp 3218 3317 |
The goal is to achieve balanced disk reads and writes across all dbspaces. This is unrealistic in most cases, but the above output at least gives you an idea of how dbspace I/O is distributed and can help you to identify "hot" dbspaces - those dbapces with the most disk reads and writes. If disk reads and writes are extremely high for some dbspaces and extremely low for others, you may need to adjust or even rearrange your physical disk layout for Informix engine. We can get similar information using onstat -D, which displays disk reads and writes for each chunk, and onstat -g ioq which displays disk I/O waiting queue information.
You can further identify what tables have the most disk reads and writes by querying the sysptprof table in the sysmaster database:
select dbsname, tabname, (isreads + pagreads) diskreads, (iswrites + pagwrites) diskwrites from sysptprof order by 3 desc, 4 desc |
The output looks like this:
dbsname tabname diskreads diskwrites airgen_10_0 fanout_param 84567 3094 airgen_cm_db sysindices 78381 0 airgen_10_0 ne_nmo_i 75819 5 airgen_10_0 ne_nmo 75440 297 airgen_cm_db sysprocbody 62610 28322 airgen_10_0 systables 37342 466 airgen_10_0 syscolumns 34539 4609 airgen_10_0 457_484 32838 42 airgen_10_0 453_480 30009 1 airgen_10_5_old syscolumns 29531 4550 airgen_10_5 syscolumns 28824 4552 airgen_10_0 456_483 25448 14 airgen_10_0 458_485 23278 177 airgen_10_5_old 452_483 22412 31 |
Based upon the output you get from this query, you may need to move some tables among dbspaces to get better disk I/O balance.
Too many virtual share memory segments, usually more than three, indicate that the initial virtual share memory segment is too small and that the database engine has to constantly allocate additional virtual sheared memory segments. This adversely effects IDS performance and will eventually bring your system to its knees. The onstat -g seg command displays how many shared memory segments your Informix database engine currently has:
Informix Dynamic Server 2000 Version 9.21.UC4 -- On-Line -- Up 28 days 15:49:33 -- 205824 Kbytes Segment Summary: id key addr size ovhd class blkused blkfree 0 1381386241 a000000 177209344 220688 R 42984 280 1 1381386242 14900000 8388608 856 V 2048 0 2 1381386243 15100000 1048576 632 M 164 92 3 1381386244 15200000 8388608 856 V 2048 0 4 1381386245 15a00000 8388608 856 V 2008 40 5 1381386246 16200000 8388608 856 V 50 1998 Total: - - 211812352 - - 49302 2410 (* segment locked in memory) |
If the output shows more than three virtual shared memory segments, you need to increase the value of SHMVERSIZE parameter in your configuration file. The idea is to let IDS allocate enough virtual shared memory at initialization so that it doesn't need to allocate more when users are logging onto the system and doing database operations. You may also want to use UNIX® ipcs command to see sizes of Informix shared memory. For detailed information on how to calculate IDS virtual shared memory segment size, refer to the Administrator's Guide.
Overall operating system performance
Since the Informix database engine is always installed on an operating system, mostly UNIX, to accurately monitor or evaluate IDS performance, we need to take into account the behavior. of the operating system as a whole, especially if the database engine resides on a non-dedicated database server. If IDS takes too much RAM, for instance, if your system has 512MB RAM and IDS takes 400MB or more for its shared memory, your operating system may experience heavy swapping and hanging when users perform. memory-intense operations. When there is not enough memory, the system has to swap some data pages from memory to disk to leave more spaces for new data. And if the system is short of memory, the CPU may also suffer. There are many UNIX utilities to monitor overall operating system CPU and memory utilization. Below is the output from "top" utility:
load averages: 1.12, 1.02, 1.07 10:17:30 123 processes: 120 sleeping, 1 zombie, 2 on cpu CPU states: 70.5% idle, 26.5% user, 2.8% kernel, 0.3% iowait, 0.0% swap Memory: 3072M real, 76M free, 588M swap in use, 440M swap free PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND 28349 omcadmin 4 0 0 86M 55M cpu10 970:25 6.85% CS_App.prt 17782 informix 5 30 -10 1631M 1594M sleep 50.0H 4.66% oninit.exe 17784 informix 5 59 -10 1631M 1594M sleep 102.9H 4.12% oninit.exe 17786 informix 5 59 -10 1631M 1591M sleep 25.5H 2.53% oninit.exe 571 root 1 58 0 361M 129M sleep 19.0H 1.36% em_mis 17785 informix 5 59 -10 1631M 1592M sleep 57.8H 1.05% oninit.exe 5470 omcadmin 1 0 0 1960K 1408K cpu15 0:00 0.26% top |
The output is has two sections. The first section gives you a summary of the overall operating system's CPU and memory usage, and the second one provides detailed information about each processor. Other utilities, such as vmstat, iostat, ps -ef and sar, are also useful in collecting current operation system performance statistics. Vmstat shows how much memory operating systems is currently swapped; iostat and sar display current disk I/O distribution among all physical disks; and ps -ef prints out detailed information about each current processor's login time, CPU, and memory usage. In addition, there are many graphic tools available that allow you to graph the dynamic movements of your operating system's resource utilization and performance.
The purpose of monitoring database activities is to ensure that every database is performing at its peak ability at all time. This implies that you need to watch for potential performance problem, identifying their root cause and eliminating them from the very beginning. Here are few things to watch for.
An extent is a block of physically contiguous pages. However, if a table has more than one extent, there is no guarantee that those extents are contiguous; extents may be scattered throughout the dbspace where the table resides. Contiguity of physical pages is important to performance. When pages of data are contiguous, the time used to access data on disk is minimized and the database can read rows sequentially. If tables have too many extents, it is very likely that those extents are interleaved. This harms performance a great deal, because when you retrieve data for a certain table, the disk head needs to seek multiple, noncontiguous extents belonging to this table instead of one large extent with contiguous physical pages. This slows down disk-seeking speed rather significantly. The following script. detects database tables with multiple extents:
select t.tabname, count(*) num_ext from sysmaster:sysextents e, airgen:systables t where e.tabname=t.tabname and dbsname = "airgen" and t.tabname not like "sys%" group by 1 having count(*) > 1 order by 2 desc |
The output is as follows:
tabname num_ext nmoattrclassmap 14 attrclass 11 networkmoclass 3 fanout_param 3 fanout_comp 2 ne_nmo 2 nenmoclassmap 2 join_map 2 |
If any tables has more than 10 extents, with the exception of large fragmented tables, you should consider rebuilding them to consolidate the extents. With larger databases, or poorly sized databases, we might also be concerned with maximum number of extents or the dreaded 32GB limit on indices. For details on how to estimate and allocate extent size for tables, refer to Performance Guide.
The number of index levels may also adversely effects performance. The more index levels, the more probes IDS needs to get to index leaf nodes. Furthermore, if a leaf node gets split or merged, it may take more time for the whole index to adjust to this change. For example, if an index has only two levels, only two levels need to be adjusted, but if it has four levels, then all four levels need to be adjusted accordingly. The time used for this adjustment is, of course, much longer. This is especially true in an OLTP environment where there are constant inserts, deletes, and updates which will cause indexes to be constantly split or merged. The following script. identifies how many levels each index has:
select idxname, levels from sysindexes order by 2 desc |
The output looks as follows:
idxname levels objdesc 3 fanout_param_i 3 458_485 3 457_484 3 idxname 2 tabgtor 2 tabgtee 2 |
If any index has more than 4 levels, you may need to consider dropping and rebuilding it to consolidate its levels for better performance.
A highly duplicate index can severely impact performance for updates and deletes. Suppose you have an index on column customer_type on table customer, and there are only five possible customer_type codes. If the table has one million rows, there could potentially be 200,000 rows with the same customer_type code. The B-tree would store the key value, followed by a list of pointers to each of the physical rows. The problem occurs when you have to delete or update any of the key values. IDS must search through all the duplicates until it finds the correct key to delete or update!
The following is a script. to identify highly duplicate indexes:
select tabname, idxname, nrows, nunique from systables t, sysindexes I where t.tabid =i.tabid and t.tabid > 99 and nrows > 0 and nunique > 0 |
The output looks as follows:
tabname idxname nrows nunique bsc_dte bscdte_i 6 6 omcgttready 231_413 1 1 systemrelease 451_478 3 3 neclass 452_479 31 12 sysrelneclassmap 453_480 33 3 proxynemgrmap 454_481 1 1 networkmoclass 455_482 362 199 nenmoclassmap 456_483 492 12 attrclass 457_484 1191 924 nmoattrclassmap 458_485 2901 199 fanout_comp fanout_comp_i 915 199 fanout_comp fanout_comp_i2 915 199 fanout_comp fanout_comp_i3 915 82 fanout_param fanout_param_i 2894 196 |
Ideally, all values appearing in nunique column should equal all values in nrow column, meaning that every key in the index is unique. Based on the number of rows (nrows column above) and number of unique keys (nunique column above), we can calculate the percentage of uniqueness of each index:
(nunique/nrows)*100 |
The higher the percentage, the more unique the index is. To avoid the performance bottleneck of a highly duplicate index, you can replace the original index with a composite index that combines the highly duplicate column and a more unique column. Using the example above, you can add the primary key column customer_id to the original index and make it a composite index (for example, "create index index_name on customer (customer_type, customer_id)").
Sequential access to a table is sometimes harmful to performance, since database engine has to scan the whole table to pick up the rows that satisfy query's conditions. If tables are small, say couple of hundred rows, it is ok; because when database engine first scan it, the table will be reside in the memory and next time when database engine scan it, all data in that table can be retrieved direct from memory. This is actually the efficient way of using sequential scans. But if the tables are huge, say tables with more than 100,000 rows, repeated sequential scans are deadly to performance. The following script. will identify tables with multiple sequential scans:
select dbsname, tabname, sum(seqscans) tot_scans from sysptprof where seqscans > 0 and dbsname not like "sys%" group by 1,2 order by 3 desc |
The output is as follows:
dbsname tabname tot_scans airgen_10_0 systemrelease 2352 airgen_10_5_old systemrelease 1596 airgen_10_5 systemrelease 1596 airgen_10_0 fanout_comp 1587 airgen_10_5_old sysusers 1248 airgen_10_0 sysusers 1241 airgen_10_5 sysusers 1231 airgen_10_0 join_map 1036 airgen_10_0 fanout_param 958 airgen_10_0 func_call 770 airgen_10_5 nenmoclassmap 586 airgen_10_5_old nenmoclassmap 586 |
From the output above, you can see that airgen_10_0 table has a high number of sequential scans. If this is a big table with thousands or even millions of rows, you may need to think of adding some indexes to this table or using program directives to force internal query optimizer choose indexes for accessing data in this table rather than sequential scans.
Statistics about session activities are very useful in identifying potential performance problems and troubleshooting. What session activity statistics can we collect using the monitoring tools discussed earlier in the article?
The syssessions table in the sysmaster database stores general information about each session, such as login name, login time, host machine form. which the session was logged in, operation system's process ID, current state, and so on. You may query this table to get all this information using the following query:
select sid, username, hostname, connected logint_time, hex(state) s_state from syssessions |
The output looks like this:
sid username hostname logint_time s_state 233989 omcadmin localhost 1041348844 0x00080021 233982 omcadmin gcsys-e1 1041348608 0x00080001 233981 omcadmin nysys4-e1 1041348608 0x00080001 233980 omcadmin nysys5-e1 1041348608 0x00080001 233979 omcadmin ffsys-e1 1041348608 0x00080001 233973 omcadmin nysys1 1041348608 0x00080001 233781 wsadmin2 gcmmi 1041346036 0x00080001 233697 omcadmin localhost 1041344008 0x00080001 233694 wsadmin4 nymmi1 1041343932 0x00080001 233693 wsadmin4 nymmi1 1041343932 0x00080001 230550 omcadmin nysys1 1041293396 0x00080001 230476 omcadmin ffsys-e1 1041292665 0x00080001 230421 omcadmin gcsys-e1 1041292365 0x00080001 230278 omcadmin nysys5-e1 1041291208 0x00080001 |
S_tate is a hex number that indicates session's current activity state, refer to page 27-38 of Administrator's Guide for a detailed description of each of these activity state. Login_time is an integer that indicates the time the session logged in, which can easily be converted to a regular time format using a C program. You may request this C program from Informix technical support.
The syssesprof table in the sysmaster database provides more detailed information about each session. With the following query, you can get a better idea how each session is interacting with the database:
select sid, (isreads+bufreads+bufwrites+pagreads+pagwrites) access, locksheld, seqscans, total_sorts, dsksorts from syssesprof |
The output is as follows:
Sid access locksheld seqscans tot_sorts disksorts 233982 246 0 2 0 0 230421 7789 12 456 1000 0 225679 9981 213 669 876 2 247869 10098 440 578 98 2 78 70 45 6 0 0 447 46 89 6 0 0 |
The access field shows how often the session is hitting the database. Locksheld shows how many locks each session is using. Seqscans indicates how frequently each session is using sequential scans to access data; if this number is too high, say more than 100,000, you may want to question whether the session ever uses indexes to retrieve data, and examine its query execution plan more closely to see if it is optimal. Total_sorts and dsksorts indicate how efficiently each session uses memory to do sort operations. You may calculate the percentage of how each session using memory to sort with the following formula:
((total_sorts - dsksorts)/total_sorts)*100 |
The higher the percentage, the more efficient are your sort operations. You can join syssessions table with syssesprof table to further identify each session's username and host machine name so that you will know where potential database and system problems are coming from. The following query can retrieve all this information:
select username, hostname, (isreads+bufreads+bufwrites+pagreads+pagwrites) access, locksheld, seqscans, total_sorts, dsksorts from syssessions s, syssesprof f where s.id =f.sid |
You can also join the syssessions table with syslocks table to get more detailed information about locks, such as which table in which database is currently locked by which session to help you identify potential lock conflicts among users:
select owner, username, hostname, dbsname, tabname, type from syssessions s, syslocks l where sid = owner and tabname not like "sys%" |
The output looks something like this:
Owner username hostname dbsname tabname type 1422 wsine apple prod customer S 1567 jlinder sys3524 dev products S 2237 ejhonson case prod orders X 6679 cjz020 sys4800 dev shipment S 889654 jfjianing omega test prices X 77622 hong build50 test items S |
If there are some conflicts with lock usage, for instance if one user needs exclusive access to a table that has been locked by others, you can easily identity the owner of that lock and, based on the priority of users, issue onmode -z sid command to kill the session and release the lock; sid is a number picked up from owner field in the above output; please note that only user "Informix" can execute this command.
Query statistics are critical in troubleshooting and query optimization. The onstat -g sql sid command catches queries and related statistics for current sessions; where sid is the session ID that can be plugged in either manually by hard code or dynamically by UNIX shell scripts. For example, if you want to know what query a session is executing, you may first use the command onstat -g ses to find out its session ID and then plug this ID into the above command. For instance, if the session you want to monitor has ID 28953, you can then use onstat -g sql 28953 to catch its current query.
The output looks as follows:
Informix Dynamic Server 2000 Version 9.21.UC4 -- On-Line -- Up 38 days 11:26:22 -- 1654784 Kbytes session #RSAM total used id user tty pid hostname threads memory memory 134709 omcadmin 3 29580 localhos 1 65536 61120 tid name rstcb flags curstk status 147311 sqlexec 6511e728 Y--P--- 1648 6511e728 cond wait(netnorm) Memory pools count 1 name class addr totalsize freesize #allocfrag #freefrag 134709 V 669b9020 65536 4416 162 6 name free used name free used overhead 0 1648 scb 0 96 opentable 0 6000 filetable 0 920 log 0 2152 temprec 0 1608 keys 0 192 ralloc 0 20480 gentcb 0 1256 ostcb 0 2520 sort 0 56 sqscb 0 11960 sql 0 40 rdahead 0 640 hashfiletab 0 280 osenv 0 1584 buft_buffer 0 4272 sqtcb 0 3240 fragman 0 2176 Sess SQL Current Iso Lock SQL ISAM F.E. Id Stmt type Database Lvl Mode ERR ERR Vers 134709 SELECT mso_db CR Not Wait 0 0 9.03 Current SQL statement : SELECT ne_type, config_set_version FROM ne WHERE ne_inst = 46176 AND msospace_id = 1 Last parsed SQL statement : SELECT ne_type, config_set_version FROM ne WHERE ne_inst = 46176 AND msospace_id = 1 |
The first section of the output gives you some general statistics about the query being executed, such as which database the query is executed against and its isolation level and lock mode. Two fields most interesting are SQL error and ISAM error. If these fields are non-zero, it indicates that the query is having some problems and is not executed properly. You may use Informix finderr utility to find out exactly what the problem is and then, after correcting the problem, kill the session and executed the query again.
The "Current SQL statement" section displays the complete SQL syntax of the query being executed. This is very helpful in diagnosing a problem query and query optimization. If you find some problems with the query such as its response time is too long, and it consumes too much system CPU or memory, you can make a copy of the query as it is displayed for later investigation and analysis. You can then run the same query against Informix dbaccess utility and get more detailed statistics on the query's performance such as its execution plan and joint strategy to identify the root cause of the problem. Based on the statistics you collect from you study, you can further optimize your query for better performance. The "Last parsed SQL statement" section displays the query that has been parsed in the memory. Since the query that has been parsed in the memory is, in most cases, the query that is being executed, these two sections are the same most of the times.
To take even more active approach in monitoring, you can modify the alarm program provided by Informix during IDS installation. The alarm program is actually a UNIX shell script. that is automatically called by IDS when certain errors occur. IDS classifies all instance errors into five severity levels; one being the lowest and five the highest. You can set up the alarm program so that it can email DBA or sends a message to his or her pager when instance assertion failures occur. For detailed information on how to modify the alarm program and sample programs, refer to Administrator's Guide and Performance Guide.
In addition, you should perform. some necessary maintenance routines to ensure that your databases are running healthy. Some basic routines are:
- Verifying and repairing data and index integrity
- Updating internal statistics for the query optimizer
- Recycling unused shared memory segments
Monitoring Informix Dynamic Server's performance is an on-going task. The value of this task is not so much in collecting statistics, but in identifying and troubleshooting potential system and database problems.
来自 “ ITPUB博客 ” ,链接:http://blog.itpub.net/611609/viewspace-683459/,如需转载,请注明出处,否则将追究法律责任。
转载于:http://blog.itpub.net/611609/viewspace-683459/