DB2 for Linux, UNIX, and Windows HADR Simulator use case and troubleshooting guide

Although DB2® high availability disaster recovery (HADR) is billed as a feature that's easy to set up, customers often have trouble picking the right settings for their environment. This article presents a real-world use case that shows how you can use the HADR simulator tool to configure and troubleshoot your HADR setup. Using the examples and generalized guidance that this article provides, you should be able to test your own setup and pick the optimal settings.

High availability disaster recovery overview

HADR is an easy-to-use high availability and disaster recovery feature that uses physical log shipping from the primary database to the standby database. Transaction logs are shipped from the primary to the standby, which is typically in a different location, and then replayed on the standby. HADR performance relies on log shipping and replay performance. These two factors, in turn, depend on the system configuration and how well the system is tuned and maintained. The HADR system should be able to cope with varying log generation rates, network bandwidth, and various other performance-influencing factors. You can find generalized best practices on tuning and maintaining an HADR system in the existing documentation. This article, however, is an in-depth exploration of the technical details of tuning a real-world setup; it provides a step-by-step guide that should help you understand how to tune the configuration of your own HADR system. Although this article focuses on an HADR setup for DB2 for Linux®, UNIX®, and Windows® Version 9.7, it is also applicable to subsequent releases.

Influences on HADR replication

The performance of HADR replication is influenced by various factors, including, but not limited to, the following:

  1. System configuration on primary and standby (such as CPU and memory)
  2. The setting for the hadr_syncmode configuration parameter
  3. Network bandwidth
  4. File system I/O rate on the primary and standby
  5. Workload on the primary
  6. Replay speed on the standby

This article helps you understand and evaluate each of these items. The goal is to develop an HADR configuration that performs well with the given infrastructure.

Evaluating infrastructure capacity

A key initial step in choosing your HADR configuration is evaluating your system's capacity. This is not a one-off exercise: as the business (and, as a result, the database) grows and business demands change, your requirements likely change over time, leading to subsequent changes in hardware and software configuration. To make sure the system can handle the growing database and workload, and can continue to meet the service level agreement, evaluate the infrastructure capacity not only at initial setup but also periodically at run time.

This article walks through the sequence of steps required to evaluate system capacity. In this process, you can use various operating-system-level commands and the HADR Simulator tool to calculate and understand how well the system performs with its current configuration. The HADR Simulator is a lightweight tool that estimates HADR performance under various conditions without requiring you to start any databases. As its name suggests, the HADR Simulator simulates DB2 log writes and HADR log shipping. You can find more information about the tool, and download the executable, on the IBM website.

The example system

The primary system used in the demonstration is located in Beaverton, Oregon, USA, and the standby system is in San Jose, California, USA. The distance between the sites is approximately 1,000 km (660 miles).

Figure 1. HADR setup and WAN used for the example throughout this article

The installed DB2 product is DB2 Version 9.7 Fix Pack 5. The two hosts, hadrPrimaryHost and hadrStandbyHost, have the following hardware:

  1. hadrPrimaryHost:
    1. CPU: 4 x 2 GHz AMD Opteron 846
    2. Memory: 12 GB
    3. Disk: 2 x 73 GB and 3 x 146 GB @ 15K RPM
    4. Operating system: SUSE Linux Enterprise Server v10 SP3
  2. hadrStandbyHost:
    1. CPU: 2.6 GHz dual-core AMD Opteron
    2. Memory: 24 GB
    3. Disk: 4 x 200 GB
    4. Operating system: SUSE Linux Enterprise Server v10 SP3

Allocating storage in an HADR environment

When allocating storage for a database, it is important to understand the available storage options and their performance characteristics. A database primarily needs storage for the following things:

  1. Transactional log files
  2. Overflow log files if the overflow path is set
  3. Mirror log files if the mirror log path is set
  4. Table space data
  5. Log archiving if the logarchmeth1 configuration parameter is set, the logarchmeth2 configuration parameter is set, or both parameters are set

Transaction logs are written in sequential order, whereas table space data is mostly written in random order, based on the page being accessed and written. Allocate a fast-writing device to store the transaction log files.

Most devices have documentation that provides disk write performance; if you do not have this information, however, you can use the following method to approximate those values. We used the IBM DB2 HADR Simulator to perform large writes (4096 pages per write) and small writes (1 page per write) on all disks. The simulator writes multiple times and reports both a range and an average for each type of write. We consider the throughput achieved with the large writes to be the transfer rate of the device; when a big write is performed, most of the I/O time is spent writing, and the overhead (seek time and disk rotation) is negligible. Conversely, most of the reported I/O time for a small write is spent on disk rotation and seek, so we consider the per-write time of the small writes to be the overhead.

Perform the following tasks on the primary and standby hosts:

  1. List all the available storage on the system using the df command.
  2. Run the HADR Simulator with the -write option to calculate disk write speed for each file system listed by the df command.

    The -write option takes one argument: the path of the file to which the data is written. Use the -flushsize option to control the size of each write; the default flush size (16 pages) is sufficient for an OLTP system. A scripted sweep over all the file systems is sketched after this list.

    ~/simhadr -write /filesystem/temp_file/file1 -flushsize 4096
  3. Allocate your storage according to the results in Step 2.

    On a system that does not use the DB2 pureScale Feature, the active log (containing transaction log records) is written to a single logical device by a single thread. Each transaction is written to the log but not necessarily to a table space on disk: an application commit must wait for the logs to be written to the active log path (disk), whereas table space data updates are buffered and written asynchronously by sophisticated and efficient algorithms. As a result, the bottleneck is the single thread that writes all of the transaction log records to the active log path sequentially, and the file system allocated to the active logs must handle the peak logging rate. Choose the best-performing disk for the active logs. For the archive path, the file system should provide throughput greater than the average logging rate; at peak logging times the archiving might fall behind, but it catches up at non-peak times, assuming there is enough space on the active log path to buffer the peak-time logs.
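
The per-file-system testing in step 2 can be scripted. Here is a minimal sketch using the mount points from our example; it assumes each file system has a writable directory, and the temp-file paths are illustrative:

for fs in /work1 /work2 /work3 /work4 /work5; do
    echo "=== Testing $fs ==="
    ~/simhadr -write $fs/kkchinta/simhadr.tmp -flushsize 4096    # large writes: approximates the transfer rate
    ~/simhadr -write $fs/kkchinta/simhadr.tmp -flushsize 1       # small writes: approximates the per-write overhead
    rm -f $fs/kkchinta/simhadr.tmp                               # remove the test file
done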

Here are these steps and their corresponding output for our example system:

  1. On the primary (hadrPrimaryHost)
    df
                      
    Executing df command on hadrPrimaryHost -
    df -kl 
    Filesystem 1K-blocks Used Available Use% Mounted on
    /dev/sda7 20641788 6020460 13572688 31% /
    udev 5931400 176 5931224 1% /dev
    /dev/sda6 313200 40896 272304 14% /boot
    /dev/sda9 1035660 1728 981324 1% /notnfs
    /dev/sda8 2071384 3232 1962928 1% /var/tmp
    /dev/sdb2 66421880 48542864 14504968 77% /work1
    /dev/sdc2 136987020 1852144 128176324 2% /work2
    /dev/sdd2 136987020 36976444 93052024 29% /work3
    /dev/sde2 136987020 1631756 128396712 2% /work4
    /dev/sda10 42354768 28268360 11934908 71% /work5
    tmpfs                  4194304     12876   4181428   1% /tmp
                      
    ---

    The command is run on all devices. Comparing the results, the file system /work3/kkchinta/ performed best. Here are the results for this disk:

    ~/simhadr -write /work3/kkchinta/simhadr.tmp -verbose -flushsize 4096
                      
    Measured sleep overhead: 0.003709 second, using spin time 0.004450 second.
    Simulation run time = 4 seconds
                      
    Writing to file /work3/kkchinta/simhadr.tmp
    Press Ctrl-C to stop.
    Writing 4096 pages
    Writing 4096 pages
    Writing 4096 pages
    Writing 4096 pages
    Writing 4096 pages
    Writing 4096 pages
    Writing 4096 pages
    Writing 4096 pages
    Writing 4096 pages
    Writing 4096 pages
    Writing 4096 pages
    Writing 4096 pages
    Writing 4096 pages
                      
    Total 13 writes in 4.109773 seconds, 0.316136 sec/write, 4096 pages/write
                      
    Total 218.103808 MBytes written in 4.109773 seconds. 53.069551 MBytes/sec
                      
    Distribution of write time (unit is microsecond):
    Total 13 numbers, Sum 4109773, Min 303356, Max 330640, Avg 316136
    From 262144 to 524287             13 numbers
                      
    ---
                      
    ~/simhadr -write /work3/kkchinta/simhadr.tmp -verbose -flushsize 1
                      
    Total 3581 writes in 4.000320 seconds, 0.001117 sec/write, 1 pages/write
                      
    Total 14.667776 MBytes written in 4.000320 seconds. 3.666651 MBytes/sec
                      
    Distribution of write time (unit is microsecond):
    Total 3581 numbers, Sum 4000320, Min 325, Max 25220, Avg 1117
    From 256 to 511                 1143 numbers
    From 512 to 1023                2217 numbers
    From 1024 to 2047                  1 numbers
    From 2048 to 4095                  9 numbers
    From 4096 to 8191                 86 numbers
    From 8192 to 16383               105 numbers
    From 16384 to 32767               20 numbers
                      
    ---
  2. On the standby (hadrStandbyHost)
    df
                      
    Executing df command on hadrStandbyHost -
    df -khl
    Filesystem            Size  Used Avail Use% Mounted on
    /dev/sda3             276G  151G  125G  55% /
    udev                   12G  216K   12G   1% /dev
    /dev/sda1             134M   96M   38M  72% /boot
    /dev/sdb1             280G  220G   60G  79% /home
    /dev/sdc1             181G   80G   93G  47% /perf1
    /dev/sdd1             181G  116G   57G  68% /perf2
    /dev/sde1             181G   58G  115G  34% /perf3
    /dev/sdf1             181G  116G   57G  68% /perf4
    /dev/sdg1             181G   58G  115G  34% /perf5
    /dev/sdh1             181G  116G   57G  68% /perf6
    /dev/sdj1             181G  147G   26G  86% /perf8
    /dev/sdi1             136G   58G   79G  43% /perf7
    /dev/md0              139G   99G   40G  72% /stripe
                      
                      
    ---
                      
    simhadr -write /perf5/kkchinta/simhadr.tmp -verbose -flushsize 4096
                      
    Measured sleep overhead: 0.003970 second, using spin time 0.004764 second.
    Simulation run time = 4 seconds
                      
    Writing to file /perf5/kkchinta/simhadr.tmp
    Press Ctrl-C to stop.
    Writing 4096 pages
    Writing 4096 pages
    Writing 4096 pages
    Writing 4096 pages
    Writing 4096 pages
    Writing 4096 pages
    Writing 4096 pages
    Writing 4096 pages
    Writing 4096 pages
    Writing 4096 pages
    Writing 4096 pages
    Writing 4096 pages
    Writing 4096 pages
    Writing 4096 pages
    Writing 4096 pages
    Writing 4096 pages
                      
    Total 16 writes in 4.252596 seconds, 0.265787 sec/write, 4096 pages/write
    
    Total 268.435456 MBytes written in 4.252596 seconds. 63.122727 MBytes/sec
                      
    Distribution of write time (unit is microsecond):
    Total 16 numbers, Sum 4252596, Min 246759, Max 328503, Avg 265787
    From 131072 to 262143              9 numbers
    From 262144 to 524287              7 numbers
                      
    ---
                      
    simhadr -write /perf5/kkchinta/simhadr.tmp -verbose -flushsize 1
                      
    Total 165 writes in 4.018807 seconds, 0.024356 sec/write, 1 pages/write
                      
    Total 0.675840 MBytes written in 4.018807 seconds. 0.168169 MBytes/sec
                      
    Distribution of write time (unit is microsecond):
    Total 165 numbers, Sum 4018807, Min 10614, Max 110876, Avg 24356
    From 8192 to 16383                26 numbers
    From 16384 to 32767              127 numbers
    From 32768 to 65535               11 numbers
    From 65536 to 131071               1 numbers
                      
    ---

Table 1 and Table 2 show the disk performance for the primary and standby:

Table 1. Performance results for hadrPrimaryHost

Disk                 Speed
/work3/kkchinta      53.069551 MB/s

Table 2. Performance results for hadrStandbyHost

Disk                 Speed
/perf5/kkchinta/     63.122727 MB/s

Based on these results, the recommended file system allocation is as follows:

  • On hadrPrimaryHost:
    • DB2 transactional log files: /work3/kkchinta (53.069551 MB/s)
    • Table space data: /u/kkchinta
    • Log archive: /work4/kkchinta
  • On hadrStandbyHost:
    • DB2 transactional log files: /perf5/kkchinta/ (63.122727 MB/s)
    • Table space data: /home/kkchinta
    • Log archive: /work1

Choosing an HADR synchronization mode

The HADR synchronization mode determines the degree of protection your HADR database solution has against transaction loss. Choosing the correct synchronization mode is one of the most important configuration decisions you have to make, because achieving the optimal network throughput and performance from your HADR pair is part of satisfying your business's service-level agreement. At the same time, a variety of factors affect how fast transactions are processed. In other words, there can be a trade-off between synchronization and performance.

The synchronization mode determines when the primary database considers a transaction complete. For the modes that specify tighter synchronization, SYNC and NEARSYNC, this means that the primary waits for an acknowledgement message from the standby. For the looser synchronization modes, the primary considers a transaction complete as soon as it sends the logs to the standby (ASYNC) or as soon as it writes the logs to its local log device (SUPERASYNC).

Although the general rule would be to choose a synchronization mode based on network speed, there are a number of other things to consider when choosing your synchronization mode:

  • Distance between the primary and standby site:

    At a high level, the suggested synchronization modes are as follows:

    • SYNC if the primary and standby are located in the same data center
    • NEARSYNC if the primary and standby are located in different data centers but within the same city limits
    • ASYNC or SUPERASYNC if the primary and standby are separated by great distances

    As stated earlier, the distance between the sites in our example scenario is approximately 1,000 km (660 miles).

  • Network type between the primary and the standby:

    The general recommendation is to use SYNC or NEARSYNC for systems connected over a LAN and ASYNC or SUPERASYNC for systems connected over a WAN.

    In our example scenario, a WAN connects the primary and standby sites.

  • Memory resources on the primary and standby:

    The primary system has 12 GB of memory and the standby system has 24 GB.

  • Log generation rate on the primary

    Defining the workload and estimating the amount of log data generated (as well as the flush size) is necessary to enable smooth log shipping and replay on the standby.

You should estimate the number of write transactions per second in your business workload and the maximum amount of data (transaction log) written by each transaction. Alternatively, you can do a quick test run on a standard database. The equation for the log generation rate is the following:

Total data generated/sec = num. of transaction per sec × data per transaction
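
For example, with purely hypothetical numbers, a workload of 500 write transactions per second that writes 4 KB of log data per transaction generates:

Total data generated/sec = 500 transactions/sec × 4 KB/transaction = 2 MB/sec

Both the active log device and the network link to the standby must be able to sustain this rate at peak times.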

In addition, if you are using the standby purely for disaster recovery and can tolerate some risk of data loss, you might also choose one of the less synchronous modes.

Using the HADR Simulator to determine performance of different synchronization modes

The best way to see how your HADR deployment will perform under different synchronization modes is to use the HADR Simulator to measure throughput and performance under different modes. Use the following command to describe your HADR setup to the simulator:

~/simhadr -role HADR_ROLE_value -lhost HADR_LOCAL_HOST_value
          -lport HADR_LOCAL_PORT_value -rhost HADR_REMOTE_HOST_value
          -rport HADR_REMOTE_PORT_value -syncmode HADR_SYNCMODE_value
          -flushSize value -sockSndBuf TCP_socket_send_value
          -sockRcvBuf TCP_socket_receive_value -disk transfer_rate overhead

The HADR Simulator supports only port numbers. It does not support service names for the -lport and -rport options.

Choosing a value for -flushSize

In a running system, the flush size is nondeterministic (it varies with the workload), so for the purposes of choosing a synchronization mode, keep the default setting of 16.

Choosing a value for -sockSndBuf and -sockRcvBuf

These parameters specify the socket send and receive buffer size for the HADR connection. On most platforms, the TCP buffer size is the same as the TCP window size. If the TCP window size (defined below) is too small, the network cannot fully utilize its bandwidth, and applications like HADR experience throughput lower than the nominal bandwidth. On WAN systems, you should pick a setting that is larger than the system default because of the relatively long round-trip time. On LAN systems, the system default socket buffer size is usually large enough because round-trip time is short.

The rule of thumb for choosing the appropriate TCP window size is:

TCP window size = send_rate × round_trip_time

Check with your network equipment vendor or service provider to find out the send rate of your network. Alternatively, you can measure the send rate with the existing (or default) TCP window sizes by using one of the following methods:

  • Send data via FTP or by using the rcp command to the other host on the network and calculate data sent/time taken (see the sketch after this list).
  • Use the test TCP (TTCP) tool.
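
As a minimal sketch of the first method (the file name and size are illustrative), you can time a fixed-size transfer with the rcp command:

dd if=/dev/zero of=/tmp/sendtest bs=1M count=16      # create a 16 MB test file
time rcp /tmp/sendtest hadrStandbyHost.svl.ibm.com:/tmp/sendtest
# send rate ~ 16 MB divided by the elapsed (real) time reported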

We used the TTCP tool. First, run the tool on the receiving side to have a port waiting for data:

ttcp -r -s -p 16372
ttcp-r: buflen=8192, nbuf=2048, align=16384/0, port=16372  tcp
ttcp-r: socket
ttcp-r: accept from 9.47.73.33
ttcp-r: 16777216 bytes in 7.93 real seconds = 2066.74 KB/sec +++
ttcp-r: 10609 I/O calls, msec/call = 0.77, calls/sec = 1338.26
ttcp-r: 0.0user 0.0sys 0:07real 0% 0i+0d 0maxrss 0+2pf 10608+1csw

Then run the tool on the sending side to send some data:

ttcp -t -s -p 16372 hadrStandbyHost.svl.ibm.com
ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=16372 
                     tcp -> hadrStandbyHost.svl.ibm.com
ttcp-t: socket
ttcp-t: connect
ttcp-t: 16777216 bytes in 7.92 real seconds = 2068.38 KB/sec +++
ttcp-t: 2048 I/O calls, msec/call = 3.96, calls/sec = 258.55
ttcp-t: 0.0user 0.0sys 0:07real 1% 0i+0d 0maxrss 0+3pf 526+0csw

Based on this test, the send rate in our setup is 2.02 MB/s.

To calculate the round-trip time, you can issue a ping command:

ping -c 10 hadrStandbyHost.svl
PING hadrStandbyHost.svl.ibm.com (9.30.4.113) 56(84) bytes of data.
64 bytes from hadrStandbyHost.svl.ibm.com (9.30.4.113): icmp_seq=1 ttl=51 time=26.0 ms
64 bytes from hadrStandbyHost.svl.ibm.com (9.30.4.113): icmp_seq=2 ttl=51 time=25.8 ms
64 bytes from hadrStandbyHost.svl.ibm.com (9.30.4.113): icmp_seq=3 ttl=51 time=26.8 ms
64 bytes from hadrStandbyHost.svl.ibm.com (9.30.4.113): icmp_seq=4 ttl=51 time=34.1 ms
64 bytes from hadrStandbyHost.svl.ibm.com (9.30.4.113): icmp_seq=5 ttl=51 time=26.0 ms
64 bytes from hadrStandbyHost.svl.ibm.com (9.30.4.113): icmp_seq=6 ttl=51 time=26.5 ms
64 bytes from hadrStandbyHost.svl.ibm.com (9.30.4.113): icmp_seq=7 ttl=51 time=27.3 ms
64 bytes from hadrStandbyHost.svl.ibm.com (9.30.4.113): icmp_seq=8 ttl=51 time=28.4 ms
64 bytes from hadrStandbyHost.svl.ibm.com (9.30.4.113): icmp_seq=9 ttl=51 time=29.1 ms
64 bytes from hadrStandbyHost.svl.ibm.com (9.30.4.113): icmp_seq=10 ttl=51 time=26.5 ms
            
--- hadrStandbyHost.svl.ibm.com ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9024ms
rtt min/avg/max/mdev = 25.851/27.704/34.115/2.378 ms

In this scenario, use the average value, 27.704 ms (0.02770 sec).

Based on the calculated send rate and round-trip time, the minimum TCP/IP receive/send buffer (window) size is as follows:

TCP window size = send_rate × round_trip_time
                = 2.02      × 0.02770
                = 0.055 MB
                = 58672 bytes

If the system default is larger than the calculated value, there is no need to provide an explicit buffer size or change any system settings. If the system default is smaller, you might need to set the buffer size explicitly. Before doing so, however, confirm that your system allows this value as a buffer size. To do this on Linux, find the TCP receive and write memory values in your system configuration, namely the following three values:

  1. net.ipv4.tcp_rmem: TCP receive window
  2. net.ipv4.tcp_wmem: TCP send window
  3. net.ipv4.tcp_mem: Total TCP buffer space allocable

You can use the following command:

/sbin/sysctl -a | grep tcp
net.ipv4.tcp_rmem = 4096        87380   174760
net.ipv4.tcp_wmem = 4096        16384   131072
net.ipv4.tcp_mem = 196608       262144  393216

The three values returned for each parameter indicate the minimum, default, and maximum setting in bytes. Our calculated 58,672 bytes is within the allowed range. If the amount of memory needed is not within the allowed system limit, you should modify the limit. You can get the current settings of the TCP/IP networking parameters from the operating system. For Linux, determine which parameter controls the maximum settings for the version you are running, and run /sbin/sysctl -a | grep net to get the current settings. For AIX, look at the sb_max and rfc1323 settings, which you can retrieve by running the no -a command. When you change these variables, a system reboot might be necessary. Verify that no other applications running on the same host are adversely impacted by the change.
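
If the maximum allowed buffer size is too small, you can raise the limits. Here is a minimal sketch for Linux; the maximum values shown (4 MB) are illustrative, not a recommendation:

/sbin/sysctl -w net.ipv4.tcp_rmem="4096 87380 4194304"    # min, default, max receive buffer (bytes)
/sbin/sysctl -w net.ipv4.tcp_wmem="4096 16384 4194304"    # min, default, max send buffer (bytes)
# Add the same settings to /etc/sysctl.conf to persist them across reboots.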

After you determine the TCP window size, test larger values: try doubling or tripling it (or more), rerunning the network throughput test each time. At some point, the throughput stops increasing even though the TCP window size is increased; the last value that still improved throughput is the one that makes the best use of the network.
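
The buffer-size sweep can be scripted on the primary side, reusing the example values from this article; for each iteration, the standby end of the simulator must be restarted with the matching -sockSndBuf and -sockRcvBuf values:

for buf in 58672 117344 234688 469376; do
    echo "=== socket buffer size: $buf bytes ==="
    ~/simhadr -role primary -lhost hadrPrimaryHost -lport 53970 \
              -rhost hadrStandbyHost.svl.ibm.com -rport 28239 \
              -syncmode async -flushSize 32 \
              -sockSndBuf $buf -sockRcvBuf $buf -disk 53.069551 0.001117
done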

Choosing a value for -disk

This parameter specifies disk speed (transfer rate and overhead) using two values: data rate in MB/s and per I/O operation overhead in seconds.

Earlier, we tested disk write speed with the HADR Simulator tool for the file systems dedicated to the transactional log files on the primary (/work3/kkchinta) and on the standby (/perf5/kkchinta/). Here is a snippet of those results:

~/simhadr -write /work3/kkchinta/simhadr.tmp -verbose -flushsize 1
            
Total 3581 writes in 4.000320 seconds, 0.001117 sec/write, 1 pages/write
            
Total 14.667776 MBytes written in 4.000320 seconds. 3.666651 MBytes/sec
            
------------
~/simhadr -write /work3/kkchinta/simhadr.tmp -verbose -flushsize 4096
            
Total 13 writes in 4.109773 seconds, 0.316136 sec/write, 4096 pages/write
            
Total 218.103808 MBytes written in 4.109773 seconds. 53.069551 MBytes/sec

You can use these results to determine the data rate and per I/O operation overhead as follows:

  • The write time for the run with a 1-page flush size is the I/O operation overhead.
  • The MB/s amount for the run with a large flush size (in our case, 4096 pages) is the transfer rate.

On the primary, the overhead is 0.001117 seconds and the transfer rate is 53.069551 MB/s; on the standby, the overhead is 0.024283 seconds and the transfer rate is 63.122727 MB/s.

In general, a run with a 1-page flush size gives a good approximation of the per-write overhead, and a run with a large flush size (such as 500 or 1000 pages) gives a good approximation of the write rate.

Alternatively, you can solve the following equation, using the measurements from two runs with different flush sizes, to determine the transfer rate and per-write overhead:

IO_time = data_amount / transfer_rate + per_IO_overhead
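
For example, plugging in the primary's measured values for a 16-page (65,536-byte) flush:

IO_time = 65536 bytes / 53.069551 MB/s + 0.001117 s
        ≈ 0.001235 s + 0.001117 s
        ≈ 0.002352 seconds per flush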

Table 3 lists all of the set values in place to describe the system to the HADR Simulator tool. The next step is to try out the different synchronization modes, tabulate the results of each test, and then compare the performance of the different modes.

Table 3. Set values for the HADR Simulator tool

Host                                    hadrPrimaryHost   hadrStandbyHost
Sync mode                               (per test)        (per test)
Flush size (4K pages)                   32                32
Overhead per write (seconds)            0.001117          0.024283
Transfer rate (MB/s)                    53.069551         63.122727
TCP/IP send buffer size (bytes)         58672             58672
TCP/IP receive buffer size (bytes)      58672             58672
HADR receive buffer size (4K pages)     128               128
Throughput (MB/s)
  (primary sending/standby receiving)   (to be measured)  (to be measured)
Percentage of network wait              (to be measured)  (to be measured)

Throughput achieved in SYNC mode

Run the HADR Simulator tool on the primary and standby, with the appropriate values. You can start the primary or standby first. The one started first waits for the other one to start to make a connection. The tool writes to standard output. It does not write log data to disk; instead, it uses the provided numbers from the -disk option to simulate log writes.

For this example scenario, issue the following command on the primary:

~/simhadr -role primary -lhost hadrPrimaryHost -lport 53970
          -rhost hadrStandbyHost.svl.ibm.com -rport 28239 -syncmode sync 
          -flushSize 32 -sockSndBuf 58672 -sockRcvBuf 58672 -disk 53.069551 0.001117

We ran the HADR Simulator in SYNC mode only for the purposes of comparing the results; given the long distance between the two sites, SYNC is not a realistic setting for this environment.

The output from the tool is as follows:

Measured sleep overhead: 0.003727 second, using spin time 0.004472 second.
Simulation run time = 4 seconds
            
Resolving local host hadrPrimaryHost via gethostbyname()
hostname=hadrPrimaryHost.beaverton.ibm.com
alias: hadrPrimaryHost
address_type=2 address_length=4
address: 9.47.73.33
            
Resolving remote host hadrStandbyHost.svl.ibm.com via gethostbyname()
hostname=hadrStandbyHost.svl.ibm.com
address_type=2 address_length=4
address: 9.30.4.113
            
Socket property upon creation
BlockingIO=true
NAGLE=true
SO_SNDBUF=16384
SO_RCVBUF=87380
SO_LINGER: onoff=0, length=0
            
Calling setsockopt(SO_SNDBUF)
Calling setsockopt(SO_RCVBUF)
Socket property upon buffer resizing
BlockingIO=true
NAGLE=true
SO_SNDBUF=117344
SO_RCVBUF=117344
SO_LINGER: onoff=0, length=0
            
Binding socket to local address.
Listening on local host TCP port 53970  
---> [The output stops here until the simhadr tool is executed on the standby]

Connected.
            
Calling fcntl(O_NONBLOCK)
Calling setsockopt(TCP_NODELAY)
Socket property upon connection
BlockingIO=false
NAGLE=false
SO_SNDBUF=117344
SO_RCVBUF=117344
SO_LINGER: onoff=0, length=0
            
Sending handshake message:
syncMode=SYNC
flushSize=32
connTime=2012-04-13_11:58:15_PDT
            
Sending log flushes. Press Ctrl-C to stop.
            
SYNC: Total 3014656 bytes in 4.126519 seconds, 0.730557 MBytes/sec
Total 23 flushes, 0.179414 sec/flush, 32 pages (131072 bytes)/flush
            
disk speed: 53.069551 MB/second, overhead: 0.001117 second/write
Total 3014656 bytes written in 0.082478 seconds. 36.551032 MBytes/sec
Total 23 write calls, 131.072 kBytes/write, 0.003586 sec/write
            
Total 3014656 bytes sent in 4.126519 seconds. 0.730557 MBytes/sec
Total 57 send calls, 52.888 KBytes/send,
Total 34 congestions, 0.014269 seconds, 0.000419 second/congestion
            
Total 1104 bytes recv in 4.126519 seconds. 0.000268 MBytes/sec
Total 23 recv calls, 0.048 KBytes/recv
            
Distribution of log write size (unit is byte):
Total 23 numbers, Sum 3014656, Min 131072, Max 131072, Avg 131072
Exactly     131072          23 numbers
            
Distribution of log shipping time (unit is microsecond):
Total 23 numbers, Sum 4043919, Min 139617, Max 267547, Avg 175822
From 131072 to 262143             22 numbers
From 262144 to 524287              1 numbers
            
Distribution of congestion duration (unit is microsecond):
Total 34 numbers, Sum 14269, Min 206, Max 893, Avg 419
From 128 to 255                    7 numbers
From 256 to 511                   24 numbers
From 512 to 1023                   3 numbers
            
Distribution of send size (unit is byte):
Total 57 numbers, Sum 3014656, Min 7992, Max 79640, Avg 52888
From 4096 to 8191                  1 numbers
From 8192 to 16383                 9 numbers
From 16384 to 32767                2 numbers
From 32768 to 65535               22 numbers
From 65536 to 131071              23 numbers
            
Distribution of recv size (unit is byte):
Total 23 numbers, Sum 1104, Min 48, Max 48, Avg 48
Exactly         48          23 numbers

Then, issue the following command on the standby:

~/simhadr -role standby -lhost hadrStandbyHost.svl.ibm.com -lport 28245
          -rhost hadrPrimaryHost.beaverton.ibm.com -rport 28239
          -sockSndBuf 58672 -sockRcvBuf 58672 -disk 63.122727 0.024283

The output from the tool is as follows:

Measured sleep overhead: 0.003931 second, using spin time 0.004717 second.
            
Resolving local host hadrStandbyHost.svl.ibm.com via gethostbyname()
hostname=hadrStandbyHost.svl.ibm.com
alias: hadrStandbyHost
address_type=2 address_length=4
address: 9.30.4.113
            
Resolving remote host hadrPrimaryHost.beaverton.ibm.com via gethostbyname()
hostname=hadrPrimaryHost.beaverton.ibm.com
address_type=2 address_length=4
address: 9.47.73.33
            
Socket property upon creation
BlockingIO=true
NAGLE=true
SO_SNDBUF=16384
SO_RCVBUF=87380
SO_LINGER: onoff=0, length=0
            
Calling setsockopt(SO_SNDBUF)
Calling setsockopt(SO_RCVBUF)
Socket property upon buffer resizing
BlockingIO=true
NAGLE=true
SO_SNDBUF=117344
SO_RCVBUF=117344
SO_LINGER: onoff=0, length=0
            
Connecting to remote host TCP port 28239

Connected.
            
Calling fcntl(O_NONBLOCK)
Calling setsockopt(TCP_NODELAY)
Socket property upon connection
BlockingIO=false
NAGLE=false
SO_SNDBUF=117344
SO_RCVBUF=117344
SO_LINGER: onoff=0, length=0

Received handshake message:
syncMode=SYNC
flushSize=32
connTime=2012-04-13_11:58:15_PDT

Standby receive buffer size 128 pages (524288 bytes)
Receiving log flushes. Press Ctrl-C on primary to stop.
Zero byte received. Remote end closed connection.
            
SYNC: Total 3014656 bytes in 4.118283 seconds, 0.732018 MBytes/sec
Total 23 flushes, 0.179056 sec/flush, 32 pages (131072 bytes)/flush

disk speed: 63.122727 MB/second, overhead: 0.024283 second/write
Total 3014656 bytes written in 2.743122 seconds. 1.098987 MBytes/sec
Total 111 write calls, 27.159 kBytes/write, 0.024713 sec/write
            
Total 1104 bytes sent in 4.118283 seconds. 0.000268 MBytes/sec
Total 23 send calls, 0.048 KBytes/send,
Total 0 congestions, 0.000000 seconds, 0.000000 second/congestion
            
Total 3014656 bytes recv in 4.118283 seconds. 0.732018 MBytes/sec
Total 111 recv calls, 27.159 KBytes/recv
            
Distribution of log write size (unit is byte):
Total 111 numbers, Sum 3014656, Min 4096, Max 65536, Avg 27159
Exactly       4096           2 numbers
Exactly       8192           1 numbers
Exactly      16384          58 numbers
Exactly      32768          30 numbers
Exactly      49152          15 numbers
Exactly      65536           5 numbers

Distribution of send size (unit is byte):
Total 23 numbers, Sum 1104, Min 48, Max 48, Avg 48
Exactly         48          23 numbers

Distribution of recv size (unit is byte):
Total 111 numbers, Sum 3014656, Min 1024, Max 65536, Avg 27159
Exactly       4344           1 numbers
Exactly       8688           1 numbers
Exactly      16384          57 numbers
Exactly      32768          30 numbers
Exactly      18712           1 numbers
Exactly       1024           1 numbers
Exactly      49152          15 numbers
Exactly      65536           5 numbers

After the test is complete, add the results to the table.

In Table 4, the last row, Percentage of network wait, is calculated as follows:

Percentage of network wait = time spent waiting for the network to consume more data / total run time
                           = total reported congestion time / total run time

For our primary, this is 0.014269 / 4.126519 ≈ 0.3%; for the standby, it is 0.

Table 4. Results for SYNC mode

Host                                    hadrPrimaryHost   hadrStandbyHost
Sync mode                               SYNC              SYNC
Flush size (4K pages)                   32                32
Overhead per write (seconds)            0.001117          0.024283
Transfer rate (MB/s)                    53.069551         63.122727
TCP/IP send buffer size (bytes)         58672             58672
TCP/IP receive buffer size (bytes)      58672             58672
HADR receive buffer size (4K pages)     128               128
Throughput (MB/s)
  (primary sending/standby receiving)   0.730557          0.732018
Percentage of network wait              YES (0.3%)        NO

Throughput achieved in NEARSYNC mode

Run the HADR Simulator tool on the primary and standby, with the appropriate values. You can start either the primary or the standby first; the one started first waits for the other to start before making a connection. The tool writes to standard output. It does not write log data to disk; instead, it uses the numbers provided with the -disk option to simulate log writes.

For our example system, issue the following command on the primary:

~/simhadr -role primary -lhost hadrPrimaryHost -lport 53970 -rhost
      hadrStandbyHost.svl.ibm.com -rport 28239 -syncmode nearsync -flushSize 32
      -sockSndBuf 58672 -sockRcvBuf 58672 -disk 53.069551 0.001117

The output from the tool is as follows:

Measured sleep overhead: 0.003609 second, using spin time 0.004330 second.
Simulation run time = 4 seconds
            
Resolving local host hadrPrimaryHost via gethostbyname()
hostname=hadrPrimaryHost.beaverton.ibm.com
alias: hadrPrimaryHost
address_type=2 address_length=4
address: 9.47.73.33
            
Resolving remote host hadrStandbyHost.svl.ibm.com via gethostbyname()
hostname=hadrStandbyHost.svl.ibm.com
address_type=2 address_length=4
address: 9.30.4.113
            
Socket property upon creation
BlockingIO=true
NAGLE=true
SO_SNDBUF=16384
SO_RCVBUF=87380
SO_LINGER: onoff=0, length=0
            
Calling setsockopt(SO_SNDBUF)
Calling setsockopt(SO_RCVBUF)
Socket property upon buffer resizing
BlockingIO=true
NAGLE=true
SO_SNDBUF=117344
SO_RCVBUF=117344
SO_LINGER: onoff=0, length=0
            
Binding socket to local address.
Listening on local host TCP port 53970
--> [The output stops here until the simhadr tool is executed on the standby]
         
Connected.
            
Calling fcntl(O_NONBLOCK)
Calling setsockopt(TCP_NODELAY)
Socket property upon connection
BlockingIO=false
NAGLE=false
SO_SNDBUF=117344
SO_RCVBUF=117344
SO_LINGER: onoff=0, length=0
            
Sending handshake message:
syncMode=NEARSYNC
flushSize=32
connTime=2012-04-13_11:59:02_PDT
            
Sending log flushes. Press Ctrl-C to stop.
            
NEARSYNC: Total 3801088 bytes in 4.099373 seconds, 0.927236 MBytes/sec
Total 29 flushes, 0.141358 sec/flush, 32 pages (131072 bytes)/flush
            
disk speed: 53.069551 MB/second, overhead: 0.001117 second/write
Total 3801088 bytes written in 0.103994 seconds. 36.551032 MBytes/sec
Total 29 write calls, 131.072 kBytes/write, 0.003586 sec/write
            
Total 3801088 bytes sent in 4.099373 seconds. 0.927236 MBytes/sec
Total 80 send calls, 47.513 KBytes/send,
Total 51 congestions, 0.018008 seconds, 0.000353 second/congestion
            
Total 1392 bytes recv in 4.099373 seconds. 0.000340 MBytes/sec
Total 29 recv calls, 0.048 KBytes/recv
            
Distribution of log write size (unit is byte):
Total 29 numbers, Sum 3801088, Min 131072, Max 131072, Avg 131072
Exactly     131072          29 numbers
            
Distribution of log shipping time (unit is microsecond):
Total 29 numbers, Sum 4099263, Min 92349, Max 288847, Avg 141353
From 65536 to 131071               9 numbers
From 131072 to 262143             19 numbers
From 262144 to 524287              1 numbers
            
Distribution of congestion duration (unit is microsecond):
Total 51 numbers, Sum 18008, Min 189, Max 660, Avg 353
From 128 to 255                   18 numbers
From 256 to 511                   31 numbers
From 512 to 1023                   2 numbers
            
Distribution of send size (unit is byte):
Total 80 numbers, Sum 3801088, Min 752, Max 79640, Avg 47513
From 512 to 1023                   1 numbers
From 8192 to 16383                19 numbers
From 16384 to 32767                3 numbers
From 32768 to 65535               28 numbers
From 65536 to 131071              29 numbers
            
Distribution of recv size (unit is byte):
Total 29 numbers, Sum 1392, Min 48, Max 48, Avg 48
Exactly         48          29 numbers

Then issue the following command on the standby:

~/simhadr -role standby -lhost hadrStandbyHost.svl.ibm.com -lport 28245
          -rhost hadrPrimaryHost.beaverton.ibm.com -rport 28239
          -sockSndBuf 58672 -sockRcvBuf 58672 -disk 63.122727 0.024283

The output from the tool is as follows:

Measured sleep overhead: 0.003686 second, using spin time 0.004423 second.
            
Resolving local host hadrStandbyHost.svl.ibm.com via gethostbyname()
hostname=hadrStandbyHost.svl.ibm.com
alias: hadrStandbyHost
address_type=2 address_length=4
address: 9.30.4.113
            
Resolving remote host hadrPrimaryHost.beaverton.ibm.com via gethostbyname()
hostname=hadrPrimaryHost.beaverton.ibm.com
address_type=2 address_length=4
address: 9.47.73.33
            
Socket property upon creation
BlockingIO=true
NAGLE=true
SO_SNDBUF=16384
SO_RCVBUF=87380
SO_LINGER: onoff=0, length=0
            
Calling setsockopt(SO_SNDBUF)
Calling setsockopt(SO_RCVBUF)
Socket property upon buffer resizing
BlockingIO=true
NAGLE=true
SO_SNDBUF=117344
SO_RCVBUF=117344
SO_LINGER: onoff=0, length=0
            
Connecting to remote host TCP port 28239
            
Connected.
            
Calling fcntl(O_NONBLOCK)
Calling setsockopt(TCP_NODELAY)
Socket property upon connection
BlockingIO=false
NAGLE=false
SO_SNDBUF=117344
SO_RCVBUF=117344
SO_LINGER: onoff=0, length=0
            
Received handshake message:
syncMode=NEARSYNC
flushSize=32
connTime=2012-04-13_11:59:02_PDT
            
Standby receive buffer size 128 pages (524288 bytes)
Receiving log flushes. Press Ctrl-C on primary to stop.
Zero byte received. Remote end closed connection.
            
NEARSYNC: Total 3801088 bytes in 4.124563 seconds, 0.921574 MBytes/sec
Total 29 flushes, 0.142226 sec/flush, 32 pages (131072 bytes)/flush
            
disk speed: 63.122727 MB/second, overhead: 0.024283 second/write
Total 3801088 bytes written in 0.764411 seconds. 4.972571 MBytes/sec
Total 29 write calls, 131.072 kBytes/write, 0.026359 sec/write
            
Total 1392 bytes sent in 4.124563 seconds. 0.000337 MBytes/sec
Total 29 send calls, 0.048 KBytes/send,
Total 0 congestions, 0.000000 seconds, 0.000000 second/congestion
            
Total 3801088 bytes recv in 4.124563 seconds. 0.921574 MBytes/sec
Total 175 recv calls, 21.720 KBytes/recv
            
Distribution of log write size (unit is byte):
Total 29 numbers, Sum 3801088, Min 131072, Max 131072, Avg 131072
Exactly     131072          29 numbers
            
Distribution of send size (unit is byte):
Total 29 numbers, Sum 1392, Min 48, Max 48, Avg 48
Exactly         48          29 numbers
            
Distribution of recv size (unit is byte):
Total 175 numbers, Sum 3801088, Min 1024, Max 65536, Avg 21720
From 1024 to 2047                  2 numbers
From 2048 to 4095                  2 numbers
From 4096 to 8191                  4 numbers
From 8192 to 16383                 4 numbers
From 16384 to 32767              135 numbers
From 32768 to 65535               14 numbers
From 65536 to 131071              14 numbers

After the test is complete, add the results to the table.

In Table 5, the last row, Percentage of network wait, is calculated the same way:

Percentage of network wait = total reported congestion time / total run time

For our primary, this is 0.018008 / 4.099373 ≈ 0.4%; for the standby, it is 0.

Table 5. Results for NEARSYNC mode

Host                                    hadrPrimaryHost  hadrStandbyHost  hadrPrimaryHost  hadrStandbyHost
Sync mode                               SYNC             SYNC             NEARSYNC         NEARSYNC
Flush size (4K pages)                   32               32               32               32
Overhead per write (seconds)            0.001117         0.024283         0.001117         0.024283
Transfer rate (MB/s)                    53.069551        63.122727        53.069551        63.122727
TCP/IP send buffer size (bytes)         58672            58672            58672            58672
TCP/IP receive buffer size (bytes)      58672            58672            58672            58672
HADR receive buffer size (4K pages)     128              128              128              128
Throughput (MB/s)
  (primary sending/standby receiving)   0.730557         0.732018         0.927236         0.921574
Percentage of network wait              YES (0.3%)       NO               YES (0.4%)       NO

Throughput achieved in ASYNC mode

Run the HADR Simulator tool on the primary and standby, with the appropriate values. You can start either the primary or the standby first; the one started first waits for the other to start before making a connection. The tool writes to standard output. It does not write log data to disk; instead, it uses the numbers provided with the -disk option to simulate log writes.

For our example system, issue the following command on the primary:

~/simhadr -role primary -lhost hadrPrimaryHost -lport 53970
          -rhost hadrStandbyHost.svl.ibm.com -rport 28239 -syncmode async 
          -flushSize 32 -sockSndBuf 58672 -sockRcvBuf 58672 -disk 53.069551 0.001117

The output from the tool is as follows:

Measured sleep overhead: 0.003709 second, using spin time 0.004450 second.
Simulation run time = 4 seconds
            
Resolving local host hadrPrimaryHost via gethostbyname()
hostname=hadrPrimaryHost.beaverton.ibm.com
alias: hadrPrimaryHost
address_type=2 address_length=4
address: 9.47.73.33
            
Resolving remote host hadrStandbyHost.svl.ibm.com via gethostbyname()
hostname=hadrStandbyHost.svl.ibm.com
address_type=2 address_length=4
address: 9.30.4.113
            
Socket property upon creation
BlockingIO=true
NAGLE=true
SO_SNDBUF=16384
SO_RCVBUF=87380
SO_LINGER: onoff=0, length=0
            
Calling setsockopt(SO_SNDBUF)
Calling setsockopt(SO_RCVBUF)
Socket property upon buffer resizing
BlockingIO=true
NAGLE=true
SO_SNDBUF=117344
SO_RCVBUF=117344
SO_LINGER: onoff=0, length=0
            
Binding socket to local address.
Listening on local host TCP port 53970  
--> [The output stops here until the simhadr tool is executed on the standby]
            
Connected.
            
Calling fcntl(O_NONBLOCK)
Calling setsockopt(TCP_NODELAY)
Socket property upon connection
BlockingIO=false
NAGLE=false
SO_SNDBUF=117344
SO_RCVBUF=117344
SO_LINGER: onoff=0, length=0
            
Sending handshake message:
syncMode=ASYNC
flushSize=32
connTime=2012-04-13_12:00:10_PDT
            
Sending log flushes. Press Ctrl-C to stop.
            
ASYNC: Total 8781824 bytes in 4.068864 seconds, 2.158299 MBytes/sec
Total 67 flushes, 0.060729 sec/flush, 32 pages (131072 bytes)/flush
            
disk speed: 53.069551 MB/second, overhead: 0.001117 second/write
Total 8781824 bytes written in 0.240262 seconds. 36.551032 MBytes/sec
Total 67 write calls, 131.072 kBytes/write, 0.003586 sec/write
            
Total 8781824 bytes sent in 4.068864 seconds. 2.158299 MBytes/sec
Total 292 send calls, 30.074 KBytes/send, 
Total 225 congestions, 4.058363 seconds, 0.018037 second/congestion
            
Total 0 bytes recv in 4.068864 seconds. 0.000000 MBytes/sec
Total 0 recv calls, 0.000 KBytes/recv
            
Distribution of log write size (unit is byte):
Total 67 numbers, Sum 8781824, Min 131072, Max 131072, Avg 131072
Exactly     131072          67 numbers

Distribution of log shipping time (unit is microsecond):
Total 67 numbers, Sum 4063639, Min 878, Max 186689, Avg 60651
From 512 to 1023                   1 numbers
From 1024 to 2047                  1 numbers
From 32768 to 65535               26 numbers
From 65536 to 131071              38 numbers
From 131072 to 262143              1 numbers

Distribution of congestion duration (unit is microsecond):
Total 225 numbers, Sum 4058363, Min 282, Max 104746, Avg 18037
From 256 to 511                    8 numbers
From 512 to 1023                 102 numbers
From 1024 to 2047                  6 numbers
From 4096 to 8191                  3 numbers
From 8192 to 16383                 1 numbers
From 32768 to 65535              104 numbers
From 65536 to 131071               1 numbers

Distribution of send size (unit is byte):
Total 292 numbers, Sum 8781824, Min 816, Max 79640, Avg 30074
From 512 to 1023                   1 numbers
From 1024 to 2047                  1 numbers
From 2048 to 4095                  4 numbers
From 4096 to 8191                 21 numbers
From 8192 to 16383                27 numbers
From 16384 to 32767               86 numbers
From 32768 to 65535              150 numbers
From 65536 to 131071               2 numbers

Then issue the following command on the standby:

~/simhadr -role standby -lhost hadrStandbyHost.svl.ibm.com -lport 28245 
          -rhost hadrPrimaryHost.beaverton.ibm.com -rport 28239 -sockSndBuf 58672 
          -sockRcvBuf 58672 -disk 63.122727 0.024283

The output from the tool is as follows:

Measured sleep overhead: 0.003974 second, using spin time 0.004768 second.

Resolving local host hadrStandbyHost.svl.ibm.com via gethostbyname()
hostname=hadrStandbyHost.svl.ibm.com
alias: hadrStandbyHost
address_type=2 address_length=4
address: 9.30.4.113

Resolving remote host hadrPrimaryHost.beaverton.ibm.com via gethostbyname()
hostname=hadrPrimaryHost.beaverton.ibm.com
address_type=2 address_length=4
address: 9.47.73.33

Socket property upon creation
BlockingIO=true
NAGLE=true
SO_SNDBUF=16384
SO_RCVBUF=87380
SO_LINGER: onoff=0, length=0

Calling setsockopt(SO_SNDBUF)
Calling setsockopt(SO_RCVBUF)
Socket property upon buffer resizing
BlockingIO=true
NAGLE=true
SO_SNDBUF=117344
SO_RCVBUF=117344
SO_LINGER: onoff=0, length=0

Connecting to remote host TCP port 28239

Connected.

Calling fcntl(O_NONBLOCK)
Calling setsockopt(TCP_NODELAY)
Socket property upon connection
BlockingIO=false
NAGLE=false
SO_SNDBUF=117344
SO_RCVBUF=117344

SO_LINGER: onoff=0, length=0

Received handshake message:
syncMode=ASYNC
flushSize=32
connTime=2012-04-13_12:00:10_PDT

Standby receive buffer size 128 pages (524288 bytes)
Receiving log flushes. Press Ctrl-C on primary to stop.
Zero byte received. Remote end closed connection.

ASYNC: Total 8781824 bytes in 4.187657 seconds, 2.097073 MBytes/sec
Total 67 flushes, 0.062502 sec/flush, 32 pages (131072 bytes)/flush

disk speed: 63.122727 MB/second, overhead: 0.024283 second/write
Total 8781824 bytes written in 1.766053 seconds. 4.972571 MBytes/sec
Total 67 write calls, 131.072 kBytes/write, 0.026359 sec/write

Total 0 bytes sent in 4.187657 seconds. 0.000000 MBytes/sec
Total 0 send calls, 0.000 KBytes/send, 
Total 0 congestions, 0.000000 seconds, 0.000000 second/congestion

Total 8781824 bytes recv in 4.187657 seconds. 2.097073 MBytes/sec
Total 429 recv calls, 20.470 KBytes/recv

Distribution of log write size (unit is byte):
Total 67 numbers, Sum 8781824, Min 131072, Max 131072, Avg 131072
Exactly     131072          67 numbers

Distribution of recv size (unit is byte):
Total 429 numbers, Sum 8781824, Min 2328, Max 65536, Avg 20470
From 2048 to 4095                  2 numbers
From 4096 to 8191                  1 numbers
From 8192 to 16383                 2 numbers
From 16384 to 32767              365 numbers
From 32768 to 65535               41 numbers
From 65536 to 131071              18 numbers

After the test is complete, add the results to the table.

In Table 6, the last row, Percentage of network wait, is calculated the same way:

Percentage of network wait = total reported congestion time / total run time

For our primary, this is 4.058363 / 4.068864 ≈ 99.7%; for the standby, it is 0.

Table 6. Results for ASYNC mode

Host                                    hadrPrimaryHost  hadrStandbyHost  hadrPrimaryHost  hadrStandbyHost  hadrPrimaryHost  hadrStandbyHost
Sync mode                               SYNC             SYNC             NEARSYNC         NEARSYNC         ASYNC            ASYNC
Flush size (4K pages)                   32               32               32               32               32               32
Overhead per write (seconds)            0.001117         0.024283         0.001117         0.024283         0.001117         0.024283
Transfer rate (MB/s)                    53.069551        63.122727        53.069551        63.122727        53.069551        63.122727
TCP/IP send buffer size (bytes)         58672            58672            58672            58672            58672            58672
TCP/IP receive buffer size (bytes)      58672            58672            58672            58672            58672            58672
HADR receive buffer size (4K pages)     128              128              128              128              128              128
Throughput (MB/s)
  (primary sending/standby receiving)   0.730557         0.732018         0.927236         0.921574         2.158299         2.097073
Percentage of network wait              YES (0.3%)       NO               YES (0.4%)       NO               YES (99.7%)      NO

Analysis of results from synchronization mode tests

We achieved the highest throughput (2.158299 MB/s) in ASYNC mode. As the Percentage of network wait row of Table 6 shows, we experienced congestion on the primary in all three tested modes.

We did not test SUPERASYNC mode; it is identical to remote catchup (RCU) with a flush size of 16, and the results should be close to those of ASYNC because the primary does not wait for an acknowledgement from the standby.

Some network congestion for short periods at peak workload can be normal. In SYNC and NEARSYNC modes, the primary waits for an acknowledgment from the standby, so the primary is throttled. In ASYNC mode, the primary is not throttled: as soon as a send call hands data to the TCP buffer, the primary is ready to send more as transactions are processed. When the log write rate is faster than the network, you might therefore see congestion in ASYNC mode.

In the next section, we demonstrate how to tune the configuration to relieve the network congestion.

Tuning the configuration to address congestion

The next step is to set up HADR based on the results from the HADR Simulator tool. Then do a base run with a real production workload so that you can monitor specific aspects of the performance and make the appropriate adjustments.

HADR configurations

Set the following configuration parameters and registry variables according to your testing:

  • logfilsiz: For more information and recommended settings, see the DB2 documentation.

    We use 76800 (300 MB) for our scenario.

  • logbufsz: For more information and recommended settings, see the DB2 documentation.

    We use 2048 (8 MB) for our scenario.

  • hadr_syncmode: For more information and recommended settings, see the DB2 documentation.

    For our scenario, ASYNC is used because it provided the best throughput.

  • DB2_HADR_BUF_SIZE: For more information and recommended settings, see the DB2 documentation.

    We use 4096 (2 × logbufsz) for our scenario.

  • hadr_peer_window: For more information and recommended settings, see the DB2 documentation.

    For our scenario, this parameter is ignored because we are using ASYNC synchronization mode.

  • DB2_HADR_PEER_WAIT_LIMIT: Use this variable as necessary. For more information and recommended settings, see the DB2 documentation.

    For our scenario, this is not set.

  • DB2_HADR_SORCVBUF and DB2_HADR_SOSNDBUF: For more information and recommended settings, see the DB2 documentation.

    We use 58672 for our scenario.

Setting up HADR

  1. Set up HADR with the standard HADR-specific configuration parameters as well as the settings discussed in the preceding section. Our setup is as follows:
    1. On the primary:
      db2 restore db raki from /u/kkchinta/info/rakibackup/
          on /u/kkchinta/kkchinta DBPATH on /u/kkchinta/kkchinta
          NEWLOGPATH /work3/kkchinta without rolling forward

      db2 "update db cfg for raki using
          HADR_LOCAL_HOST hadrPrimaryHost.beaverton.ibm.com
          HADR_REMOTE_HOST hadrStandbyHost.svl.ibm.com
          HADR_LOCAL_SVC 53970 HADR_REMOTE_SVC 28245
          HADR_REMOTE_INST kkchinta HADR_TIMEOUT 120
          HADR_SYNCMODE ASYNC
          LOGARCHMETH1 DISK:/work4/kkchinta
          LOGINDEXBUILD ON LOGFILSIZ 76800 LOGBUFSZ 2048"

      db2set
      DB2_HADR_SORCVBUF=58672
      DB2_HADR_SOSNDBUF=58672
      DB2_HADR_BUF_SIZE=4096
      DB2COMM=TCPIP
    2. On the standby:
      db2 restore db raki from /nfshome/kkchinta/rakibackup/
          on /home/kkchinta/kkchinta DBPATH on /home/kkchinta/kkchinta
          NEWLOGPATH /perf5/kkchinta/

      db2 "update db cfg for raki using
          HADR_LOCAL_HOST hadrStandbyHost.svl.ibm.com
          HADR_REMOTE_HOST hadrPrimaryHost.beaverton.ibm.com
          HADR_LOCAL_SVC 28245 HADR_REMOTE_SVC 28239
          HADR_REMOTE_INST kkchinta HADR_TIMEOUT 120
          HADR_SYNCMODE ASYNC
          LOGARCHMETH1 DISK:/work1/kkchinta
          LOGINDEXBUILD ON LOGFILSIZ 76800 LOGBUFSZ 2048"

      db2set
      DB2_HADR_SORCVBUF=58672
      DB2_HADR_SOSNDBUF=58672
      DB2_HADR_BUF_SIZE=4096
      DB2COMM=TCPIP
  2. Start HADR on both the primary and standby, and issue the db2pd command with the -hadr option to ensure they enter peer state:

    Important: The format of the db2pd command with the -hadr option output is different in releases 10.1 and later.
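
    To start HADR, issue the standard START HADR commands, starting the standby first. For our scenario:

    On the standby (hadrStandbyHost):

    db2 start hadr on db raki as standby

    On the primary (hadrPrimaryHost):

    db2 start hadr on db raki as primary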

    db2pd -db raki -hadr
                      
    Database Partition 0 -- Database RAKI -- Active -- Up 0 days 00:22:37 
    -- Date 2012-04-13-14.45.17.920125
                      
    HADR Information:
    Role    State           SyncMode   HeartBeatsMissed   LogGapRunAvg (bytes)
    Primary Peer            Async      0                  0                   
                      
    ConnectStatus ConnectTime                           Timeout   
    Connected     Fri Apr 13 14:35:33 2012 (1334352933) 120       
                      
    LocalHost                                LocalService      
    hadrPrimaryHost.beaverton.ibm.com               53970             
                      
    RemoteHost                               RemoteService      RemoteInstance    
    hadrStandbyHost.svl.ibm.com                       28245           kkchinta          
                      
    PrimaryFile  PrimaryPg  PrimaryLSN        
    S0000000.LOG 1          0x000000000A329BF2
                      
    StandByFile  StandByPg  StandByLSN        
    S0000000.LOG 1          0x000000000A329BF2
                      
    ---
                      
    db2pd -db raki -hadr
                      
    Database Partition 0 -- Database RAKI -- Standby -- Up 0 days 00:09:47 
    -- Date 2012-04-13-14.45.19.109598
                      
    HADR Information:
    Role    State                SyncMode   HeartBeatsMissed   LogGapRunAvg (bytes)
    Standby Peer                 Async    0                  0                   
                      
    ConnectStatus ConnectTime                           Timeout   
    Connected     Fri Apr 13 14:35:33 2012 (1334352933) 120       
                      
    LocalHost                                LocalService      
    hadrStandbyHost.svl.ibm.com                       28245             
                      
    RemoteHost                               RemoteService      RemoteInstance    
    hadrPrimaryHost.beaverton.ibm.com                28239            kkchinta          
                      
    PrimaryFile  PrimaryPg  PrimaryLSN        
    S0000000.LOG 1          0x000000000A329BF2
                      
    StandByFile  StandByPg  StandByLSN         StandByRcvBufUsed
    S0000000.LOG 1          0x000000000A329BF2 0%

Running the workload and monitoring performance

Execute a real production workload and monitor it using the db2pd command with the -hadr option. Pay attention to the following fields:

  • State: This gives the current state of the database.
  • LogGapRunAvg: This gives the running average of the gap between the primary log sequence number (LSN) and the standby log LSN.
  • ConnectStatus (on the primary): This is where congestion is reported.
  • StandByRcvBufUsed (on the standby): This is the percentage of the standby log receive buffer in use.

You can use the following script:

for i in {1..15}; do
  echo "#################################################################" >> /tmp/kk_hadr
  echo "Collecting stats $i" >> /tmp/kk_hadr
  rsh hadrPrimaryHost "/bin/bash -c '~/sqllib/adm/db2pd -db raki -hadr'" >> /tmp/kk_hadr
  rsh hadrStandbyHost.svl "/bin/bash -c '~/sqllib/adm/db2pd -db raki -hadr'" >> /tmp/kk_hadr
  sleep 5
done
            
Monitor the output to see whether any congestion is reported and whether the standby's receive buffer fills up. Both occurred in our example, as shown in the output file ~/perfpaper/db2pd.out:
            
egrep -A1 "Congested|StandByRcvBufUsed" ~/perfpaper/db2pd.out | grep -A5 Congested
Congested     Sun Apr 15 20:56:31 2012 (1334548591) 120       
--
StandByFile  StandByPg  StandByLSN         StandByRcvBufUsed
S0000001.LOG 72893      0x000000002EBE56D8 100%
--
Congested     Sun Apr 15 20:56:36 2012 (1334548596) 120       
            
--
StandByFile  StandByPg  StandByLSN         StandByRcvBufUsed
S0000001.LOG 73604      0x000000002EEAC37B 99% 
--
Congested     Sun Apr 15 20:56:43 2012 (1334548603) 120       
            
--
StandByFile  StandByPg  StandByLSN         StandByRcvBufUsed
S0000001.LOG 74385      0x000000002F1B9F8D 96% 
--
--
Congested     Sun Apr 15 20:57:00 2012 (1334548620) 120       
            
--
StandByFile  StandByPg  StandByLSN         StandByRcvBufUsed
S0000002.LOG 105        0x000000002FB918AC 100%
--
--
Congested     Sun Apr 15 20:57:10 2012 (1334548630) 120       
            
--
StandByFile  StandByPg  StandByLSN         StandByRcvBufUsed
S0000002.LOG 1619       0x000000003017B315 94% 
--
--
Congested     Sun Apr 15 20:57:23 2012 (1334548643) 120       
            
--
StandByFile  StandByPg  StandByLSN         StandByRcvBufUsed
S0000002.LOG 3840       0x0000000030A28AAF 96% 
--
--
Congested     Sun Apr 15 20:57:34 2012 (1334548654) 120       
            
--
StandByFile  StandByPg  StandByLSN         StandByRcvBufUsed
S0000002.LOG 5925       0x000000003124DBB4 94%

Increasing the buffer size

Try different settings for the HADR receive buffer to see if that addresses the congestion. The default is 2 times the primary’s setting for the logbufsz configuration parameter. To absorb the primary logging peak, a larger value is often needed. As you try different settings for the HADR buffer size, gather your results in a table as in the following example for our scenario:

Table 7. Results from initial run of workload
Test                     Test1
Synchronization mode     ASYNC
logfilsiz (4K)           76800
logbufsz                 2048
DB2_HADR_BUF_SIZE        4096
HADR_PEER_WINDOW         Ignored in ASYNC
SOSNDBUF/SORCVBUF        58672
Commit delay observed    YES
Congestion               YES

Increase DB2_HADR_BUF_SIZE to 8192 and restart the instance.
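
As a minimal sketch (run as the instance owner on the standby; the registry variable is read at instance start, so the instance must be recycled, using db2stop force if applications are still connected):

db2set DB2_HADR_BUF_SIZE=8192
db2stop
db2start

The same sequence applies to the larger value used later in this exercise.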

You can rerun the workload and capture the following relevant data:

egrep -A1 "Congested|StandByRcvBufUsed" ~/perfpaper/db2pd_2.out | grep -A5 Congested
Congested     Sun Apr 15 22:53:03 2012 (1334555583) 120       
            
--
StandByFile  StandByPg  StandByLSN         StandByRcvBufUsed
S0000003.LOG 22423      0x0000000047EBFC28 100%
--
Congested     Sun Apr 15 22:53:09 2012 (1334555589) 120       
            
--
StandByFile  StandByPg  StandByLSN         StandByRcvBufUsed
S0000003.LOG 23293      0x0000000048225D75 100%
--
Congested     Sun Apr 15 22:53:14 2012 (1334555594) 120       
            
--
StandByFile  StandByPg  StandByLSN         StandByRcvBufUsed
S0000003.LOG 24286      0x0000000048606FE7 100%
--
Congested     Sun Apr 15 22:53:20 2012 (1334555600) 120       
            
--
StandByFile  StandByPg  StandByLSN         StandByRcvBufUsed
S0000003.LOG 25196      0x00000000489945F5 100%
--
--
Congested     Sun Apr 15 22:53:56 2012 (1334555636) 120       
            
--
StandByFile  StandByPg  StandByLSN         StandByRcvBufUsed
S0000003.LOG 29613      0x0000000049AD5932 100%
--
--
Congested     Sun Apr 15 22:54:08 2012 (1334555648) 120       
            
--
StandByFile  StandByPg  StandByLSN         StandByRcvBufUsed
S0000003.LOG 30744      0x0000000049F40E75 100%
--
Congested     Sun Apr 15 22:54:13 2012 (1334555653) 120       
            
--
StandByFile  StandByPg  StandByLSN         StandByRcvBufUsed
S0000003.LOG 31350      0x000000004A19EC51 100%

You can see from the output (and from Table 8) that the standby receive buffer is still not large enough.

Table 8. Results from second run of workload
Test                     Test1              Test2
Synchronization mode     ASYNC              ASYNC
logfilsiz (4K)           76800              76800
logbufsz                 2048               2048
DB2_HADR_BUF_SIZE        4096               8192
HADR_PEER_WINDOW         Ignored in ASYNC   Ignored in ASYNC
SOSNDBUF/SORCVBUF        58672              58672
Commit delay observed    YES                YES
Congestion               YES                YES

Another thing to analyze is when the actual congestion occurs. As the output for our example shows, the congestion occurred when replaying log file S0000003.LOG. Take a look at the flush size by using the db2flushsize script, as described in the following steps:

  1. Find out where the transactional logs are stored, as in the following example:
    db2pd -db raki -dbcfg | egrep -i "Path to log files|LOGARCHMETH"
    Path to log files (memory)     /work3/kkchinta/                                 
    Path to log files (disk)       /work3/kkchinta/                                 
    LOGARCHMETH1 (memory)          DISK:/work4/kkchinta/
    LOGARCHMETH1 (disk)            DISK:/work4/kkchinta/
    LOGARCHMETH2 (memory)          OFF
    LOGARCHMETH2 (disk)            OFF
  2. Locate S0000003.LOG (in the active log path or the archive path found in step 1) and run the db2flushsize script against it; a sample invocation follows this list. In our example, the script returns:
    Total 24897 flushes. Average flush size 2.3 pages
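
The exact invocation depends on the version of the tool you download; assuming it takes the log file path as its argument (check the usage text shipped with the tool), it would look like this:

db2flushsize /work4/kkchinta/S0000003.LOG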

You can also query the relevant monitor elements (log_writes, num_log_write_io, and log_write_time) to estimate the flush size and the average log write time according to the following formulas:

Average flush size (pages) = LOG_WRITES / NUM_LOG_WRITE_IO
Average time per log write I/O = LOG_WRITE_TIME_S.LOG_WRITE_TIME_NS / NUM_LOG_WRITE_IO
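
For example, with hypothetical numbers rather than figures from the test system, a snapshot reporting LOG_WRITES = 57263 pages and NUM_LOG_WRITE_IO = 24897 writes gives 57263 / 24897, or about 2.3 pages per flush, consistent with the db2flushsize result above.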


There are a few things you can do to address the cause of the congestion:

  • Check if there is a replay speed issue on the standby. First, use the db2pd command with the -hadr option to determine the standby's replay speed by checking LogGapRunAvg or by comparing the LSNs on the primary (PrimaryLSN) and standby (StandbyLSN). Next, determine the primary log generation rate, as described earlier in this document. If the standby replay is making steady progress but cannot catch up to the primary, then increase the HADR receive buffer size.
  • Check if there is an I/O issue on disk on the standby. To do this, run the DB2 HADR Simulator with the -disk option. As explained earlier in this document, the -disk option calculates write speed. If the write is taking too long, then a possible cause is that the standby’s disk is not powerful enough.
  • Check the receive buffer percentage by looking at the StandByRcvBufUsed value in the output for the db2pd command with the -hadr option.

In our case, the primary is flushing logs faster than the standby can replay them (the value of StandByRcvBufUsed is 100%), so set the standby receive buffer to a much larger value: 262144 4-KB pages, which equals 1 GB.

After rerunning the workload, you can see that there is no congestion reported:

egrep -A1 "Congested|StandByRcvBufUsed" ~/perfpaper/db2pd_2.out | grep -A5 Congested
            
No Congestion reported.
Table 9. Results from third run of workload
Test                     Test1              Test2              Test3
Synchronization mode     ASYNC              ASYNC              ASYNC
logfilsiz (4K)           76800              76800              76800
logbufsz                 2048               2048               2048
DB2_HADR_BUF_SIZE        4096               8192               262144
HADR_PEER_WINDOW         Ignored in ASYNC   Ignored in ASYNC   Ignored in ASYNC
SOSNDBUF/SORCVBUF        58672              58672              58672
Commit delay observed    YES                YES                NO
Congestion               YES                YES                NO

Note: In the example system, ASYNC is chosen as the synchronization mode because it was observed to deliver better throughput than the other modes. ASYNC does not guarantee data protection, so it might not meet the business SLA. In such situations, you can use SYNC or NEARSYNC instead, but expect lower throughput with these synchronization modes. If that trade-off is unacceptable, consider providing better resources and tuning the current set of resources to address the problem at hand. If the theoretical network bandwidth is low, try moving HADR log shipping to a network with higher bandwidth. Sharing the log shipping network with other applications can hurt log shipping throughput and lead to high commit times for transactions on the primary database.

Tuning tips for a growing database or workload

After you perform the previous steps and obtain the right configuration, you should see good HADR system performance. That said, business growth or the adoption of new technology poses challenges for the HADR system. Over time, data commonly accumulates, increasing both the size of the database and the volume of log data generated. As a result, the configuration that you initially arrived at might not perform as well. In general, the size of the database itself matters less to HADR than the type of workload and any increase in workload. When the workload increases, the database can generate logs at a higher rate than your initial configuration can keep up with. If you observe this kind of performance degradation, consider the troubleshooting tips in the following section or one of the HADR best practices documents. Alternatively, you can rerun the whole exercise with the DB2 HADR Simulator and develop an updated configuration.

Troubleshooting common problems

Slow replay

Using the db2pd command with the -hadr option, check whether there is a high log gap (LogGapRunAvg) and whether the HADR receive buffer (StandByRcvBufUsed) is full. If both conditions hold, the replay might be processing a large database-wide transaction such as a reorganization, which makes replay appear slow. Avoid running database-wide transactions during peak business hours, and plan maintenance activity for idle or low-activity times. Keep monitoring the db2pd output to check whether replay is making progress; network congestion can occur if the standby makes no progress over a period of time.
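
For a quick check on either host, you can filter the db2pd output down to the two fields of interest, in the same way as the earlier egrep commands (field names shown here match the 9.7 output format):

db2pd -db raki -hadr | egrep -A1 "LogGapRunAvg|StandByRcvBufUsed"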

If the standby's log receive buffer (sized by the DB2_HADR_BUF_SIZE registry variable) fills up because of slow replay on the standby or a spike in transactions on the primary, new transactions on the primary can be blocked, because the primary cannot send any more log data to the standby.

One way to avoid this is to use SUPERASYNC mode, which prevents back pressure on the primary because the primary never waits for the standby. However, you might not want to use SUPERASYNC if you want control over how far the standby can fall behind the primary (which, in turn, influences how long a graceful takeover takes to complete) or if you cannot accept the potential for data loss if the primary fails.

Another alternative, introduced in version 10.1, is to use log spooling. Log spooling allows the standby to continue receiving log data, which is written to disk on the standby and replayed later, so the standby can catch up when the primary's logging rate drops. Log spooling is enabled by default starting in version 10.5. The advantage of this feature over SUPERASYNC mode (although the two can be used in tandem) is that you retain protection from data loss. Essentially, you are choosing where to spool the yet-to-be-replayed log data: on the primary (SUPERASYNC) or on the standby (log spooling). Note that you should choose your spool limit setting carefully: a huge spool (for example, if you set it to unlimited, the spool can grow as large as the disk space in the active log path) can lead to a long takeover time, because the standby cannot become the primary until it has replayed all of the spooled data. For more information, consult the HADR multiple standbys white paper and the DB2 Information Center.
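
A minimal sketch of enabling spooling through the hadr_spool_limit database configuration parameter (available in version 10.1 and later; the value is in 4-KB pages, -1 means unlimited, and the 1048576 shown here, about 4 GB, is a hypothetical limit):

db2 update db cfg for raki using HADR_SPOOL_LIMIT 1048576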

Consider revisiting the storage-level design and database design to see whether they still hold up. Confirm that the table spaces and the transactional logs are not on the same file system. Make sure hot and dependent objects such as tables and indexes are placed in different table spaces. Replay on the standby works in parallel, and when these objects fall into different table spaces, there is less contention, resulting in better parallelism. Also, if you are using reads on standby to read data from the standby database, using different table spaces for indexes and table data helps improve I/O efficiency. Finally, using a large extent size can be beneficial when applications perform load, import, and bulk insert operations, or issue create index statements. A sketch of that separation follows.
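
For example, a minimal sketch of separating table and index data in an automatic storage database (the table space and table names are hypothetical, and the extent size is only illustrative):

db2 "create tablespace data_ts extentsize 64"
db2 "create tablespace index_ts extentsize 64"
db2 "create table sales (id integer, amount decimal(10,2)) in data_ts index in index_ts"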

Primary hang

When a transaction on the primary database appears to hang, the cause might be unrelated to HADR. If it is an HADR issue, a typical cause is network congestion when the database is using the SYNC, NEARSYNC, or ASYNC synchronization mode. In that case, the hang could be a side effect of the slow replay mentioned above, or the workload could be generating more logs than originally estimated. To understand this better, monitor the TCP/IP buffer usage and make sure that there are no issues at that layer. You can repeat the earlier steps: calculate the flush size using the db2flushsize script and estimate the workload's log generation rate. Then reconfigure the HADR system for better transaction throughput.
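
For instance, one quick way to watch the TCP layer is to inspect the send and receive queues on the HADR port (the port number below is from this article's setup; adjust it for yours, and note that netstat flags vary by platform):

netstat -an | grep 28239

A send queue on the primary that stays consistently nonzero suggests that the network or the standby cannot keep up with the log shipping rate.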

Transaction delay on the primary

If you observe a transaction delay for a short period of time and it happens intermittently, you might be running into resource contention. If the database is using SYNC mode, a transaction can commit only after the update is received by the standby and written to disk there. If the disk I/O on the standby system is not as good as on the primary system, commits on the primary can slow down, because the primary waits to hear that the standby has finished writing the log to disk. Check the I/O statistics on the disks of both the primary and standby and compare them. You might even see this situation in NEARSYNC mode if there is bad disk I/O on the standby. Even though NEARSYNC mode does not require the log page to be written to disk on the standby before the transaction is committed, commit times can be high because, while the HADR standby thread is writing pages to disk, it is unresponsive to new data sent by the primary.
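
For example, you can run the same iostat invocation on both hosts during the workload and compare average service times and utilization (a generic Linux invocation; the flags differ on AIX and other platforms):

iostat -xk 5 3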

If the HADR network runs over an unstable WAN that is causing transaction delays on the primary, consider using the DB2_HADR_PEER_WAIT_LIMIT registry variable to avoid those delays. If the network issues are not fixed and you expect them to persist for a long time, explore using SUPERASYNC as your synchronization mode. This mode does not guarantee data protection, but it is very useful for avoiding transaction delays. Use it if you value data availability much more than data protection.
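
A sketch of setting the limit (the 60 seconds here is a hypothetical value; DB2_HADR_PEER_WAIT_LIMIT is specified in seconds, and when primary logging has been blocked on log replication for that long, the primary breaks out of peer state so that transactions can proceed):

db2set DB2_HADR_PEER_WAIT_LIMIT=60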

Application performance drops after a takeover

In environments where the application server is located much closer to the primary site than the standby site, you might see some drop in application performance after a takeover occurs. The performance drop is likely if the round-trip time (RTT) between the application server and the new primary server (the previous standby server) is much higher than the RTT between the application server and previous primary server. You can address this performance drop by using combinations of different optimizations. If a secondary application server exists close to the standby server, consider failing over to the secondary application server. Explore hardware and software network and protocol compression solutions. Some WAN optimization technologies can significantly improve data replication performance. You might also be able to tune your workload to optimize the data transferred between the client and server, by using DB2 stored procedures or compound SQL.

Conclusion

This exercise covers most of the basic configurations but is not exhaustive. We recommend the HADR configuration and tuning wiki page for details about several other tuning parameters and the uses of those parameters.

Acknowledgements

We would like to thank Yuke Zhuge and Roger Zheng for their technical contributions and Eric Koeck for his editorial contributions.

