A Performance Optimization for C/C++ Systems That Employ Time-Stamping

By Amjad Khan and Neelakanth Nadgir, November 23, 2004

This article describes how to optimize the performance of enterprise systems that employ extensive time-stamping using the time(2) system call in the Solaris Operating System. This optimization applies especially to the financial market, and is based on our work with a number of different independent software vendors (ISVs).

We have observed that the common practice of "time-stamping" messages, transactions, or other objects in a system can consume more resources than the developer might expect. In these systems, the time(2) system call is used to obtain the current time with which to stamp each message or object. (The time(2) system call returns the value of time in seconds since 00:00:00 UTC, January 1, 1970.)

With many -- often thousands, or tens of thousands -- of active objects in a typical enterprise system, this can lead to excessively high use of system CPU cycles. We have observed systems processing thousands of transactions or messages every second, each of which requires a time stamp every time it is acted upon. Such systems can end up calling time(2) several thousand times per second, incurring significant overhead in system resources.

Two ways are available to reduce time(2) system call overhead. The first is to use our proposed replacement for time(2), which caches the result to reduce the frequency of time(2) system calls. The second is to reduce the frequency of time(2) calls in the application code itself. The first, quick solution uses an interposed library, so there is no need to change the original application code.

As an example, we have taken a sample application that performs data distribution for analysis. The application handles thousands of messages every second. Each message is time stamped with the current time, using the time(2) system call. One way to find out the frequency of use of time(2), or any other system call, is to use the truss(1) command, a utility in the Solaris OS that traces system calls and signals. For example:

% truss -c -p pid

Here pid is the process ID of the sample application, and the -c option counts traced system calls, faults, and signals (rather than displaying the trace line by line, which is the default behavior). A summary report is produced after the traced command terminates or when truss is interrupted by Ctrl-C.

In Code Sample 1, we see an example truss output for the sample application process (whose pid was 1365). In this case, the truss command was terminated after a sufficiently long sample interval by a Ctrl-C.

Code Sample 1: truss Output Before Any Optimization
% truss -c -p 1365
^C
syscall         seconds      calls   errors
read              .639        18636   956 
time             8.376       785118
semop             .007          544   170
poll              .362        23378
writev            .627        32191
recv              .000           14
sendmsg           .031         1028
                ------       ------  ----
sys totals:     10.045       860909  1126
usr time:       39.000
elapsed:        84.980

The results show that 785,118 calls were made to time(2) in the sample time of 84.98 seconds. That is roughly 9,200 calls to time() every second. Most of the system time (8.376 of the 10.045 seconds total) was devoted to servicing these calls.

Since the time(2) call has a one-second granularity, making this call several thousand times per second is certainly unnecessary. We can optimize the use of time(2) for the purposes of time stamping by implementing a local time() function that caches the current time and makes a system call only when enough time has elapsed between calls. If insufficient time has elapsed since the last call to our local time function, we simply return the cached value. We can do this because the Solaris OS gives us access to another time function that is substantially faster than time(2): gethrtime(3C). (See "Measuring Execution Time in POSIX Compliant Programs and UNIX" in the References section.)

The book Solaris Internals, by Richard McDougall and Jim Mauro, says the following about gethrtime(3C):

gethrtime(3C) is known as a fast trap system call. This means that an invocation of gethrtime(3C) does not incur the normal overhead of a typical system call. Rather, it generates a fast trap into the kernel, which reads the hardware TICK register value and returns. While many system calls may take microseconds to execute (non-I/O system calls, that is; I/O system calls will be throttled by the speed of the device they're reading or writing), gethrtime(3C) takes a few hundred nanoseconds on a 300 MHz UltraSPARC processor. It's about 1,000 times faster than a typical system call.

The source code for the shared library (libfasttime.so) is given below. In this module, the symbol for time(2) is interposed to execute the optimized, caching time() library function. Thus, code changes in the rest of the application are unnecessary. The new function obtains the current high-resolution time (in nanoseconds) using gethrtime(3C), and compares it to the (cached) value of when the function was last called. If the call was issued within a certain delta, in the code below defined to be 1 millisecond, the cached value is returned, and no time-consuming system call is made. Once sufficient time has elapsed between the original call to time() and the current one, the system call is made, the cached value is reset, and the process starts over.

To compile the time.c file to build a libfasttime.so library, use:

 % cc -G -Kpic -o libfasttime.so -xO3 -xarch=v8plus time.c

For quick performance testing, the library can instead be preloaded at run time, without relinking the application, by setting the following environment variable (in bash):

 LD_FLAGS_32=preload=/tmp/libfasttime.so

However, the preferred way is to link this libfasttime.so library during the build of your application.

Note: This library can also be compiled in 64-bit mode for 64-bit applications by using:

  % cc -G -Kpic -o libfasttime.so -xO3 -xarch=v9 time.c

The 64-bit library can also be preloaded by setting the following (in bash):

 LD_FLAGS_64=preload=/tmp/libfasttime.so  

In Code Sample 2, we provide the source code for the time(2) wrapper.

Code Sample 2: Source Code for time(2) Wrapper (File time.c)
/*
 *
 * Copyright 2004 Sun Microsystems, Inc.
 * 4150 Network Circle, Santa Clara, CA 95054
 * All Rights Reserved.
 *
 * This software is the proprietary information of Sun Microsystems, Inc.
 * This code is provided by Sun "as is" and "with all faults." Sun 
 * makes no representations or warranties concerning the quality, safety 
 * or suitability of the code, either express or implied, including 
 * without limitation any implied warranties of merchantability, fitness 
 * for a particular purpose, or non-infringement. In no event will Sun 
 * be liable for any direct, indirect, punitive, special, incidental 
 * or consequential damages arising from the use of this code. By 
 * downloading or otherwise utilizing this codes, you agree that you 
 * have read, understood, and agreed to these terms.
 *
 */
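#include <stdio.h>      /* fprintf */
#include <time.h>       /* time_t */
#include <sys/time.h>   /* hrtime_t, gethrtime */
#include <dlfcn.h>      /* dlsym, RTLD_NEXT */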

/* to compile, use cc -G -Kpic -o libfasttime.so -xO3 -xarch=v8plus time.c */

/* time in nanoseconds to cache the time system call */
#define DELTA 1000000   /* 1 millisecond */

static time_t (*func) (time_t *);

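time_t time(time_t *tloc)
{
        static time_t global = 0;       /* cached result of the real time(2) */
        static hrtime_t old = 0;        /* gethrtime() value at last refresh */

        hrtime_t new = gethrtime();
        if (new - old > DELTA) {
                /* cache has expired: make the real system call */
                global = func(NULL);
                old = new;
        }
        /* store the result for the caller on cached calls as well */
        if (tloc != NULL)
                *tloc = global;
        return global;
}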

#pragma init (init_func)
void init_func()
{
        func = (time_t (*) (time_t *)) dlsym (RTLD_NEXT, "time");
        if (!func)
        {
               fprintf(stderr, "Error initializing library\n");
        }
}


Code Sample 3: truss Output After Linking With Optimized fasttime Library
% truss -c -p 1701
^C
syscall         seconds        calls   errors
read             1.205         36702   2766
time              .762         71953
semop             .006           541    169
poll              .672         44705
writev           1.204         59945
recv              .000            12
sendmsg           .003            84
                 ------        ------  ----
sys totals:      3.855         213942  2935
usr time:       62.183
elapsed:        84.700

Comparing Code Samples 1 and 3, the number of time(2) calls fell by about 91 percent (from 785,118 to 71,953) and total system time fell by about 62 percent (from 10.045 to 3.855 seconds) over a comparable sample interval. This improved the overall performance of the sample data distribution application: the read, poll, and writev counts roughly doubled, so the application delivered noticeably more throughput than it did without the libfasttime.so library. Since sampling theory tells us that a signal needs to be sampled at only twice its highest frequency, DELTA in Code Sample 2 could be raised to 500 milliseconds (500000000 nanoseconds) for potentially even greater savings. The trade-off is that a cached value can lag the true time by up to DELTA, so a time stamp may occasionally show the previous second.

So if you have a system that makes extensive use of time stamping, or otherwise makes frequent calls to the time(2) function, try the optimization we have outlined here.

References
