Boost socket performance on Linux: Four ways to speed up your network applications
Level: Intermediate

M. Tim Jones (mtj@mtjones.com), Consultant Engineer, Emulex

17 Jan 2006

The Sockets API lets you develop client and server applications that can communicate across a local network or across the world via the Internet. Like any API, you can use the Sockets API in ways that promote high performance -- or inhibit it. This article explores four ways to use the Sockets API to squeeze the greatest performance out of your application and to tune the GNU/Linux® environment to achieve the best results.

Editor's note: We updated Tip 3 to correct an error in the calculation of the Bandwidth Delay Product (BDP), spotted by an alert reader.

When developing a sockets application, job number one is usually establishing reliability and meeting the necessary requirements. With the four tips in this article, you can design and develop your sockets application for best performance, right from the beginning. This article covers use of the Sockets API, a couple of socket options that provide enhanced performance, and GNU/Linux tuning. To develop applications with lively performance capabilities, follow these tips:
Tip 1. Minimize packet transmit latency

When you communicate through a TCP socket, the data are chopped into blocks so that they fit within the TCP payload for the given connection. The size of the TCP payload depends on several factors (such as the maximum packet size along the path), but these factors are known at connection initiation time. To achieve the best performance, the goal is to fill each packet as much as possible with the available data. When insufficient data exist to fill a payload (otherwise known as the maximum segment size, or MSS), TCP employs the Nagle algorithm to automatically concatenate small buffers into a single segment. Doing so increases the efficiency of the application and reduces overall network congestion by minimizing the number of small packets that are sent.

John Nagle's algorithm works well to minimize small packets by concatenating them into larger ones, but sometimes you simply want the ability to send small packets. A simple example is the telnet application, which allows a user to interact with a remote system, typically through a shell. If the user were required to fill a segment with typed characters before the packet was sent, the experience would be less than desirable. Another example is the HTTP protocol. Commonly, a client browser makes a small request (an HTTP request message), resulting in a much larger response from the Web server (the Web page).

The first thing you should consider is that the Nagle algorithm fulfills a need. Because the algorithm coalesces data to try to fill a complete TCP segment, it does introduce some latency. But it does this with the benefit of minimizing the number of packets sent on the wire, and so it minimizes congestion on the network. In cases where you need to minimize that transmit latency, however, the Sockets API provides a solution: to disable the Nagle algorithm, set the TCP_NODELAY socket option, as shown in Listing 1.

Listing 1. Disabling the Nagle algorithm for a TCP socket
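A minimal sketch of such a listing (the descriptor passed in is assumed to be an already created TCP socket):

```c
#include <netinet/in.h>    /* IPPROTO_TCP */
#include <netinet/tcp.h>   /* TCP_NODELAY */
#include <sys/socket.h>
#include <stdio.h>

/* Disable the Nagle algorithm on a TCP socket so that small writes are
 * transmitted immediately instead of being coalesced into larger segments. */
int disable_nagle(int sock)
{
    int flag = 1;

    if (setsockopt(sock, IPPROTO_TCP, TCP_NODELAY,
                   (void *)&flag, sizeof(flag)) == -1) {
        perror("setsockopt(TCP_NODELAY)");
        return -1;
    }
    return 0;
}
```

You would typically call this once, right after creating or accepting the socket, and before any data are exchanged.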
Bonus tip: Experimentation with Samba demonstrates that disabling the Nagle algorithm almost doubles read performance when reading from a Samba drive on a Microsoft® Windows® server.

Tip 2. Minimize system call overhead

Whenever you read or write data to a socket, you're using a system call. This call (such as read or write) crosses the boundary between the user-space application and the kernel and passes through many levels within each domain before the data reach the socket. The process is expensive, so the more calls you make, the more time you spend working through this call chain, and the less performance you get from your application.

Because you can't avoid making these system calls, your only option is to minimize the number of times you make them. Fortunately, you have control over this process. When writing data to a socket, write all the data that you have available instead of performing multiple writes of the data. For reads, pass in the largest buffer that you can support, since the kernel will try to fill the entire buffer if enough data exist (in addition to keeping TCP's advertised window open). In this way, you can minimize the number of calls you make and achieve better overall performance.

Tip 3. Adjust TCP windows for the Bandwidth Delay Product

TCP depends on several factors for performance. Two of the most important are the link bandwidth (the rate at which packets can be transmitted on the network) and the round-trip time, or RTT (the delay between a segment being sent and its acknowledgment from the peer). These two values determine what is called the Bandwidth Delay Product (BDP).

Given the link bandwidth rate and the RTT, you can calculate the BDP, but what does this do for you? It turns out that the BDP gives you an easy way to calculate the theoretical optimal TCP socket buffer sizes (which hold both the queued data awaiting transmission and queued data awaiting receipt by the application).
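One common way to batch writes without copying data into a single staging buffer is scatter/gather I/O. The sketch below (hypothetical helper name) sends a header and a body in one writev() system call rather than two write() calls:

```c
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send a header and a body with one writev() call instead of two write()
 * calls: the kernel gathers both buffers, so only one system call is made
 * and the data can travel in a single TCP segment. */
ssize_t send_message(int fd, const char *hdr, const char *body)
{
    struct iovec iov[2];

    iov[0].iov_base = (void *)hdr;
    iov[0].iov_len  = strlen(hdr);
    iov[1].iov_base = (void *)body;
    iov[1].iov_len  = strlen(body);

    return writev(fd, iov, 2);   /* one call, both buffers */
}
```

The same idea applies on the read side with readv(): one call can fill several application buffers.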
If the buffer is too small, the TCP window cannot fully open, and this limits performance. If it's too large, precious memory resources can be wasted. If you set the buffer just right, you can fully utilize the available bandwidth. Let's look at an example:
If your application communicates over a 100Mbps local area network with a 50 ms RTT, the BDP is:

100Mbps * 0.050 sec / 8 = 0.625MB = 625KB
Note: I divide by 8 to convert from bits to bytes communicated. So, set your TCP window to the BDP, or 625KB. But the default window for TCP on Linux 2.6 is 110KB, which limits your bandwidth for the connection to 2.2MBps, as I've calculated here:

throughput = window size / RTT = 110KB / 0.050 = 2.2MBps
If instead you use the window size calculated above, you get a whopping 12.5MBps, as shown here:

throughput = window size / RTT = 625KB / 0.050 = 12.5MBps
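The arithmetic above can be expressed as two small helper functions (names are illustrative, not part of any API):

```c
/* Bandwidth Delay Product in bytes, given a link rate in bits/sec and a
 * round-trip time in seconds. Dividing by 8 converts bits to bytes. */
double bdp_bytes(double link_bps, double rtt_s)
{
    return link_bps * rtt_s / 8.0;
}

/* Throughput ceiling in bytes/sec imposed by a window of the given size:
 * at most one window's worth of data can be in flight per round trip. */
double window_limit_Bps(double window_bytes, double rtt_s)
{
    return window_bytes / rtt_s;
}
```

For the example link, bdp_bytes(100e6, 0.050) yields 625,000 bytes, and window_limit_Bps shows the jump from 2.2MBps with a 110KB window to 12.5MBps with a BDP-sized window.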
That's quite a difference, and it will provide greater throughput for your socket. So now you know how to calculate the optimal socket buffer size for your socket. But how do you make this change? The Sockets API provides several socket options, two of which exist to change the socket send and receive buffer sizes. Listing 2 shows how to adjust the size of the socket send and receive buffers with the SO_SNDBUF and SO_RCVBUF options.

Note: Although the socket buffer size determines the size of the advertised TCP window, TCP also maintains a congestion window within the advertised window. Therefore, because of congestion, a given socket may never utilize the maximum advertised window.

Listing 2. Manually setting the send and receive socket buffer sizes
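A minimal sketch of such a listing (the helper name is illustrative; error handling is kept simple):

```c
#include <stdio.h>
#include <sys/socket.h>

/* Set both the send and receive buffers of a socket to the desired size,
 * for example the BDP computed for the link. The kernel silently caps the
 * requested value at the wmem_max/rmem_max limits, so check the result
 * with getsockopt() rather than assuming the request was honored. */
int set_buffer_sizes(int sock, int bytes)
{
    if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF,
                   (void *)&bytes, sizeof(bytes)) == -1 ||
        setsockopt(sock, SOL_SOCKET, SO_RCVBUF,
                   (void *)&bytes, sizeof(bytes)) == -1) {
        perror("setsockopt(SO_SNDBUF/SO_RCVBUF)");
        return -1;
    }
    return 0;
}
```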
Within the Linux 2.6 kernel, the window size for the send buffer is taken as defined by the user in the setsockopt call, but the receive buffer is doubled automatically. You can verify the size of each buffer using the getsockopt call.
As for window scaling, TCP originally supported a maximum 64KB window (16 bits were used to define the window size). With the inclusion of window scaling (per RFC 1323), you can use a 32-bit value to represent the size of the window. The TCP/IP stack provided in GNU/Linux supports this option (and many others).

Bonus tip: The Linux kernel also includes the ability to auto-tune these socket buffers (see the tcp_rmem and tcp_wmem parameters in Tip 4).

Tip 4. Dynamically tune the GNU/Linux TCP/IP stack

A standard GNU/Linux distribution tries to optimize for a wide range of deployments. This means that the standard distribution might not be optimal for your environment. GNU/Linux provides a wide range of tunable kernel parameters that you can use to dynamically tailor the operating system for your specific use. Let's look at some of the more important options that affect sockets performance. The tunable kernel parameters exist within the /proc virtual filesystem, under /proc/sys/net. Listing 3 demonstrates the mechanism by enabling IP forwarding.

Listing 3. Tuning: Enable IP forwarding within the TCP/IP stack
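A sketch of what such a listing looks like (requires root; IP forwarding is shown only as an example of flipping a tunable, not as a performance recommendation):

```shell
# Check the current value (0 = disabled, 1 = enabled)
cat /proc/sys/net/ipv4/ip_forward

# Enable IP forwarding by writing to the /proc entry
echo 1 > /proc/sys/net/ipv4/ip_forward

# The same parameter, set through the sysctl(8) front end
sysctl -w net.ipv4.ip_forward=1
```

Every other parameter under /proc/sys/net works the same way: read the file to inspect it, write to it to change it.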
Table 1 is a list of several tunable parameters that can help you increase the performance of the Linux TCP/IP stack.
As with any tuning effort, the best approach is experimental in nature. Your application behavior, processor speed, and availability of memory all affect how these parameters alter performance. In some cases, what you think should be beneficial can be detrimental (and vice versa). So, try an option and then check the result. In other words, trust but verify.

Bonus tip: A word about persistent configuration. Note that if you reboot a GNU/Linux system, any tunable kernel parameters that you changed revert to their defaults. To make your changes the defaults, use the file /etc/sysctl.conf to set the parameters at boot time.

GNU/Linux is attractive to me because of the number of tools that are available. The vast majority are command-line tools, but they are amazingly useful and intuitive. GNU/Linux provides several tools -- either natively or available as open source -- to debug networking applications, measure bandwidth/throughput, and check link utilization. Table 2 lists some of the most useful GNU/Linux tools along with their intended use. Table 3 lists useful tools that are not typically part of GNU/Linux distributions.
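For example, a persistent setting looks like this in /etc/sysctl.conf (the parameter shown is illustrative; list whichever tunables your experiments justified):

```shell
# /etc/sysctl.conf -- parameters listed here are applied at boot
# (or immediately with `sysctl -p`).
net.ipv4.ip_forward = 1
```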
Experiment with these tips and techniques to increase the performance of your sockets applications: reduce transmit latency by disabling the Nagle algorithm, increase bandwidth utilization of a socket through buffer sizing, reduce system call overhead by minimizing the number of system calls, and tune the Linux TCP/IP stack with tunable kernel parameters. Always consider the nature of your application when tuning. For example, is your application LAN-based, or will it communicate over the Internet? If your application operates only within a LAN, increasing socket buffer sizes may not yield much benefit, but enabling jumbo frames certainly will! Finally, always check the results of your tuning with a packet capture tool such as tcpdump.