7. Managing high loads on application servers

One of the roles often expected from a load balancer is to mitigate the load on
the servers during traffic peaks. More and more often, we see heavy frameworks
used to deliver flexible and evolving web designs, at the cost of high loads
on the servers, or very low concurrency. Sometimes, response times are also
rather high. People developing web sites relying on such frameworks very often
look for a load balancer which is able to distribute the load as evenly as
possible and which will be gentle on the servers.

There is a powerful feature in haproxy which achieves exactly this : request
queueing combined with a per-server limit on concurrent connections.

Let's say you have an application server which supports at most 20 concurrent
requests. You have 3 servers, so you can accept up to 60 concurrent HTTP
connections, which often means 30 concurrent users in case of keep-alive (2
persistent connections per user).

Even if you disable keep-alive, if the server takes a long time to respond,
you still have a high risk of multiple users clicking at the same time and
having their requests unserved because of server saturation. To work around
the problem, you increase the concurrent connection limit on the servers,
but their performance stalls under higher loads.

The solution is to limit the number of connections between the clients and the
servers. You set haproxy to limit the number of connections on a per-server
basis, and you let all the users you want connect to it. It will then fill all
the servers up to the configured connection limit, and will put the remaining
connections in a queue, waiting for a connection to be released on a server.

This ensures five essential principles :

  - all clients can be served whatever their number without crashing the
    servers; the only impact is that the response time can be delayed.

  - the servers can be used at full throttle without the risk of stalling,
    and fine tuning can lead to optimal performance.

  - response times can be reduced by making the servers work below the
    congestion point, which keeps them short even under moderate loads.

  - no domino effect when a server goes down or starts up. Requests are simply
    queued for a longer or shorter time, always respecting the servers' limits.

  - it's easy to achieve high performance even on memory-limited hardware.
    Indeed, heavy frameworks often consume huge amounts of RAM and not always
    all the CPU available. In case of wrong sizing, reducing the number of
    concurrent connections will protect against memory shortages while still
    ensuring optimal CPU usage.
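
The queueing behaviour described above can be illustrated with a small model.
This is only a sketch, not haproxy's implementation : the class and method
names (LimitedServer, QueueingBalancer, accept, release) are hypothetical, and
it assumes a single shared queue with round-robin dispatch.

```python
from collections import deque


class LimitedServer:
    """Hypothetical model of one server with a connection cap."""

    def __init__(self, name, maxconn):
        self.name = name
        self.maxconn = maxconn
        self.active = 0


class QueueingBalancer:
    """Toy model: round-robin dispatch plus a shared wait queue."""

    def __init__(self, servers):
        self.servers = servers
        self.queue = deque()
        self._next = 0

    def _pick_free_server(self):
        # Round-robin scan that skips servers already at their limit.
        count = len(self.servers)
        for offset in range(count):
            srv = self.servers[(self._next + offset) % count]
            if srv.active < srv.maxconn:
                self._next = (self._next + offset + 1) % count
                return srv
        return None

    def accept(self, request):
        """Send the request to a server, or queue it if all are full."""
        srv = self._pick_free_server()
        if srv is None:
            self.queue.append(request)
            return None
        srv.active += 1
        return srv

    def release(self, srv):
        """Free one slot on srv and give it to the oldest queued request."""
        srv.active -= 1
        if self.queue:
            request = self.queue.popleft()
            srv.active += 1
            return request
        return None


servers = [LimitedServer(name, maxconn=4) for name in ("A", "B", "C")]
lb = QueueingBalancer(servers)
for i in range(15):
    lb.accept(f"req{i}")

# 3 servers x 4 slots are filled, the 3 remaining requests wait in the queue.
print([s.active for s in servers], len(lb.queue))
```

No server ever exceeds its limit, whatever the number of requests accepted;
only the queue length (and thus the waiting time) grows.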

Example :

Haproxy is installed in front of an application server farm. It will limit
the concurrent connections to 4 per server (one thread per CPU), thus ensuring
very fast response times.

        |             |     |     |           _|_db
     +--+--+        +-+-+ +-+-+ +-+-+        (___)
     | LB1 |        | A | | B | | C |        (___)
     +-----+        +---+ +---+ +---+        (___)
     haproxy       3 application servers
                   with heavy frameworks

Config on haproxy (LB1) :
    listen appfarm
       mode http
       maxconn 10000
       option httpclose
       option forwardfor
       balance roundrobin
       cookie SERVERID insert indirect
       option httpchk HEAD /index.html HTTP/1.0
       server railsA cookie A maxconn 4 check
       server railsB cookie B maxconn 4 check
       server railsC cookie C maxconn 4 check
       contimeout 60000

Description :
The proxy listens on IP, port 80, and expects HTTP requests. It
can accept up to 10000 concurrent connections on this socket. It follows the
roundrobin algorithm to assign servers to connections as long as servers are
not saturated.

It allows up to 4 concurrent connections per server, and will queue the
requests above this value. The "contimeout" parameter is used to set the
maximum time a connection may take to establish on a server, but here it
is also used to set the maximum time a connection may stay unserved in the
queue (1 minute here).

If the servers can each process 4 requests in 10 ms on average, then at 3000
connections, response times will be delayed by at most :

   3000 / 3 servers / 4 conns * 10 ms = 2.5 seconds

This is not that dramatic considering the huge number of users for such a
small number of servers.
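
The same arithmetic can be written as a small helper to try other sizings.
This is a minimal sketch of the formula above; the function and parameter
names are hypothetical, and it assumes the load is spread evenly across the
servers.

```python
def worst_case_queue_delay_ms(connections, servers, maxconn_per_server,
                              batch_time_ms):
    """Upper bound on the queueing delay, following the formula above.

    batch_time_ms is the average time one server needs to process
    maxconn_per_server requests in parallel.
    """
    batches_per_server = connections / servers / maxconn_per_server
    return batches_per_server * batch_time_ms


# Values from the example: 3000 connections, 3 servers, 4 conns, 10 ms.
print(worst_case_queue_delay_ms(3000, 3, 4, 10) / 1000)  # 2.5 (seconds)
```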

When connection queues fill up and application servers are saturated, response
times will grow and users might abort by clicking on the "Stop" button. It is
very undesirable to send aborted requests to servers, because they will eat
CPU cycles for nothing.

An option has been added to handle this specific case : "option abortonclose".
By specifying it, you tell haproxy that if an input channel is closed on the
client side AND the request is still waiting in the queue, then it is highly
likely that the user has stopped, so the request is removed from the queue
before it gets served.
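
The effect of this option can be modelled in a few lines. This is only a
sketch of the idea, not haproxy's implementation : the function name and the
"client_closed" set are hypothetical, and aborted requests are dropped here
when a server slot is freed rather than at the moment the close is detected.

```python
from collections import deque


def next_live_request(queue, client_closed):
    """Pop queued requests until one whose client is still connected.

    queue: deque of request ids, oldest first.
    client_closed: set of request ids whose client closed its side.
    Returns the first live request id, or None if the queue runs out.
    """
    while queue:
        request = queue.popleft()
        if request not in client_closed:
            return request
        # The client gave up while waiting: drop the request instead of
        # spending server CPU cycles on it.
    return None


waiting = deque(["r1", "r2", "r3"])
print(next_live_request(waiting, client_closed={"r1", "r2"}))  # r3
```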

Managing unfair response times

Sometimes, the application server will be very slow for some requests (e.g.
the login page) and faster for others. This may cause excessive queueing of
requests that should be fast when all threads on the server are blocked on a
request to the database. Then the only solution is to increase the number of
concurrent connections, so that the server can handle a large average number
of slow connections with threads left to handle faster connections.

But as we have seen, increasing the number of connections on the servers can
be detrimental to performance (e.g. Apache processes fighting for the accept()
lock). To improve this situation, the "minconn" parameter has been introduced.
When it is set, the maximum connection concurrency on the server will be
bounded by this value under normal conditions, and the limit will increase
with the number of clients waiting in queue, until the clients connected to
haproxy reach the proxy's maxconn, at which point the connections per server
will reach the server's maxconn. It means that during low-to-medium loads, the
minconn will be applied, and during surges the limit will rise towards the
maxconn. This ensures both optimal response times under normal loads, and
availability under very high loads.
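
To make the behaviour of the limit concrete, here is a minimal sketch of how
the effective per-server limit could evolve between minconn and maxconn. The
function and parameter names are hypothetical, and the linear ramp is an
assumption : the text above only states that the limit starts at minconn and
reaches maxconn when the proxy's maxconn is reached.

```python
def effective_server_limit(minconn, maxconn, proxy_conns, proxy_maxconn):
    """Per-server connection limit for a given number of proxy clients.

    Assumes the limit grows proportionally to the proxy's load, never
    goes below minconn, and equals maxconn once the proxy is full.
    """
    if proxy_conns >= proxy_maxconn:
        return maxconn
    scaled = maxconn * proxy_conns // proxy_maxconn
    return max(minconn, scaled)


# minconn 4, maxconn 12, proxy maxconn 10000.
for clients in (100, 5000, 10000):
    print(clients, effective_server_limit(4, 12, clients, 10000))
```

Under this assumption, the servers stay at 4 connections for light loads,
get 6 at 5000 clients, and reach 12 at 10000 clients.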

Example :
    listen appfarm
       mode http
       maxconn 10000
       option httpclose
       option abortonclose
       option forwardfor
       balance roundrobin
       # The servers will get 4 concurrent connections under low
       # loads, and 12 when there are 10000 clients.
       server railsA minconn 4 maxconn 12 check
       server railsB minconn 4 maxconn 12 check
       server railsC minconn 4 maxconn 12 check
       contimeout 60000