Graceful Restart in Golang
JUN 3RD, 2014
Update (Apr 2015): Florian von Bock has turned what is described in this article into a nice Go package called endless.
If you have a Golang HTTP service, chances are, you will need to restart it on occasion to upgrade the binary or change some configuration. And if you (like me) have been taking graceful restart for granted because the webserver took care of it, you may find this recipe very handy because with Golang you need to roll your own.
There are actually two problems that need to be solved here. The first is the UNIX side of the graceful restart, i.e. the mechanism by which a process can restart itself without closing the listening socket. The second is ensuring that all in-progress requests are properly completed or timed out.
Restarting without closing the socket
- Fork a new process which inherits the listening socket.
- The child performs initialization and starts accepting connections on the socket.
- Immediately after, the child sends a signal to the parent, causing the parent to stop accepting connections and terminate.
Forking a new process
There is more than one way to fork a process using the Golang standard library, but for this particular case exec.Command is the way to go. This is because the Cmd struct this function returns has an ExtraFiles member, which specifies open files (in addition to stdin/stdout/stderr) to be inherited by the new process.
Here is what this looks like:
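A minimal sketch of this step, assuming the listener is a *net.TCPListener and that the child is told about the restart through a -graceful flag. The helper names newChildCmd and forkChild are made up for this illustration.

```go
package main

import (
	"fmt"
	"log"
	"net"
	"os"
	"os/exec"
)

// newChildCmd builds, but does not start, the command for the replacement
// process. The helper name is an assumption made for this sketch.
func newChildCmd(path string, listenerFile *os.File) *exec.Cmd {
	cmd := exec.Command(path, "-graceful")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	// Entry 0 of ExtraFiles becomes file descriptor 3 in the child.
	cmd.ExtraFiles = []*os.File{listenerFile}
	return cmd
}

// forkChild duplicates the listening socket and starts the new binary with it.
func forkChild(path string, l *net.TCPListener) (*os.Process, error) {
	f, err := l.File() // a dup of the listening descriptor
	if err != nil {
		return nil, err
	}
	defer f.Close() // the parent's duplicate is not needed once Start returns
	cmd := newChildCmd(path, f)
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	return cmd.Process, nil
}

func main() {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		log.Fatal(err)
	}
	defer l.Close()

	f, err := l.(*net.TCPListener).File()
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Only build the command here; a real restart would call forkChild.
	cmd := newChildCmd(os.Args[0], f)
	fmt.Println(cmd.Args[1:], len(cmd.ExtraFiles))
}
```

The demo in main stops short of starting a child so that it can be run safely; forkChild is the part a real restart would call.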
In the above code, netListener is the listener accepting HTTP requests; it needs to provide a File() method, as *net.TCPListener does. The path variable should contain the path to the new executable if you’re upgrading (it may be the same as the currently running one).
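An important point in the above code is that netListener.File() returns a dup(2) of the file descriptor, independent of the one the listener keeps using. Listing it in ExtraFiles is what makes it survive into the child: os/exec installs each entry at its new descriptor number without the FD_CLOEXEC flag, which would otherwise cause the file to be closed on exec (not what we want).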
You may come across examples that pass the inherited file descriptor number to the child via a command line argument, but the way ExtraFiles is implemented makes it unnecessary. The documentation states that “If non-nil, entry i becomes file descriptor 3+i.” This means that in the above code snippet, the inherited file descriptor in the child will always be 3, so there is no need to pass it explicitly.
Finally, the args array contains a -graceful option: your program will need some way of informing the child that this is a part of a graceful restart and the child should re-use the socket rather than try opening a new one. Another way to do this might be via an environment variable.
Child initialization
Here is part of the program startup sequence:
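As a hedged sketch, the startup logic could look like the following. The helper name listen and the loopback address are assumptions for illustration; the -graceful flag and file descriptor 3 come from the discussion above.

```go
package main

import (
	"flag"
	"log"
	"net"
	"os"
)

// listen returns the listener the server should use: the socket inherited as
// file descriptor 3 during a graceful restart, or a freshly opened one.
func listen(gracefulChild bool, addr string) (net.Listener, error) {
	if gracefulChild {
		log.Print("main: listening on inherited file descriptor 3")
		f := os.NewFile(3, "")
		// FileListener works on its own copy, so f can be closed afterwards.
		defer f.Close()
		return net.FileListener(f)
	}
	log.Print("main: listening on a new socket")
	return net.Listen("tcp", addr)
}

func main() {
	gracefulChild := flag.Bool("graceful", false, "listen on inherited fd 3 (internal use only)")
	flag.Parse()

	// Port 0 picks any free port for this demo; a real service would use
	// its fixed address here.
	l, err := listen(*gracefulChild, "127.0.0.1:0")
	if err != nil {
		log.Fatal(err)
	}
	defer l.Close()
	log.Printf("main: ready on %v", l.Addr())
}
```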
Signal parent to stop
At this point we’re ready to accept requests, but just before we do that, we need to tell our parent to stop accepting requests and exit, which could be something like this:
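One way to sketch this, assuming SIGTERM is the signal the parent is prepared to handle (the choice of signal and the helper name signalParent are assumptions):

```go
package main

import (
	"flag"
	"log"
	"syscall"
)

// signalParent asks the parent process to shut down once the child is ready.
// It does nothing on a normal (non-graceful) start.
func signalParent(gracefulChild bool) error {
	if !gracefulChild {
		return nil
	}
	parent := syscall.Getppid()
	log.Printf("main: signalling parent pid %d", parent)
	return syscall.Kill(parent, syscall.SIGTERM)
}

func main() {
	gracefulChild := flag.Bool("graceful", false, "part of a graceful restart (internal use only)")
	flag.Parse()

	// ... the listener is obtained here ...

	if err := signalParent(*gracefulChild); err != nil {
		log.Fatalf("main: could not signal parent: %v", err)
	}

	// ... server.Serve(l) follows here ...
}
```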
In-progress requests completion/timeout
For this we will need to keep track of open connections with a sync.WaitGroup. We will need to increment the wait group on every accepted connection and decrement it on every connection close.
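The sketch below shows the counting idea on its own, with goroutines standing in for connections. The variable name httpWg and the helper waitWithTimeout are assumptions; the timeout wrapper is one possible way to cover the "timed out" half of the problem.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// httpWg counts connections that are currently open.
var httpWg sync.WaitGroup

// waitWithTimeout waits for the group to drain, giving up after d.
// It reports whether the group drained in time.
func waitWithTimeout(wg *sync.WaitGroup, d time.Duration) bool {
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(d):
		return false
	}
}

func main() {
	for i := 0; i < 3; i++ {
		httpWg.Add(1) // what happens on every accepted connection
		go func() {
			defer httpWg.Done() // what happens on every connection close
			time.Sleep(10 * time.Millisecond)
		}()
	}
	fmt.Println("drained:", waitWithTimeout(&httpWg, time.Second))
}
```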
At first glance, the Golang standard http package does not provide any hooks to take action on Accept() or Close(), but this is where the interface magic comes to the rescue. (Big thanks and credit to Jeff R. Allen for this post).
Here is an example of a listener which increments a wait group on every Accept(). First, we “subclass” net.Listener (you’ll see why we need stop and stopped below):
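A sketch of such a type, using struct embedding. The type name gracefulListener and the helper listenLoopback are assumptions for illustration; stop and stopped are the two fields mentioned above.

```go
package main

import (
	"fmt"
	"log"
	"net"
)

// gracefulListener embeds net.Listener, so every method we do not redefine
// (Addr, for example) is forwarded to the wrapped listener.
type gracefulListener struct {
	net.Listener
	stop    chan error // carries the stop request and the result of closing
	stopped bool       // set once the socket has been closed
}

// listenLoopback is a demo helper that wraps a listener on a free local port.
func listenLoopback() (*gracefulListener, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	return &gracefulListener{Listener: l, stop: make(chan error)}, nil
}

func main() {
	gl, err := listenLoopback()
	if err != nil {
		log.Fatal(err)
	}
	defer gl.Close() // promoted from the embedded listener

	var _ net.Listener = gl // the wrapper still satisfies net.Listener
	fmt.Println("listening on", gl.Addr())
}
```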
Next we “override” the Accept method. (Never mind gracefulConn for now; it will be introduced later.)
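In sketch form, the method could look like this. The types are repeated so the snippet compiles alone, gracefulConn is only a bare wrapper here, and acceptOne is a demo helper invented for this example.

```go
package main

import (
	"fmt"
	"log"
	"net"
	"sync"
)

var httpWg sync.WaitGroup // counts open connections

type gracefulListener struct {
	net.Listener
	stop    chan error
	stopped bool
}

// gracefulConn is a bare placeholder in this snippet.
type gracefulConn struct {
	net.Conn
}

// Accept shadows the embedded listener's Accept and counts each new connection.
func (gl *gracefulListener) Accept() (net.Conn, error) {
	c, err := gl.Listener.Accept()
	if err != nil {
		return nil, err
	}
	httpWg.Add(1)
	return gracefulConn{Conn: c}, nil
}

// acceptOne dials a fresh listener once and returns what Accept hands back.
func acceptOne() (net.Conn, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	gl := &gracefulListener{Listener: l}
	defer gl.Close()

	client, err := net.Dial("tcp", l.Addr().String())
	if err != nil {
		return nil, err
	}
	defer client.Close()

	return gl.Accept()
}

func main() {
	c, err := acceptOne()
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()
	fmt.Printf("accepted a %T\n", c)
}
```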
We also need a “constructor”:
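A possible shape for it, assuming the name newGracefulListener. The struct is repeated so the snippet compiles alone, and main drives the stop channel by hand just to show the handshake.

```go
package main

import (
	"fmt"
	"log"
	"net"
)

type gracefulListener struct {
	net.Listener
	stop    chan error
	stopped bool
}

// newGracefulListener wraps l and starts a goroutine that waits for a stop
// request, closes the socket, and reports the result on the same channel.
func newGracefulListener(l net.Listener) *gracefulListener {
	gl := &gracefulListener{Listener: l, stop: make(chan error)}
	go func() {
		<-gl.stop // wait for the stop request
		gl.stopped = true
		gl.stop <- gl.Listener.Close() // unblocks a pending Accept
	}()
	return gl
}

// listenLoopback is a demo helper that wraps a listener on a free local port.
func listenLoopback() (*gracefulListener, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	return newGracefulListener(l), nil
}

func main() {
	gl, err := listenLoopback()
	if err != nil {
		log.Fatal(err)
	}
	gl.stop <- nil // request a stop by hand
	fmt.Println("close error:", <-gl.stop)
}
```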
The function above starts a goroutine because closing cannot be done in our Accept() above, since it will block on gl.Listener.Accept(). The goroutine unblocks it by closing the file descriptor.
Our Close() method simply sends a nil to the stop channel for the above goroutine to do the rest of the work.
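A sketch of that method follows, with the struct and constructor repeated so it compiles alone. Returning syscall.EINVAL on a second call is an assumption; any "already closed" error would do.

```go
package main

import (
	"fmt"
	"log"
	"net"
	"syscall"
)

type gracefulListener struct {
	net.Listener
	stop    chan error
	stopped bool
}

func newGracefulListener(l net.Listener) *gracefulListener {
	gl := &gracefulListener{Listener: l, stop: make(chan error)}
	go func() {
		<-gl.stop
		gl.stopped = true
		gl.stop <- gl.Listener.Close()
	}()
	return gl
}

// Close asks the goroutine to close the socket and returns its result.
func (gl *gracefulListener) Close() error {
	if gl.stopped {
		return syscall.EINVAL // already closed
	}
	gl.stop <- nil
	return <-gl.stop
}

// listenLoopback is a demo helper that wraps a listener on a free local port.
func listenLoopback() (*gracefulListener, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	return newGracefulListener(l), nil
}

func main() {
	gl, err := listenLoopback()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("first close:", gl.Close())
	fmt.Println("second close:", gl.Close())
}
```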
Finally, this little convenience method extracts the file descriptor from the net.TCPListener.
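One hedged way to write it is shown below. This variant also returns an error instead of assuming the wrapped listener is a *net.TCPListener, which is a choice made for this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"net"
	"os"
)

type gracefulListener struct {
	net.Listener
	stop    chan error
	stopped bool
}

// File returns a duplicate of the listening socket's descriptor, suitable
// for handing to a child process.
func (gl *gracefulListener) File() (*os.File, error) {
	tl, ok := gl.Listener.(*net.TCPListener)
	if !ok {
		return nil, errors.New("gracefulListener: not a TCP listener")
	}
	return tl.File()
}

// listenLoopback is a demo helper that wraps a listener on a free local port.
func listenLoopback() (*gracefulListener, error) {
	l, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	return &gracefulListener{Listener: l}, nil
}

func main() {
	gl, err := listenLoopback()
	if err != nil {
		log.Fatal(err)
	}
	defer gl.Close()

	f, err := gl.File()
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	fmt.Println("duplicate descriptor:", f.Fd())
}
```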
And, of course, we also need a variant of net.Conn which decrements the wait group on Close():
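A minimal sketch, using an in-memory net.Pipe in place of a real TCP connection. The helper trackedPipe is invented for the demo and plays the role of Accept.

```go
package main

import (
	"fmt"
	"net"
	"sync"
)

var httpWg sync.WaitGroup // counts open connections

// gracefulConn embeds net.Conn and only changes what Close does.
type gracefulConn struct {
	net.Conn
}

// Close marks the connection as finished, then closes it.
func (c gracefulConn) Close() error {
	httpWg.Done()
	return c.Conn.Close()
}

// trackedPipe returns a counted in-memory connection and its peer.
func trackedPipe() (gracefulConn, net.Conn) {
	server, client := net.Pipe()
	httpWg.Add(1) // what Accept does for a real connection
	return gracefulConn{Conn: server}, client
}

func main() {
	conn, peer := trackedPipe()
	defer peer.Close()

	go conn.Close()
	httpWg.Wait() // returns once the tracked connection is closed
	fmt.Println("all connections closed")
}
```

Note that in this sketch each connection must be closed exactly once: a second Close would call Done again and drive the wait group counter negative.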
To start using the above graceful version of the Listener, all we need to do is change the server.Serve(l) line to:
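Putting the pieces together, a self-contained sketch might look like this. The helper serveUntilClosed is an assumption for the demo: it closes the listener right away to stand in for the stop signal, then waits for Serve to return and for the wait group to drain.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"sync"
	"syscall"
)

var httpWg sync.WaitGroup // counts open connections

type gracefulConn struct{ net.Conn }

func (c gracefulConn) Close() error {
	httpWg.Done()
	return c.Conn.Close()
}

type gracefulListener struct {
	net.Listener
	stop    chan error
	stopped bool
}

func newGracefulListener(l net.Listener) *gracefulListener {
	gl := &gracefulListener{Listener: l, stop: make(chan error)}
	go func() {
		<-gl.stop
		gl.stopped = true
		gl.stop <- gl.Listener.Close()
	}()
	return gl
}

func (gl *gracefulListener) Accept() (net.Conn, error) {
	c, err := gl.Listener.Accept()
	if err != nil {
		return nil, err
	}
	httpWg.Add(1)
	return gracefulConn{Conn: c}, nil
}

func (gl *gracefulListener) Close() error {
	if gl.stopped {
		return syscall.EINVAL
	}
	gl.stop <- nil
	return <-gl.stop
}

// serveUntilClosed serves on addr through the graceful wrapper, closes the
// wrapper to stand in for the stop signal, and returns the error from Serve
// once open connections have drained.
func serveUntilClosed(addr string) error {
	l, err := net.Listen("tcp", addr)
	if err != nil {
		return err
	}
	server := &http.Server{Handler: http.NotFoundHandler()}

	// This is the changed part: wrap the listener before serving.
	netListener := newGracefulListener(l)
	served := make(chan error, 1)
	go func() { served <- server.Serve(netListener) }()

	if err := netListener.Close(); err != nil {
		return err
	}
	err = <-served // Serve returns once Accept starts failing
	httpWg.Wait()  // wait for in-progress connections
	return err
}

func main() {
	fmt.Println("Serve returned:", serveUntilClosed("127.0.0.1:0"))
}
```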
And there is one more thing. You should avoid hanging connections that the client has no intention of closing (or not this week). It is better to create your server as follows:
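For instance, with read and write timeouts set. The specific values and the helper name newServer below are placeholders, not recommendations; pick limits that suit your traffic.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newServer returns a server whose connections cannot hang indefinitely.
// The timeout and header-size values are placeholder assumptions.
func newServer(addr string) *http.Server {
	return &http.Server{
		Addr:           addr,
		ReadTimeout:    10 * time.Second,
		WriteTimeout:   10 * time.Second,
		MaxHeaderBytes: 1 << 16,
	}
}

func main() {
	s := newServer("0.0.0.0:8888")
	fmt.Println(s.Addr, s.ReadTimeout, s.WriteTimeout)
}
```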
Posted by Gregory Trubetskoy Jun 3rd, 2014