In my last post, I wrote about how we handle I/O in the browser process to keep the main thread of Google Chrome free from hiccups. This time, I'll write about how we keep our sub-processes from interfering with the main ("browser") process.
As you may recall, Google Chrome is a multi-process application, with HTML rendering happening in separate processes we call the "renderers," and plugins running in separate "plugin" processes. Our priority is to always keep the browser process, and especially its main thread, running as smoothly as possible. If a plugin or renderer interferes with the browser process, the user's interaction with all other tabs and plugins, as well as all the other features of Google Chrome, is interrupted too. The user might even be prevented from terminating the offending sub-process, negating a key benefit of our multi-process architecture.
The first and most obvious approach is never to block while waiting for information from a renderer process, in case the renderer process happens to be busy or hung. Although a renderer may sometimes wait synchronously on the browser for certain requests, there is no easy way to express the reverse, that the browser process should wait for a renderer. Unfortunately, on Windows this doesn't cover all cases.
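To make the "never block on a renderer" rule concrete, here is a minimal sketch in Python. It is a toy model, not Google Chrome's code: `FakeRenderer`, `browser_main_loop`, and the `"get_title"` request are hypothetical names, and a thread stands in for a separate process. The browser side posts a request and then only polls for the answer, so its loop keeps turning no matter how slow the renderer is.

```python
import queue
import threading
import time


class FakeRenderer:
    """Hypothetical stand-in for a renderer process; runs on its own thread."""

    def __init__(self, delay_s):
        self.delay_s = delay_s        # how long the renderer takes to answer
        self.inbox = queue.Queue()    # requests posted by the browser
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            request, reply_queue = self.inbox.get()
            time.sleep(self.delay_s)  # simulate a busy or hung renderer
            reply_queue.put(("reply", request))


def browser_main_loop(renderer, ticks, tick_s=0.01):
    """Run a toy UI loop for `ticks` iterations without ever blocking on the renderer."""
    replies = queue.Queue()
    renderer.inbox.put(("get_title", replies))  # post the request and move on
    handled_ui_events = 0
    received = []
    for _ in range(ticks):
        handled_ui_events += 1                  # UI work happens on every tick
        try:
            received.append(replies.get_nowait())  # poll; never wait
        except queue.Empty:
            pass
        time.sleep(tick_s)
    return handled_ui_events, received
```

With a responsive renderer the reply shows up a tick or two later. With a hung one the reply never arrives, but the loop still handles every UI event, which is the property we care about.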
The basic primitive of Microsoft Windows is the "window," which is much more general than just a top-level window with a title bar. Buttons, toolbars, and text controls are usually expressed as sub-windows of a floating top-level window. Windows in this hierarchy are not restricted to single processes, and early versions of Google Chrome used this feature to implement our cross-process rendering architecture. Each tab contained a sub-window owned by a renderer process. The renderer received input and painted into its child window just like any other. The browser and renderer processes each ran their own message processing for things like painting.
A problem arises for some types of Windows messages. The system will synchronously send them to all windows in a hierarchy, waiting for each window to process the message before sending it to parent or child windows. This introduces an implicit wait in the browser process on the renderer processes. If a renderer is hung and not responding to messages, the browser process will also hang as soon as one of these special messages is received. To solve this problem, we no longer allow the renderers to create any windows. Instead, the renderer paints the web page into an off-screen bitmap and sends it asynchronously to the browser process where it is copied to the screen.
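The off-screen painting scheme can be sketched roughly as follows. This is a toy model under loud assumptions: the 4x2 one-byte-per-pixel bitmap, the `Browser` class, and the in-memory queue are all hypothetical stand-ins, and the real transport between processes is not described here. The point is only that the renderer owns no window, and the browser copies whatever bitmap has already arrived without waiting for a new one.

```python
import queue

WIDTH, HEIGHT = 4, 2  # tiny bitmap so the example stays readable


def renderer_paint(color):
    """Renderer side: paint the page into an off-screen bitmap (one byte per pixel)."""
    return bytes([color]) * (WIDTH * HEIGHT)


class Browser:
    """Browser side: owns the only real window and the pixels the user sees."""

    def __init__(self):
        self.frames = queue.Queue()              # bitmaps sent by the renderer
        self.screen = bytearray(WIDTH * HEIGHT)  # stand-in for the on-screen window

    def on_paint(self):
        """Copy the newest received bitmap to the screen; return True if one was copied."""
        latest = None
        while True:
            try:
                latest = self.frames.get_nowait()  # drain without blocking
            except queue.Empty:
                break
        if latest is not None:
            self.screen[:] = latest
        return latest is not None
```

If the renderer stops sending bitmaps, `on_paint` simply returns `False` and the screen keeps showing the last frame. The page looks frozen, but the browser itself stays responsive.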
Once we made this change, everything ran great. That is, until we implemented plugins. The NPAPI plugin standard that Google Chrome implements allows plugins to create sub-windows, and for compatibility, we can't avoid it. Sometimes a plugin may hang, or more commonly, block waiting on disk I/O. All the hard work we did to insulate the user interface from I/O latency is occasionally undone by our plugin architecture through this long chain of dependencies. To mitigate this problem, we periodically check plugins for responsiveness. If a plugin is unresponsive for too long, we know that the user interface of Google Chrome might also be affected, and the user might not even be able to close the page that is hosting the plugin. To allow the user to regain control of the browser, we pop up a dialog that offers to terminate the plugin.
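A periodic responsiveness check like this is essentially a watchdog. Below is a minimal sketch of one way such a check could work, assuming a ping/pong exchange with the plugin. The class name `PluginHangMonitor`, its methods, and the timeout value used in the usage example are all hypothetical; this post does not say how Google Chrome actually probes plugins or how long it waits. The clock is passed in so the logic can be exercised without real delays.

```python
class PluginHangMonitor:
    """Toy watchdog: a plugin counts as hung if a ping goes unanswered for too long."""

    def __init__(self, timeout_s, clock):
        self.timeout_s = timeout_s  # hypothetical threshold, in seconds
        self.clock = clock          # callable returning the current time in seconds
        self.ping_sent_at = None    # time of the outstanding ping, if any

    def send_ping(self):
        # Keep at most one ping outstanding so the timer is not reset by re-pinging.
        if self.ping_sent_at is None:
            self.ping_sent_at = self.clock()

    def on_pong(self):
        # The plugin answered, so it is responsive again.
        self.ping_sent_at = None

    def should_offer_to_terminate(self):
        # True once the outstanding ping has gone unanswered for the full timeout.
        if self.ping_sent_at is None:
            return False
        return self.clock() - self.ping_sent_at >= self.timeout_s


# Usage with a fake clock (the 5-second timeout is an arbitrary example value).
now = [0.0]
monitor = PluginHangMonitor(timeout_s=5.0, clock=lambda: now[0])
monitor.send_ping()
now[0] = 6.0
if monitor.should_offer_to_terminate():
    print("offer to terminate the plugin")
```

Note that the monitor only decides when to offer termination. The user can still choose to wait, which matters when the plugin is merely slow rather than truly hung.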
If you are doing something that saturates your hard drive (such as compiling Google Chrome), now you know one of the reasons why the interface may occasionally hang and show the "hung plugin" dialog box. Sometimes you may not even realize that a page has loaded plugins when you get this message. You can terminate the plugin immediately, but most of the time it also works to just wait longer for the plugin's I/O to complete.