We've probably all seen a hung JVM at one time or another, and if you're dealing with WebSphere Application Server, chances are you figured it out in one of two ways: 1. the users are complaining that the browser just “spins” and never returns a web page, or 2. you've noticed output in the WebSphere logs (SystemOut.log) that indicates potentially hung threads. For the purposes of this discussion, we'll focus on the latter.
WebSphere Application Server provides a feature that detects hung threads, that is to say, threads that have been active past a certain time threshold and are suspected of being hung. Let's stop here for a moment to clarify some terms and concepts:
- A hung thread is a thread that is blocked in a blocking call or is waiting on a monitor (a synchronized, locked object) to be released so that it can use it.
- WebSphere Application Server will output messages in the SystemOut log file with a message ID of WSVR0605W. This message simply indicates that a thread MAY be hung, but there is no way for WebSphere to be certain of this since it does not know the expected transaction length for the operations the thread is performing at the time. Its goal is simply to tell you about it so you can investigate.
- The hung thread detection code will also notify you (through output in the SystemOut log file) that a previously reported hung thread actually completed its work. This message ID is WSVR0606W.
- Hang detection works only with WebSphere managed threads (e.g. thread pools) and does NOT monitor user created threads.
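To make the concepts above a bit more concrete, here is a minimal sketch of the idea: a monitor that flags managed threads active past a threshold and later reports when a flagged thread finishes. This is not WebSphere's actual implementation; the `ThreadMonitor` class, its method names, and the tuple format are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ThreadMonitor:
    """Toy model of hang detection (hypothetical, not WebSphere code)."""
    threshold_s: float = 600.0                    # 10 minutes
    started: dict = field(default_factory=dict)   # thread name -> start time (s)
    reported: set = field(default_factory=set)    # threads already flagged

    def begin_work(self, name, now):
        # Only threads registered here are watched (think: managed pool threads).
        self.started[name] = now

    def scan(self, now):
        # Flag each thread once when it has been active past the threshold.
        warnings = []
        for name, start in self.started.items():
            if name not in self.reported and now - start > self.threshold_s:
                self.reported.add(name)
                warnings.append(("WSVR0605W", name, now - start))
        return warnings

    def end_work(self, name, now):
        # A flagged thread that completes was a false alarm; report it.
        start = self.started.pop(name)
        if name in self.reported:
            self.reported.discard(name)
            return ("WSVR0606W", name, now - start)
        return None


monitor = ThreadMonitor()
monitor.begin_work("WebContainer : 1", now=0)
print(monitor.scan(now=612))                          # thread is flagged
print(monitor.end_work("WebContainer : 1", now=720))  # flag is cleared
```

Note that a thread the monitor never hears about through `begin_work` is simply invisible to it, which mirrors the point about user created threads.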
So let’s look at an example:
WSVR0605W: Thread “WebContainer : 1” has been active for 612,000 milliseconds and may be hung. There are 3 threads in total in the server that may be hung.
So the message above tells us that the thread named “WebContainer : 1” has been doing something for 612 seconds, or about 10 minutes, and that there are 3 threads in total in the JVM that may be hung (that is, they have been active for longer than the threshold time).
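If you want to pull these warnings out of SystemOut.log with a script, something like the following sketch could work. It assumes the message looks like the example above; real log lines may carry extra fields (a timestamp, a thread ID in parentheses, straight rather than curly quotes), so the pattern is deliberately lenient. The function name `parse_hung_warning` is made up for this example.

```python
import re

# Lenient pattern for the WSVR0605W message shown above (an assumption about
# the layout; adjust it to match the lines in your own SystemOut.log).
HUNG_RE = re.compile(
    r'WSVR0605W: Thread ["“](?P<thread>[^"”]+)["”]'
    r'(?: \((?P<tid>[0-9a-fA-F]+)\))?'                   # optional thread ID
    r' has been active for (?P<ms>[\d,]+) milliseconds'
    r'.*?(?P<count>\d+) thread'                          # total suspected threads
)


def parse_hung_warning(line):
    """Return a dict describing the warning, or None if the line does not match."""
    match = HUNG_RE.search(line)
    if match is None:
        return None
    return {
        "thread": match.group("thread").strip(),
        "active_ms": int(match.group("ms").replace(",", "")),
        "total_suspected": int(match.group("count")),
    }


sample = (
    "WSVR0605W: Thread “WebContainer : 1” has been active for 612,000 "
    "milliseconds and may be hung. There are 3 threads in total in the "
    "server that may be hung."
)
print(parse_hung_warning(sample))
```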
An obvious question you may ask at this point is: “How long does a thread have to be active before the hang detection feature identifies the thread and tells me about it?” The answer is 10 minutes, by default. The good news is that the hang detection feature can be tuned a bit to better suit your needs. But before we go there, let's talk for a minute about what happens when the hang detection feature fires off a warning in the log.
First, as we’ve already seen, a log entry is output. Also, at the same time a JMX event is emitted from the server of the type TYPE
The hang detection feature also attempts to self-tune based on the number of hang warnings and subsequent clearing messages that it emits. It will attempt to adjust the trigger threshold (10 mins by default) to a higher or lower value in order to minimize the false positives seen in the logs. A message will be displayed in the log when the self-tuning occurs:
WSVR0607W: Too many thread hangs have been falsely reported. The hang threshold is now being set to thresholdtime.
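The self-tuning idea can be sketched roughly as follows. This only models the case the WSVR0607W message describes (raising the threshold after too many false alarms), and the numbers are assumptions for illustration: a limit of 100 false alarms and a growth factor of 1.5 are what I recall from the product documentation, so check them against your version. The function `tune_threshold` is hypothetical.

```python
def tune_threshold(threshold_s, false_alarms, false_alarm_limit=100, growth=1.5):
    """Return (threshold_s, false_alarms) after one tuning check.

    Assumed behavior: once the false alarm count reaches the limit, the
    threshold is raised by the growth factor and the count starts over.
    """
    if false_alarms >= false_alarm_limit:
        return threshold_s * growth, 0
    return threshold_s, false_alarms


threshold, alarms = 600.0, 0       # start at the 10 minute default
for _ in range(100):               # 100 warnings that later cleared
    alarms += 1
    threshold, alarms = tune_threshold(threshold, alarms)
print(threshold)                   # 900.0
```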
OK, so now that we know what to look for and how it gets there, let's look at how to tune the hang detection feature to match our needs. To set these values, simply navigate to the application server instance you wish to configure, then click on Administration and then Custom Properties (all of this is in the Administration Console).
You can find a list of the properties that can be set in the topic “Configuring the hang detection policy” in the product documentation.
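For reference, here is a small sketch that lays out the properties as the name/value pairs you would type into the Custom Properties panel. The property names and default values below are what I recall from that documentation topic, so treat them as assumptions and verify them for your release; the helper `build_properties` is hypothetical and does not talk to WebSphere at all.

```python
# Hang detection custom properties with their assumed defaults.
HANG_DETECTION_DEFAULTS = {
    # Seconds between scans of the managed threads.
    "com.ibm.websphere.threadmonitor.interval": "180",
    # Seconds a thread may be active before it is reported (10 minutes).
    "com.ibm.websphere.threadmonitor.threshold": "600",
    # False alarms tolerated before the threshold is adjusted automatically.
    "com.ibm.websphere.threadmonitor.false.alarm.threshold": "100",
    # Whether to trigger a thread dump when a hang is reported.
    "com.ibm.websphere.threadmonitor.dump.java": "false",
}


def build_properties(overrides):
    """Merge overrides into the defaults and return sorted (name, value) pairs."""
    unknown = set(overrides) - set(HANG_DETECTION_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown hang detection properties: {sorted(unknown)}")
    merged = dict(HANG_DETECTION_DEFAULTS)
    merged.update({name: str(value) for name, value in overrides.items()})
    return sorted(merged.items())


# Example: report suspected hangs after 5 minutes instead of 10.
for name, value in build_properties(
    {"com.ibm.websphere.threadmonitor.threshold": 300}
):
    print(f"{name} = {value}")
```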
Now, should you need to, you can configure the hang detection feature of WebSphere Application Server to meet your exacting specifications for detecting potentially hung threads in your JVM.
title image (modified) credit: (cc) Some rights reserved by netalloy