We experience lags of several minutes on our server. They are probably triggered by "stop the world" garbage collections. But we use the concurrent mark-and-sweep GC (-XX:+UseConcMarkSweepGC), so I think these pauses are triggered by memory fragmentation of the old generation.
How can memory fragmentation of the old generation be analyzed? Are there any tools for it?
Lags happen every hour. Most of the time they last about 20 seconds, but sometimes they last several minutes.
Solution
Look at your Java documentation for the "java -X..." options for turning on GC logging. That will tell you whether you are collecting old or new generation, and how long the collections are taking.
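As a complement to GC logging, the same counts and times can be read from inside the running JVM through the standard `java.lang.management` API. The following is only a minimal sketch: the class name `GcSummary` and the helper `describe` are made up for illustration, and the code has to run inside the server process (or be adapted to a remote JMX connection) to report on it. It prints totals and averages per collector, so it will not show individual long pauses; the GC logs are still needed for that.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.Locale;

class GcSummary {
    // Hypothetical helper: formats the totals reported for one collector.
    // The MXBean getters return -1 when a value is undefined.
    static String describe(String name, long count, long timeMs) {
        if (count <= 0 || timeMs < 0) {
            return name + ": no collections recorded";
        }
        return String.format(Locale.ROOT,
                "%s: %d collections, %d ms total, %.1f ms average",
                name, count, timeMs, (double) timeMs / count);
    }

    public static void main(String[] args) {
        // One bean per collector, typically one for the young generation
        // and one for the old generation.
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(describe(
                    bean.getName(),
                    bean.getCollectionCount(),
                    bean.getCollectionTime()));
        }
    }
}
```

Comparing the output of two runs taken an hour apart should show which collector was active around a lag.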
A pause of "several minutes" sounds extraordinary. Are you sure that you aren't just running with a heap size that is too small, or on a machine with not enough physical memory?
If your heap is too close to full, the GC will be triggered again and again, resulting in your server spending most of its CPU time in the GC. This will show up in the GC logs.
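A rough way to check for this condition without reading logs is to compare accumulated GC time with JVM uptime, and used heap with maximum heap. The sketch below assumes it runs inside the server JVM; `GcPressure` and `fraction` are illustrative names, and the collection time reported by the MXBeans is only an approximation, so treat the numbers as a hint rather than a measurement.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.Locale;

class GcPressure {
    // Hypothetical helper: part / whole as a fraction, 0.0 if whole is unknown.
    static double fraction(long part, long whole) {
        return whole <= 0 ? 0.0 : (double) part / whole;
    }

    public static void main(String[] args) {
        // Sum the approximate collection time over all collectors,
        // skipping collectors that report -1 (undefined).
        long gcMs = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            gcMs += Math.max(0, bean.getCollectionTime());
        }
        long uptimeMs = ManagementFactory.getRuntimeMXBean().getUptime();

        // getMax() is -1 when no maximum heap size is defined.
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();

        System.out.printf(Locale.ROOT, "time in GC since start: %.1f%%%n",
                100 * fraction(gcMs, uptimeMs));
        System.out.printf(Locale.ROOT, "heap in use: %.1f%% of maximum%n",
                100 * fraction(heap.getUsed(), heap.getMax()));
    }
}
```

A heap that stays close to 100% in use together with a high and growing GC share would point at an undersized heap rather than fragmentation.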
If you use a large heap on a machine with not enough physical memory, a full GC is liable to cause your machine to "thrash", spending most of its time madly moving virtual memory pages to and from disc. You can observe this using system monitoring tools; e.g. by watching the console output from "vmstat 5" on a typical UNIX/Linux system.
FOLLOWUP
Contrary to the OP's belief, turning on GC logging is unlikely to make a noticeable difference to performance.
The "Understanding Concurrent Mark Sweep Garbage Collector Logs" page on the Oracle site should be helpful in interpreting GC logs.
Finally, the OP's conclusion that this is a "fragmentation" problem is unlikely to be correct, and (IMO) it is unsupported by the snippets of evidence that he has provided. It is most likely something else.