Memory leak detection tools, Part 2: Tools and features for leak detection and analysis


Memory leaks in enterprise applications cause a large number of critical situations. The cost is a combination of the time and money spent on analysis, expensive downtime in production environments, stress, and a loss of confidence in the application and the framework.

Unrepresentative test environments, ineffective workload identification, and insufficient test cycles can all allow memory leaks to slip through the testing process undetected. Companies are often unable or unwilling to spend the considerable time and money needed to overcome these obstacles; the problem is one of education, culture, and finance. This article does not attempt to address those issues, but instead focuses on technical solutions that can help.

This article is a follow-up to Part 1, our introductory overview of memory leaks. In Part 2, we describe the memory leak analysis and detection features in WebSphere Application Server V6.1 in more detail, using several real-world case studies. The article introduces the memory leak detection capability newly added in WebSphere Application Server V6.1, together with an offline memory leak analysis tool called Memory Dump Diagnostic for Java (MDD4J). The combination of these two features can be used to determine the root cause of memory leaks in Java and Java 2 Platform, Enterprise Edition (J2EE™) applications running in WebSphere Application Server.

This article is written for Java developers, application server administrators, and problem determination consultants who work with applications deployed on WebSphere Application Server.

What is a memory leak in Java?

A memory leak occurs in a Java application when objects that are no longer needed are still being referenced. This prevents the automatic Java garbage collection process from freeing the memory, even though the Java Virtual Machine (JVM) has a built-in garbage collection mechanism (see Related topics) that relieves programmers of any explicit object deallocation responsibilities. These memory leaks show up as continually increasing Java heap usage over time, eventually resulting in an OutOfMemoryError when the heap is completely exhausted. This type of memory leak is called a Java heap memory leak.

Fragmentation and native memory leaks

Memory leaks can also occur in Java when native system resources that are no longer in use (such as file handles, database connection artifacts, and so on) are not cleaned up. This type of memory leak is called a native memory leak. These leaks show up as a continually increasing process size over time, without any corresponding increase in Java heap usage.

Although both Java heap memory leaks and native memory leaks eventually show up as OutOfMemoryErrors, not all OutOfMemoryErrors are caused by Java heap leaks or native memory leaks. Java heap fragmentation can also cause OutOfMemoryErrors when the Java garbage collection process cannot free up a contiguous block of memory for a new object during compaction. In that situation, OutOfMemoryErrors can occur even though a large amount of free Java heap is available. Fragmentation problems can occur with IBM's SDK Version 1.4.2 or earlier because of pinned or dosed objects in the Java heap. Pinned objects are objects that cannot be moved during heap compaction because they are being accessed through JNI (Java Native Interface). Dosed objects are objects that cannot be moved during heap compaction because of references from thread stacks. Fragmentation problems are often aggravated by frequent allocation of large objects (greater than 1 MB).

OutOfMemoryErrors caused by fragmentation problems or native memory leaks are beyond the scope of this article. Native memory leaks and fragmentation problems can be distinguished from Java heap memory leaks by observing Java heap usage over a period of time. IBM Tivoli® Performance Viewer and verbose GC output (see Related topics) can be used to make this distinction. Java heap usage that keeps increasing until the heap is completely exhausted indicates a Java heap memory leak, whereas for native memory leaks and fragmentation problems heap usage does not grow significantly over time. For a native memory leak the process size grows, and for a fragmentation problem a large amount of free heap is available when the OutOfMemoryError occurs.

Common causes of memory leaks in Java applications

As mentioned above, the common root cause of memory leaks in Java (Java heap memory leaks) is unintentional object references (caused by program logic errors) that hold on to unused objects in the Java heap. This section describes several common types of program logic errors that lead to Java heap memory leaks.

Unbounded caches

A very simple example of a memory leak is a java.util.Collection object (for example, a HashMap) that acts as a cache but grows without any bound. Listing 1 shows a simple Java program that demonstrates this basic memory leaking data structure.

Listing 1. Sample Java program that leaks String objects into a static HashSet container object
import java.util.HashSet;

public class MyClass {
  // The static reference keeps myContainer (and everything in it) reachable
  // for as long as the MyClass class is loaded.
  static HashSet myContainer = new HashSet();

  public void leak(int numObjects) {
    for (int i = 0; i < numObjects; ++i) {
      String leakingUnit = new String("this is leaking object: " + i);
      myContainer.add(leakingUnit);
    }
  }

  public static void main(String[] args) throws Exception {
    {
      MyClass myObj = new MyClass();
      myObj.leak(100000); // One hundred thousand
    }
    System.gc();
  }
}

In the Java program shown in Listing 1, there is a class named MyClass that holds a static reference to a HashSet named myContainer. In the main method of the class MyClass there is a sub-scope (the inner braces) within which an instance of MyClass is instantiated and its member method leak is invoked. This results in one hundred thousand String objects being added to the container myContainer. After program control exits the sub-scope, the instance of the MyClass object is garbage collected, because there are no references to it outside the sub-scope. However, the MyClass class object holds a static reference to the member variable named myContainer. Because of this static reference, the myContainer HashSet continues to remain in the Java heap even after the only instance of MyClass has been garbage collected, and along with the HashSet so do all the String objects inside it, holding on to a significant portion of the Java heap until the program exits the main method. This program demonstrates a basic memory leak involving the unbounded growth of a cache object. Most caches are implemented using the Singleton pattern, involving a static reference to a top-level Cache class, as shown in this example.
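
In contrast, a cache that is meant to stay bounded can evict entries as it grows. The following sketch is not part of the original example; the class name and the entry limit are invented here purely for illustration. It uses the removeEldestEntry hook of java.util.LinkedHashMap to cap the number of retained entries:

import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: a size-bounded cache. Once the map exceeds MAX_ENTRIES,
// the eldest (least recently accessed) entry is evicted, so the structure
// cannot grow without limit the way the HashSet in Listing 1 does.
public class BoundedCache<K, V> extends LinkedHashMap<K, V> {
  private static final int MAX_ENTRIES = 10000;

  public BoundedCache() {
    super(16, 0.75f, true); // access-order iteration gives LRU eviction
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    return size() > MAX_ENTRIES;
  }
}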

Un-invoked listener methods

Many memory leaks are caused by program errors in which clean-up methods fail to get invoked. The listener pattern is commonly used in Java programs to implement methods that clean up shared resources when they are no longer needed. For example, J2EE programs often rely on the HttpSessionListener interface and its sessionDestroyed callback method to clean up any state stored in a user session when the session expires. Sometimes, because of program logic errors, the code responsible for invoking a listener fails to invoke it, or the listener method fails to complete because of an exception; either case can leave program state that is no longer needed sitting in the Java heap.
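
As a minimal sketch of this pattern (the class name CleanupListener and the sessionState registry are invented here for illustration), the listener below removes per-session state from a shared registry in sessionDestroyed. If that callback is never invoked, or fails before the remove call, the corresponding entries stay reachable and accumulate in the Java heap:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.servlet.http.HttpSessionEvent;
import javax.servlet.http.HttpSessionListener;

// Minimal sketch of the listener clean-up pattern described above.
public class CleanupListener implements HttpSessionListener {
  // Application-wide registry keyed by session id; it grows if clean-up never runs.
  static final Map<String, Object> sessionState = new ConcurrentHashMap<String, Object>();

  public void sessionCreated(HttpSessionEvent event) {
    sessionState.put(event.getSession().getId(), new Object()); // stand-in for per-user state
  }

  public void sessionDestroyed(HttpSessionEvent event) {
    // If this callback is never invoked, or throws before this line, the entry leaks.
    sessionState.remove(event.getSession().getId());
  }
}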

Infinite loops

Some memory leaks occur because of program errors in which an infinite loop in the application code allocates new objects and adds them to a data structure that is reachable from outside the scope of the loop. Such infinite loops sometimes occur because of multi-threaded access to shared, unsynchronized data structures. These memory leaks manifest as fast-growing leaks, where the verbose GC data reports a drastic drop in free heap space over a very short period of time, followed by an OutOfMemoryError. For this kind of memory leak it is important to analyze a heap dump taken during the short interval in which free memory is observed to be dropping rapidly. The analysis results of two different memory leak cases involving infinite loops observed by IBM Support are discussed in the sections titled Case study 3 and Case study 4.

Although the memory leaking data structure can be identified by analyzing heap dumps, identifying the leaking code inside the infinite loop is not trivial. The method stuck in the infinite loop can be determined by looking at the thread stacks of all threads in a thread dump taken while free memory is observed to be dropping rapidly. The IBM SDK implementation generates a Java core file along with the heap dump; this file contains the thread stacks of all active threads and can be used to identify the methods and threads that may be stuck in an infinite loop.
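
The following sketch is purely illustrative and is not taken from the cases discussed later; it shows the general shape of such a leak. A loop whose exit condition is never satisfied keeps allocating objects into a collection that is reachable from outside the loop, so nothing it allocates ever becomes eligible for garbage collection:

import java.util.ArrayList;
import java.util.List;

// Illustration only: a loop that cannot terminate keeps allocating objects
// and adding them to a list that outlives the loop.
public class InfiniteLoopLeak {
  static final List<byte[]> workQueue = new ArrayList<byte[]>(); // reachable from outside the loop

  static void process(int target) {
    int processed = 0;
    while (processed != target) {      // bug: target is negative, so this cannot become
      workQueue.add(new byte[1024]);   // false before the heap is exhausted; every
      processed++;                     // iteration allocates and retains another 1 KB
    }
  }

  public static void main(String[] args) {
    process(-1); // free heap drops rapidly until an OutOfMemoryError occurs
  }
}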

Too many session objects

Many OutOfMemoryErrors occur because of inappropriate configuration of the maximum heap size needed to support the maximum user load. A simple example is a J2EE application that uses in-memory HttpSession objects to store user session information. If no upper limit is set on the maximum number of session objects that can be held in memory, a very large number of session objects can accumulate during peak user load periods. The resulting OutOfMemoryErrors are not really memory leaks, but rather incorrect configuration.

The WebSphere solution

Traditional memory leak technology is based on the idea that you already know you have a memory leak and want to find the root cause. Techniques vary, but they always involve heap dump analysis, attaching a Java Virtual Machine Profiler Interface (JVMPI) or Java Virtual Machine Tool Interface (JVMTI) agent, or using byte code insertion to trace insertions into and removals from collections. These analysis mechanisms are quite involved, carry a substantial performance burden, and are not appropriate for continuous use in production environments.

The problem

Memory leaks in enterprise applications cause a large number of critical situations. The cost of such problems includes the time and money spent on analysis, expensive downtime in production environments, stress, and a loss of confidence in the application and the framework.

The typical analysis approach is to try to move the application into an isolated test environment where the problem can be recreated and analysis can be performed. The difficulty of reproducing memory leaks in these test environments adds to the costs associated with them.

Traditional redundancy approaches such as clustering help only up to a point, because a memory leak will propagate across the cluster members. As the affected application server responds more slowly, workload management techniques route requests to the healthier servers, which can lead to coordinated application server crashes.

Where the costs come from

Analysis of typical memory leak scenarios shows that the main source of the cost of memory leaks in enterprise applications is that they are not identified until the problem is severe. Unless administrators have the skills, time, and resources to monitor memory trends, users typically do not realize they have a problem until application performance degrades and the application server stops responding to administrative requests. In general, the costs associated with memory leaks come from three main areas: testing, detection, and analysis.

WebSphere Application Server has identified the detection and analysis of memory leaks as two distinct problems with related but separate solutions. Unfortunately, there is no simple technical solution to the costs associated with adequate testing, and that topic is not addressed in this article.

Separating detection from analysis

The problem with traditional techniques is that they attempt to perform detection and analysis at the same time. This leads to solutions that either perform poorly or involve technology that is not appropriate for many production environments, such as JVMPI agents or byte code insertion.

By separating the detection problem from analysis, we were able to deliver a lightweight, production-ready memory leak detection mechanism in WebSphere Application Server V6.0.2. The solution uses inexpensive, universally available statistics to monitor memory usage trends and provide early notification of a memory leak. This gives administrators time to prepare appropriate backup solutions and to analyze the cause of the problem, without the expensive and frustrating issues associated with reproducing the problem in a test environment.

To facilitate that analysis, WebSphere Support provides the Memory Dump Diagnostic for Java (MDD4J) tool, a heavyweight offline memory leak analysis tool that consolidates several proven techniques into a single user interface.

To bridge the gap between detection and analysis, we provide an automated facility that generates HeapDumps on the IBM JDK. This mechanism generates multiple heap dumps that are coordinated so that enough memory has leaked between them to facilitate comparative analysis with MDD4J. Generating HeapDumps is quite expensive; this feature is disabled by default, and an MBean operation is provided to enable it at the appropriate time.

Memory leak detection

The lightweight memory leak detection in WebSphere Application Server is designed to provide early detection of memory problems in test and production environments. It has minimal performance overhead and requires neither attached agents nor byte code insertion. Although it is designed to be used in conjunction with offline analysis tools, including MDD4J, it is not intended to provide analysis of the root cause of the problem.

The algorithm

If the state and workload of the application server are stable, the memory usage pattern should be relatively stable.

Figure 1. Verbose GC chart showing free memory (green) and used memory (red) for a memory leaking application

Enabling verbose GC is the first step in the problem determination process for debugging memory leaks. For instructions on enabling verbose GC on the IBM Developer Kit, see the Diagnostics Guide for the IBM Developer Kit (see Related topics). Both support personnel and customers use similar processes and charts of verbose GC statistics to determine whether a memory leak is the cause of a failure (see Related topics). If free memory after GC cycles keeps decreasing, a memory leak is very likely. The chart in Figure 1 is an example of charting free memory after GC cycles for an application with a memory leak (the chart was produced with an internal IBM tool). The leak is quite obvious, but unless the data is being actively monitored, you will not become aware of it until the server goes down.

Our memory leak detection mechanism essentially automates this process, looking for persistent downward trends in free memory after GC cycles. We cannot assume that verbose GC information is available, and JVMPI is too expensive for production (and requires attaching an agent). We are therefore limited to PMI data, which makes Java API calls to obtain free and total memory statistics. Verbose GC provides free memory statistics directly after the end of a GC cycle; PMI data does not. We approximate free memory after garbage collection cycles by using an algorithm that analyzes the variance of the free memory statistics.
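
As a rough illustration of trend detection over sampled statistics (this is only a sketch, not the WebSphere algorithm, which additionally approximates post-GC values and adapts its intervals), the class below records free-memory samples through the standard Runtime API and reports a downward trend when the least-squares slope over a sliding window of samples is negative:

import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch only. Samples free memory with the standard Runtime API and
// flags a persistent downward trend via a least-squares slope over a sliding window.
public class FreeMemoryTrend {
  private final Deque<Long> samples = new ArrayDeque<Long>();
  private final int window;

  public FreeMemoryTrend(int window) { this.window = window; }

  // Record one free-memory sample (bytes); keep only the most recent 'window' samples.
  public void sample() {
    samples.addLast(Runtime.getRuntime().freeMemory());
    if (samples.size() > window) {
      samples.removeFirst();
    }
  }

  // True when the window is full and the fitted slope of free memory over time is negative.
  public boolean downwardTrend() {
    if (samples.size() < window) return false;
    double n = window, sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
    int x = 0;
    for (long y : samples) {
      sumX += x; sumY += y; sumXY += (double) x * y; sumXX += (double) x * x;
      x++;
    }
    double slope = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
    return slope < 0;
  }
}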

Leaks can be very fast or very slow, so we analyze memory trends over both short and long intervals. There is no fixed time period for the shortest interval; it is derived from the variance of the free memory statistics.

Because we run inside production servers and are trying to detect memory leaks (not create them), we must store a very limited amount of data. Raw and summarized data points that are no longer needed are discarded to keep our memory footprint to a minimum.

Cycles are analyzed by looking for downward trends in the approximated/summarized free memory statistics. How strictly the criteria are applied is determined by the configuration of the rule, although the rule ships with a set of defaults that should be generally applicable.

In addition to downward trends in approximated free memory after garbage collection cycles, we also look for cases where the average free memory after garbage collection falls below certain thresholds. This condition is either a sign of a memory leak or of an application running on an application server with too few resources.

iSeries

OS/400® or iSeries introduces some unique scenarios. iSeries machines are typically configured with an effective free memory pool size, which determines the amount of memory available to the JVM. When the Java heap exceeds this value, DASD (disk) is used to accommodate the heap. Such a solution always results in terrible performance, but administrators may not realize the problem because the application server remains responsive even while the server is struggling.

If the Java heap size expands into DASD, we raise an alert notifying administrators that this is about to happen and that either their effective memory pool size is too small, they have too few resources, or a memory leak is occurring.

Expanding heaps

Java heaps are typically configured with minimum and maximum heap sizes. Analyzing free memory trends while the heap is expanding is very difficult. We avoid any downward-trend analysis while the heap is expanding and instead monitor whether the heap will soon run out of resources. This is done by monitoring whether the heap size keeps increasing, whether free memory after GC cycles stays within some threshold of the heap size (thereby driving further heap expansion), and by projecting whether the JVM will run out of resources if the current trend continues. If such a situation is found, we notify the user of the potential problem so that they can monitor the situation or put a contingency plan in place.
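
The projection step can be pictured as a simple extrapolation of the used-heap trend toward the configured maximum heap. The sketch below is only an illustration (the class name and the assumed growth rate are invented here); the product's actual logic is more involved:

// Illustration only: extrapolate used-heap growth toward the maximum heap size.
// 'usedPerSample' is the average growth in used heap per sampling interval,
// estimated elsewhere (for example, from a least-squares fit of used-heap samples).
public class HeapExhaustionProjection {

  // Estimated number of sampling intervals until the heap is exhausted,
  // or -1 if the current trend does not lead to exhaustion.
  public static long intervalsUntilExhaustion(long usedHeap, long maxHeap, double usedPerSample) {
    if (usedPerSample <= 0) {
      return -1;                       // flat or shrinking usage: no exhaustion predicted
    }
    long headroom = maxHeap - usedHeap; // free space left before the configured maximum
    return (long) Math.ceil(headroom / usedPerSample);
  }

  public static void main(String[] args) {
    Runtime rt = Runtime.getRuntime();
    long used = rt.totalMemory() - rt.freeMemory();
    // Assume a growth of 2 MB per interval was measured from earlier samples.
    System.out.println("Intervals until exhaustion: "
        + intervalsUntilExhaustion(used, rt.maxMemory(), 2L * 1024 * 1024));
  }
}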

HeapDump generation

Many analysis tools, including MDD4J, analyze heap dumps to find the root cause of memory leaks. On the IBM JDK, the HeapDumps generated automatically on OutOfMemoryExceptions are typically used for this kind of analysis. To be more proactive, you need to generate HeapDumps at appropriate times for analysis. This matters because a HeapDump generated at an inappropriate time leads to incorrect analysis. For example, if a HeapDump is generated while the workload is just starting, a cache that is still filling up is often identified as a memory leak.

In conjunction with our memory leak detection mechanism, WebSphere Application Server provides a facility that can generate multiple heap dumps coordinated with the memory leak trend. This ensures that heap dumps are taken after there is significant evidence of a memory leak, and with enough memory leaked between them to give the best chance of valid analysis results.

Automated heap dump generation can be enabled by default, or it can be initiated at the appropriate time using an MBean operation.

In addition to the automated heap dump generation facility, heap dumps can also be generated manually, either by using wsadmin (see the WebSphere Application Server Information Center) or, on Unix® platforms, by sending a kill -3 signal after setting some environment variables (see the Diagnostics Guide in Related topics).

Benefits

Administrators can run the lightweight memory leak detection in test and production environments and receive early notification of memory leaks. This allows them to put contingency plans in place, perform analysis while the problem is reproducible, and diagnose the results before application responsiveness is lost or the application server goes down.

The result is a significant reduction in the costs associated with memory leaks in enterprise applications.

Inherent limitations

The memory leak detection rule is designed around a simple principle: it uses freely available data and makes the necessary approximations to provide reliable notification of memory leaks. Given the inherent limitations of the data being analyzed and the approximations required, existing solutions that use better data and more sophisticated algorithms can be expected to produce more accurate results (though not without significant performance costs). What we can say is that, while simple, our implementation is cheap, uses universally available statistics, and detects memory leaks.

Autonomic manager integration

The lightweight memory leak detection is fully configurable and is designed to interact with advanced autonomic managers or custom JMX clients. IBM WebSphere Extended Deployment is one example of such a relationship.

WebSphere Extended Deployment abstracts the WebSphere Application Server topology and deploys applications appropriately, reacting to changing workloads while maintaining application performance standards. It also incorporates health management policies. The WebSphere Extended Deployment memory health policy uses the WebSphere Application Server memory leak detection feature to identify when an application server has a memory leak.

WebSphere Extended Deployment offers a number of policies for configuring memory leak detection. In one example policy, the response to a memory leak notification is to take several heap dumps for analysis (using workload management to maintain application performance). Another policy might simply monitor for when an application server's memory level becomes critical enough to warrant restarting the ailing application server.

Memory leak analysis for production systems

Determining the root cause of a Java memory leak requires two steps:

  1. Determine where the memory leak is. Identify the leaking objects, the unintentional references, the classes and objects holding those unintentional references, and the objects that the unintentional references point to.

  2. Determine why the leak is occurring. Identify the source code methods (the program logic) responsible for not releasing those unintentional references at the appropriate points in the program.

The Memory Dump Diagnostic for Java tool helps determine where memory leaks are occurring in an application. The tool cannot, however, help identify the faulty source code that causes the memory leak. Once the leaking data structures, classes, and packages have been identified with the help of the tool, you can use any debugger, or specific trace statements in your logging, to identify the faulty source code methods and make the necessary changes to the application code to resolve the memory leak.
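
For example (an illustrative sketch only, using java.util.logging; the class and method names here are invented), once a container class has been flagged as a suspect, a trace statement at the points where elements are added can reveal which call path is responsible for the growth:

import java.util.HashSet;
import java.util.Set;
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative trace instrumentation. SuspectCache is a hypothetical class standing in
// for a container identified as a leak suspect; logging the size and the call path at
// add time shows which code path is driving the unbounded growth.
public class SuspectCache {
  private static final Logger LOG = Logger.getLogger(SuspectCache.class.getName());
  private static final Set<String> cache = new HashSet<String>();

  public static synchronized void put(String value) {
    cache.add(value);
    if (cache.size() % 10000 == 0) { // trace periodically to avoid flooding the logs
      LOG.log(Level.WARNING, "SuspectCache size reached " + cache.size(),
          new Exception("allocation call path"));
    }
  }
}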

The analysis techniques require substantial amounts of CPU, memory, and disk space, so the analysis mechanism is implemented as an offline tool. The mechanism is particularly suited to large applications running in production or stress test environments. The tool can be used to analyze (offline) dumps that were obtained manually or produced in conjunction with the lightweight memory leak detection.

The Memory Dump Diagnostic for Java tool targets the following roles and is designed to address their respective goals:

  • System administrator

    Which component is leaking (a data structure in the customer application or inside WebSphere Application Server)?

    After analysis, the package and class names of the objects identified in the leak suspect list can identify the component responsible for the memory leak, without requiring a detailed understanding of the application code.

  • Application developer

    Which data structures are actually leaking and are not valid caches?

    A memory leaking data structure differs from a non-leaking one only in that the leaking data structure grows without bound, whereas a non-leaking data structure grows only within certain limits. MDD4J provides facilities that help a developer confirm whether a suspect data structure is actually a memory leak or a data structure that is growing appropriately.

    What is causing the data structure to leak?

    Once a memory leaking data structure has been confirmed, the next question is: which classes, objects, and object references are causing the leaking objects to remain in memory beyond their intended lifetime? The Memory Dump Diagnostic for Java tool makes it possible to browse and navigate all object references in the heap in a tree view, while displaying all parent objects of any selected object. This helps identify the unintentional object references that may be causing the memory leak.

    Which data types and data structures contribute to a large footprint?

    In many cases, OutOfMemoryErrors occur not because of a memory leak but because of configuration problems that lead to excessive Java heap consumption. In such cases it is often necessary to detect the main contributors to the footprint of the Java application in order to apportion blame to different components. The Memory Dump Diagnostic for Java tool helps identify the data structures that contribute significantly to the Java heap and the ownership relationships among those contributors. This helps application developers understand how different application components contribute to the Java heap.

The tool supports the IBM Portable Heap Dump (.phd), IBM Text, HPROF Text, and SVC heap dump formats. See the Appendix for a detailed list of formats and supported JDK versions.

Technology overview

The Memory Dump Diagnostic for Java tool provides analysis capabilities for common formats of memory dumps from the Java Virtual Machine (JVM) running WebSphere Application Server on a variety of IBM and non-IBM platforms. The analysis of a memory dump is directed toward identifying regions, or collections of data structures, in the Java heap that may be the root cause of a memory leak. The tool displays the contents of the memory dump in a graphical format, highlighting the identified regions as memory leak suspects. The graphical user interface provides browsing capabilities to verify the suspected memory leak regions and to understand the data structures containing those leaking regions.

The tool offers two main types of analysis capability: single memory dump analysis and comparative analysis.

  • Single dump analysis is most commonly applied to memory dumps triggered automatically by the Java Developer Kit on OutOfMemoryExceptions. This type of analysis uses heuristics to identify suspect data structures with container objects (for example, a HashMap object with an array of HashMap$Entry objects that has a very large number of child objects). This heuristic is very effective at detecting leaking Java collection objects that internally use arrays to store the objects they hold, and it has been found to be effective for a large number of the memory leak cases handled by IBM Support.

    In addition to finding suspects with large drops in reach size, single dump analysis also identifies aggregated data structures (defined later) in the object reference graph that are the biggest contributors to the footprint of the Java heap.

  • Comparative analysis is performed between two memory dumps taken during the run of a memory leaking application (that is, while free Java heap memory is dropping). For the purposes of the analysis, the memory dump triggered early in the run of the leaking application is called the baseline memory dump. The memory dump triggered after the leaking application has run for some time, allowing the leak to grow, is called the primary memory dump. In a memory leak scenario, the primary memory dump is expected to contain many more objects occupying a much larger Java heap than the baseline memory dump. For better analysis results, it is recommended that the trigger points of the primary and baseline memory dumps be separated far enough apart that there is a substantial increase in the total consumed heap size.

    Comparative analysis identifies a set of data structures whose constituent data types show significant growth in the number of instances. These suspect data structures are identified by categorizing all objects in each heap dump into different regions (or equivalence classes) based on similarity of ownership, that is, the chains of references leading to the objects in the object reference graph. The categorization is achieved by applying pattern-matching techniques to the data types in the ownership context of each object in the dump. The regions found in each dump are then matched and compared between the primary and baseline dumps (a toy sketch of this grouping-and-comparison step appears after the list below). The regions identified in comparative analysis are characterized by the following:

    • Leak container: the object that holds in memory all of the objects that are present in large numbers; for example, the HashSet object in the memory leak example in Listing 1.

    • Leaking unit: the object type of the representative objects within a region that are growing or are present in large numbers; for example, the HashMap$Entry objects in the memory leak example in Listing 1, which hold the leaking String objects.

    • Leak root: the representative object in the chain of object references that holds the leak container in the heap. Typically this is a class object holding a static reference; for example, the MyClass class object in the memory leak example in Listing 1. Alternatively, it can be an object rooted in a Java stack, or one held in memory by a native reference.

    • Owner chain: the set of objects in the chain of object references from the leak root to the leaking unit. The data types and package names of the objects in the owner chain help identify the application component responsible for the memory leak.
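
To make the idea concrete, here is a toy sketch of the grouping-and-comparison step. This is not MDD4J's implementation; the signature strings and counts are invented, and each dump is assumed to have already been reduced to a map from an ownership-path signature to an instance count. Regions whose counts grow between the baseline and primary dumps become suspects:

import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy illustration of comparative analysis over ownership-based regions.
public class OwnershipRegionDiff {

  // Returns the regions (keyed by ownership-path signature) that grew, with their growth.
  public static Map<String, Long> growth(Map<String, Long> baseline, Map<String, Long> primary) {
    Map<String, Long> result = new TreeMap<String, Long>();
    for (Map.Entry<String, Long> e : primary.entrySet()) {
      Long before = baseline.get(e.getKey());
      long grown = e.getValue() - (before == null ? 0L : before);
      if (grown > 0) {
        result.put(e.getKey(), grown); // regions that grew between the dumps are suspects
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Long> baseline = new HashMap<String, Long>();
    Map<String, Long> primary = new HashMap<String, Long>();
    baseline.put("MyClass>HashSet>HashMap>HashMap$Entry", 1000L);
    primary.put("MyClass>HashSet>HashMap>HashMap$Entry", 300001L);
    // The region grew by 299,001 instances, so it is reported as a suspect.
    System.out.println(growth(baseline, primary));
  }
}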

The analysis results are displayed in an interactive Web-based user interface with the following capabilities:

  • Lists a summary of the analysis results, heap contents, size, and growth.
  • Lists suspected data structures, data types, and packages that contribute to growth in heap usage (for comparative analysis) or that occupy large amounts of heap space (for single dump analysis).
  • An ownership context view that shows the ownership relationships between the major footprint contributors in the summarized set and their significant constituent data types.
  • Browsing capabilities in an interactive tree view of the heap dump contents, showing all incoming references (only one reference is shown in the tree; the rest are shown separately) and outgoing references of any object in the heap, with child objects ordered by reach size.
  • Navigation from the suspect list to the ownership context and contents views and to the browse view.
  • A list view of all objects and data types in the memory dump, with filters and sortable columns.

Case study 1: Memory leak analysis of the MyClass memory leak example

Comparative analysis of a pair of heap dumps from MyClass (the sample code in Listing 1) shows the leak suspects displayed in Figure 2.

Figure 2. Suspects for the MyClass memory leak example

The Suspects tab of the analysis results lists the memory leak suspects in four tables. The Data Structures That Contribute Most to Growth table lists data structures which are identified by the comparative analysis techniques described above. Each row of the table identifies a single memory leak suspect data structure:

  • Leaking Unit Object Type - lists the data type of the leak container object.
  • Growth - lists the growth observed in the size of this data structure/region in the heap in between the primary and baseline heap dump.
  • Size - lists the size of this data structure/region in the primary heap dump.
  • Contribution to Heap - lists the size of the data structure/region as a percentage of the total heap size of the primary dump.

Often there are multiple data structure suspects, and the likelihood of a suspect being a memory leak can be estimated from these columns. In this case there is only one suspect, and the fact that this data structure accounts for 84 percent of the total heap size of the primary dump identifies it as a very likely suspect.

The second table, Data Structures with Large Drops in Reach Size, lists data structures identified by the single dump analysis on the primary heap dump. Each row in the table identifies a data structure with a potential container object that has a large number of child objects. Both the first and second table can point to the same suspect. If suspects identified in the first and second table are related, then the corresponding rows in both the tables are highlighted.

The third table, Object Types that contribute most to Growth in Heap Size, lists different data types that have experienced a large growth in the number of instances between the primary and baseline dump. These data types are not categorized into different data structures or regions based on their ownership context; rather, these are the top-most growing data types for the whole heap. Again, if a particular data type has a large number of instances in a selected data structure or region, then that data type row is highlighted.

The fourth table, Packages that contribute most to Growth in Heap Size , lists different Java package names for data types that have experienced a large growth in the number of instances between the primary and baseline dump. The application component which is responsible for the memory leak is often identified by the package name and class name of the data types which are part of growing regions. This table identifies suspect package names with the largest growth, which can help identify the responsible application component for the memory leak.

A single heap dump (often the heap dump generated automatically with the OutOfMemory error) can be also analyzed with MDD4J. Figure 3 shows the results of the Suspects tab when the primary heap dump from the example in Listing 1 is analyzed just by itself.

(There are only three tables in this case in the Suspects tab. There are no data structure suspects because comparative analysis is not performed. The Object Types that contribute most to Heap Size and Packages that contribute most to Heap Size tables do not show any growth statistics but only the total number of instances in the primary dump.)

Figure 3. Single dump analysis result for the MyClass memory leak example from Listing 1

After selecting a data structure in the Suspects tab, you can visit the Browse tab to see the chain of object references holding the leak container in the heap, as shown in Figure 4.

Figure 4. Browse suspects for MyClass memory leak example

In this example, it can be seen that there is a chain of references starting from the class object with the name MyClass to a HashSet to a HashMap to an array of HashMap$Entry objects with a very large number of HashMap$Entry child objects. Each HashMap$Entry object holds a String object. This describes the data structure created in the memory leak example shown in Listing 1.

The tree view in this tab shows all the object references in the heap dump, except in the cases where an object has more than one parent object. The parent table in the left panel shows all the parent objects of any selected object in the tree. Any row in the parent table can be selected to expand the tree to the location of the selected parent object. The left panel also shows other details for any selected object in the tree; for example, the size of the object, the number of children, the total reach size, and so on.

Figure 5. Ownership context for MyClass memory leak example

The Ownership Context and Contents tab helps answer the question of what are the major contributors to the footprint of the heap in the primary dump (Figure 5). It also helps to show the ownership relationship between the identified major contributors and the constituent data types for each of the major contributors. In this example, the MyClass node has been identified as a major contributor in the OwnershipContext graph shown in the left panel. On the right panel, the data types that contribute significantly to this node are listed. The HashMap$Entry object, of which there is one instance for each element in the HashSet, is shown in this set.

The analysis results are also stored in a text file with the name AnalysisResults.txt. The text analysis results can be viewed from a link in the Summary tab, as well as accessed from the corresponding analysis results directory in the file system. Listing 2 shows a snippet from the AnalysisResults.txt file, which shows the results of the analysis for the MyClass memory leak example.

Listing 2. Textual analysis results for MyClass memory leak example
Suspected memory leaking regions:
Region Key:0,Leak Analysis Type:SINGLE_DUMP_ANALYSIS,Rank:1.0
Region Size:13MB, !RegionSize.DropSize!13MB
Owner chain - Dominator tree:
MyClass, class0x300c0110, reaches:13MB), LEAK_ROOT
|java/util/HashSet, object0x3036d840, reaches:13MB), LEAK_ROOT_TO_CONTAINER
|-java/util/HashMap, object0x3036d8a0, reaches:13MB), LEAK_ROOT_TO_CONTAINER
|--java/util/HashMap$Entry, array0x30cf0870, reaches:13MB), LEAK_CONTAINER
|---Leaking unit:
|----java/util/HashMap$Entry, object0x3028ad18, reaches:480 bytes)
|----java/util/HashMap$Entry, object have grown by 72203 instances
Region Key:2,Leak Analysis Type:COMPARATIVE_ANALYSIS,Rank:1.0
Region Size:12MB, Growth:12MB, 300001 instances
Owner chain - Dominator tree:
MyClass, class0x300c0110, reaches:13MB), LEAK_ROOT
|java/util/HashSet, object0x3036d840, reaches:13MB), LEAK_ROOT_TO_CONTAINER
|-java/util/HashMap, object0x3036d8a0, reaches:13MB), LEAK_ROOT_TO_CONTAINER
|--java/util/HashMap$Entry, array0x30cf0870, reaches:13MB), LEAK_ROOT_TO_CONTAINER
|---java/util/HashMap$Entry, object0x30e88898, reaches:256 bytes), LEAK_CONTAINER
|----Leaking unit:
|-----java/util/HashMap$Entry, object have grown by 1 instances

Case study 2: Analysis results for memory leak due to un-invoked listener call back method

Figure 6 shows the Explore Context and Contents tab for a memory leak case involving a defect found during system testing in IBM WebSphere Portal. This defect occurred because some session state was not being cleaned up when the session object was invalidated. The Ownership Context graph in this example shows that the MemorySessionContext node is the largest contributor to the footprint of the heap, which is expected because MemorySessionContext is the WebSphere object that stores all the in-memory session data.

Figure 6. Ownership context for a memory leak involving un-invoked listener methods

To find the root cause of the memory leak more specifically, it is necessary to look at the Browse tab in Figure 7, where you can see that there are a very large number of LayoutModelContainer objects, which are WebSphere Portal Server objects stored in the user session. After looking closely at the data structure and the number of LayoutModelContainer objects, it is possible to infer that the LayoutModelContainer objects were not getting removed when they were no longer required, and hence that the session invalidation listener code was not getting invoked properly. It was later discovered that the root cause was a WebSphere Application Server bug related to session invalidation when multiple clones are present. This issue was rectified soon afterwards.

Figure 7. Browse view for leaking WebSphere Portal LayoutModelContainer objects

Case Study 3: Analysis results from a memory leak due to an infinite loop

Figure 8 shows the Suspects tab from the analysis result of two heap dumps taken from a memory leak case involving an infinite loop. The symptoms presented in the verbose GC logs show a very fast drop in the available free heap space in a very short time. Analysis of a heap dump taken during the time when the free heap was dropping was critical to understanding the root cause of the problem. The OutOfMemoryError heap dump did not have the memory leaking data structure in it, because the memory leaking data structure was rooted in the Java stack, which got unrolled prior to generating the heap dump.

As can be seen from the Suspects tab, there are a very large number of instances of objects from the package: org.apache.poi.usermodel and an unusually large number of instances of the class org.apache.poi.usermodel.HSSFRow.

Figure 8. Suspects for a memory leak in Apache Jakarta POI application

Figure 9 shows the Browse tab in this analysis result. It can be observed that there is a chain of reference starting from an object of the type org.apache.poi.hssf.usermodel.HSSFWorkbook, which has an ArrayList containing a very large number (20,431) of HSSFSheet objects.

Figure 9. Browse leaking Apache Jakarta POI HSSFSheet objects

Further analysis showed that the HSSFSheet objects were getting created in a method which was stuck in an infinite loop and were getting added to an HSSFWorkbook object which was referenced from the Java stack. The thread dumps taken at the same time as the primary heap dump showed two Java thread stacks which were in the same method which was creating the HSSFSheet objects. Inspection of the Java source code (from the open source Apache project Jakarta POI) revealed some unsynchronized multi-threaded code access patterns which were fixed in a subsequent release. From this analysis, it was possible to narrow down the root cause of the memory leak to the Jakarta POI application component.

Case study 4: Example of a memory leak due to an infinite loop

Figure 10 shows another example of a memory leak due to an open source application component: com.jspsmart.upload.SmartUpload. From the left panel, you can see that there are an unusually large number of com.jspsmart.upload.File objects that are pointing to the SmartUpload object.

Figure 10. Browse memory leaking jspsmart SmartUpload File objects

Case study 5: Analysis results from a memory leak involving a large number of JMSConnection objects

Figure 11 shows a memory leak case involving a large number of JMSTopicConnection objects.

Figure 11. Suspects showing leaking JMS connection artifacts
Figure 12. Ownership context for leaking JMS connection objects

From the Ownership Context Graph in Figure 12, you can see that these JMSTopicConnectionHandle objects are owned by another significant contributor node to the Java heap, one that contains the PartnerLogData class. In addition, the PartnerLogData class has an unusually large number of XARecoveryWrapper objects. Further investigation revealed the existence of a WebSphere Application Server bug that was causing unused XARecoveryWrapper objects to remain in memory. These XARecoveryWrapper objects were in turn holding up a large number of JMSTopicConnection objects in the Java heap. These JMSTopicConnection objects were also holding up a significant amount of native heap resources. Thus, this problem was manifesting as a native memory leak with the root cause in the Java heap.

Case study 6: Analysis results showing WebSphere objects which are not really leaking

Figure 13. In-memory HTTP session artifacts in WebSphere

Figure 13 shows a memory leak analysis suspect pointing at the WebSphere MemorySessionContext object. The MemorySessionContext object has a reference to com.ibm.ws.webcontainer.httpsession.SessionSimpleHashMap, leading to instances of com.ibm.ws.webcontainer.httpsession.MemorySessionData objects. These objects are WebSphere Application Server implementations of in-memory HTTP session objects. They can be present in large numbers in the heap of a J2EE application that uses HTTP sessions to store user session state in memory. Such objects do not always signify a memory leak. A large number of objects of these types can mean that too many sessions are currently active due to a heavy user load; an OutOfMemory error under these circumstances can easily be circumvented either by increasing the maximum heap size or by setting a limit on the maximum number of live sessions kept in memory at any time. A large number of objects of these types could also signify a deeper memory leak in which application objects held by these session objects are actually leaking. So you cannot simply assume that the memory leak is in WebSphere Application Server just because WebSphere-specific classes show up in the analysis results.

Comparing available analysis tools

There are two kinds of Java memory leak detection tools currently available on the market. The first type of tool is an online tool that attaches to a running application and derives Java heap information from the application, either by instrumenting the Java application or by instrumenting the JDK implementation itself. Examples of this nature are Wily LeakHunter, Borland Optimizeit, JProbe Weblogic Memory Leak Detector, and so on. Although these tools can identify the individual types of objects that are increasing in number over a period of time, they do not help to identify the composite data structure of which these objects are part. To understand the root cause of a memory leak, it is necessary to identify not only individual leaking objects at a lower granularity, but also what is holding on to the leaking objects, and look at the whole data structure causing the memory leak at a larger granularity. In addition, profiling techniques used in some of these tools add overhead to normal application processing time, which make them unsuitable for production usage. Furthermore, application instrumentation techniques modify application behavior, which may also not be desirable.

Another set of tools includes the HAT tool from SUN Microsystems®, which also analyzes Java heap dumps. The HAT tool produces statistics for data types that have a large number of instances in a single dump and can also compare two heap dumps to identify data types which have increased in number. Again, what is missing is a description of the leaking data structure.

HeapRoots is an experimental console-based tool for analyzing IBM JDK heap dumps similar to the HAT tool, but it does not pinpoint the root cause of a memory leak. Memory Dump Diagnostic for Java (MDD4J) improves upon basic analysis features provided in HeapRoots by adding comparative and single dump analysis for memory leak root cause detection, and also provides an interactive graphical user interface.

Benefits and limitations

The lack of scalable, low overhead memory leak analysis tools makes it hard to deal with memory leak issues in production or stress test environments.

The MDD4J tool is designed to address this gap. Analyzing heap dumps offline in the MDD4J tool, which runs within the IBM Support Assistant process, enables resource-intensive pattern matching algorithms to be applied to comparatively analyze the dumps and detect the root causes of the memory leak. These pattern matching algorithms seek to identify aggregated data structures (grouped together by similarity of ownership structure) that grow the most between the memory dumps. This approach identifies not only the low-level objects experiencing growth but also the higher-level data structures of which the various leaking objects are part. This helps answer the question of what is leaking at a higher level of granularity than ubiquitous low-level objects such as Strings.

In addition, the tool also provides footprint analysis, which identifies a summarized set of major contributors to the size of the Java heap, their ownership relationships, and their constituent significant data types. The ownership relationships, along with the browsing capabilities, also help to answer the question of what is holding on to the leaking objects in memory, thus causing the leak. The data types in the ownership context and contents also help to ascertain blame to a particular high level component in the whole memory leaking application. This helps to apportion the responsibility to the correct development team for detailed analysis.

It is also important to point out that the tool only points out suspects which may or may not be actual memory leaks. This is because leaking data structures and valid caches of objects are often indistinguishable.

Another gap is that it is not possible for the tool to identify the source code in the memory leaking application that is causing the memory leak to occur. To provide that information, it is necessary to capture allocation stack traces for every object allocation, which is very expensive and is also not available in most formats of memory dumps.

Conclusion

When used in conjunction with lightweight memory leak detection, the Memory Dump Diagnostic for Java tool provides a complete production system that combines the benefits of early notification within production environments with state of the art offline analysis results.

Appendix: Supported HeapDump formats and JVM versions

The following formats of memory dumps are supported by the Memory Dump Diagnostic for Java tool:

  1. IBM Portable Heap Dump (.phd) format (for WebSphere Application Server Versions 6.x, 5.1 on most platforms)
  2. IBM Text heap dump format (for WebSphere Application Server Versions 5.0 and 4.0 on most platforms)
  3. HPROF heap dump format (for WebSphere Application Server on the Solaris® and HP-UX platforms)
  4. SVC Dumps (WebSphere on the IBM zSeries)

Table 1 provides a list of WebSphere versions, JDK versions, and Java memory dump formats supported on different platforms hosting the WebSphere Application Server JVM.

Table 1. Applicable platforms and versions
Platform                  WebSphere version   JDK version                  Java memory dump format

AIX®, Windows®, Linux®    6.1                 IBM J9 SDK 1.5               Portable Heap Dump (PHD)
                          6.1                 IBM J9 SDK 1.5 (64-bit)      Portable Heap Dump (PHD)
                          6.0.2               IBM J9 SDK 1.4.2 (64-bit)    Portable Heap Dump (PHD)
                          6.0 - 6.0.2         IBM SDK 1.4.2                Portable Heap Dump (PHD)
                          5.1                 IBM SDK 1.4.1                IBM Text Heap Dump
                          5.0 - 5.0.2         IBM SDK 1.3                  IBM Text Heap Dump
                          4.0                 IBM SDK 1.3                  IBM Text Heap Dump

Solaris®, HP®             6.1                 SUN JDK 1.5                  HPROF (ASCII)
                          6.1                 SUN JDK 1.5 (64-bit)         HPROF (ASCII)
                          6.0.2               SUN JDK 1.4.2 (64-bit)       HPROF (ASCII)
                          6.0 - 6.0.2         SUN JDK 1.4                  HPROF (ASCII)
                          5.1                 SUN JDK 1.4                  HPROF (ASCII)
                          5.0 - 5.0.2         SUN JDK 1.3                  HPROF (ASCII)
                          4.0                 SUN JDK 1.3                  HPROF (ASCII)

z/OS®                     6.1                 IBM J9 SDK 1.5               SVC/PHD
                          6.1                 IBM J9 SDK 1.5 (64-bit)      SVC/PHD
                          6.0.2               IBM SDK 1.4.2 (64-bit)       SVC/PHD
                          6.0 - 6.0.2         IBM SDK 1.4.2                SVC/PHD
                          5.1                 IBM SDK 1.4.1                SVC
                          5.0 - 5.0.2         IBM SDK 1.3                  SVC

OS/400                    6.1                 IBM SDK 1.5                  PHD

Acknowledgments

The authors would like to thank Daniel Julin and Stan Cox for reviewing the paper, and Scott Shekerow for editing the contents.


Translated from: https://www.ibm.com/developerworks/websphere/library/techarticles/0608_poddar/0608_poddar.html
