Analyzing a Native Exception (NE) Caused by Global Reference Table Overflow

1. Start with the backtrace of the NE:

#00 pc 000000000001d808  /system/lib64/libc.so (abort+120)
#01 pc 0000000000476644  /system/lib64/libart.so (art::Runtime::Abort(char const*)+552)
#02 pc 000000000056c5ec  /system/lib64/libart.so (android::base::LogMessage::~LogMessage()+1004)
#03 pc 0000000000264258  /system/lib64/libart.so (art::IndirectReferenceTable::Add(art::IRTSegmentState, art::ObjPtr<art::mirror::Object>)+764)
#04 pc 00000000002ff750  /system/lib64/libart.so (art::JavaVMExt::AddGlobalRef(art::Thread*, art::ObjPtr<art::mirror::Object>)+68)
#05 pc 0000000000343788  /system/lib64/libart.so (art::JNI::NewGlobalRef(_JNIEnv*, _jobject*)+572)
#06 pc 000000000011f838  /system/lib64/libandroid_runtime.so (JavaDeathRecipient::JavaDeathRecipient(_JNIEnv*, _jobject*, android::sp<DeathRecipientList> const&)+136)

The mobile log recorded at the time of the exception is:

08-14 06:21:12.819   838  1526 F zygote64: runtime.cc:531] JNI ERROR (app bug): global reference table overflow (max=51200)
08-14 06:21:12.819   838  1526 F zygote64: runtime.cc:531] global reference table dump:
08-14 06:21:12.819   838  1526 F zygote64: runtime.cc:531]   Last 10 entries (of 51198):
08-14 06:21:12.819   838  1526 F zygote64: runtime.cc:531]     51197: 0x14e79af0 java.lang.ref.WeakReference (referent is a android.os.BinderProxy)
08-14 06:21:12.819   838  1526 F zygote64: runtime.cc:531]     51196: 0x13a74a08 com.android.server.am.ServiceRecord
08-14 06:21:12.819   838  1526 F zygote64: runtime.cc:531]     51195: 0x14e71ce8 java.lang.ref.WeakReference (referent is a android.os.BinderProxy)
08-14 06:21:12.819   838  1526 F zygote64: runtime.cc:531]     51194: 0x14e74530 java.lang.ref.WeakReference (referent is a android.os.BinderProxy)
08-14 06:21:12.819   838  1526 F zygote64: runtime.cc:531]     51193: 0x14e70958 java.lang.ref.WeakReference (referent is a android.os.BinderProxy)
08-14 06:21:12.819   838  1526 F zygote64: runtime.cc:531]     51192: 0x14e708b8 java.lang.ref.WeakReference (referent is a android.os.BinderProxy)
08-14 06:21:12.819   838  1526 F zygote64: runtime.cc:531]     51191: 0x14e750b0 com.android.server.wm.WindowState$DeathRecipient
08-14 06:21:12.820   838  1526 F zygote64: runtime.cc:531]     51190: 0x14e6db40 java.lang.ref.WeakReference (referent is a android.os.BinderProxy)
08-14 06:21:12.820   838  1526 F zygote64: runtime.cc:531]     51189: 0x13a723f0 java.lang.ref.WeakReference (referent is a android.os.BinderProxy)
08-14 06:21:12.820   838  1526 F zygote64: runtime.cc:531]     51188: 0x14e6c508 android.os.Binder
08-14 06:21:12.820   838  1526 F zygote64: runtime.cc:531]   Summary:
08-14 06:21:12.820   838  1526 F zygote64: runtime.cc:531]     40143 of com.android.server.am.ServiceRecord (40143 unique instances)
08-14 06:21:12.820   838  1526 F zygote64: runtime.cc:531]      6023 of java.lang.ref.WeakReference (6023 unique instances)
08-14 06:21:12.820   838  1526 F zygote64: runtime.cc:531]      2931 of android.os.RemoteCallbackList$Callback (2931 unique instances)
08-14 06:21:12.820   838  1526 F zygote64: runtime.cc:531]       355 of com.android.server.content.ContentService$ObserverNode$ObserverEntry (355 unique instances)
08-14 06:21:12.820   838  1526 F zygote64: runtime.cc:531]       313 of java.lang.Class (235 unique instances)
08-14 06:21:12.820   838  1526 F zygote64: runtime.cc:531]       247 of com.android.server.am.ActivityRecord$Token (247 unique instances)

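The log states the cause directly: JNI ERROR (app bug): global reference table overflow (max=51200)

The global reference table has reached its limit of 51200 entries, and this overflow is what triggered the NE.

The lines that follow describe the table contents. "Last 10 entries (of 51198):" lists the 10 most recently added references, and "Summary:" gives per-class counts for every entry in the table.

The relevant code is: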

IndirectRef IndirectReferenceTable::Add(IRTSegmentState previous_state,
                                        ObjPtr<mirror::Object> obj) {
  size_t top_index = segment_state_.top_index;

  CHECK(obj != nullptr);
  VerifyObject(obj);
  DCHECK(table_ != nullptr);

  if (top_index == max_entries_) {
    if (resizable_ == ResizableCapacity::kNo) {
      // A full, non-resizable table is fatal: log the dump and abort.
      LOG(FATAL) << "JNI ERROR (app bug): " << kind_ << " table overflow "
                 << "(max=" << max_entries_ << ")\n"
                 << MutatorLockedDumpable<IndirectReferenceTable>(*this);
      UNREACHABLE();
    }
  }
  // Rest of the function omitted.
}

Here max_entries_ is 51200. The maximum number of entries in the global reference table is defined as:

static constexpr size_t kGlobalsMax = 51200;  // Arbitrary sanity check. (Must fit in 16 bits.)
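To make the capacity check concrete, here is a minimal Java model of a fixed-size reference table. It is only a sketch and not ART code: the class FixedRefTable and its methods are hypothetical, and it throws an exception at the point where ART calls LOG(FATAL) and aborts the process.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a non-resizable reference table; not ART's implementation.
final class FixedRefTable {
    private final int maxEntries;
    private final List<Object> entries = new ArrayList<>();

    FixedRefTable(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    // Mirrors the "top_index == max_entries_" check: a full table rejects the add.
    int add(Object obj) {
        if (obj == null) {
            throw new IllegalArgumentException("obj must not be null");
        }
        if (entries.size() == maxEntries) {
            throw new IllegalStateException("table overflow (max=" + maxEntries + ")");
        }
        entries.add(obj);
        return entries.size() - 1;  // index of the new entry
    }

    int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        FixedRefTable table = new FixedRefTable(3);
        try {
            for (int i = 0; ; i++) {
                table.add("ref-" + i);  // references are added but never released
            }
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage() + " after " + table.size() + " entries");
        }
    }
}
```

The point of the model is that the table itself is healthy; the failure only means that some caller keeps adding references without releasing them.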

   

The output seen in the mobile log is produced by the LOG(FATAL) statement above. The expression << MutatorLockedDumpable<IndirectReferenceTable>(*this) relies on an overload of operator<< for std::ostream, defined by the following template function:

template<typename T>
inline std::ostream& operator<<(std::ostream& os, const MutatorLockedDumpable<T>& rhs) {
  Locks::mutator_lock_->AssertSharedHeld(Thread::Current());
  rhs.Dump(os);
  return os;
}

Here *this is the IndirectReferenceTable object, so the call ends up in its Dump() method:

void IndirectReferenceTable::Dump(std::ostream& os) const {
  os << kind_ << " table dump:\n";
  ReferenceTable::Table entries;
  for (size_t i = 0; i < Capacity(); ++i) {
    ObjPtr<mirror::Object> obj = table_[i].GetReference()->Read<kWithoutReadBarrier>();
    if (obj != nullptr) {
      obj = table_[i].GetReference()->Read();
      entries.push_back(GcRoot<mirror::Object>(obj));
    }
  }
  // Core printing routine: it emits both the last 10 entries and the
  // per-class summary. Worth reading if you are interested.
  ReferenceTable::Dump(os, entries);
}

The Summary shows as many as 40143 ServiceRecord instances, so this class is the focus of the analysis.
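When a dump is long, picking out the dominant class by hand is error-prone. The following Java sketch shows one way to tally the Summary lines from a saved log. The class RefTableSummary and its methods are hypothetical helpers written for this article, and the regular expression assumes the "N of class (M unique instances)" layout seen in the log above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading the Summary section of a reference table dump.
final class RefTableSummary {
    // Matches e.g. "40143 of com.android.server.am.ServiceRecord (40143 unique instances)".
    private static final Pattern ENTRY =
            Pattern.compile("(\\d+) of (\\S+) \\((\\d+) unique instances?\\)");

    // Returns class name -> entry count, in the order the lines appear.
    static Map<String, Integer> parse(Iterable<String> logLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = ENTRY.matcher(line);
            if (m.find()) {
                counts.merge(m.group(2), Integer.parseInt(m.group(1)), Integer::sum);
            }
        }
        return counts;
    }

    // Returns the class with the most entries, or null if nothing was parsed.
    static String dominant(Map<String, Integer> counts) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "runtime.cc:531]     40143 of com.android.server.am.ServiceRecord (40143 unique instances)",
                "runtime.cc:531]      6023 of java.lang.ref.WeakReference (6023 unique instances)",
                "runtime.cc:531]       313 of java.lang.Class (235 unique instances)");
        Map<String, Integer> counts = parse(lines);
        System.out.println(dominant(counts) + " holds " + counts.get(dominant(counts)) + " entries");
    }
}
```

In this case the tally is not strictly needed, because one class accounts for roughly four fifths of the table, but the same approach helps when several classes have similar counts.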

2. The dumpsys activity output in the DB shows which ServiceRecord instances exist in the system. Most of the ServiceRecord entries in that file look like this:

* Destroy ServiceRecord{871ec04 u0 com.google.android.googlequicksearchbox/com.google.android.voicesearch.ime.VoiceInputMethodService}
    intent={act=android.view.InputMethod cmp=com.google.android.googlequicksearchbox/com.google.android.voicesearch.ime.VoiceInputMethodService}
    packageName=com.google.android.googlequicksearchbox
    processName=com.google.android.googlequicksearchbox:search
    permission=android.permission.BIND_INPUT_METHOD
    baseDir=/system/priv-app/Velvet/Velvet.apk
    dataDir=/data/user/0/com.google.android.googlequicksearchbox
    app=ProcessRecord{e8026a 23595:com.google.android.googlequicksearchbox:search/u0a37}
    createTime=-15h53m25s487ms startingBgTimeout=--
    lastActivity=-15h53m25s486ms restartTime=-- createdFromFg=true
    executeNesting=4 executeFg=true executingStart=-15h53m25s474ms
    destroying=true destroyTime=-15h53m25s474ms

These are ServiceRecord objects that are waiting to be destroyed, yet they stay in the mDestroyingServices list and are never removed. To find out why, follow the ServiceRecord teardown path.

The code that finally removes the record is:

private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying,
        boolean finishing) {
    r.executeNesting--;
    if (r.executeNesting <= 0) {
        if (r.app != null) {
            // Irrelevant code omitted
            if (inDestroying) {
                if (DEBUG_SERVICE) Slog.v(TAG_SERVICE,
                        "doneExecuting remove destroying " + r);
                mDestroyingServices.remove(r);  // This is where the record is removed
                r.bindings.clear();
            }
        }
    }
}

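3. Combining the code with the ServiceRecord dump, the record is not removed because executeNesting=4, so the condition if (r.executeNesting <= 0) is never met. The next question is why the counter never drops to zero.

First, what executeNesting means: it counts the operations on a Service that have been dispatched but not yet reported as finished. Across the normal create, bind and unbind flows every increment is paired with a decrement, so it should be hard for the value to stay above 0:

<1> bumpServiceExecutingLocked() executes r.executeNesting++;

<2> serviceDoneExecutingLocked() executes r.executeNesting--;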
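The pairing of <1> and <2> can be modelled in a few lines of Java. This is a simplified sketch, not the AOSP ActiveServices class: NestingModel, Record, bump(), bringDown() and done() are hypothetical names that keep only the counter and the destroying list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the executeNesting bookkeeping.
final class NestingModel {
    static final class Record {
        int executeNesting;
        boolean destroying;
    }

    final List<Record> destroyingServices = new ArrayList<>();

    // Models bumpServiceExecutingLocked(): one more operation is in flight.
    void bump(Record r) {
        r.executeNesting++;
    }

    // Models the stop path: bump the counter, then park the record
    // in the destroying list until the client reports back.
    void bringDown(Record r) {
        bump(r);
        destroyingServices.add(r);
        r.destroying = true;
    }

    // Models serviceDoneExecutingLocked(): the record is only removed
    // once every in-flight operation has been reported as done.
    void done(Record r) {
        r.executeNesting--;
        if (r.executeNesting <= 0 && r.destroying) {
            destroyingServices.remove(r);
        }
    }

    public static void main(String[] args) {
        NestingModel model = new NestingModel();
        Record paired = new Record();
        model.bringDown(paired);
        model.done(paired);          // callback arrives, record is removed

        Record unpaired = new Record();
        model.bringDown(unpaired);   // callback never arrives, record stays

        System.out.println("records still in destroying list: "
                + model.destroyingServices.size());
    }
}
```

A single missing done() call is enough to keep a record in the list for the lifetime of the process, which is the pattern the dump in section 2 suggests.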

Take the stopService flow as an example. The bindService, unbindService and startService flows are similar and are not repeated here.

private final void bringDownServiceLocked(ServiceRecord r) {
    // Irrelevant code omitted
    bumpServiceExecutingLocked(r, false, "destroy");
    mDestroyingServices.add(r);
    r.destroying = true;
    mAm.updateOomAdjLocked(r.app, true);
    // Irrelevant code omitted
    r.app.thread.scheduleStopService(r);
}

The statement r.app.thread.scheduleStopService(r) is a binder call that eventually reaches handleStopService() in the ActivityThread of the target application.

4. ActivityThread.handleStopService() is shown below:

private void handleStopService(IBinder token) {
    Service s = mServices.remove(token);
    if (s != null) {
        try {
            if (localLOGV) Slog.v(TAG, "Destroying service " + s);
            s.onDestroy();  // Calls the Service's onDestroy(), which tears the Service down.
            s.detachAndCleanUp();
            Context context = s.getBaseContext();
            if (context instanceof ContextImpl) {
                final String who = s.getClassName();
                ((ContextImpl) context).scheduleFinalCleanup(who, "Service");
            }

            QueuedWork.waitToFinish();

            try {
                ActivityManager.getService().serviceDoneExecuting(
                        token, SERVICE_DONE_EXECUTING_STOP, 0, 0);
            } catch (RemoteException e) {
                throw e.rethrowFromSystemServer();
            }
        } catch (Exception e) {
            if (!mInstrumentation.onException(s, e)) {
                throw new RuntimeException(
                        "Unable to stop service " + s
                        + ": " + e.toString(), e);
            }
            Slog.i(TAG, "handleStopService: exception for " + token, e);
        }
    }
}

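ActivityManager.getService().serviceDoneExecuting(token, SERVICE_DONE_EXECUTING_STOP, 0, 0) eventually reaches serviceDoneExecutingLocked() in ActiveServices, which pairs <1> with <2> above.

That is the normal flow, and it works. Even if the binder call throws, the exception is caught and rethrown, the process restarts, and the ServiceRecord is released. One case may have been overlooked, however: the object s returned by Service s = mServices.remove(token) is null-checked, and only when it is non-null does the callback ActivityManager.getService().serviceDoneExecuting(token, SERVICE_DONE_EXECUTING_STOP, 0, 0) notify ActiveServices. Can s really be null? It can. If ActivityManager.getService().serviceDoneExecuting() is not called, the executeNesting of the matching ServiceRecord in ActiveServices is not decremented, never reaches 0, and the record is never removed. The fix is to also send the ActivityManager.getService().serviceDoneExecuting() callback on the else path of the if (s != null) check, so that ActiveServices is notified in that case as well.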
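The proposed fix can be sketched outside the framework as follows. This is an illustration of the control flow only, not a patch against AOSP: StopServiceSketch, Service, ActivityManagerProxy and register() are hypothetical stand-ins for ActivityThread, android.app.Service and the ActivityManager binder interface, and the real call also passes SERVICE_DONE_EXECUTING_STOP and two integer arguments.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for the framework types; only the control flow matters.
final class StopServiceSketch {
    interface Service {
        void onDestroy();
    }

    interface ActivityManagerProxy {
        void serviceDoneExecuting(Object token);
    }

    private final Map<Object, Service> services = new HashMap<>();
    private final ActivityManagerProxy am;

    StopServiceSketch(ActivityManagerProxy am) {
        this.am = am;
    }

    void register(Object token, Service s) {
        services.put(token, s);
    }

    void handleStopService(Object token) {
        Service s = services.remove(token);
        if (s != null) {
            s.onDestroy();
            am.serviceDoneExecuting(token);
        } else {
            // Proposed addition: report completion even when the token is
            // unknown, so the server side can decrement executeNesting.
            am.serviceDoneExecuting(token);
        }
    }

    public static void main(String[] args) {
        int[] callbacks = {0};
        StopServiceSketch client = new StopServiceSketch(token -> callbacks[0]++);
        client.handleStopService("token-without-service");
        System.out.println("callbacks sent: " + callbacks[0]);
    }
}
```

With the else branch in place, every scheduleStopService() request produces exactly one completion callback, regardless of whether the client still knows the token.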

 

5. Finally, why does each ServiceRecord instance end up with a global reference in ART?

ServiceRecord is declared as final class ServiceRecord extends Binder. Because it directly extends Binder, it is a Binder entity (a local Binder object), and every Binder entity has a corresponding JavaBBinder instance in the JNI layer.

In brief, the call chain is:

<1> writeStrongBinder(IBinder val) ->
<2> android_os_Parcel_writeStrongBinder() ->
<3> ibinderForJavaObject(env, object) ->
<4> JavaBBinderHolder::get(env, obj) ->
<5> new JavaBBinder(env, obj) ->
<6> mObject(env->NewGlobalRef(object)) ->
<7> JNI::NewGlobalRef() ->
<8> art::JavaVMExt::AddGlobalRef() ->
<9> art::IndirectReferenceTable::Add()
