Mitigating Android TimeoutException Crashes

This is original work. If you repost it, please credit the source: https://blog.csdn.net/shihongyu12345/article/details/89682645. Thank you!

A common Android crash is the finalizer timeout. It is reported as a java.util.concurrent.TimeoutException with a stack trace like this:

java.util.concurrent.TimeoutException: android.os.BinderProxy.finalize() timed out after 120 seconds
    at android.os.BinderProxy.destroy(Native Method)
    at android.os.BinderProxy.finalize(Binder.java:548)
    at java.lang.Daemons$FinalizerDaemon.doFinalize(Daemons.java:191)
    at java.lang.Daemons$FinalizerDaemon.run(Daemons.java:174)
    at java.lang.Thread.run(Thread.java:818)

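Analysis:

The Android ART virtual machine runs five GC-related daemon threads:

       1. ReferenceQueueDaemon: the reference-queue daemon. A reference object can be associated with a queue when it is created. When the referent of that reference object is collected by the GC, the reference object is added to the queue it was created with. ReferenceQueueDaemon performs this enqueueing, which lets the application learn which referents have been collected.

       2. FinalizerDaemon: the finalizer daemon. Objects whose class overrides finalize() are not reclaimed as soon as the GC decides to collect them. They are first placed in a queue, FinalizerDaemon calls their finalize() method, and only then are they reclaimed.

       3. FinalizerWatchdogDaemon: the finalizer watchdog daemon. It monitors FinalizerDaemon. If an object that overrides finalize() takes longer than a set time to run finalize(), the watchdog exits the VM.

       4. HeapTrimmerDaemon: the heap-trimming daemon. It trims the heap, that is, it returns free heap memory to the system.

       5. GCDaemon: the concurrent GC thread. It performs concurrent GC.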

 

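Of these, FinalizerDaemon and FinalizerWatchdogDaemon are the two involved in this crash. During finalization, FinalizerDaemon calls each object's finalize() method while the watchdog thread times the call. If a call takes longer than 10 seconds, the watchdog throws the exception.

Normally this crash does not occur. However, if the screen locks and the CPU goes to sleep while a finalize() call is in progress, the call is suspended along with the system. When the watchdog thread checks again, more than 10 seconds have passed, so it throws the exception and the process crashes.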

See http://stackoverflow.com/questions/24021609/how-to-handle-java-util-concurrent-timeoutexception-android-os-binderproxy-fin
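The interplay between the finalizer thread and its watchdog can be sketched in plain Java. This is only a model and not ART code: the class and method names are hypothetical, a CountDownLatch stands in for "finalize() has returned", and a finalize() body blocked on a second latch stands in for a call frozen while the device sleeps.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Plain-Java model of a finalizer thread and the watchdog that times it (not ART code). */
class FinalizerWatchdogModel {

    /** Runs finalizeBody on a model finalizer thread; returns true if it overruns maxMillis. */
    static boolean timesOut(Runnable finalizeBody, long maxMillis) {
        CountDownLatch finished = new CountDownLatch(1);
        Thread finalizer = new Thread(() -> {
            finalizeBody.run();
            finished.countDown();
        }, "ModelFinalizerDaemon");
        finalizer.setDaemon(true);
        finalizer.start();
        try {
            // Watchdog side: wait up to maxMillis for the finalize() call to complete.
            return !finished.await(maxMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** A finalize() body that returns at once stays within the limit. */
    static boolean fastFinalizeTimesOut() {
        return timesOut(() -> { }, 2_000);
    }

    /** A finalize() body frozen mid-call overruns the limit. */
    static boolean stalledFinalizeTimesOut() {
        CountDownLatch resume = new CountDownLatch(1);
        boolean timedOut = timesOut(() -> {
            try {
                resume.await();
            } catch (InterruptedException ignored) {
                // Model only: nothing to clean up.
            }
        }, 100);
        resume.countDown(); // let the model finalizer finish
        return timedOut;
    }

    public static void main(String[] args) {
        System.out.println("fast finalize timed out:    " + fastFinalizeTimesOut());
        System.out.println("stalled finalize timed out: " + stalledFinalizeTimesOut());
    }
}
```

The point of the model is that the watchdog cannot tell a slow finalize() from one that was simply not allowed to run. It only sees that the call has not returned within the limit.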

Note: the 10-second limit is defined by this field in the ROM:

private static final long MAX_FINALIZE_NANOS = 10L * NANOS_PER_SECOND;

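Decompiling third-party ROMs shows that some vendors changed this constant to 120 or 300 seconds, which is why crash platforms also report messages such as "timed out after 120 seconds".

In theory, raising this value to effectively infinity would also avoid the timeout. In practice that is not possible: the field is a static final compile-time constant, so the compiler inlines its value at every use site, and changing the field afterwards has no effect on the code that reads it.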
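Constant inlining can be demonstrated on a desktop JVM with a small sketch. The class and field names here are hypothetical, and the sketch uses a final instance field rather than a static one, because a standard JVM rejects reflective writes to static final fields outright. The inlining rule is the same for both: a final primitive field initialized with a constant expression is a compile-time constant.

```java
import java.lang.reflect.Field;

/** Shows that code compiled against a constant keeps the old value after a reflective write. */
class ConstantInliningDemo {

    static class Watchdog {
        // final + primitive + constant initializer: a compile-time constant.
        private final long maxFinalizeSeconds = 10L;

        long limitSeenByCode() {
            // javac replaces this read with the literal 10 at compile time.
            return maxFinalizeSeconds;
        }
    }

    /** Returns { value stored in the field, value the compiled code still uses }. */
    static long[] patchAndRead(long newValue) {
        try {
            Watchdog watchdog = new Watchdog();
            Field field = Watchdog.class.getDeclaredField("maxFinalizeSeconds");
            field.setAccessible(true);
            field.setLong(watchdog, newValue); // the field itself does change
            return new long[] { field.getLong(watchdog), watchdog.limitSeenByCode() };
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long[] result = patchAndRead(999L);
        System.out.println("field value = " + result[0] + ", value used by code = " + result[1]);
    }
}
```

The reflective read sees the new value, while the method compiled against the constant still returns 10. The same effect is why patching MAX_FINALIZE_NANOS would not change what the watchdog actually waits for.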

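Approaches:

1. Treating the symptom:

Idea: FinalizerWatchdogDaemon only monitors the finalization process, so we simply stop it.

1) The following solution was provided by a company's platform team: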

import android.os.Build;
import android.util.Log;

import java.lang.reflect.Field;
import java.lang.reflect.Method;

public class GcHacker {
    private static final String TAG = GcHacker.class.getSimpleName();

    private GcHacker() {}

    public static void stopWatchdog() {
        // As published, this guard skips every API level above 19 (KitKat).
        // Adjust it to the releases you actually intend to cover.
        if (Build.VERSION.SDK_INT > 19) {
            return;
        }
        try {
            // Fetch the FinalizerWatchdogDaemon singleton.
            Class<?> watchdogCls = Class.forName("java.lang.Daemons$FinalizerWatchdogDaemon");
            Field instanceField = watchdogCls.getDeclaredField("INSTANCE");
            instanceField.setAccessible(true);
            Object instance = instanceField.get(null);

            // stop(), isRunning() and the thread field are looked up on the Daemon base class.
            Class<?> daemonCls = Class.forName("java.lang.Daemons$Daemon");
            Method stopMethod = daemonCls.getDeclaredMethod("stop");
            stopMethod.setAccessible(true);
            Field threadField = daemonCls.getDeclaredField("thread");
            threadField.setAccessible(true);
            Object thread = threadField.get(instance);
            Method isRunningMethod = daemonCls.getDeclaredMethod("isRunning");
            isRunningMethod.setAccessible(true);

            if ((Boolean) isRunningMethod.invoke(instance)) {
                stopMethod.invoke(instance);
                // stop() clears the thread field; write the old reference back.
                threadField.set(instance, thread);
            }
        } catch (Throwable t) {
            Log.e(TAG, Log.getStackTraceString(t));
        }
    }
}

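Result after rollout:

Timeout exceptions dropped somewhat, but a fair number were still reported,

mainly on Android 5.0, 5.1.1, and 6.0.1.

Let's analyze why this solution falls short.

Take FinalizerWatchdogDaemon in 5.1.1 as an example: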

    private static class FinalizerWatchdogDaemon extends Daemon {
        private static final FinalizerWatchdogDaemon INSTANCE = new FinalizerWatchdogDaemon();

        @Override public void run() {
            while (isRunning()) {
                boolean waitSuccessful = waitForObject();
                if (waitSuccessful == false) {
                    // We have been interrupted, need to see if this daemon has been stopped.
                    continue;
                }
                boolean finalized = waitForFinalization();
                if (!finalized && !VMRuntime.getRuntime().isDebuggerActive()) {
                    Object finalizedObject = FinalizerDaemon.INSTANCE.finalizingObject;
                    // At this point we probably timed out, look at the object in case the finalize
                    // just finished.
                    if (finalizedObject != null) {
                        finalizerTimedOut(finalizedObject);
                        break;
                    }
                }
            }
        }

The key call is boolean finalized = waitForFinalization();

Here is what it does:

        private boolean waitForFinalization() {
            long startTime = FinalizerDaemon.INSTANCE.finalizingStartedNanos;
            sleepFor(startTime, MAX_FINALIZE_NANOS);
            // If we are finalizing an object and the start time is the same, it must be that we
            // timed out finalizing something. It may not be the same object that we started out
            // with but this doesn't matter.
            return FinalizerDaemon.INSTANCE.finalizingObject == null ||
                   FinalizerDaemon.INSTANCE.finalizingStartedNanos != startTime;
        }

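The key line is sleepFor(startTime, MAX_FINALIZE_NANOS);

MAX_FINALIZE_NANOS is the timeout discussed earlier.

The thread tries to sleep for MAX_FINALIZE_NANOS and then checks whether the object has finished finalizing.

Now let's see what happens under the earlier solution.

It reflectively calls stop() on FinalizerWatchdogDaemon, which inherits the method from Daemon.

Here is Daemon.stop():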

        public void stop() {
            Thread threadToStop;
            synchronized (this) {
                threadToStop = thread;
                thread = null;
            }
            if (threadToStop == null) {
                throw new IllegalStateException("not running");
            }
            threadToStop.interrupt();
            while (true) {
                try {
                    threadToStop.join();
                    return;
                } catch (InterruptedException ignored) {
                }
            }
        }

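stop() ends up calling interrupt() on the watchdog's thread.

This matters when stop() runs after the watchdog has already reached boolean finalized = waitForFinalization(). At that point the watchdog is blocked in sleepFor(). The interrupt() triggered by stop() raises an InterruptedException in the thread and the sleep ends early. If an object is still being finalized at that moment, finalizingObject is not null, so the watchdog throws the timeout exception even though the object would most likely have been finalized normally. The hook can therefore make the timeout crashes worse rather than better.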
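The failure mode can be reproduced with a simplified plain-Java model of the 5.1.1 logic. The names below mirror the excerpt, but this is a sketch under assumptions, not the AOSP source: in particular, the model's sleepFor() simply gives up when interrupted, and static fields stand in for the FinalizerDaemon.INSTANCE state.

```java
/** Plain-Java model of the 5.1.1 watchdog check (a simplified sketch, not AOSP code). */
class LegacyWatchdogModel {
    static final long NANOS_PER_MILLI = 1_000_000L;

    // Model state standing in for the FinalizerDaemon.INSTANCE fields.
    static volatile Object finalizingObject;
    static volatile long finalizingStartedNanos;

    /** Simplified sleep: returns early on interrupt and does not tell the caller why. */
    static void sleepFor(long startNanos, long durationNanos) {
        long remainingNanos = durationNanos - (System.nanoTime() - startNanos);
        if (remainingNanos <= 0) {
            return;
        }
        try {
            Thread.sleep(remainingNanos / NANOS_PER_MILLI);
        } catch (InterruptedException e) {
            // Interrupted, for example by stop(): fall through as if the sleep had completed.
        }
    }

    /** Same decision as the 5.1.1 excerpt: false means "timed out". */
    static boolean waitForFinalization(long maxNanos) {
        long startTime = finalizingStartedNanos;
        sleepFor(startTime, maxNanos);
        return finalizingObject == null || finalizingStartedNanos != startTime;
    }

    /** Interrupts the model watchdog while a finalize() call is still in flight. */
    static boolean interruptCausesTimeoutReport() {
        finalizingObject = new Object();            // a finalize() call is in progress
        finalizingStartedNanos = System.nanoTime();
        boolean[] reportedTimeout = new boolean[1];
        Thread watchdog = new Thread(
                () -> reportedTimeout[0] = !waitForFinalization(10_000L * NANOS_PER_MILLI));
        watchdog.start();
        watchdog.interrupt();                       // what Daemon.stop() does to the thread
        try {
            watchdog.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return reportedTimeout[0];
    }

    public static void main(String[] args) {
        System.out.println("timeout reported after interrupt: " + interruptCausesTimeoutReport());
    }
}
```

Although the model finalize() has been running for only a moment, the interrupted watchdog still concludes that it timed out, because an early wake-up and a full 10-second sleep look identical to waitForFinalization().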

So why does this solution work on 7.0 and above? Let's look at the code again, using the 7.0 source as an example:

        private Object waitForFinalization() {
            long startCount = FinalizerDaemon.INSTANCE.progressCounter.get();
            // Avoid remembering object being finalized, so as not to keep it alive.
            if (!sleepFor(MAX_FINALIZE_NANOS)) {
                // Don't report possibly spurious timeout if we are interrupted.
                return null;
            }
            if (getNeedToWork() && FinalizerDaemon.INSTANCE.progressCounter.get() == startCount) {
                // We assume that only remove() and doFinalize() may take time comparable to
                // MAX_FINALIZE_NANOS.
                // We observed neither the effect of the gotoSleep() nor the increment preceding a
                // later wakeUp. Any remove() call by the FinalizerDaemon during our sleep
                // interval must have been followed by a wakeUp call before we checked needToWork.
                // But then we would have seen the counter increment.  Thus there cannot have
                // been such a remove() call.
                // The FinalizerDaemon must not have progressed (from either the beginning or the
                // last progressCounter increment) to either the next increment or gotoSleep()
                // call.  Thus we must have taken essentially the whole MAX_FINALIZE_NANOS in a
                // single doFinalize() call.  Thus it's OK to time out.  finalizingObject was set
                // just before the counter increment, which preceded the doFinalize call.  Thus we
                // are guaranteed to get the correct finalizing value below, unless doFinalize()
                // just finished as we were timing out, in which case we may get null or a later
                // one.  In this last case, we are very likely to discard it below.
                Object finalizing = FinalizerDaemon.INSTANCE.finalizingObject;
                sleepFor(NANOS_PER_SECOND / 2);
                // Recheck to make it even less likely we report the wrong finalizing object in
                // the case which a very slow finalization just finished as we were timing out.
                if (getNeedToWork()
                        && FinalizerDaemon.INSTANCE.progressCounter.get() == startCount) {
                    return finalizing;
                }
            }
            return null;
        }

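When the sleep is interrupted, waitForFinalization() returns null. The caller therefore sees no object to report, and no timeout is thrown. (Note: the InterruptedException is caught by the surrounding code.)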
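The difference comes down to sleepFor() reporting whether it was interrupted. A reduced plain-Java model of that idea follows. It is a sketch under assumptions: it keeps only the interruption check and the progress counter, and leaves out getNeedToWork() and the half-second recheck from the 7.0 source.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Plain-Java model of an interrupt-aware watchdog check (a simplified sketch, not AOSP code). */
class InterruptAwareWatchdogModel {
    static final AtomicInteger progressCounter = new AtomicInteger();
    static volatile Object finalizingObject;

    /** Returns false when the sleep was cut short by an interrupt. */
    static boolean sleepFor(long millis) {
        try {
            Thread.sleep(millis);
            return true;
        } catch (InterruptedException e) {
            return false;
        }
    }

    /** Returns the object that appears stuck, or null when there is nothing to report. */
    static Object waitForFinalization(long maxMillis) {
        int startCount = progressCounter.get();
        if (!sleepFor(maxMillis)) {
            return null; // interrupted: do not report a possibly spurious timeout
        }
        return progressCounter.get() == startCount ? finalizingObject : null;
    }

    /** Interrupting the watchdog mid-sleep produces no report. */
    static boolean interruptSuppressesReport() {
        finalizingObject = new Object();
        Object[] result = { "unset" }; // sentinel so a null result is meaningful
        Thread watchdog = new Thread(() -> result[0] = waitForFinalization(10_000));
        watchdog.start();
        watchdog.interrupt();
        try {
            watchdog.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result[0] == null;
    }

    /** Without an interrupt, a finalize() that makes no progress is still reported. */
    static boolean stalledFinalizeIsReported() {
        Object stuck = new Object();
        finalizingObject = stuck;
        return waitForFinalization(50) == stuck;
    }

    public static void main(String[] args) {
        System.out.println("interrupt suppresses report: " + interruptSuppressesReport());
        System.out.println("stalled finalize reported:   " + stalledFinalizeIsReported());
    }
}
```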

2) Use a new hook point (borrowed from Tencent's approach) and keep the previous solution as a fallback:

try {
    final Class<?> clazz = Class.forName("java.lang.Daemons$FinalizerWatchdogDaemon");
    final Field field = clazz.getDeclaredField("INSTANCE");
    field.setAccessible(true);
    final Object watchdog = field.get(null);
    try {
        // Preferred hook: clear the Daemon.thread field so that the watchdog's
        // isRunning() check fails and its loop exits without an interrupt.
        final Field thread = clazz.getSuperclass().getDeclaredField("thread");
        thread.setAccessible(true);
        thread.set(watchdog, null);
    } catch (final Throwable t) {
        Logger.trace(TAG, "stopWatchDog, set null occur error:" + t);
        t.printStackTrace();
        try {
            // Fallback: call stop() directly. Before Android 6.0 this has a thread-safety problem.
            final Method method = clazz.getSuperclass().getDeclaredMethod("stop");
            method.setAccessible(true);
            method.invoke(watchdog);
        } catch (final Throwable e) {
            // Log the exception from stop(), not the outer one.
            Logger.trace(TAG, "stopWatchDog, stop occur error:" + e);
            e.printStackTrace();
        }
    }
} catch (final Throwable t) {
    Logger.trace(TAG, "stopWatchDog, get object occur error:" + t);
    t.printStackTrace();
}
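Why clearing the field is gentler than calling stop() can be shown with a plain-Java model. Everything here is hypothetical and runs on a desktop JVM: ModelDaemon imitates a daemon whose loop condition is "thread != null", and the field is declared volatile in the model so that the reflective write is reliably visible to the loop.

```java
import java.lang.reflect.Field;

/** Plain-Java model: nulling the thread field ends the loop without interrupting it. */
class DaemonThreadFieldModel {

    static class ModelDaemon implements Runnable {
        private volatile Thread thread;   // volatile only in this model, for visibility
        volatile boolean sawInterrupt;

        void start() {
            thread = new Thread(this, "ModelWatchdog");
            thread.start();
        }

        boolean isRunning() {
            return thread != null;
        }

        @Override public void run() {
            while (isRunning()) {
                try {
                    Thread.sleep(5);      // stands in for the watchdog's waiting
                } catch (InterruptedException e) {
                    sawInterrupt = true;
                }
            }
        }
    }

    /** Returns { loop exited, loop was interrupted } after nulling the field reflectively. */
    static boolean[] clearThreadField() {
        try {
            ModelDaemon daemon = new ModelDaemon();
            daemon.start();
            Field threadField = ModelDaemon.class.getDeclaredField("thread");
            threadField.setAccessible(true);
            Thread worker = (Thread) threadField.get(daemon);
            threadField.set(daemon, null);   // the hook: no interrupt() involved
            worker.join(2_000);
            return new boolean[] { !worker.isAlive(), daemon.sawInterrupt };
        } catch (ReflectiveOperationException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = clearThreadField();
        System.out.println("loop exited: " + result[0] + ", interrupted: " + result[1]);
    }
}
```

Because the loop ends at its next isRunning() check and no sleep is cut short, the early wake-up that triggers the false timeout on older releases never happens in this model.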

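Result: the timeout crashes were eliminated. However, this approach only ensures that the timeout is no longer reported. It is not recommended; the real fix is to reduce how much object reclamation is needed in the first place.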

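2. Treating the root cause:

Reduce the amount of reclamation at the source. There are two directions: 1. reduce memory usage; 2. reduce memory leaks.

Our app mainly relies on Probe, a memory-inspection framework I developed. It currently detects Java-level OOM crashes caused by memory usage, too many threads, or too many file descriptors, and it also detects native leaks. We use it to reduce memory usage and leaks, and I plan to write about how Probe works later. Over several releases, the average memory usage ratio dropped by more than 50% (screenshots are omitted for security and privacy reasons), which reduces memory reclamation at the source.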
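As a rough illustration of what a "memory usage ratio" can mean, the sketch below computes the fraction of the maximum Java heap currently in use from java.lang.Runtime. This is an assumption made for illustration only. It is not how Probe measures memory, and the class name is hypothetical.

```java
/** Minimal heap-usage ratio based on java.lang.Runtime (illustrative, not Probe's method). */
class HeapUsageSketch {

    /** Fraction of the maximum Java heap that is currently in use. */
    static double usedHeapRatio() {
        Runtime runtime = Runtime.getRuntime();
        long usedBytes = runtime.totalMemory() - runtime.freeMemory();
        return (double) usedBytes / runtime.maxMemory();
    }

    public static void main(String[] args) {
        System.out.printf("heap in use: %.1f%%%n", usedHeapRatio() * 100);
    }
}
```

Sampling a value like this at fixed points across releases gives a simple trend line for whether memory optimizations are taking effect.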
