Dealing with TimeoutException on Android

shihongyu12345

This is original work. If you repost it, please credit the source: https://blog.csdn.net/shihongyu12345/article/details/89682645. Thanks!

A common Android crash is the finalizer timeout. It looks like this:

java.util.concurrent.TimeoutException: android.os.BinderProxy.finalize() timed out after 120 seconds
    at android.os.BinderProxy.destroy(Native Method)
    at android.os.BinderProxy.finalize(Binder.java:548)
    at java.lang.Daemons$FinalizerDaemon.doFinalize(Daemons.java:191)
    at java.lang.Daemons$FinalizerDaemon.run(Daemons.java:174)
    at java.lang.Thread.run(Thread.java:818)

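Analysis:

The Android ART virtual machine runs five GC-related daemon threads.

These five daemon threads are:

1. ReferenceQueueDaemon: the reference-queue daemon. A reference object can be associated with a queue when it is created. Once the object it refers to has been reclaimed by the GC, the reference object is added to the queue it was associated with at creation. ReferenceQueueDaemon performs this enqueueing, which is how an application learns that the referent has been reclaimed.

2. FinalizerDaemon: the finalizer daemon. Objects that override finalize() are not reclaimed as soon as the GC decides to collect them. They are placed in a queue and wait for FinalizerDaemon to call their finalize() method; only then are they reclaimed.

3. FinalizerWatchdogDaemon: the finalizer watchdog daemon. It monitors the FinalizerDaemon thread. If an object that overrides finalize() takes longer than a set limit to run it, the watchdog makes the VM exit.

4. HeapTrimmerDaemon: the heap-trimming daemon. It trims the heap, that is, it returns free heap memory to the system.

5. GCDaemon: the concurrent GC thread. It performs concurrent garbage collection.

Of these, FinalizerDaemon and FinalizerWatchdogDaemon are the two involved in this crash. When an object is finalized, FinalizerDaemon calls its finalize() method while the watchdog thread monitors how long the call takes. If it takes more than 10 seconds, the watchdog throws the exception.

Under normal conditions this crash does not occur. But if the screen locks and the CPU goes to sleep while a finalize() call is in progress, the call is suspended along with the rest of the system. The watchdog thread then finds that more than 10 seconds have elapsed and throws the exception, which crashes the app.

See http://stackoverflow.com/questions/24021609/how-to-handle-java-util-concurrent-timeoutexception-android-os-binderproxy-fin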
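The interplay between the two daemons can be sketched off-device in plain Java. This is a simplified simulation, not the real java.lang.Daemons code: the class WatchdogSketch, the method timesOut and the millisecond limits are all made up for illustration, and a sleeping thread stands in for a finalize() call that is slow or suspended.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Off-device simulation of the finalizer/watchdog pair; not the real Daemons code. */
final class WatchdogSketch {

    /**
     * Runs one simulated finalizer that takes finalizeMillis and reports whether
     * a watchdog with the given limit would flag it as timed out.
     */
    static boolean timesOut(long finalizeMillis, long limitMillis) {
        // Stand-in for FinalizerDaemon.finalizingObject: non-null while finalize() runs.
        final AtomicReference<Object> finalizing = new AtomicReference<>();
        final Object victim = new Object();
        finalizing.set(victim);

        Thread finalizer = new Thread(() -> {
            try {
                Thread.sleep(finalizeMillis);   // pretend finalize() is slow or stalled
            } catch (InterruptedException ignored) {
                // stopped by the test harness below
            } finally {
                finalizing.set(null);           // finalization finished
            }
        }, "FakeFinalizerDaemon");
        finalizer.setDaemon(true);
        finalizer.start();

        try {
            Thread.sleep(limitMillis);                    // the watchdog waits for the limit...
            boolean stuck = finalizing.get() == victim;   // ...then checks for progress
            finalizer.interrupt();
            finalizer.join();
            return stuck;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("fast finalizer flagged: " + timesOut(20, 300));
        System.out.println("slow finalizer flagged: " + timesOut(1000, 100));
    }
}
```

The watchdog in this sketch never looks at what the finalizer is doing. It only checks whether the same object is still being finalized once the limit has passed, which is why a finalize() call frozen by a sleeping CPU looks the same to it as one that is genuinely stuck.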

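Note: the 10 seconds comes from this field in the ROM:

private static final long MAX_FINALIZE_NANOS = 10L * NANOS_PER_SECOND;

Decompiling third-party ROMs shows that some vendors have changed this static constant to 120 or 300 seconds, which is why crash platforms also receive reports such as "timed out after 120 seconds".

In theory, raising this value to something effectively infinite would also avoid the timeout. In practice that cannot be done: it is a static constant, so its value has been inlined.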

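How to handle it:

1. Treating the symptom:

Idea: FinalizerWatchdogDaemon only monitors the finalization process, so we simply stop it.

1) Below is a solution provided by one company's platform team: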

import android.os.Build;
import android.util.Log;

import java.lang.reflect.Field;
import java.lang.reflect.Method;

public class GcHacker {

    private static final String TAG = GcHacker.class.getSimpleName();

    private GcHacker() {}

    // Stops FinalizerWatchdogDaemon via reflection.
    // LogUtils is the project's own logging helper.
    public static void stopWatchdog() {
        // Guard kept as provided: nothing is done on API levels above 19.
        if (Build.VERSION.SDK_INT > 19) {
            return;
        }
        try {
            Class<?> watchdogCls = Class.forName("java.lang.Daemons$FinalizerWatchdogDaemon");
            Field instanceField = watchdogCls.getDeclaredField("INSTANCE");
            instanceField.setAccessible(true);
            Object instance = instanceField.get(null);

            Class<?> daemonCls = Class.forName("java.lang.Daemons$Daemon");
            Method stopMethod = daemonCls.getDeclaredMethod("stop");
            stopMethod.setAccessible(true);
            Field threadField = daemonCls.getDeclaredField("thread");
            threadField.setAccessible(true);
            // Remember the thread before stop() sets the field to null.
            Object thread = threadField.get(instance);
            Method isRunningMethod = daemonCls.getDeclaredMethod("isRunning");
            isRunningMethod.setAccessible(true);

            if ((boolean) isRunningMethod.invoke(instance)) {
                stopMethod.invoke(instance);
                // Put the old thread reference back into the field that stop() cleared.
                threadField.set(instance, thread);
            }
        } catch (Throwable t) {
            LogUtils.e(TAG, Log.getStackTraceString(t));
        }
    }
}

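Result after deploying it:

The number of timeout exceptions dropped, but a certain volume of reports remained,

concentrated mainly on Android 5.0, 5.1.1 and 6.0.1.

Let's analyze why this approach has problems.

Take FinalizerWatchdogDaemon from Android 5.1.1 as an example: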

    private static class FinalizerWatchdogDaemon extends Daemon {
        private static final FinalizerWatchdogDaemon INSTANCE = new FinalizerWatchdogDaemon();

        @Override public void run() {
            while (isRunning()) {
                boolean waitSuccessful = waitForObject();
                if (waitSuccessful == false) {
                    // We have been interrupted, need to see if this daemon has been stopped.
                    continue;
                }
                boolean finalized = waitForFinalization();
                if (!finalized && !VMRuntime.getRuntime().isDebuggerActive()) {
                    Object finalizedObject = FinalizerDaemon.INSTANCE.finalizingObject;
                    // At this point we probably timed out, look at the object in case the finalize
                    // just finished.
                    if (finalizedObject != null) {
                        finalizerTimedOut(finalizedObject);
                        break;
                    }
                }
            }
        }

The key call is boolean finalized = waitForFinalization();

Let's look inside it:

        private boolean waitForFinalization() {
            long startTime = FinalizerDaemon.INSTANCE.finalizingStartedNanos;
            sleepFor(startTime, MAX_FINALIZE_NANOS);
            // If we are finalizing an object and the start time is the same, it must be that we
            // timed out finalizing something. It may not be the same object that we started out
            // with but this doesn't matter.
            return FinalizerDaemon.INSTANCE.finalizingObject == null ||
                   FinalizerDaemon.INSTANCE.finalizingStartedNanos != startTime;
        }

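The key line is sleepFor(startTime, MAX_FINALIZE_NANOS);

where MAX_FINALIZE_NANOS is the timeout discussed earlier.

The thread tries to sleep for MAX_FINALIZE_NANOS and then checks whether finalization of the object has completed.

Now let's look at what happens with the approach above.

It calls FinalizerWatchdogDaemon's stop method via reflection. FinalizerWatchdogDaemon inherits that method from Daemon.

Here is Daemon's stop method: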

        public void stop() {
            Thread threadToStop;
            synchronized (this) {
                threadToStop = thread;
                thread = null;
            }
            if (threadToStop == null) {
                throw new IllegalStateException("not running");
            }
            threadToStop.interrupt();
            while (true) {
                try {
                    threadToStop.join();
                    return;
                } catch (InterruptedException ignored) {
                }
            }
        }

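stop() ends up calling interrupt() on the daemon's thread.

In other words, suppose the watchdog has already entered boolean finalized = waitForFinalization() (a method of FinalizerWatchdogDaemon) when stop() is called. waitForFinalization calls sleepFor, and the interrupt triggered by stop() makes the sleeping thread throw InterruptedException, so the sleep ends early. At that moment the object being finalized is still non-null, so a timeout exception is thrown, even though the object might well have been finalized normally. The approach can therefore make the timeout exceptions worse.

Why, then, does this approach work correctly on 7.0 and above? Let's look at the code again, taking the 7.0 source as an example: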

        private Object waitForFinalization() {
            long startCount = FinalizerDaemon.INSTANCE.progressCounter.get();
            // Avoid remembering object being finalized, so as not to keep it alive.
            if (!sleepFor(MAX_FINALIZE_NANOS)) {
                // Don't report possibly spurious timeout if we are interrupted.
                return null;
            }
            if (getNeedToWork() && FinalizerDaemon.INSTANCE.progressCounter.get() == startCount) {
                // We assume that only remove() and doFinalize() may take time comparable to
                // MAX_FINALIZE_NANOS.
                // We observed neither the effect of the gotoSleep() nor the increment preceding a
                // later wakeUp. Any remove() call by the FinalizerDaemon during our sleep
                // interval must have been followed by a wakeUp call before we checked needToWork.
                // But then we would have seen the counter increment.  Thus there cannot have
                // been such a remove() call.
                // The FinalizerDaemon must not have progressed (from either the beginning or the
                // last progressCounter increment) to either the next increment or gotoSleep()
                // call.  Thus we must have taken essentially the whole MAX_FINALIZE_NANOS in a
                // single doFinalize() call.  Thus it's OK to time out.  finalizingObject was set
                // just before the counter increment, which preceded the doFinalize call.  Thus we
                // are guaranteed to get the correct finalizing value below, unless doFinalize()
                // just finished as we were timing out, in which case we may get null or a later
                // one.  In this last case, we are very likely to discard it below.
                Object finalizing = FinalizerDaemon.INSTANCE.finalizingObject;
                sleepFor(NANOS_PER_SECOND / 2);
                // Recheck to make it even less likely we report the wrong finalizing object in
                // the case which a very slow finalization just finished as we were timing out.
                if (getNeedToWork()
                        && FinalizerDaemon.INSTANCE.progressCounter.get() == startCount) {
                    return finalizing;
                }
            }
            return null;
        }

When the sleep is interrupted, the method returns null, so the caller sees no object and no timeout is thrown (note: the InterruptedException is caught by an outer layer).
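The difference between the two versions can be reproduced off-device with a small plain-Java simulation. It is a deliberate simplification rather than AOSP code: InterruptSketch, Style, reportsTimeoutWhenStopped and the 5-second sleep are all hypothetical names and values, and the "5.x" branch models only the fact that the result of the interrupted sleep is ignored.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Simulates how the two waitForFinalization() variants react to an interrupt. */
final class InterruptSketch {
    enum Style { ANDROID_5, ANDROID_7 }

    /** Sleeps for limitMillis; returns false if the sleep was cut short by an interrupt. */
    private static boolean sleepFor(long limitMillis) {
        try {
            Thread.sleep(limitMillis);
            return true;
        } catch (InterruptedException e) {
            return false;
        }
    }

    /**
     * Starts a fake watchdog while a finalizer is "in progress", interrupts it
     * (which is what Daemon.stop() does) and reports whether it raised a timeout.
     */
    static boolean reportsTimeoutWhenStopped(Style style) {
        final Object finalizingObject = new Object();   // a finalizer is still running
        final AtomicBoolean reported = new AtomicBoolean(false);

        Thread watchdog = new Thread(() -> {
            boolean sleptFully = sleepFor(5_000);
            if (style == Style.ANDROID_7 && !sleptFully) {
                return;  // 7.0 style: an interrupted sleep is never reported as a timeout
            }
            // 5.x style: only the object is checked, so an early wake-up looks like a timeout
            reported.set(finalizingObject != null);
        }, "FakeWatchdog");
        watchdog.setDaemon(true);
        watchdog.start();

        try {
            Thread.sleep(50);       // let the watchdog start sleeping
            watchdog.interrupt();   // simulate stop()
            watchdog.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return reported.get();
    }

    public static void main(String[] args) {
        System.out.println("5.x style reports timeout: " + reportsTimeoutWhenStopped(Style.ANDROID_5));
        System.out.println("7.0 style reports timeout: " + reportsTimeoutWhenStopped(Style.ANDROID_7));
    }
}
```

In this sketch the only difference between the two branches is whether the return value of sleepFor is consulted, and that single check decides whether stopping the watchdog produces a spurious timeout.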

2) Use a new hook point (borrowed from Tencent's approach) and keep the previous approach as a fallback:

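import java.lang.reflect.Field;
import java.lang.reflect.Method;

// Put this method in any utility class. Logger and TAG are the project's own
// logging helper and tag; the method name is taken from the log messages.
public static void stopWatchDog() {
    try {
        final Class<?> clazz = Class.forName("java.lang.Daemons$FinalizerWatchdogDaemon");
        final Field field = clazz.getDeclaredField("INSTANCE");
        field.setAccessible(true);
        final Object watchdog = field.get(null);
        try {
            // Preferred hook: set Daemon.thread to null directly, without
            // going through stop() and its interrupt().
            final Field thread = clazz.getSuperclass().getDeclaredField("thread");
            thread.setAccessible(true);
            thread.set(watchdog, null);
        } catch (final Throwable t) {
            Logger.trace(TAG, "stopWatchDog, set null occur error:" + t);
            t.printStackTrace();
            try {
                // Fallback: call stop() directly. Before Android 6.0 this has thread-safety problems.
                final Method method = clazz.getSuperclass().getDeclaredMethod("stop");
                method.setAccessible(true);
                method.invoke(watchdog);
            } catch (final Throwable e) {
                // Log the exception from this fallback (the original logged the outer one by mistake).
                Logger.trace(TAG, "stopWatchDog, stop occur error:" + e);
                e.printStackTrace();
            }
        }
    } catch (final Throwable t) {
        Logger.trace(TAG, "stopWatchDog, get object occur error:" + t);
        t.printStackTrace();
    }
}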

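Result: the timeout crashes were eliminated. However, this approach only stops the timeout from being reported. It is not recommended; the real fix is to reduce object reclamation at the root.

2. Treating the root cause:

Fundamentally reduce the amount of reclamation, in two directions: 1. reduce memory usage; 2. reduce memory leaks.

Our app mainly relies on Probe, a memory-monitoring framework I developed, to optimize memory usage and leaks. Probe currently detects Java-level OOM crashes caused by memory usage, too many threads or too many file descriptors, and it also detects native leaks. I plan to write about how Probe works later. Over several releases, average memory usage fell by more than 50% (no screenshots, for security and privacy reasons), which reduces memory reclamation at the source.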
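The article does not show how the app itself cut down on reclamation. As one generic illustration of giving the finalizer thread less to do, a resource can be released explicitly when the caller is done with it instead of waiting for finalize(). The sketch below assumes a hypothetical TrackedResource class and is not taken from Probe or from the author's code.

```java
/** Hypothetical resource whose cleanup is explicit rather than left to finalize(). */
final class TrackedResource implements AutoCloseable {
    private boolean closed;

    boolean isClosed() {
        return closed;
    }

    /** Releases the underlying handle immediately; safe to call more than once. */
    @Override
    public void close() {
        closed = true; // a real class would free its native handle or stream here
    }
}

final class ExplicitReleaseSketch {
    /** Uses the resource in try-with-resources and reports whether it was closed on exit. */
    static boolean closedAfterUse() {
        TrackedResource resource = new TrackedResource();
        try (TrackedResource r = resource) {
            // work with r here; close() runs automatically when the block exits
        }
        return resource.isClosed();
    }

    public static void main(String[] args) {
        System.out.println("closed after use: " + closedAfterUse());
    }
}
```

Because the class does not override finalize(), its instances never enter the finalizer queue, so they add no work for FinalizerDaemon or its watchdog.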
