Original work takes effort. If you repost this article, please credit the source: https://blog.csdn.net/shihongyu12345/article/details/89682645. Thank you!
A common type of Android crash is the Timeout crash shown below.
It has the form java.util.concurrent.TimeoutException: android.os.BinderProxy.finalize() timed out after 120 seconds, with a stack trace such as:
java.util.concurrent.TimeoutException: android.os.BinderProxy.finalize() timed out after 120 seconds
at android.os.BinderProxy.destroy(Native Method)
at android.os.BinderProxy.finalize(Binder.java:548)
at java.lang.Daemons$FinalizerDaemon.doFinalize(Daemons.java:191)
at java.lang.Daemons$FinalizerDaemon.run(Daemons.java:174)
at java.lang.Thread.run(Thread.java:818)
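Analysis:
The Android ART virtual machine has five GC-related daemon threads.
They are:
1. ReferenceQueueDaemon: the reference-queue daemon. A reference object can be associated with a queue when it is created. When the object it refers to (its referent) is reclaimed by GC, the reference object is added to that queue. ReferenceQueueDaemon performs this enqueue operation, which lets the application find out which referents have been reclaimed.
2. FinalizerDaemon: the finalizer daemon. Objects that override finalize() are not reclaimed immediately when GC decides to collect them. They are placed in a queue and wait for FinalizerDaemon to call their finalize() method, and only then are they reclaimed.
3. FinalizerWatchdogDaemon: the finalizer watchdog daemon. It monitors the FinalizerDaemon thread. If it detects that an object's overridden finalize() method has been running for longer than a set time, it makes the VM exit.
4. HeapTrimmerDaemon: the heap-trimming daemon. It trims the heap, that is, it returns free heap memory to the system.
5. GCDaemon: the concurrent GC thread. It runs concurrent GC.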
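To make the reference-queue mechanism in item 1 concrete, here is a minimal sketch in plain Java. It assumes nothing Android-specific: ReferenceQueueDemo is a hypothetical class name, and ref.enqueue() is called by hand as a stand-in for the work the runtime does after GC, so the result does not depend on when a collection happens.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

final class ReferenceQueueDemo {
    /** Returns true if the reference shows up on its queue after being enqueued. */
    static boolean enqueuedReferenceIsVisible() {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object referent = new Object();
        // Associate the reference with a queue at creation time.
        WeakReference<Object> ref = new WeakReference<>(referent, queue);

        // Nothing has been enqueued yet, so the queue is empty.
        if (queue.poll() != null) {
            return false;
        }

        // Stand-in for the runtime: enqueue by hand instead of waiting for GC.
        boolean enqueued = ref.enqueue();

        // The application learns about the event by polling the queue.
        return enqueued && queue.poll() == ref;
    }

    public static void main(String[] args) {
        System.out.println("reference visible on queue: " + enqueuedReferenceIsVisible());
    }
}
```

In a real app the enqueue happens on the daemon thread some time after the referent becomes unreachable, so code that consumes the queue should not assume any particular timing.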
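Of these, FinalizerDaemon and FinalizerWatchdogDaemon are the ones related to this crash. When objects are finalized, FinalizerDaemon calls their finalize() methods while the watchdog thread monitors the process. If a finalize() call takes more than 10 seconds, the watchdog throws the exception.
Under normal conditions this crash does not occur. But if the screen locks and the CPU goes to sleep while a finalize() call is in progress, that call is suspended along with the system. When the watchdog thread next checks, more than 10 seconds have elapsed, so it throws the exception and the app crashes.
Note: the 10-second limit is defined by this field in the ROM: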
private static final long MAX_FINALIZE_NANOS = 10L * NANOS_PER_SECOND;
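Decompiling third-party ROMs shows that some vendors have changed this static constant to 120 or 300 seconds, which is why crash platforms report messages such as "timed out after 120 seconds".
In theory, raising this value to something effectively infinite would also avoid the timeout. In practice this is not possible: the field is a compile-time constant, so its value is inlined wherever it is used, and changing the field at runtime does not affect the code that reads it.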
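The inlining can be observed in plain Java. The sketch below is only an illustration under stated assumptions: Limits, InitLog and ConstantInliningDemo are hypothetical names, and Limits merely imitates the shape of the real declaration. Reading a compile-time constant does not even touch the declaring class (its static initializer does not run), whereas reading a field whose value is computed at runtime does. That is why patching the field afterwards cannot change what the already compiled watchdog code sees.

```java
final class ConstantInliningDemo {
    private static boolean[] cached;

    /** Returns {Limits initialized after constant read, Limits initialized after runtime read}. */
    static synchronized boolean[] observe() {
        if (cached == null) {
            long inlined = Limits.MAX_FINALIZE_NANOS; // value was copied in by javac
            boolean afterConstantRead = InitLog.limitsInitialized;
            long runtime = Limits.RUNTIME_SECONDS;    // real field read, initializes Limits
            boolean afterRuntimeRead = InitLog.limitsInitialized;
            if (inlined != 10_000_000_000L || runtime != 10L) {
                throw new IllegalStateException("unexpected values");
            }
            cached = new boolean[] {afterConstantRead, afterRuntimeRead};
        }
        return cached.clone();
    }

    public static void main(String[] args) {
        boolean[] seen = observe();
        System.out.println("initialized after constant read: " + seen[0]);
        System.out.println("initialized after runtime read:  " + seen[1]);
    }
}

/** Records whether the static initializer of Limits has run. */
final class InitLog {
    static boolean limitsInitialized = false;
}

/** Hypothetical stand-in for the class that declares the timeout constant. */
final class Limits {
    static final long NANOS_PER_SECOND = 1_000_000_000L;
    // Compile-time constant: the compiler copies the value into every use site.
    static final long MAX_FINALIZE_NANOS = 10L * NANOS_PER_SECOND;
    // Not a compile-time constant: reads go through the field at runtime.
    static final long RUNTIME_SECONDS = Long.parseLong("10");

    static {
        InitLog.limitsInitialized = true;
    }
}
```

The first flag is false and the second is true, which shows that the use site of MAX_FINALIZE_NANOS carries its own copy of the value.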
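Ways to handle it:
1. Treating the symptom:
Idea: FinalizerWatchdogDaemon only monitors the finalization process, so we simply stop it.
1) The following approach was provided by the platform team at a certain company: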
import android.os.Build;
import android.util.Log;

import java.lang.reflect.Field;
import java.lang.reflect.Method;

// LogUtils is a project-local logging helper that is not shown here.
public class GcHacker {
    private static final String TAG = GcHacker.class.getSimpleName();

    private GcHacker() {}

    public static void stopWatchdog() {
        // Skip API 19 and below; the results discussed here are for Android 5.0 and later.
        if (Build.VERSION.SDK_INT <= 19) {
            return;
        }
        try {
            // Fetch the singleton watchdog daemon via reflection.
            Class<?> watchdogCls = Class.forName("java.lang.Daemons$FinalizerWatchdogDaemon");
            Field instanceField = watchdogCls.getDeclaredField("INSTANCE");
            instanceField.setAccessible(true);
            Object instance = instanceField.get(null);

            Class<?> daemonCls = Class.forName("java.lang.Daemons$Daemon");
            Method stopMethod = daemonCls.getDeclaredMethod("stop");
            stopMethod.setAccessible(true);
            Field threadField = daemonCls.getDeclaredField("thread");
            threadField.setAccessible(true);
            Object thread = threadField.get(instance);
            Method isRunningMethod = daemonCls.getDeclaredMethod("isRunning");
            isRunningMethod.setAccessible(true);

            if (Boolean.TRUE.equals(isRunningMethod.invoke(instance))) {
                stopMethod.invoke(instance);
                // stop() clears the thread field; put the old value back, presumably so
                // that a later stop() call does not throw "not running".
                threadField.set(instance, thread);
            }
        } catch (Throwable t) {
            LogUtils.e(TAG, Log.getStackTraceString(t));
        }
    }
}
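Results after applying it:
Timeout exceptions decreased somewhat, but a fair number were still reported,
mainly on Android 5.0, 5.1.1, and 6.0.1.
Let's analyze why this approach is problematic.
Take FinalizerWatchdogDaemon in Android 5.1.1 as an example: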
private static class FinalizerWatchdogDaemon extends Daemon {
    private static final FinalizerWatchdogDaemon INSTANCE = new FinalizerWatchdogDaemon();

    @Override public void run() {
        while (isRunning()) {
            boolean waitSuccessful = waitForObject();
            if (waitSuccessful == false) {
                // We have been interrupted, need to see if this daemon has been stopped.
                continue;
            }
            boolean finalized = waitForFinalization();
            if (!finalized && !VMRuntime.getRuntime().isDebuggerActive()) {
                Object finalizedObject = FinalizerDaemon.INSTANCE.finalizingObject;
                // At this point we probably timed out, look at the object in case the finalize
                // just finished.
                if (finalizedObject != null) {
                    finalizerTimedOut(finalizedObject);
                    break;
                }
            }
        }
    }
The key call is boolean finalized = waitForFinalization();
Let's look inside it:
    private boolean waitForFinalization() {
        long startTime = FinalizerDaemon.INSTANCE.finalizingStartedNanos;
        sleepFor(startTime, MAX_FINALIZE_NANOS);
        // If we are finalizing an object and the start time is the same, it must be that we
        // timed out finalizing something. It may not be the same object that we started out
        // with but this doesn't matter.
        return FinalizerDaemon.INSTANCE.finalizingObject == null ||
               FinalizerDaemon.INSTANCE.finalizingStartedNanos != startTime;
    }
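The key line is sleepFor(startTime, MAX_FINALIZE_NANOS);
where MAX_FINALIZE_NANOS is the timeout discussed earlier.
The thread tries to sleep for MAX_FINALIZE_NANOS and then checks whether the object has finished finalizing.
Now let's see what happens with the previous approach.
It calls FinalizerWatchdogDaemon's stop() method by reflection; FinalizerWatchdogDaemon inherits stop() from Daemon.
Here is Daemon's stop() method: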
    public void stop() {
        Thread threadToStop;
        synchronized (this) {
            threadToStop = thread;
            thread = null;
        }
        if (threadToStop == null) {
            throw new IllegalStateException("not running");
        }
        threadToStop.interrupt();
        while (true) {
            try {
                threadToStop.join();
                return;
            } catch (InterruptedException ignored) {
            }
        }
    }
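stop() ends up calling interrupt() on the watchdog thread.
This means that if the watchdog thread has already reached boolean finalized = waitForFinalization(); (in FinalizerWatchdogDaemon) when stop() runs, it is sleeping inside sleepFor(), which waitForFinalization() calls. The interrupt triggered by stop() raises an InterruptedException in that thread and ends the sleep early. At that moment the object being finalized is still non-null, so a timeout exception is thrown, even though the object might well have been finalized normally. The approach can therefore make Timeout exceptions worse.
So why does this approach work correctly on 7.0 and above? Let's look at the code again, using the 7.0 source as an example: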
    private Object waitForFinalization() {
        long startCount = FinalizerDaemon.INSTANCE.progressCounter.get();
        // Avoid remembering object being finalized, so as not to keep it alive.
        if (!sleepFor(MAX_FINALIZE_NANOS)) {
            // Don't report possibly spurious timeout if we are interrupted.
            return null;
        }
        if (getNeedToWork() && FinalizerDaemon.INSTANCE.progressCounter.get() == startCount) {
            // We assume that only remove() and doFinalize() may take time comparable to
            // MAX_FINALIZE_NANOS.
            // We observed neither the effect of the gotoSleep() nor the increment preceding a
            // later wakeUp. Any remove() call by the FinalizerDaemon during our sleep
            // interval must have been followed by a wakeUp call before we checked needToWork.
            // But then we would have seen the counter increment. Thus there cannot have
            // been such a remove() call.
            // The FinalizerDaemon must not have progressed (from either the beginning or the
            // last progressCounter increment) to either the next increment or gotoSleep()
            // call. Thus we must have taken essentially the whole MAX_FINALIZE_NANOS in a
            // single doFinalize() call. Thus it's OK to time out. finalizingObject was set
            // just before the counter increment, which preceded the doFinalize call. Thus we
            // are guaranteed to get the correct finalizing value below, unless doFinalize()
            // just finished as we were timing out, in which case we may get null or a later
            // one. In this last case, we are very likely to discard it below.
            Object finalizing = FinalizerDaemon.INSTANCE.finalizingObject;
            sleepFor(NANOS_PER_SECOND / 2);
            // Recheck to make it even less likely we report the wrong finalizing object in
            // the case which a very slow finalization just finished as we were timing out.
            if (getNeedToWork()
                    && FinalizerDaemon.INSTANCE.progressCounter.get() == startCount) {
                return finalizing;
            }
        }
        return null;
    }
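When the sleep is interrupted, waitForFinalization() returns null, so the caller sees no object that timed out and no timeout is thrown. (Note: the InterruptedException is caught by the enclosing code.)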
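The difference between the two versions can be reproduced outside Android. The following is a simplified sketch, not the platform code: InterruptedWatchdogDemo and its method names are hypothetical, the 10-second sleep stands in for MAX_FINALIZE_NANOS, and a plain non-null object stands in for FinalizerDaemon.INSTANCE.finalizingObject. It interrupts a sleeping watchdog thread, much as Daemon.stop() does, and compares a check that ignores the interruption with one that returns early.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

final class InterruptedWatchdogDemo {
    /** Sleeps for up to millis; returns false if the sleep was cut short by an interrupt. */
    static boolean sleepFor(long millis) {
        try {
            Thread.sleep(millis);
            return true;
        } catch (InterruptedException e) {
            return false;
        }
    }

    /** 5.x-style check (simplified): the outcome of the sleep is ignored. */
    static boolean legacyReportsTimeout(Object finalizing, long timeoutMillis) {
        sleepFor(timeoutMillis);
        return finalizing != null;
    }

    /** 7.0-style check (simplified): an interrupted sleep never reports a timeout. */
    static boolean guardedReportsTimeout(Object finalizing, long timeoutMillis) {
        if (!sleepFor(timeoutMillis)) {
            return false;
        }
        return finalizing != null;
    }

    /** Runs one check on a watchdog thread, interrupts it, and returns what it reported. */
    static boolean reportAfterInterrupt(final boolean legacy) {
        final Object finalizing = new Object(); // an object still "being finalized"
        final CountDownLatch started = new CountDownLatch(1);
        final AtomicBoolean reported = new AtomicBoolean();
        Thread watchdog = new Thread(() -> {
            started.countDown();
            reported.set(legacy
                    ? legacyReportsTimeout(finalizing, 10_000)
                    : guardedReportsTimeout(finalizing, 10_000));
        });
        watchdog.start();
        try {
            started.await();
            watchdog.interrupt(); // the same signal Daemon.stop() sends to the daemon thread
            watchdog.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return reported.get();
    }

    public static void main(String[] args) {
        System.out.println("legacy check reports timeout:  " + reportAfterInterrupt(true));
        System.out.println("guarded check reports timeout: " + reportAfterInterrupt(false));
    }
}
```

The legacy-style check reports a timeout almost immediately after the interrupt, while the guarded check reports none, which matches the behavior described for 5.1.1 and 7.0 respectively.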
2) Use a new hook point (borrowed from Tencent's approach), keeping the previous approach as a fallback:
// Logger and TAG come from the surrounding project code, which is not shown here.
try {
    final Class<?> clazz = Class.forName("java.lang.Daemons$FinalizerWatchdogDaemon");
    final Field field = clazz.getDeclaredField("INSTANCE");
    field.setAccessible(true);
    final Object watchdog = field.get(null);
    try {
        // Clear Daemon.thread instead of calling stop(): no interrupt is sent, and the
        // watchdog loop ends at its next isRunning() check.
        final Field thread = clazz.getSuperclass().getDeclaredField("thread");
        thread.setAccessible(true);
        thread.set(watchdog, null);
    } catch (final Throwable t) {
        Logger.trace(TAG, "stopWatchDog, set null occur error:" + t);
        t.printStackTrace();
        try {
            // Calling stop() directly has thread-safety problems before Android 6.0.
            final Method method = clazz.getSuperclass().getDeclaredMethod("stop");
            method.setAccessible(true);
            method.invoke(watchdog);
        } catch (final Throwable e) {
            Logger.trace(TAG, "stopWatchDog, stop occur error:" + e);
            e.printStackTrace();
        }
    }
} catch (final Throwable t) {
    Logger.trace(TAG, "stopWatchDog, get object occur error:" + t);
    t.printStackTrace();
}
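Result: the timeout crashes were eliminated. However, this approach only stops the timeout from being reported, so it is not recommended; the real fix is to reduce the number of objects that have to be reclaimed.
2. Fixing the root cause:
Reduce the number of collections at the source, in two directions: 1. reduce memory usage; 2. reduce memory leaks.
Our app mainly relies on Probe, a memory inspection framework I developed. It currently detects Java-level OOM crashes caused by memory usage, too many threads, or too many file descriptors, and it also detects native leaks. We use it to optimize memory usage and fix leaks, and I plan to write about how Probe works later. After several releases, the average memory usage ratio dropped by more than 50% (for security and privacy reasons I cannot post screenshots), which reduces memory reclamation at the source.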