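Following on from the previous post's introduction to the Watchdog (WTD), this post looks at how the WTD behaves in a real deadlock and how it can be modified.
Recently I ran into an Android device that stayed stuck on the boot animation. The trace file showed a deadlock; the relevant excerpt is below: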
"main" prio=5 tid=1 MONITOR
| group="main" sCount=1 dsCount=0 obj=0x4c20f360 self=0x71e1ade0
| sysTid=519 nice=-2 sched=0/0 cgrp=apps handle=1878216768
| state=S schedstat=( 736667963 56924727 1529 ) utm=62 stm=11 core=0
at com.android.server.am.ActivityManagerService.registerReceiver(ActivityManagerService.java:~13326)
- waiting to lock <0x4c6b2630> (a com.android.server.am.ActivityManagerService) held by tid=27 (InputDispatcher)
at android.app.ContextImpl.registerReceiverInternal(ContextImpl.java:1473)
at android.app.ContextImpl.registerReceiver(ContextImpl.java:1441)
at com.android.server.power.PowerManagerService.systemReady(PowerManagerService.java:494)
at com.android.server.ServerThread.initAndLoop(SystemServer.java:1050)
at com.android.server.SystemServer.main(SystemServer.java:1371)
at java.lang.reflect.Method.invokeNative(Native Method)
at java.lang.reflect.Method.invoke(Method.java:515)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:794)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:610)
at dalvik.system.NativeStart.main(Native Method)
"InputDispatcher" prio=10 tid=27 MONITOR
| group="main" sCount=1 dsCount=0 obj=0x4c9c7d60 self=0x72010e50
| sysTid=554 nice=-8 sched=0/0 cgrp=apps handle=1912287104
| state=S schedstat=( 1007065539 96683590 71214 ) utm=22 stm=78 core=0
at com.android.server.power.PowerManagerService.setScreenBrightnessOverrideFromWindowManagerInternal(PowerManagerService.java:~2206)
- waiting to lock <0x4c6a8af0> (a java.lang.Object) held by tid=1 (main)
at com.android.server.power.PowerManagerService.setScreenBrightnessOverrideFromWindowManager(PowerManagerService.java:2199)
at com.android.server.wm.WindowManagerService.performLayoutAndPlaceSurfacesLockedInner(WindowManagerService.java:9818)
at com.android.server.wm.WindowManagerService.performLayoutAndPlaceSurfacesLockedLoop(WindowManagerService.java:8566)
at com.android.server.wm.WindowManagerService.performLayoutAndPlaceSurfacesLocked(WindowManagerService.java:8508)
at com.android.server.wm.WindowManagerService.setNewConfiguration(WindowManagerService.java:3847)
at com.android.server.am.ActivityManagerService.updateConfigurationLocked(ActivityManagerService.java:14490)
at com.android.server.am.ActivityManagerService.updateConfiguration(ActivityManagerService.java:14375)
at com.android.server.wm.WindowManagerService.sendNewConfiguration(WindowManagerService.java:6725)
at com.android.server.wm.InputMonitor.notifyConfigurationChanged(InputMonitor.java:325)
at com.android.server.input.InputManagerService.notifyConfigurationChanged(InputManagerService.java:1275)
at dalvik.system.NativeStart.run(Native Method)
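The trace clearly shows that the main and InputDispatcher threads have deadlocked each other: main is waiting for the ActivityManagerService lock held by InputDispatcher, while InputDispatcher is waiting for a java.lang.Object lock in PowerManagerService held by main. The call stacks show that both threads are inside AMS and PMS. According to the previous post's analysis, both AMS and PMS are already registered with the WTD for monitoring, so why did the WTD fail to detect the deadlock?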
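The lock ordering in the two stacks can be reduced to a minimal sketch: one thread takes lock A then B, the other takes B then A. The class, thread, and lock names below are illustrative stand-ins, not Android code, and the detection uses the desktop JVM's `ThreadMXBean` (not available on Android) purely to confirm that the two threads really are stuck.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderInversionDemo {

    /** Takes `first`, waits until both threads hold their first lock, then blocks on `second`. */
    private static Thread lockInOrder(String name, Object first, Object second,
                                      CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await();
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // Never reached: the other thread owns `second`.
                }
            }
        }, name);
        t.setDaemon(true); // let the JVM exit even though the threads never finish
        return t;
    }

    /** Creates the deadlock and returns how many threads the JVM reports as deadlocked. */
    static int countDeadlockedThreads() throws InterruptedException {
        Object amsLock = new Object(); // stand-in for the AMS monitor
        Object pmsLock = new Object(); // stand-in for the PMS lock object
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        lockInOrder("main-like", pmsLock, amsLock, bothHoldFirst).start();
        lockInOrder("dispatcher-like", amsLock, pmsLock, bothHoldFirst).start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) { // poll for up to about five seconds
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            Thread.sleep(50);
        }
        return 0;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("deadlocked threads: " + countDeadlockedThreads());
    }
}
```

The latch only makes the interleaving deterministic; on a real device the same cycle appears whenever the two code paths happen to overlap.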
Look back at the previous post for the AMS and PMS startup flow and the point at which the WTD is started:
public void initAndLoop() {
    try {
        // Wait for installd to finished starting up so that it has a chance to
        // create critical directories such as /data/user with the appropriate
        // permissions. We need this to complete before we initialize other services.
        Slog.i(TAG, "Waiting for installd to be ready.");
        installer = new Installer();
        installer.ping();
        Slog.i(TAG, "Power Manager");
        power = new PowerManagerService();
        ServiceManager.addService(Context.POWER_SERVICE, power);
        Slog.i(TAG, "Activity Manager");
        context = ActivityManagerService.main(factoryTest);
    } catch (RuntimeException e) {
        Slog.e("System", "******************************************");
        Slog.e("System", "************ Failure starting bootstrap service", e);
    }
    // only initialize the power service after we have started the
    // lights service, content providers and the battery service.
    power.init(context, lights, ActivityManagerService.self(), battery,
            BatteryStatsService.getService(),
            ActivityManagerService.self().getAppOpsService(), display);
    Slog.i(TAG, "Init Watchdog");
    Watchdog.getInstance().init(context, battery, power, alarm,
            ActivityManagerService.self());
    Watchdog.getInstance().addThread(wmHandler, "WindowManager thread");
    try {
        power.systemReady(twilight, dreamy);  // the main thread blocks inside this call (see the trace)
    } catch (Throwable e) {
        reportWtf("making Power Manager Service ready", e);
    }
    ActivityManagerService.self().systemReady(new Runnable() {
        public void run() {
            Watchdog.getInstance().start();  // the WTD thread is only started here
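SystemServer.java shows that the WTD thread is started only after many services have been registered, so if a deadlock occurs while services are still being brought up, the WTD is not yet running and cannot detect it. That explains why the deadlock in the trace above went undetected. The next step is to find a fix. I see roughly three options:
1. Start the WTD earlier, that is, run it immediately after it is instantiated. When a deadlock like the one above occurs, the WTD can then detect it and kill the hung system_server process so that it restarts.
2. Add a ReentrantLock to AMS and PMS at the locations shown in the trace and have the affected functions go through it: while PMS systemReady holds the lock, setScreenBrightnessOverrideFromWindowManager does not try to acquire it, which avoids the deadlock.
3. Block InputManagerService.notifyConfigurationChanged while services are being registered. I think this is less appropriate than option 2: the deadlock occurred because a USB input device was attached to the system, and since USB is hot-pluggable, the timing of its registration cannot be controlled, which is what led to the deadlock above.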
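Option 2 can be sketched roughly as follows, assuming the contended lock is replaced by a `ReentrantLock` and the setter uses `tryLock()` instead of blocking. `TryLockSketch`, `systemReady(Runnable)`, and `trySetBrightnessOverride` are hypothetical names for illustration; this is not the actual PowerManagerService code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class TryLockSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private int brightnessOverride = -1;

    /** Hypothetical stand-in for systemReady(): does its work while holding the lock. */
    void systemReady(Runnable work) {
        lock.lock();
        try {
            work.run();
        } finally {
            lock.unlock();
        }
    }

    /** Hypothetical stand-in for the brightness setter: never blocks on the lock. */
    boolean trySetBrightnessOverride(int value) {
        if (!lock.tryLock()) {
            return false; // lock is busy: skip instead of waiting
        }
        try {
            brightnessOverride = value;
            return true;
        } finally {
            lock.unlock();
        }
    }

    int getBrightnessOverride() {
        return brightnessOverride;
    }

    /** Returns { result while systemReady holds the lock, result after it is released }. */
    static boolean[] demo() throws InterruptedException {
        TryLockSketch pms = new TryLockSketch();
        CountDownLatch inSystemReady = new CountDownLatch(1);
        CountDownLatch finish = new CountDownLatch(1);
        Thread t = new Thread(() -> pms.systemReady(() -> {
            inSystemReady.countDown();
            try {
                finish.await();
            } catch (InterruptedException ignored) {
                // fall through and release the lock
            }
        }));
        t.start();
        inSystemReady.await();
        boolean whileHeld = pms.trySetBrightnessOverride(128);
        finish.countDown();
        t.join();
        boolean afterRelease = pms.trySetBrightnessOverride(128);
        return new boolean[] { whileHeld, afterRelease };
    }
}
```

The trade-off is that a skipped update is simply dropped, so a real change along these lines would also need to re-apply the value once systemReady has finished.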
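This post focuses on option 1, starting the WTD earlier; the patch below implements that idea. Based on the WTD source, the first thing to consider is system stability: whether starting the WTD earlier affects how services registered later use the WTD, and whether the WTD runs into problems accessing resources during this period.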
--- a/frameworks/base/services/java/com/android/server/SystemServer.java
+++ b/frameworks/base/services/java/com/android/server/SystemServer.java
@@ -351,7 +351,9 @@ class ServerThread {
Watchdog.getInstance().init(context, battery, power, alarm,
ActivityManagerService.self());
Watchdog.getInstance().addThread(wmHandler, "WindowManager thread");
-
+ Watchdog.getInstance().start();
+
Slog.i(TAG, "Input Manager");
@@ -1165,8 +1167,8 @@ class ServerThread {
} catch (Throwable e) {
reportWtf("making Recognition Service ready", e);
}
- Watchdog.getInstance().start();
-
+ //Watchdog.getInstance().start();
// It is now okay to let the various system services start their
// third party code...
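Having looked at these questions together, I believe the problems in this process can be avoided, but on top of the patch above, Watchdog.java needs some extra handling. I only describe it briefly here; the implementation is fairly simple.
1. Remove the thread-state check in the addMonitor and addThread interfaces; otherwise monitors cannot be added to the WTD once it has started.
2. Once the WTD is running, its run function contends for the lock with addMonitor and addThread, and the run function's cycle is long, so the cycle needs to be adjusted during system boot.
After reworking the WTD startup sequence along these lines, the system runs normally and the WTD works normally. I ran 1,000 reboot tests and have seen no side effects so far.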
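The two adjustments can be illustrated with a much simplified watchdog, written as a sketch under stated assumptions. `MiniWatchdog` and all of its members are hypothetical and do not reproduce Android's Watchdog.java: it accepts monitors after it has started, its check interval can be changed at runtime, and it calls a callback instead of killing the process. It also sidesteps the lock contention by keeping monitors in a `CopyOnWriteArrayList`, which is a different choice from tuning the cycle as described above.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class MiniWatchdog extends Thread {
    /** A monitor only needs to take and release the lock of the service it watches. */
    public interface Monitor { void monitor(); }

    private final List<Monitor> monitors = new CopyOnWriteArrayList<>();
    private final Runnable onHang;
    private volatile long checkIntervalMs;

    public MiniWatchdog(long checkIntervalMs, Runnable onHang) {
        super("mini-watchdog");
        setDaemon(true);
        this.checkIntervalMs = checkIntervalMs;
        this.onHang = onHang;
    }

    /** No thread-state check: monitors may be added after start(). */
    public void addMonitor(Monitor m) { monitors.add(m); }

    /** Lets boot code use a short cycle and lengthen it later. */
    public void setCheckIntervalMs(long ms) { checkIntervalMs = ms; }

    /** Runs every monitor on a helper thread; false means one did not return in time. */
    boolean checkOnce(long timeoutMs) throws InterruptedException {
        Thread checker = new Thread(() -> {
            for (Monitor m : monitors) {
                m.monitor();
            }
        }, "mini-watchdog-checker");
        checker.setDaemon(true);
        checker.start();
        checker.join(Math.max(1, timeoutMs)); // join(0) would wait forever
        return !checker.isAlive();
    }

    @Override
    public void run() {
        try {
            while (true) {
                long interval = checkIntervalMs;
                if (!checkOnce(interval)) {
                    onHang.run();
                    return;
                }
                Thread.sleep(interval);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Starts the watchdog first, registers a monitor afterwards, then blocks the monitored lock. */
    static boolean demo() throws InterruptedException {
        Object serviceLock = new Object();
        CountDownLatch hangReported = new CountDownLatch(1);
        MiniWatchdog wd = new MiniWatchdog(50, hangReported::countDown);
        wd.start();
        wd.addMonitor(() -> { synchronized (serviceLock) { } });

        CountDownLatch lockHeld = new CountDownLatch(1);
        Thread stuck = new Thread(() -> {
            synchronized (serviceLock) {
                lockHeld.countDown();
                try {
                    Thread.sleep(Long.MAX_VALUE);
                } catch (InterruptedException ignored) {
                    // exit and release the lock
                }
            }
        }, "stuck-service");
        stuck.setDaemon(true);
        stuck.start();
        lockHeld.await();
        return hangReported.await(5, TimeUnit.SECONDS);
    }
}
```

In `demo()` the watchdog is running before any monitor exists, which mirrors the reordered startup in the patch: a service lock that becomes stuck later is still noticed on the next check.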