Android 12 Watchdog(5) 案例分析集

文章托管在gitee上 Android Notes , 同步csdn

通常判断是否是发送Watchdog,可以通过以下log:

  • 查看 /data/anr 目录下是否生成 Watchdog trace 文件
  • events log中查看 watchdog event. 在Watchdog#run的方法中打印event事件
    // If we got here, that means that the system is most likely hung.
    // First collect stack traces from all threads of the system process.
    // Then kill this process so that the system will restart.
    EventLog.writeEvent(EventLogTags.WATCHDOG, subject);
m
  • system log中查看 watchdog 事件log. 在Watchdog#run的方法中打印Watchdog重启事件
    Slog.w(TAG, "*** WATCHDOG KILLING SYSTEM PROCESS: " + subject);
    WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
    Slog.w(TAG, "*** GOODBYE!");

案例一

这个案例是 system_server 发起binder调用到 surfaceflinger,而 surfaceflinger 继续提供 hwbinder 调用 hwc,最终卡在 hwc的相关函数。

查看events log中的Watchdog事件

从中可以发现,系统阻塞在android.fg等线程

03-01 23:29:48.089083  2339  2461 I watchdog: Blocked in monitor com.android.server.am.ActivityManagerService on foreground thread (android.fg), Blocked in handler on ui thread (android.ui), Blocked in handler on display thread (android.display), Blocked in handler on animation thread (android.anim)

查看system log中的Watchdog重启事件

从log可以看出前后两次生成的trace文件,以及卡住的线程

03-01 23:29:56.357723  2339  2461 E Watchdog: First set of traces taken from /data/anr/anr_2022-03-01-23-29-14-351
03-01 23:29:56.524827  2339  2461 E Watchdog: Second set of traces taken from /data/anr/anr_2022-03-01-23-29-48-401
03-01 23:29:56.712212  2339 29320 I DropBoxManagerService: add tag=system_server_watchdog isTagEnabled=true flags=0x6
03-01 23:29:56.738777  2339  2461 W Watchdog: *** WATCHDOG KILLING SYSTEM PROCESS: Blocked in monitor com.android.server.am.ActivityManagerService on foreground thread (android.fg), Blocked in handler on ui thread (android.ui), Blocked in handler on display thread (android.display), Blocked in handler on animation thread (android.anim)
03-01 23:29:56.739727  2339  2461 W Watchdog: android.fg annotated stack trace:
03-01 23:29:56.739791  2339  2461 W Watchdog:     at com.android.server.am.ActivityManagerService.monitor(ActivityManagerService.java:16455)
03-01 23:29:56.740235  2339  2461 W Watchdog:     - waiting to lock <0x0d483326> (a com.android.server.am.ActivityManagerService)
03-01 23:29:56.740377  2339  2461 W Watchdog:     at com.android.server.Watchdog$HandlerChecker.run(Watchdog.java:276)
03-01 23:29:56.740448  2339  2461 W Watchdog:     at android.os.Handler.handleCallback(Handler.java:938)
03-01 23:29:56.740476  2339  2461 W Watchdog:     at android.os.Handler.dispatchMessage(Handler.java:99)
03-01 23:29:56.740486  2339  2461 W Watchdog:     at android.os.Looper.loopOnce(Looper.java:238)
03-01 23:29:56.740496  2339  2461 W Watchdog:     at android.os.Looper.loop(Looper.java:379)
03-01 23:29:56.740506  2339  2461 W Watchdog:     at android.os.HandlerThread.run(HandlerThread.java:67)
03-01 23:29:56.740516  2339  2461 W Watchdog:     at com.android.server.ServiceThread.run(ServiceThread.java:44)
03-01 23:29:56.740750  2339  2461 W Watchdog: android.ui annotated stack trace:
03-01 23:29:56.740780  2339  2461 W Watchdog:     at com.android.server.wm.RootWindowContainer.onDisplayChanged(RootWindowContainer.java:2916)
03-01 23:29:56.740850  2339  2461 W Watchdog:     - waiting to lock <0x033bcac7> (a com.android.server.wm.WindowManagerGlobalLock)
03-01 23:29:56.740882  2339  2461 W Watchdog:     at android.hardware.display.DisplayManagerGlobal$DisplayListenerDelegate.handleMessage(DisplayManagerGlobal.java:973)
03-01 23:29:56.740953  2339  2461 W Watchdog:     - locked <0x09a04c63> (a android.hardware.display.DisplayManagerGlobal$DisplayListenerDelegate)
03-01 23:29:56.740977  2339  2461 W Watchdog:     at android.os.Handler.dispatchMessage(Handler.java:106)
03-01 23:29:56.740987  2339  2461 W Watchdog:     at android.os.Looper.loopOnce(Looper.java:238)
03-01 23:29:56.740996  2339  2461 W Watchdog:     at android.os.Looper.loop(Looper.java:379)
03-01 23:29:56.741005  2339  2461 W Watchdog:     at android.os.HandlerThread.run(HandlerThread.java:67)
03-01 23:29:56.741025  2339  2461 W Watchdog:     at com.android.server.ServiceThread.run(ServiceThread.java:44)
03-01 23:29:56.741053  2339  2461 W Watchdog:     at com.android.server.UiThread.run(UiThread.java:45)
...
03-01 23:29:56.743657  2339  2461 W Watchdog: *** GOODBYE!

分析 traces_SystemServer_WDT01_03_23_29_56.357

查看Blocked的线程。可以发现main在等待thread 15

"main" prio=5 tid=1 Blocked
  | group="main" sCount=1 ucsCount=0 flags=1 obj=0x72a2dc18 self=0xb4000079fc150c00
  | sysTid=2339 nice=-2 cgrp=foreground sched=0/0 handle=0x79fd7924f8
  | state=S schedstat=( 291309050514 268264950373 1674947 ) utm=15535 stm=13595 core=4 HZ=100
  | stack=0x7fdcded000-0x7fdcdef000 stackSize=8188KB
  | held mutexes=
  at com.android.server.wm.DisplayRotation.updateOrientationListener(DisplayRotation.java:1059)
  - waiting to lock <0x033bcac7> (a com.android.server.wm.WindowManagerGlobalLock) held by thread 15
  at com.android.server.policy.PhoneWindowManager.finishedGoingToSleep(PhoneWindowManager.java:5031)
  at com.android.server.power.Notifier.lambda$handleLateInteractiveChange$3$Notifier(Notifier.java:523)

继续搜索tid=15, 从log看是进行binder通信,等待对端返回。发起的调用即 BpSurfaceComposer::setTransactionState

"android.anim" prio=5 tid=15 Native
  | group="main" sCount=1 ucsCount=0 flags=1 obj=0x1acc09c0 self=0xb400007934711400
  | sysTid=2459 nice=-4 cgrp=top-app sched=0/0 handle=0x78d3678cb0
  | state=S schedstat=( 13497493576 7900077917 56495 ) utm=941 stm=408 core=5 HZ=100
  | stack=0x78d3575000-0x78d3577000 stackSize=1039KB
  | held mutexes=
  native: #00 pc 00000000000e802c  /apex/com.android.runtime/lib64/bionic/libc.so (__ioctl+12)
  native: #01 pc 00000000000a1b48  /apex/com.android.runtime/lib64/bionic/libc.so (ioctl+156)
  native: #02 pc 0000000000053f54  /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+292)
  native: #03 pc 0000000000055ac4  /system/lib64/libbinder.so (android::IPCThreadState::waitForResponse(android::Parcel*, int*)+64)
  native: #04 pc 0000000000055808  /system/lib64/libbinder.so (android::IPCThreadState::transact(int, unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+224)
  native: #05 pc 000000000004ca1c  /system/lib64/libbinder.so (android::BpBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+196)
  native: #06 pc 00000000000c89d0  /system/lib64/libgui.so (android::BpSurfaceComposer::setTransactionState(android::FrameTimelineInfo const&, android::Vector<android::ComposerState> const&, android::Vector<android::DisplayState> const&, unsigned int, android::sp<android::IBinder> const&, android::InputWindowCommands const&, long, bool, android::client_cache_t const&, bool, std::__1::vector<android::ListenerCallbacks, std::__1::allocator<android::ListenerCallbacks> > const&, unsigned long)+1648)
  native: #07 pc 00000000000f5fd8  /system/lib64/libgui.so (android::SurfaceComposerClient::Transaction::apply(bool)+2500)
  at android.view.SurfaceControl.nativeApplyTransaction(Native method)
  at android.view.SurfaceControl.access$2900(SurfaceControl.java:91)
  at android.view.SurfaceControl$GlobalTransactionWrapper.applyGlobalTransaction(SurfaceControl.java:3747)
  at android.view.SurfaceControl.closeTransaction(SurfaceControl.java:1726)
  - locked <0x0818e060> (a java.lang.Class<android.view.SurfaceControl>)
  at com.android.server.wm.WindowManagerService.closeSurfaceTransaction(WindowManagerService.java:1200)
  at com.android.server.wm.RootWindowContainer.performSurfacePlacementNoTrace(RootWindowContainer.java:876)
  at com.android.server.wm.RootWindowContainer.performSurfacePlacement(RootWindowContainer.java:825)
  at com.android.server.wm.WindowSurfacePlacer.performSurfacePlacementLoop(WindowSurfacePlacer.java:177)
  at com.android.server.wm.WindowSurfacePlacer.performSurfacePlacement(WindowSurfacePlacer.java:126)
  at com.android.server.wm.WindowSurfacePlacer.performSurfacePlacement(WindowSurfacePlacer.java:115)
  at com.android.server.wm.WindowSurfacePlacer$Traverser.run(WindowSurfacePlacer.java:57)
  - locked <0x033bcac7> (a com.android.server.wm.WindowManagerGlobalLock)
  at android.os.Handler.handleCallback(Handler.java:938)
  at android.os.Handler.dispatchMessage(Handler.java:99)
  at android.os.Looper.loopOnce(Looper.java:238)
  at android.os.Looper.loop(Looper.java:379)
  at android.os.HandlerThread.run(HandlerThread.java:67)
  at com.android.server.ServiceThread.run(ServiceThread.java:44)

通过BpSurfaceComposer::setTransactionState可以知道,对端应该是SurfaceFlinger了。那么怎么确定对端是哪个线程在处理? 通过sysTid=2459 可以知道它在系统的线程id。 接下来,可以在binder通信的log搜索2459:

从log看,发起端是 2339进程的2459线程,接收端是 1205进程的2112线程

// 发起端 system_server
proc 2339
context binder
  ...
  thread 2459: l 10 need_return 0 tr 0
    outgoing transaction 365209941: 0000000000000000 from 2339:2459 to 1205:2112 code 8 flags 10 pri 0:116 r1
    transaction complete

// 接收端 SurfaceFlinger
proc 1205
context binder
  ...
  thread 2112: l 01 need_return 1 tr 0
    incoming transaction 365209941: 0000000000000000 from 2339:2459 to 1205:2112 code 8 flags 10 pri 0:116 r1 node 2483 size 1676:48 data 0000000000000000

接下来看 1205进程的2112线程的调用栈, 从log看,SurfaceFlinger::setTransactionState 又发起了一个 hwbinder 调用 BpHwDisplayConfig::perform

"Binder:1205_3" sysTid=2112
    #00 pc 00000000000e802c  /apex/com.android.runtime/lib64/bionic/libc.so (__ioctl+12) (BuildId: fb76ba9523d8aaae2d41a1fbd415c1dc)
    #01 pc 00000000000a1b48  /apex/com.android.runtime/lib64/bionic/libc.so (ioctl+156) (BuildId: fb76ba9523d8aaae2d41a1fbd415c1dc)
    #02 pc 0000000000066b40  /system/lib64/libhidlbase.so (android::hardware::IPCThreadState::transact(int, unsigned int, android::hardware::Parcel const&, android::hardware::Parcel*, unsigned int)+1540) (BuildId: c608f6971797e3ae95b93820aa139249)
    #03 pc 00000000000663dc  /system/lib64/libhidlbase.so (android::hardware::BpHwBinder::transact(unsigned int, android::hardware::Parcel const&, android::hardware::Parcel*, unsigned int, std::__1::function<void (android::hardware::Parcel&)>)+80) (BuildId: c608f6971797e3ae95b93820aa139249)
    #04 pc 0000000000016340  /system_ext/lib64/vendor.display.config@2.0.so (vendor::display::config::V2_0::BpHwDisplayConfig::_hidl_perform(android::hardware::IInterface*, android::hardware::details::HidlInstrumentor*, unsigned long, unsigned int, android::hardware::hidl_vec<unsigned char> const&, android::hardware::hidl_vec<android::hardware::hidl_handle> const&, std::__1::function<void (int, android::hardware::hidl_vec<unsigned char> const&, android::hardware::hidl_vec<android::hardware::hidl_handle> const&)>) (.cfi)+608) (BuildId: a9e3ee6979c034ea920452c69d780029)
    #05 pc 0000000000015ad4  /system_ext/lib64/vendor.display.config@2.0.so (vendor::display::config::V2_0::BpHwDisplayConfig::perform(unsigned long, unsigned int, android::hardware::hidl_vec<unsigned char> const&, android::hardware::hidl_vec<android::hardware::hidl_handle> const&, std::__1::function<void (int, android::hardware::hidl_vec<unsigned char> const&, android::hardware::hidl_vec<android::hardware::hidl_handle> const&)>)+188) (BuildId: a9e3ee6979c034ea920452c69d780029)
    #06 pc 0000000000013fc8  /system_ext/lib64/libdisplayconfig.system.qti.so (DisplayConfig::ClientImpl::CreateVirtualDisplay(unsigned int, unsigned int, int)+188) (BuildId: 7a35e6af402a03a0989fd76de91c38b4)
    #07 pc 00000000001ac040  /system/bin/surfaceflinger (android::SurfaceFlinger::setTransactionState(android::FrameTimelineInfo const&, android::Vector<android::ComposerState> const&, android::Vector<android::DisplayState> const&, unsigned int, android::sp<android::IBinder> const&, android::InputWindowCommands const&, long, bool, android::client_cache_t const&, bool, std::__1::vector<android::ListenerCallbacks, std::__1::allocator<android::ListenerCallbacks> > const&, unsigned long)+1184) (BuildId: f0bd600747bdf092064c4802e154f3ef)
    #08 pc 00000000000c727c  /system/lib64/libgui.so (android::BnSurfaceComposer::onTransact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+14572) (BuildId: 843237695aa14d88272bc2dcc327f092)
    #09 pc 00000000001b9f3c  /system/bin/surfaceflinger (android::SurfaceFlinger::onTransact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+156) (BuildId: f0bd600747bdf092064c4802e154f3ef)
    #10 pc 000000000004a880  /system/lib64/libbinder.so (android::BBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+252) (BuildId: a82e3bba4ac6144031ff6985fc101a95)
    #11 pc 0000000000055134  /system/lib64/libbinder.so (android::IPCThreadState::executeCommand(int)+3484) (BuildId: a82e3bba4ac6144031ff6985fc101a95)
    #12 pc 00000000000542cc  /system/lib64/libbinder.so (android::IPCThreadState::getAndExecuteCommand()+164) (BuildId: a82e3bba4ac6144031ff6985fc101a95)
    #13 pc 00000000000554b0  /system/lib64/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+72) (BuildId: a82e3bba4ac6144031ff6985fc101a95)
    #14 pc 0000000000081e18  /system/lib64/libbinder.so (android::PoolThread::threadLoop()+28) (BuildId: a82e3bba4ac6144031ff6985fc101a95)
    #15 pc 000000000001358c  /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+264) (BuildId: 6fc8f24a666a9723e0829fb53721b425)

接下来继续查看对端,根据线程sysTid=2112 在binder log搜索2112确定调用链:

// 发起端 SurfaceFlinger
proc 1205
context hwbinder
  ...
  thread 2112: l 10 need_return 0 tr 0
    outgoing transaction 365210005: 0000000000000000 from 1205:2112 to 1124:1458 code 2 flags 10 pri 1:98 r1

// 服务端 hwc
proc 1124
context hwbinder
  ...
  thread 1458: l 02 need_return 1 tr 0
    incoming transaction 365210005: 0000000000000000 from 1205:2112 to 1124:1458 code 2 flags 10 pri 1:98 r1 node 1349 size 216:24 data 0000000000000000

接下来看 1124:1458,从堆栈看出是卡住在vendor.qti.hardware.display.composer-service的 HWCSession::WaitForResources。到此,已经定位到了问题点,将问题交由hwc继续分析。最后问题貌似是高通方面提供了patch解决。

"HwBinder:1124_1" sysTid=1458
    #00 pc 00000000000940f0  /apex/com.android.runtime/lib64/bionic/libc.so (syscall+32) (BuildId: fb76ba9523d8aaae2d41a1fbd415c1dc)
    #01 pc 00000000000989bc  /apex/com.android.runtime/lib64/bionic/libc.so (__futex_wait_ex(void volatile*, bool, int, bool, timespec const*)+148) (BuildId: fb76ba9523d8aaae2d41a1fbd415c1dc)
    #02 pc 00000000000fc02c  /apex/com.android.runtime/lib64/bionic/libc.so (pthread_cond_wait+84) (BuildId: fb76ba9523d8aaae2d41a1fbd415c1dc)
    #03 pc 0000000000056ca0  /apex/com.android.vndk.v31/lib64/libc++.so (std::__1::condition_variable::wait(std::__1::unique_lock<std::__1::mutex>&)+24) (BuildId: 3800ebf760648abf274c4f36afc38008)
    #04 pc 00000000000962e4  /vendor/bin/hw/vendor.qti.hardware.display.composer-service (sdm::HWCSession::WaitForResources(bool, unsigned long, unsigned long)+1096) (BuildId: 0ea515b02e255ad4a6681b5df410666b)
    #05 pc 00000000000a3e44  /vendor/bin/hw/vendor.qti.hardware.display.composer-service (sdm::HWCSession::DisplayConfigImpl::CreateVirtualDisplay(unsigned int, unsigned int, int)+340) (BuildId: 0ea515b02e255ad4a6681b5df410666b)
    #06 pc 000000000001c280  /vendor/lib64/libdisplayconfig.qti.so (DisplayConfig::DeviceImpl::DeviceClientContext::ParseCreateVirtualDisplay(android::hardware::hidl_vec<unsigned char> const&, std::__1::function<void (int, android::hardware::hidl_vec<unsigned char> const&, android::hardware::hidl_vec<android::hardware::hidl_handle> const&)>)+88) (BuildId: 14bf4b7d6c0063e6da991a1e3abdedff)
    #07 pc 00000000000182ac  /vendor/lib64/libdisplayconfig.qti.so (DisplayConfig::DeviceImpl::perform(unsigned long, unsigned int, android::hardware::hidl_vec<unsigned char> const&, android::hardware::hidl_vec<android::hardware::hidl_handle> const&, std::__1::function<void (int, android::hardware::hidl_vec<unsigned char> const&, android::hardware::hidl_vec<android::hardware::hidl_handle> const&)>)+8824) (BuildId: 14bf4b7d6c0063e6da991a1e3abdedff)
    #08 pc 00000000000132d0  /vendor/lib64/vendor.display.config@2.0.so (vendor::display::config::V2_0::BnHwDisplayConfig::_hidl_perform(android::hidl::base::V1_0::BnHwBase*, android::hardware::Parcel const&, android::hardware::Parcel*, std::__1::function<void (android::hardware::Parcel&)>) (.cfi)+724) (BuildId: d203a85715a76890e73a85b38b115ec5)
    #09 pc 0000000000012280  /vendor/lib64/vendor.display.config@2.0.so (vendor::display::config::V2_0::BnHwDisplayConfig::onTransact(unsigned int, android::hardware::Parcel const&, android::hardware::Parcel*, unsigned int, std::__1::function<void (android::hardware::Parcel&)>)+336) (BuildId: d203a85715a76890e73a85b38b115ec5)
    #10 pc 000000000006d44c  /apex/com.android.vndk.v31/lib64/libhidlbase.so (android::hardware::BHwBinder::transact(unsigned int, android::hardware::Parcel const&, android::hardware::Parcel*, unsigned int, std::__1::function<void (android::hardware::Parcel&)>)+96) (BuildId: 54569e58303a962f81df8fe06e9274cb)
    #11 pc 000000000006b9c0  /apex/com.android.vndk.v31/lib64/libhidlbase.so (android::hardware::IPCThreadState::getAndExecuteCommand()+1112) (BuildId: 54569e58303a962f81df8fe06e9274cb)
    #12 pc 000000000006b484  /apex/com.android.vndk.v31/lib64/libhidlbase.so (android::hardware::IPCThreadState::joinThreadPool(bool)+100) (BuildId: 54569e58303a962f81df8fe06e9274cb)
    #13 pc 000000000006b408  /apex/com.android.vndk.v31/lib64/libhidlbase.so (android::hardware::PoolThread::threadLoop()+28) (BuildId: 54569e58303a962f81df8fe06e9274cb)
    #14 pc 000000000001358c  /apex/com.android.vndk.v31/lib64/libutils.so (android::Thread::_threadLoop(void*)+264) (BuildId: 3246d7568b7735dff37d41c19433144c)

案例二

此案例是sensor hal被异常冻结,导致hwbinder通信卡住,导致的Watchdog

查看events log中的Watchdog事件

从中可以发现发生Watchdog,系统阻塞在android.fg等线程

03-05 01:31:13.268348  2144  2446 I watchdog: Blocked in monitor com.android.server.am.ActivityManagerService on foreground thread (android.fg), Blocked in handler on main thread (main), Blocked in handler on display thread (android.display), Blocked in handler on animation thread (android.anim), Blocked in handler on ActivityManager (ActivityManager)

查看system log中的Watchdog事件

查看system log发现Watchdog并没有kill system, 因为是在跑monkey的原因。

03-05 01:30:36.694976  2144  2446 I Watchdog: WAITED_HALF
03-05 01:31:19.282718  2144  2446 I Watchdog: Collecting Binder Transaction Status Information
03-05 01:31:19.282889  2144  2446 I Watchdog: Collecting Binder state file from binderfs
03-05 01:31:24.533333  2144  2446 E Watchdog: First set of traces taken from /data/anr/anr_2022-03-05-01-30-36-809
03-05 01:31:24.719023  2144  2446 E Watchdog: Second set of traces taken from /data/anr/anr_2022-03-05-01-31-13-619
03-05 01:31:24.946323  2144  2446 I Watchdog: Reporting stuck state to activity controller

分析 traces_SystemServer_WDT04_03_17_31_24.533_pid2144

查看blocked 的 android.fg 线程调用栈,发现HandlerChecker在执行monitor时等锁,而锁被thread 53持有。

"android.fg" prio=5 tid=13 Blocked
  | group="main" sCount=1 ucsCount=0 flags=1 obj=0x13d407e0 self=0xb4000078f80f6800
  | sysTid=2438 nice=0 cgrp=foreground sched=0/0 handle=0x786f585cb0
  | state=S schedstat=( 302218682432 246835604021 994874 ) utm=19112 stm=11109 core=3 HZ=100
  | stack=0x786f482000-0x786f484000 stackSize=1039KB
  | held mutexes=
  at com.android.server.am.ActivityManagerService.monitor(ActivityManagerService.java:16455)
  - waiting to lock <0x0fec94e0> (a com.android.server.am.ActivityManagerService) held by thread 53
  at com.android.server.Watchdog$HandlerChecker.run(Watchdog.java:276)
  at android.os.Handler.handleCallback(Handler.java:938)
  at android.os.Handler.dispatchMessage(Handler.java:99)
  at android.os.Looper.loopOnce(Looper.java:238)
  at android.os.Looper.loop(Looper.java:379)
  at android.os.HandlerThread.run(HandlerThread.java:67)
  at com.android.server.ServiceThread.run(ServiceThread.java:44)

搜索 system_server(pid2144) tid=53 线程,它持有AMS锁,但也在等另外一个锁WindowManagerGlobalLock对象,该锁被thread 231持有.

"Binder:2144_D" prio=5 tid=53 Blocked
  | group="main" sCount=1 ucsCount=0 flags=1 obj=0x13d5f858 self=0xb400007834b6b000
  | sysTid=4973 nice=-2 cgrp=foreground sched=0/0 handle=0x7867a8ccb0
  | state=S schedstat=( 1116118050555 488843391069 8552582 ) utm=71855 stm=39756 core=0 HZ=100
  | stack=0x7867995000-0x7867997000 stackSize=991KB
  | held mutexes=
  at com.android.server.wm.ActivityTaskManagerService$LocalService.compatibilityInfoForPackage(ActivityTaskManagerService.java:6227)
  - waiting to lock <0x0a6587e6> (a com.android.server.wm.WindowManagerGlobalLock) held by thread 231
  at com.android.server.am.ActivityManagerService.compatibilityInfoForPackage(ActivityManagerService.java:3044)
  at com.android.server.am.ActiveServices.realStartServiceLocked(ActiveServices.java:4384)
  at com.android.server.am.ActiveServices.bringUpServiceLocked(ActiveServices.java:4247)
  at com.android.server.am.ActiveServices.startServiceInnerLocked(ActiveServices.java:1284)
  at com.android.server.am.ActiveServices.startServiceInnerLocked(ActiveServices.java:1073)
  at com.android.server.am.ActiveServices.startServiceLocked(ActiveServices.java:971)
  at com.android.server.am.ActiveServices.startServiceLocked(ActiveServices.java:703)
  at com.android.server.am.ActivityManagerService.startService(ActivityManagerService.java:12968)
  - locked <0x0fec94e0> (a com.android.server.am.ActivityManagerService)
  at android.app.IActivityManager$Stub.onTransact(IActivityManager.java:2749)
  at com.android.server.am.ActivityManagerService.onTransact(ActivityManagerService.java:2765)
  at android.os.Binder.execTransactInternal(Binder.java:1198)
  at android.os.Binder.execTransact(Binder.java:1157)

搜索 system_server(pid2144) tid=231 线程, 它在执行SensorService::disable时在等锁(MutexLockWithTimeout),因此需要找到是哪个线程正在执行. 我们可以搜索/system/lib64/libsensorservice.so,查看此服务哪个线程在持锁执行任务.

"Binder:2144_14" prio=5 tid=231 Native
  | group="main" sCount=1 ucsCount=0 flags=1 obj=0x13d60698 self=0xb40000783c268800
  | sysTid=8793 nice=-10 cgrp=foreground sched=0/0 handle=0x786746ccb0
  | state=S schedstat=( 1132724004197 492411705656 8727320 ) utm=72806 stm=40465 core=5 HZ=100
  | stack=0x7867375000-0x7867377000 stackSize=991KB
  | held mutexes=
  native: #00 pc 00000000000940f0  /apex/com.android.runtime/lib64/bionic/libc.so (syscall+32)
  native: #01 pc 00000000000989bc  /apex/com.android.runtime/lib64/bionic/libc.so (__futex_wait_ex(void volatile*, bool, int, bool, timespec const*)+148)
  native: #02 pc 00000000000fe04c  /apex/com.android.runtime/lib64/bionic/libc.so (NonPI::MutexLockWithTimeout(pthread_mutex_internal_t*, bool, timespec const*)+224)
  native: #03 pc 00000000000366b0  /system/lib64/libsensorservice.so (android::SensorService::disable(android::sp<android::SensorService::SensorEventConnection> const&, int)+68)
  native: #04 pc 00000000000287b0  /system/lib64/libsensorservice.so (android::SensorService::SensorEventConnection::enableDisable(int, bool, long, long, int)+320)
  native: #05 pc 00000000001ab8f8  /system/lib64/libandroid_runtime.so ((anonymous namespace)::nativeDisableSensor(_JNIEnv*, _jclass*, long, int)+120)
  at android.hardware.SystemSensorManager$BaseEventQueue.nativeDisableSensor(Native method)
  at android.hardware.SystemSensorManager$BaseEventQueue.disableSensor(SystemSensorManager.java:823)
  at android.hardware.SystemSensorManager$BaseEventQueue.removeAllSensors(SystemSensorManager.java:743)
  at android.hardware.SystemSensorManager.unregisterListenerImpl(SystemSensorManager.java:242)
  - locked <0x0be8257d> (a java.util.HashMap)
  at android.hardware.SensorManager.unregisterListener(SensorManager.java:679)
  at com.android.server.wm.WindowOrientationListener.disable(WindowOrientationListener.java:209)
  - locked <0x08f23c72> (a java.lang.Object)
  at com.android.server.wm.DisplayRotation$OrientationListener.disable(DisplayRotation.java:1745)
  at com.android.server.wm.DisplayRotation.updateOrientationListenerLw(DisplayRotation.java:1146)
  at com.android.server.wm.DisplayRotation.updateOrientationListener(DisplayRotation.java:1060)
  - locked <0x0a6587e6> (a com.android.server.wm.WindowManagerGlobalLock)
  at com.android.server.wm.DisplayRotation.noteUpdateOrientationFromApp(DisplayRotation.java:1856)
  at com.android.server.wm.RootWindowContainer.ensureVisibilityAndConfig(RootWindowContainer.java:1880)
  at com.android.server.wm.Task.resumeTopActivityInnerLocked(Task.java:6745)
  at com.android.server.wm.Task.resumeTopActivityUncheckedLocked(Task.java:6359)
  at com.android.server.wm.Task.resumeTopActivityUncheckedLocked(Task.java:6372)
  at com.android.server.wm.RootWindowContainer.resumeFocusedTasksTopActivities(RootWindowContainer.java:2660)
  at com.android.server.wm.ActivityStarter.resumeTargetRootTaskIfNeeded(ActivityStarter.java:3123)
  at com.android.server.wm.ActivityStarter.recycleTask(ActivityStarter.java:2399)
  at com.android.server.wm.ActivityStarter.startActivityInner(ActivityStarter.java:2026)
  at com.android.server.wm.ActivityStarter.startActivityUnchecked(ActivityStarter.java:1833)
  at com.android.server.wm.ActivityStarter.executeRequest(ActivityStarter.java:1433)
  at com.android.server.wm.ActivityStarter.execute(ActivityStarter.java:829)
  - locked <0x0a6587e6> (a com.android.server.wm.WindowManagerGlobalLock)
  at com.android.server.wm.ActivityTaskManagerService.startActivityAsUser(ActivityTaskManagerService.java:1510)
  at com.android.server.wm.ActivityTaskManagerService.startActivityAsUser(ActivityTaskManagerService.java:1435)
  at com.android.server.am.ActivityManagerService.startActivityAsUserWithFeature(ActivityManagerService.java:3200)
  at android.app.IActivityManager$Stub.onTransact(IActivityManager.java:4294)
  at com.android.server.am.ActivityManagerService.onTransact(ActivityManagerService.java:2765)
  at android.os.Binder.execTransactInternal(Binder.java:1198)
  at android.os.Binder.execTransact(Binder.java:1157)

通过搜索发现SensorService的其他线程都在等锁,只有下面这个线程在进行hwbinder通信,其对端是sensor hal(android.hardware.sensors@2.0)

"Binder:2144_A" prio=5 tid=127 Native
  | group="main" sCount=1 ucsCount=0 flags=1 obj=0x13d5c810 self=0xb400007870095400
  | sysTid=3855 nice=0 cgrp=foreground sched=0/0 handle=0x77e63cccb0
  | state=S schedstat=( 1189398409434 521393263938 9255328 ) utm=76615 stm=42324 core=1 HZ=100
  | stack=0x77e62d5000-0x77e62d7000 stackSize=991KB
  | held mutexes=
  native: #00 pc 00000000000e802c  /apex/com.android.runtime/lib64/bionic/libc.so (__ioctl+12)
  native: #01 pc 00000000000a1b48  /apex/com.android.runtime/lib64/bionic/libc.so (ioctl+156)
  native: #02 pc 00000000000714d4  /system/lib64/libhidlbase.so (android::hardware::IPCThreadState::transact(int, unsigned int, android::hardware::Parcel const&, android::hardware::Parcel*, unsigned int)+424)
  native: #03 pc 00000000000711cc  /system/lib64/libhidlbase.so (android::hardware::BpHwBinder::transact(unsigned int, android::hardware::Parcel const&, android::hardware::Parcel*, unsigned int, std::__1::function<void (android::hardware::Parcel&)>)+80)
  native: #04 pc 000000000002245c  /system/lib64/android.hardware.sensors@2.0.so (android::hardware::sensors::V2_0::BpHwSensors::_hidl_activate(android::hardware::IInterface*, android::hardware::details::HidlInstrumentor*, int, bool) (.cfi)+408)
  native: #05 pc 000000000001aef4  /system/lib64/libsensorservice.so (android::SensorDevice::activateLocked(void*, int, int)+1144)
  native: #06 pc 000000000001c67c  /system/lib64/libsensorservice.so (android::SensorDevice::activate(void*, int, int)+76)
  native: #07 pc 0000000000036730  /system/lib64/libsensorservice.so (android::SensorService::disable(android::sp<android::SensorService::SensorEventConnection> const&, int)+196)
  native: #08 pc 00000000000287b0  /system/lib64/libsensorservice.so (android::SensorService::SensorEventConnection::enableDisable(int, bool, long, long, int)+320)
  native: #09 pc 000000000000ccd0  /system/lib64/libsensor.so (android::BnSensorEventConnection::onTransact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+460)
  native: #10 pc 000000000004a880  /system/lib64/libbinder.so (android::BBinder::transact(unsigned int, android::Parcel const&, android::Parcel*, unsigned int)+252)
  native: #11 pc 00000000000551c4  /system/lib64/libbinder.so (android::IPCThreadState::executeCommand(int)+3628)
  native: #12 pc 00000000000542cc  /system/lib64/libbinder.so (android::IPCThreadState::getAndExecuteCommand()+164)
  native: #13 pc 000000000005553c  /system/lib64/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+72)
  native: #14 pc 0000000000081ea4  /system/lib64/libbinder.so (android::PoolThread::threadLoop()+28)
  native: #15 pc 000000000001358c  /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+264)
  native: #16 pc 00000000000bf47c  /system/lib64/libandroid_runtime.so (android::AndroidRuntime::javaThreadShell(void*)+148)
  native: #17 pc 0000000000012de8  /system/lib64/libutils.so (thread_data_t::trampoline(thread_data_t const*)+408)
  native: #18 pc 00000000000fcebc  /apex/com.android.runtime/lib64/bionic/libc.so (__pthread_start(void*)+400)
  native: #19 pc 00000000000994bc  /apex/com.android.runtime/lib64/bionic/libc.so (__start_thread+68)

通过 sysTid=3855 在binder transaction log可以知道其通信对端是进程1137,线程号0(也就是没有线程处理),通常这种情况是因为该进程状态异常(比如进程已死或者处于D状态)或者binder线程没有空闲线程

proc 2144
context hwbinder
  thread 3855: l 10 need_return 0 tr 0
    outgoing transaction 625975487: 00000000777dc692 from 2144:3855 to 1137:0 code 3 flags 10 pri 0:120 r1

proc 1137
context hwbinder
  thread 1137: l 12 need_return 0 tr 0
    pending transaction 625975487: 00000000777dc692 from 2144:3855 to 1137:0 code 3 flags 10 pri 0:120 r1 node 7558 size 48:0 data 000000006f380266

接下来使用ps查看sensor hal进程,可以发现进程1137处于D状态:

# ps-A | grep 1137
USER    PID     PPID   VSZ      RSS       WCHAN           ADDR   S   NAME
system  1137    1      2911284  6680     __refrigerator   0      D   android.hardware.sensors@2.1-service.mu1tihal

查看其cgroup如下,此进程应该是处于冻结状态

# cat /proc/1137/cgroup
5:freezer:/bg
4:memory:/
3:cpuset:/system-background
2:cpu:/
1:blkio:/
0::/uid_1000/pid_1137

# cat/proc/1137/stat
1137(sensors@2.1-ser) D 1 1137 0 0 -1 4260096 173126 5796 220 0 18273 21477 13 25 20 0 36 0 710 2981154816 1582 1844674 4073709551615 407914143744 407914214216 549746513600 0 0 0 2048 0 1073777912 00017 200400 407914227088 407914227 088 408491892736 549746516604 549746516665 549746516665 549746519995 0

此案与冻结实现同事沟通,原因是复用时刚好复用了一个之前解冻时没有被成功解冻的进程号,导致出现hwbinder通信卡住,此问题需要优化冻结相关的逻辑来解决此问题。

  • 1
    点赞
  • 10
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值