[Android][CTS][WIFI]Android P上CtsNetTestCases测试项导致系统重启问题分析

复现步骤

执行:

cts-tf > run cts -m CtsNetTestCases

后概率复现;

初步分析

通过多次尝试,发现大多集中在如下这项测试项:

android.net.wifi.cts.WifiManagerTest#testStartLocalOnlyHotspotSingleRequestByApps

单跑此项后验证,确实有较高概率导致system_server进程崩溃,Android上层重启:

cts-tf > run cts -m CtsNetTestCases -t android.net.wifi.cts.WifiManagerTest#testStartLocalOnlyHotspotSingleRequestByApps

系统崩溃日志节选如下:

01-13 17:51:36.588  1338  1486 E AndroidRuntime: *** FATAL EXCEPTION IN SYSTEM PROCESS: android.io
01-13 17:51:36.588  1338  1486 E AndroidRuntime: java.lang.IllegalStateException: command '110 network route add local wlan1 fe80::/64' failed with '400 110 addRoute() failed (No such device)'
01-13 17:51:36.588  1338  1486 E AndroidRuntime: 	at com.android.server.NetworkManagementService.modifyRoute(NetworkManagementService.java:1176)
01-13 17:51:36.588  1338  1486 E AndroidRuntime: 	at com.android.server.NetworkManagementService.addInterfaceToLocalNetwork(NetworkManagementService.java:2673)
01-13 17:51:36.588  1338  1486 E AndroidRuntime: 	at com.android.server.connectivity.tethering.TetherInterfaceStateMachine.configureLocalIPv6Routes(TetherInterfaceStateMachine.java:386)
01-13 17:51:36.588  1338  1486 E AndroidRuntime: 	at com.android.server.connectivity.tethering.TetherInterfaceStateMachine.setRaParams(TetherInterfaceStateMachine.java:446)
01-13 17:51:36.588  1338  1486 E AndroidRuntime: 	at com.android.server.connectivity.tethering.TetherInterfaceStateMachine.updateUpstreamIPv6LinkProperties(TetherInterfaceStateMachine.java:338)
01-13 17:51:36.588  1338  1486 E AndroidRuntime: 	at com.android.server.connectivity.tethering.TetherInterfaceStateMachine.access$700(TetherInterfaceStateMachine.java:67)
01-13 17:51:36.588  1338  1486 E AndroidRuntime: 	at com.android.server.connectivity.tethering.TetherInterfaceStateMachine$BaseServingState.processMessage(TetherInterfaceStateMachine.java:570)
01-13 17:51:36.588  1338  1486 E AndroidRuntime: 	at com.android.server.connectivity.tethering.TetherInterfaceStateMachine$LocalHotspotState.processMessage(TetherInterfaceStateMachine.java:607)
01-13 17:51:36.588  1338  1486 E AndroidRuntime: 	at com.android.internal.util.StateMachine$SmHandler.processMsg(StateMachine.java:992)
01-13 17:51:36.588  1338  1486 E AndroidRuntime: 	at com.android.internal.util.StateMachine$SmHandler.handleMessage(StateMachine.java:809)
01-13 17:51:36.588  1338  1486 E AndroidRuntime: 	at android.os.Handler.dispatchMessage(Handler.java:106)
01-13 17:51:36.588  1338  1486 E AndroidRuntime: 	at android.os.Looper.loop(Looper.java:193)
01-13 17:51:36.588  1338  1486 E AndroidRuntime: 	at android.os.HandlerThread.run(HandlerThread.java:65)
01-13 17:51:36.588  1338  1486 E AndroidRuntime: 	at com.android.server.ServiceThread.run(ServiceThread.java:44)
01-13 17:51:36.588  1338  1486 E AndroidRuntime: Caused by: com.android.server.NativeDaemonConnector$NativeDaemonFailureException: command '110 network route add local wlan1 fe80::/64' failed with '400 110 addRoute() failed (No such device)'
01-13 17:51:36.588  1338  1486 E AndroidRuntime: 	at com.android.server.NativeDaemonConnector.executeForList(NativeDaemonConnector.java:515)
01-13 17:51:36.588  1338  1486 E AndroidRuntime: 	at com.android.server.NativeDaemonConnector.execute(NativeDaemonConnector.java:411)
01-13 17:51:36.588  1338  1486 E AndroidRuntime: 	at com.android.server.NativeDaemonConnector.execute(NativeDaemonConnector.java:406)
01-13 17:51:36.588  1338  1486 E AndroidRuntime: 	at com.android.server.NativeDaemonConnector.execute(NativeDaemonConnector.java:391)
01-13 17:51:36.588  1338  1486 E AndroidRuntime: 	at com.android.server.NetworkManagementService.modifyRoute(NetworkManagementService.java:1172)
01-13 17:51:36.588  1338  1486 E AndroidRuntime: 	... 13 more
01-13 17:51:36.589  1338  1486 I am_crash: [1338,0,system_server,-1,com.android.server.NativeDaemonConnector$NativeDaemonFailureException,command '110 network route add local wlan1 fe80::/64' failed with '400 110 addRoute() failed (No such device)',NativeDaemonConnector.java,515]

从这里可以一目了然看到异常的调用栈;

逻辑分析

这里就略去逐步分析的过程,直接上分析结果:

  • 这项测试项是测试第三方应用是否可以重复调用WifiManager.startLocalOnlyHotspot()(期望值为不能)
  • 测试会调用两次WifiManager.startLocalOnlyHotspot(),在调用第二次时会捕获framework抛出的IllegalStateException,并在之后调用WifiManager.stopLocalOnlyHotspot()将其关闭;
  • 而问题就出在了最后的WifiManager.stopLocalOnlyHotspot()

正常情况

在这里插入图片描述

  • WifiServiceImpl.stopLocalOnlyHotspot()会发出android.net.wifi.WIFI_AP_STATE_CHANGED广播,状态为WifiManager.WIFI_AP_STATE_DISABLING
  • Tethering会收到该广播,并调用TetherInterfaceStateMachine.unwanted()进行路由规则的反注册;
  • 与此同时WifiServiceImpl会调用HalDeviceManager.removeIfaceInternal()通知HAL服务将该接口销毁;
  • 虽然广播与销毁接口的调用并非线程安全,但由于调用存在先后顺序,且Tethering本身也是运行在system_server进程中,因此正常情况下时序还是如上图所示,不会出问题的;

异常情况

Tethering负责监听android.net.wifi.WIFI_AP_STATE_CHANGED广播的线程为IoThread,因此,如果次线程出现拥堵,导致广播接收出现延时,那么就会导致如下情况:
在这里插入图片描述
图中黄色部分表示异常部分,而红色则是系统崩溃的发生点:

  • 如图所示,由于WifiServiceImpl.startLocalOnlyHotspot()调用的广播送达Tethering之后,运行在IoThreadTetherInterfaceStateMachine并未在期望的时间内完成与Netd进程的交互,以至于在而后的WifiServiceImpl.stopLocalOnlyHotspot()调用后销毁接口已经完成之后,TetherInterfaceStateMachine仍还在尝试为该接口添加路由规则。从而因此了异常,导致system_server异常重启;

解决方案

具体方案就不提供了,这里提供三个思路供参考:

  1. 问题发生于startLocalOnlyHotspot()stoptLocalOnlyHotspot()之间无间隔的快速调用,因此在此处添加一个必要时间窗口判断,并做对应的延时,是可以规避该问题的;(固定延时,还是监听Tethering状态回调来讲无序调用变成有序调用均可)
  2. 问题的根本在于TetherInterfaceStateMachine对已经销毁的接口发送添加路由规则的请求,因此在发送前对接口是否被销毁进行判断,也可以规避该问题;
  3. 广播android.net.wifi.WIFI_AP_STATE_CHANGED未能及时送达,是因为接收者处于一个可能会繁忙的线程中,因此将其从该线程剥离,或者使用Handler.sendMessageAtFrontOfQueue()来确保其优先级,也是可以考虑的一个方向;
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 3
    评论
评论 3
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值