1. ANR patthern
ANR时候,从trace看主线程是在restoreDefaultApn,具体的操作是在delete,通过binder来实现,查看binderinfo发现是在phone进程里面执行,不过搜索对应线程trace发现,binder线程是空闲的,如Binder_2(trace是网上拷贝不是当时的),不清楚这是问啥,正常情况应该对应线程的trace也在delete()才是。
---pid 17348 at 2018-01-17 16:06:13
Cmd line: con.Android.settings
"main" tid=1 Native
at android.os.BinderProxy.transactNative(Native Method
at android.os.BinderProxy.transact(Binder.java)
at Android.content.ContentProviderProxy.delete ()ContentProviderNative.java
at com.android.settings.ResetNetworkConfirm.restoreDefaultApn(ResetNetworkConfirm.java:158)
---pid 17348 at 2018-01-17 16:06:15
Cmd line: con.Android.phone
"Binder_2" prio=5 tid=8 Native
| group="main" sCount=1 dsCount=0 obj=0x12f9e0a0 self=0x7f76957400
| sysTid=22049 nice=0 cgrp=default sched=0/0 handle=0x7f7b92f440
| state=S schedstat=( 3091875 6286929 20 ) utm=0 stm=0 core=5 HZ=100
| stack=0x7f7b833000-0x7f7b835000 stackSize=1013KB
| held mutexes=
native: pc 0000000000068c88 /system/lib64/libc.so (__ioctl+4)
native: pc 0000000000072670 /system/lib64/libc.so (ioctl+100)
native: pc 000000000002d3b4 /system/lib64/libbinder.so (android::IPCThreadState::talkWithDriver(bool)+164)
native: pc 000000000002dc88 /system/lib64/libbinder.so (android::IPCThreadState::getAndExecuteCommand()+24)
native: pc 000000000002dda4 /system/lib64/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+76)
native: pc 00000000000367e8 /system/lib64/libbinder.so (???)
native: pc 0000000000015954 /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+208)
native: pc 000000000009392c /system/lib64/libandroid_runtime.so (android::AndroidRuntime::javaThreadShell(void*)+96)
native: pc 00000000000151a4 /system/lib64/libutils.so (???)
native: pc 00000000000660c4 /system/lib64/libc.so (__pthread_start(void*)+52)
native: pc 000000000001ee84 /system/lib64/libc.so (__start_thread+16)
2.Code Flow
看代码,当时是在恢复设置,包含wifi bt ims apn等等,很多东西,打印Log会发现这个button执行完的时间大概是2s,TelephonyProvider.delete操作会去持有锁,当快速点击多次后面的就会等锁。这就很容易导致anr
@/packages/apps/Settings/src/com/android/settings/ResetNetworkConfirm.java
private Button.OnClickListener mFinalClickListener = new Button.OnClickListener() {
@Override
public void onClick(View v) {
if (Utils.isMonkeyRunning()) { //monkey return
return;
}
connectivityManager.factoryReset();
wifiManager.factoryReset();
telephonyManager.factoryReset(mSubId);
policyManager.factoryReset(subscriberId);
btAdapter.factoryReset();
ImsManager.factoryReset(context);
restoreDefaultApn(context);
}
};
184 private void restoreDefaultApn(Context context) {
185 Uri uri = Uri.parse(ApnSettings.RESTORE_CARRIERS_URI);
186
187 if (SubscriptionManager.isUsableSubIdValue(mSubId)) {
188 uri = Uri.withAppendedPath(uri, "subId/" + String.valueOf(mSubId));
189 }
190
191 ContentResolver resolver = context.getContentResolver();
192 resolver.delete(uri, null, null);
193 }
@/packages/providers/TelephonyProvider/src/com/android/providers/telephony/TelephonyProvider.java
@Override
public synchronized int delete(Uri url, String where, String[] whereArgs) { //synchronized 同步锁
...
switch (match)
{
...
case URL_RESTOREAPN: {
count = 1;
restoreDefaultAPN(subId);
getContext().getContentResolver().notifyChange(
Uri.withAppendedPath(CONTENT_URI, "restore/subId/" + subId), null,
true, UserHandle.USER_ALL);
break;
}
return count;
}
3.疑问解答
1.Q:为啥phone进程的binder显示是IDLE,而不是在delete?
A:dump anr trace时,phone进程dump晚了2s钟,此时delete已经完成,因此是空闲的。
4.解决方案
一般anr解决都是降低耗时操作,比如这里phone.delete耗时1s左右,是否可以优化---对比正常机器这个时间不算异常
耗时操作不放在主线程--把onclick里面的操作都放到其他线程---不过这里delete会有一个同步锁,就算开线程也不是很好
最终的解决是降低button的点击频率,比如必须执行完onclick内容才能再次点击
5.本博客的意思
不要犯一看trace就说这里有毛病的结论,比如这里不应该说settings的主线程blocked在ResetNetworkConfirm.restoreDefaultApn,就说这里有问题,然后看到binder对端是phone,就直接把此bug归为phone问题,然后丢过去,anr里面就有这种多次情况还有循环耗时的情况就,但是每次循环大概在0.5s,这种导致,还有就是前面的方法耗时4.8s,最后trace打印的LOG耗时才0.2s这种。