HarmonyOS 5.0 App Performance Optimization in Practice: Troubleshooting Application Freeze (appfreeze) Issues


While using an application, a user may tap and get no reaction, or find the app frozen; once the unresponsive time exceeds a defined limit, the fault is classified as application freeze (appfreeze). The system provides a mechanism that detects these cases and generates an appfreeze log for developers to analyze.

Note
This article applies to the Stage model only. Before analyzing logs with this guide, you should understand how JS code runs in the system, be able to read C++ stack traces, and have some knowledge of the subsystems your application depends on.

How appfreeze detection works

THREAD_BLOCK_6S: application main thread stuck

This fault indicates that the application's main thread is either stuck or overloaded with tasks, which degrades responsiveness and user experience.

Detection works as follows: the application's watchdog thread periodically posts a liveness probe to the main thread, together with a timeout report mechanism on the watchdog thread itself. If the probe has not executed within 3s, a THREAD_BLOCK_3S warning event is reported; if it still has not executed after 6s, a THREAD_BLOCK_6S main-thread-stuck event is reported. The two events are paired to produce the THREAD_BLOCK appfreeze log.
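As an illustration only (not the platform implementation), the probe-and-escalation idea can be sketched with a queue standing in for the main thread's event loop; all names and the scaled-down deadlines below are invented for the demo:

```python
import queue
import threading
import time

def watchdog(main_queue: queue.Queue, warn_after: float, block_after: float) -> str:
    """Post a liveness probe to the main thread's queue and watch for it to run.

    Returns "OK", "THREAD_BLOCK_3S" (warning) or "THREAD_BLOCK_6S" (block),
    mirroring the 3s/6s escalation described above (deadlines shortened here).
    """
    executed = threading.Event()
    main_queue.put(executed.set)          # the probe: a trivial task for the main thread
    if executed.wait(timeout=warn_after):
        return "OK"                       # probe ran in time: main thread is alive
    if executed.wait(timeout=block_after - warn_after):
        return "THREAD_BLOCK_3S"          # missed the first deadline only: warning
    return "THREAD_BLOCK_6S"              # probe still not executed: report freeze

def busy_main_thread(main_queue: queue.Queue, busy_for: float) -> None:
    """Simulate a main thread stuck in one long task before serving its queue."""
    time.sleep(busy_for)                  # the "stuck" task
    main_queue.get()()                    # finally run the queued probe

if __name__ == "__main__":
    q: queue.Queue = queue.Queue()
    t = threading.Thread(target=busy_main_thread, args=(q, 0.5))
    t.start()
    # scaled-down deadlines: 0.1s "warning", 0.2s "block"; the probe misses both here
    print(watchdog(q, warn_after=0.1, block_after=0.2))
    t.join()
```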

The detection principle is shown in the figure below:

APP_INPUT_BLOCK: user input response timeout

This fault means a user click event was not responded to within the time limit, which severely hurts the interactive experience.

Detection works as follows: when the user taps a button in the application, the input system sends the click event to the application side; if no response receipt comes back within the timeout, the fault is reported.

The detection principle is shown in the figure below:

LIFECYCLE_TIMEOUT: lifecycle switch timeout

Lifecycle switch timeout means a UIAbility lifecycle transition timed out.

This fault occurs during lifecycle transitions and affects Ability switching inside the application.

Detection works as follows: the framework hooks each lifecycle transition, posts a timeout task to the watchdog thread when the transition starts, and removes it when the transition completes. If the task is not removed within the fixed time, the fault is reported.

A lifecycle switch timeout is composed of two events, LIFECYCLE_HALF_TIMEOUT and LIFECYCLE_TIMEOUT. LIFECYCLE_HALF_TIMEOUT acts as the warning event for LIFECYCLE_TIMEOUT and captures binder and other information.

The lifecycle states that are monitored are similar to the UIAbility lifecycle that developers see, with minor differences:

Parsing an appfreeze log

Analyzing an appfreeze log requires understanding the log's structure and what each part means; see the log specification for details.

Appfreeze self-recovery

When any of the following faults occurs, the system kills the application so that it can recover.

Fault type: description

  • THREAD_BLOCK_6S: application main thread stuck timeout
  • APP_INPUT_BLOCK: user input response timeout
  • LIFECYCLE_TIMEOUT: Ability lifecycle switch timeout

Troubleshooting steps and approach

Obtaining the logs

Appfreeze logs are fault logs. Like native process crashes, JS application crashes, and system process exceptions, they are managed by the FaultLog module and can be obtained in the following ways:

Method 1: via DevEco Studio

DevEco Studio collects the device's fault logs and archives them under FaultLog. See FaultLog for details.

Method 2: via hiappevent

hiappevent provides APIs to subscribe to system freeze events and query their details; see "Subscribing to system events (freeze events)".

Method 3: via the shell

Appfreeze log files start with appfreeze- and are generated on the device under /data/log/faultlog/faultlogger/. The file name format is appfreeze-<bundle name>-<app UID>-<time in seconds>.
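Following this format, a file name's fields can be split apart and its seconds-level timestamp decoded. The bundle name and timestamp below are made up for illustration:

```python
import datetime
import re

def parse_appfreeze_name(filename: str) -> dict:
    """Split an appfreeze log file name into bundle name, UID and readable time.

    Format (from the text above): appfreeze-<bundle name>-<app UID>-<seconds>.
    """
    m = re.fullmatch(r"appfreeze-(?P<bundle>.+)-(?P<uid>\d+)-(?P<sec>\d+)", filename)
    if not m:
        raise ValueError(f"not an appfreeze log name: {filename}")
    return {
        "bundle": m.group("bundle"),
        "uid": int(m.group("uid")),
        # seconds-level timestamp -> human-readable UTC time
        "time": datetime.datetime.fromtimestamp(int(m.group("sec")),
                                                datetime.timezone.utc),
    }

if __name__ == "__main__":
    # hypothetical file name for illustration
    info = parse_appfreeze_name("appfreeze-com.example.demo-20010041-1723090663")
    print(info["bundle"], info["uid"], info["time"].isoformat())
```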

Checking the basic information

1. Process ID

Search the fault log for "Pid".

With the process ID you can find the process's stack traces and filter its output from the rolling logs.

2. Fault type

Search the fault log for "Reason".

The fault type tells you which detection principle and which detection window apply.

3. Fault report time

Search the fault log for "Fault time".

Note that the log contains several "TIMESTAMP" fields. They record important intermediate points during fault formation and are normally close to the report time, but if the system was blocked for some reason while the fault formed, they can be off by a certain amount.

The "Fault time" field is the one closest to the actual report time.

4. Foreground/background state

Search the fault log for "Foreground".

This field indicates whether the application was in the foreground or background when the fault occurred.

Note that for APP_INPUT_BLOCK (user input response timeout), the input event could only have been dispatched to the application if it was in the foreground.

5. Fault detection window

This can be inferred from the fault type and the foreground/background state.

For THREAD_BLOCK_6S, the window is 6s with the application in the foreground. In the background, where the application does not interact with the user directly, the main-thread check is relaxed and the window is 21s.

For LIFECYCLE_TIMEOUT, the reason part of MSG identifies which lifecycle transition timed out; look up the corresponding detection window in the lifecycle table.

6. Fault detection interval

This can be inferred from the report time and the detection window: counting back the detection window from the report time gives when the fault started.

The interval is [report time - detection window, report time].

Checking the eventHandler information

1. dump begin curTime & Current Running

mainHandler dump is:
EventHandler dump begin curTime: 2024-08-08 12:17:43.544         -> time when the dump began
Event runner (Thread name = , Thread ID = 35854) is running
Current Running: start at 2024-08-08 12:17:16.629, Event { send thread = 35882, send time = 2024-08-08 12:17:16.628, handle time = 2024-08-08 12:17:16.629, trigger time = 2024-08-08 12:17:16.630, task name = , caller = xxx  }
-> trigger time: when the task started running

Elapsed running time of the current task = dump begin curTime - trigger time. In this example the current task has been running for 27s.

If this difference exceeds the fault detection window, the task currently running is the one that froze the application; investigate how that task ran.

If the difference is small, the current task is merely one of the tasks the main thread executed during the detection interval; the main cost is not necessarily here, so look for the longest-running tasks among those recently executed. This case usually means the thread was too busy for the watchdog task to be scheduled.
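As a convenience, the arithmetic above can be scripted. The snippet below (an illustrative sketch, not a platform tool) pulls curTime and the running task's trigger time out of a dump fragment, using the timestamps from the sample above:

```python
import datetime
import re

FMT = "%Y-%m-%d %H:%M:%S.%f"
TS = r"\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\.\d+"

def running_task_age(dump: str) -> float:
    """Return how long the Current Running task has been running, in seconds."""
    cur = re.search(rf"dump begin curTime: ({TS})", dump).group(1)
    trig = re.search(rf"Current Running:.*?trigger time = ({TS})", dump).group(1)
    t0 = datetime.datetime.strptime(trig, FMT)
    t1 = datetime.datetime.strptime(cur, FMT)
    return (t1 - t0).total_seconds()

if __name__ == "__main__":
    dump = (
        "EventHandler dump begin curTime: 2024-08-08 12:17:43.544\n"
        "Current Running: start at 2024-08-08 12:17:16.629, Event { "
        "trigger time = 2024-08-08 12:17:16.630, task name =  }\n"
    )
    age = running_task_age(dump)
    print(f"current task has run for {age:.3f}s")
    # compare with the detection window (6s in the foreground)
    print("stuck task" if age > 6.0 else "busy thread, check recent tasks")
```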

2. History event queue information

Current Running: start at 2024-08-08 12:17:16.629, Event { send thread = 35882, send time = 2024-08-08 12:17:16.628, handle time = 2024-08-08 12:17:16.629, trigger time = 2024-08-08 12:17:16.630, task name = , caller = [extension_ability_thread.cpp(ScheduleAbilityTransaction:393)] }
History event queue information:
No. 0 : Event { send thread = 35854, send time = 2024-08-08 12:17:15.525, handle time = 2024-08-08 12:17:15.525, trigger time = 2024-08-08 12:17:15.527, completeTime time = 2024-08-08 12:17:15.528, priority = High, id = 1 }
No. 1 : Event { send thread = 35854, send time = 2024-08-08 12:17:15.525, handle time = 2024-08-08 12:17:15.525, trigger time = 2024-08-08 12:17:15.527, completeTime time = 2024-08-08 12:17:15.527, priority = Low, task name = MainThread:SetRunnerStarted }
No. 2 : Event { send thread = 35856, send time = 2024-08-08 12:17:15.765, handle time = 2024-08-08 12:17:15.765, trigger time = 2024-08-08 12:17:15.766, completeTime time = 2024-08-08 12:17:15.800, priority = Low, task name = MainThread:LaunchApplication }
No. 3 : Event { send thread = 35856, send time = 2024-08-08 12:17:15.767, handle time = 2024-08-08 12:17:15.767, trigger time = 2024-08-08 12:17:15.800, completeTime time = 2024-08-08 12:17:16.629, priority = Low, task name = MainThread:LaunchAbility }
No. 4 : Event { send thread = 35854, send time = 2024-08-08 12:17:15.794, handle time = 2024-08-08 12:17:15.794, trigger time = 2024-08-08 12:17:16.629, completeTime time = 2024-08-08 12:17:16.629, priority = IDEL, task name = IdleTime:PostTask }
No. 5 : Event { send thread = 35882, send time = 2024-08-08 12:17:16.629, handle time = 2024-08-08 12:17:16.629, trigger time = 2024-08-08 12:17:16.629, completeTime time = , priority = Low, task name =  }

Look through the history queue for time-consuming tasks within the fault detection interval. The entry whose completeTime time is empty is the currently running task.

Task running time = completeTime time - trigger time

Pick out the tasks with the highest cost and investigate how they ran.
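Applying the completeTime - trigger formula over the whole history queue quickly surfaces the expensive entries. This sketch parses lines shaped like the sample above (an entry with an empty completeTime, i.e. the running task, is skipped):

```python
import datetime
import re

FMT = "%Y-%m-%d %H:%M:%S.%f"
TS = r"\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\.\d+"

def task_costs(history: str) -> list:
    """Return (task name, seconds) per completed history entry, slowest first."""
    costs = []
    for line in history.splitlines():
        trig = re.search(rf"trigger time = ({TS})", line)
        comp = re.search(rf"completeTime time = ({TS})", line)
        if not (trig and comp):
            continue  # empty completeTime (still running) or malformed line
        name = re.search(r"task name = ([^,}]*)", line)
        dur = (datetime.datetime.strptime(comp.group(1), FMT)
               - datetime.datetime.strptime(trig.group(1), FMT)).total_seconds()
        costs.append(((name.group(1).strip() if name else ""), dur))
    return sorted(costs, key=lambda c: -c[1])

if __name__ == "__main__":
    history = (
        "No. 2 : Event { trigger time = 2024-08-08 12:17:15.766, "
        "completeTime time = 2024-08-08 12:17:15.800, task name = MainThread:LaunchApplication }\n"
        "No. 3 : Event { trigger time = 2024-08-08 12:17:15.800, "
        "completeTime time = 2024-08-08 12:17:16.629, task name = MainThread:LaunchAbility }\n"
        "No. 5 : Event { trigger time = 2024-08-08 12:17:16.629, "
        "completeTime time = , task name =  }\n"
    )
    for name, dur in task_costs(history):
        print(f"{dur:.3f}s  {name}")
```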

3. VIP priority event queue information

VIP priority event queue information:
No.1 : Event { send thread = 3205, send time = 2024-08-07 04:11:15.407, handle time = 2024-08-07 04:11:15.407, task name = ArkUIWindowInjectPointerEvent, caller = [task_runner_adapter_impl.cpp(PostTask:33)] }
No.2 : Event { send thread = 3205, send time = 2024-08-07 04:11:15.407, handle time = 2024-08-07 04:11:15.407, task name = ArkUIWindowInjectPointerEvent, caller = [task_runner_adapter_impl.cpp(PostTask:33)] }
No.3 : Event { send thread = 3205, send time = 2024-08-07 04:11:15.407, handle time = 2024-08-07 04:11:15.407, task name = ArkUIWindowInjectPointerEvent, caller = [task_runner_adapter_impl.cpp(PostTask:33)] }
No.4 : Event { send thread = 3961, send time = 2024-08-07 04:11:15.408, handle time = 2024-08-07 04:11:15.408, task name = MMI::OnPointerEvent, caller = [input_manager_impl.cpp(OnPointerEvent:493)] }
No.5 : Event { send thread = 3205, send time = 2024-08-07 04:11:15.408, handle time = 2024-08-07 04:11:15.408, task name = ArkUIWindowInjectPointerEvent, caller = [task_runner_adapter_impl.cpp(PostTask:33)] }
No.6 : Event { send thread = 3205, send time = 2024-08-07 04:11:15.409, handle time = 2024-08-07 04:11:15.409, task name = ArkUIWindowInjectPointerEvent, caller = [task_runner_adapter_impl.cpp(PostTask:33)] }
No.7 : Event { send thread = 3205, send time = 2024-08-07 04:11:15.409, handle time = 2024-08-07 04:11:15.409, task name = ArkUIWindowInjectPointerEvent, caller = [task_runner_adapter_impl.cpp(PostTask:33)] }
No.8 : Event { send thread = 3205, send time = 2024-08-07 04:11:15.409, handle time = 2024-08-07 04:11:15.409, task name = ArkUIWindowInjectPointerEvent, caller = [task_runner_adapter_impl.cpp(PostTask:33)] }
No.9 : Event { send thread = 3205, send time = 2024-08-07 04:11:15.410, handle time = 2024-08-07 04:11:15.410, task name = ArkUIWindowInjectPointerEvent, caller = [task_runner_adapter_impl.cpp(PostTask:33)] }
No.10 : Event { send thread = 3205, send time = 2024-08-07 04:11:15.410, handle time = 2024-08-07 04:11:15.410, task name = ArkUIWindowInjectPointerEvent, caller = [task_runner_adapter_impl.cpp(PostTask:33)] }
No.11 : Event { send thread = 3205, send time = 2024-08-07 04:11:15.411, handle time = 2024-08-07 04:11:15.411, task name = ArkUIWindowInjectPointerEvent, caller = [task_runner_adapter_impl.cpp(PostTask:33)] }
No.12 : Event { send thread = 3205, send time = 2024-08-07 04:11:15.412, handle time = 2024-08-07 04:11:15.412, task name = ArkUIWindowInjectPointerEvent, caller = [task_runner_adapter_impl.cpp(PostTask:33)] }
No.13 : Event { send thread = 3205, send time = 2024-08-07 04:11:15.412, handle time = 2024-08-07 04:11:15.412, task name = ArkUIWindowInjectPointerEvent, caller = [task_runner_adapter_impl.cpp(PostTask:33)] }
No.14 : Event { send thread = 3205, send time = 2024-08-07 04:11:15.413, handle time = 2024-08-07 04:11:15.413, task name = ArkUIWindowInjectPointerEvent, caller = [task_runner_adapter_impl.cpp(PostTask:33)] }
No.15 : Event { send thread = 3205, send time = 2024-08-07 04:11:15.414, handle time = 2024-08-07 04:11:15.414, task name = ArkUIWindowInjectPointerEvent, caller = [task_runner_adapter_impl.cpp(PostTask:33)] }
No.16 : Event { send thread = 3205, send time = 2024-08-07 04:11:15.414, handle time = 2024-08-07 04:11:15.414, task name = ArkUIWindowInjectPointerEvent, caller = [task_runner_adapter_impl.cpp(PostTask:33)] }
No.17 : Event { send thread = 3205, send time = 2024-08-07 04:11:15.414, handle time = 2024-08-07 04:11:15.414, task name = ArkUIWindowInjectPointerEvent, caller = [task_runner_adapter_impl.cpp(PostTask:33)] }
No.18 : Event { send thread = 3205, send time = 2024-08-07 04:11:15.415, handle time = 2024-08-07 04:11:15.415, task name = ArkUIWindowInjectPointerEvent, caller = [task_runner_adapter_impl.cpp(PostTask:33)] }
No.19 : Event { send thread = 3205, send time = 2024-08-07 04:11:15.416, handle time = 2024-08-07 04:11:15.416, task name = ArkUIWindowInjectPointerEvent, caller = [task_runner_adapter_impl.cpp(PostTask:33)] }
No.20 : Event { send thread = 3961, send time = 2024-08-07 04:11:15.417, handle time = 2024-08-07 04:11:15.417, task name = MMI::OnPointerEvent, caller = [input_manager_impl.cpp(OnPointerEvent:493)] }
No.21 : Event { send thread = 3205, send time = 2024-08-07 04:11:15.417, handle time = 2024-08-07 04:11:15.417, task name = ArkUIWindowInjectPointerEvent, caller = [task_runner_adapter_impl.cpp(PostTask:33)] }
...

All tasks on the user-input delivery chain run at high priority so that user input is serviced first.

4. High priority event queue information

High priority event queue information:
No.1 : Event { send thread = 35862, send time = 2024-08-08 12:17:25.526, handle time = 2024-08-08 12:17:25.526, id = 1, caller = [watchdog.cpp(Timer:156)] }
No.2 : Event { send thread = 35862, send time = 2024-08-08 12:17:28.526, handle time = 2024-08-08 12:17:28.526, id = 1, caller = [watchdog.cpp(Timer:156)] }
No.3 : Event { send thread = 35862, send time = 2024-08-08 12:17:31.526, handle time = 2024-08-08 12:17:31.526, id = 1, caller = [watchdog.cpp(Timer:156)] }
No.4 : Event { send thread = 35862, send time = 2024-08-08 12:17:34.530, handle time = 2024-08-08 12:17:34.530, id = 1, caller = [watchdog.cpp(Timer:156)] }
No.5 : Event { send thread = 35862, send time = 2024-08-08 12:17:37.526, handle time = 2024-08-08 12:17:37.526, id = 1, caller = [watchdog.cpp(Timer:156)] }
No.6 : Event { send thread = 35862, send time = 2024-08-08 12:17:40.526, handle time = 2024-08-08 12:17:40.526, id = 1, caller = [watchdog.cpp(Timer:156)] }
No.7 : Event { send thread = 35862, send time = 2024-08-08 12:17:43.544, handle time = 2024-08-08 12:17:43.544, id = 1, caller = [watchdog.cpp(Timer:156)] }
Total size of High events : 7

The watchdog task sits in this priority queue; you can see it is posted every 3s.

Compare the warning and block events to observe how the watchdog task moves through the queue.

warning:

High priority event queue information:
No.1 : Event { send thread = 35862, send time = 2024-08-08 12:17:25.526, handle time = 2024-08-08 12:17:25.526, id = 1, caller = [watchdog.cpp(Timer:156)] }
No.2 : Event { send thread = 35862, send time = 2024-08-08 12:17:28.526, handle time = 2024-08-08 12:17:28.526, id = 1, caller = [watchdog.cpp(Timer:156)] }
No.3 : Event { send thread = 35862, send time = 2024-08-08 12:17:31.526, handle time = 2024-08-08 12:17:31.526, id = 1, caller = [watchdog.cpp(Timer:156)] }
No.4 : Event { send thread = 35862, send time = 2024-08-08 12:17:34.530, handle time = 2024-08-08 12:17:34.530, id = 1, caller = [watchdog.cpp(Timer:156)] }
Total size of High events : 4

block:

High priority event queue information:
No.1 : Event { send thread = 35862, send time = 2024-08-08 12:17:25.526, handle time = 2024-08-08 12:17:25.526, id = 1, caller = [watchdog.cpp(Timer:156)] }
No.2 : Event { send thread = 35862, send time = 2024-08-08 12:17:28.526, handle time = 2024-08-08 12:17:28.526, id = 1, caller = [watchdog.cpp(Timer:156)] }
No.3 : Event { send thread = 35862, send time = 2024-08-08 12:17:31.526, handle time = 2024-08-08 12:17:31.526, id = 1, caller = [watchdog.cpp(Timer:156)] }
No.4 : Event { send thread = 35862, send time = 2024-08-08 12:17:34.530, handle time = 2024-08-08 12:17:34.530, id = 1, caller = [watchdog.cpp(Timer:156)] }
No.5 : Event { send thread = 35862, send time = 2024-08-08 12:17:37.526, handle time = 2024-08-08 12:17:37.526, id = 1, caller = [watchdog.cpp(Timer:156)] }
Total size of High events : 5

In this example the block queue is longer than the warning queue while the first entry is unchanged. Two situations can produce this:

  • The currently running task is stuck, so the other tasks are never scheduled.

  • Tasks are piling up in a higher-priority queue, so the watchdog task in this lower-priority queue is never scheduled.
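The comparison of the two snapshots above can also be done mechanically: if the head entry's send time is unchanged while the queue grew, watchdog probes are accumulating unexecuted, which is the stuck/starved signature just described. A small sketch (sample lines shortened from the dumps above):

```python
import re

def queue_head_and_size(snapshot: str):
    """Return (send time of the first queued event, queue length) from a dump."""
    head = re.search(r"No\.1 : Event \{ send thread = \d+, send time = ([^,]+),",
                     snapshot)
    size = re.search(r"Total size of High events : (\d+)", snapshot)
    return head.group(1).strip(), int(size.group(1))

def watchdog_starved(warning_snap: str, block_snap: str) -> bool:
    """True when the queue grew but its head never ran between the two events."""
    w_head, w_size = queue_head_and_size(warning_snap)
    b_head, b_size = queue_head_and_size(block_snap)
    return w_head == b_head and b_size > w_size

if __name__ == "__main__":
    warning = (
        "No.1 : Event { send thread = 35862, send time = 2024-08-08 12:17:25.526, id = 1 }\n"
        "Total size of High events : 4\n"
    )
    block = (
        "No.1 : Event { send thread = 35862, send time = 2024-08-08 12:17:25.526, id = 1 }\n"
        "Total size of High events : 5\n"
    )
    print(watchdog_starved(warning, block))  # queue grew, head unchanged
```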

Checking the stack information

All appfreeze fault types are detected primarily against the application main thread.

The main thread stack usually falls into one of the following cases:

1. warning/block stacks identical: stuck on a lock

Tid:3025, Name: xxx
# 00 pc 00000000001b4094 /system/lib/ld-musl-aarch64.so.1(__timedwait_cp+188)(b168f10a179cf6050a309242262e6a17)
# 01 pc 00000000001b9fc8 /system/lib/ld-musl-aarch64.so.1(__pthread_mutex_timedlock_inner+592)(b168f10a179cf6050a309242262e6a17)
# 02 pc 00000000000c3e40 /system/lib64/libc++.so(std::__h::mutex::lock()+8)(9cbc937082b3d7412696099dd58f4f78242f9512) --> stuck waiting for a lock
# 03 pc 000000000007ac4c /system/lib64/platformsdk/libnative_rdb.z.so(OHOS::NativeRdb::SqliteConnectionPool::Container::Release(std::__h::shared_ptr<OHOS::NativeRdb::SqliteConnectionPool::ConnNode>)+60)(5e8443def4695e8c791e5f847035ad9f)
# 04 pc 000000000007aaf4 /system/lib64/platformsdk/libnative_rdb.z.so(OHOS::NativeRdb::SqliteConnectionPool::ReleaseNode(std::__h::shared_ptr<OHOS::NativeRdb::SqliteConnectionPool::ConnNode>)+276)(5e8443def4695e8c791e5f847035ad9f)
# 05 pc 000000000007a8c0 /system/lib64/platformsdk/libnative_rdb.z.so(5e8443def4695e8c791e5f847035ad9f)
# 06 pc 00000000000b36ec /system/lib64/platformsdk/libnative_rdb.z.so(OHOS::NativeRdb::SqliteSharedResultSet::Close()+324)(5e8443def4695e8c791e5f847035ad9f)
# 07 pc 000000000006da94 /system/lib64/module/data/librelationalstore.z.so(OHOS::RelationalStoreJsKit::ResultSetProxy::Close(napi_env__*, napi_callback_info__*) (.cfi)+212)(5c7c67512e12e0e53fd23e82ee576a88)
# 08 pc 0000000000034408 /system/lib64/platformsdk/libace_napi.z.so(panda::JSValueRef ArkNativeFunctionCallBack<true>(panda::JsiRuntimeCallInfo*)+220)(f271f536a588ef9d0dc5328c70fce511)
# 09 pc 00000000002d71d0 /system/lib64/module/arkcompiler/stub.an(RTStub_PushCallArgsAndDispatchNative+40)
# 10 at parseResultSet (entry/build/default/cache/default/default@CompileArkTS/esmodule/release/datamanager/datawrapper/src/main/ets/database/RdbManager.ts:266:1)
# 11 at query (entry/build/default/cache/default/default@CompileArkTS/esmodule/release/datamanager/datawrapper/src/main/ets/database/RdbManager.ts:188:1)

The stack shows the thread stuck waiting for a lock. Decompile to find the corresponding source line, then examine the other threads' stacks and how the lock is used in the surrounding code to resolve the fault.

2. warning/block stacks identical: stuck in an IPC request

Tid:53616, Name:xxx
# 00 pc 0000000000171c1c /system/lib/ld-musl-aarch64.so.1(ioctl+176)(b168f10a179cf6050a309242262e6a17)
# 01 pc 0000000000006508 /system/lib64/chipset-pub-sdk/libipc_common.z.so(OHOS::BinderConnector::WriteBinder(unsigned long, void*)+100)(1edec25445c569dd1093635c1da3bc0a) --> stuck in binder
# 02 pc 000000000004d500 /system/lib64/platformsdk/libipc_core.z.so(OHOS::BinderInvoker::TransactWithDriver(bool)+296)(6151eca3b47aa2ab3e378e6e558b90f3)
# 03 pc 000000000004c6c0 /system/lib64/platformsdk/libipc_core.z.so(OHOS::BinderInvoker::WaitForCompletion(OHOS::MessageParcel*, int*)+128)(6151eca3b47aa2ab3e378e6e558b90f3)
# 04 pc 000000000004c304 /system/lib64/platformsdk/libipc_core.z.so(OHOS::BinderInvoker::SendRequest(int, unsigned int, OHOS::MessageParcel&, OHOS::MessageParcel&, OHOS::MessageOption&)+348)(6151eca3b47aa2ab3e378e6e558b90f3)
# 05 pc 00000000000319ac /system/lib64/platformsdk/libipc_core.z.so(OHOS::IPCObjectProxy::SendRequestInner(bool, unsigned int, OHOS::MessageParcel&, OHOS::MessageParcel&, OHOS::MessageOption&)+124)(6151eca3b47aa2ab3e378e6e558b90f3)
# 06 pc 0000000000031cfc /system/lib64/platformsdk/libipc_core.z.so(OHOS::IPCObjectProxy::SendRequest(unsigned int, OHOS::MessageParcel&, OHOS::MessageParcel&, OHOS::MessageOption&)+184)(6151eca3b47aa2ab3e378e6e558b90f3)
# 07 pc 0000000000023c7c /system/lib64/libipc.dylib.so(<ipc::remote::obj::RemoteObj>::send_request+268)(7006cb5520edc22f64d04df86cb90152)
# 08 pc 000000000000b904 /system/lib64/libasset_sdk.dylib.so(<asset_sdk::Manager>::send_request+48)(4073ec22b58b83f79883d5fc8102ce77)
# 09 pc 000000000000b600 /system/lib64/libasset_sdk.dylib.so(<asset_sdk::Manager>::query+156)(4073ec22b58b83f79883d5fc8102ce77)
# 10 pc 0000000000006d94 /system/lib64/libasset_sdk_ffi.z.so(query_asset+116)(9a309896092ba014c878289a54688679)
# 11 pc 0000000000006740 /system/lib64/module/security/libasset_napi.z.so((anonymous namespace)::NapiQuerySync(napi_env__*, napi_callback_info__*) (.cfi)+220)(ef7afe850712e4822f085ed0ac184e8a)
# 12 pc 0000000000034408 /system/lib64/platformsdk/libace_napi.z.so(panda::JSValueRef ArkNativeFunctionCallBack<true>(panda::JsiRuntimeCallInfo*)+220)(f271f536a588ef9d0dc5328c70fce511)

From the business frames below the IPC frames, identify which interface the application used for the IPC request and which process is on the other end. Combine this with the binder call chain to determine why the peer is blocked and not returning.

3. warning/block stacks identical: stuck in a business frame

Both the warning and block stacks are:

Tid:14727, Name:xxx
# 00 pc 00000000001c4c60 /system/lib/ld-musl-aarch64.so.1(pread+72)(b168f10a179cf6050a309242262e6a17)
# 01 pc 0000000000049154 /system/lib64/platformsdk/libsqlite.z.so(unixRead+180)(48485aa23da681fc87d8dc0b4be3e34c)
# 02 pc 0000000000053e98 /system/lib64/platformsdk/libsqlite.z.so(readDbPage+116)(48485aa23da681fc87d8dc0b4be3e34c)
# 03 pc 0000000000053d48 /system/lib64/platformsdk/libsqlite.z.so(getPageNormal+864)(48485aa23da681fc87d8dc0b4be3e34c)
# 04 pc 00000000000757a0 /system/lib64/platformsdk/libsqlite.z.so(getAndInitPage+216)(48485aa23da681fc87d8dc0b4be3e34c)
# 05 pc 0000000000077658 /system/lib64/platformsdk/libsqlite.z.so(moveToLeftmost+164)(48485aa23da681fc87d8dc0b4be3e34c)
# 06 pc 000000000006aa34 /system/lib64/platformsdk/libsqlite.z.so(sqlite3VdbeExec+34532)(48485aa23da681fc87d8dc0b4be3e34c)
# 07 pc 000000000002e424 /system/lib64/platformsdk/libsqlite.z.so(sqlite3_step+644)(48485aa23da681fc87d8dc0b4be3e34c)
# 08 pc 00000000000b1a70 /system/lib64/platformsdk/libnative_rdb.z.so(FillSharedBlockOpt+408)(5e8443def4695e8c791e5f847035ad9f)
# 09 pc 0000000000082a94 /system/lib64/platformsdk/libnative_rdb.z.so(OHOS::NativeRdb::SqliteStatement::FillBlockInfo(OHOS::NativeRdb::SharedBlockInfo*) const+76)(5e8443def4695e8c791e5f847035ad9f)
# 10 pc 00000000000b4214 /system/lib64/platformsdk/libnative_rdb.z.so(OHOS::NativeRdb::SqliteSharedResultSet::ExecuteForSharedBlock(OHOS::AppDataFwk::SharedBlock*, int, int, bool)+236)(5e8443def4695e8c791e5f847035ad9f)

Confirm with the trace whether the internal logic of a single function called by the application overran its time budget, for example heavy computation inside one function.

4. Transient stacks: warning and block stacks differ

This means the two stacks were captured while the thread was running; the process is not stuck but the thread is busy.

warning stack:

Tid:2648, Name:ohos.sceneboard
# 00 pc 00000000001bd7e4 /system/lib/ld-musl-aarch64.so.1(__pthread_mutex_trylock+36)(aded9a3bf39f018cb492cc3b0ba36667)
# 01 pc 0000000000609d7c /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::Mutex::TryLock()+8)(9d0bbd23f13dc63b84dc8d6c98d8ea54)
# 02 pc 00000000005a10d4 /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::RuntimeLockHolder::RuntimeLockHolder(panda::ecmascript::JSThread*, panda::ecmascript::Mutex&)+32)(9d0bbd23f13dc63b84dc8d6c98d8ea54)
# 03 pc 000000000030abf8 /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::EcmaStringTable::GetOrInternString(panda::ecmascript::EcmaVM*, unsigned char const*, unsigned int, bool)+136)(9d0bbd23f13dc63b84dc8d6c98d8ea54)
# 04 pc 00000000005249d0 /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::ObjectFactory::NewFromASCII(std::__h::basic_string_view<char, std::__h::char_traits<char>>)+44)(9d0bbd23f13dc63b84dc8d6c98d8ea54)
# 05 pc 0000000000184768 /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::base::NumberHelper::NumberToString(panda::ecmascript::JSThread const*, panda::ecmascript::JSTaggedValue)+412)(9d0bbd23f13dc63b84dc8d6c98d8ea54)
# 06 pc 0000000000166c28 /system/lib64/platformsdk/libark_jsruntime.so(9d0bbd23f13dc63b84dc8d6c98d8ea54)
# 07 pc 0000000000166ba4 /system/lib64/platformsdk/libark_jsruntime.so(9d0bbd23f13dc63b84dc8d6c98d8ea54)
# 08 pc 00000000001d8334 /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::builtins::BuiltinsArray::Flat(panda::ecmascript::EcmaRuntimeCallInfo*)+416)(9d0bbd23f13dc63b84dc8d6c98d8ea54)
# 09 pc 0000000000339790 /system/lib64/module/arkcompiler/stub.an(RTStub_PushCallArgsAndDispatchNative+40)
# 10 at getPageIndex (product/phone/build/default/cache/default/default@CompileArkTS/esmodule/release/staticcommon/launchercommon/src/main/ets/folder/model/FolderData.ts:463:463)
# 11 at anonymous (product/phone/build/default/cache/default/default@CompileArkTS/esmodule/release/feature/desktop/bigfolder/src/main/ets/default/view/OpenFolderSwiperPage.ts:761:761)
# 12 at anonymous (product/phone/build/default/cache/default/default@CompileArkTS/esmodule/release/feature/desktop/bigfolder/src/main/ets/default/view/OpenFolderSwiperPage.ts:749:749)
# 13 pc 00000000003291ac /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::InterpreterAssembly::Execute(panda::ecmascript::EcmaRuntimeCallInfo*)+280)(9d0bbd23f13dc63b84dc8d6c98d8ea54)

block stack:

Tid:2648, Name:ohos.sceneboard
# 00 pc 000000000001d52c /system/lib64/module/arkcompiler/stub.an(BCStub_HandleLdobjbynameImm8Id16+2392)
# 01 at anonymous (product/phone/build/default/cache/default/default@CompileArkTS/esmodule/release/staticcommon/launchercommon/src/main/ets/folder/model/FolderData.ts:464:464)
# 02 at getPageIndex (product/phone/build/default/cache/default/default@CompileArkTS/esmodule/release/staticcommon/launchercommon/src/main/ets/folder/model/FolderData.ts:463:463)
# 03 at anonymous (product/phone/build/default/cache/default/default@CompileArkTS/esmodule/release/feature/desktop/bigfolder/src/main/ets/default/view/OpenFolderSwiperPage.ts:761:761)
# 04 at anonymous (product/phone/build/default/cache/default/default@CompileArkTS/esmodule/release/feature/desktop/bigfolder/src/main/ets/default/view/OpenFolderSwiperPage.ts:749:749)
# 05 pc 00000000003291ac /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::InterpreterAssembly::Execute(panda::ecmascript::EcmaRuntimeCallInfo*)+280)(9d0bbd23f13dc63b84dc8d6c98d8ea54)
# 06 pc 00000000001d34dc /system/lib64/platformsdk/libark_jsruntime.so(panda::ecmascript::builtins::BuiltinsArray::ForEach(panda::ecmascript::EcmaRuntimeCallInfo*)+680)(9d0bbd23f13dc63b84dc8d6c98d8ea54)
# 07 pc 00000000003393d4 /system/lib64/module/arkcompiler/stub.an(RTStub_AsmInterpreterEntry+208)

The warning/block stacks may still look similar; use the trace and hilog to determine what the application was actually doing, then optimize for that scenario.

5. eventhandler idle stack

Tid:1778, Name:sapp.appgallery
# 00 pc 0000000000154c54 /system/lib/ld-musl-aarch64.so.1(epoll_wait+80)(aded9a3bf39f018cb492cc3b0ba36667)
# 01 pc 00000000000181e0 /system/lib64/chipset-pub-sdk/libeventhandler.z.so(OHOS::AppExecFwk::EpollIoWaiter::WaitFor(std::__h::unique_lock<std::__h::mutex>&, long)+228)(ccb2dd405e62e3312e9e49a76e8ba04b)
# 02 pc 00000000000201b0 /system/lib64/chipset-pub-sdk/libeventhandler.z.so(OHOS::AppExecFwk::EventQueue::WaitUntilLocked(std::__h::chrono::time_point<std::__h::chrono::steady_clock, std::__h::chrono::duration<long long, std::__h::ratio<1l, 1000000000l>>> const&, std::__h::unique_lock<std::__h::mutex>&)+140)(ccb2dd405e62e3312e9e49a76e8ba04b)
# 03 pc 0000000000021b8c /system/lib64/chipset-pub-sdk/libeventhandler.z.so(OHOS::AppExecFwk::EventQueueBase::GetEvent()+232)(ccb2dd405e62e3312e9e49a76e8ba04b)
# 04 pc 000000000002b638 /system/lib64/chipset-pub-sdk/libeventhandler.z.so(OHOS::AppExecFwk::(anonymous namespace)::EventRunnerImpl::Run()+900)(ccb2dd405e62e3312e9e49a76e8ba04b)
# 05 pc 000000000002e9c4 /system/lib64/chipset-pub-sdk/libeventhandler.z.so(OHOS::AppExecFwk::EventRunner::Run()+524)(ccb2dd405e62e3312e9e49a76e8ba04b)
# 06 pc 00000000000b4200 /system/lib64/platformsdk/libappkit_native.z.so(OHOS::AppExecFwk::MainThread::Start()+408)(40f96e5ff3eda8dfba747fd3d21ecd70)
# 07 pc 0000000000004e30 /system/lib64/appspawn/appspawn/libappspawn_ace.z.so(RunChildProcessor(AppSpawnContent*, AppSpawnClient*)+568)(4ed3e05e4fafe7623cd5942d69fc13c8)
# 08 pc 000000000000af04 /system/bin/appspawn(AppSpawnChild+576)(14694678248bb218ea81845fd797df79)
# 09 pc 000000000000ab98 /system/bin/appspawn(AppSpawnProcessMsg+708)(14694678248bb218ea81845fd797df79)
# 10 pc 0000000000012f30 /system/bin/appspawn(ProcessSpawnReqMsg+228)(14694678248bb218ea81845fd797df79)
# 11 pc 0000000000011cac /system/bin/appspawn(OnReceiveRequest+132)(14694678248bb218ea81845fd797df79)
# 12 pc 00000000000160ac /system/lib64/chipset-pub-sdk/libbegetutil.z.so(HandleRecvMsg_+344)(db681f1dae48804a106924985d1ee750)

This stack shows the thread's eventhandler waiting for tasks to be submitted, i.e. the thread is not busy at this moment.

Checking the binder information

When the binder information is captured: fault types that have a half-period check capture it after the warning event; the others capture it after the block event.

1. Getting the binder call chain

PeerBinderCatcher -- pid==35854 layer_ == 1
BinderCatcher --
35877:35931 to 1386:0 code 5f475249 wait:27.197975517 s frz_state:3
35867:35979 to 52462:52509 code 3 wait:25.914378121 s frz_state:3
35863:35883 to 1386:0 code 5f475249 wait:27.191127080 s frz_state:3
35854:35933 to 1386:0 code 5f475249 wait:27.183442705 s frz_state:3
35854:35854 to 52462:52462 code 3 wait:27.185154163 s frz_state:3                       -> 35854:35854 to 52462:52462 code 3 wait:27.185154163 s
35844:35947 to 1386:0 code 5f475249 wait:24.287536977 s frz_state:3
35551:35683 to 1386:0 code 5f475249 wait:32.275078121 s frz_state:3
35565:35615 to 1386:0 code 5f475249 wait:37.2062495 s frz_state:3
35563:35574 to 1386:0 code 5f475249 wait:36.982649996 s frz_state:3
35568:35571 to 1386:0 code 5f475249 wait:28.442972913 s frz_state:3
async    618:2274 to 35568:0 code 1 wait:28.453510413 s frz_state:3
async    618:2274 to 35568:0 code 1 wait:28.453321871 s frz_state:3
35419:35419 to 0:0 code 403 wait:47.246904160 s frz_state:-1
52462:33643 to 1386:0 code 13 wait:102.405493214 s frz_state:3
52462:52509 to 1386:0 code 13 wait:22.285699476 s frz_state:3
52462:52506 to 1386:0 code 13 wait:135.369138522 s frz_state:3
52462:52462 to 1386:0 code 13 wait:24.733640622 s frz_state:3                           -> 52462:52462 to 1386:0 code 13 wait:24.733640622 s
16786:34270 to 1386:0 code 13 wait:107.404109880 s frz_state:3
async    618:35370 to 1386:0 code 1 wait:36.774835412 s frz_state:3
async    618:35370 to 1386:0 code 1 wait:36.774851037 s frz_state:3
async    618:35719 to 1386:0 code 1 wait:28.464953642 s frz_state:3
async    618:35719 to 1386:0 code 1 wait:28.464939059 s frz_state:3
1341:4565 to 1386:3223 code 14 wait:180.18540598 s frz_state:3
1341:2215 to 1386:0 code 110 wait:100.322648423 s frz_state:3
async    824:844 to 1341:0 code 1 wait:180.30325494 s frz_state:3
async    824:844 to 1341:0 code 1 wait:100.958674465 s frz_state:3
1334:1911 to 1386:0 code 13 wait:31.153617184 s frz_state:3
975:37303 to 1386:0 code 13 wait:107.408228630 s frz_state:3
780:969 to 1386:0 code 2 wait:37.373139578 s frz_state:3

Taking the sample above as reference: starting from the faulting process's main thread, there is a call chain 35854:35854 -> 52462:52462 -> 1386:0.

Examine the peer process's stack to find out why it is blocked.
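For illustration, the `pid:tid to pid:tid` lines can be walked programmatically to recover such a chain; this little parser only understands the synchronous lines in the sample above (async lines are skipped):

```python
import re

def binder_chain(binder_dump: str, start: str) -> list:
    """Follow 'pid:tid to pid:tid' edges from a starting thread, e.g. '35854:35854'.

    Returns the chain of endpoints, stopping when the peer has no outgoing call.
    """
    edges = {}
    for line in binder_dump.splitlines():
        m = re.match(r"\s*(\d+:\d+) to (\d+:\d+) code", line)
        if m:
            edges[m.group(1)] = m.group(2)
    chain = [start]
    while chain[-1] in edges and edges[chain[-1]] not in chain:  # guard against cycles
        chain.append(edges[chain[-1]])
    return chain

if __name__ == "__main__":
    dump = (
        "35854:35854 to 52462:52462 code 3 wait:27.185154163 s frz_state:3\n"
        "52462:52462 to 1386:0 code 13 wait:24.733640622 s frz_state:3\n"
        "35863:35883 to 1386:0 code 5f475249 wait:27.191127080 s frz_state:3\n"
    )
    print(" -> ".join(binder_chain(dump, "35854:35854")))
```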

2. Peer thread ID is 0

A peer thread ID of 0 means the peer is IPC FULL: all of that process's IPC threads are in use and no idle thread is left to serve this request, so the request blocks. Process 1386 in the sample above is an example:

pid      context      request       started       max       ready  free_async_space
35863    binder     0          2         16        2      519984
35854    binder     0          2         16        3      520192
35850    binder     0          2         16        3      520192
13669    binder     0          1         16        3      520192
52516    binder     0          3         16        4      520192
52462    binder     0          9         16        8      520192
16786    binder     0          8         16        10     520192
1972 binder     0          1         16        3      520192
1386 binder     1          15            16        0      517264                  -> binderInfo
1474 binder     0          2         16        4      520192

You can see that process 1386 has 0 threads in the ready state, confirming the diagnosis. In this situation the process's other IPC threads are probably all blocked, and you need to find out why they are not being released. A common scenario: one IPC thread blocks while holding a lock, and all the other threads deadlock waiting for it.

Another case is free_async_space being exhausted, leaving new IPC requests without enough buffer space to complete. Note that both synchronous and asynchronous requests consume this value; a common scenario is a large burst of asynchronous requests within a short time.

3. waitTime too small

waitTime is the duration of this one IPC call. If it is far smaller than the fault detection window, this IPC request is clearly not the root cause of the freeze.

A typical scenario: the application main thread issues many IPC requests within a short time, and the accumulated time causes the fault.

Directions to check:

  • Whether a single request stays within its expected time (e.g. an interface specified at 20ms reaching 1s in the abnormal case); investigate why that interface's performance misses expectations.

  • Whether such frequent calls from the application side are reasonable.

4. No call relationship, yet the stack is an IPC stack

Check whether the stack is transient, i.e. whether the warning and block stacks match. A possible scenario: the warning stack is an IPC stack and the block stack is some other transient stack, meaning the IPC request had already finished when the binder information was captured, so this IPC call was not actually slow.

Note that binder information is not captured at the exact moment the fault occurs; there is some delay. For fault types with a half-period check, binder capture is fairly accurate and usually completes within the fault window; for the other types, a delayed report may capture binder state that no longer reflects the scene.

Of course, a trace gives the most direct view of binder call costs.

Correlating with hilog

DFX-related log lines:

1. Fault report time

Search for "Start NotifyAppFault" to locate the report time in the rolling log.

2. Background detection (reported after 5 misses)

Search for "In Background, thread may be blocked in, do not report this time" to check whether the detection window reached 21s.

3. Stack capture (signal: 35)

Search for "DfxFaultLogger: Receive dump request" to get the capture time, and compare it with the report time to judge whether the stack was captured at the right moment.

4. Kill reason

Search for "hisysevent write result=0, send event [FRAMEWORK,PROCESS_KILL]" to get the kill reason, in combination with AppRecovery.

5. Application exit

Search for "is going to exit due to APP_FREEZE" to get the exit time.

One possible scenario: the application exits before the stack capture finishes, in which case no stack information is available.
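The five keywords above lend themselves to a small one-pass scanner that extracts the timeline. The log lines in the demo are fabricated stand-ins (tags and PIDs invented); only the keywords come from the text above:

```python
# milestone labels keyed by the DFX keywords listed above
KEYWORDS = {
    "Start NotifyAppFault": "fault reported",
    "In Background, thread may be blocked in, do not report this time": "background miss",
    "DfxFaultLogger: Receive dump request": "stack capture",
    "hisysevent write result=0, send event [FRAMEWORK,PROCESS_KILL]": "kill reason recorded",
    "is going to exit due to APP_FREEZE": "app exit",
}

def dfx_timeline(hilog: str) -> list:
    """Return (timestamp, milestone) pairs for DFX keywords, in log order.

    Assumes each hilog line starts with an 'MM-DD HH:MM:SS.mmm' timestamp.
    """
    timeline = []
    for line in hilog.splitlines():
        for kw, label in KEYWORDS.items():
            if kw in line:
                parts = line.split()
                timeline.append((parts[0] + " " + parts[1], label))
    return timeline

if __name__ == "__main__":
    sample = (
        "08-08 12:17:43.600 1234 5678 I AppfreezeManager: Start NotifyAppFault ...\n"
        "08-08 12:17:43.700 1234 5678 I DfxFaultLogger: Receive dump request ...\n"
        "08-08 12:17:46.900 1234 5678 I AppMgr: xxx is going to exit due to APP_FREEZE\n"
    )
    for ts, label in dfx_timeline(sample):
        print(ts, label)
```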

A typical analysis flow:

1. Determine the fault report time.

2. Infer the fault detection interval.

3. Judge what the application main thread was doing during that interval:

  • The main thread printed no logs at all: it is stuck in the call at the site of its last log line.

Example: an APP_INPUT_BLOCK reported at 07:24:08.167 where the main thread printed nothing after 07:24:01.581; check whether the logic behind FormManagerService: [form_mgr_proxy.cpp(GetFormsInfoByApp:1128)] timed out.

  • The application is printing logs frantically: analyze what the output represents and whether it is reasonable.

Example: the process produced massive output before being killed; the ImageEffect area that owns the log should check whether this is normal, e.g. whether it is stuck in a dead loop or a high-frequency call path.

Correlating with trace

1. The process times out running densely repeated work.

Example: PreviewArea::updateShotComponent (update the component) -> ohos.animator (run the animation) -> dense animation work lasting 9.2s.

The thread is busy looping over some piece of business logic; analyze each small slice of it:

  • If it does not fit the business scenario (it should not be called repeatedly here), analyze the business code to see why it loops.

  • If it does fit the scenario, check whether each slice takes longer than expected and why its performance misses the design spec.

2. The process times out in a single task.

Example: OHOS::AppExecFwk::FormMgrAdapter::GetFormsInfoByApp runs for 8s.

3. Several tasks accumulate past the timeout.

Example: three stretches of work add up to the 6s timeout; the scenario is identified as a page animation timing out after screen rotation.
