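Notes on analyzing why the vendor.boot-hal-1-1 process failed to start during boot on Android R. I'm writing it up as a summary, and as a reference for anyone who runs into the same thing.
Problem:
During boot, errors kept appearing in the log: vendor.boot-hal-1-1 could not start properly.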
[ 18.037464] {1}[1:init]init: starting service 'vendor.boot-hal-1-1'...
[ 18.040238] {1}[1:init]init: Control message: Processed ctl.interface_start for 'android.hardware.boot@1.0::IBootControl/default' from pid: 2387 (/system/bin/hwservicemanager)
[ 18.040622] {1}[1:init]init: Control message: Processed ctl.interface_start for 'android.hardware.boot@1.0::IBootControl/default' from pid: 2387 (/system/bin/hwservicemanager)
[ 18.071197] {1}[1:init]init: Service 'vendor.boot-hal-1-1' (pid 2507) 333 exited with status 1
[ 18.071225] {1}[1:init]init: Sending signal 9 to service 'vendor.boot-hal-1-1' (pid 2507) process group...
[ 18.071506] {1}[1:init]libprocessgroup: Successfully killed process cgroup uid 0 pid 2507 in 0ms
[ 18.096360] {1}[1:init]init: Service 'bpfloader' (pid 2502) 333 exited with status 0 oneshot service took 0.149000 seconds in background
[ 18.096385] {1}[1:init]init: Sending signal 9 to service 'bpfloader' (pid 2502) process group...
[ 18.096593] {1}[1:init]libprocessgroup: Successfully killed process cgroup uid 0 pid 2502 in 0ms
[ 19.159220] {2}[2503:update_verifier]HidlServiceManagement: Waited one second for android.hardware.boot@1.0::IBootControl/default
[ 19.173529] {2}[2503:update_verifier]HidlServiceManagement: getService: Trying again for android.hardware.boot@1.0::IBootControl/default...
[ 19.174937] {3}[1:init]init: starting service 'vendor.boot-hal-1-1'...
[ 19.180231] {3}[1:init]init: Control message: Processed ctl.interface_start for 'android.hardware.boot@1.0::IBootControl/default' from pid: 2387 (/system/bin/hwservicemanager)
[ 19.214810] {3}[1:init]init: Service 'vendor.boot-hal-1-1' (pid 2509) 333 exited with status 1
[ 19.224638] {3}[1:init]init: Sending signal 9 to service 'vendor.boot-hal-1-1' (pid 2509) process group...
[ 19.235949] {3}[1:init]libprocessgroup: Successfully killed process cgroup uid 0 pid 2509 in 0ms
[ 20.213341] {2}[2503:update_verifier]HidlServiceManagement: Waited one second for android.hardware.boot@1.0::IBootControl/default
[ 20.227637] {2}[2503:update_verifier]HidlServiceManagement: getService: Trying again for android.hardware.boot@1.0::IBootControl/default...
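It looks like the vendor.boot-hal-1-1 service exits with status 1 within about 40 ms of starting, and init then kills its process group. Because update_verifier keeps waiting for android.hardware.boot@1.0::IBootControl/default, hwservicemanager asks init to start the service again (ctl.interface_start), so the same failure repeats roughly once a second.
How do we track it down?
Since this log keeps flooding the console, which gets annoying, kernel messages on the console can be turned off with the following command (they still go to the kernel log buffer, so dmesg can still read them):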
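To make "exited with status 1" concrete: it is the value the child passed to exit(), as recovered by the parent through waitpid(), as opposed to a death by signal. Below is a much-simplified, hypothetical sketch of how a supervisor such as init can tell the two apart; it is not init's actual code, and RunAndDescribe is a made-up name used only for illustration. It assumes a POSIX system.

```cpp
#include <cassert>
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs args[0] (looked up in PATH) as a child process and
// describes how it ended, in the spirit of init's "exited with status N" line.
std::string RunAndDescribe(const std::vector<std::string>& args) {
  assert(!args.empty());
  std::vector<char*> argv;
  for (const auto& a : args) argv.push_back(const_cast<char*>(a.c_str()));
  argv.push_back(nullptr);  // execvp expects a null-terminated argv

  pid_t pid = fork();
  if (pid < 0) return "fork failed";
  if (pid == 0) {
    execvp(argv[0], argv.data());
    _exit(127);  // only reached if exec failed
  }

  int status = 0;
  if (waitpid(pid, &status, 0) != pid) return "waitpid failed";
  if (WIFEXITED(status)) {
    // Normal termination: the low 8 bits of the value given to exit().
    return "exited with status " + std::to_string(WEXITSTATUS(status));
  }
  if (WIFSIGNALED(status)) {
    // Killed by a signal, e.g. the SIGKILL init sends to the process group.
    return "killed by signal " + std::to_string(WTERMSIG(status));
  }
  return "unknown";
}
```

A service that returns 1 from main(), as the boot HAL does here, shows up as "exited with status 1", which is why the log reports an exit status rather than a crash signal.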
echo 0 > /proc/sys/kernel/printk
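Background on that command: /proc/sys/kernel/printk holds four integers (console_loglevel, default_message_loglevel, minimum_console_loglevel, default_console_loglevel), and writing a single value changes only the first one. The kernel prints a message on the console only when its level is numerically lower than console_loglevel, so writing 0 silences the console while messages still reach the ring buffer. A small sketch of that rule; ParsePrintk and ShownOnConsole are hypothetical helper names, and the sample values used to exercise them are made up.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <string>

// The four integers in /proc/sys/kernel/printk, in file order.
struct PrintkLevels {
  int console_loglevel;          // messages with level < this reach the console
  int default_message_loglevel;  // level for printk() calls that give none
  int minimum_console_loglevel;  // lowest value console_loglevel may be set to
  int default_console_loglevel;  // default value of console_loglevel
};

// Parses the whitespace-separated contents of /proc/sys/kernel/printk.
std::optional<PrintkLevels> ParsePrintk(const std::string& text) {
  std::istringstream in(text);
  PrintkLevels p{};
  if (!(in >> p.console_loglevel >> p.default_message_loglevel >>
        p.minimum_console_loglevel >> p.default_console_loglevel)) {
    return std::nullopt;
  }
  return p;
}

// A message of kernel level msg_level (0 = KERN_EMERG ... 7 = KERN_DEBUG) is
// printed on the console only if it is more urgent than console_loglevel.
bool ShownOnConsole(int msg_level, const PrintkLevels& p) {
  assert(msg_level >= 0 && msg_level <= 7);
  return msg_level < p.console_loglevel;
}
```

After `echo 0 >`, reading the file back shows 0 in the first column and the other three values unchanged, and no level passes the `msg_level < 0` check.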
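Now the serial console is no longer flooded with logs, and strace can be used to locate the problem: either pass it the path of the executable so it starts the program under tracing, or attach to an already running process with strace -p <pid>. Since this service dies within milliseconds, running the binary directly under strace is the practical choice here: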
strace /vendor/bin/hw/android.hardware.boot@1.1-service
Part of the strace output:
openat(AT_FDCWD, "/vendor/etc/fstab.xxx", O_RDONLY|O_CLOEXEC) = 6
writev(5, [{iov_base="\0\241\nNb\370`\266\236)#", iov_len=11}, {iov_base="\4", iov_len=1}, {iov_base="android.hardware.boot@1.1-servic"..., iov_len=34},
{iov_base="[libfs_mgr]ReadDefaultFstab "..., iov_len=81}], 4) = 127
writev(5, [{iov_base="\0\241\nNb\370`eU8#", iov_len=11}, {iov_base="\6", iov_len=1}, {iov_base="android.hardware.boot@1.1-servic"..., iov_len=34},
{iov_base="Could not find bootloader messag"..., iov_len=79}], 4) = 125
openat(AT_FDCWD, "/dev/pmsg0", O_WRONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
writev(5, [{iov_base="\0\241\nNb\370`/\310E#", iov_len=11}, {iov_base="\6", iov_len=1}, {iov_base="android.hardware.boot@1.1-impl\0", iov_len=31},
{iov_base="Could not initialize BootControl"..., iov_len=40}], 4) = 83
close(6) = 0
exit_group(1) = ?
+++ exited with 1 +++
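The writev() calls are liblog handing log records to logd. Assuming fd 5 is the logd socket (the excerpt does not show where it was opened), each call carries four iovecs: an 11-byte header, a one-byte priority, the tag, and the message. The priority byte uses android_LogPriority values, so \4 is INFO and \6 is ERROR, which is how the two error lines stand out. The sketch below decodes that layout from strace's text rendering; DecodeWritev and LogRecord are hypothetical names, and the regex only covers the format seen in this excerpt.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

struct LogRecord {
  int priority;         // android_LogPriority: 2=V 3=D 4=I 5=W 6=E 7=F
  std::string tag;      // strace truncates long strings, so this may be partial
  std::string message;  // likewise
};

// Maps an android_LogPriority value to logcat's one-letter form.
char PriorityLetter(int prio) {
  static const char kLetters[] = "??VDIWEF";  // index = priority value
  return (prio >= 0 && prio <= 7) ? kLetters[prio] : '?';
}

// Hypothetical decoder for one strace-rendered writev() from liblog to logd
// (assumed layout: header iovec, 1-byte priority, tag, message).
std::optional<LogRecord> DecodeWritev(const std::string& call) {
  // Captures (1) the octal priority byte, (2) the tag, (3) the message. A quoted
  // string may contain backslash escapes and may be followed by strace's "...".
  static const std::regex kRe(
      R"re(\{iov_base="\\([0-7]{1,3})", iov_len=1\}, )re"
      R"re(\{iov_base="((?:[^"\\]|\\.)*)"(?:\.\.\.)?, iov_len=\d+\},\s*)re"
      R"re(\{iov_base="((?:[^"\\]|\\.)*)")re");
  std::smatch m;
  if (!std::regex_search(call, m, kRe)) return std::nullopt;
  LogRecord rec{std::stoi(m[1].str(), nullptr, 8), m[2].str(), m[3].str()};
  // The tag is sent with its NUL terminator, which strace prints as "\0".
  const std::string nul = "\\0";
  if (rec.tag.size() >= nul.size() &&
      rec.tag.compare(rec.tag.size() - nul.size(), nul.size(), nul) == 0) {
    rec.tag.erase(rec.tag.size() - nul.size());
  }
  return rec;
}
```

Fed the last writev() above, this yields priority E, tag android.hardware.boot@1.1-impl, and the message "Could not initialize BootControl", which is the string worth searching for.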
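The /dev/pmsg0 ENOENT is most likely just liblog failing to open the optional persistent-log device and can be ignored. The useful lines are the two errors, "Could not find bootloader messag..." and "Could not initialize BootControl", so search the source for them. Since this is a HIDL service, search directly under android/hardware/interfaces; this eventually leads to android/hardware/interfaces/boot/1.1/default/boot_control/libboot_control.cpp:
bool BootControl::Init() {
  std::string err;  // receives the reason when the lookup fails
  std::string device = get_bootloader_message_blk_device(&err);
  if (device.empty()) {
    // This is the "Could not find bootloader messag..." line seen in strace.
    LOG(ERROR) << "Could not find bootloader message block device: " << err;
    return false;
  }
  ...
}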
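Next, follow the implementation of get_bootloader_message_blk_device (in bootable/recovery/bootloader_message/bootloader_message.cpp):
std::string get_bootloader_message_blk_device(std::string* err) {
  // Look up the misc partition's block device in fstab.
  std::string misc_blk_device = get_misc_blk_device(err);
  if (misc_blk_device.empty()) return "";
  // Then wait for that device node to become available.
  if (!wait_for_device(misc_blk_device, err)) return "";
  return misc_blk_device;
}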
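wait_for_device() is not shown in the post. Judging by its name and the err parameter, it waits for the misc device node to appear before the path is returned, so a missing node would end in the same "Could not find bootloader message block device" error. A hypothetical sketch of such a bounded wait, polling stat() a fixed number of times; WaitForPath, the retry count and the delay are assumptions, not the values used in bootloader_message.cpp.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <cstring>
#include <string>
#include <thread>
#include <sys/stat.h>

// Hypothetical bounded wait: polls stat() until `path` exists or `tries`
// attempts have failed, appending each failure reason to *err.
bool WaitForPath(const std::string& path, int tries,
                 std::chrono::milliseconds delay, std::string* err) {
  assert(err != nullptr && tries > 0);
  err->clear();
  for (int i = 1; i <= tries; ++i) {
    struct stat st;
    if (stat(path.c_str(), &st) == 0) return true;  // node is there
    *err += "failed to stat " + path + " try " + std::to_string(i) + ": " +
            std::strerror(errno) + "\n";
    if (i < tries) std::this_thread::sleep_for(delay);  // no sleep after last try
  }
  return false;
}
```

A bounded loop rather than an unbounded wait means a wrong path fails after a known delay instead of hanging the service forever.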
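Then follow get_misc_blk_device. The code below makes it fairly clear that the misc device is taken from the /misc entry in fstab:
std::string get_misc_blk_device(std::string* err) {
  // Test hook: a device path injected by unit tests takes precedence.
  if (g_misc_device_for_test.has_value() && !g_misc_device_for_test->empty()) {
    return *g_misc_device_for_test;
  }
  Fstab fstab;
  // Here this reads /vendor/etc/fstab.xxx (see the openat in the strace output).
  if (!ReadDefaultFstab(&fstab)) {
    *err = "failed to read default fstab";
    return "";
  }
  // The misc device is whichever entry is mounted at /misc.
  for (const auto& entry : fstab) {
    if (entry.mount_point == "/misc") {
      return entry.blk_device;
    }
  }
  *err = "failed to find /misc partition";
  return "";
}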
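Stripped of the fs_mgr types, the lookup is just "parse fstab, return the blk_device of the entry whose mount_point is /misc". A self-contained sketch under that reading: ParseFstab is a deliberately naive, hypothetical stand-in for fs_mgr's ReadDefaultFstab (the real parser does considerably more), and FindMiscDevice mirrors the loop above. Each fstab line has five whitespace-separated columns: block device, mount point, filesystem type, mount flags, fs_mgr flags.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// The five whitespace-separated columns of an Android fstab line.
struct FstabEntry {
  std::string blk_device, mount_point, fs_type, mnt_flags, fs_mgr_flags;
};

// Naive parser: skips blank lines and '#' comments, requires all five columns.
std::vector<FstabEntry> ParseFstab(const std::string& text) {
  std::vector<FstabEntry> entries;
  std::istringstream lines(text);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    FstabEntry e;
    if (!(fields >> e.blk_device) || e.blk_device[0] == '#') continue;
    if (fields >> e.mount_point >> e.fs_type >> e.mnt_flags >> e.fs_mgr_flags) {
      entries.push_back(e);
    }
  }
  return entries;
}

// Mirrors the loop in get_misc_blk_device(): first entry mounted at /misc.
std::string FindMiscDevice(const std::string& fstab_text, std::string* err) {
  assert(err != nullptr);
  for (const auto& entry : ParseFstab(fstab_text)) {
    if (entry.mount_point == "/misc") return entry.blk_device;
  }
  *err = "failed to find /misc partition";
  return "";
}
```

With no /misc line the function returns an empty string, which is exactly the device.empty() case that makes BootControl::Init() fail.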
Looking back, sure enough, fstab.xxx had no entry for the misc partition. Add one:
/dev/block/by-name/misc /misc emmc defaults defaults
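After rebuilding, the error no longer appears.
Finally, what does vendor.boot-hal-1-1 actually run? Judging from the related Android.bp and source, it is /vendor/bin/hw/android.hardware.boot@1.1-service, the Boot Control HAL process and interface that are enabled once ENABLE_AB = true is set. When an A/B OTA update completes, it updates the slot metadata to tell the system which slot, a or b, to boot. update_verifier, seen waiting for it in the log above, also calls it to mark the current slot as successfully booted. libboot_control keeps this slot metadata in the misc partition, which is why the service refuses to start when fstab has no /misc entry.
That's it for this analysis.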
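For context on what that slot metadata is for: each slot carries a priority, a count of remaining boot attempts, and a successful-boot flag, and the bootloader roughly boots the highest-priority slot that is still bootable. A simplified, hypothetical model of that rule follows; SlotInfo and PickSlot are illustrative names, the real on-disk struct is bit-packed, and the exact policy is up to each bootloader.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Simplified per-slot state (hypothetical; the real struct is bit-packed).
struct SlotInfo {
  char name;             // 'a' or 'b'
  int priority;          // higher wins; 0 means not bootable
  int tries_remaining;   // boot attempts left before the slot is given up
  bool successful_boot;  // set once the OS marks the boot as good
};

// Picks the highest-priority slot that is still bootable: it has booted
// successfully before, or it still has tries left.
std::optional<char> PickSlot(const std::vector<SlotInfo>& slots) {
  const SlotInfo* best = nullptr;
  for (const auto& s : slots) {
    assert(s.priority >= 0);
    bool bootable = s.priority > 0 && (s.successful_boot || s.tries_remaining > 0);
    if (bootable && (best == nullptr || s.priority > best->priority)) best = &s;
  }
  if (best == nullptr) return std::nullopt;
  return best->name;
}
```

In this model, an OTA that targets slot b raises b's priority and gives it a few tries; if b never gets marked successful before its tries run out, the choice falls back to a.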