QCM6490 SSR Notes (Part 1)

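Project scenario:

A crash in the modem subsystem intermittently crashes the whole system, and SSR is already disabled.

If disable_restart_work is set to DISABLE_SSR, nothing is restarted no matter which subsystem (wlan, adsp audio/sensor, modem, etc.) triggers SSR.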

#define DISABLE_SSR 0x9889deed
/* If set to 0x9889deed, call to subsystem_restart_dev() returns immediately */
//static uint disable_restart_work;
static uint disable_restart_work = DISABLE_SSR;

Commonly used commands to enable the logs:

adb shell "echo 'file subsystem_restart.c +p' > /sys/kernel/debug/dynamic_debug/control"  
adb shell "echo 'file subsys-pil-tz.c +p' > /sys/kernel/debug/dynamic_debug/control"  

Problem description

1. The following cases can lead to a system restart/crash:

//subsystem_restart_dev
//01.If a system reboot/shutdown is underway ignore subsystem errors.
//However, print a message so that we know that a subsystem behaved unexpectedly here.
extern enum system_states {
  	SYSTEM_BOOTING,
  	SYSTEM_SCHEDULING,
  	SYSTEM_RUNNING,
  	SYSTEM_HALT,
  	SYSTEM_POWER_OFF,
  	SYSTEM_RESTART,
  	SYSTEM_SUSPEND,
  } system_state;
  if (system_state == SYSTEM_RESTART
		|| system_state == SYSTEM_POWER_OFF) {
		pr_err("%s crashed during a system poweroff/shutdown.\n", name);
		return -EBUSY;
}


//02. disable_restart_work == DISABLE_SSR: the restart request is skipped outright


	if (disable_restart_work == DISABLE_SSR) {
		pr_err("subsys-restart: Ignoring restart request for %s\n",
									name);
		return 0;
	}


//03.restart_level
	switch (dev->restart_level) {

	case RESET_SUBSYS_COUPLED://related; confirmed that this is the branch taken here
		__subsystem_restart_dev(dev);
		break;
	case RESET_SOC://system
		__pm_stay_awake(dev->ssr_wlock);
		schedule_work(&dev->device_restart_work);
		return 0;
	default:
		panic("subsys-restart: Unknown restart level!\n");
		break;
	}
//__subsystem_restart_dev 
//04. Normally track->p_state should be SUBSYS_NORMAL and dev->track.state should be SUBSYS_ONLINE; otherwise the system restarts
	if (track->p_state != SUBSYS_CRASHED &&
					dev->track.state == SUBSYS_ONLINE) {
		if (track->p_state != SUBSYS_RESTARTING) {
			track->p_state = SUBSYS_CRASHED;
			__pm_stay_awake(dev->ssr_wlock);
			queue_work(ssr_wq, &dev->work);//trigger the subsystem restart
		} else {
			pr_err("Subsystem %s crashed during SSR!", name);
		}
	} else
		WARN(dev->track.state == SUBSYS_OFFLINE,
			"SSR aborted: %s subsystem not online\n", name);

//	INIT_WORK(&subsys->work, subsystem_restart_wq_func);
//05. Check the system state again; abort SSR if a system shutdown/reboot is under way
	if (system_state == SYSTEM_RESTART
		|| system_state == SYSTEM_POWER_OFF) {
		WARN(1, "SSR aborted: %s, system reboot/shutdown is under way\n",
			desc->name);
		pr_err("SSR aborted: %s, system reboot/shutdown is under way\n",
			desc->name);
		return;
	}
//06. The subsystem is not up: abort SSR
	if (dev->track.state == SUBSYS_OFFLINE) {
		mutex_unlock(&track->lock);
		WARN(1, "SSR aborted: %s subsystem not online\n", desc->name);
		pr_err("SSR aborted: %s subsystem not online\n",
			desc->name);
		return;
	}
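
The order of checks 01 to 03 can be condensed into a small user-space model. This is only a sketch, not the kernel code: decide_restart(), enum ssr_action, enum sys_state, and the numeric enum values are hypothetical names introduced for illustration. Only DISABLE_SSR and the names of the restart levels come from the listing above.

```c
#include <assert.h>

#define DISABLE_SSR 0x9889deedU

/* Reduced set of system states: only the ones the checks look at. */
enum sys_state { SYS_RUNNING, SYS_POWER_OFF, SYS_RESTART };

/* Restart levels named in the listing; the numeric values are arbitrary. */
enum restart_level { RESET_SOC, RESET_SUBSYS_COUPLED };

/* Hypothetical result type: what the model does with a crash report. */
enum ssr_action {
	SSR_IGNORED_SHUTDOWN,  /* 01: system reboot/poweroff is under way */
	SSR_IGNORED_DISABLED,  /* 02: disable_restart_work == DISABLE_SSR */
	SSR_RESTART_SUBSYSTEM, /* 03: RESET_SUBSYS_COUPLED ("related") */
	SSR_RESTART_SOC,       /* 03: RESET_SOC ("system") */
	SSR_PANIC              /* 03: unknown restart level */
};

/* Apply the three gates in the same order as the listing. */
enum ssr_action decide_restart(enum sys_state state,
			       unsigned int disable_restart_work,
			       int level)
{
	if (state == SYS_RESTART || state == SYS_POWER_OFF)
		return SSR_IGNORED_SHUTDOWN;

	if (disable_restart_work == DISABLE_SSR)
		return SSR_IGNORED_DISABLED;

	switch (level) {
	case RESET_SUBSYS_COUPLED:
		return SSR_RESTART_SUBSYSTEM;
	case RESET_SOC:
		return SSR_RESTART_SOC;
	default:
		return SSR_PANIC;
	}
}

/* Usage example: call this from main() to exercise the model. */
void decide_restart_example(void)
{
	/* With SSR disabled, even a "related" subsystem is not restarted. */
	assert(decide_restart(SYS_RUNNING, DISABLE_SSR, RESET_SUBSYS_COUPLED)
	       == SSR_IGNORED_DISABLED);
	/* With SSR enabled and level "related", only the subsystem restarts. */
	assert(decide_restart(SYS_RUNNING, 0, RESET_SUBSYS_COUPLED)
	       == SSR_RESTART_SUBSYSTEM);
}
```

The model makes the ordering explicit: the DISABLE_SSR check runs before restart_level is looked at, so the restart level has no effect while SSR is disabled.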

2. First of all, an exception in the wlan, adsp (audio/sensor), or modem subsystem must either raise an interrupt or directly enter the following function:
subsystem_restart_dev

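3. Many places in the kernel call statements such as BUG(). It behaves much like a runtime assertion in the kernel: execution should never reach a BUG() statement, and if it does, an Oops is raised. BUG() is defined as: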

#define BUG() do { \
  	printk("BUG at %s:%d/%s()!\n", __FILE__, __LINE__, __func__); \
  	panic("BUG!"); \
  } while (0)

BUG() has a variant called BUG_ON(), which invokes BUG() internally:

#define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)

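The panic() used here is defined in kernel/panic.c; it brings the kernel down and prints an Oops.
The kernel also has a somewhat weaker WARN_ON(). When the condition in the parentheses holds, the kernel prints a stack backtrace but does not call panic(). It is a warning from the kernel that something not quite reasonable has happened.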
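
The difference between the two can be tried out in user space. The sketch below is an illustration only: SKETCH_WARN_ON, SKETCH_BUG_ON, sketch_warn_on(), warn_count, and check_online() are hypothetical stand-ins, and abort() takes the place of panic(). Like the kernel's WARN_ON(), the stand-in evaluates to the truth value of its condition, so it can be used inside an if.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

int warn_count; /* number of warnings raised so far */

/* Report a warning and keep running; returns whether the condition hit. */
int sketch_warn_on(int hit, const char *expr, const char *file, int line)
{
	if (hit) {
		warn_count++;
		fprintf(stderr, "WARNING at %s:%d: %s\n", file, line, expr);
	}
	return hit;
}

/* Stand-in for WARN_ON(): warn, then let the caller decide what to do. */
#define SKETCH_WARN_ON(cond) \
	sketch_warn_on(!!(cond), #cond, __FILE__, __LINE__)

/* Stand-in for BUG_ON(): report and terminate (abort() replaces panic()). */
#define SKETCH_BUG_ON(cond) do { \
	if (cond) { \
		fprintf(stderr, "BUG at %s:%d/%s()!\n", \
			__FILE__, __LINE__, __func__); \
		abort(); \
	} \
} while (0)

/* Example: warn about an offline subsystem and bail out gracefully. */
int check_online(int is_online)
{
	if (SKETCH_WARN_ON(!is_online))
		return -1;
	return 0;
}

/* Usage example: call this from main() to exercise the macros. */
void warn_bug_example(void)
{
	int before = warn_count;

	assert(check_online(0) == -1);     /* warns, execution continues */
	assert(warn_count == before + 1);
	SKETCH_BUG_ON(0);                  /* condition false: nothing happens */
}
```

This mirrors how the SSR code uses the two: an offline subsystem only produces a WARN and an early return, while an unknown restart level ends in panic().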

4. The CONFIG_SETUP_SSR_NOTIF_TIMEOUTS config option can be turned off without any noticeable impact.


Root cause analysis:

The current problem: the modem subsystem restart intermittently fails to take effect, and the system still panics.

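I added a delay and triggered modem crashes manually, which reproduces "Subsystem modem crashed during SSR!". The cause is that a new modem subsystem restart is requested before the previous one has finished, so p_state is still SUBSYS_RESTARTING and the kernel panics. Subsystem restarts should not normally occur this frequently. To guard against this case, a flag can be added so that the next subsystem restart starts only after the previous modem restart has completed.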
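
The sequence can be replayed with a small user-space state model. This is a sketch under assumptions, not the driver: struct ssr_model, report_crash(), worker_begin_restart(), worker_finish(), and enum crash_result are hypothetical names. The three p_state values are the ones used in the listings, and the subsystem is assumed to stay SUBSYS_ONLINE throughout.

```c
#include <assert.h>

enum p_state { SUBSYS_NORMAL, SUBSYS_CRASHED, SUBSYS_RESTARTING };

/* Hypothetical outcome of reporting a crash to the model. */
enum crash_result {
	CRASH_QUEUED,          /* restart work was queued */
	CRASH_ALREADY_PENDING, /* p_state is already SUBSYS_CRASHED: no action */
	CRASH_DURING_SSR       /* the "crashed during SSR!" branch */
};

struct ssr_model {
	enum p_state p_state;
};

/* Mirrors the p_state checks made when a crash is reported. */
enum crash_result report_crash(struct ssr_model *m)
{
	if (m->p_state == SUBSYS_CRASHED)
		return CRASH_ALREADY_PENDING;
	if (m->p_state == SUBSYS_RESTARTING)
		return CRASH_DURING_SSR;
	m->p_state = SUBSYS_CRASHED;
	return CRASH_QUEUED;
}

/* The worker has shut the subsystem down and starts bringing it back. */
void worker_begin_restart(struct ssr_model *m)
{
	m->p_state = SUBSYS_RESTARTING;
}

/* The worker has finished; further crashes are handled normally again. */
void worker_finish(struct ssr_model *m)
{
	m->p_state = SUBSYS_NORMAL;
}

/* Usage example: a second crash arrives while the first restart is running. */
void crash_during_ssr_example(void)
{
	struct ssr_model m = { SUBSYS_NORMAL };

	assert(report_crash(&m) == CRASH_QUEUED);     /* first crash */
	worker_begin_restart(&m);
	assert(report_crash(&m) == CRASH_DURING_SSR); /* second crash, too early */
	worker_finish(&m);
	assert(report_crash(&m) == CRASH_QUEUED);     /* fine again afterwards */
}
```

Any delay between worker_begin_restart() and worker_finish() widens the window in which a second crash lands in the CRASH_DURING_SSR branch, which is why adding a sleep to the worker makes the problem easy to reproduce.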

How to reproduce:

static void subsystem_restart_wq_func(struct work_struct *work)
{
	struct subsys_device *dev = container_of(work,
						struct subsys_device, work);
	struct subsys_device **list;
	struct subsys_desc *desc = dev->desc;
	struct subsys_soc_restart_order *order = dev->restart_order;
	struct subsys_tracking *track;
	unsigned int count;
	unsigned long flags;
	int ret;

	/*
	 * It's OK to not take the registration lock at this point.
	 * This is because the subsystem list inside the relevant
	 * restart order is not being traversed.
	 */
	if (order) {
		list = order->subsys_ptrs;
		count = order->count;
		track = &order->track;
	} else {
		list = &dev;
		count = 1;
		track = &dev->track;
	}

	/* Flag added for this experiment: skip if the previous restart has not finished. */
	if (meig_work_flag == 1) {
		pr_err("wait complete at the last time\n");
		return;
	}
	meig_work_flag = 1;

	/*
	 * If a system reboot/shutdown is under way, ignore subsystem errors.
	 * However, print a message so that we know that a subsystem behaved
	 * unexpectedly here.
	 */
	if (system_state == SYSTEM_RESTART
		|| system_state == SYSTEM_POWER_OFF) {
		WARN(1, "SSR aborted: %s, system reboot/shutdown is under way\n",
			desc->name);
		pr_err("SSR aborted: %s, system reboot/shutdown is under way\n",
			desc->name);
		return;
	}

	mutex_lock(&track->lock);
	do_epoch_check(dev);

	if (dev->track.state == SUBSYS_OFFLINE) {
		mutex_unlock(&track->lock);
		WARN(1, "SSR aborted: %s subsystem not online\n", desc->name);
		pr_err("SSR aborted: %s subsystem not online\n",
			desc->name);
		return;
	}

	/*
	 * It's necessary to take the registration lock because the subsystem
	 * list in the SoC restart order will be traversed and it shouldn't be
	 * changed until _this_ restart sequence completes.
	 */
	mutex_lock(&soc_order_reg_lock);

	pr_err("[%s:%d]: Starting restart sequence for %s\n",
			current->comm, current->pid, desc->name);
	notify_each_subsys_device(list, count, SUBSYS_BEFORE_SHUTDOWN, NULL);
	ret = for_each_subsys_device(list, count, NULL, subsystem_shutdown);
	if (ret)
		goto err;
	notify_each_subsys_device(list, count, SUBSYS_AFTER_SHUTDOWN, NULL);

	notify_each_subsys_device(list, count, SUBSYS_RAMDUMP_NOTIFICATION,
									NULL);

	spin_lock_irqsave(&track->s_lock, flags);
	track->p_state = SUBSYS_RESTARTING;
	spin_unlock_irqrestore(&track->s_lock, flags);

	//msleep(3000);
	/* Collect ram dumps for all subsystems in order here */
	for_each_subsys_device(list, count, NULL, subsystem_ramdump);

	for_each_subsys_device(list, count, NULL, subsystem_free_memory);

	notify_each_subsys_device(list, count, SUBSYS_BEFORE_POWERUP, NULL);
	ret = for_each_subsys_device(list, count, NULL, subsystem_powerup);
	if (ret)
		goto err;
	notify_each_subsys_device(list, count, SUBSYS_AFTER_POWERUP, NULL);

	pr_err("[%s:%d]: Restart sequence for %s completed.\n",
			current->comm, current->pid, desc->name);
	/*
	 * Added delay: send the QXDM command "send_data 75 37 03 00 00" to crash
	 * the modem. Sending it several times in a row triggers
	 * panic("Subsystem %s crashed during SSR!", name);
	 */
	msleep(3000);

err:
	/* Reset subsys count */
	if (ret)
		dev->count = 0;
	//msleep(9000);

	mutex_unlock(&soc_order_reg_lock);
	mutex_unlock(&track->lock);

	spin_lock_irqsave(&track->s_lock, flags);
	pr_err("zhanghong 6666666666666\n");
	track->p_state = SUBSYS_NORMAL;
	meig_work_flag=0;
	__pm_relax(dev->ssr_wlock);
	spin_unlock_irqrestore(&track->s_lock, flags);
}


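Workarounds:
1. Add a delay at the beginning of subsystem_restart_wq_func to wait for the previous modem restart to complete.
2. Change panic("Subsystem %s crashed during SSR!", name); so that it only prints the message.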
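
The "only one restart at a time" idea behind the flag can be sketched in user space as follows. This is an assumption-laden illustration rather than the driver change: try_begin_restart(), end_restart(), and restart_in_progress are hypothetical names, and C11 atomics are swapped in for the plain int flag used in the listing so that testing and setting the flag happen as one step. Kernel code would use the kernel's own atomic or locking primitives instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* True while a restart sequence is running (zero-initialized to false). */
static atomic_bool restart_in_progress;

/*
 * Try to claim the right to run a restart sequence.
 * Returns true if the caller may proceed, false if one is already running.
 */
bool try_begin_restart(void)
{
	bool expected = false;

	/* Atomically: if the flag is false, set it to true and succeed. */
	return atomic_compare_exchange_strong(&restart_in_progress,
					      &expected, true);
}

/* Mark the restart sequence as finished so that the next one may start. */
void end_restart(void)
{
	atomic_store(&restart_in_progress, false);
}

/* Usage example: the second request is rejected until the first one ends. */
void restart_guard_example(void)
{
	assert(try_begin_restart());   /* first request proceeds */
	assert(!try_begin_restart());  /* overlapping request is rejected */
	end_restart();
	assert(try_begin_restart());   /* allowed again after completion */
	end_restart();
}
```

One design point worth checking in any such guard is that every exit path of the restart sequence, including the error paths, ends up calling the equivalent of end_restart(). Otherwise a single failed restart blocks all later ones.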


Solution:

1. How to set the system restart_level to related
device/qcom/common/rootdir/Android.mk

#<!-- Enable SSR for user version[Solution]Add use/debug control for init.qcom.rc. 
#LOCAL_SRC_FILES    := etc/init.qcom.rc
ifeq ($(TARGET_BUILD_VARIANT),user)
  LOCAL_SRC_FILES    := etc/init.qcom.user.rc
else
  LOCAL_SRC_FILES    := etc/init.qcom.rc
endif
#END-->

device/qcom/common/rootdir/etc/init.qcom.user.rc

    #sensors log dir
    mkdir /data/vendor/sensors
    chown system system /data/vendor/sensors

#<!-- Enable SSR Add use/debug control for init.qcom.rc. 
    write /sys/bus/msm_subsys/devices/subsys0/restart_level related
    write /sys/bus/msm_subsys/devices/subsys1/restart_level related
    write /sys/bus/msm_subsys/devices/subsys2/restart_level related
    write /sys/bus/msm_subsys/devices/subsys3/restart_level related
#end-->

# msm specific files that need to be created on /data
on post-fs-data
    mkdir /data/vendor/misc 01771 system system
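
The string written to restart_level decides which branch of the restart_level switch is taken: going by the comments in the earlier listing, related corresponds to RESET_SUBSYS_COUPLED and system to RESET_SOC. A minimal user-space sketch of that mapping follows. parse_restart_level() and RESET_LEVEL_INVALID are hypothetical, and the case-insensitive comparison is an assumption about how the driver matches the string.

```c
#include <assert.h>
#include <strings.h> /* strcasecmp (POSIX) */

/* Restart levels from the listing, plus a hypothetical "invalid" marker. */
enum restart_level { RESET_SOC, RESET_SUBSYS_COUPLED, RESET_LEVEL_INVALID };

/* Map the text written to the restart_level node to a restart level. */
enum restart_level parse_restart_level(const char *s)
{
	if (strcasecmp(s, "related") == 0)
		return RESET_SUBSYS_COUPLED; /* restart only the subsystem */
	if (strcasecmp(s, "system") == 0)
		return RESET_SOC;            /* restart the whole SoC */
	return RESET_LEVEL_INVALID;
}

/* Usage example: the value written by init.qcom.user.rc above. */
void parse_restart_level_example(void)
{
	assert(parse_restart_level("related") == RESET_SUBSYS_COUPLED);
}
```

On a device, reading the same restart_level node back after boot is a quick way to confirm that the init script took effect.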


“qcom,ignore-ssr-failure” can be added in the following node of dtsi
pil_modem: qcom,mss@4080000

How the modem communicates with the AP: to be updated...
