一例冻屏问题分析

1      现象

出现大概率反复开关机卡死,大概测试到几百次就能复现。

2      分析

幸好卡死的时候有adb端口。用ps命令执行发现会卡死,但是能打印出部分进程状态,已经打印出来的进程里面看到systemmanger处于D状态,ps –t 查看是它的主线程处于D状态,但是用cat /proc/***/stack又没有发现该线程有啥异常,

尝试直接用echo w > /proc/sysrq-trigger打印D状态内核D状态调用栈,另外用echo l >/proc/sysrq-trigger打印activeCPU的栈。Log如下:

从log中能看到很多D状态线程都卡在ashmem_ioctl函数中,

<6>[2015:07:24 21:36:21] [  399.118434] [2652.0 sh]main  D c098eb5c 0 2640 1 0x00000000

<4>[2015:07:24 21:36:21] [  399.118514] [2652.0 sh]Backtrace:

<4>[2015:07:24 21:36:21] [  399.118596] [2652.0 sh][<c098e750>](__schedule+0x0/0x95c) from [<c098f20c>] (schedule+0x50/0x68)

<4>[2015:07:24 21:36:21] [  399.118692] [2652.0 sh][<c098f1bc>](schedule+0x0/0x68) from [<c099028c>] (__mutex_lock_slowpath+0x130/0x1c8)

<4>[2015:07:24 21:36:21] [  399.118789] [2652.0 sh][<c099015c>](__mutex_lock_slowpath+0x0/0x1c8) from [<c0990364>](mutex_lock+0x40/0x44)

<4>[2015:07:24 21:36:21] [  399.118887] [2652.0 sh][<c0990324>](mutex_lock+0x0/0x44) from [<c013efd0>] (ashmem_ioctl+0x310/0x74c)

<4>[2015:07:24 21:36:21] [  399.118951] [2652.0 sh] r4:42f66a0cr3:41007701

<4>[2015:07:24 21:36:21] [  399.119056] [2652.0 sh][<c013ecc0>](ashmem_ioctl+0x0/0x74c) from [<c015a044>] (do_vfs_ioctl+0x8c/0x5bc)

<4>[2015:07:24 21:36:21] [  399.119153] [2652.0 sh][<c0159fb8>](do_vfs_ioctl+0x0/0x5bc) from [<c015a5e8>] (sys_ioctl+0x74/0x84)

<4>[2015:07:24 21:36:21] [  399.119249] [2652.0 sh][<c015a574>](sys_ioctl+0x0/0x84) from [<c0057fc0>] (ret_fast_syscall+0x0/0x30)

<4>[2015:07:24 21:36:21] [  399.119313] [2652.0 sh] r8:c0058168r7:00000036 r6:47427020 r5:42f66a0c r4:00000018

对照C代码是如下位置,说明该锁被人占了,这个ashmem_mutex锁是在kernel/mm/ashmem.c中定义的,只有这个文件内部使用,可以挨个检查下。

static intset_name(struct ashmem_area *asma, void __user *name)

{

        int ret = 0;

        mutex_lock(&ashmem_mutex);

        /* cannot change an existing mapping'sname */

        if (unlikely(asma->file)) {

                ret = -EINVAL;

                goto out;

        }

        if(unlikely(copy_from_user(asma->name + ASHMEM_NAME_PREFIX_LEN,

                                    name,ASHMEM_NAME_LEN)))

                ret = -EFAULT;

        asma->name[ASHMEM_FULL_NAME_LEN-1] ='\0';

out:

        mutex_unlock(&ashmem_mutex);

        return ret;

}

同时log中发现有如下信息,

<6>[2015:07:2421:36:21] [  399.134198] [2652.0sh]sysEvents handl D c098eb5c 0 2601 161 0x00000000

<4>[2015:07:2421:36:21] [  399.134222] [2652.0sh]Backtrace:

<4>[2015:07:2421:36:21] [  399.134245] [2652.0sh][<c098e750>] (__schedule+0x0/0x95c) from [<c098f20c>](schedule+0x50/0x68)

<4>[2015:07:2421:36:21] [  399.134269] [2652.0sh][<c098f1bc>] (schedule+0x0/0x68) from [<c0991498>](__down_read+0xb4/0xec)

<4>[2015:07:2421:36:21] [  399.134294] [2652.0sh][<c09913e4>] (__down_read+0x0/0xec) from [<c0990640>](down_read+0x10/0x14)

<4>[2015:07:2421:36:21] [  399.134311] [2652.0 sh]r7:42f6400b r6:d3318440 r5:00000817 r4:d26e21a0

<4>[2015:07:2421:36:21] [  399.134350] [2652.0sh][<c0990630>] (down_read+0x0/0x14) from [<c0060798>](do_page_fault+0x6c/0x1ec)

<4>[2015:07:2421:36:21] [  399.134385] [2652.0sh][<c006072c>] (do_page_fault+0x0/0x1ec) from [<c0051b70>] (do_DataAbort+0x40/0x110)这里发生了缺页中断

<4>[2015:07:2421:36:21] [  399.134413] [2652.0sh][<c0051b30>] (do_DataAbort+0x0/0x110) from [<c0057a2c>](__dabt_svc+0x4c/0x60)

<4>[2015:07:2421:36:21] [  399.134432] [2652.0 sh]Exceptionstack(0xceb57d70 to 0xceb57db8)

<4>[2015:07:2421:36:21] [  399.134451] [2652.0sh]7d60:                                    ceb57e28 42f6400b 00000000 00000000

<4>[2015:07:2421:36:21] [  399.134474] [2652.0 sh]7d80:ceb57e28 00001000 c1a7a240 0001f2e8 00000000 00000000 cffe3b84 ceb57de4

<4>[2015:07:2421:36:21] [  399.134494] [2652.0 sh]7da0:00007000 ceb57db8 c0123090 c0108cd8 20000013 ffffffff

<4>[2015:07:2421:36:21] [  399.134533] [2652.0sh][<c0108c74>] (file_read_actor+0x0/0xfc)from [<c0123090>] (shmem_file_aio_read+0x13c/0x3a0) 

<4>[2015:07:2421:36:21] [  399.134562] [2652.0sh][<c0122f54>] (shmem_file_aio_read+0x0/0x3a0) from [<c0148580>](do_sync_read+0xac/0xdc)

<4>[2015:07:2421:36:21] [  399.134593] [2652.0sh][<c01484d4>] (do_sync_read+0x0/0xdc) from [<c013e3ac>](ashmem_read+0x60/0x98)

<4>[2015:07:2421:36:21] [  399.134618] [2652.0sh][<c013e34c>] (ashmem_read+0x0/0x98)from [<c0148d28>] (vfs_read+0xa4/0x148) 这里正在读数据

<4>[2015:07:2421:36:21] [  399.134636] [2652.0 sh]r7:ceb57f78 r6:42f6300c r5:0001f2e8 r4:ce859640

<4>[2015:07:2421:36:21] [  399.134667] [2652.0sh][<c0148c84>] (vfs_read+0x0/0x148) from [<c0148e0c>](sys_read+0x40/0x78)

<4>[2015:07:2421:36:21] [  399.134684] [2652.0 sh]r8:0001f2e8 r7:42f6300c r6:ce859640 r5:00000000 r4:00000000

<4>[2015:07:2421:36:21] [  399.134719] [2652.0sh][<c0148dcc>] (sys_read+0x0/0x78) from [<c0057fc0>](ret_fast_syscall+0x0/0x30)

<4>[2015:07:2421:36:21] [  399.134736] [2652.0 sh]r8:c0058168 r7:00000003 r6:7151c400 r5:537025cc r4:47428940

       难道这里占着锁,对照代码查看

static ssize_tashmem_read(struct file *file, char __user *buf,

                           size_t len, loff_t*pos)

{

        struct ashmem_area *asma =file->private_data;

        int ret = 0;

        mutex_lock(&ashmem_mutex);

        /* If size is not set, or set to 0,always return EOF. */

        if (asma->size == 0) {

                goto out;

        }

        if (!asma->file) {

                ret = -EBADF;

                goto out;

        }

        ret = asma->file->f_op->read(asma->file, buf, len, pos);

        if (ret < 0) {

                goto out;

        }

        /** Update backing file pos, sincef_ops->read() doesn't */

        asma->file->f_pos = *pos;

out:

        mutex_unlock(&ashmem_mutex);

        return ret;

}

       果然,read函数调用是在持有锁且并没有释放的时候执行的,因为read函数有可能导致触发缺页,那么就存在长时间卡在这里的风险了。问题是4.4版本为什么没有这个问题呢,我理解是升级到L版本后,内存需求变大了,发生缺页的概率提高,导致因为这把锁出现D状态的概率变大了。

修改:

这个问题应该是3.0.8原生内核的bug,对照3.10内核发现已经进行了修正,因此可以仿照新内核版本进行修改,所有可能发生缺页的地方都修改下。这里主要有几个地方

1.  ashmem_read函数,这里的read函数要放在锁之后进行

2.     set_name和get_name函数因为调用copy_from_user和copy_to_user也可能会导致缺页,也需要用先用局部变量缓冲下,再上锁操作局部变量。

3.     除了上述的修改外,也作了些空指针的保护

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值