The information-packed struct page->flags

First, a screenshot of a comment from kernel 4.15:

What I care about here is the meaning of the values 0x1ffff0000000000 and 0x5ffff0000000000 seen in an earlier crash. Next, a superb overview diagram:

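The diagram above is very good, but the width of each field in flags and the bit positions of the field boundaries differ between platforms, depending on .config options such as CONFIG_NODES_SHIFT and CONFIG_NR_CPUS. The ordering is always the same, though: section, node, zone and last_cpupid are packed downward from the most significant bit, in that order, while the low bits 0x0-0x18 (bits 0-24) are fixed as the page flags proper. If the section field is 0 bits wide, the top bits start with node.

I looked up the relevant configuration of my hardware platform. CONFIG_SPARSEMEM_VMEMMAP is set, so the section field is 0 bits wide. CONFIG_NODES_SHIFT=6, so the node field is 6 bits wide.

The width of the zone field is derived from MAX_NR_ZONES in page-flags-layout.h, and MAX_NR_ZONES is the number of ZONE_* entries that end up defined in enum zone_type. The current platform has ZONE_DMA (for DMA devices that cannot address all of memory; its size differs between platforms), ZONE_NORMAL (always present; this is where ordinary memory lives) and ZONE_MOVABLE (always present). It does not have ZONE_DMA32 (x86_64 needs this second DMA zone because it has devices limited to the low 16 MB as well as 32-bit devices limited to the low 4 GB), ZONE_HIGHMEM (from the days when the kernel did not map all of memory; HIGHMEM is memory the kernel has to map separately before it can access it) or ZONE_DEVICE. So MAX_NR_ZONES is 3 on this platform, which is greater than 2 and at most 4, giving a zone field width of 2 bits.

As for the last_cpupid field, CONFIG_NUMA_BALANCING is enabled on this platform, so its width should be LAST__PID_SHIFT (8) + LAST__CPU_SHIFT (NR_CPUS_BITS, i.e. log2(CONFIG_NR_CPUS) = log2(256) = 8), which makes 16 bits.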
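The width arithmetic above can be sketched in a few lines of C. This is only an illustration under the assumptions stated in the text, not the kernel's preprocessor logic: the helper names (`bits_for_count`, `zones_width`, `last_cpupid_width`) are made up for this sketch, and the CPU part assumes CONFIG_NR_CPUS is a power of two, as it is here (256).

```c
#include <assert.h>

/* LAST__PID_SHIFT from the text; the helper names below are hypothetical. */
#define LAST_PID_BITS 8u

/* Smallest number of bits able to index n distinct items (1 <= n <= 2^31). */
static unsigned int bits_for_count(unsigned int n)
{
    unsigned int bits = 0;

    assert(n >= 1 && n <= 0x80000000u);
    while ((1u << bits) < n)
        bits++;
    return bits;
}

/* Same result as the ZONES_SHIFT ladder: 1 -> 0, 2 -> 1, 3..4 -> 2, 5..8 -> 3. */
static unsigned int zones_width(unsigned int max_nr_zones)
{
    assert(max_nr_zones <= 8);
    return bits_for_count(max_nr_zones);
}

/* 0 bits unless NUMA balancing is on; then pid bits plus cpu bits. */
static unsigned int last_cpupid_width(int numa_balancing, unsigned int nr_cpus)
{
    return numa_balancing ? LAST_PID_BITS + bits_for_count(nr_cpus) : 0u;
}
```

For the platform in this post, `zones_width(3)` gives 2 and `last_cpupid_width(1, 256)` gives 16, so together with the 6-bit node field the upper fields add up to 24 bits.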

Bit layout of flags on the current platform:

| NODE | ZONE | LAST_CPUPID ... | FLAGS |

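Here node is 6 bits wide and occupies bits 58-63, zone is 2 bits wide and occupies bits 56-57, and last_cpupid is 16 bits wide and occupies bits 40-55. In other words, the top 24 bits (bits 40-63) are in use, and so are the low 25 bits (bits 0-24).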
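The bit positions follow mechanically from the widths: start at bit 64 and subtract each field's width in turn. A minimal sketch, assuming a 64-bit flags word; `struct layout` and `make_layout` are hypothetical names used only for illustration:

```c
enum { FLAGS_BITS = 64 };   /* assumes a 64-bit unsigned long */

/* Lowest bit of each upper field inside flags. */
struct layout {
    unsigned int node_shift;
    unsigned int zone_shift;
    unsigned int cpupid_shift;
};

/* Pack section, node, zone, last_cpupid downward from the top bit. */
static struct layout make_layout(unsigned int section_w, unsigned int node_w,
                                 unsigned int zone_w, unsigned int cpupid_w)
{
    struct layout l;
    unsigned int off = FLAGS_BITS - section_w;

    off -= node_w;
    l.node_shift = off;
    off -= zone_w;
    l.zone_shift = off;
    off -= cpupid_w;
    l.cpupid_shift = off;
    return l;
}
```

With widths 0, 6, 2 and 16, `make_layout(0, 6, 2, 16)` yields shifts 58, 56 and 40, matching the ranges given above.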

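The flags value in the trace clearly has its low 32 bits all zero, so look at the high 32 bits. For struct page->flags = 0x01ffff0000000000 the high 32 bits are 0b00000001111111111111111100000000, which means bits 40-56 are set. The last_cpupid field, bits 40-55, is therefore all ones, i.e. 0xffff. That is the value it holds after being reset, as the code shows: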
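As a rough check of the all-ones reading, the field can be pulled out with a shift and a mask and compared against the all-ones pattern. The constants below are the ones worked out for this platform (16 bits at bit 40); the function names are invented for this sketch and are not kernel APIs.

```c
#include <stdint.h>

/* Platform-specific values derived in the text, not universal constants. */
#define LAST_CPUPID_WIDTH   16
#define LAST_CPUPID_PGSHIFT 40
#define LAST_CPUPID_MASK    ((UINT64_C(1) << LAST_CPUPID_WIDTH) - 1)

/* Extract the last_cpupid field from a raw flags value. */
static uint64_t last_cpupid_of(uint64_t flags)
{
    return (flags >> LAST_CPUPID_PGSHIFT) & LAST_CPUPID_MASK;
}

/* A field of all ones is what a reset leaves behind. */
static int cpupid_is_reset(uint64_t flags)
{
    return last_cpupid_of(flags) == LAST_CPUPID_MASK;
}
```

Both crash values give 0xffff here, so both pages carry a reset last_cpupid.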

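The zone field is bits 56-57. With only bit 56 set it reads as 0b01, which is ZONE_NORMAL. Why 0b01 and not 0b10? Because bit 56 is the low bit of the field, as this function shows: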
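To see why 0b01 means ZONE_NORMAL, it helps to write out the enum order for this platform's configuration. The sketch below assumes exactly the three zones named earlier, in that order; the shift and mask are this platform's values, and `zone_of` and `zone_name` are illustrative helpers rather than kernel functions.

```c
#include <stdint.h>

/* Assumed zone order for this config: only these three zones are built in. */
enum zone_type { ZONE_DMA, ZONE_NORMAL, ZONE_MOVABLE, MAX_NR_ZONES };

#define ZONES_PGSHIFT 56
#define ZONES_MASK    UINT64_C(0x3)   /* 2-bit field */

/* Shift the field down so that bit 56 becomes bit 0, then mask. */
static enum zone_type zone_of(uint64_t flags)
{
    return (enum zone_type)((flags >> ZONES_PGSHIFT) & ZONES_MASK);
}

static const char *zone_name(enum zone_type z)
{
    static const char *const names[] = {
        "ZONE_DMA", "ZONE_NORMAL", "ZONE_MOVABLE",
    };

    return z < MAX_NR_ZONES ? names[z] : "invalid";
}
```

For 0x01ffff0000000000 the field value is 1, and index 1 in this enum is ZONE_NORMAL.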

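The value 0x5ffff0000000000 decodes in much the same way. Compared with 0x01ffff0000000000 it has exactly one more bit set, bit 58, which makes the node field 1. Again a function confirms it: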
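The comparison of the two values can be reproduced with an XOR: if exactly one bit differs, its index tells us which field changed. A small sketch using this platform's node position (6 bits at bit 58); `node_of` and `single_bit_diff` are hypothetical helper names.

```c
#include <stdint.h>

/* Node field position on this platform: 6 bits starting at bit 58. */
#define NODES_PGSHIFT 58
#define NODES_MASK    UINT64_C(0x3f)

static unsigned int node_of(uint64_t flags)
{
    return (unsigned int)((flags >> NODES_PGSHIFT) & NODES_MASK);
}

/* Index of the only bit in which a and b differ, or -1 if they differ
 * in zero bits or in more than one bit. */
static int single_bit_diff(uint64_t a, uint64_t b)
{
    uint64_t x = a ^ b;
    int bit = 0;

    if (x == 0 || (x & (x - 1)) != 0)
        return -1;
    while ((x >> bit) != 1)
        bit++;
    return bit;
}
```

Here `single_bit_diff` on the two crash values returns 58, and `node_of` gives 0 for 0x01ffff0000000000 and 1 for 0x05ffff0000000000.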

That is all the information these two flags values carry. Because bits 0-24 were all zero, though, those 25 bits have not been discussed properly yet. Their meanings are well documented in the comments of the kernel's include/linux/page-flags.h. They are:

    PG_locked,
    PG_error,
    PG_referenced,
    PG_uptodate,
    PG_dirty,
    PG_lru,
    PG_active,
    PG_waiters, /* Page has waiters, check its waitqueue. Must be bit #7 and in the same byte as "PG_locked" */
    PG_slab,
    PG_owner_priv_1, /* Owner use. If pagecache, fs may use*/
    PG_arch_1,
    PG_reserved,
    PG_private, /* If pagecache, has fs-private data */
    PG_private_2, /* If pagecache, has fs aux data */
    PG_writeback, /* Page is under writeback */
    PG_head, /* A head page */
    PG_mappedtodisk, /* Has blocks allocated on-disk */
    PG_reclaim, /* To be reclaimed asap */
    PG_swapbacked, /* Page is backed by RAM/swap */
    PG_unevictable, /* Page is "unevictable"  */

Plus a few more that are controlled by config macros:

#ifdef CONFIG_MMU
    PG_mlocked, /* Page is vma mlocked */
#endif
#ifdef CONFIG_ARCH_USES_PG_UNCACHED
    PG_uncached, /* Page has been mapped as uncached */
#endif
#ifdef CONFIG_MEMORY_FAILURE
    PG_hwpoison, /* hardware poisoned page. Don't touch */
#endif
#if defined(CONFIG_IDLE_PAGE_TRACKING) && defined(CONFIG_64BIT)
    PG_young,
    PG_idle,
#endif
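Since an enum simply counts up from 0, each name's position in the list is its bit number. The sketch below turns a raw flags value into the names of the low bits that are set. It assumes all of the config-dependent flags are enabled, so that the 25 names map to bits 0-24 in exactly the order listed; on a kernel with some of those options off, the later bit numbers shift down. The table and `set_pageflags` are illustrative names, not kernel symbols.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit number == index, assuming every conditional flag is compiled in. */
static const char *const pageflag_names[] = {
    "PG_locked", "PG_error", "PG_referenced", "PG_uptodate", "PG_dirty",
    "PG_lru", "PG_active", "PG_waiters", "PG_slab", "PG_owner_priv_1",
    "PG_arch_1", "PG_reserved", "PG_private", "PG_private_2",
    "PG_writeback", "PG_head", "PG_mappedtodisk", "PG_reclaim",
    "PG_swapbacked", "PG_unevictable", "PG_mlocked", "PG_uncached",
    "PG_hwpoison", "PG_young", "PG_idle",
};

#define NR_PAGEFLAGS (sizeof(pageflag_names) / sizeof(pageflag_names[0]))

/* Store the names of the set flag bits in out (at most max entries)
 * and return how many were stored. */
static size_t set_pageflags(uint64_t flags, const char **out, size_t max)
{
    size_t n = 0;

    for (size_t bit = 0; bit < NR_PAGEFLAGS && n < max; bit++)
        if (flags & (UINT64_C(1) << bit))
            out[n++] = pageflag_names[bit];
    return n;
}
```

For both crash values this returns 0, since none of the low 25 bits are set; a value such as 0x11 would report PG_locked and PG_dirty.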

Argh, having listed them all, how can I not explain them!!! Too tedious, so I'll just paste the kernel's own comments:

PG_reserved is set for special pages, which can never be swapped out. Some of them might not even exist...

The PG_private bitflag is set on pagecache pages if they contain filesystem specific data (which is normally at page->private). It can be used by private allocations for its own usage.

PG_writeback. During initiation of disk I/O, PG_locked is set. This bit is set before I/O and cleared when writeback _starts_ or when read _completes_. PG_writeback is set before writeback starts and cleared when it finishes.

PG_locked also pins a page in pagecache, and blocks truncation of the file while it is held.

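page_waitqueue(page) is a wait queue of all tasks waiting for the page to become unlocked. (PG_waiters itself is not the queue: it is only the flag bit saying that the page has waiters and its wait queue should be checked.)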

PG_uptodate tells whether the page's contents is valid.  When a read completes, the page becomes uptodate, unless a disk I/O error happened.

PG_referenced, PG_reclaim are used for page reclaim for anonymous and file-backed pagecache (see mm/vmscan.c).

PG_error is set to indicate that an I/O error occurred on this page.

PG_arch_1 is an architecture specific page state bit.  The generic code guarantees that this bit is cleared for a page when it first is entered into the page cache.

PG_highmem pages are not permanently mapped into the kernel virtual address space, they need to be kmapped separately for doing IO on the pages.  The struct page (these bits with information) are always mapped into kernel address space...

PG_hwpoison indicates that a page got corrupted in hardware and contains data with incorrect ECC bits that triggered a machine check. Accessing is not safe since it may cause another machine check. Don't touch!

That's it, done!
