The Kernel Samepage Merging Process

http://alouche.net/blog/2011/07/18/the-kernel-samepage-merging-process/

KSM, simply put, is a kernel service daemon (ksmd) that scans memory pages to find duplicate pages and merges them, thereby reducing memory usage (in other words, increasing memory density). The code used as examples in this post can be found in mm/ksm.c in the kernel source.

Before continuing, it is important to keep in mind that:

  • KSM uses red-black trees for both the stable and the unstable tree, so a search costs O(log n) per tree: the height of a red-black tree with n nodes can never exceed 2·log2(n+1).
  • KSM only scans anonymous pages; file-backed pages such as HugePages are not scanned and cannot be merged by KSM. Transparent Huge Pages (THP) are handled differently: in Red Hat Enterprise Linux 6.1, KSM will break a THP up into small pages if shareable 4K pages are found, and only if the system is running out of memory.
  • Merged pages are read-only as they are CoW protected.
  • Userspace applications register candidate regions for merging through the madvise() system call (with the MADV_MERGEABLE advice). We will not tackle the KSM API details in this post.
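  • Because of the CoW protection, a write to a merged page by an application raises a page fault, and the kernel's copy-on-write handling gives the writing application its own private copy of the page. KSM relies on the same mechanism when it needs to undo a merge itself: break_cow() calls break_ksm(), which triggers a write fault on the address so that the process ends up with a private copy again.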

static void break_cow(struct rmap_item *rmap_item)
{
        struct mm_struct *mm = rmap_item->mm;
        unsigned long addr = rmap_item->address;
        struct vm_area_struct *vma;

        /*
         * It is not an accident that whenever we want to break COW
         * to undo, we also need to drop a reference to the anon_vma.
         */
        put_anon_vma(rmap_item->anon_vma);

        down_read(&mm->mmap_sem);
        if (ksm_test_exit(mm))
                goto out;
        vma = find_vma(mm, addr);
        if (!vma || vma->vm_start > addr)
                goto out;
        if (!(vma->vm_flags & VM_MERGEABLE) || !vma->anon_vma)
                goto out;
        break_ksm(vma, addr);
out:
        up_read(&mm->mmap_sem);
}
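
For reference, here is a minimal user-space sketch of the registration step mentioned above, assuming a Linux system with glibc. The helper names map_mergeable() and unmap_mergeable() are hypothetical. The sketch maps a few anonymous pages, fills them with identical content and marks the range with madvise(MADV_MERGEABLE). Whether ksmd actually merges anything depends on KSM being enabled and running, so the sketch only reports the madvise() result.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map `npages` private anonymous pages, fill them
 * with the same byte and register them with KSM. Returns the mapping,
 * or NULL if mmap() fails. *madv_err is set to 0 if madvise() succeeded,
 * or to the errno it reported (e.g. EINVAL on a kernel built without
 * CONFIG_KSM). */
void *map_mergeable(size_t npages, int *madv_err)
{
    size_t len = npages * (size_t)sysconf(_SC_PAGESIZE);
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    /* Identical contents make every page a potential merge candidate. */
    memset(buf, 0x5a, len);

    /* MADV_MERGEABLE only registers the range; ksmd merges later, and
     * only while KSM is running (/sys/kernel/mm/ksm/run set to 1). */
    *madv_err = madvise(buf, len, MADV_MERGEABLE) == 0 ? 0 : errno;
    return buf;
}

/* Hypothetical helper: MADV_UNMERGEABLE breaks any merged pages back
 * into private copies; then the mapping is released. */
int unmap_mergeable(void *buf, size_t npages)
{
    size_t len = npages * (size_t)sysconf(_SC_PAGESIZE);
    madvise(buf, len, MADV_UNMERGEABLE);
    return munmap(buf, len);
}
```

Merging happens asynchronously, so any effect would only show up later, for example in the counters under /sys/kernel/mm/ksm/ such as pages_sharing.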

With this in mind, here is a step-by-step summary of how KSM works.

KSM scans pages and decides whether each one could be merged; these pages are referred to as “candidate” pages. In short, a candidate page that does not exist in the stable tree is added as a node to the unstable tree, but we will get to this later in the post. To determine whether a page has changed, KSM relies on a 32-bit checksum, which is stored in the page's rmap_item (the oldchecksum field) and compared on the next scan.

static u32 calc_checksum(struct page *page)
{
         u32 checksum;
         void *addr = kmap_atomic(page, KM_USER0);
         checksum = jhash2(addr, PAGE_SIZE / 4, 17);
         kunmap_atomic(addr, KM_USER0);
         return checksum;
}

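In other words, KSM finds page X, computes its checksum and records it; on the next scan, if the checksum of page X has not changed, the page is considered a candidate. A page whose checksum keeps changing is being written to frequently, so it is not worth inserting into the unstable tree.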
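
The checksum filter can be sketched in user space as follows. This is a simplified illustration, not the kernel code: struct page_record is a hypothetical stand-in for the rmap_item, the page size is assumed to be 4 KiB, and 32-bit FNV-1a is swapped in for the kernel's jhash2().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096   /* assumption: 4 KiB pages */

/* Hypothetical stand-in for the kernel's rmap_item: only the fields
 * this sketch needs. */
struct page_record {
    bool     has_checksum;
    uint32_t oldchecksum;
};

/* 32-bit FNV-1a over the whole page (swapped in for jhash2()). */
uint32_t page_checksum(const unsigned char *page)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++) {
        h ^= page[i];
        h *= 16777619u;
    }
    return h;
}

/* Returns true if the checksum is unchanged since the previous scan,
 * i.e. the page is a candidate. Otherwise records the new checksum and
 * returns false, so the page is skipped until the next scan. */
bool is_candidate(struct page_record *rec, const unsigned char *page)
{
    uint32_t sum = page_checksum(page);
    if (!rec->has_checksum || rec->oldchecksum != sum) {
        rec->oldchecksum = sum;
        rec->has_checksum = true;
        return false;
    }
    return true;
}
```

The first scan of a page only records its checksum; the page becomes a candidate on a later scan, and only if it has not been written to in between.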

For each scanned page, KSM first searches the stable tree, which contains the already-merged pages, comparing page contents with memcmp_pages().

static int memcmp_pages(struct page *page1, struct page *page2)
{
        char *addr1, *addr2;
        int ret;

        addr1 = kmap_atomic(page1, KM_USER0);
        addr2 = kmap_atomic(page2, KM_USER1);
        ret = memcmp(addr1, addr2, PAGE_SIZE);
        kunmap_atomic(addr2, KM_USER1);
        kunmap_atomic(addr1, KM_USER0);
        return ret;
}

The comparison result steers the walk down the tree as follows:

ret = memcmp_pages(page, tree_page);

if (ret < 0) {
        put_page(tree_page);
        node = node->rb_left;
} else if (ret > 0) {
        put_page(tree_page);
        node = node->rb_right;
} else
        return tree_page;

Understanding the following requires a basic understanding of how binary search trees work in general, and red-black trees in particular.

The stable tree is walked to the left if the candidate page compares lower than the page in the stable tree (ret < 0), to the right if it compares higher (ret > 0), and if both pages are identical, the candidate page is simply merged into the stable page and its own copy is freed.

The full stable tree search function, stable_tree_search(), can be found at http://lxr.free-electrons.com/source/mm/ksm.c#L985
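
The walk can be reproduced in user space with an ordinary binary search tree ordered by page contents. In this sketch the tree is assumed to be unbalanced (the kernel uses a red-black tree, which keeps the height logarithmic), the page size is assumed to be 4 KiB, and struct page_node, cmp_pages(), tree_search() and tree_insert() are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096   /* assumption: 4 KiB pages */

/* Hypothetical node: one page of data plus child links. */
struct page_node {
    const unsigned char *data;
    struct page_node *left, *right;
};

/* Same ordering as memcmp_pages(): byte-wise memcmp over the page. */
int cmp_pages(const unsigned char *a, const unsigned char *b)
{
    return memcmp(a, b, SKETCH_PAGE_SIZE);
}

/* Walk left on < 0, right on > 0, stop on an identical page. */
struct page_node *tree_search(struct page_node *node,
                              const unsigned char *page)
{
    while (node) {
        int ret = cmp_pages(page, node->data);
        if (ret < 0)
            node = node->left;
        else if (ret > 0)
            node = node->right;
        else
            return node;    /* identical: the candidate would merge here */
    }
    return NULL;            /* no identical page in the tree */
}

/* Link `n` where the search would end (no rebalancing). */
void tree_insert(struct page_node **root, struct page_node *n)
{
    while (*root)
        root = cmp_pages(n->data, (*root)->data) < 0 ? &(*root)->left
                                                     : &(*root)->right;
    n->left = n->right = NULL;
    *root = n;
}
```

Like memcmp_pages(), the ordering has no meaning beyond being a consistent total order over page contents, which is all a search tree needs.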

Now, if the candidate page was not found in the stable tree, its checksum is recomputed to determine whether the data has changed since the last scan. If it has changed, the page is skipped for now; if not, the search continues in the unstable tree, in the same way as the search in the stable tree. The unstable_tree_search_insert() function can be seen at http://lxr.free-electrons.com/source/mm/ksm.c#L1078.

While searching the unstable tree, KSM creates a new node in this binary tree if the candidate page is unique:

rmap_item->address |= UNSTABLE_FLAG;
rmap_item->address |= (ksm_scan.seqnr & SEQNR_MASK);
rb_link_node(&rmap_item->node, parent, new);
rb_insert_color(&rmap_item->node, &root_unstable_tree);

If the page is not unique, that is, if the unstable tree already contains an identical page, the two pages are merged into a single write-protected KSM page, which is removed from the unstable tree and inserted into the stable tree.
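
The search-or-insert behaviour can be sketched with a single walk over a plain binary search tree. This is an illustration only: struct unstable_node and unstable_search_insert() are hypothetical, the page size is assumed to be 4 KiB, and no rebalancing is done (the kernel follows rb_link_node() with rb_insert_color()).

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096   /* assumption: 4 KiB pages */

/* Hypothetical node; an unbalanced BST stands in for the red-black
 * unstable tree. */
struct unstable_node {
    const unsigned char *data;
    struct unstable_node *left, *right;
};

/* One walk does both jobs: returns the existing node if an identical
 * page is found (the caller would then merge the two pages and move the
 * result to the stable tree); otherwise links `item` into the empty
 * slot the walk ended on and returns NULL. */
struct unstable_node *unstable_search_insert(struct unstable_node **root,
                                             struct unstable_node *item)
{
    struct unstable_node **link = root;

    while (*link) {
        int ret = memcmp(item->data, (*link)->data, SKETCH_PAGE_SIZE);
        if (ret < 0)
            link = &(*link)->left;
        else if (ret > 0)
            link = &(*link)->right;
        else
            return *link;       /* identical page already in the tree */
    }
    /* Unique page: attach it as a leaf, like rb_link_node() with `new`. */
    item->left = item->right = NULL;
    *link = item;
    return NULL;
}
```

Keeping a pointer to the link slot rather than to the node means that when the walk falls off the tree, the insertion point is already known, which is the same reason the kernel tracks `parent` and `new` during the search.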

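Once a full KSM scan is done, the unstable tree is discarded and rebuilt from scratch on the next pass. Its pages are not write-protected, so their contents, and therefore their positions in the tree, may no longer be valid.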
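
To tie the steps together, here is a toy simulation of repeated scan passes. It is a sketch under heavy simplifying assumptions: pages are tiny 64-byte buffers, linear searches over index arrays stand in for the stable and unstable red-black trees, FNV-1a replaces jhash2(), merged pages are assumed never to be written again, and every name is hypothetical. It follows the order described in this post: stable tree lookup, checksum check, then unstable tree search or insert, with the unstable set emptied at the start of each pass.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIM_PAGE  64    /* assumption: tiny "pages" keep the sketch short */
#define MAX_PAGES 16    /* assumption: at most this many pages are scanned */

struct sim_page {
    unsigned char data[SIM_PAGE];
    bool has_checksum;
    uint32_t oldchecksum;
    int merged_into;            /* index of the shared stable page, or -1 */
};

struct sim_state {
    int stable[MAX_PAGES];      /* persists across passes */
    int nstable;
    int unstable[MAX_PAGES];    /* emptied at the start of every pass */
    int nunstable;
};

/* 32-bit FNV-1a, standing in for the kernel's jhash2(). */
static uint32_t sim_checksum(const unsigned char *p)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < SIM_PAGE; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Linear lookup in place of a red-black tree search. */
static int find_identical(const struct sim_page *pages, const int *set,
                          int n, const unsigned char *data)
{
    for (int i = 0; i < n; i++)
        if (memcmp(pages[set[i]].data, data, SIM_PAGE) == 0)
            return set[i];
    return -1;
}

/* One full scan pass; assumes npages <= MAX_PAGES. */
void scan_pass(struct sim_page *pages, int npages, struct sim_state *st)
{
    st->nunstable = 0;                  /* unstable tree rebuilt from scratch */

    for (int i = 0; i < npages; i++) {
        struct sim_page *pg = &pages[i];
        if (pg->merged_into >= 0)
            continue;                   /* already shares a KSM page */

        /* 1. Stable tree: merge straight into an existing KSM page. */
        int k = find_identical(pages, st->stable, st->nstable, pg->data);
        if (k >= 0) {
            pg->merged_into = k;
            continue;
        }

        /* 2. Checksum: skip pages that changed since the last pass. */
        uint32_t sum = sim_checksum(pg->data);
        if (!pg->has_checksum || pg->oldchecksum != sum) {
            pg->oldchecksum = sum;
            pg->has_checksum = true;
            continue;
        }

        /* 3. Unstable tree: insert if unique... */
        int u = find_identical(pages, st->unstable, st->nunstable, pg->data);
        if (u < 0) {
            st->unstable[st->nunstable++] = i;
            continue;
        }
        /* ...otherwise merge both, promote u to the stable set and
         * drop it from the unstable set. */
        pages[u].merged_into = u;
        pg->merged_into = u;
        st->stable[st->nstable++] = u;
        for (int j = 0; j < st->nunstable; j++) {
            if (st->unstable[j] == u) {
                st->unstable[j] = st->unstable[--st->nunstable];
                break;
            }
        }
    }
}
```

With three identical pages and one unique page, the first pass only records checksums. On the second pass the first two identical pages are merged and promoted to the stable set, the third is found directly in the stable set, and the unique page waits in the unstable set, which is emptied again at the start of the next pass.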

I hope that was informative.

Cheers,
