Race between uselib and msync
One of the interesting features of do_brk is that it can also unmap memory: if it is called on a region that already possesses a mapping, it first removes that mapping. Therefore, the non-semaphore-protected do_brk in the uselib system call can be used to prematurely unmap an arbitrary region of process memory. This comes in handy together with the msync system call.
The race window is in msync_interval:
static int msync_interval(struct vm_area_struct * vma,
	unsigned long start, unsigned long end, int flags)
{
	int ret = 0;
	struct file * file = vma->vm_file;

	if ((flags & MS_INVALIDATE) && (vma->vm_flags & VM_LOCKED))
		return -EBUSY;

	if (file && (vma->vm_flags & VM_SHARED)) {
		ret = filemap_sync(vma, start, end-start, flags);

		if (!ret && (flags & (MS_SYNC|MS_ASYNC))) {
			struct inode * inode = file->f_dentry->d_inode;

			down(&inode->i_sem);
			ret = filemap_fdatasync(inode->i_mapping);
Meanwhile, in do_brk (reached through uselib), any existing mapping in the target range is torn down:

munmap_back:
	vma = find_vma_prepare(mm, addr, &prev, &rb_link, &rb_parent);
	if (vma && vma->vm_start < addr + len) {
		if (do_munmap(mm, addr, len))
			return -ENOMEM;
Within do_munmap, fput is called on the mapping's backing file; when the file's reference count drops to zero, fput nulls out its f_dentry pointer.
Our evil library will have one page of contents and one page of bss. Call the address of the contents contents_addr and the address of the bss bss_addr. The interleaving is as follows:
- t1: mmap 1 page at bss_addr that is backed by some file fd1
- t1: uselib->mmap contents at contents_addr
- t2: msync 1 page at bss_addr, SLEEP right before grabbing dentry pointer
- t1: uselib->do_brk->do_munmap at bss_addr
- t1: SLEEP sometime before vma_link
- t2: msync comes out of sleep now and f_dentry has been nulled out
--- /usr/src/kernel-source-2.4.26/mm/mmap.c 2004-02-18 08:36:32.000000000 -0500
+++ mmap.c 2015-11-28 08:57:50.232849728 -0500
@@ -19,6 +19,10 @@
#include <asm/uaccess.h>
#include <asm/pgalloc.h>
+/* sleep patch */
+#include <asm-i386/param.h>
+#include <linux/sched.h>
+
/*
* WARNING: the debugging will use recursive algorithms so never enable this
* unless you know what you are doing.
@@ -965,6 +969,10 @@
npp = (prev ? &prev->vm_next : &mm->mmap);
free = NULL;
+ if (addr == 0x60000000 && len == 4096) {
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ schedule_timeout(1 * HZ);
+ }
spin_lock(&mm->page_table_lock);
for ( ; mpnt && mpnt->vm_start < addr+len; mpnt = *npp) {
*npp = mpnt->vm_next;
--- /usr/src/kernel-source-2.4.26/mm/filemap.c 2004-08-24 12:46:28.000000000 -0400
+++ filemap.c 2015-11-28 08:57:57.566734808 -0500
@@ -30,6 +30,10 @@
#include <linux/highmem.h>
+/* sleep patch */
+#include <asm-i386/param.h>
+#include <linux/sched.h>
+
/*
* Shared mappings implemented 30.11.1994. It's not fully working yet,
* though.
@@ -2353,6 +2357,8 @@
return -EBUSY;
if (file && (vma->vm_flags & VM_SHARED)) {
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ schedule_timeout(2 * HZ);
ret = filemap_sync(vma, start, end-start, flags);
if (!ret && (flags & (MS_SYNC|MS_ASYNC))) {
The Linux kernel uses a data structure to keep track of each physical page frame. This data structure is called a 'struct page'. These 'struct page' objects form an array named 'mem_map[]', whose address can be found in the symbol table:

/boot $ grep 'mem_map' System.map-2.4.26
After writepage() is called in filemap_fdatasync, page_cache_release() performs a VALID_PAGE check:

	if (!VALID_PAGE(page))
		BUG();

VALID_PAGE is a macro defined like this:

	((page - mem_map) < max_mapnr)

It is impossible for our fake page to pass this check, so the kernel will hit BUG() and panic.