The sys_call_table symbol has not been exported since Linux 2.6.x, so we have to locate it with a hack.
http://www.elliotbradbury.com/linux-syscall-hooking-interrupt-descriptor-table/
Ever since the system call table stopped being exported in the Linux kernel, it has been somewhat of a pain to hook system calls in the traditional manner, that is, by directly modifying the pointer entries of sys_call_table. Let me first say that this article does not contain some new, magical method of doing so. The methods in this article have been used before to intercept syscalls, but in my search for answers to this dilemma, I found that most sources were outdated or incomplete.
So here I present to you a complete kernel module which intercepts the uname syscall by modifying the sys_call_table. Topics covered include:
- An introduction to the Interrupt Descriptor Table and software interrupts as a method of calling the kernel
- Locating the Interrupt Descriptor Table (IDT) using inline x86 assembly from C
- Locating the Linux system call interrupt handling routine via the IDT
- Scanning the INT 0x80 (system call) handler to locate the address of the Linux system call table
We'll also touch on memory page protection when we get to the point of actually modifying the sys_call_table.
First let's talk briefly about interrupts. Interrupts are a way of getting the CPU's attention. They come from either hardware or software. In this case, we're just interested in software interrupts. Programs may raise a number of different software interrupts using the x86 INT instruction. The INT instruction takes exactly one operand, which is the number of the interrupt that you would like to raise. The two most common software interrupts are INT 3 and INT 0x80. The former is commonly used for debugging programs (and is of little interest to us) and the latter is a special mechanism available to user-space programs, providing a "gateway" to the kernel. In practice, it allows users to make system calls like write() or fork() directly, as opposed to using wrapper functions in libc and elsewhere. Below is a simple "Hello World" example of using INT 0x80 to make a system call.
System Calls from Assembly
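section .text
global main

HW db 'Hello World', 0xa    ; 11 characters plus a newline = 12 bytes

main:
    mov edx, 12             ; arg 3: string length
    mov ecx, HW             ; arg 2: string pointer
    mov ebx, 1              ; arg 1: file descriptor 1 (stdout)
    mov eax, 4              ; syscall number 4 = write
    int 0x80
    mov eax, 1              ; syscall number 1 = exit
    xor ebx, ebx            ; exit status 0
    int 0x80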
In this example, the write() system call is being used. The string length, string pointer, output stream, and system call number are stored in edx, ecx, ebx, and eax respectively. Finally, INT 0x80 is executed and the kernel runs write(). The real question is, what happens after INT 0x80 is executed and before the next instruction runs?
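As an aside, the same request can be made from C without hand-written assembly. The following is a minimal sketch, assuming a Linux system whose libc provides the syscall() wrapper; the helper name write_by_number is made up for illustration. Note that syscall() uses whatever entry mechanism the platform prefers, so it is not necessarily INT 0x80 underneath.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* expose syscall() in <unistd.h> */
#endif
#include <stddef.h>
#include <sys/syscall.h>  /* SYS_write */
#include <unistd.h>       /* syscall() */

/* Hypothetical helper: invoke write() by its syscall number on stdout (fd 1).
 * Returns the number of bytes written, or -1 on error. */
long write_by_number(const char *buf, size_t len)
{
    return syscall(SYS_write, 1, buf, len);
}
```

Calling write_by_number("Hello World\n", 12) plays the same role as the four mov instructions plus the first int 0x80 in the assembly version.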
Well, you may have been wondering what significance the argument to the INT instruction has. Let me answer this question by introducing something called the Interrupt Descriptor Table (IDT), also known as the Interrupt {Vector, Routine, Handler, etc.} Table. The IDT is nothing more than a special array of entries that describe various event handling routines. When I say events, I mean interrupts (software and hardware), exceptions, and traps. The argument passed to the INT instruction is nothing more than an index into this array. So INT 0x80 looks up the interrupt gate (table entry) at index 0x80 (128) in the IDT and transfers control to the handler it describes.
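To make the "array of handlers" idea concrete, here is a toy user-space model. It is only a sketch: real gates are packed descriptors rather than plain function pointers, and every name below (toy_idt, toy_int, and so on) is invented for this illustration.

```c
#include <stddef.h>

#define IDT_ENTRIES 256          /* x86 has 256 interrupt vectors */

typedef int (*handler_t)(void);

/* Toy stand-in for the IDT: one handler pointer per vector. */
static handler_t toy_idt[IDT_ENTRIES];

static int breakpoint_handler(void)  { return 3; }
static int system_call_handler(void) { return 0x80; }

/* Mimics boot-time setup: fill in the gates we care about. */
void toy_idt_init(void)
{
    toy_idt[3]    = breakpoint_handler;
    toy_idt[0x80] = system_call_handler;
}

/* Mimics INT n: use the operand as an index and run that entry.
 * Returns -1 for an empty gate (a real CPU would raise a fault instead). */
int toy_int(unsigned char vector)
{
    handler_t h = toy_idt[vector];
    return h ? h() : -1;
}
```

After toy_idt_init(), toy_int(0x80) runs system_call_handler, while a vector that was never populated falls through to the -1 case.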
So what exactly is the 0x80th interrupt gate?
Every time Linux boots, it populates the IDT. Interrupt gates are created for various devices and exceptions that may occur. Things like page faults and division by zero are handled through the IDT. The INT 0x80 handler is just like any other handler, except that it is called from user space with the sole purpose of entering kernel space and using the kernel API.
Interrupt 0x80 Handler
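Let's take a look at the syscall dispatch code, straight from arch/x86/kernel/entry_32.S in the kernel source tree. The excerpt carries the sysenter_do_call label, which belongs to the SYSENTER entry path; the INT 0x80 entry point (system_call) performs the same bounds check and indirect call.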
sysenter_do_call:
    cmpl $(nr_syscalls), %eax
    jae syscall_badsys
    call *sys_call_table(,%eax,4)
...
Recall that when making a system call, eax stores the syscall number. The first two lines of sysenter_do_call check that eax contains a number within the bounds of the sys_call_table, and if it is valid, the third line executes the syscall. Note that each "entry" in the sys_call_table is a pointer, which is 4 bytes on 32-bit x86, so eax is multiplied by 4 to obtain the correct offset. So when the instruction INT 0x80 is executed, the handler for interrupt 0x80 is triggered, which in Linux is the system call handler. Inside the system call handler, the value passed in eax is used as an index into the sys_call_table. As mentioned before, the sys_call_table is merely a list of pointers to the available system calls (functions).
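The bounds check followed by an indexed indirect call translates almost line for line into C. This is a user-space sketch with invented names (toy_sys_call_table, toy_dispatch) and two dummy "syscalls"; it is meant to show the shape of the dispatch, not the kernel's actual code.

```c
#include <stddef.h>

typedef long (*toy_syscall_t)(long arg);

static long toy_getpid(long arg) { (void)arg; return 1234; }
static long toy_double(long arg) { return 2 * arg; }

/* Toy sys_call_table: slot i holds the handler for syscall number i. */
static toy_syscall_t toy_sys_call_table[] = { toy_getpid, toy_double };

#define TOY_NR_SYSCALLS \
    (sizeof(toy_sys_call_table) / sizeof(toy_sys_call_table[0]))

#define TOY_ENOSYS 38   /* same numeric value as ENOSYS on Linux */

/* Mirrors the three instructions: compare, bail out, indirect call. */
long toy_dispatch(unsigned long nr, long arg)
{
    if (nr >= TOY_NR_SYSCALLS)      /* cmpl + jae syscall_badsys */
        return -TOY_ENOSYS;
    /* call *sys_call_table(,%eax,4): C scales the index by the pointer
     * size for us, which is the "multiply by 4" on 32-bit x86. */
    return toy_sys_call_table[nr](arg);
}
```

Here toy_dispatch(1, 21) reaches toy_double, and an out-of-range number such as 99 is rejected before the table is ever touched.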
What if we could change those pointers?
Modifying the Linux Syscall Table
The code below is a Linux kernel module that locates the system call table and hooks the uname syscall upon initialization. The original syscall table entry is restored when the module is removed.
This code is specific to 32-bit x86 and should be run in a VM to avoid any (un)happy accidents.
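Before looking at the kernel version, the save, overwrite and restore cycle can be modelled in plain user-space C. The sketch below uses a one-slot table and invented names (hook_table, install_hook, remove_hook, struct toy_utsname); it only illustrates the pointer swap and does not touch any real system call.

```c
#include <stddef.h>
#include <string.h>

struct toy_utsname { char sysname[65]; };

typedef long (*toy_uname_t)(struct toy_utsname *);

static long real_uname(struct toy_utsname *u)
{
    strcpy(u->sysname, "Linux");
    return 0;
}

/* One-slot "syscall table" for brevity. */
static toy_uname_t hook_table[1] = { real_uname };
static toy_uname_t saved_uname;

static long fake_uname(struct toy_utsname *u)
{
    long ret = saved_uname(u);          /* let the original fill the struct */
    if (ret == 0)
        strcpy(u->sysname, "hooked");   /* then tamper with one field */
    return ret;
}

/* Module load: remember the old pointer, then overwrite the slot. */
void install_hook(void)
{
    saved_uname = hook_table[0];
    hook_table[0] = fake_uname;
}

/* Module unload: put the original pointer back. */
void remove_hook(void)
{
    if (saved_uname) {
        hook_table[0] = saved_uname;
        saved_uname = NULL;
    }
}

long call_uname(struct toy_utsname *u) { return hook_table[0](u); }
```

Callers of call_uname() never change; only the pointer stored in the table does, which is why the hook is invisible to them.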
To hook a syscall, the code does the following:
- Locates the Interrupt Descriptor Table using the sidt instruction.
- Locates the syscall handler routine through the IDT.
- Locates the system call table (sys_call_table) by scanning for a known code pattern in memory in the syscall handler.
- Saves the state of the sys_call_table.
- Disables memory protection on the sys_call_table.
- Overwrites entries in the sys_call_table with pointers to the hooked functions.
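The two lookup steps (reading the handler address out of a gate, then scanning the handler's bytes for the table address) can be tried in user space first. What follows is a sketch rather than the module's code: struct toy_gate, gate_offset32 and find_call_table_operand are names invented here, and it assumes the 32-bit gate layout in which the handler offset is split across the low half of the first word and the high half of the second. The bytes ff 14 85 are the encoding of call *disp32(,%eax,4), so the four bytes that follow them are the table address.

```c
#include <stddef.h>
#include <stdint.h>

/* The two 32-bit words of a 32-bit x86 gate descriptor. */
struct toy_gate { uint32_t a, b; };

/* Handler offset: bits 0-15 live in a, bits 16-31 live in b. */
uint32_t gate_offset32(struct toy_gate g)
{
    return (g.a & 0x0000ffffu) | (g.b & 0xffff0000u);
}

/* Scan code[0..len) for ff 14 85 and return the little-endian 32-bit
 * operand that follows it. Returns 0 if the pattern is absent or the
 * operand would run past the end of the buffer. */
uint32_t find_call_table_operand(const unsigned char *code, size_t len)
{
    size_t i;

    for (i = 0; i + 7 <= len; i++) {
        if (code[i] == 0xff && code[i + 1] == 0x14 && code[i + 2] == 0x85) {
            const unsigned char *p = code + i + 3;
            return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                   ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
        }
    }
    return 0;
}
```

With made-up input, a gate whose words are 0x00603a40 and 0xc100ef00 decodes to the handler address 0xc1003a40, and a buffer ending in ff 14 85 60 82 5a c1 yields 0xc15a8260. Neither value comes from a real kernel.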
Kernel Module Code that Hooks uname
/*
 * This kernel module locates the sys_call_table by scanning
 * the system_call interrupt handler (int 0x80)
 *
 * Author: Elliot Bradbury 2010
 */
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/string.h>
#include <linux/uaccess.h>
#include <linux/unistd.h>
#include <linux/utsname.h>
#include <asm/desc.h>
#include <asm/pgtable.h>

MODULE_LICENSE("GPL");

// see desc_defs.h and desc.h in arch/x86/include/asm/
// and arch/x86/kernel/syscall_64.c
typedef void (*sys_call_ptr_t)(void);
typedef asmlinkage long (*orig_uname_t)(struct new_utsname *);

// dump length bytes starting at addr, 16 bytes per line
void hexdump(unsigned char *addr, unsigned int length) {
    unsigned int i;

    for(i = 0; i < length; i++) {
        if(!((i+1) % 16)) {
            printk("%02x\n", *(addr + i));
        } else {
            printk("%02x ", *(addr + i));
        }
    }
    // terminate a partial last line
    if(length % 16) {
        printk("\n");
    }
}

// fptr to original uname syscall
orig_uname_t orig_uname = NULL;

// test message
char *msg = "All ur base r belong to us";

asmlinkage long hooked_uname(struct new_utsname *name) {
    // let the real syscall fill in the struct first
    long ret = orig_uname(name);

    if(ret != 0) {
        return ret;
    }
    // name points into user space, so overwrite sysname with copy_to_user
    // (message plus its terminating NUL fits well within sysname)
    if(copy_to_user(name->sysname, msg, strlen(msg) + 1)) {
        return -EFAULT;
    }
    return 0;
}

// and finally, sys_call_table pointer
sys_call_ptr_t *_sys_call_table = NULL;

// memory protection shenanigans
unsigned int level;
pte_t *pte;

// initialize the module
int init_module(void) {
    // struct for IDT register contents
    struct desc_ptr idtr;
    // pointer to IDT table of desc structs
    gate_desc *idt_table;
    // gate struct for int 0x80
    gate_desc *system_call_gate;
    // system_call (int 0x80) offset and pointer
    unsigned int _system_call_off;
    unsigned char *_system_call_ptr;
    // temp variables for scan
    unsigned int i;
    unsigned char *off;

    printk("+ Loading module\n");

    // store IDT register contents directly into memory
    asm ("sidt %0" : "=m" (idtr));
    // print out location
    printk("+ IDT is at %08lx\n", idtr.address);

    // set table pointer
    idt_table = (gate_desc *) idtr.address;
    // set gate_desc for int 0x80
    system_call_gate = &idt_table[0x80];

    // get int 0x80 handler offset: low 16 bits in a, high 16 bits in b
    _system_call_off = (system_call_gate->a & 0xffff) | (system_call_gate->b & 0xffff0000);
    _system_call_ptr = (unsigned char *) _system_call_off;
    // print out int 0x80 handler
    printk("+ system_call is at %08x\n", _system_call_off);

    // print out the first 128 bytes of system_call() ...notice pattern below
    hexdump(_system_call_ptr, 128);

    // scan for known pattern in system_call (int 0x80) handler
    // pattern (ff 14 85 = call *disp32(,%eax,4)) is just before sys_call_table address
    for(i = 0; i < 128; i++) {
        off = _system_call_ptr + i;
        if(*(off) == 0xff && *(off+1) == 0x14 && *(off+2) == 0x85) {
            _sys_call_table = *(sys_call_ptr_t **)(off+3);
            break;
        }
    }

    // bail out if the scan came up empty; a nonzero return makes insmod fail
    if(_sys_call_table == NULL) {
        printk("- unable to locate sys_call_table\n");
        return -ENOENT;
    }
    // print out sys_call_table address
    printk("+ found sys_call_table at %p!\n", _sys_call_table);

    // now we can hook syscalls ...such as uname
    // unprotect sys_call_table memory page
    pte = lookup_address((unsigned long) _sys_call_table, &level);
    if(pte == NULL) {
        printk("- unable to find the page table entry for sys_call_table\n");
        return -EFAULT;
    }
    // change PTE to allow writing
    set_pte_atomic(pte, pte_mkwrite(*pte));
    printk("+ unprotected kernel memory page containing sys_call_table\n");

    // save the old gate (fptr)
    orig_uname = (orig_uname_t) _sys_call_table[__NR_uname];
    // now overwrite the __NR_uname entry with address to our uname
    _sys_call_table[__NR_uname] = (sys_call_ptr_t) hooked_uname;
    printk("+ uname hooked!\n");
    return 0;
}

void cleanup_module(void) {
    if(orig_uname != NULL) {
        // restore sys_call_table to original state
        _sys_call_table[__NR_uname] = (sys_call_ptr_t) orig_uname;
        // reprotect page
        set_pte_atomic(pte, pte_clear_flags(*pte, _PAGE_RW));
    }
    printk("+ Unloading module\n");
}
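The module flips the writable bit in a page table entry before patching the table. User space has no access to page table entries, but mprotect() offers a rough analogue of the same unprotect, write, reprotect sequence. This is a sketch under that assumption: patch_readonly_slot is a hypothetical helper, and a single long in an anonymous mapping stands in for a table entry.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Patch a value that lives on a read-only page.
 * Returns 0 on success and stores the value read back in *seen_after;
 * returns -1 if any step fails. */
int patch_readonly_slot(long new_value, long *seen_after)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    long *slot;
    int rc = -1;

    if (pagesz <= 0)
        return -1;

    slot = mmap(NULL, (size_t)pagesz, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (slot == MAP_FAILED)
        return -1;

    slot[0] = 1;                                   /* the "original entry" */
    if (mprotect(slot, (size_t)pagesz, PROT_READ) != 0)
        goto out;                                  /* page is now read-only */

    /* unprotect, patch, reprotect */
    if (mprotect(slot, (size_t)pagesz, PROT_READ | PROT_WRITE) != 0)
        goto out;
    slot[0] = new_value;
    if (mprotect(slot, (size_t)pagesz, PROT_READ) != 0)
        goto out;

    *seen_after = slot[0];
    rc = 0;
out:
    munmap(slot, (size_t)pagesz);
    return rc;
}
```

Writing to slot[0] while the page is read-only would crash the process with SIGSEGV, which is the user-space counterpart of the fault the module avoids by making the page writable first.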
Building and Running the Code
Note: This module unprotects and overwrites portions of kernel memory space. I recommend testing it in a VM.
git clone https://github.com/ebradbury/linux-syscall-hooker.git
cd linux-syscall-hooker
make
insmod ./my_module.ko
uname
If it worked, the output of uname should be "All ur base r belong to us". If it didn't work, check dmesg for details.