# Hack The Virtual Memory: C strings & /proc (1/4)

Posted by Julien Barbier

https://blog.holbertonschool.com/hack-the-virtual-memory-c-strings-proc/

## Intro

### Hack The Virtual Memory, Chapter 0: Play with C strings & /proc

This is the first in a series of small articles / tutorials based around virtual memory. The goal is to learn some CS basics, but in a different and more practical way.

For this first piece, we’ll use /proc to find and modify variables (in this example, an ASCII string) contained inside the virtual memory of a running process, and learn some cool things along the way.

## Environment

All scripts and programs have been tested on the following system:

• Ubuntu 14.04 LTS
  • Linux ubuntu 4.4.0-31-generic #50~14.04.1-Ubuntu SMP Wed Jul 13 01:07:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
• gcc
  • gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
• Python 3
  • Python 3.4.3 (default, Nov 17 2016, 01:08:31) [GCC 4.8.4] on linux

## Prerequisites

• The basics of the C programming language
• Some Python
• The very basics of the Linux filesystem and the shell

## Virtual Memory

In computing, virtual memory is a memory management technique that is implemented using both hardware and software. It maps memory addresses used by a program, called virtual addresses, into physical addresses in computer memory. Main storage (as seen by a process or task) appears as a contiguous address space, or collection of contiguous segments. The operating system manages virtual address spaces and the assignment of real memory to virtual memory. Address translation hardware in the CPU, often referred to as a memory management unit or MMU, automatically translates virtual addresses to physical addresses. Software within the operating system may extend these capabilities to provide a virtual address space that can exceed the capacity of real memory and thus reference more memory than is physically present in the computer.

The primary benefits of virtual memory include freeing applications from having to manage a shared memory space, increased security due to memory isolation, and being able to conceptually use more memory than might be physically available, using the technique of paging.
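To make the idea of address translation more concrete, here is a minimal sketch of what the MMU does conceptually. It assumes 4 KiB pages and uses a plain dictionary as a toy page table; the names `translate` and `toy_page_table`, and the frame number `0x2a`, are made up for illustration and do not reflect how a real kernel stores its page tables.

```python
PAGE_SIZE = 4096  # 4 KiB pages (an assumption for this sketch)


def translate(virtual_addr, page_table):
    """Translate a virtual address using a toy page table.

    page_table maps virtual page numbers to physical frame numbers.
    A missing entry raises KeyError, standing in for a page fault.
    """
    page_number, offset = divmod(virtual_addr, PAGE_SIZE)
    frame_number = page_table[page_number]
    return frame_number * PAGE_SIZE + offset


# Hypothetical mapping: virtual page 0x10ff lives in physical frame 0x2a.
toy_page_table = {0x10ff: 0x2a}
print(hex(translate(0x10ff010, toy_page_table)))  # prints 0x2a010
```

Only the page number is translated; the offset inside the page (here `0x010`) is the same in the virtual and the physical address.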

In chapter 2, we’ll go into more details and do some fact checking on what lies inside the virtual memory and where. For now, here are some key points you should know before you read on:

• Each process has its own virtual memory
• The amount of virtual memory depends on your system’s architecture
• Each OS handles virtual memory differently, but for most modern operating systems, the virtual memory of a process looks like this:

In the high memory addresses you can find (this is a non-exhaustive list; there’s much more to be found, but that’s not today’s topic):

• The command line arguments and environment variables
• The stack, growing “downwards”. This may seem counter-intuitive, but this is the way the stack is implemented in virtual memory

In the low memory addresses you can find:

• Your executable (it’s a little more complicated than that, but this is enough to understand the rest of this article)
• The heap, growing “upwards”

The heap is a portion of memory that is dynamically allocated (i.e. containing memory allocated using malloc).

Also, keep in mind that virtual memory is not the same as RAM.
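As a rough illustration of how the size of the virtual address space depends on the architecture, the sketch below computes how many byte addresses a pointer of a given width can express. The helper name `address_space_bytes` is made up for this example. It only reflects the pointer width; in practice, 64-bit hardware and operating systems typically use fewer than 64 address bits.

```python
import struct


def address_space_bytes(pointer_bits):
    """Number of distinct byte addresses a pointer of this width can hold."""
    return 2 ** pointer_bits


# Width of a native pointer for the running Python interpreter, in bits.
native_bits = struct.calcsize("P") * 8

print("native pointer width: {} bits".format(native_bits))
print("32-bit address space: {} GiB".format(address_space_bytes(32) // 2**30))
print("64-bit address space: {} EiB".format(address_space_bytes(64) // 2**60))
```

The last two lines print 4 GiB and 16 EiB, far more than the RAM installed in most machines, which is one reason virtual memory and RAM should not be confused.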

## C program

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/**
 * main - uses strdup to create a new string, and prints the
 * address of the new duplicated string
 *
 * Return: EXIT_FAILURE if malloc failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    char *s;

    s = strdup("Holberton");
    if (s == NULL)
    {
        fprintf(stderr, "Can't allocate mem with malloc\n");
        return (EXIT_FAILURE);
    }
    printf("%p\n", (void *)s);
    return (EXIT_SUCCESS);
}


### strdup

Take a moment to think before going further. How do you think strdup creates a copy of the string “Holberton”? How can you confirm that?

.

.

.

strdup has to create a new string, so it first has to reserve space for it. The function strdup is probably using malloc. A quick look at its man page can confirm:

DESCRIPTION
The  strdup()  function returns a pointer to a new string which is a duplicate of the string s.
Memory for the new string is obtained with malloc(3), and can be freed with free(3).


Take a moment to think before going further. Based on what we said earlier about virtual memory, where do you think the duplicate string will be located? At a high or low memory address?

.

.

.

Probably in the lower addresses (in the heap). Let’s compile and run our small C program to test our hypothesis:

julien@holberton:~/holberton/w/hackthevm0$ gcc -Wall -Wextra -pedantic -Werror main.c -o holberton
julien@holberton:~/holberton/w/hackthevm0$ ./holberton
0x1822010

Our duplicated string is located at address 0x1822010. A second program, loop (loop.c in the Files section below), loops indefinitely, printing the string and its address on each iteration:

julien@holberton:~/holberton/w/hackthevm0$ ./loop
[0] Holberton (0xfbd010)
[1] Holberton (0xfbd010)
[2] Holberton (0xfbd010)
[3] Holberton (0xfbd010)
[4] Holberton (0xfbd010)
[5] Holberton (0xfbd010)
[6] Holberton (0xfbd010)
[7] Holberton (0xfbd010)
...

If you would like, pause the reading now and try to write a script or program that finds a string in the heap of a running process before reading further.

.

.

.

### Looking at /proc

Let’s run our loop program.

julien@holberton:~/holberton/w/hackthevm0$ ./loop
[0] Holberton (0x10ff010)
[1] Holberton (0x10ff010)
[2] Holberton (0x10ff010)
[3] Holberton (0x10ff010)
...


The first thing we need to find is the PID of the process.

julien@holberton:~/holberton/w/hackthevm0$ ps aux | grep ./loop | grep -v grep
julien     4618  0.0  0.0   4332   732 pts/14   S+   17:06   0:00 ./loop

In the above example, the PID is 4618 (it will be different each time we run it, and it is probably a different number if you are trying this on your own computer). As a result, the maps and mem files we want to look at are located in the /proc/4618 directory:

• /proc/4618/maps
• /proc/4618/mem

A quick ls -la in the directory should give you something like this:

julien@ubuntu:/proc/4618$ ls -la
total 0
dr-xr-xr-x   9 julien julien 0 Mar 15 17:07 .
dr-xr-xr-x 257 root   root   0 Mar 15 10:20 ..
dr-xr-xr-x   2 julien julien 0 Mar 15 17:11 attr
-rw-r--r--   1 julien julien 0 Mar 15 17:11 autogroup
-r--------   1 julien julien 0 Mar 15 17:11 auxv
-r--r--r--   1 julien julien 0 Mar 15 17:11 cgroup
--w-------   1 julien julien 0 Mar 15 17:11 clear_refs
-r--r--r--   1 julien julien 0 Mar 15 17:07 cmdline
-rw-r--r--   1 julien julien 0 Mar 15 17:11 comm
-rw-r--r--   1 julien julien 0 Mar 15 17:11 coredump_filter
-r--r--r--   1 julien julien 0 Mar 15 17:11 cpuset
lrwxrwxrwx   1 julien julien 0 Mar 15 17:11 cwd -> /home/julien/holberton/w/funwthevm
-r--------   1 julien julien 0 Mar 15 17:11 environ
lrwxrwxrwx   1 julien julien 0 Mar 15 17:11 exe -> /home/julien/holberton/w/funwthevm/loop
dr-x------   2 julien julien 0 Mar 15 17:07 fd
dr-x------   2 julien julien 0 Mar 15 17:11 fdinfo
-rw-r--r--   1 julien julien 0 Mar 15 17:11 gid_map
-r--------   1 julien julien 0 Mar 15 17:11 io
-r--r--r--   1 julien julien 0 Mar 15 17:11 limits
-rw-r--r--   1 julien julien 0 Mar 15 17:11 loginuid
dr-x------   2 julien julien 0 Mar 15 17:11 map_files
-r--r--r--   1 julien julien 0 Mar 15 17:11 maps
-rw-------   1 julien julien 0 Mar 15 17:11 mem
-r--r--r--   1 julien julien 0 Mar 15 17:11 mountinfo
-r--r--r--   1 julien julien 0 Mar 15 17:11 mounts
-r--------   1 julien julien 0 Mar 15 17:11 mountstats
dr-xr-xr-x   5 julien julien 0 Mar 15 17:11 net
dr-x--x--x   2 julien julien 0 Mar 15 17:11 ns
-r--r--r--   1 julien julien 0 Mar 15 17:11 numa_maps
-rw-r--r--   1 julien julien 0 Mar 15 17:11 oom_adj
-r--r--r--   1 julien julien 0 Mar 15 17:11 oom_score
-rw-r--r--   1 julien julien 0 Mar 15 17:11 oom_score_adj
-r--------   1 julien julien 0 Mar 15 17:11 pagemap
-r--------   1 julien julien 0 Mar 15 17:11 personality
-rw-r--r--   1 julien julien 0 Mar 15 17:11 projid_map
lrwxrwxrwx   1 julien julien 0 Mar 15 17:11 root -> /
-rw-r--r--   1 julien julien 0 Mar 15 17:11 sched
-r--r--r--   1 julien julien 0 Mar 15 17:11 schedstat
-r--r--r--   1 julien julien 0 Mar 15 17:11 sessionid
-rw-r--r--   1 julien julien 0 Mar 15 17:11 setgroups
-r--r--r--   1 julien julien 0 Mar 15 17:11 smaps
-r--------   1 julien julien 0 Mar 15 17:11 stack
-r--r--r--   1 julien julien 0 Mar 15 17:07 stat
-r--r--r--   1 julien julien 0 Mar 15 17:11 statm
-r--r--r--   1 julien julien 0 Mar 15 17:07 status
-r--------   1 julien julien 0 Mar 15 17:11 syscall
dr-xr-xr-x   3 julien julien 0 Mar 15 17:11 task
-r--r--r--   1 julien julien 0 Mar 15 17:11 timers
-rw-r--r--   1 julien julien 0 Mar 15 17:11 uid_map
-r--r--r--   1 julien julien 0 Mar 15 17:11 wchan
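The PID lookup done above with ps and grep can also be scripted by walking /proc directly, since every running process has a numeric directory there. The sketch below is one possible way to do it; the function name `find_pids` and its `proc_root` parameter are made up for this example (the parameter exists only so the function can be pointed at a test directory).

```python
import os


def find_pids(name, proc_root="/proc"):
    """Return the PIDs whose first command-line argument equals `name`.

    proc_root is hypothetical convenience for testing; on a real
    system leave it at "/proc".
    """
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                # arguments in cmdline are separated by NUL bytes
                argv = f.read().split(b"\0")
        except OSError:
            continue  # process exited or is not readable
        if argv[0].decode(errors="replace") == name:
            pids.append(int(entry))
    return sorted(pids)
```

While the loop program is running, calling `find_pids("./loop")` should return a list containing its PID, such as `[4618]` in the session above.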


### /proc/pid/maps

As we have seen earlier, the /proc/pid/maps file is a text file, so we can directly read it. The content of the maps file of our process looks like this:

julien@ubuntu:/proc/4618$ cat maps
00400000-00401000 r-xp 00000000 08:01 1070052 /home/julien/holberton/w/funwthevm/loop
00600000-00601000 r--p 00000000 08:01 1070052 /home/julien/holberton/w/funwthevm/loop
00601000-00602000 rw-p 00001000 08:01 1070052 /home/julien/holberton/w/funwthevm/loop
010ff000-01120000 rw-p 00000000 00:00 0 [heap]
7f144c052000-7f144c20c000 r-xp 00000000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so
7f144c20c000-7f144c40c000 ---p 001ba000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so
7f144c40c000-7f144c410000 r--p 001ba000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so
7f144c410000-7f144c412000 rw-p 001be000 08:01 136253 /lib/x86_64-linux-gnu/libc-2.19.so
7f144c412000-7f144c417000 rw-p 00000000 00:00 0
7f144c417000-7f144c43a000 r-xp 00000000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so
7f144c61e000-7f144c621000 rw-p 00000000 00:00 0
7f144c636000-7f144c639000 rw-p 00000000 00:00 0
7f144c639000-7f144c63a000 r--p 00022000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so
7f144c63a000-7f144c63b000 rw-p 00023000 08:01 136229 /lib/x86_64-linux-gnu/ld-2.19.so
7f144c63b000-7f144c63c000 rw-p 00000000 00:00 0
7ffc94272000-7ffc94293000 rw-p 00000000 00:00 0 [stack]
7ffc9435e000-7ffc94360000 r--p 00000000 00:00 0 [vvar]
7ffc94360000-7ffc94362000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

Circling back to what we said earlier, we can see that the stack ([stack]) is located in high memory addresses and the heap ([heap]) in the lower memory addresses.

### [heap]

Using the maps file, we can find all the information we need to locate our string:

010ff000-01120000 rw-p 00000000 00:00 0 [heap]

The heap:

• Starts at address 0x010ff000 in the virtual memory of the process
• Ends at memory address: 0x01120000
• Is readable and writable (rw)

A quick look back to our (still running) loop program:

...
[1024] Holberton (0x10ff010)
...

-> 0x010ff000 < 0x10ff010 < 0x01120000.
This confirms that our string is located in the heap. More precisely, it is located at index 0x10 of the heap. If we open the /proc/pid/mem file (in this example /proc/4618/mem) and seek to the memory address 0x10ff010, we can write to the heap of the running process, overwriting the “Holberton” string!

Let’s write a script or program that does just that. Choose your favorite language and let’s do it!

If you would like, stop reading now and try to write a script or program that finds a string in the heap of a running process, before reading further. The next paragraph will give away the source code of the answer!

.

.

.

### Overwriting the string in the virtual memory

We’ll be using Python 3 for writing the script, but you could write this in any language. Here is the code:

#!/usr/bin/env python3
'''
Locates and replaces the first occurrence of a string in the heap
of a process

Usage: ./read_write_heap.py PID search_string replace_by_string
Where:
- PID is the pid of the target process
- search_string is the ASCII string you are looking to overwrite
- replace_by_string is the ASCII string you want to replace
  search_string with
'''

import sys


def print_usage_and_exit():
    print('Usage: {} pid search write'.format(sys.argv[0]))
    sys.exit(1)


# check usage
if len(sys.argv) != 4:
    print_usage_and_exit()

# get the pid and the two strings from args
try:
    pid = int(sys.argv[1])
except ValueError:
    print_usage_and_exit()
if pid <= 0:
    print_usage_and_exit()
search_string = str(sys.argv[2])
if search_string == "":
    print_usage_and_exit()
write_string = str(sys.argv[3])
if write_string == "":
    print_usage_and_exit()

# build the paths of the maps and mem files of the process
maps_filename = "/proc/{}/maps".format(pid)
print("[*] maps: {}".format(maps_filename))
mem_filename = "/proc/{}/mem".format(pid)
print("[*] mem: {}".format(mem_filename))

# try opening the maps file
try:
    maps_file = open(maps_filename, 'r')
except IOError as e:
    print("[ERROR] Can not open file {}:".format(maps_filename))
    print("        I/O error({}): {}".format(e.errno, e.strerror))
    sys.exit(1)

for line in maps_file:
    sline = line.split(' ')
    # check if we found the heap
    if sline[-1][:-1] != "[heap]":
        continue
    print("[*] Found [heap]:")

    # parse line
    addr = sline[0]
    perm = sline[1]
    offset = sline[2]
    device = sline[3]
    inode = sline[4]
    pathname = sline[-1][:-1]
    print("\tpathname = {}".format(pathname))
    print("\taddresses = {}".format(addr))
    print("\tpermissions = {}".format(perm))
    print("\toffset = {}".format(offset))
    print("\tinode = {}".format(inode))

    # check if there is read and write permission
    if perm[0] != 'r' or perm[1] != 'w':
        print("[*] {} does not have read/write permission".format(pathname))
        maps_file.close()
        sys.exit(0)

    # get start and end of the heap in the virtual memory
    addr = addr.split("-")
    if len(addr) != 2:  # never trust anyone, not even your OS :)
        print("[*] Wrong addr format")
        maps_file.close()
        sys.exit(1)
    addr_start = int(addr[0], 16)
    addr_end = int(addr[1], 16)
    print("\tAddr start [{:x}] | end [{:x}]".format(addr_start, addr_end))

    # open and read mem
    try:
        mem_file = open(mem_filename, 'rb+')
    except IOError as e:
        print("[ERROR] Can not open file {}:".format(mem_filename))
        print("        I/O error({}): {}".format(e.errno, e.strerror))
        maps_file.close()
        sys.exit(1)

    # read heap
    mem_file.seek(addr_start)
    heap = mem_file.read(addr_end - addr_start)

    # find string (index raises ValueError when it is absent)
    try:
        i = heap.index(bytes(search_string, "ASCII"))
    except ValueError:
        print("Can't find '{}'".format(search_string))
        maps_file.close()
        mem_file.close()
        sys.exit(0)
    print("[*] Found '{}' at {:x}".format(search_string, i))

    # write the new string
    print("[*] Writing '{}' at {:x}".format(write_string, addr_start + i))
    mem_file.seek(addr_start + i)
    mem_file.write(bytes(write_string, "ASCII"))

    # close files
    maps_file.close()
    mem_file.close()

    # there is only one heap in our example
    break

Note: You will need to run this script as root, otherwise you won’t be able to read or write to the /proc/pid/mem file, even if you are the owner of the process.

### Running the script

julien@holberton:~/holberton/w/hackthevm0$ sudo ./read_write_heap.py 4618 Holberton "Fun w vm!"
[*] maps: /proc/4618/maps
[*] mem: /proc/4618/mem
[*] Found [heap]:
    pathname = [heap]
    addresses = 010ff000-01120000
    permissions = rw-p
    offset = 00000000
    inode = 0
    Addr start [10ff000] | end [1120000]
[*] Found 'Holberton' at 10
[*] Writing 'Fun w vm!' at 10ff010
julien@holberton:~/holberton/w/hackthevm0$


Note that this address corresponds to the one we found manually:

• The heap lies from addresses 0x010ff000 to 0x01120000 in the virtual memory of the running process
• Our string is at index 0x10 in the heap, so at the memory address 0x10ff010
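The arithmetic behind these two points can be checked in a few lines. The sketch below parses the [heap] line copied from the maps output above and computes the offset of the string inside the heap; `parse_maps_line` is a helper name made up for this example, not part of the original script.

```python
def parse_maps_line(line):
    """Split one /proc/<pid>/maps line into a small dictionary."""
    # maxsplit=5 keeps a pathname containing spaces in one piece
    fields = line.split(None, 5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    return {
        "start": start,
        "end": end,
        "perms": fields[1],
        # anonymous mappings have no pathname column
        "pathname": fields[5].strip() if len(fields) > 5 else "",
    }


heap = parse_maps_line("010ff000-01120000 rw-p 00000000 00:00 0 [heap]")
string_addr = 0x10ff010  # address printed by the loop program

# the string must fall inside the heap mapping
assert heap["start"] <= string_addr < heap["end"]
print("offset in heap: {:#x}".format(string_addr - heap["start"]))
print("heap size: {} KiB".format((heap["end"] - heap["start"]) // 1024))
```

This prints an offset of 0x10 and a heap size of 132 KiB, matching the values found by hand and by the script.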

If we go back to our loop program, it should now print “Fun w vm!”:

...
[2676] Holberton (0x10ff010)
[2677] Holberton (0x10ff010)
[2678] Holberton (0x10ff010)
[2679] Holberton (0x10ff010)
[2680] Holberton (0x10ff010)
[2681] Holberton (0x10ff010)
[2682] Fun w vm! (0x10ff010)
[2683] Fun w vm! (0x10ff010)
[2684] Fun w vm! (0x10ff010)
[2685] Fun w vm! (0x10ff010)
...
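The replacement worked cleanly here because “Fun w vm!” and “Holberton” have the same length (9 characters). The following sketch shows one way the length could be handled more defensively. It runs against an in-memory buffer (`io.BytesIO`) standing in for /proc/pid/mem, and the function name `replace_in_region` is made up for this example; it is not part of the original script.

```python
import io


def replace_in_region(mem, start, end, search, replacement):
    """Overwrite the first `search` found in mem[start:end].

    `mem` is any seekable binary file object opened for reading and
    writing. Returns the absolute offset written to, or None when
    `search` is absent. Refuses replacements longer than `search`,
    since the extra bytes could overwrite whatever follows the string.
    """
    if len(replacement) > len(search):
        raise ValueError("replacement is longer than the original string")
    mem.seek(start)
    region = mem.read(end - start)
    index = region.find(search)
    if index == -1:
        return None
    mem.seek(start + index)
    # Pad with NUL bytes so a shorter C string still terminates cleanly.
    mem.write(replacement.ljust(len(search), b"\0"))
    return start + index


# Stand-in for /proc/<pid>/mem: 16 filler bytes, then the C string.
fake_mem = io.BytesIO(b"\xaa" * 16 + b"Holberton\0" + b"\xbb" * 6)
where = replace_in_region(fake_mem, 0, 32, b"Holberton", b"Fun w vm!")
print(hex(where), fake_mem.getvalue()[16:26])
```

Without the padding, writing a shorter string such as “Hi” over “Holberton” would leave the tail of the old string in place, and the loop program would print “Hilberton”.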


## Outro

### Questions? Feedback?

If you have questions or feedback don’t hesitate to ping us on Twitter at @holbertonschool or @julienbarbier42.
Haters, please send your comments to /dev/null.

Happy Hacking!

As always, no one is perfect (except Chuck of course), so don’t hesitate to contribute or send me your comments.

### Files

This repo contains the source code for all programs shown in this tutorial:

• main.c: the first C program that prints the location of the string and exits
• loop.c: the second C program that loops indefinitely
• read_write_heap.py: the script used to modify the string in the running C program

### What’s next?

In the next chapter we will do almost the same thing, but instead we’ll access the memory of a running Python 3 script. It won’t be that straightforward. We’ll take this as an excuse to look at some Python 3 internals. If you are curious, try to do it yourself, and find out why the above read_write_heap.py script won’t work to modify a Python 3 ASCII string.

See you next time and Happy Hacking!

Many thanks to Kristine, Tim for English proof-reading & Guillaume for PEP8 proof-reading.

