Angr学习笔记

最新推荐文章于 2023-07-21 16:54:09 发布

AlwaysBetter

最新推荐文章于 2023-07-21 16:54:09 发布

阅读量877

点赞数

分类专栏： Higher Reverse

本文链接：https://blog.csdn.net/qq_35740100/article/details/119718285

版权

Higher Reverse 专栏收录该内容

3 篇文章 0 订阅

订阅专栏

学习的主要资料是Github上2018年Angr指导以下模板均来自此

Theory Part

Symbolic Execution

It’s a system that walks through all possible paths of a program.
Execution Path

It’s a possible way to travel through the program.
Symbol
It’s a variable in the program.A symbol depends on the execution path(s) that constrain it.

Of course, in the real world, the binaries will be complex. There could be many symbols and many branches. The exponential growth of the complexity of the binary is symbolic execution’s largest problem.

Veritesting

The Veritesting algorithm, developed at CMU, attempts to reduce state explosion by combining branches. Due to the difficult nature of reducing the algorithm, Veritesting relies on a heuristic to best determine how to merge states.

Angr Part

Angr

A symbolic execution engine.

Step through binaries (and follow any branch)
Search for a program state that meets a given criteria
Solve for symbolic variables given path (and other) constraints

Execution path

Represents a possible execution of the program that begins somewhere and ends somewhere else.

SimState

The node of an execution path.It stores the state of the program, as well as a history of the previous states.Chaining these SimStates together creates a path.

Simulation manager

Angr stores and handles a set of possible paths for a given program in a ‘simulation manager’ object.Simulation managers provide functionality to step through the program to generate possible paths/states.

Building a Set of Paths

Angr starts the program wherever you instruct it to start (this is the first active state)
Execute instructions in each active (nonterminated) state until we reach a branching point or the state terminates
At every branching point, split the state into multiple states, and add them to the set of active states
Repeat step 2…4 until we find what we want or all states terminate

How do we determine which conditions might lead to a failed state?
Human intuition!
Also, there are various heuristic algorithms that are mostly out of the scope of these notes. We will briefly touch on a method called Veritesting much later.

Algorithm for Find and Avoid

Load the binary
Specify a starting point and create a simulation manager
While we have not found what we want…
- Step all active states
- Run our ‘should_accept_state’ predicate on each active state
  - If one accepts, we found what we wanted! Exit the loop
- Run our ‘should_avoid_state’ predicate on each active state
  - For each state that is accepted, mark it for termination
- Remove all state that are marked for termination from the set of active states

The ‘Explore’ Method

The previous algorithm is so common that Angr wrote a single function to do it for you, called the ‘explore’ function:
simulation.explore(find=should_accept_path, avoid=should_avoid_path)
will add any path that is accepted to the list ‘simulation.found’
Additionally, searching or avoiding a specific instruction address is common enough that the find and avoid parameters also accept addresses:
simulation.explore(find=0x80430a, avoid=0x9aa442)
would search for address 0x80430a and terminate anything that reaches 0x9aa442.

SimProcedures

In some cases, Angr automatically injects a symbols when user input is queried from the stdin file.

Bitvectors

Angr’s symbols are represented by what it calls bitvectors.
Bitvectors have a size, the number of bits they represent.
As with all data in programming, bitvectors can represent any type that can fit. Most commonly, they represent either n-bit integers or strings.
The difference between a bitvector and a typical variable is that, while typical variables store a single value, bitvectors store every value that meet certain constraints.

Between Symbolic and Concrete

A concrete bitvector: a bitvector that can take on exactly 1 value.
(Example: { λ: λ = 1 })
A symbolic bitvector: a bitvector that can take on more than 1 value.
(Example: { λ: λ > 10 })
An unconstrained bitvector: a bitvector that can take on any value, within the
bounds of its size.
(Example: { λ: λ = *})
An unsatisfiable bitvector: a bitvector that cannot take on any values.
(Example: { λ: λ = 10, λ ≠ 10 })

Z3

An open-source constraint solver. It has the following functionality (and more):

Find any (single) value of a bitvector
Find up to n possible values of a bitvector
Find the maximum or minimum possible values of a bitvector
Determine if a bitvector is ‘true’ or ‘false’
Determine if a bitvector is satisfiable

We can manually add constraints to any bitvector at any time during the execution of the program.

Using Part

Basic find & avoid

import angr
import sys

def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)
  initial_state = project.factory.entry_state()
  simulation = project.factory.simgr(initial_state)
  print_good_address = 0x080485DD
  will_not_succeed_address = (0x080485EF,0x080485A8,0x0804855B)
  simulation.explore(find=print_good_address, avoid=will_not_succeed_address)

  if simulation.found:
    solution_state = simulation.found[0]
    print solution_state.posix.dumps(sys.stdin.fileno())
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

Find Condition

# It is very useful to be able to search for a state that reaches a certain
# instruction. However, in some cases, you may not know the address of the
# specific instruction you want to reach (or perhaps there is no single
# instruction goal.) In this challenge, you don't know which instruction
# grants you success. Instead, you just know that you want to find a state where
# the binary prints "Good Job."
#
# Angr is powerful in that it allows you to search for a states that meets an
# arbitrary condition that you specify in Python, using a predicate you define
# as a function that takes a state and returns True if you have found what you
# are looking for, and False otherwise.

import angr
import sys

def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)
  initial_state = project.factory.entry_state()
  simulation = project.factory.simgr(initial_state)

  # Define a function that checks if you have found the state you are looking
  # for.
  def is_successful(state):
    # Dump whatever has been printed out by the binary so far into a string.
    stdout_output = state.posix.dumps(sys.stdout.fileno())

    # Return whether 'Good Job.' has been printed yet.
    return 'Good Job.' in stdout_output

  # Same as above, but this time check if the state should abort. If you return
  # False, Angr will continue to step the state. In this specific challenge, the
  # only time at which you will know you should abort is when the program prints
  # "Try again."
  def should_abort(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return 'Try again.' in stdout_output

  # Tell Angr to explore the binary and find any state that is_successful identfies
  # as a successful state by returning True.
  simulation.explore(find=is_successful, avoid=should_abort)

  if simulation.found:
    solution_state = simulation.found[0]
    print solution_state.posix.dumps(sys.stdin.fileno())
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

Inject Symbols

Registers

In Angr, you can write to a register with either a concrete or a symbolic value:
state.regs.eax = my_bitvector
will write the value of my_bitvector to eax.

# Angr doesn't currently support reading multiple things with scanf (Ex: 
# scanf("%u %u).) You will have to tell the simulation engine to begin the
# program after scanf is called and manually inject the symbols into registers.
import angr
import claripy
import sys

def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)

  # Sometimes, you want to specify where the program should start. The variable
  # start_address will specify where the symbolic execution engine should begin.
  # Note that we are using blank_state, not entry_state.

  start_address = 0x08048980
  initial_state = project.factory.blank_state(addr=start_address)

  # Create a symbolic bitvector (the datatype Angr uses to inject symbolic
  # values into the binary.) The first parameter is just a name Angr uses
  # to reference it. 

  password0_size_in_bits = 32
  password0 = claripy.BVS('password0', password0_size_in_bits)
  password1_size_in_bits = 32
  password1 = claripy.BVS('password1', password1_size_in_bits)
  password2_size_in_bits = 32
  password2 = claripy.BVS('password2', password2_size_in_bits)

  # Set a register to a symbolic value. This is one way to inject symbols into
  # the program.
  # initial_state.regs stores a number of convenient attributes that reference
  # registers by name. For example, to set eax to password0, use:
  #
  # initial_state.regs.eax = password0
  #

  initial_state.regs.eax = password0
  initial_state.regs.ebx = password1
  initial_state.regs.edx = password2

  simulation = project.factory.simgr(initial_state)

  def is_successful(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return 'Good Job.' in stdout_output

  def should_abort(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return 'Try again.' in stdout_output

  simulation.explore(find=is_successful, avoid=should_abort)

  if simulation.found:
    solution_state = simulation.found[0]

    # Solve for the symbolic values. If there are multiple solutions, we only
    # care about one, so we can use eval, which returns any (but only one)
    # solution. Pass eval the bitvector you want to solve for.
    
    solution0 = solution_state.se.eval(password0)
    solution1 = solution_state.se.eval(password1)
    solution2 = solution_state.se.eval(password2)

    # Aggregate and format the solutions you computed above, and then print
    # the full string. Pay attention to the order of the integers, and the
    # expected base (decimal, octal, hexadecimal, etc).
    solution = str(hex(solution0))[2:-1]+' '+str(hex(solution1))[2:-1]+' '+str(hex(solution2))[2:-1]
    print solution
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

Global Memory

In Angr, you can write to an address with either a concrete or a symbolic value:
state.memory.store(0xaf84120, my_bitvector)
will write the value of my_bitvector to 0xaf84120.

import angr
import claripy
import sys

def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)

  start_address = 0x08048601
  initial_state = project.factory.blank_state(addr=start_address)

  # The binary is calling scanf("%8s %8s %8s %8s").

  password0 = claripy.BVS('password0', 64)
  password1 = claripy.BVS('password1', 64)
  password2 = claripy.BVS('password2', 64)
  password3 = claripy.BVS('password3', 64)

  # Determine the address of the global variable to which scanf writes the user
  # input. The function 'initial_state.memory.store(address, value)' will write
  # 'value' (a bitvector) to 'address' (a memory location, as an integer.) The
  # 'address' parameter can also be a bitvector (and can be symbolic!).
  
  password0_address = 0x0A1BA1C0
  initial_state.memory.store(password0_address, password0)
  initial_state.memory.store(password0_address+8, password1)
  initial_state.memory.store(password0_address+16, password2)
  initial_state.memory.store(password0_address+24, password3)

  simulation = project.factory.simgr(initial_state)

  def is_successful(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return 'Good Job.' in stdout_output

  def should_abort(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return 'Try again.' in stdout_output

  simulation.explore(find=is_successful, avoid=should_abort)

  if simulation.found:
    solution_state = simulation.found[0]

    # Solve for the symbolic values. We are trying to solve for a string.
    # Therefore, we will use eval, with named parameter cast_to=str
    # which returns a string instead of an integer.
    
    solution0 = solution_state.se.eval(password0,cast_to=str)
    solution1 = solution_state.se.eval(password1,cast_to=str)
    solution2 = solution_state.se.eval(password2,cast_to=str)
    solution3 = solution_state.se.eval(password3,cast_to=str)
    solution = solution0+' '+solution1+' '+solution2+' '+solution3

    print solution
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

Stack

In Angr, you can push to the stack with either a concrete or a symbolic value:
state.stack_push(my_bitvector)
will push the value of my_bitvector to the top of the stack.

import angr
import claripy
import sys

def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)

  # For this challenge, we want to begin after the call to scanf. Note that this
  # is in the middle of a function.
  #
  # This challenge requires dealing with the stack, so you have to pay extra
  # careful attention to where you start, otherwise you will enter a condition
  # where the stack is set up incorrectly. In order to determine where after
  # scanf to start, we need to look at the dissassembly of the call and the
  # instruction immediately following it:
  #   sub    $0x4,%esp
  #   lea    -0x10(%ebp),%eax
  #   push   %eax
  #   lea    -0xc(%ebp),%eax
  #   push   %eax
  #   push   $0x80489c3
  #   call   8048370 <__isoc99_scanf@plt>
  #   add    $0x10,%esp
  # Now, the question is: do we start on the instruction immediately following
  # scanf (add $0x10,%esp), or the instruction following that (not shown)?
  # Consider what the 'add $0x10,%esp' is doing. Hint: it has to do with the
  # scanf parameters that are pushed to the stack before calling the function.
  
  start_address = 0x08048697
  initial_state = project.factory.blank_state(addr=start_address)

  # We are jumping into the middle of a function! Therefore, we need to account
  # for how the function constructs the stack. The second instruction of the
  # function is:
  #   mov    %esp,%ebp
  # At which point it allocates the part of the stack frame we plan to target:
  #   sub    $0x18,%esp
  # Note the value of esp relative to ebp. The space between them is (usually)
  # the stack space. Since esp was decreased by 0x18
  #
  #        /-------- The stack --------\
  # ebp -> |                           |
  #        |---------------------------|
  #        |                           |
  #        |---------------------------|
  #         . . . (total of 0x18 bytes)
  #         . . . Somewhere in here is
  #         . . . the data that stores
  #         . . . the result of scanf.
  # esp -> |                           |
  #        \---------------------------/
  #
  # Since we are starting after scanf, we are skipping this stack construction
  # step. To make up for this, we need to construct the stack ourselves. Let us
  # start by initializing ebp in the exact same way the program does.
  initial_state.regs.ebp = initial_state.regs.esp

  # scanf("%u %u") needs to be replaced by injecting two bitvectors. The
  # reason for this is that Angr does not (currently) automatically inject
  # symbols if scanf has more than one input parameter. This means Angr can
  # handle 'scanf("%u")', but not 'scanf("%u %u")'.
  
  password0 = claripy.BVS('password0', 32)
  password1 = claripy.BVS('password1', 32)

  # Here is the hard part. We need to figure out what the stack looks like, at
  # least well enough to inject our symbols where we want them. In order to do
  # that, let's figure out what the parameters of scanf are:
  #   sub    $0x4,%esp
  #   lea    -0x10(%ebp),%eax
  #   push   %eax
  #   lea    -0xc(%ebp),%eax
  #   push   %eax
  #   push   $0x80489c3
  #   call   8048370 <__isoc99_scanf@plt>
  #   add    $0x10,%esp 
  # As you can see, the call to scanf looks like this:
  # scanf(  0x80489c3,   ebp - 0xc,   ebp - 0x10  )
  #      format_string    password0    password1
  #  From this, we can construct our new, more accurate stack diagram:
  #
  #            /-------- The stack --------\
  # ebp ->     |          padding          |
  #            |---------------------------|
  # ebp - 0x01 |       more padding        |
  #            |---------------------------|
  # ebp - 0x02 |     even more padding     |
  #            |---------------------------|
  #                        . . .               <- How much padding? Hint: how
  #            |---------------------------|      many bytes is password0?
  # ebp - 0x0b |   password0, second byte  |
  #            |---------------------------|
  # ebp - 0x0c |   password0, first byte   |
  #            |---------------------------|
  # ebp - 0x0d |   password1, last byte    |
  #            |---------------------------|
  #                        . . .
  #            |---------------------------|
  # ebp - 0x10 |   password1, first byte   |
  #            |---------------------------|
  #                        . . .
  #            |---------------------------|
  # esp ->     |                           |
  #            \---------------------------/
  #
  # Figure out how much space there is and allocate the necessary padding to
  # the stack by decrementing esp before you push the password bitvectors.
  padding_length_in_bytes = 0x8  # :integer
  initial_state.regs.esp -= padding_length_in_bytes

  # Push the variables to the stack. Make sure to push them in the right order!
  # The syntax for the following function is:
  #
  # initial_state.stack_push(bitvector)
  #
  # This will push the bitvector on the stack, and increment esp the correct
  # amount. You will need to push multiple bitvectors on the stack.
  
  initial_state.stack_push(password0)
  initial_state.stack_push(password1)
  simulation = project.factory.simgr(initial_state)

  def is_successful(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return 'Good Job.' in stdout_output

  def should_abort(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return 'Try again.' in stdout_output

  simulation.explore(find=is_successful, avoid=should_abort)

  if simulation.found:
    solution_state = simulation.found[0]

    solution0 = solution_state.se.eval(password0)
    solution1 = solution_state.se.eval(password1)

    solution = str(solution0) + ' '+ str(solution1)
    print solution
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

Dynamic Memory (like malloc)

By default, Angr stores integers in memory with big-endianness. To
specify to use the endianness of your architecture, use the parameter
endness=project.arch.memory_endness. On x86, this is little-endian.

import angr
import claripy
import sys

def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)

  start_address = 0x08048699
  initial_state = project.factory.blank_state(addr=start_address)

  # The binary is calling scanf("%8s %8s").
  
  password0 = claripy.BVS('password0', 64)
  password1 = claripy.BVS('password1', 64)
  
  # Instead of telling the binary to write to the address of the memory
  # allocated with malloc, we can simply fake an address to any unused block of
  # memory and overwrite the pointer to the data. This will point the pointer
  # with the address of pointer_to_malloc_memory_address0 to fake_heap_address.
  
  fake_heap_address0 = 0x0ABCC790
  fake_heap_address1 = 0x0ABCC798
  pointer_to_malloc_memory_address0 = 0x0ABCC8A4
  initial_state.memory.store(pointer_to_malloc_memory_address0, fake_heap_address0, endness=project.arch.memory_endness)
  pointer_to_malloc_memory_address1 = 0x0ABCC8AC
  initial_state.memory.store(pointer_to_malloc_memory_address1, fake_heap_address1, endness=project.arch.memory_endness)

  # Store our symbolic values at our fake_heap_address. Look at the binary to
  # determine the offsets from the fake_heap_address where scanf writes.
  
  initial_state.memory.store(fake_heap_address0, password0)
  initial_state.memory.store(fake_heap_address1, password1)

  simulation = project.factory.simgr(initial_state)

  def is_successful(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return 'Good Job.' in stdout_output

  def should_abort(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return 'Try again.' in stdout_output

  simulation.explore(find=is_successful, avoid=should_abort)

  if simulation.found:
    solution_state = simulation.found[0]

    solution0 = solution_state.se.eval(password0,cast_to=str)
    solution1 = solution_state.se.eval(password1,cast_to=str)
    solution = solution0 + ' '+ solution1
    print solution
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

File System

# Problem description and general solution strategy:
# The binary loads the password from a file using the fread function. If the
# password is correct, it prints "Good Job." In order to keep consistency with
# the other challenges, the input from the console is written to a file in the 
# ignore_me function. As the name suggests, ignore it, as it only exists to
# maintain consistency with other challenges.
# We want to:
# 1. Determine the file from which fread reads.
# 2. Use Angr to simulate a filesystem where that file is replaced with our own
#    simulated file.
# 3. Initialize the file with a symbolic value, which will be read with fread
#    and propogated through the program.
# 4. Solve for the symbolic input to determine the password.

import angr
import claripy
import sys

def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)

  start_address = 0x80488db
  initial_state = project.factory.blank_state(addr=start_address)

  # Specify some information needed to construct a simulated file. For this
  # challenge, the filename is hardcoded, but in theory, it could be symbolic. 
  # Note: to read from the file, the binary calls
  # 'fread(buffer, sizeof(char), 64, file)'.
  filename = 'OJKSQYDP.txt'
  symbolic_file_size_bytes = 64

  # A file, in Linux, represents a stream of sequential data. This stream may
  # come from a physical file on your hard drive, the network, the output of
  # another program (ex: /dev/urandom), or anything else. In our case, we want
  # to construct a block of memory where we store our symbolic variables for the
  # program to read. The following constructs the symbolic memory that will
  # supply the stream of data to the Linux file. Also, to communicate with 
  # Angr's constraint solving system, we need to associate the memory with the 
  # initial_state.
  symbolic_file_backing_memory = angr.state_plugins.SimSymbolicMemory()
  symbolic_file_backing_memory.set_state(initial_state)

  # Construct a bitvector for the password and then store it in the file's
  # backing memory. The store method works exactly the same as the store method
  # you have already used.
  # For example, imagine a simple file, 'hello.txt':
  #
  # Hello world, my name is John.
  # ^                       ^
  # ^ address 0             ^ address 24 (count the number of characters)
  # In order to represent this in memory, we would want to write the string to
  # the beginning of the file:
  #
  # hello_txt_contents = claripy.BVV('Hello world, my name is John.', 30*8)
  # hello_txt_backing_memory.store(0, hello_txt_contents)
  #
  # Perhaps, then, we would want to replace John with a
  # symbolic variable. We would call:
  #
  # name_bitvector = claripy.BVS('symbolic_name', 4*8)
  # hello_txt_backing_memory.store(24, name_bitvector)
  #
  # Then, after the program calls fopen('hello.txt', 'r') and then
  # fread(buffer, sizeof(char), 30, hello_txt_file), the buffer would contain
  # the string from the file, except four symbolic bytes where the name would be
  # stored.
  
  password = claripy.BVS('password', symbolic_file_size_bytes * 8)
  symbolic_file_backing_memory.store(0, password)

  # Construct the symbolic file. The file_options parameter specifies the Linux
  # file permissions (read, read/write, execute etc.) The content parameter
  # specifies from where the stream of data should be supplied. If content is
  # an instance of SimSymbolicMemory (we constructed one above), the stream will
  # contain the contents (including any symbolic contents) of the memory,
  # beginning from address zero.
  # Set the content parameter to our SimSymbolicMemory instance that holds the
  # symbolic data.
  
  file_options = 'r'
  password_file = angr.storage.SimFile(filename, file_options, content=symbolic_file_backing_memory, size=symbolic_file_size_bytes)

  # We have already created the file and the memory that stores the data that
  # the file will stream to the program, but we now need to tell Angr where the
  # file should appear to exist on the filesystem. This is a mapping between 
  # strings representing the filenames and the angr.storage.SimFiles themselves. For
  # example, if hello_txt_file was a SimFile,
  # symbolic_filesystem = {
  #   'hello.txt' : hello_txt_file
  # }
  # would specify that any fopen('hello.txt', 'r') calls should stream data from
  # hello_txt_file.
  symbolic_filesystem = {
    filename : password_file
  }
  initial_state.posix.fs = symbolic_filesystem

  simulation = project.factory.simgr(initial_state)

  def is_successful(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return 'Good Job.' in stdout_output

  def should_abort(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return 'Try again.' in stdout_output

  simulation.explore(find=is_successful, avoid=should_abort)

  if simulation.found:
    solution_state = simulation.found[0]

    solution = solution_state.se.eval(password,cast_to=str)

    print solution
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

Hook

# This level performs the following computations:
#
# 1. Get 16 bytes of user input and encrypt it.
# 2. Save the result of check_equals_AABBCCDDEEFFGGHH (or similar)
# 3. Get another 16 bytes from the user and encrypt it.
# 4. Check that it's equal to a predefined password.
#
# The ONLY part of this program that we have to worry about is #2. We will be
# replacing the call to check_equals_ with our own version, using a hook, since
# check_equals_ will run too slowly otherwise.

import angr
import claripy
import sys

def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)
  initial_state = project.factory.entry_state()

  # Hook the address of where check_equals_ is called.
  check_equals_called_address = 0x080486AC

  # The length parameter in angr.Hook specifies how many bytes the execution
  # engine should skip after completing the hook. This will allow hooks to
  # replace certain instructions (or groups of instructions). Determine the
  # instructions involved in calling check_equals_, and then determine how many
  # bytes are used to represent them in memory. This will be the skip length.
  
  instruction_to_skip_length = 0x080486BB - 0x080486AC
  @project.hook(check_equals_called_address, length=instruction_to_skip_length)
  def skip_check_equals_(state):
    # Determine the address where user input is stored. It is passed as a
    # parameter ot the check_equals_ function. Then, load the string. Reminder:
    # int check_equals_(char* to_check, int length) { ...
    user_input_buffer_address = 0x0804A054 # :integer, probably hexadecimal
    user_input_buffer_length = 0x10

    # Reminder: state.memory.load will read the stored value at the address
    # user_input_buffer_address of byte length user_input_buffer_length.
    # It will return a bitvector holding the value. This value can either be
    # symbolic or concrete, depending on what was stored there in the program.
    user_input_string = state.memory.load(
      user_input_buffer_address, 
      user_input_buffer_length
    )
    
    # Determine the string this function is checking the user input against.
    # It's encoded in the name of this function; decompile the program to find
    # it.
    check_against_string = 'XYMKBKUHNIQYNQXE' # :string

    # gcc uses eax to store the return value, if it is an integer. We need to
    # set eax to 1 if check_against_string == user_input_string and 0 otherwise.
    # However, since we are describing an equation to be used by z3 (not to be
    # evaluated immediately), we cannot use Python if else syntax. Instead, we 
    # have to use claripy's built in function that deals with if statements.
    # claripy.If(expression, ret_if_true, ret_if_false) will output an 
    # expression that evaluates to ret_if_true if expression is true and
    # ret_if_false otherwise. 
    # Think of it like the Python "value0 if expression else value1".
    state.regs.eax = claripy.If(
      user_input_string == check_against_string, 
      claripy.BVV(1, 32), 
      claripy.BVV(0, 32)
    )

  simulation = project.factory.simgr(initial_state)

  def is_successful(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return 'Good Job.' in stdout_output

  def should_abort(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return 'Try again.' in stdout_output

  simulation.explore(find=is_successful, avoid=should_abort)

  if simulation.found:
    solution_state = simulation.found[0]
    solution = solution_state.posix.dumps(sys.stdin.fileno())
    print solution
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

Constrain

# The binary asks for a 16 character password to which is applies a complex
# function and then compares with a reference string with the function
# check_equals_[reference string]. (Decompile the binary and take a look at it!)
# The source code for this function is provided here. However, the reference
# string in your version will be different than AABBCCDDEEFFGGHH:
#
# #define REFERENCE_PASSWORD = "AABBCCDDEEFFGGHH";
# int check_equals_AABBCCDDEEFFGGHH(char* to_check, size_t length) {
#   uint32_t num_correct = 0;
#   for (int i=0; i<length; ++i) {
#     if (to_check[i] == REFERENCE_PASSWORD[i]) {
#       num_correct += 1;
#     }
#   }
#   return num_correct == length;
# }
#
# ...
# 
# char* input = user_input();
# char* encrypted_input = complex_function(input);
# if (check_equals_AABBCCDDEEFFGGHH(encrypted_input, 16)) {
#   puts("Good Job.");
# } else {
#   puts("Try again.");
# }
#
# The function checks if *to_check == "AABBCCDDEEFFGGHH". Verify this yourself.
# While you, as a human, can easily determine that this function is equivalent
# to simply comparing the strings, the computer cannot. Instead the computer 
# would need to branch every time the if statement in the loop was called (16 
# times), resulting in 2^16 = 65,536 branches, which will take too long of a 
# time to evaluate for our needs.
#
# We do not know how the complex_function works, but we want to find an input
# that, when modified by complex_function, will produce the string:
# AABBCCDDEEFFGGHH.
#
# In this puzzle, your goal will be to stop the program before this function is
# called and manually constrain the to_check variable to be equal to the
# password you identify by decompiling the binary. Since, you, as a human, know
# that if the strings are equal, the program will print "Good Job.", you can
# be assured that if the program can solve for an input that makes them equal,
# the input will be the correct password.

import angr
import claripy
import sys

def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)

  start_address = 0x08048625
  initial_state = project.factory.blank_state(addr=start_address)

  password = claripy.BVS('password', 16*8)

  password_address = 0x0804A050
  initial_state.memory.store(password_address, password)

  simulation = project.factory.simgr(initial_state)

  # Angr will not be able to reach the point at which the binary prints out
  # 'Good Job.'. We cannot use that as the target anymore.
  
  address_to_check_constraint = 0x0804866C
  simulation.explore(find=address_to_check_constraint)

  if simulation.found:
    solution_state = simulation.found[0]

    # Recall that we need to constrain the to_check parameter (see top) of the 
    # check_equals_ function. Determine the address that is being passed as the
    # parameter and load it into a bitvector so that we can constrain it.
    
    constrained_parameter_address = 0x0804A050
    constrained_parameter_size_bytes = 16
    constrained_parameter_bitvector = solution_state.memory.load(
      constrained_parameter_address,
      constrained_parameter_size_bytes
    )

    # We want to constrain the system to find an input that will make
    # constrained_parameter_bitvector equal the desired value.
    
    constrained_parameter_desired_value = 'AUPDNNPROEZRJWKB'

    # Specify a claripy expression (using Pythonic syntax) that tests whether
    # constrained_parameter_bitvector == constrained_parameter_desired_value.
    # We will let z3 attempt to find an input that will make this expression
    # true.
    constraint_expression = constrained_parameter_bitvector == constrained_parameter_desired_value
    
    # Add the constraint to the state to instruct z3 to include it when solving
    # for input.
    solution_state.add_constraints(constrained_parameter_bitvector == constrained_parameter_desired_value)

    # Solve for the constrained_parameter_bitvector.
    solution = solution_state.se.eval(password,cast_to=str)
    print solution
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

SimProcedure

# This challenge is similar to the previous one. It operates under the same
# premise that you will have to replace the check_equals_ function. In this 
# case, however, check_equals_ is called so many times that it wouldn't make 
# sense to hook where each one was called. Instead, use a SimProcedure to write
# your own check_equals_ implementation and then hook the check_equals_ symbol 
# to replace all calls to scanf with a call to your SimProcedure.
#
# You may be thinking: 
#   Why can't I just use hooks? The function is called many times, but if I hook
#   the address of the function itself (rather than the addresses where it is 
#   called), I can replace its behavior everywhere. Furthermore, I can get the
#   parameters by reading them off the stack (with memory.load(regs.esp + xx)),
#   and return a value by simply setting eax! Since I know the length of the 
#   function in bytes, I can return from the hook just before the 'ret'
#   instruction is called, which will allow the program to jump back to where it
#   was before it called my hook.
# If you thought that, then congratulations! You have just invented the idea of
# SimProcedures! Instead of doing all of that by hand, you can let the already-
# implemented SimProcedures do the boring work for you so that you can focus on
# writing a replacement function in a Pythonic way.
# As a bonus, SimProcedures allow you to specify custom calling conventions, but
# unfortunately it is not covered in this CTF.

import angr
import claripy
import sys

def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)

  initial_state = project.factory.entry_state()

  # Define a class that inherits angr.SimProcedure in order to take advantage
  # of Angr's SimProcedures.
  class ReplacementCheckEquals(angr.SimProcedure):
    # A SimProcedure replaces a function in the binary with a simulated one
    # written in Python. Other than it being written in Python, the function
    # acts largely the same as any function written in C. Any parameter after
    # 'self' will be treated as a parameter to the function you are replacing.
    # The parameters will be bitvectors. Additionally, the Python can return in
    # the ususal Pythonic way. Angr will treat this in the same way it would
    # treat a native function in the binary returning. An example:
    #
    # int add_if_positive(int a, int b) {
    #   if (a >= 0 && b >= 0) return a + b;
    #   else return 0;
    # }
    #
    # could be simulated with...
    #
    # class ReplacementAddIfPositive(angr.SimProcedure):
    #   def run(self, a, b):
    #     if a >= 0 and b >=0:
    #       return a + b
    #     else:
    #       return 0
    #
    # Finish the parameters to the check_equals_ function. Reminder:
    # int check_equals_AABBCCDDEEFFGGHH(char* to_check, int length) { ...
    
    def run(self, to_check, length):
      # We can almost copy and paste the solution from the previous challenge.
      # Hint: Don't look up the address! It's passed as a parameter.
      user_input_buffer_address = to_check
      user_input_buffer_length = length

      # Note the use of self.state to find the state of the system in a 
      # SimProcedure.
      user_input_string = self.state.memory.load(
        user_input_buffer_address,
        user_input_buffer_length
      )

      check_against_string = 'ORSDDWXHZURJRBDH'
      
      # Finally, instead of setting eax, we can use a Pythonic return statement
      # to return the output of this function. 
      # Hint: Look at the previous solution.
      return claripy.If(
        user_input_string == check_against_string,
        claripy.BVV(1,32),
        claripy.BVV(0,32)
      )


  # Hook the check_equals symbol. Angr automatically looks up the address 
  # associated with the symbol. Alternatively, you can use 'hook' instead
  # of 'hook_symbol' and specify the address of the function. To find the 
  # correct symbol, disassemble the binary.
  
  check_equals_symbol = 'check_equals_ORSDDWXHZURJRBDH'
  project.hook_symbol(check_equals_symbol, ReplacementCheckEquals())

  simulation = project.factory.simgr(initial_state)

  def is_successful(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return 'Good Job.' in stdout_output

  def should_abort(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return 'Try again.' in stdout_output

  simulation.explore(find=is_successful, avoid=should_abort)

  if simulation.found:
    solution_state = simulation.found[0]

    solution = solution_state.posix.dumps(sys.stdin.fileno())
    print solution
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

SimScanf

# This time, the solution involves simply replacing scanf with our own version,
# since Angr does not support requesting multiple parameters with scanf.

import angr
import claripy
import sys

def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)

  initial_state = project.factory.entry_state()

  class ReplacementScanf(angr.SimProcedure):
    def run(self, format_string, scanf0_address,scanf1_address):
      scanf0 = claripy.BVS('scanf0', 32)
      scanf1 = claripy.BVS('scanf1', 32)

      # The scanf function writes user input to the buffers to which the 
      # parameters point.
      self.state.memory.store(scanf0_address, scanf0, endness=project.arch.memory_endness)
      self.state.memory.store(scanf1_address, scanf1, endness=project.arch.memory_endness)

      # Now, we want to 'set aside' references to our symbolic values in the
      # globals plugin included by default with a state. You will need to
      # store multiple bitvectors. You can either use a list, tuple, or multiple
      # keys to reference the different bitvectors.
      self.state.globals['solution0'] = (scanf0,scanf1)

  scanf_symbol = '__isoc99_scanf'
  project.hook_symbol(scanf_symbol, ReplacementScanf())

  simulation = project.factory.simgr(initial_state)

  def is_successful(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return 'Good Job.' in stdout_output

  def should_abort(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return 'Try again.' in stdout_output

  simulation.explore(find=is_successful, avoid=should_abort)

  if simulation.found:
    solution_state = simulation.found[0]

    # Grab whatever you set aside in the globals dict.
    stored_solutions = solution_state.globals['solution0']
    solution = ' '.join(map(str, map(solution_state.se.eval, stored_solutions)))

    print solution
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

Veritesting

"veritesting=True"

import angr
import sys

def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)
  initial_state = project.factory.entry_state()
  simulation = project.factory.simgr(initial_state,veritesting=True)

  def is_successful(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return 'Good Job.' in stdout_output

  def should_abort(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return 'Try again.' in stdout_output

  simulation.explore(find=is_successful, avoid=should_abort)

  if simulation.found:
    solution_state = simulation.found[0]
    print solution_state.posix.dumps(sys.stdin.fileno())
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

Static Linked

Manually hook them.

# This challenge is the exact same as the first challenge, except that it was
# compiled as a static binary. Normally, Angr automatically replaces standard
# library functions with SimProcedures that work much more quickly.
#
# Here are a few SimProcedures Angr has already written for you. They implement
# standard library functions. You will not need all of them:
# angr.SIM_PROCEDURES['libc']['malloc']
# angr.SIM_PROCEDURES['libc']['fopen']
# angr.SIM_PROCEDURES['libc']['fclose']
# angr.SIM_PROCEDURES['libc']['fwrite']
# angr.SIM_PROCEDURES['libc']['getchar']
# angr.SIM_PROCEDURES['libc']['strncmp']
# angr.SIM_PROCEDURES['libc']['strcmp']
# angr.SIM_PROCEDURES['libc']['scanf']
# angr.SIM_PROCEDURES['libc']['printf']
# angr.SIM_PROCEDURES['libc']['puts']
# angr.SIM_PROCEDURES['libc']['exit']
#
# As a reminder, you can hook functions with something similar to:
# project.hook(malloc_address, angr.SIM_PROCEDURES['libc']['malloc'])
#
# There are many more, see:
# https://github.com/angr/angr/tree/master/angr/procedures/libc
#
# Additionally, note that, when the binary is executed, the main function is not
# the first piece of code called. In the _start function, __libc_start_main is 
# called to start your program. The initialization that occurs in this function
# can take a long time with Angr, so you should replace it with a SimProcedure.
# angr.SIM_PROCEDURES['glibc']['__libc_start_main']
# Note 'glibc' instead of 'libc'.
import angr
import sys

def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)

  initial_state = project.factory.entry_state()
 
  project.hook(0x804ed40, angr.SIM_PROCEDURES['libc']['printf']())
  project.hook(0x804ed80, angr.SIM_PROCEDURES['libc']['scanf']())
  project.hook(0x804f350, angr.SIM_PROCEDURES['libc']['puts']())
  project.hook(0x8048d10, angr.SIM_PROCEDURES['glibc']['__libc_start_main']())

  simulation = project.factory.simgr(initial_state)

  def is_successful(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return 'Good Job.' in stdout_output
  
  def should_abort(state):
    stdout_output = state.posix.dumps(sys.stdout.fileno())
    return 'Try again.' in stdout_output

  simulation.explore(find=is_successful, avoid=should_abort)
  
  if simulation.found:
    solution_state = simulation.found[0]
    print solution_state.posix.dumps(sys.stdin.fileno())
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

Shared Library

# The shared library has the function validate, which takes a string and returns
# either true (1) or false (0). The binary calls this function. If it returns
# true, the program prints "Good Job." otherwise, it prints "Try again."
#
# Note: When you run this script, make sure you run it on 
# lib14_angr_shared_library.so, not the executable. This level is intended to 
# teach how to analyse binary formats that are not typical executables.

import angr
import claripy
import sys

def main(argv):
  path_to_binary = argv[1]

  # The shared library is compiled with position-independent code. You will need
  # to specify the base address. All addresses in the shared library will be
  # base + offset, where offset is their address in the file.
  
  base = 0x4000000
  project = angr.Project(path_to_binary, load_options={ 
    'main_opts' : { 
      'custom_base_addr' : base 
    } 
  })

  # Initialize any symbolic values here; you will need at least one to pass to
  # the validate function.
  buffer_pointer = claripy.BVV(0x3000000,32)

  # Begin the state at the beginning of the validate function, as if it was
  # called by the program. Determine the parameters needed to call validate and
  # replace 'parameters...' with bitvectors holding the values you wish to pass.
  # Recall that 'claripy.BVV(value, size_in_bits)' constructs a bitvector 
  # initialized to a single value.
  # Remember to add the base value you specified at the beginning to the
  # function address!
  # Hint: int validate(char* buffer, int length) { ...
  # Another hint: the password is 8 bytes long.
  
  validate_function_address = base + 0x6d7
  initial_state = project.factory.call_state(validate_function_address,buffer_pointer, claripy.BVV(8, 32))

  # You will need to add code to inject a symbolic value into the program at the
  # end of the function that constrains eax to equal true (value of 1) just
  # before the function returns. There are multiple ways to do this:
  # 1. Use a hook.
  # 2. Search for the address just before the function returns and then
  #    constrain eax (this may require putting code elsewhere)
  password = claripy.BVS('password',8*8)
  initial_state.memory.store(buffer_pointer,password)

  simulation = project.factory.simgr(initial_state)

  success_address = base + 0x783
  simulation.explore(find=success_address)

  if simulation.found:
    solution_state = simulation.found[0]
    solution_state.add_constraints(solution_state.regs.eax != 0)
    # Determine where the program places the return value, and constrain it so
    # that it is true. Then, solve for the solution and print it.
    
    solution = solution_state.se.eval(password,cast_to=str)
    print solution
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

Automatic Exploit Generation

Arbitrary Read

# This binary takes both an integer and a string as a parameter. A certain
# integer input causes the program to reach a buffer overflow with which we can
# read a string from an arbitrary memory location. Our goal is to use Angr to 
# search the program for this buffer overflow and then automatically generate
# an exploit to read the string "Good Job." 
#
# What is the point of reading the string "Good Job."?
# This CTF attempts to replicate a simplified version of a possible vulnerability
# where a user can exploit the program to print a secret, such as a password or
# a private key. In order to keep consistency with the other challenges and to
# simplify the challenge, the goal of this program will be to print "Good Job."
# instead.
#
# The general strategy for crafting this script will be to:
# 1) Search for calls of the 'puts' function, which will eventually be exploited
#    to print out "Good Job."
# 2) Determine if the first parameter of 'puts', a pointer to the string to be 
#    printed, can be controlled by the user to be set to the location of the
#    "Good Job." string.
# 3) Solve for the input that prints "Good Job."
#
# Note: The script is structured to implement step #2 before #1.

# Some of the source code for this challenge:
#
# #include <stdio.h>
# #include <stdlib.h>
# #include <string.h>
# #include <stdint.h>
# 
# // This will all be in .rodata
# char msg[] = "${ description }$";
# char* try_again = "Try again.";
# char* good_job = "Good Job.";
# uint32_t key;
# 
# void print_msg() {
#   printf("%s", msg);
# }
#
# uint32_t complex_function(uint32_t input) {
#   ...
# }
# 
# struct overflow_me {
#   char buffer[16];
#   char* to_print;
# }; 
# 
# int main(int argc, char* argv[]) {
#   struct overflow_me locals;
#   locals.to_print = try_again;
# 
#   print_msg();
# 
#   printf("Enter the password: ");
#   scanf("%u %20s", &key, locals.buffer);
#
#   key = complex_function(key);
# 
#   switch (key) {
#     case ?:
#       puts(try_again);
#       break;
#
#     ...
#
#     case ?:
#       // Our goal is to trick this call to puts to print the "secret
#       // password" (which happens, in our case, to be the string
#       // "Good Job.")
#       puts(locals.to_print);
#       break;
#     
#     ...
#   }
# 
#   return 0;
# }

import angr
import claripy
import sys

def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)

  # You can either use a blank state or an entry state; just make sure to start
  # at the beginning of the program.
  
  initial_state = project.factory.entry_state()

  # Again, scanf needs to be replaced.
  class ReplacementScanf(angr.SimProcedure):
    # Hint: scanf("%u %20s")
    def run(self, format_string,param0,param1):
      scanf0 = claripy.BVS('scanf0', 32)
      scanf1 = claripy.BVS('scanf1', 20*8)

      # The bitvector.chop(bits=n) function splits the bitvector into a Python
      # list containing the bitvector in segments of n bits each. In this case,
      # we are splitting them into segments of 8 bits (one byte.)
      for char in scanf1.chop(bits=8):
        # Ensure that each character in the string is printable. An interesting
        # experiment, once you have a working solution, would be to run the code
        # without constraining the characters to the capital letters.
        # Even though the solution will technically work without this, it's more
        # difficult to enter in a solution that contains character you can't
        # copy, paste, or type into your terminal or the web form that checks 
        # your solution.
        # If you are using the web form to submit answers, your solution must be
        # entirely alphanumeric except for spaces.
        self.state.add_constraints(char >= 'A', char <= 'Z')

      # Warning: Endianness only applies to integers. If you store a string in
      # memory and treat it as a little-endian integer, it will be backwards.
      scanf0_address = param0
      self.state.memory.store(scanf0_address, scanf0, endness=project.arch.memory_endness)
      scanf1_address = param1
      self.state.memory.store(scanf1_address, scanf1)
      self.state.globals['solutions'] = (scanf0,scanf1)

  scanf_symbol = '__isoc99_scanf'
  project.hook_symbol(scanf_symbol, ReplacementScanf())

  # We will call this whenever puts is called. The goal of this function is to
  # determine if the pointer passed to puts is controllable by the user, such
  # that we can rewrite it to point to the string "Good Job."
  def check_puts(state):
    # Recall that puts takes one parameter, a pointer to the string it will
    # print. If we load that pointer from memory, we can analyse it to determine
    # if it can be controlled by the user input in order to point it to the
    # location of the "Good Job." string.
    #
    # Treat the implementation of this function as if puts was just called.
    # The stack, registers, memory, etc should be set up as if the x86 call
    # instruction was just invoked (but, of course, the function hasn't copied
    # the buffers yet.)
    # The stack will look as follows:
    # ...
    # esp + 7 -> /----------------\
    # esp + 6 -> |      puts      |
    # esp + 5 -> |    parameter   |
    # esp + 4 -> \----------------/
    # esp + 3 -> /----------------\
    # esp + 2 -> |     return     |
    # esp + 1 -> |     address    |
    #     esp -> \----------------/
    #
    # Hint: Look at level 08, 09, or 10 to review how to load a value from a
    # memory address. Remember to use the correct endianness in the future when
    # loading integers; it has been included for you here.
    
    puts_parameter = state.memory.load(state.regs.esp+4, 4, endness=project.arch.memory_endness)

    # The following function takes a bitvector as a parameter and checks if it
    # can take on more than one value. While this does not necessary tell us we
    # have found an exploitable state, it is a strong indication that the 
    # bitvector we checked may be controllable by the user.
    # Use it to determine if the pointer passed to puts is symbolic.
    
    if state.se.symbolic(puts_parameter):
      # Determine the location of the "Good Job." string. We want to print it
      # out, and we will do so by attempting to constrain the puts parameter to
      # equal it.
      
      good_job_string_address = 0x484F4A47

      # Create an expression that will test if puts_parameter equals
      # good_job_string_address. If we add this as a constraint to our solver,
      # it will try and find an input to make this expression true.
      
      is_vulnerable_expression = puts_parameter == good_job_string_address

      # Have Angr evaluate the state to determine if all the constraints can
      # be met, including the one we specified above. If it can be satisfied,
      # we have found our exploit!
      #
      # When doing this, however, we do not want to edit our state in case we
      # have not yet found what we are looking for. To test if our expression
      # is satisfiable without editing the original, we need to clone the state.
      copied_state = state.copy()

      # We can now play around with the copied state without changing the
      # original. We need to add our vulnerable expression as a state to test it.
      # Look at level 08 and compare this call to how it is called there.
      copied_state.add_constraints(is_vulnerable_expression)

      # Finally, we test if we can satisfy the constraints of the state.
      if copied_state.satisfiable():
        # Before we return, let's add the constraint to the solver for real.
        state.add_constraints(is_vulnerable_expression)
        return True
      else:
        return False
    else: # not state.se.symbolic
      return False

  simulation = project.factory.simgr(initial_state)

  # In order to determine if we have found a vulnerable call to 'puts',  we need
  # to run the function check_puts (defined above) whenever we reach a 'puts'
  # call. To do this, we will look for the place where the instruction pointer,
  # state.addr, is equal to the beginning of the puts function.
  def is_successful(state):
    # We are looking for puts. Check that the address is at the (very) beginning
    # of the puts function. Warning: while, in theory, you could look for
    # any address in puts, if you execute any instruction that adjusts the stack
    # pointer, the stack diagram above will be incorrect. Therefore, it is
    # recommended that you check for the very beginning of puts.
    
    puts_address = 0x08048370
    if state.addr == puts_address:
      # Return True if we determine this call to puts is exploitable.
      return check_puts(state)
    else:
      # We have not yet found a call to puts; we should continue!
      return False

  simulation.explore(find=is_successful)

  if simulation.found:
    solution_state = simulation.found[0]
    (scanf0, scanf1) = solution_state.globals['solutions']
    solution = str(solution_state.se.eval(scanf0)) + ' ' + solution_state.se.eval(scanf1,cast_to=str)
    print solution
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

Arbitrary Write

# Essentially, the program does the following:
#
# scanf("%d %20s", &key, user_input);
# ...
#   // if certain unknown conditions are true...
#   strncpy(random_buffer, user_input);
# ...
# if (strncmp(secure_buffer, reference_string)) {
#   // The secure_buffer does not equal the reference string.
#   puts("Try again.");
# } else {
#   // The two are equal.
#   puts("Good Job.");
# }
#
# If this program has no bugs in it, it would _always_ print "Try again." since
# user_input copies into random_buffer, not secure_buffer.
#
# The question is: can we find a buffer overflow that will allow us to overwrite
# the random_buffer pointer to point to secure_buffer? (Spoiler: we can, but we
# will need to use Angr.)
#
# We want to identify a place in the binary, when strncpy is called, when we can:
#  1) Control the source contents (not the source pointer!)
#     * This will allow us to write arbitrary data to the destination.
#  2) Control the destination pointer
#     * This will allow us to write to an arbitrary location.
# If we can meet both of those requirements, we can write arbitrary data to an
# arbitrary location. Finally, we need to contrain the source contents to be
# equal to the reference_string and the destination pointer to be equal to the
# secure_buffer.

import angr
import claripy
import sys

def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)
  initial_state = project.factory.entry_state()

  class ReplacementScanf(angr.SimProcedure):
    # Hint: scanf("%u %20s")
    def run(self, format_string,param0,param1):
      scanf0 = claripy.BVS('scanf0', 32)
      scanf1 = claripy.BVS('scanf1', 20*8)
      for char in scanf1.chop(bits=8):
        self.state.add_constraints(char >= 'A', char <= 'Z')

      scanf0_address = param0
      self.state.memory.store(scanf0_address, scanf0, endness=project.arch.memory_endness)
      scanf1_address = param1
      self.state.memory.store(scanf1_address, scanf1)
      self.state.globals['solutions'] = (scanf0,scanf1)

  scanf_symbol = '__isoc99_scanf'
  project.hook_symbol(scanf_symbol, ReplacementScanf())

  # In this challenge, we want to check strncpy to determine if we can control
  # both the source and the destination. It is common that we will be able to
  # control at least one of the parameters, (such as when the program copies a
  # string that it received via stdin).
  
  def check_strncpy(state):
    # The stack will look as follows:
    # ...          ________________
    # esp + 15 -> /                \
    # esp + 14 -> |     param2     |
    # esp + 13 -> |      len       |
    # esp + 12 -> \________________/
    # esp + 11 -> /                \
    # esp + 10 -> |     param1     |
    #  esp + 9 -> |      src       |
    #  esp + 8 -> \________________/
    #  esp + 7 -> /                \
    #  esp + 6 -> |     param0     |
    #  esp + 5 -> |      dest      |
    #  esp + 4 -> \________________/
    #  esp + 3 -> /                \
    #  esp + 2 -> |     return     |
    #  esp + 1 -> |     address    |
    #      esp -> \________________/
    # (!)
    strncpy_src = state.memory.load(state.regs.esp + 8, 4, endness=project.arch.memory_endness)
    strncpy_dest = state.memory.load(state.regs.esp + 4, 4, endness=project.arch.memory_endness)
    strncpy_len = state.memory.load(state.regs.esp + 12, 4, endness=project.arch.memory_endness)

    # We need to find out if src is symbolic, however, we care about the
    # contents, rather than the pointer itself. Therefore, we have to load the
    # the contents of src to determine if they are symbolic.
    
    src_contents = state.memory.load(strncpy_src, strncpy_len)

    # Our goal is to determine if we can write arbitrary data to an arbitrary
    # location. This means determining if the source contents are symbolic 
    # (arbitrary data) and the destination pointer is symbolic (arbitrary
    # destination).
    
    if state.se.symbolic(src_contents) and state.se.symbolic(strncpy_dest):
      # Use ltrace to determine the reference string. Decompile the binary to 
      # determine the address of the buffer it checks the password against. Our 
      # goal is to overwrite that buffer to store the password.
      
      password_string = 'NDYNWEUJ'
      buffer_address = 0x57584344

      # Create an expression that tests if the first n bytes is length. Warning:
      # while typical Python slices (array[start:end]) will work with bitvectors,
      # they are indexed in an odd way. The ranges must start with a high value
      # and end with a low value. Additionally, the bits are indexed from right
      # to left. For example, let a bitvector, b, equal 'ABCDEFGH', (64 bits).
      # The following will read bit 0-7 (total of 1 byte) from the right-most
      # bit (the end of the string).
      #  b[7:0] == 'H'
      # To access the beginning of the string, we need to access the last 16
      # bits, or bits 48-63:
      #  b[63:48] == 'AB'
      # In this specific case, since we don't necessarily know the length of the
      # contents (unless you look at the binary), we can use the following:
      #  b[-1:-16] == 'AB', since, in Python, -1 is the end of the list, and -16
      # is the 16th element from the end of the list. The actual numbers should
      # correspond with the length of password_string.
      
      does_src_hold_password = src_contents[-1:-64] == password_string
      
      # Create an expression to check if the dest parameter can be set to
      # buffer_address. If this is true, then we have found our exploit!
      
      does_dest_equal_buffer_address = strncpy_dest == buffer_address

      # In the previous challenge, we copied the state, added constraints to the
      # copied state, and then determined if the constraints of the new state 
      # were satisfiable. Since that pattern is so common, Angr implemented a
      # parameter 'extra_constraints' for the satisfiable function that does the
      # exact same thing:
      if state.satisfiable(extra_constraints=(does_src_hold_password, does_dest_equal_buffer_address)):
        state.add_constraints(does_src_hold_password, does_dest_equal_buffer_address)
        return True
      else:
        return False
    else: # not state.se.symbolic
      return False

  simulation = project.factory.simgr(initial_state)

  def is_successful(state):
    strncpy_address = 0x08048410
    if state.addr == strncpy_address:
      return check_strncpy(state)
    else:
      return False

  simulation.explore(find=is_successful)

  if simulation.found:
    solution_state = simulation.found[0]
    scanf0, scanf1 = solution_state.globals['solutions']
    solution = str(solution_state.se.eval(scanf0)) + ' ' + solution_state.se.eval(scanf1, cast_to=str)
    print solution
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)

Arbitrary Jump

# An unconstrained state occurs when there are too many
# possible branches from a single instruction. This occurs, among other ways,
# when the instruction pointer (on x86, eip) is completely symbolic, meaning
# that user input can control the address of code the computer executes.
# For example, imagine the following pseudo assembly:
#
# mov user_input, eax
# jmp eax
#
# The value of what the user entered dictates the next instruction. This
# is an unconstrained state. It wouldn't usually make sense for the execution
# engine to continue. (Where should the program jump to if eax could be
# anything?) Normally, when Angr encounters an unconstrained state, it throws
# it out. In our case, we want to exploit the unconstrained state to jump to
# a location of our choosing. We will get to how to disable Angr's default
# behavior later.
#
# This challenge represents a classic stack-based buffer overflow attack to
# overwrite the return address and jump to a function that prints "Good Job."
# Our strategy for solving the challenge is as follows:
# 1. Initialize the simulation and ask Angr to record unconstrained states.
# 2. Step through the simulation until we have found a state where eip is
#    symbolic.
# 3. Constrain eip to equal the address of the "print_good" function.

import angr
import claripy
import sys

def main(argv):
  path_to_binary = argv[1]
  project = angr.Project(path_to_binary)

  initial_state = project.factory.entry_state() 

  # The save_unconstrained=True parameter specifies to Angr to not throw out
  # unconstrained states. Instead, it will move them to the list called
  # 'simulation.unconstrained'.
  simulation = project.factory.simgr(initial_state, save_unconstrained=True)

  # Explore will not work for us, since the method specified with the 'find'
  # parameter will not be called on an unconstrained state. Instead, we want to
  # explore the binary ourselves.
  # To get started, construct an exit condition to know when we've found a
  # solution. We will later be able to move states from the unconstrained list
  # to the simulation.found list. Alternatively, you can create a boolean value
  # that serves the same purpose.
  
  # We will set this to the exploitable state once we find it.
  solution_state = None
  def has_found_solution():
    return solution_state is not None

  # Check if there are still unconstrained states left to check. Once we
  # determine a given unconstrained state is not exploitable, we can throw it
  # out. Use the simulation.unconstrained list.
  def has_unconstrained():
    return len(simulation.unconstrained) > 0

  # The list simulation.active is a list of all states that can be explored
  # further.
  def has_active():
    return len(simulation.active) > 0

  while (has_active() or has_unconstrained()) and (not has_found_solution()):
    # Check every unconstrained state that the simulation has found so far.
    
    for unconstrained_state in simulation.unconstrained:
      eip = unconstrained_state.regs.eip

      # Check if we can set the state to our print_good function.
      if unconstrained_state.satisfiable(extra_constraints=(eip == 0x42585249,)):
        # We can!
        solution_state = unconstrained_state

        # Now, constrain eip to equal the address of the print_good function.
        ex_constraints = eip == 0x42585249
        solution_state.add_constraints(ex_constraints)
        break

    # Since we already checked all of the unconstrained states and did not find
    simulation.drop(stash='unconstrained')

    # Advance the simulation.
    simulation.step()

  if solution_state:
    # Ensure that every printed byte is within the acceptable ASCII range (A..Z)
    for byte in solution_state.posix.files[sys.stdin.fileno()].all_bytes().chop(bits=8):
      solution_state.add_constraints(byte >= 'A', byte <= 'Z')
    
    solution = solution_state.posix.dumps(sys.stdin.fileno())
    print solution
  else:
    raise Exception('Could not find the solution')

if __name__ == '__main__':
  main(sys.argv)