Static Linking
- Programs are translated and linked using a compiler driver
- Source files
- Separately compiled relocatable object files
- Fully linked executable object file
Why linkers
- Modularity
- Program can be written as a collection of smaller source files, rather than one monolithic mass
- Can build libraries of common functions (more on this later)
- e.g., Math library, standard C library
- Efficiency
- Time: Separate compilation
- Change one source file, compile, and then relink
- No need to recompile other source files
- Space: Libraries
- Common functions can be aggregated into a single file
- Yet executable files and running memory images contain only code for the functions they actually use
What linkers do
- Symbol resolution
- Programs define and reference symbols (global variables and functions)
- Symbol definitions are stored (by the assembler) in the object file's symbol table
- Symbol table is an array of structs
- Each entry includes name, size, and location of symbol
- During the symbol resolution step, the linker associates each symbol reference with exactly one symbol definition
- Relocation
- Merges separate code and data sections into single sections
- Relocates symbols from their relative locations in the .o files to their final absolute memory locations in the executable
- Updates all references to these symbols to reflect their new positions
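The two steps above can be sketched with a toy symbol table. This is a minimal illustration, not how a real linker is implemented: the struct layout and the names `toy_symbol`, `toy_resolve`, and `toy_relocate` are hypothetical, and a real ELF symbol entry carries more fields.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified symbol table entry (not the real ELF layout). */
struct toy_symbol {
    const char *name;   /* symbol name */
    size_t size;        /* size of the function or object in bytes */
    size_t offset;      /* location relative to the start of its section */
    int defined;        /* 1 if this entry is a definition, 0 if only a reference */
};

/* Symbol resolution: return the one definition of `name`,
 * or NULL if there is no definition or more than one. */
const struct toy_symbol *toy_resolve(const struct toy_symbol *table,
                                     size_t count, const char *name)
{
    const struct toy_symbol *found = NULL;
    for (size_t i = 0; i < count; i++) {
        if (table[i].defined && strcmp(table[i].name, name) == 0) {
            if (found != NULL)
                return NULL;        /* duplicate definition */
            found = &table[i];
        }
    }
    return found;
}

/* Relocation: turn a section-relative offset into an absolute address,
 * given the address assigned to the merged section in the executable. */
size_t toy_relocate(size_t section_base, const struct toy_symbol *sym)
{
    return section_base + sym->offset;
}
```

A caller would first resolve a reference by name and then relocate the resulting definition against the base address chosen for the merged section.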
Three Kinds of Object Files (Modules)
- Relocatable object file (.o file)
- Contains code and data in a form that can be combined with other relocatable object files to form an executable object file
- Each .o file is produced from exactly one source (.c) file
- Executable object file (a.out file)
- Contains code and data in a form that can be copied directly into memory and then executed
- Shared object file (.so file)
- Special type of relocatable object file that can be loaded into memory and linked dynamically, at either load time or run-time
- Called Dynamic Link Libraries (DLLs) on Windows
Executable and Linkable Format (ELF)
- Standard binary format for object files
- One unified format for
- Relocatable object files (.o)
- Executable object files (a.out)
- Shared object files (.so)
- Generic name: ELF binaries
ELF Object File Format
- ELF header
- Word size, byte ordering, file type (.o, exec, .so), machine type, etc.
- Segment header table
- Page size, virtual addresses of memory segments (sections), segment sizes
- .text section
- Code
- .rodata section
- Read-only data: jump tables, …
- .data section
- Initialized global variables
- .bss section
- Uninitialized global variables
- “Block Started by Symbol”
- “Better Save Space”
- Has section header but occupies no space
- .symtab section
- Symbol table
- Procedure and static variable names
- Section names and locations
- .rel.text section
- Relocation info for .text section
- Addresses of instructions that will need to be modified in the executable
- Instructions for modifying
- .rel.data section
- Relocation info for .data section
- Addresses of pointer data that will need to be modified in the merged executable
- .debug section
- Info for symbolic debugging (gcc -g)
- Section header table
- Offsets and sizes of each section
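As a small illustration of what the ELF header encodes, the sketch below reads the word size and file type from the first bytes of a file image. The byte offsets follow the ELF identification layout (magic in bytes 0 to 3, class in byte 4, byte order in byte 5, file type in the two bytes at offset 16); the function names are hypothetical, and the code assumes the caller has already read the bytes into a buffer.

```c
#include <stddef.h>

/* File type values stored in the header's type field. */
enum { TOY_ET_REL = 1, TOY_ET_EXEC = 2, TOY_ET_DYN = 3 };

/* Every ELF file starts with the bytes 0x7f 'E' 'L' 'F'. */
int elf_has_magic(const unsigned char *buf, size_t len)
{
    return len >= 4 && buf[0] == 0x7f &&
           buf[1] == 'E' && buf[2] == 'L' && buf[3] == 'F';
}

/* Word size from the class byte: returns 32, 64, or 0 if unknown. */
int elf_word_size(const unsigned char *buf, size_t len)
{
    if (!elf_has_magic(buf, len) || len < 5)
        return 0;
    if (buf[4] == 1)
        return 32;
    if (buf[4] == 2)
        return 64;
    return 0;
}

/* File type, decoded with the byte order given in byte 5
 * (1 = little-endian, 2 = big-endian). Returns -1 on error. */
int elf_file_type(const unsigned char *buf, size_t len)
{
    if (!elf_has_magic(buf, len) || len < 18)
        return -1;
    if (buf[5] == 1)
        return buf[16] | (buf[17] << 8);
    if (buf[5] == 2)
        return (buf[16] << 8) | buf[17];
    return -1;
}
```

On a typical toolchain, a .o file should report `TOY_ET_REL`, while executables and .so files report `TOY_ET_EXEC` or `TOY_ET_DYN`.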
Linker Symbols
- Global symbols
- Symbols defined by module m that can be referenced by other modules
- E.g., non-static C functions and non-static global variables
- External symbols
- Global symbols that are referenced by module m but defined by some other module
- Local symbols
- Symbols that are defined and referenced exclusively by module m
- E.g., C functions and global variables defined with the static attribute
- Local linker symbols are not local program variables
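The three kinds of symbol can all appear in one small module. The sketch below is illustrative only; the names `call_count`, `clamp`, and `bounded_length` are made up for this example.

```c
#include <stddef.h>
#include <string.h>   /* declares strlen; its definition lives in the C library */

/* Global symbol: defined here and visible to other modules. */
int call_count = 0;

/* Local symbol: static, so only this file can reference it. */
static size_t clamp(size_t n, size_t max)
{
    return n > max ? max : n;
}

/* Global symbol that references the external symbol strlen. */
size_t bounded_length(const char *s, size_t max)
{
    /* `result` is a local program variable: it gets no linker symbol. */
    size_t result = clamp(strlen(s), max);
    call_count++;
    return result;
}
```

Running `nm` on the resulting .o file is one way to check this: it marks global symbols with uppercase letters, local symbols with lowercase letters, and undefined (external) symbols with `U`. An optimizing compiler may inline `clamp` and emit no symbol for it at all.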
Local Symbols
- Local non-static C variables vs. local static C variables
- local non-static C variables: stored on the stack
- local static C variables: stored in either .bss or .data
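A short sketch of the difference, using two hypothetical functions `f` and `g` that each declare a static local named `x`. The two variables are distinct objects with their own storage; compilers typically give them unique local linker symbols so they do not clash.

```c
int f(void)
{
    static int x = 0;   /* zero-initialized: typically .bss; keeps its value across calls */
    int step = 1;       /* ordinary local: stack or register, gone when f returns */
    x += step;
    return x;
}

int g(void)
{
    static int x = 10;  /* a separate object, typically in .data */
    x += 1;
    return x;
}
```

Calling `f` repeatedly returns 1, 2, 3, and so on, because `x` is not re-created on each call, while `step` is.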
How linker resolves duplicate symbol definitions
- Program symbols are either strong or weak
- Strong: procedures and initialized globals
- Weak: uninitialized globals
Linker’s Symbol Rules
- Multiple strong symbols are not allowed
- Each item can be defined only once
- Otherwise: Linker error
- Given a strong symbol and multiple weak symbols, choose the strong symbol
- References to the weak symbol resolve to the strong symbol
- If there are multiple weak symbols, pick an arbitrary one
- Can override this with gcc -fno-common (the default since GCC 10), which turns multiple weak definitions into a linker error
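The rules above can be written down as a small decision function. This is a toy model of the rules, not linker source code; the types and the name `pick_definition` are hypothetical, and "arbitrary" is modeled here as "the first one seen".

```c
enum strength { WEAK, STRONG };
enum { PICK_NONE = -1, PICK_DUPLICATE_STRONG = -2 };

/* One definition of a given symbol name, as contributed by one module. */
struct definition {
    const char *module;
    enum strength strength;
};

/* Returns the index of the chosen definition, PICK_DUPLICATE_STRONG if
 * more than one strong definition exists (a linker error), or PICK_NONE
 * if there are no definitions at all. */
int pick_definition(const struct definition *defs, int count)
{
    int strong_index = PICK_NONE;
    for (int i = 0; i < count; i++) {
        if (defs[i].strength == STRONG) {
            if (strong_index != PICK_NONE)
                return PICK_DUPLICATE_STRONG;   /* rule 1 violated */
            strong_index = i;
        }
    }
    if (strong_index != PICK_NONE)
        return strong_index;                    /* rule 2: strong beats weak */
    return count > 0 ? 0 : PICK_NONE;           /* rule 3: any weak one will do */
}
```

The third case is the dangerous one in practice: if two modules declare an uninitialized global with the same name but different types, the linker silently picks one of them.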
Global Variables
- Avoid if you can
- Otherwise
- Use static
- Initialize if you define a global variable
- Use extern if you reference an external global variable
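A minimal sketch of these guidelines in one file. In a real project the `extern` declaration would usually live in a header shared by several modules; the names `shared_total`, `private_calls`, `add_to_total`, and `calls_so_far` are hypothetical.

```c
/* Declaration only: says the variable exists somewhere, allocates nothing.
 * Modules that merely use shared_total would contain just this line. */
extern int shared_total;

/* Exactly one module provides the definition, and it is initialized,
 * which makes it a strong symbol. */
int shared_total = 0;

/* static: private to this file, so it cannot collide with other modules. */
static int private_calls = 0;

int add_to_total(int amount)
{
    private_calls++;
    shared_total += amount;
    return shared_total;
}

int calls_so_far(void)
{
    return private_calls;
}
```

Because `shared_total` is strong, a second initialized definition in another module produces a linker error instead of a silent merge.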
Packaging Commonly Used Functions
- How to package functions commonly used by programmers
- Math, I/O, memory management, string manipulation, etc.
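- Awkward, given the linker framework so far:
- Option 1: put all functions into a single source file (space and time inefficient)
- Option 2: put each function in a separate source file (more efficient, but burdensome for the programmer)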
Old-fashioned Solution: Static Libraries
- Static libraries (.a archive files)
- Concatenate related relocatable object files into a single file with an index (called an archive)
- Enhance linker so that it tries to resolve unresolved external references by looking for the symbols in one or more archives
- If an archive member file resolves a reference, link it into the executable
- Disadvantages
- Duplication in the stored executables (every program needs libc)
- Duplication in the running executables
- Minor bug fixes of system libraries require each application to explicitly relink
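As an illustration, the file below could be one member of a small static library. The file name `addvec.c`, the archive name `libvector.a`, and the program `main.c` in the comment are hypothetical; the commands assume a GNU-style toolchain.

```c
/* addvec.c: one member of a hypothetical archive libvector.a.
 *
 * Assumed build steps:
 *   gcc -c addvec.c                  compile to a relocatable object file
 *   ar rcs libvector.a addvec.o      create the archive with an index
 *   gcc -o prog main.c -L. -lvector  link; only referenced members are copied
 */

/* Element-wise sum of two vectors of length n: z[i] = x[i] + y[i]. */
void addvec(const int *x, const int *y, int *z, int n)
{
    for (int i = 0; i < n; i++)
        z[i] = x[i] + y[i];
}
```

If the archive also held other members that the program never references, the linker would leave them out of the executable.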
Modern Solution: Shared Libraries
- Shared Libraries
- Object files that contain code and data that are loaded and linked into an application dynamically, at either load-time or run-time
- Also called: dynamic link libraries, DLLs, .so files
- Advantages
- Dynamic linking can occur when the executable is first loaded and run (load-time linking)
- Dynamic linking can also occur after the program has begun (run-time linking)
- Shared library routines can be shared by multiple processes
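Run-time linking can be sketched with the POSIX `dlopen` interface. The example below looks up `atoi` by name while the program is running, instead of having the static linker bind the call. It assumes a Unix-like system where the C library is a shared object; the wrapper name `runtime_atoi` is hypothetical, and on older glibc versions the program may need to be linked with `-ldl`.

```c
#include <dlfcn.h>
#include <stddef.h>

typedef int (*atoi_fn)(const char *);

/* Resolve atoi at run time and call it.
 * Returns -1 if the handle or the symbol cannot be obtained. */
int runtime_atoi(const char *text)
{
    /* NULL asks for a handle to the running program itself, which
     * includes the shared libraries that were loaded with it. */
    void *handle = dlopen(NULL, RTLD_LAZY);
    if (handle == NULL)
        return -1;

    /* POSIX allows converting the result of dlsym to a function pointer. */
    atoi_fn fn = (atoi_fn)dlsym(handle, "atoi");
    int result = (fn != NULL) ? fn(text) : -1;

    dlclose(handle);
    return result;
}
```

Passing the path of a .so file to `dlopen` instead of `NULL` would load that library on demand, which is the usual way to implement plug-ins.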
Library Interpositioning
- Library interpositioning: powerful linking technique that allows programmers to intercept calls to arbitrary functions
- Interpositioning can occur at
- Compile time: when the source is compiled
- Link time: when the relocatable object files are statically linked to form an executable object file
- Load/run time: when an executable object file is loaded into memory, dynamically linked, and then executed
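Compile-time interpositioning is the easiest of the three to show in a single file: a macro rewrites calls to `malloc` into calls to a wrapper. This is a minimal sketch; the names `counting_malloc`, `malloc_call_count`, and `make_buffer` are hypothetical. In practice the macro usually sits in a local header placed ahead of the system one on the include path.

```c
#include <stdlib.h>

static size_t malloc_calls = 0;

/* Defined before the macro below, so the call inside is the real malloc. */
void *counting_malloc(size_t size)
{
    malloc_calls++;
    return malloc(size);
}

size_t malloc_call_count(void)
{
    return malloc_calls;
}

/* From this point on, the preprocessor rewrites every call to malloc. */
#define malloc(size) counting_malloc(size)

/* Application code: unchanged source, but now routed through the wrapper. */
int *make_buffer(size_t n)
{
    return malloc(n * sizeof(int));
}
```

Link-time and load-time interpositioning achieve the same effect without touching the source, for example through the GNU linker's `--wrap` option or the `LD_PRELOAD` environment variable on Linux.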
Some Interpositioning Applications
- Security
- Confinement (sandboxing)
- Behind the scenes encryption
- Debugging
- Code in the SPDY networking stack was writing to the wrong location
- Solved by intercepting calls to POSIX write functions (write, writev, pwrite)
- Monitoring and Profiling
- Count number of calls to functions
- Characterize call sites and arguments to functions
- Malloc tracing
- Detecting memory leaks
- Generating address traces
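For the leak-detection case, the wrappers only need to keep a running balance of allocations and frees. The sketch below shows such a pair with hypothetical names (`tracked_malloc`, `tracked_free`, `tracked_live_blocks`); it assumes these wrappers are hooked in place of `malloc` and `free` by one of the interpositioning mechanisms, which is not shown here.

```c
#include <stdlib.h>

/* Number of blocks allocated through the wrappers and not yet freed. */
static long live_blocks = 0;

void *tracked_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        live_blocks++;      /* count only successful allocations */
    return p;
}

void tracked_free(void *p)
{
    if (p != NULL)
        live_blocks--;      /* free(NULL) is a no-op, so do not count it */
    free(p);
}

/* A nonzero value at program exit suggests a leak. */
long tracked_live_blocks(void)
{
    return live_blocks;
}
```

A fuller tracer would also log each pointer and size, which is enough to produce an address trace.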