The Makefile and Compilation Tutorial
The first thing to remember about compiling, as with all things in computers, is Don't Panic! This will become second nature all too soon. The second thing to know about compiling is that when you write a program, you generally have nothing more than a text file, which is no different to the computer than a letter that you typed to your great aunt Betsy. The computer has no idea what to do with a text file; to the computer, it is just data. You have to compile your program into executable code that the computer can understand. (There are also interpreted languages, in which your code is translated as it is executed, but they don't count here since you don't compile them yourself.)
The third thing to know about compiling is that there are generally three distinct steps in compiling some code to an executable. They are:
- Compiling the code to Assembly code
- This step is usually done transparently as most compilers perform it and then invoke the assembler themselves, so you don't really have to worry about it. It can be useful later on if you're trying to perform optimizations and you want to see how your compiler actually implements your code, if you can get your compiler to dump the assembly code that it generates. Other than that, just be aware that it happens and don't bother about it for a while.
- Assembling the Assembly code to object code
- As you should know by now, a computer executes binary codes in its CPU. In order to run your program, your program will need to be translated into these binary codes. This is what this stage is about. The generated assembly (normally never actually present on your system but just passed from the compiler to the assembler) is translated into binary code.
- Some compilers will translate the assembly code directly into an executable file, and others will leave it as what is known as object code. We're only concerned with those that create object code.
- Object code is executable code which isn't properly organized to run. It may be missing functions that it calls, it may be missing global variables, things like that. Generally, in C and C++, object code is stored in files with the extension .o. Thus if you compiled your code into object code but didn't link it, you would go from a program called hello_world.c to hello_world.o.
- Linking the object code together into an executable
- If you have several object files (though you technically only need one), you can perform what is known as linking. This is the process of resolving dependencies between the object files, resolving dependencies of the object files on external libraries, and resolving addresses of global variables and such. In simple terms, linking is the process of taking code which is in executable form and turning it into code which can actually be run by your computer on your operating system.
Example: Suppose the source code for an executable binary file named project.exe is stored in a file named project.cpp. The program #includes a library's header file named lib.h and the library's implementation file is named lib.cpp. Please download source_lab04_project.cpp and rename it to project.cpp. Then, do the same for source_lab04_lib.h and source_lab04_lib.cpp (renaming them to lib.h and lib.cpp, respectively).
The directive #include "lib.h" causes the compiler (actually, the preprocessor) to insert the prototype (that is, the declaration) of the function doSomething(int x) into project.cpp. However, the definition of doSomething(int x) is in lib.cpp, not in project.cpp.
Translating a program like project.cpp to produce a binary executable program involves three phases:
- Compiling, in which C++ code is translated into assembly;
- Assembling, in which assembly is translated into machine language; and
- Linking, in which calls to functions outside the main file are linked to their definitions.
- Step 1:
- Separately compile the program file name.cpp to produce an object file (named name.o on some systems). This may require using a special compiler switch, as in the GNU C++ command:
g++ -c project.cpp
project.cpp → project.o
- Step 2:
- Separately compile each library implementation file in the same manner; for example:
g++ -c lib.cpp
lib.cpp → lib.o
- Step 3:
- Link the object files together into a binary executable program — for example, in GNU C++ with the command:
g++ project.o lib.o -o project.exe
project.o + lib.o → project.exe
If a program involves several different libraries, it can be very difficult to keep track of which files are out of date, which need to be recompiled, and so on. A single command that works this out and runs the necessary compilations automatically is very useful in this regard. With GNU C++, UNIX's make utility can be used to execute a file named Makefile that contains the commands for the compilations and linking.
UNIX's make Utility
make is a system designed to create programs from large source code trees and to maximize the efficiency of doing so. To that effect, make uses a file in each directory called a Makefile. This file contains instructions for make on how to build your program and when.
There are three important parts to a Makefile: the target, the dependencies, and the instructions. Just so that you know what this will look like, it's of the form:
target: dependencies
	instructions
Using the make utility requires a programmer to create a special file named Makefile, from which the make program reads information. A Makefile consists of pairs of lines (each pair governs the updating of one file).
- Upon what files does project.exe depend?
→ project.o and lib.o
(This means project.exe cannot be built without project.o and lib.o.)
- Upon what files does project.o depend?
→ project.cpp and lib.h
- Upon what files does lib.o depend?
→ lib.cpp and lib.h
The first line of each line-pair in a Makefile has the form:
TargetFile: DependencyFile1 DependencyFile2 ... DependencyFileN
where TargetFile is the file that needs to be updated, and each DependencyFile is a file upon which TargetFile depends.
The second line of the pair is a UNIX command to make TargetFile. The command must be preceded by a TAB and end with a Return.
To illustrate: The first line-pair in our Makefile appears as follows:
project.exe: project.o lib.o
	g++ project.o lib.o -o project.exe
Note that the first line specifies the dependencies of project.exe, and the second line is the UNIX command to make project.exe.
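Because a TAB looks just like a run of spaces on screen, it is easy to get the command line wrong without noticing. One way to check is sketched below. It writes the line-pair to a scratch file named Makefile.demo (a made-up name, chosen so that no real Makefile is overwritten) and then prints it with sed -n l, which shows each TAB as \t and marks the end of each line with $.

```shell
# Write one line-pair to a scratch file. printf turns \t into a real TAB
# character and \n into a newline.
printf 'project.exe: project.o lib.o\n\tg++ project.o lib.o -o project.exe\n' > Makefile.demo

# Print the file in unambiguous form: a TAB appears as \t, and each line
# ends with $.
sed -n l Makefile.demo
```

If the command line in the output begins with \t, the TAB is in place. If it begins with spaces instead, make will not accept the line as a command.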
Of course, project.o won't exist the first time we compile, so we should specify a line-pair for it, too:
project.o: project.cpp lib.h
	g++ -c project.cpp
We should then do the same thing for lib.o:
lib.o: lib.cpp lib.h
	g++ -c lib.cpp
The Makefile thus appears as follows:
project.exe: project.o lib.o
	g++ project.o lib.o -o project.exe
project.o: project.cpp lib.h
	g++ -c project.cpp
lib.o: lib.cpp lib.h
	g++ -c lib.cpp
Now, when a user types
uname% make
the program reads the Makefile, and:
- Sees that project.exe depends upon project.o, and
- Checks project.o, which depends on project.cpp and lib.h;
- Determines whether project.o is out of date;
- If so, it executes the command to make project.o:
g++ -c project.cpp;
- Sees that project.exe also depends upon lib.o, and
- Checks lib.o, which depends on lib.cpp and lib.h;
- Determines whether lib.o is out of date;
- If so, it executes the command to make lib.o:
g++ -c lib.cpp;
- Sees that everything on which project.exe depends is now up to date, and so executes the command to make project.exe:
g++ project.o lib.o -o project.exe
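The "out of date" test in the steps above can be sketched in shell. This is a simplified illustration, not make's actual implementation: it treats a target as out of date when the target does not exist or when any of its dependencies has a newer modification time. The function name needs_rebuild, the stale_demo directory, and the fixed timestamps are all made up for the example. The -nt ("newer than") test is supported by bash and most other shells, although POSIX does not require it.

```shell
# Succeed (return 0) if the target must be rebuilt: either it does not
# exist, or at least one dependency was modified more recently than it.
needs_rebuild() {
    target=$1
    shift
    [ -e "$target" ] || return 0
    for dep in "$@"; do
        if [ "$dep" -nt "$target" ]; then
            return 0
        fi
    done
    return 1
}

# Empty stand-in files in a scratch directory, with fixed modification
# times (touch -t takes CCYYMMDDhhmm): sources at 10:00, object file at 11:00.
mkdir -p stale_demo
touch -t 202401011000 stale_demo/lib.cpp stale_demo/lib.h
touch -t 202401011100 stale_demo/lib.o

if needs_rebuild stale_demo/lib.o stale_demo/lib.cpp stale_demo/lib.h; then
    echo "lib.o is out of date"
else
    echo "lib.o is up to date"
fi

# Simulate editing lib.h at 12:00, after lib.o was built.
touch -t 202401011200 stale_demo/lib.h

if needs_rebuild stale_demo/lib.o stale_demo/lib.cpp stale_demo/lib.h; then
    echo "lib.o is out of date"
else
    echo "lib.o is up to date"
fi
```

The first check reports that lib.o is up to date, and the second reports that it is out of date, because lib.h is now newer than lib.o. This is why changing a header file causes make to recompile every object file that lists that header as a dependency.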
- While a Makefile usually consists of pairs of lines, there can in fact be any number of commands after the line specifying the dependencies.
Example: We could write,
project.exe: project.o lib.o
	g++ project.o lib.o -o project.exe
	rm project.o
	rm lib.o
This would automatically remove the object files after project.exe is made.
- The make utility also allows a user to specify what is to be made:
uname% make lib.o
will operate using lib.o as its primary TargetFile instead of project.exe.
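- A TargetFile need not have any dependencies. This, combined with the two preceding points (several commands may follow a dependency line, and the user may choose which target to make), allows make to be used for all kinds of non-compilation activities.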
Example: Suppose our Makefile contains the following lines:
clean:
	rm -f project.exe *.o *~ *#
and the user types
uname% make clean
What happens? (This is a fool-proof way to clean up a messy directory.)
- make is coordinated with emacs. When an emacs user types the command:
M-x compile
emacs responds with
Compile command: make -k
If a Makefile is in the directory containing the file on which you are working, then pressing the Return key will execute make using that Makefile.
The make utility eliminates the complexity of separate compilation by determining what files are out of date and re-making them. Learning to use it effectively can save a great deal of time, especially on projects that have several files.
- There can be only one file named Makefile in a directory.
- Since a Makefile coordinates the translation of one project, each project should be stored in its own dedicated directory, with a separate Makefile to coordinate its translation. Doing so allows you to remove the object files, binary executables, etc., because to remake the project, you need only cd to the directory and type make.
- An added benefit is that all the files for one project are confined within one directory, making it easier to port the project to a different machine. (Just copy the directory to the new machine, cd to the directory, and type make).
Hand In: This lab handout with the answers filled in attached to a listing of your final Makefile
(use the enscript command from your Programming Style Sheet to print it out:
enscript -E -G -2rj -M Letter -PECT2_PS <filename>).