Introduction to the four stages of GCC

Introduction

This document is a practical introduction to using GCC and related utilities. It explains the different stages of compilation and covers some typical errors that may occur at each stage.

The front-end program gcc

The program gcc does not do compilation; rather, it calls other programs to do the four stages that are involved in compilation. Those stages are:

  • Preprocessing: outputs C code
  • Compiling: outputs assembler for a specific machine
  • Assembling: outputs object code for a specific machine
  • Linking: gathers object files together into executables or shared objects

To see this in action, use the overall option (it can be found in the info page for gcc) that prints the stages of compilation as they are carried out. Compile any program of your choosing as an example, and make sure you understand the output, as your demonstrator will ask you to explain it. The following sections, which explain each of the four stages, will help. In each of them you will be directed to stop gcc at that stage. Whatever kind of output a stage produces, you can always name the output file with the normal gcc option for that purpose.

The C preprocessor cpp

While it is possible to use cpp directly, the easiest way is still to go through gcc. Find the overall option for gcc that will cause it to stop after the preprocessing stage, and use this to get the result of running the preprocessor on your program. Make sure you have at least one system #include and one #define so you can see the result of these, even if it means making a boring or strange program. The #include will probably cause the output, which is sent to standard out, to be quite long, so you may want to redirect it to a file or specify an output file. There may also be many blank lines, and so it can be good to view the file or the output directly using
less -s
which will eliminate many of those blank lines. A further problem may be lines which contain only whitespace, and this can be corrected by piping through
sed 's/^ *$//'
before using less -s.

Now modify your program to contain a #include that you have written yourself, and have a look at the output. Next, put your header file in a directory below the current one (creating a new subdirectory if you have to), but do not modify the #include. The preprocessing should stop working, and if you try a full compilation that shouldn't work either. It is often the case, however, that header files are in a different directory, and there is a directory option which will make gcc look in your subdirectory for the header file. Use this option to make your program compile again. There is also a preprocessor option that defines a macro, which is equivalent to using a #define. Remove your #define, and try to get the same effect with the command line option. This option is particularly useful if you want to vary your code at compile time without modifying the source files. Specifying extra debugging output is a typical use.

Milestone: Show your demonstrator the command line options that you use to preprocess a source file and to define a macro. Also show your demonstrator the output of these commands.

The C compiler

While the preprocessor is sometimes run directly, the compiler proper, which generates assembly code, never is. As in the previous section, look for an overall option that stops gcc after compilation proper. This time, however, the output will be in a file ending in '.s', which is the usual suffix for assembler files. To make the exercise more interesting, make sure your program involves more than one .c file. Having main() in one file with a call to a function in another file would be sufficient. Compile both files to assembler, and find the function call from main(). If you are compiling on an Intel-compatible machine, a 'call' instruction will have been used, and the name of your function should be in the assembly code. You should notice, however, that the call is the only place in the main file that mentions the function. In other words, for the call to work it must jump to the code in the other file. This is not possible while the code sits in two separate files, and so what is needed is to join the two files into one so that all the required code is gathered together. This is what the linker does, and it is discussed below.

Try compiling now with the option that turns on debugging information. The assembler files will be much larger now, and will probably be quite confusing, but the instructions that were in the original version, which are the instructions that actually carry out your program, should still be there. Look for them (the 'call' instruction should be easy to find), and you will notice they have been divided up into small groups, sometimes of only one instruction, surrounded by some special code containing numbers. These numbers are the line numbers of each instruction in your original C program, and will be in '.loc' directives on Intel machines. Seeing which line in your C program corresponds to a particular step at runtime is an essential debugging tool, and this is the only way it can be made to work. The assembly code which was compiled without debugging information did not mention line numbers at all, and there is no way to get that information back at runtime. To simplify the rest of this tutorial it will be better to compile without debugging information.
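
To compare the two kinds of output side by side, something like the following can be used. It assumes GNU gcc, where -g is the debugging option; loc_demo.c, plain.s and debug.s are hypothetical file names.

```shell
cat > loc_demo.c <<'EOF'
int twice(int x)
{
    return 2 * x;
}

int main(void)
{
    return twice(21) == 42 ? 0 : 1;
}
EOF

# Same source, compiled to assembly without and with debugging information.
gcc -S loc_demo.c -o plain.s
gcc -g -S loc_demo.c -o debug.s

# Compare the sizes of the two assembly files.
wc -c plain.s debug.s

# Count the lines that carry source line numbers in '.loc' directives.
# The "|| true" keeps the script going if the target uses another notation.
grep -c '\.loc' debug.s || true
```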

Milestone: Show your demonstrator that you can compile multi-file programs using gcc. Show that you know the command line for making gcc stop at the assembly language stage.

Assembling

Recall that at the start of this tutorial you had gcc display the steps it executed. One of them takes the .s file and makes it into a .o, although gcc would have given these temporary files random names. This step is assembling, and the name of the assembler is probably as. Unlike the other steps in compilation, the assembler is designed to be run directly, as we sometimes want to write assembly code rather than C code. Do this now, and assemble any of the .s files from the previous section. By default, the assembler will output a file called a.out, but this will not be executable; it is simply a default name. As mentioned earlier, the output name can be changed with the right option, which is the same for as and gcc. No matter what it is called, the result of assembling is an object file.

If you look at a.out you will find that it is a binary file, and mostly does not make any sense. Only mostly, because you might find some text strings corresponding to text strings in your program, but you will not easily be able to see any instructions. That is because they have been encoded as machine code which, as you probably know, consists of special numbers understood by the CPU in the computer as standing for particular instructions. When discussing compilation, we usually refer to the contents of the file as object code, which includes the machine code and other data.

There is a utility for looking at object files, and it is called objdump, meaning object dump. There probably isn't an info page about it, only a man page, although info may display the man page if you try it. The object dump utility can do many things, but for the moment we only want to disassemble which, as the name suggests, attempts a reversal of the assembling. It will take the machine code and translate back to some assembler code which should be much like the code in the original .s file. Try this and compare the result with the original file.

Once you have understood the operation of the assembler and the nature of the object file, use gcc to generate the .o files for the two files mentioned earlier. There is an option you can use that will do compilation and assembly but not linking.

Milestone: Show your demonstrator the object code of your program, as well as the result of using objdump. Point out differences.

The linker ld

The linker is the final step in the work of gcc, and if it is called directly we use the name ld, but it may be called collect2 when used by gcc. All the linker really does is join object files together, so that where a name in one file refers to a label in another, the use of the name will correspond to the code at that label when the files are joined. Since the function call is a typical example, run ld with both .o files as arguments. You will probably get a strange warning about '_start', but there will still be a default output file called a.out. This output, however, is not yet a working executable, and it will probably crash if you run it. So instead of running it, disassemble it as before, since it is still just object code. You will see that the code which was in two files earlier is now gathered together in this file, and that the function name in the call has been replaced by a number which corresponds to the number marking where the function begins. The linker will probably have kept the function name to make this easier to see.

The conventional way of calling the linker is through gcc, however, and this will happen automatically if you give .o files as the arguments. Do this now with both .o files together, and the a.out file you get this time will be an executable. You can run it if you like, but it is more interesting to disassemble it again. This time you will see a lot of extra assembly code, but you should still be able to find the code for your program. The extra code is what is required to start a program in the operating system, which is what makes the a.out file executable now. It comes from standard object files in the system, which often have 'crt' in their names. You will notice in the step breakdown of gcc that it uses the complete path to these and other files.

Milestone: Run your program. Show your demonstrator the output of objdump. Point out differences between this disassembly and the one from the previous milestone.

Of course, it is common to use libraries in programming, but they usually need to be specified. Amongst the linker options there is one for library searching, and it is called searching because you do not need to give the path of the library; the compiler will find it for you. Sometimes, however, the compiler will not know the right directory to search in, and there is a directory option to help with this, just as there was one for the preprocessor. Since libraries are only collections of object code, these options only have an effect during linking.
