The cost of virtual functions is often invoked as a reason for C++’s poor performance compared to other languages, especially C. This is an enduring myth that, like most myths, has always bugged me. C++ myths are propagated by people who did not know C++ very well, tried it one weekend in 1996, used a bad compiler, knew nothing about optimization switches, and peremptorily declared C++ fundamentally broken. Well, I must agree that C++ compilers in the mid-90s weren’t all that hot, but a lot has been done in the last fifteen years. Compilers are now rather good at generating efficient C++ code.
However, the cost of calls, whether or not they are virtual, is not dominated by the call itself (getting the address to jump to and jumping). It is dominated by everything surrounding the call, such as stack setup and argument passing. Let us debunk that myth. We will look at what types of calls are available in C and C++, how they translate to machine code, and how much faster or slower they are relative to each other.
Function call methods. C and C++ offer the following call methods:
Direct. This is the normal function call. The address of the function is known at compile time (or patched at link time), so in either case the address is fixed in the code at run time. The call itself consists of the stack setup (we’ll come back to this later) and a call to a constant address.
Indirect. This is usually known as a call through a function pointer. The address of the function is unknown at compile time and is instead held in a variable that the program may modify at run time. Examples are callbacks and object-oriented programming, which is entirely possible (if painful) in C. Function pointers find their natural use in a number of design patterns, such as visitor and observer, or simply in event handling.
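Here is a minimal sketch of the callback style described above, written in the C subset of C++. It is an observer-like event source that stores plain function pointers and calls them indirectly. All names (`Handler`, `Button`, `subscribe`, `click`, `count_clicks`, `sum_ids`) are hypothetical and for illustration only.

```cpp
#include <cassert>

// Hypothetical callback type: every handler receives the event id and an
// opaque context pointer, as is customary for C-style callbacks.
typedef void (*Handler)(int event_id, void *context);

struct Button {
    Handler handlers[4];   // registered callbacks (fixed capacity for simplicity)
    void   *contexts[4];   // opaque data handed back to each callback
    int     count;         // number of registered callbacks
};

// Registers a callback; returns false when the button is full.
bool subscribe(Button *b, Handler h, void *ctx) {
    assert(h != nullptr);
    if (b->count >= 4)
        return false;
    b->handlers[b->count] = h;
    b->contexts[b->count] = ctx;
    ++b->count;
    return true;
}

// Notifies every subscriber. Each call is indirect: the target address is
// read from b->handlers[i] at run time, not fixed in the instruction.
void click(Button *b, int event_id) {
    for (int i = 0; i < b->count; ++i)
        b->handlers[i](event_id, b->contexts[i]);
}

// Two example handlers: one counts clicks, the other sums the event ids.
void count_clicks(int, void *ctx) { ++*static_cast<int *>(ctx); }
void sum_ids(int event_id, void *ctx) { *static_cast<int *>(ctx) += event_id; }
```

For example, after subscribing both handlers to a zero-initialized `Button` and calling `click` with ids 1 and 2, the click counter holds 2 and the id sum holds 3.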
Inline. inline is a directive asking the compiler to eliminate the function call by placing the function’s body at the point of call. This removes the overhead of setting up the stack and performing the call, at the possible cost of code expansion. The compiler treats inline only as a hint. It applies heuristics to decide whether inlining the function will result in a performance gain: if it estimates a positive gain, the function is inlined; if it estimates a negative gain, the function is called normally. The standard explicitly allows this, stating that an implementation is not required to perform the inline substitution (C++ ISO 14882:1998 § 7.1.2). In both C and C++, inline also changes the function’s linkage semantics. In C99, an inline definition does not by itself provide an external definition of the function (ISO 9899:1999 § 6.7.4). In C++, an inline function must be defined in every translation unit in which it is used. Compilers are, however, free to emit an out-of-line copy of the inline function to serve normal calls.
Basically, the compiler does what it wants with the keyword inline, treating it, like register, as a hint for optimization and nothing more. However, compilers are rather smart: with the proper optimization switches, they may well inline the function you asked for, and other functions too. Typically, a function taking one or two arguments and consisting of just two or three lines of code will be inlined automagically.
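Below is a small sketch of the kind of function the paragraph above describes: two arguments and one line of body. It is a typical candidate for inlining once optimizations are enabled, whether or not it carries the keyword. The helper names `max_of` and `largest` are hypothetical. Whether the call is actually inlined is up to the compiler. With GCC, the -Winline warning reports functions declared inline that could not be inlined.

```cpp
#include <cassert>

// Hypothetical helper. "inline" is only a hint: the compiler may ignore it,
// and it may also inline this function on its own if the keyword is absent.
inline int max_of(int a, int b) { return a > b ? a : b; }

// If max_of is inlined here, the loop body contains no call instruction:
// the comparison is substituted at the call site and optimized with the loop.
int largest(const int *values, int n) {
    assert(values != nullptr && n > 0);
    int best = values[0];
    for (int i = 1; i < n; ++i)
        best = max_of(best, values[i]);
    return best;
}
```

Comparing the assembly output at different optimization levels (for example, with `g++ -S`) is a simple way to see whether the call to `max_of` survives.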
Virtual. In object-oriented programming, inheritance and polymorphism call for indirect function calls, so that the correct function is called even when an object is manipulated through a base type (an is-a relation). Virtual functions ensure that the right function is called for the object’s actual type, regardless of the type through which it is manipulated. This greatly simplifies code and helps to ensure correctness.
Implementations may vary from language to language. In C++, the compiler creates a table of function pointers for each class that has virtual functions: the VMT, or virtual method table. Each object of such a class carries a hidden pointer to its class’s table. Each time a virtual method (method is object-oriented lingo for “member function”) is called on an object, the address of the function is retrieved from the table. This is clearly just a special case of indirect calls, except that the function pointers are hidden. The compiler generates all the code and storage needed to manipulate them transparently. In particular, it generates the code that sets the object’s hidden pointer to the right table when the object’s constructor runs.
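Here is a minimal sketch of the polymorphism described above. Objects of different classes are handled only through a base-class pointer, and the virtual call still reaches each class’s own implementation. The class and function names are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical hierarchy: Rectangle and Triangle are-a Shape.
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;   // resolved at run time through the VMT
};

struct Rectangle : Shape {
    double w, h;
    Rectangle(double w_, double h_) : w(w_), h(h_) {}
    double area() const override { return w * h; }
};

struct Triangle : Shape {
    double base, height;
    Triangle(double b, double h) : base(b), height(h) {}
    double area() const override { return 0.5 * base * height; }
};

// Only the base type is known here, yet each call reaches
// Rectangle::area or Triangle::area according to the actual object.
double total_area(const std::vector<std::unique_ptr<Shape>> &shapes) {
    double sum = 0.0;
    for (const auto &s : shapes)
        sum += s->area();
    return sum;
}
```

For instance, a 2 by 3 rectangle and a triangle with base 4 and height 5 give a total area of 16.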
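To make the hidden machinery visible, here is a hand-written imitation of it. It uses one static table of function pointers per “class” and a table pointer stored as the first member of each object. The “constructor” only sets that pointer. This is a sketch for illustration: the names (`Animal`, `AnimalVMT`, `make_dog`, and so on) are hypothetical, and real compilers lay out their tables according to their ABI.

```cpp
#include <cassert>

struct Animal;

// One table per "class": the equivalent of the compiler-generated VMT.
struct AnimalVMT {
    int (*legs)(const Animal *self);
    const char *(*name)(const Animal *self);
};

// Each object carries a pointer to its class's table (first member here).
struct Animal {
    const AnimalVMT *vmt;
    int id;
};

static int dog_legs(const Animal *) { return 4; }
static const char *dog_name(const Animal *) { return "dog"; }
static int bird_legs(const Animal *) { return 2; }
static const char *bird_name(const Animal *) { return "bird"; }

// The tables are shared by all objects of a "class"; they are not per object.
static const AnimalVMT dog_vmt  = { dog_legs,  dog_name  };
static const AnimalVMT bird_vmt = { bird_legs, bird_name };

// "Constructors": besides the data members, they store the table address.
Animal make_dog(int id)  { return Animal{ &dog_vmt,  id }; }
Animal make_bird(int id) { return Animal{ &bird_vmt, id }; }

// A "virtual call": load the table pointer, load the slot, call through it.
int legs(const Animal *a) {
    assert(a != nullptr && a->vmt != nullptr);
    return a->vmt->legs(a);
}
const char *name(const Animal *a) {
    assert(a != nullptr && a->vmt != nullptr);
    return a->vmt->name(a);
}
```

With two dogs and a bird, summing `legs` over all three gives 10, even though the loop never inspects the object’s type.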
The mechanics of function calls. The particulars vary somewhat from processor to processor and from compiler to compiler. Even so, calls are performed in much the same way, because C and C++ calling conventions impose a particular style of stack management. The work is split between the caller (the function that performs the call) and the callee (the function that is called).
In the classic convention (cdecl on 32-bit x86, for example), the caller pushes the arguments on the stack in reverse order, then performs the call. 64-bit ABIs pass the first few arguments in registers instead. The call instruction pushes the address of the next instruction on the stack. This becomes the return address, where execution resumes when the function returns. Upon return, it is the caller’s task to clean up the stack by popping (unstacking) the arguments.
The callee sets up its own stack environment and makes sure the stack is restored when it exits. The return address is then back at the top of the stack, so execution resumes normally in the caller. The callee only modifies the stack within its own frame, which it uses to store its local variables.
Stack manipulation is supported by a set of specialized instructions that vary from processor to processor. They basically offer push, pop, and clean up (a massive pop that simply discards pushed data). The call instruction is essentially a push-and-jump: the current address (more precisely, the address of the next instruction) is pushed on the stack before jumping to the specified address. Conversely, the return instruction is a pop-and-jump: the target address is popped from the top of the stack, and execution jumps to it.
So, to perform a call, the caller evaluates the arguments and pushes the resulting values on the stack in reverse order before calling the function. The function creates its local storage, accesses the arguments, cleans up its local storage, and returns. The caller then removes the arguments from the stack, and program execution resumes.
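The protocol above can be traced step by step with a toy model. The sketch below assumes a cdecl-like convention and uses a `std::vector<long>` as the stack, with `back()` as the top. The return address is a made-up number; only the ordering of pushes and pops is meaningful. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy machine: the stack grows at back(); real stacks usually grow downward in memory.
struct Machine {
    std::vector<long> stack;

    void push(long v) { stack.push_back(v); }
    long pop() { long v = stack.back(); stack.pop_back(); return v; }

    // Argument i (0 = first) sits just below the return address, at a fixed
    // distance from the top, because the arguments were pushed in reverse order.
    long arg(int i) const { return stack[stack.size() - 2 - static_cast<std::size_t>(i)]; }
};

// Callee: reads its arguments, uses one local slot in its own frame,
// releases that slot, then "returns" by popping the return address.
long callee_sub(Machine &m, long *return_to) {
    m.push(m.arg(0) - m.arg(1));   // local variable: first argument minus second
    long result = m.pop();         // callee cleans up its own locals
    *return_to = m.pop();          // ret: pop the return address
    return result;
}

// Caller: pushes the arguments in reverse order, "calls" by pushing a return
// address, then removes its arguments from the stack once the callee returns.
long caller(Machine &m, long a, long b) {
    const long fake_return_address = 0x1234;
    m.push(b);                     // last argument pushed first
    m.push(a);                     // first argument ends up nearest the top
    m.push(fake_return_address);   // call = push return address + jump
    long ret_addr = 0;
    long r = callee_sub(m, &ret_addr);
    assert(ret_addr == fake_return_address);
    m.pop();                       // caller cleanup: discard a
    m.pop();                       // caller cleanup: discard b
    return r;
}
```

Subtraction is used deliberately: `caller(m, 10, 3)` yields 7 only if the arguments are found in the right order, and the stack is empty again afterwards.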
You may ask yourself why the caller must push the arguments in reverse order. The reason is that C (and C++) allows variadic functions. These are declared with the feared and troublesome ellipsis (...) found in functions such as printf, which lets a function take an indefinite number of arguments. The prototype of printf is:
int printf(const char *format, ...);
Only the position of the first argument needs to be known. Pushing it last puts it at a known position: the top of the stack, just under the return address.
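Here is a short sketch of a variadic function in the style of printf. The first, fixed parameter, found at a known position, tells the function how many arguments follow. The `va_start`/`va_arg`/`va_end` macros from `<cstdarg>` hide the actual calling convention. `sum_ints` is a hypothetical example, not a standard function.

```cpp
#include <cassert>
#include <cstdarg>

// count is the last named parameter: va_start uses it to locate the
// variable arguments, and its value says how many ints were passed.
int sum_ints(int count, ...) {
    assert(count >= 0);
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += va_arg(args, int);   // the caller must really pass ints
    va_end(args);
    return total;
}
```

As with printf, nothing checks that the count matches the arguments actually passed. That is part of what makes the ellipsis troublesome.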
The cost of calls. So, clearly, the cost of a call, by itself, is not much:
Direct. If the address is known at compile time, the call is merely a push followed by a jump to a known address embedded in the instruction. Any further penalty is processor-specific: the call may defeat branch prediction, flush the pipeline, or jump to code that is not in the cache. Those penalties apply to all types of calls.
Indirect. If the call is indirect, we add the extra cost of fetching the address from a memory location. Clearly, using a simple function pointer does not cost much more than a single additional memory read. Typically, a function call such as
function_ptr(arg1, arg2);
will translate into this x86 code:
push arg2
push arg1
call [function_ptr]   ; loads the target address from memory
add esp, 8            ; caller removes the two (4-byte) arguments
where function_ptr may expand to a complex address computation, such as ds:[ebx+some_offset]. That is not very costly, as processors are optimized to handle such addressing modes efficiently. Of course, if you write something like:
function_ptrthingie->method[arg3];
the cost increases dramatically, as several memory reads and address computations are needed to get the destination address. This may be useful, but you can’t then complain about the performance loss.
Inline. Inlined functions have no call cost. They may perturb the optimization of your code and have an adverse effect, but the call code itself is removed. However, inlining does not spare you from evaluating the arguments, which can be costly. Also, the compiler may decide not to inline the function at all, in which case it behaves exactly like a normal direct call.
Virtual Functions. Virtual functions are merely indirect calls where the pointer is stored in a per-class table rather than in some other memory location. The following:
an_object->virtual_function(arg1, arg2);
will compile to something similar to:
push arg2
push arg1
; esi almost always contains this already
mov edx, [esi+vmt_offset]             ; loads the VMT pointer (*)
call [edx+method_offset*sizeof_ptr]   ; calls through the table slot
where vmt_offset is a constant determined at compile time. The instructions may be optimized further because the this pointer is already in a register. They may also use more complex instructions such as lea (load effective address, a “do it all” address calculation instruction provided by x86 CPUs). Moreover, the line marked (*) can be simplified if the VMT pointer is stored at the beginning of the object, as it is with G++.
But the cost of a call, whatever its flavor, is dominated by the evaluation of its arguments. We saw that indirect calls, whether C-style or C++ virtual methods, are inherently inexpensive. A call to a virtual method costs at most one extra memory read (fetching the VMT pointer) compared to an indirect call through a struct member (something->function(arg1,arg2)). Deeming virtual functions incredibly slow is therefore simply misinformed.
However, care must be exercised when comparing performance across languages. One must understand exactly what the programs in each language do, and how their semantics (the subtle, tenuous differences between languages) impact performance. Of course, if a minimal translation of a program from language A to language B doesn’t do the same thing, any conclusions drawn about performance are moot.
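If you want to check this on your own compiler and CPU, a micro-benchmark is the usual tool. The sketch below times a direct call, a call through a function pointer stored in a struct member, and a virtual call, all doing the same trivial work. It is a rough sketch, not a rigorous benchmark. All names are hypothetical, no results are claimed here, and the optimizer may inline or devirtualize some of these calls, so inspect the generated code before drawing conclusions.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// The same trivial work, reachable three ways.
unsigned add_direct(unsigned a, unsigned b) { return a + b; }

struct CStyleObject {                       // something->function(arg1, arg2)
    unsigned (*function)(unsigned, unsigned);
};

struct Base {
    virtual ~Base() = default;
    virtual unsigned add(unsigned a, unsigned b) const = 0;
};
struct Adder : Base {
    unsigned add(unsigned a, unsigned b) const override { return a + b; }
};

// Calls f in a loop where each result feeds the next call, prints the
// elapsed time, and returns the accumulated value so the work is observable.
template <typename F>
unsigned time_loop(const char *label, F f, int iterations) {
    auto start = std::chrono::steady_clock::now();
    unsigned acc = 0;
    for (int i = 0; i < iterations; ++i)
        acc = f(acc, static_cast<unsigned>(i & 7));
    auto stop = std::chrono::steady_clock::now();
    long long ns = static_cast<long long>(
        std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count());
    std::printf("%-8s %lld ns\n", label, ns);
    return acc;
}

// Runs the three variants; all of them should produce the same result.
void run_benchmark(int iterations, const Base &obj, const CStyleObject &c, unsigned out[3]) {
    assert(iterations >= 0);
    out[0] = time_loop("direct",  [](unsigned a, unsigned b) { return add_direct(a, b); }, iterations);
    out[1] = time_loop("pointer", [&c](unsigned a, unsigned b) { return c.function(a, b); }, iterations);
    out[2] = time_loop("virtual", [&obj](unsigned a, unsigned b) { return obj.add(a, b); }, iterations);
}
```

Passing the object as `const Base &` to a separate function makes it harder, though not impossible, for the compiler to see the dynamic type and turn the virtual call into a direct one.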