From <thinking in C++> Page:481
Passing & returning by value
To understand the need for the copy-constructor, consider the way C handles passing and returning variables by value during function
calls. If you declare a function and make a function call,
int f(int x, char c);
int g = f(a, b);
how does the compiler know how to pass and return those
variables? It just knows! The range of the types it must deal with is
so small – char, int, float, double, and their variations – that this
information is built into the compiler.
If you figure out how to generate assembly code with your
compiler and determine the statements generated by the function
call to f( ), you’ll get the equivalent of:
push b
push a
call f()
add sp,4
mov g, register a
This code has been cleaned up significantly to make it generic; the
expressions for b and a will be different depending on whether the
variables are global (in which case they will be _b and _a) or local
(the compiler will index them off the stack pointer). This is also true
for the expression for g. The appearance of the call to f( ) will
depend on your name-decoration scheme, and “register a” depends
on how the CPU registers are named within your assembler. The
logic behind the code, however, will remain the same.
In C and C++, arguments are first pushed on the stack from right to
left, then the function call is made. The calling code is responsible
11: References & the Copy-Constructor 481
for cleaning the arguments off the stack (which accounts for the
add sp,4). But notice that to pass the arguments by value, the
compiler simply pushes copies on the stack – it knows how big
they are and that pushing those arguments makes accurate copies
of them.
The return value of f( ) is placed in a register. Again, the compiler
knows everything there is to know about the return value type
because that type is built into the language, so the compiler can
return it by placing it in a register. With the primitive data types in
C, the simple act of copying the bits of the value is equivalent to
copying the object.
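The two claims above (the callee works on copies, and copying the bits of a primitive copies the value) can be checked with a small sketch. The body given to f( ) here and the two helper function names are assumptions made for illustration; the text only declares f( ).

```cpp
#include <cassert>
#include <cstring>

// Hypothetical body for f(); the text only gives its declaration.
int f(int x, char c) {
  x += c;  // Changes only the local copy of the argument
  return x;
}

// The caller's variables are untouched by the call.
bool caller_values_unchanged() {
  int a = 1;
  char b = 2;
  int g = f(a, b);
  return a == 1 && b == 2 && g == 3;
}

// For a built-in type, copying the bits produces an equal value.
bool bit_copy_equals_value_copy() {
  int original = 12345;
  int copy;
  std::memcpy(&copy, &original, sizeof original);
  return copy == original;
}
```

Calling either helper from main( ) should yield true on any conforming compiler; nothing here depends on how the arguments are actually pushed.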
Passing & returning large objects
But now consider user-defined types. If you create a class and you
want to pass an object of that class by value, how is the compiler
supposed to know what to do? This is not a type built into the
compiler; it’s a type you have created.
To investigate this, you can start with a simple structure that is
clearly too large to return in registers:
//: C11:PassingBigStructures.cpp
struct Big {
  char buf[100];
  int i;
  long d;
} B, B2;

Big bigfun(Big b) {
  b.i = 100; // Do something to the argument
  return b;
}

int main() {
  B2 = bigfun(B);
} ///:~
Decoding the assembly output is a little more complicated here
because most compilers use “helper” functions instead of putting
482 Thinking in C++ www.BruceEckel.com
all functionality inline. In main( ), the call to bigfun( ) starts as you
might guess – the entire contents of B is pushed on the stack. (Here,
you might see some compilers load registers with the address of
the Big and its size, then call a helper function to push the Big onto
the stack.)
In the previous code fragment, pushing the arguments onto the
stack was all that was required before making the function call. In
PassingBigStructures.cpp, however, you’ll see an additional
action: the address of B2 is pushed before making the call, even
though it’s obviously not an argument. To comprehend what’s
going on here, you need to understand the constraints on the
compiler when it’s making a function call.
Function-call stack frame
When the compiler generates code for a function call, it first pushes
all the arguments on the stack, then makes the call. Inside the
function, code is generated to move the stack pointer down even
farther to provide storage for the function’s local variables.
(“Down” is relative here; your machine may increment or
decrement the stack pointer during a push.) But during the
assembly-language CALL, the CPU pushes the address in the
program code where the function call came from, so the assembly-language RETURN can use that address to return to the calling
point. This address is of course sacred, because without it your
program will get completely lost. Here’s what the stack frame looks
like after the CALL and the allocation of local variable storage in
the function:
Function arguments
Return address
Local variables
The code generated for the rest of the function expects the memory
to be laid out exactly this way, so that it can carefully pick from the
function arguments and local variables without touching the return
address. I shall call this block of memory, which is everything used
by a function in the process of the function call, the function frame.
You might think it reasonable to try to return values on the stack.
The compiler could simply generate code to push the return value,
and the function could return an offset to indicate how far down in
the stack the return value begins.
Re-entrancy
The problem occurs because functions in C and C++ support
interrupts; that is, the languages are re-entrant. They also support
recursive function calls. This means that at any point in the
execution of a program an interrupt can occur without breaking the
program. Of course, the person who writes the interrupt service
routine (ISR) is responsible for saving and restoring all the registers
that are used in the ISR, but if the ISR needs to use any memory
further down on the stack, this must be a safe thing to do. (You can
think of an ISR as an ordinary function with no arguments and
void return value that saves and restores the CPU state. An ISR
function call is triggered by some hardware event instead of an
explicit call from within a program.)
Now imagine what would happen if an ordinary function tried to
return values on the stack. You can’t touch any part of the stack
that’s above the return address, so the function would have to push
the values below the return address. But when the assembly-language RETURN is executed, the stack pointer must be pointing
to the return address (or right below it, depending on your
machine), so right before the RETURN, the function must move the
stack pointer up, thus clearing off all its local variables. If you’re
trying to return values on the stack below the return address, you
become vulnerable at that moment because an interrupt could
come along. The ISR would move the stack pointer down to hold
its return address and its local variables and overwrite your return
value.
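The hazard can be modeled without any real interrupts. The following is only a toy sketch: ToyStack, kSlots, the fake return addresses, and the helper function are all hypothetical names, and the "stack" is an ordinary array that grows toward lower indexes. It assumes a machine where releasing stack space moves the stack pointer but does not erase memory.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kSlots = 16;  // Size of the toy stack (an assumption)

// Toy model of a downward-growing stack.
struct ToyStack {
  int slots[kSlots];
  std::size_t sp;  // Index of the most recently pushed slot; kSlots means empty
  ToyStack() : slots(), sp(kSlots) {}
  void push(int v) { slots[--sp] = v; }
  void pop(std::size_t n) { sp += n; }  // Releases slots without erasing them
};

// Returns true if a value left below the stack pointer survives an "interrupt".
bool return_value_survives_interrupt() {
  ToyStack s;
  s.push(0x1000);            // Return address of the ordinary function
  s.push(42);                // "Return value" placed below the return address
  std::size_t where = s.sp;  // Offset the function would report to its caller
  s.pop(1);                  // Before RETURN, sp moves back up to the return address
  // Simulated ISR arriving at exactly this moment:
  s.push(0x2000);            // ISR's return address lands on top of the 42
  s.push(7);                 // ISR's local variable
  s.pop(2);                  // ISR finishes and restores sp
  return s.slots[where] == 42;
}
```

In this model the helper returns false: the slot that held the return value now holds the ISR's return address, which is the overwrite described above.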
To solve this problem, the caller could be responsible for allocating
the extra storage on the stack for the return values before calling
the function. However, C was not designed this way, and C++
must be compatible. As you’ll see shortly, the C++ compiler uses a
more efficient scheme.
Your next idea might be to return the value in some global data
area, but this doesn’t work either. Re-entrancy means that any
function can be an interrupt routine for any other function,
including the same function you’re currently inside. Thus, if you put
the return value in a global area, you might return into the same
function, which would overwrite that return value. The same logic
applies to recursion.
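A rough sketch of the global-area idea shows why it fails. Here g_return_area, bigfun_global( ), and the helper are hypothetical names, and a second ordinary call stands in for the interrupt or recursive call that re-enters the function before the first caller has copied its result out.

```cpp
#include <cassert>

struct Big {
  char buf[100];
  int i;
  long d;
};

Big g_return_area;  // Hypothetical shared "return area"

// "Returns" its result by writing it into the global area.
Big* bigfun_global(int value) {
  g_return_area.i = value;
  return &g_return_area;
}

// True if the first result is lost once the function is entered again.
bool first_result_was_overwritten() {
  Big* first = bigfun_global(1);
  // Stand-in for re-entry before the first result has been copied out:
  Big* second = bigfun_global(2);
  return first == second && first->i == 2;
}
```

Both calls hand back the same storage, so the value 1 is gone by the time the first caller looks at it.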
The only safe place to return values is in the registers, so you’re
back to the problem of what to do when the registers aren’t large
enough to hold the return value. The answer is to push the address
of the return value’s destination on the stack as one of the function
arguments, and let the function copy the return information
directly into the destination. This not only solves all the problems,
it’s more efficient. It’s also the reason that, in
PassingBigStructures.cpp, the compiler pushes the address of B2
before the call to bigfun( ) in main( ). If you look at the assembly
output for bigfun( ), you can see it expects this hidden argument
and performs the copy to the destination inside the function.
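The hidden argument can be written out by hand to see the idea. This is a sketch of the technique, not the code any particular compiler emits: bigfun_explicit( ) and both_forms_agree( ) are hypothetical names, and the real calling convention (argument order, registers versus stack) varies by compiler and platform.

```cpp
#include <cassert>

struct Big {
  char buf[100];
  int i;
  long d;
};

// The function as written in PassingBigStructures.cpp.
Big bigfun(Big b) {
  b.i = 100;
  return b;
}

// Hand-written equivalent: the caller passes the destination's address,
// and the function copies its result there before returning.
void bigfun_explicit(Big* dest, Big b) {
  b.i = 100;
  *dest = b;
}

bool both_forms_agree() {
  Big source = Big();  // Value-initialized, so source.i starts at 0
  Big viaReturn = bigfun(source);
  Big viaPointer;
  bigfun_explicit(&viaPointer, source);
  return viaReturn.i == 100 && viaPointer.i == 100 && source.i == 0;
}
```

Because the destination belongs to the caller, nothing the function leaves behind on its own part of the stack matters after it returns, which is what makes the scheme safe under interrupts and recursion.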