rvalue references: initialization
C++0x introduces a new kind of reference, the rvalue reference, with the syntax Type&& and const Type&&. The current C++0x Working Draft, N2798 8.3.2/2, says: "A reference type that is declared using & is called an lvalue reference, and a reference type that is declared using && is called an rvalue reference. Lvalue references and rvalue references are distinct types. Except where explicitly noted, they are semantically equivalent and commonly referred to as references." This means that your intuition for C++98/03 references (now known as lvalue references) translates over to rvalue references; all you have to learn are the differences between them.
(Note: I've settled on pronouncing Type& as "Type ref" and Type&& as "Type ref ref". They're fully known as "lvalue reference to Type" and "rvalue reference to Type", respectively, just like how "const pointer to int" is written as int * const and can be pronounced as "int star const".)
What are the differences? Compared to lvalue references, rvalue references behave differently during initialization and overload resolution. They differ in what they are willing to bind to (i.e. initialization) and what prefers to bind to them (i.e. overload resolution). Let's look at initialization first:
· We've already seen how the modifiable lvalue reference, Type&, is willing to bind to modifiable lvalues, but not to anything else (const lvalues, modifiable rvalues, const rvalues).
· We've already seen how the const lvalue reference, const Type&, is willing to bind to everything.
· The modifiable rvalue reference, Type&&, is willing to bind to modifiable lvalues and modifiable rvalues, but not to const lvalues and const rvalues (which would violate const correctness).
· The const rvalue reference, const Type&&, is willing to bind to everything.
These rules may sound arcane, but they're derived from two simple rules:
· Obey const correctness by preventing modifiable references from binding to const things.
· Avoid accidentally modifying temporaries by preventing modifiable lvalue references from binding to modifiable rvalues.
If you like reading compiler errors instead of reading English, here's a demonstration:
C:/Temp>type initialization.cpp
#include <string>
using namespace std;

string modifiable_rvalue() {
    return "cute";
}

const string const_rvalue() {
    return "fluffy";
}

int main() {
    string modifiable_lvalue("kittens");
    const string const_lvalue("hungry hungry zombies");

    string& a = modifiable_lvalue;          // Line 16
    string& b = const_lvalue;               // Line 17 - ERROR
    string& c = modifiable_rvalue();        // Line 18 - ERROR
    string& d = const_rvalue();             // Line 19 - ERROR

    const string& e = modifiable_lvalue;    // Line 21
    const string& f = const_lvalue;         // Line 22
    const string& g = modifiable_rvalue();  // Line 23
    const string& h = const_rvalue();       // Line 24

    string&& i = modifiable_lvalue;         // Line 26
    string&& j = const_lvalue;              // Line 27 - ERROR
    string&& k = modifiable_rvalue();       // Line 28
    string&& l = const_rvalue();            // Line 29 - ERROR

    const string&& m = modifiable_lvalue;   // Line 31
    const string&& n = const_lvalue;        // Line 32
    const string&& o = modifiable_rvalue(); // Line 33
    const string&& p = const_rvalue();      // Line 34
}
C:/Temp>cl /EHsc /nologo /W4 /WX initialization.cpp
initialization.cpp
initialization.cpp(17) : error C2440: 'initializing' : cannot convert from 'const std::string' to 'std::string &'
Conversion loses qualifiers
initialization.cpp(18) : warning C4239: nonstandard extension used : 'initializing' : conversion from 'std::string' to 'std::string &'
A non-const reference may only be bound to an lvalue
initialization.cpp(19) : error C2440: 'initializing' : cannot convert from 'const std::string' to 'std::string &'
Conversion loses qualifiers
initialization.cpp(27) : error C2440: 'initializing' : cannot convert from 'const std::string' to 'std::string &&'
Conversion loses qualifiers
initialization.cpp(29) : error C2440: 'initializing' : cannot convert from 'const std::string' to 'std::string &&'
Conversion loses qualifiers
It's okay for modifiable rvalue references to bind to modifiable rvalues; the whole point is that they can be used to modify temporaries.
Although lvalue references and rvalue references behave similarly during initialization (only lines 18 and 28 above differ), they diverge much more during overload resolution.
rvalue references: overload resolution
You're already familiar with how functions can be overloaded on modifiable and const lvalue reference parameters. In C++0x, functions can also be overloaded on modifiable and const rvalue reference parameters. Given all four overloads of a unary function, you shouldn't be surprised to discover that each expression prefers to bind to its corresponding reference:
C:/Temp>type four_overloads.cpp
#include <iostream>
#include <ostream>
#include <string>
using namespace std;

void meow(string& s) {
    cout << "meow(string&): " << s << endl;
}

void meow(const string& s) {
    cout << "meow(const string&): " << s << endl;
}

void meow(string&& s) {
    cout << "meow(string&&): " << s << endl;
}

void meow(const string&& s) {
    cout << "meow(const string&&): " << s << endl;
}

string strange() {
    return "strange()";
}

const string charm() {
    return "charm()";
}

int main() {
    string up("up");
    const string down("down");

    meow(up);
    meow(down);
    meow(strange());
    meow(charm());
}
C:/Temp>cl /EHsc /nologo /W4 four_overloads.cpp
four_overloads.cpp
C:/Temp>four_overloads
meow(string&): up
meow(const string&): down
meow(string&&): strange()
meow(const string&&): charm()
In practice, overloading on Type&, const Type&, Type&&, and const Type&& is not very useful. A far more interesting overload set is const Type& and Type&&:
C:/Temp>type two_overloads.cpp
#include <iostream>
#include <ostream>
#include <string>
using namespace std;

void purr(const string& s) {
    cout << "purr(const string&): " << s << endl;
}

void purr(string&& s) {
    cout << "purr(string&&): " << s << endl;
}

string strange() {
    return "strange()";
}

const string charm() {
    return "charm()";
}

int main() {
    string up("up");
    const string down("down");

    purr(up);
    purr(down);
    purr(strange());
    purr(charm());
}
C:/Temp>cl /EHsc /nologo /W4 two_overloads.cpp
two_overloads.cpp
C:/Temp>two_overloads
purr(const string&): up
purr(const string&): down
purr(string&&): strange()
purr(const string&): charm()
How does this work? Here are the rules:
· The initialization rules have veto power.
· Lvalues strongly prefer binding to lvalue references, and rvalues strongly prefer binding to rvalue references.
· Modifiable expressions weakly prefer binding to modifiable references.
(By "veto", I mean that candidate functions which would involve forbidden bindings of expressions to references are immediately deemed to be "non-viable" and are removed from further consideration.) Let's walk through the process of applying the rules.
· For purr(up), the initialization rules veto neither purr(const string&) nor purr(string&&). up is an lvalue, so it strongly prefers binding to the lvalue reference purr(const string&). up is modifiable, so it weakly prefers binding to the modifiable reference purr(string&&). The strongly preferred purr(const string&) wins.
· For purr(down), the initialization rules veto purr(string&&) due to const correctness, so purr(const string&) wins by default.
· For purr(strange()), the initialization rules veto neither purr(const string&) nor purr(string&&). strange() is an rvalue, so it strongly prefers binding to the rvalue reference purr(string&&). strange() is modifiable, so it weakly prefers binding to the modifiable reference purr(string&&). The doubly preferred purr(string&&) wins.
· For purr(charm()), the initialization rules veto purr(string&&) due to const correctness, so purr(const string&) wins by default.
The important thing to notice is that when you overload on const Type& and Type&&, modifiable rvalues bind to Type&&, while everything else binds to const Type&. Therefore, this is the overload set for move semantics.
Important note: functions returning by value should return Type (like strange()) instead of const Type (like charm()). The latter buys you virtually nothing (forbidding non-const member function calls) and prevents the move semantics optimization.
move semantics: the pattern
Here's a simple class, remote_integer, that stores a pointer to a dynamically allocated int. (This is "remote ownership".) Its default constructor, unary constructor, copy constructor, copy assignment operator, and destructor should all look very familiar to you. I've additionally given it a move constructor and move assignment operator. They're guarded by #ifdef MOVABLE so that I can demonstrate what happens with and without them; real code won't do this.
C:/Temp>type remote.cpp
#include <stddef.h>
#include <iostream>
#include <ostream>
using namespace std;

class remote_integer {
public:
    remote_integer() {
        cout << "Default constructor." << endl;
        m_p = NULL;
    }

    explicit remote_integer(const int n) {
        cout << "Unary constructor." << endl;
        m_p = new int(n);
    }

    remote_integer(const remote_integer& other) {
        cout << "Copy constructor." << endl;
        if (other.m_p) {
            m_p = new int(*other.m_p);
        } else {
            m_p = NULL;
        }
    }

#ifdef MOVABLE
    remote_integer(remote_integer&& other) {
        cout << "MOVE CONSTRUCTOR." << endl;
        m_p = other.m_p;
        other.m_p = NULL;
    }
#endif // #ifdef MOVABLE

    remote_integer& operator=(const remote_integer& other) {
        cout << "Copy assignment operator." << endl;
        if (this != &other) {
            delete m_p;
            if (other.m_p) {
                m_p = new int(*other.m_p);
            } else {
                m_p = NULL;
            }
        }
        return *this;
    }

#ifdef MOVABLE
    remote_integer& operator=(remote_integer&& other) {
        cout << "MOVE ASSIGNMENT OPERATOR." << endl;
        if (this != &other) {
            delete m_p;
            m_p = other.m_p;
            other.m_p = NULL;
        }
        return *this;
    }
#endif // #ifdef MOVABLE

    ~remote_integer() {
        cout << "Destructor." << endl;
        delete m_p;
    }

    int get() const {
        return m_p ? *m_p : 0;
    }

private:
    int * m_p;
};

remote_integer square(const remote_integer& r) {
    const int i = r.get();
    return remote_integer(i * i);
}

int main() {
    remote_integer a(8);
    cout << a.get() << endl;

    remote_integer b(10);
    cout << b.get() << endl;

    b = square(a);
    cout << b.get() << endl;
}
C:/Temp>cl /EHsc /nologo /W4 remote.cpp
remote.cpp
C:/Temp>remote
Unary constructor.
8
Unary constructor.
10
Unary constructor.
Copy assignment operator.
Destructor.
64
Destructor.
Destructor.
C:/Temp>cl /EHsc /nologo /W4 /DMOVABLE remote.cpp
remote.cpp
C:/Temp>remote
Unary constructor.
8
Unary constructor.
10
Unary constructor.
MOVE ASSIGNMENT OPERATOR.
Destructor.
64
Destructor.
Destructor.
There are several things to notice here.
· The copy and move constructors are overloaded, and the copy and move assignment operators are overloaded. We've already seen what happens to functions overloaded on const Type& and Type&& . This is what allows b = square(a); to automatically select the move assignment operator when it's available.
· Instead of dynamically allocating memory, the move constructor and move assignment operator simply steal it from other. When stealing, we copy other's pointer and then null it out. When other is destroyed, its destructor deletes a null pointer, which does nothing.
· Both the copy and move assignment operators need self-assignment checks. Copy assignment operators need them because built-in types like int can be assigned to themselves harmlessly (e.g. with x = x;), so user-defined types should also be harmlessly self-assignable. Self-assignment virtually never happens in handwritten code, but it can easily happen inside algorithms like std::sort(). In C++0x, algorithms like std::sort() can move elements around instead of copying them, so the same potential for self-assignment exists for move assignment. Without the check, move-assigning an object to itself would delete the int it owns and then null out its own pointer (other.m_p is the same member), silently losing the value.
At this point, you may be wondering how this interacts with automatically generated ("implicitly declared" in Standardese) constructors and assignment operators.
· Move constructors and move assignment operators are never implicitly declared.
· The implicit declaration of a default constructor is inhibited by any user-declared constructors, including copy constructors and move constructors.
· The implicit declaration of a copy constructor is inhibited by a user-declared copy constructor, but not a user-declared move constructor.
· The implicit declaration of a copy assignment operator is inhibited by a user-declared copy assignment operator, but not a user-declared move assignment operator.
Basically, the automatic generation rules don't interact with move semantics, except that declaring a move constructor, like declaring any constructor, inhibits the implicitly declared default constructor.