Rvalue References: C++0x Features in VC10, Part 2 (First Half)

muye_fly

Part 1 of this series covered lambdas, auto, and static_assert.

Today, I'm going to talk about rvalue references, which enable two different things: move semantics and perfect forwarding. This post will be long, because I'm going to explain how rvalue references work in great detail. They're initially very confusing because they distinguish lvalues from rvalues, which very few C++98/03 programmers are extensively familiar with.

Fear not, for using rvalue references is easy, much easier than it initially sounds. Implementing either move semantics or perfect forwarding in your own code boils down to following simple patterns, which I will demonstrate. And it's definitely worth learning how to use rvalue references, as move semantics can produce order-of-magnitude performance improvements, and perfect forwarding makes writing highly generic code very easy.

lvalues and rvalues in C++98/03

In order to understand rvalue references in C++0x, you must first understand lvalues and rvalues in C++98/03.

The terminology of "lvalues" and "rvalues" is confusing because their history is confusing. (By the way, they're just pronounced as "L values" and "R values", although they're written as single words.) These concepts originally came from C, and then were elaborated upon by C++. To save time, I'll skip over their history, including why they're called "lvalues" and "rvalues", and I'll go directly to how they work in C++98/03. (Okay, it's not a big secret: "L" stands for "left" and "R" stands for "right". But the concepts have evolved since the names were chosen, and the names aren't very accurate anymore. Instead of going through the whole history lesson, you can consider the names to be arbitrary like "up quark" and "down quark", and you won't lose anything.)

C++03 3.10/1 says: "Every expression is either an lvalue or an rvalue." It's important to remember that lvalueness versus rvalueness is a property of expressions, not of objects.

Lvalues name objects that persist beyond a single expression. For example, obj, *ptr, ptr[index], and ++x are all lvalues.

Rvalues are temporaries that evaporate at the end of the full-expression in which they live ("at the semicolon"). For example, 1729, x + y, std::string("meow"), and x++ are all rvalues.

Notice the difference between ++x and x++. If we have int x = 0; then the expression x is an lvalue, as it names a persistent object. The expression ++x is also an lvalue. It modifies and then names the persistent object. However, the expression x++ is an rvalue. It copies the original value of the persistent object, modifies the persistent object, and then returns the copy. This copy is a temporary. Both ++x and x++ increment x, but ++x returns the persistent object itself, while x++ returns a temporary copy. That's why ++x is an lvalue, while x++ is an rvalue. Lvalueness versus rvalueness doesn't care about what an expression does; it cares about what an expression names (something persistent or something temporary).

If you want to build up intuition for this, another way to determine whether an expression is an lvalue is to ask "can I take its address?". If you can, it's an lvalue. If you can't, it's an rvalue. For example, &obj, &*ptr, &ptr[index], and &++x are all valid (even though some of those expressions are silly), while &1729, &(x + y), &std::string("meow"), and &x++ are all invalid. Why does this work? The address-of operator requires that its "operand shall be an lvalue" (C++03 5.3.1/2). Why does it require that? Taking the address of a persistent object is fine, but taking the address of a temporary would be extremely dangerous, because temporaries evaporate quickly.
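Here's a minimal sketch of the address-of test applied to ++x and x++. The helper function names are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>

// Hypothetical helper: shows that ++x is an lvalue naming x itself.
bool preincrement_names_x() {
    int x = 0;
    int* p = &++x;               // OK: ++x is an lvalue, so its address can be taken
    return p == &x && *p == 1;   // p points at x, which now holds 1
}

// Hypothetical helper: shows that x++ hands back a copy of the old value.
int postincrement_old_value() {
    int x = 0;
    int old = x++;               // x++ is an rvalue: a temporary copy of the old value
    assert(x == 1);              // the persistent object was still incremented
    return old;
}

// None of the following compile, because each operand is an rvalue:
//   &1729;  &(x + y);  &std::string("meow");  &x++;
```

Calling preincrement_names_x() from main() should yield true, and postincrement_old_value() should yield the original value of x.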

The preceding examples ignore operator overloading, which is convenient syntax for a function call. "A function call is an lvalue if and only if the result type is a reference." (C++03 5.2.2/10) Therefore, given vector<int> v(10, 1729); the expression v[0] is an lvalue because operator[]() returns int& (and &v[0] is valid and useful), while given string s("foo"); and string t("bar"); the expression s + t is an rvalue because operator+() returns string (and &(s + t) is invalid).
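The vector and string cases above can be sketched as follows. The function names are hypothetical; only v, s, and t come from the text.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: writes through the address of v[0].
int write_through_element_address() {
    std::vector<int> v(10, 1729);
    int* p = &v[0];   // OK: operator[]() returns int&, so v[0] is an lvalue
    *p = 42;          // modifies the element stored in the vector
    return v[0];
}

// Hypothetical helper: s + t can be used, but its address can't be taken.
std::string concatenate() {
    std::string s("foo");
    std::string t("bar");
    // std::string* q = &(s + t);   // error: operator+() returns string by value,
    //                              // so s + t is an rvalue
    return s + t;
}
```

The write through p is visible in v[0] because both expressions name the same persistent element.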

Both lvalues and rvalues can be either modifiable (non-const) or non-modifiable (const). Here are examples:

string one("cute");
const string two("fluffy");
string three() { return "kittens"; }
const string four() { return "are an essential part of a healthy diet"; }

one; // modifiable lvalue
two; // const lvalue
three(); // modifiable rvalue
four(); // const rvalue

Type& binds to modifiable lvalues (and can be used to observe and mutate them). It can't bind to const lvalues, as that would violate const correctness. It can't bind to modifiable rvalues, as that would be extremely dangerous. Accidentally modifying temporaries, only to have the temporaries evaporate along with your modifications, would lead to subtle and obnoxious bugs, so C++ rightly prohibits this. (I should mention that VC has an evil extension that allows this, but if you compile with /W4, it warns when the evil extension is activated. Usually.) And it can't bind to const rvalues, as that would be doubly bad. (Careful readers should note that I'm not talking about template argument deduction here.)

const Type& binds to everything: modifiable lvalues, const lvalues, modifiable rvalues, and const rvalues (and can be used to observe them).
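As a rough sketch of these binding rules, here are one, two, three(), and four() from above bound to local references. The function bind_demo and the reference names are hypothetical. The lines that a conforming compiler rejects are left commented out.

```cpp
#include <cassert>
#include <string>
using std::string;

string three() { return "kittens"; }
const string four() { return "are an essential part of a healthy diet"; }

// Hypothetical helper that exercises each binding and returns the mutated string.
string bind_demo() {
    string one("cute");
    const string two("fluffy");

    string& r1 = one;            // OK: Type& binds to a modifiable lvalue
    // string& r2 = two;         // error: would violate const correctness
    // string& r3 = three();     // error: modifiable rvalue
    // string& r4 = four();      // error: const rvalue

    const string& c1 = one;      // const Type& binds to all four
    const string& c2 = two;
    const string& c3 = three();  // the temporary lives as long as c3 does
    const string& c4 = four();

    r1 += "!";                   // mutate through Type&
    assert(c1 == "cute!");       // c1 observes the same object as r1
    assert(c2 == "fluffy" && c3 == "kittens" && !c4.empty());
    return one;
}
```

Uncommenting any of the three rejected lines should produce a compile error, unless the VC extension mentioned above is active.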

A reference is a name, so a reference bound to an rvalue is itself an lvalue (yes, L). (As only a const reference can be bound to an rvalue, it will be a const lvalue.) This is confusing, and will be an extremely big deal later, so I'll explain further. Given the function void observe(const string& str), inside observe()'s implementation, str is a const lvalue, and its address can be taken and used before observe() returns. This is true even though observe() can be called with rvalues, such as three() or four() above. observe("purr") can also be called, which constructs a temporary string and binds str to that temporary. The return values of three() and four() don't have names, so they're rvalues, but within observe(), str is a name, so it's an lvalue. As I said above, "lvalueness versus rvalueness is a property of expressions, not of objects". Of course, because str can be bound to a temporary which will evaporate, its address shouldn't be stored anywhere where it could be used after observe() returns.
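A minimal sketch of observe() makes this concrete. The article gives only the signature; the body and the return type used here are assumptions for illustration.

```cpp
#include <cassert>
#include <string>

std::string three() { return "kittens"; }

// Inside this function, the expression str is a const lvalue, no matter
// whether the caller passed an lvalue or an rvalue.
// (The body and return type are assumed; the article gives only the signature.)
std::string::size_type observe(const std::string& str) {
    const std::string* p = &str;   // OK: str is a name, so its address can be taken
    return p->size();              // p is used only before observe() returns
}
```

All of observe(three()), observe("purr"), and observe(some_named_string) go through the same body. In the first two cases, p points at a temporary that evaporates once the full-expression containing the call ends, which is why p must not be stored anywhere that outlives the call.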

Have you ever bound an rvalue to a const reference and then taken its address? Yes, you have! This is what happens when you write a copy assignment operator, Foo& operator=(const Foo& other), with a self-assignment check such as if (this != &other) { copy stuff; } return *this; and then copy assign from a temporary, as in: Foo make_foo(); Foo f; f = make_foo();
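Spelled out, that pattern might look like the sketch below. The data member value_, the constructor, and the helper assign_from_temporary are hypothetical stand-ins for "copy stuff".

```cpp
#include <cassert>

class Foo {
public:
    explicit Foo(int v = 0) : value_(v) {}

    Foo& operator=(const Foo& other) {
        // other may be bound to a temporary, yet &other is still valid here,
        // because the expression other is a (const) lvalue.
        if (this != &other) {
            value_ = other.value_;   // stand-in for "copy stuff"
        }
        return *this;
    }

    int value() const { return value_; }

private:
    int value_;   // hypothetical payload
};

Foo make_foo() { return Foo(1729); }

// Hypothetical helper: copy assigns from the temporary returned by make_foo().
int assign_from_temporary() {
    Foo f;
    f = make_foo();   // other is bound to an rvalue for the duration of the call
    return f.value();
}
```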

At this point, you might ask, "So what's the difference between modifiable rvalues and const rvalues? I can't bind Type& to modifiable rvalues, and I can't assign things (etc.) to modifiable rvalues, so can I really modify them?" This is a very good question! In C++98/03, the answer is that there's a slight difference: non-const member functions can be called on modifiable rvalues. C++ doesn't want you to accidentally modify temporaries, but directly calling a non-const member function on a modifiable rvalue is explicit, so it's allowed. In C++0x, the answer changes dramatically, making move semantics possible.
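Here's a small sketch of that C++98/03 difference, reusing three() and four() from above. The two helper functions are hypothetical.

```cpp
#include <cassert>
#include <string>

std::string three() { return "kittens"; }
const std::string four() { return "are an essential part of a healthy diet"; }

// Hypothetical helper: calls a non-const member function on a modifiable rvalue.
std::string append_to_temporary() {
    // four().append("!");            // error: append() is non-const,
    //                                // and four() is a const rvalue
    return three().append(" purr");   // OK: three() is a modifiable rvalue
}

// Hypothetical helper: const member functions work on const rvalues too.
std::string::size_type size_of_const_temporary() {
    return four().size();
}
```

In append_to_temporary(), the modified temporary is copied into the return value before it evaporates at the end of the full-expression.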

Congratulations! Now you have what I call "lvalue/rvalue vision", the ability to look at an expression and determine whether it's an lvalue or an rvalue. Combined with your "const vision", you can precisely reason that given void mutate(string& ref) and the definitions above, mutate(one) is valid, while mutate(two), mutate(three()), mutate(four()), and mutate("purr") are invalid, and all of observe(one), observe(two), observe(three()), observe(four()), and observe("purr") are valid. If you're a C++98/03 programmer, you already knew which of these calls were valid and which were invalid; your "gut feeling", if not your compiler, would have told you that mutate(three()) was bogus. Your new lvalue/rvalue vision tells you precisely why (three() is an rvalue, and modifiable references can't be bound to rvalues). Is that useful? To language lawyers, yes, but not really to normal programmers. After all, you've gotten this far without knowing all of this stuff about lvalues and rvalues. But here's the catch: compared to C++98/03, C++0x has vastly more powerful lvalue/rvalue vision (in particular, the ability to look at an expression, determine whether it's a modifiable/const lvalue/rvalue, and do something about it). In order to use C++0x effectively, you need lvalue/rvalue vision too. And now you have it, so we can proceed!
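If you'd like the compiler to confirm these predictions, one possible sketch uses static_assert (from Part 1) together with std::is_convertible from <type_traits>. That trait is a C++0x library facility the article doesn't cover, so treat this as an assumption-laden illustration. Here a non-reference From type stands in for an rvalue argument, From& stands in for an lvalue argument, and the To type stands in for the parameter type of mutate() or observe().

```cpp
#include <cassert>
#include <string>
#include <type_traits>

using std::is_convertible;
using std::string;

// Parameter type string& (as in mutate): only a modifiable lvalue binds.
static_assert( is_convertible<string&, string&>::value,            "mutate(one)");
static_assert(!is_convertible<const string&, string&>::value,      "mutate(two)");
static_assert(!is_convertible<string, string&>::value,             "mutate(three())");
static_assert(!is_convertible<const string, string&>::value,       "mutate(four())");
static_assert(!is_convertible<const char (&)[5], string&>::value,  "mutate(\"purr\")");

// Parameter type const string& (as in observe): everything binds.
static_assert(is_convertible<string&, const string&>::value,           "observe(one)");
static_assert(is_convertible<const string&, const string&>::value,     "observe(two)");
static_assert(is_convertible<string, const string&>::value,            "observe(three())");
static_assert(is_convertible<const string, const string&>::value,      "observe(four())");
static_assert(is_convertible<const char (&)[5], const string&>::value, "observe(\"purr\")");

// Hypothetical constants, so the same facts can also be checked at run time.
const bool mutate_accepts_modifiable_rvalue = is_convertible<string, string&>::value;
const bool observe_accepts_modifiable_rvalue = is_convertible<string, const string&>::value;
```

If every static_assert passes, the translation unit compiles silently. A failing prediction would stop the build with the corresponding call as the message.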