The Road to Learning Rust

Latest recommended article published 2024-02-29 14:49:07

weixin_34343000

Tags: rust runtime c/c++

Original link: https://my.oschina.net/greister/blog/405246

Learning Rust

Rust’s Guiding Principles

1. Manual memory management: There must be some way for the programmer to control when an object on the heap will be deleted.
2. Memory safety: Pointers must never point to areas of memory that have been changed or deleted.
3. Safe concurrency: There should be no data races between threads. Multiple threads must not read and modify the same part of memory at the same time.
4. Compile time checks: Ensure correctness at compile time instead of runtime whenever possible.

This, in conjunction with the features that Rust provides, will give us a good idea why certain things must be the way they are in Rust.

##Variable Bindings

In functional languages there is no notion of assignment; there is only binding.

Rust is a statically typed language: every binding has a type, and types are checked at compile time. Thanks to type inference, the annotation can usually be left out when the compiler can work out the type on its own. Bindings are required to be initialized with a value before you're allowed to use them.
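A minimal sketch of these rules. The helper name `inferred_sum` is hypothetical and not part of the original notes; it shows an inferred binding, an annotated binding, and a binding that is declared first and initialized before its first use.

```rust
// Hypothetical helper used only for illustration.
fn inferred_sum() -> i32 {
    let x = 5;          // no annotation: the type is inferred as i32
    let y: i32 = 10;    // explicit annotation, checked at compile time
    let z;              // declared but not yet initialized
    z = x + y;          // must be initialized before its first use
    z
}

fn main() {
    println!("x + y = {}", inferred_sum());
}
```

Removing the line `z = x + y;` makes the program fail to compile, because `z` would be read before it has a value.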

- Array: Rust has list types to represent a sequence of things. The most basic is the array, a fixed-size list of elements of the same type. By default, arrays are immutable.
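As a small illustration (the helper `sum3` is a made-up name), an array's length is part of its type, and its elements can only be changed when the binding is declared `mut`:

```rust
// Hypothetical helper: sums a fixed-size array of three i32 values.
fn sum3(a: [i32; 3]) -> i32 {
    a.iter().sum()
}

fn main() {
    let a = [1, 2, 3];          // type [i32; 3], immutable by default
    // a[0] = 10;               // would not compile: `a` is not declared `mut`
    let mut m = [0; 3];         // three elements, each initialized to 0
    m[1] = 7;                   // allowed because `m` is `mut`
    println!("{} elements, sum {}", a.len(), sum3(a));
    println!("m = {:?}", m);
}
```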

Pointers, Ownership, and Lifetimes

Understanding Pointers, Ownership, and Lifetimes in Rust http://paulkoerbitz.de/posts/Understanding-Pointers-Ownership-and-Lifetimes-in-Rust.html
Rust Borrow and Lifetimes http://arthurtw.github.io/2014/11/30/rust-borrow-lifetimes.html

Safe Manual Memory Management by Enforcing Ownership

The way Rust achieves safe manual memory management is by enforcing sane ownership semantics through a number of different pointers. There are several types of pointers in Rust: the most important are boxes or owned pointers and references or borrowed pointers. There are also different kinds of reference counted pointers, but these are for more complicated situations which I won’t get into in this post. Therefore, this post will focus on owned and borrowed pointers.

Boxes / Owned Pointers

A box or owned pointer in Rust has ownership over a certain part of the heap. When it goes out of scope, it deletes that part of the heap. This achieves manual memory management: the programmer controls when memory is released by controlling when an owned pointer goes out of scope. A box is a datatype parameterized by the type that it boxes, so Box<i32> is the type of an owned pointer to an i32, and Box::new(3) allocates space on the heap for an i32, puts 3 into it, and hands back an owned pointer. Like all pointers, boxes are dereferenced by prefixing them with *. Here is a somewhat contrived example:

// The type annotations in the let statements in this example
// (e.g. ': Box<i32>') are not necessary and only for clarity

fn owned_seven() -> Box<i32> {
    // Allocate an i32 with value '3' on the heap, 'three' points to it
    let three : Box<i32> = Box::new(3);
    // The same for four
    let four : Box<i32> = Box::new(4);
    // Dereference both 'three' and 'four', add them, store the result
    // in a newly allocated variable on the heap
    Box::new(*three + *four)
}   // <-- 'three' and 'four' go out of scope, so the memory they own
    //     is released. The memory of the return value is owned by the
    //     return value so it survives the function call.
    // Note: returning a pointer from a function is considered an anti-
    // pattern in Rust. It is preferred to return a value so the caller
    // can decide what to do with it. This is done for illustration
    // purposes here.


fn main() {
    let seven : Box<i32> = owned_seven();
    println!("3 + 4 = {}", *seven);
}   // <-- seven goes out of scope and the memory it points to is
    //     deallocated here

References / Borrowed Pointers

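In Rust, the pointer type that refers to a block of heap memory is called a box. Recent versions of Rust keep only one kind at the language level, the owned box, which carries the notion of ownership: only the variable that owns the box can access its memory. An owned box has exactly one owner at any time, and assigning it to another variable is called a move. Once an owned pointer has been assigned away, the data can no longer be reached through the original variable, and the compiler reports any such attempt as an error. A simple example appears in the Move Semantics section.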

Having only owned pointers would make writing many programs difficult: there could only ever be one reference to every thing. Fortunately, Rust offers another type of pointer called a reference or borrowed pointer. References do not imply ownership and they can point to objects both on the heap and the stack, so they are quite flexible. We can create a reference by taking the address of something with the address-of operator &. In a slight abuse of notation, the types of references are also denoted by prefixing the type of the variable it points to by &, so &i32 is a borrowed pointer to an i32.

fn main() {
    let three : &i32 = &3;
    let four : &i32 = &4;
    println!("3 + 4 = {}", *three + *four);
}

References in Rust are a lot like references and pass-by-reference bound variables in C and C++, but note that unlike C++ references, borrowed pointers must be dereferenced to get to their values. I think this is really more consistent, because references really hold the address of a memory location, just like other pointers, so it makes sense to treat them similarly in terms of syntax. References in Rust also have a number of safety mechanisms that C/C++ references lack, but more on that later.

Move Semantics

Memory safety implies that owned pointers cannot be copied bit for bit. Otherwise, two such pointers could point to the same block of memory and that memory would be deleted twice. Therefore, owned pointers have move semantics [1]: when owned pointer o2 is initialized from owned pointer o1, o1 is no longer valid. By guiding principle number four, we would prefer to ensure this at compile time, and Rust indeed does this [2].

[1] The alternative would be that owned pointers can never be reassigned: they would be non-copyable and non-movable. That seems pretty cumbersome; fortunately, Rust's owned pointers have move semantics.

[2] Ensuring the validity of owned pointers at compile time is much better than the alternatives: if it were assured at runtime, there would be fewer correctness guarantees about the program, and the check would have to be performed every time a pointer is dereferenced. Checking the validity of pointers at compile time is a major achievement of the Rust language: tracking such moves at compile time requires an advanced type-system feature called affine types. As far as I know, Rust is the only mainstream language with such a feature.

fn main() {
   let o1 = Box::new("world");
   let o2 = o1;                // <-- o1 is 'moved' into o2 and now invalid
   println!("Hello, {}!", o1); // <-- this is a compile time error
}

Structs and Enums

In general Rust has move semantics. When an object is initialized via assignment its memory is moved to the newly assigned variable. However, structs can implement the Copy trait, which means they will have copy semantics instead: When assigned the new object gets a bitwise copy of the object used to assign it.

The Copy trait cannot be implemented when an object contains a box: the box does not implement the Copy trait, so we can't copy it when copying the object containing it. This makes sense: because the box has move semantics, the object containing it must also have move semantics; otherwise we would again end up with two independent owning copies.

// Derive the Copy trait so objects of this type have copy semantics.
// Copy requires Clone; Debug (called Show in early Rust) enables {:?}.
#[derive(Debug, Clone, Copy)]
struct Pod {x: i32, y: u32}

// Can't derive the Copy trait because Box<T> does not have the Copy trait
#[derive(Debug)]
struct WithBox {x: i32, p: Box<i32>}

fn main() {
   let a1 = Pod {x: 3, y: 4};
   let a2 = a1;
   println!("{:?}", a1);                   // <-- OK, a1 has been copied
   let b1 = WithBox {x: 3, p: Box::new(4)};
   let b2 = b1;
   println!("{:?}", b1);                   // <-- Compile time error, b1 has been moved
}

The ref Keyword

####Lifetimes

The difficulty with borrowed pointers is that they themselves cannot ensure that they point to valid memory. What if the thing that owns the memory they point to goes out of scope or is reassigned? Since the borrowed pointer has no ownership that memory would be deleted and possibly reassigned. The borrowed pointer would become a dangling reference, which is precisely what we wanted to avoid per guiding principle number 2: memory safety.

Therefore Rust must take a number of precautions to ensure these scenarios do not happen. First, the memory that a borrowed pointer points to must not be freed during that borrowed pointer's lifetime. Second, this memory must not change while it is borrowed.

fn lifetimes1() {
    let name = Box::new("world");      //                 <--+
    if 3 < 5 {                         //                    |
        let bname = &name;             // <--+               | name's
        println!("Hello, {}!", name);  //    | bname's       | lifetime
        println!("Hello, {}!", bname); //    | lifetime      |
    }                                  // <--+               |
}                                      //                 <--+

In this example, it is quite clear that the lifetime of bname will be shorter than that of name and thus the compiler needs no help in figuring this out. However, things need not always be this simple, consider the following example:

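fn lifetimes2() {
    let x = 3;
    let mut x_ref = &x;       //                 <--+
    if true {                 //                    |
        let y = 4;            //                    |
        let y_ref = &y;       // <--+ y's           | x_ref's
        x_ref = y_ref;        //    | lifetime      | lifetime
    }                         // <--+               |
    println!("{}", x_ref);    // x_ref is still used  |
}                             //                 <--+

Here we have a problem: x_ref is reassigned to point to the same memory location as y_ref, but y, the value that y_ref borrows, goes out of scope before x_ref is used for the last time. To ensure memory safety, the compiler must reject this program, which it does. The message below comes from an early compiler; current compilers word it differently (error E0597, `y` does not live long enough):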

lifetimes.rs:21:24: 21:26 error: borrowed value does not live long enough
lifetimes.rs:18:16: 24:1 note: borrowed pointer must be valid for the block at 18:16...
lifetimes.rs:20:12: 23:5 note: ...but borrowed value is only valid for the block at 20:12

Things become even more interesting when we work with borrowed pointers inside of a function:

fn min_life(x: &i32, y: &i32) -> &i32 {
    if *x < *y {
        x
    } else {
        y
    }
}

Here the lifetime of the result depends on the condition evaluated in the if statement: depending on it the lifetime will either be that of x or that of y. Clearly, the compiler can’t resolve this automatically, it would need to know the values to which x and y point, which may only be known at runtime:

lifetimes.rs:1:33: 1:37 error: missing lifetime specifier [E0106]
lifetimes.rs:1 fn minLife(x: &i32, y: &i32) -> &i32 {
                                               ^~~~
lifetimes.rs:1:33: 1:37 help: this function's return type contains a borrowed value, but the signature does not say whether it is borrowed from `x` or `y`
lifetimes.rs:1 fn minLife(x: &i32, y: &i32) -> &i32 {
                                               ^~~~

Since the compiler can't infer the lifetimes, we must annotate them. Alas, we too would be hard pressed to give the exact lifetime in this example. However, there is a trick by which we can manage this:

fn min_life<'a>(x: &'a i32, y: &'a i32) -> &'a i32 {
    if *x < *y {
        x
    } else {
        y
    }
}

Here we explicitly annotate the lifetime of the parameters and the return value. Lifetime parameters are introduced by a single tick ' followed by an identifier. In functions they must come first in the list of generic parameters. As you can see, we use the same parameter for the lifetime everywhere. If the compiler took this information too literally, the function would be less flexible than we might wish: we could only use it on borrowed pointers that have exactly the same lifetime. Fortunately, the compiler interprets the provided lifetimes as a lower bound. Thus 'a is the minimum (the overlap) of the lifetimes of x and y. There is one special lifetime, called 'static, for objects that live for the entire life of the program.
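To see the lower-bound interpretation at work, here is a short sketch that calls `min_life` with two references whose referents live for different scopes. The variable names `outer` and `inner` are illustrative only.

```rust
fn min_life<'a>(x: &'a i32, y: &'a i32) -> &'a i32 {
    if *x < *y { x } else { y }
}

fn main() {
    let outer = 10;
    {
        let inner = 3;
        // `outer` and `inner` live for different scopes; 'a is chosen as a
        // lifetime for which both references are valid (the inner block).
        let m = min_life(&outer, &inner);
        println!("min = {}", m);
    }
}
```

The result `m` may only be used inside the inner block, because that is the longest region for which both arguments are guaranteed to be valid.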

####Freezing
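The notes leave this heading empty. Assuming it refers to the second precaution listed under Lifetimes (borrowed memory must not change while the borrow is in use), a minimal sketch could look like this; the function name `freeze_demo` is hypothetical.

```rust
fn freeze_demo() -> i32 {
    let mut x = 5;
    {
        let r = &x;            // `x` is borrowed immutably: it is "frozen"
        // x += 1;             // would not compile while `r` is still in use
        println!("r = {}", r); // last use of `r`
    }
    x += 1;                    // the borrow has ended, so `x` may change again
    x
}

fn main() {
    println!("x = {}", freeze_demo());
}
```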

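##Functions

A function returns the value of the final expression in its body, which is written without a trailing semicolon. This reveals two interesting things about Rust: it is an expression-based language, and semicolons are different from semicolons in other 'curly brace and semicolon'-based languages. These two things are related.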

##Expressions vs. Statements

Rust is primarily an expression-based language. There are only two kinds of statements, 'declaration statements' and 'expression statements'; everything else is an expression.

Expressions return a value, and statements do not. A declaration statement, such as let, introduces a binding. The other kind is the expression statement, whose purpose is to turn any expression into a statement. In practical terms, Rust's grammar expects statements to follow other statements, so you use semicolons to separate expressions from each other. As a result, Rust looks a lot like most other languages that require a semicolon at the end of every line, and you will see semicolons at the end of almost every line of Rust code.
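A short sketch of the distinction, using the made-up helpers `add_one` and `sign`: the last line of each function body has no semicolon, so it is an expression whose value is returned, while `if` and blocks can appear wherever a value is expected.

```rust
// The body's last line has no semicolon, so its value is returned.
// Writing `x + 1;` instead would turn it into a statement and fail to compile.
fn add_one(x: i32) -> i32 {
    x + 1
}

// `if` is an expression too, so the whole `if`/`else` yields a value.
fn sign(x: i32) -> &'static str {
    if x < 0 { "negative" } else { "non-negative" }
}

fn main() {
    // A block is an expression: its value is that of its final expression.
    let y = { let t = 2; t * 3 };
    println!("{} {} {}", add_one(y), sign(-1), sign(y));
}
```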

###Diverging functions

Why macros vs. functions? http://users.rust-lang.org/t/newbie-why-macros-vs-functions/1012

###Primitive Types

A char holds a single Unicode scalar value, so unlike in some other languages, Rust's char is not a single byte, but four.
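The size of `char` can be checked directly with `std::mem::size_of`. The sketch below also contrasts it with the number of bytes the same character takes inside a UTF-8 string.

```rust
use std::mem::size_of;

fn main() {
    // A `char` is always four bytes, whatever character it holds.
    println!("size_of::<char>() = {}", size_of::<char>());
    // Inside a UTF-8 string the same character may need fewer bytes.
    println!("U+00E9 as UTF-8: {} bytes", '\u{e9}'.len_utf8());
    println!("'a' as UTF-8: {} bytes", 'a'.len_utf8());
}
```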

###Method Syntax

Chaining method calls
Static methods
Builder Pattern
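The three topics above can be sketched together. The types `Circle` and `CircleBuilder` and all of their methods are illustrative names chosen for this example, not something the notes define.

```rust
struct Circle {
    x: f64,
    y: f64,
    radius: f64,
}

impl Circle {
    // Associated function ("static method"): no `self`, called as Circle::new.
    fn new(x: f64, y: f64, radius: f64) -> Circle {
        Circle { x, y, radius }
    }

    fn area(&self) -> f64 {
        std::f64::consts::PI * self.radius * self.radius
    }

    // Returns a new Circle so that calls can be chained.
    fn grow(&self, increment: f64) -> Circle {
        Circle::new(self.x, self.y, self.radius + increment)
    }
}

// Builder pattern: collect optional settings, then build the final value.
struct CircleBuilder {
    x: f64,
    y: f64,
    radius: f64,
}

impl CircleBuilder {
    fn new() -> CircleBuilder {
        CircleBuilder { x: 0.0, y: 0.0, radius: 1.0 }
    }
    fn x(mut self, v: f64) -> CircleBuilder { self.x = v; self }
    fn y(mut self, v: f64) -> CircleBuilder { self.y = v; self }
    fn radius(mut self, r: f64) -> CircleBuilder { self.radius = r; self }
    fn finalize(self) -> Circle {
        Circle::new(self.x, self.y, self.radius)
    }
}

fn main() {
    let c = Circle::new(0.0, 0.0, 2.0).grow(1.0); // chained method call
    let b = CircleBuilder::new().x(1.0).y(2.0).radius(3.0).finalize();
    println!("{} {}", c.area(), b.area());
}
```

Each builder method takes `self` by value and returns it, which is what allows the calls to be chained before `finalize` produces the `Circle`.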

###Vector

Vectors always allocate their data on the heap. Vectors are to slices what String is to &str.
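A small sketch of that analogy, with a hypothetical helper `total` that accepts a slice so it works on a whole vector or on part of one:

```rust
// Takes a slice, so it accepts borrowed views of vectors and arrays alike.
fn total(xs: &[i32]) -> i32 {
    xs.iter().sum()
}

fn main() {
    let mut v = vec![1, 2, 3];   // the elements live on the heap
    v.push(4);                   // a Vec can grow
    let middle = &v[1..3];       // a slice borrows part of the vector
    println!("{} {}", total(&v), total(middle));

    let s = String::from("hello");  // owned and growable, like Vec
    let word: &str = &s[0..2];      // borrowed view, like a slice
    println!("{}", word);
}
```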

###Traits

Traits can be implemented for any type, but two restrictions prevent this from getting out of hand. First, a trait must be brought into scope with use wherever you wish to call its methods.

There's one more restriction on implementing traits. Either the trait or the type you're writing the impl for must be inside your crate.

One last thing about traits: generic functions with a trait bound use monomorphization (mono: one, morph: form), so they are statically dispatched.
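A minimal sketch of a generic function with a trait bound. The trait `HasArea`, the two structs, and `describe` are illustrative names; the point is that the compiler generates a separate specialized copy of `describe` for each concrete type it is called with.

```rust
trait HasArea {
    fn area(&self) -> f64;
}

struct Square { side: f64 }
struct Rect { w: f64, h: f64 }

impl HasArea for Square {
    fn area(&self) -> f64 { self.side * self.side }
}

impl HasArea for Rect {
    fn area(&self) -> f64 { self.w * self.h }
}

// Generic with a trait bound: one specialized copy is generated per concrete
// T that is actually used (here Square and Rect), so calls are static.
fn describe<T: HasArea>(shape: &T) -> String {
    format!("area = {}", shape.area())
}

fn main() {
    println!("{}", describe(&Square { side: 2.0 }));
    println!("{}", describe(&Rect { w: 2.0, h: 3.0 }));
}
```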

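Where clause

When a function has several bounded type parameters, the name of the function is on the far left, and the parameter list is on the far right. The bounds are getting in the way. A where clause moves them after the parameter list.

Where clauses are also more powerful than the inline syntax: they allow bounds where the left-hand side is an arbitrary type (such as i32), not just a plain type parameter (like T).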
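The sketch below shows both points under assumed names (`inline_bounds`, `where_bounds`, `ConvertTo`, and `inverse` are all illustrative): the same bounds written inline and in a where clause, and a bound whose left-hand side is the concrete type `i32`.

```rust
use std::fmt::Debug;

// Bounds written inline: the signature gets crowded.
fn inline_bounds<T: Clone, K: Clone + Debug>(x: T, y: K) -> String {
    let _copy = x.clone();
    format!("{:?}", y.clone())
}

// The same bounds moved into a where clause.
fn where_bounds<T, K>(x: T, y: K) -> String
where
    T: Clone,
    K: Clone + Debug,
{
    let _copy = x.clone();
    format!("{:?}", y.clone())
}

trait ConvertTo<Output> {
    fn convert(&self) -> Output;
}

impl ConvertTo<i64> for i32 {
    fn convert(&self) -> i64 { *self as i64 }
}

// The left-hand side of the bound is the concrete type i32, not T.
fn inverse<T>(x: i32) -> T
where
    i32: ConvertTo<T>,
{
    x.convert()
}

fn main() {
    println!("{} {}", inline_bounds("a", 1), where_bounds("a", 1));
    let n: i64 = inverse(7);
    println!("{}", n);
}
```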
Default methods

###Strings

A string is a sequence of Unicode scalar values encoded as a stream of UTF-8 bytes. All strings are guaranteed to be validly encoded UTF-8 sequences. Additionally, strings are not null-terminated and can contain null bytes.

Rust has two main types of strings: &str and String. The first kind is a &str. These are called string slices. String literals are of the type &str. A String, on the other hand, is a heap-allocated string. This string is growable, and is also guaranteed to be UTF-8. Strings are commonly created by converting from a string slice using the to_string method. Just remember that Strings allocate memory and control their data, while &strs are a reference to another string, and you'll be all set.
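A brief sketch of the two types working together. The helper `shout` is a made-up name; it takes a `&str`, so it accepts both literals and borrowed `String`s.

```rust
// Accepts any string slice; a &String coerces to &str automatically.
fn shout(s: &str) -> String {
    let mut out = s.to_string();   // allocate an owned, growable String
    out.push('!');
    out
}

fn main() {
    let literal: &str = "hello";              // a string literal is a &str
    let owned: String = literal.to_string();  // heap-allocated copy
    println!("{} {}", shout(literal), shout(&owned));

    let with_nul = "a\0b";                    // interior null bytes are allowed
    println!("{} bytes", with_nul.len());
}
```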

###Generics

Generics are called parametric polymorphism in type theory, which means that they are types or functions that have multiple forms (poly is multiple, morph is form) over a given parameter (parametric).
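A minimal sketch with one generic type and one generic function. The names `Maybe` and `first` are illustrative; `Maybe<T>` mirrors the shape of the standard `Option<T>`.

```rust
// A type that is generic over its contents.
#[derive(Debug, PartialEq)]
enum Maybe<T> {
    Just(T),
    Nothing,
}

// A function that is generic over the element type.
fn first<T: Clone>(items: &[T]) -> Maybe<T> {
    match items.first() {
        Some(x) => Maybe::Just(x.clone()),
        None => Maybe::Nothing,
    }
}

fn main() {
    println!("{:?}", first(&[1, 2, 3]));
    println!("{:?}", first(&["a", "b"]));
    println!("{:?}", first::<u8>(&[]));
}
```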

###Trait Objects

Dynamic dispatch

There are two major forms of dispatch: static dispatch and dynamic dispatch. While Rust favors static dispatch, it also supports dynamic dispatch through a mechanism called 'trait objects.'

Static dispatch: for a generic function such as do_something() called with a u8 and a String, Rust creates a special version of do_something() for each of the two types and replaces the call sites with calls to these specialized functions. Static dispatch allows function calls to be inlined because the callee is known at compile time, and inlining is the key to good optimization. Static dispatch is fast, but it comes at a tradeoff: 'code bloat', due to many copies of the same function existing in the binary, one for each type.

Dynamic dispatch: a trait object such as &dyn Trait hides the concrete type, so the method to call is looked up at runtime through a table of function pointers. Only one copy of the function exists, but calls generally cannot be inlined.
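Both forms side by side, as a sketch. The trait `Describe` and the functions `do_static` and `do_dynamic` are hypothetical names used only to contrast a generic parameter with a trait object.

```rust
trait Describe {
    fn describe(&self) -> String;
}

impl Describe for u8 {
    fn describe(&self) -> String { format!("u8: {}", self) }
}

impl Describe for String {
    fn describe(&self) -> String { format!("string: {}", self) }
}

// Static dispatch: one copy per T, call target known at compile time.
fn do_static<T: Describe>(x: T) -> String {
    x.describe()
}

// Dynamic dispatch: a single function, target looked up at runtime.
fn do_dynamic(x: &dyn Describe) -> String {
    x.describe()
}

fn main() {
    println!("{}", do_static(5u8));
    println!("{}", do_static("hi".to_string()));

    // Trait objects let values of different types share one collection.
    let items: Vec<Box<dyn Describe>> =
        vec![Box::new(5u8), Box::new("hi".to_string())];
    for item in &items {
        println!("{}", do_dynamic(item.as_ref()));
    }
}
```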

###Copy VS Move

There are two different kinds of "copy":

- A byte copy, which shallowly copies an object byte by byte without following pointers. For example, a (&usize, u64) is 16 bytes on a 64-bit computer, and a shallow copy takes those 16 bytes and replicates their value in some other 16-byte chunk of memory, without touching the usize at the other end of the &. That is, it's equivalent to calling memcpy.

- A semantic copy, which duplicates a value to create a new (somewhat) independent instance that can be safely used separately from the old one. For example, a semantic copy of an Rc<T> involves just increasing the reference count, and a semantic copy of a Vec<T> involves creating a new allocation and then semantically copying each stored element from the old to the new. These can be deep copies (e.g. Vec<T>) or shallow (e.g. Rc<T> doesn't touch the stored T). Clone is loosely defined as the smallest amount of work required to semantically copy a value of type T from inside a &T to T.
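The difference between the shallow and the deep semantic copy can be observed directly. In this sketch the helper names `count_after_clone` and `lens_after_clone` are made up for the example.

```rust
use std::rc::Rc;

// Clones an Rc and reports the resulting reference count.
fn count_after_clone() -> usize {
    let a = Rc::new(vec![1, 2, 3]);
    let _b = Rc::clone(&a);      // shallow: only the count is bumped
    Rc::strong_count(&a)
}

// Clones a Vec, changes the clone, and returns both lengths.
fn lens_after_clone() -> (usize, usize) {
    let v = vec![1, 2, 3];
    let mut w = v.clone();       // deep: new allocation, elements cloned
    w.push(4);                   // does not affect `v`
    (v.len(), w.len())
}

fn main() {
    println!("strong count = {}", count_after_clone());
    println!("lengths = {:?}", lens_after_clone());
}
```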

Rust is like C, every by-value use of a value is a byte copy:

let x: T = ...;
let y: T = x; // byte copy

fn foo(z: T) -> T {
    return z // byte copy
}

foo(y) // byte copy

They are byte copies whether or not T moves or is "implicitly copyable". (To be clear, they are just semantically byte copies: the compiler is free to optimise them out if the semantics are preserved.)

However, there's a fundamental problem with byte copies: you end up with duplicated values in memory, which can be very bad if they have destructors, e.g.

{
    let v: Vec<u8> = vec![1, 2, 3];
    let w: Vec<u8> = v;
} // destructors run here

If w was just a plain byte copy of v then there would be two vectors pointing at the same allocation, both with destructors that free it... causing a double free, which is a problem. NB. This would be perfectly fine, if we did a semantic copy of v into w, since then w would be its own independent Vec<u8> and destructors wouldn't be trampling on each other.

There are a few possible fixes here:

- Let the programmer handle it, like C. (There are no destructors in C, so it's not as bad... you just get left with memory leaks instead. :P )
- Perform a semantic copy implicitly, so that w has its own allocation, like C++ with its copy constructors.
- Regard by-value uses as a transfer of ownership, so that v can no longer be used and doesn't have its destructor run.

The last option is what Rust does: a move is just a by-value use where the source is statically invalidated, so the compiler prevents further use of the now-invalid memory.

let v: Vec<u8> = vec![1, 2, 3];
let w: Vec<u8> = v;
println!("{:?}", v); // error: use of moved value

Types that have destructors must move when used by-value (aka when byte copied), since they have management/ownership of some resource (e.g. a memory allocation, or a file handle) and it's very unlikely that a byte copy will correctly duplicate this ownership.
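A sketch of this rule with a hypothetical `Resource` type that implements `Drop`. Passing it by value moves it, so its destructor runs exactly once, in the function that ends up owning it. (The compiler also refuses to implement `Copy` for a type that implements `Drop`.)

```rust
struct Resource {
    name: String,
}

impl Drop for Resource {
    fn drop(&mut self) {
        println!("releasing {}", self.name);
    }
}

// Taking the value by value moves ownership into the function.
fn consume(r: Resource) -> usize {
    r.name.len()
}   // <-- `r` is dropped here, exactly once

fn main() {
    let a = Resource { name: String::from("file handle") };
    let n = consume(a);          // `a` is moved into `consume`
    // println!("{}", a.name);   // would not compile: use of moved value
    println!("name length {}", n);
}
```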

Reposted from: https://my.oschina.net/greister/blog/405246