Rust Ownership - rFronteddu/general_wiki GitHub Wiki
Ownership at Run-time
- Rust allocates local variables in stack frames, which are allocated when a function is called and deallocated when the call ends.
- Local variables can hold either data (like numbers, booleans, tuples, etc.) or pointers.
- Pointers can be created either through boxes (pointers owning data on the heap) or references (non-owning pointers).
- Slices are a special kind of reference that refer to a contiguous sequence of data in memory.
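The four kinds of values listed above can be seen side by side in a small sketch. The function name `describe` and the sample values are hypothetical, chosen only for illustration.

```rust
// Returns a value read through a reference, a value read through a box,
// and the length of a slice.
fn describe() -> (i32, i32, usize) {
    let n: i32 = 5;                 // plain data stored directly in the stack frame
    let b: Box<i32> = Box::new(7);  // owning pointer to data on the heap
    let r: &i32 = &n;               // non-owning pointer (reference) to n
    let v: Vec<i32> = vec![1, 2, 3, 4];
    let s: &[i32] = &v[1..3];       // slice: reference to a contiguous part of v
    (*r, *b, s.len())
}

fn main() {
    let (from_ref, from_box, slice_len) = describe();
    println!("{from_ref} {from_box} {slice_len}");
}
```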
Ownership at Compile-time
- Rust tracks R (read), W (write), and O (own) permissions on each variable. Rust requires that a variable has appropriate permissions to perform a given operation.
- A variable’s permissions can be changed if it is moved or borrowed. A move of a variable with a non-copyable type (like Box or String) requires the RO permissions, and the move eliminates all permissions on the variable. That rule prevents the use of moved variables.
- Borrowing a variable (creating a reference to it) temporarily removes some of the variable’s permissions. An immutable borrow creates an immutable reference, and also disables the borrowed data from being mutated or moved.
  - mutating an immutable reference is not ok
  - mutating the immutably borrowed data is not ok
  - moving data out of the reference is not ok
- A mutable borrow creates a mutable reference, which disables the borrowed data from being read, written, or moved.
  - accessing the mutably borrowed data is not ok
- Rust’s permissions are designed to prevent undefined behavior. For example, one kind of undefined behavior is a use-after-free, where freed memory is read or written. Immutable borrows remove the W permission to avoid use-after-free.
- Dereferences of references to non-copyable data do not have the O permission, which avoids double-frees.
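As a rough illustration of the rules above, the following sketch shows a move removing all permissions and an immutable borrow temporarily removing W and O. The names `consume` and `move_then_borrow` are hypothetical, and the rejected lines are left commented out so the sketch compiles.

```rust
// Hypothetical helper: takes ownership of its argument.
fn consume(s: String) -> usize {
    s.len()
}

fn move_then_borrow() -> (usize, usize) {
    let s = String::from("hello");
    let n = consume(s);        // move: s loses all permissions here
    // println!("{s}");        // would not compile: borrow of moved value: `s`

    let mut v = vec![1, 2, 3];
    let first = &v[0];         // immutable borrow: v temporarily loses W and O
    // v.push(4);              // would not compile while `first` is still used below
    let seen = *first;         // last use of `first`, so v regains W and O
    v.push(seen);              // mutation is allowed again
    (n, v.len())
}

fn main() {
    println!("{:?}", move_then_borrow());
}
```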
Rust’s goal is to compile programs into efficient binaries that require as few runtime checks as possible. Therefore Rust does not check at runtime whether a variable is defined before being used. Instead, Rust checks at compile-time. A foundational goal of Rust is to ensure that your programs never have undefined behavior. A secondary goal of Rust is to prevent undefined behavior at compile-time instead of run-time. Rust provides a particular way to think about memory. Ownership is a discipline for safely using memory within that way of thinking.
- Variables live in frames. A frame is a mapping from variables to values within a single scope, such as a function.
- Frames are organized into a stack of currently-called-functions.
- After a function returns, Rust deallocates the function’s frame. (Deallocation is also called freeing or dropping, and we use those terms interchangeably.) This sequence of frames is called a stack because the most recent frame added is always the next frame freed.
To transfer access to data without copying it, Rust uses pointers.
- A pointer is a value that describes a location in memory. The value that a pointer points-to is called its pointee. One common way to make a pointer is to allocate memory in the heap. The heap is a separate region of memory where data can live indefinitely. Heap data is not tied to a specific stack frame. Rust provides a construct called Box for putting data on the heap.
Stack frames are automatically managed by Rust. When a function is called, Rust allocates a stack frame for the called function. When the call ends, Rust deallocates the stack frame.
Heap data is allocated when calling Box::new(..). Rust does not allow programs to manually deallocate memory. Instead, Rust automatically frees a box’s heap memory.
When a is bound to Box::new([0; 1_000_000]), we say that a owns the box. The statement let b = a moves ownership of the box from a to b. Given these concepts, Rust’s policy for freeing boxes is more accurately described as:
- Box deallocation principle: If a variable owns a box, when Rust deallocates the variable’s frame, then Rust deallocates the box’s heap memory.
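A minimal sketch of the principle, assuming a hypothetical function `make_and_move`. The array is kept much smaller than the one mentioned above so the sketch stays cheap to run.

```rust
fn make_and_move() -> usize {
    let a = Box::new([0; 1_000]); // a owns the box (smaller array than in the text)
    let b = a;                    // ownership of the box moves from a to b
    // let n = a.len();           // would not compile: a was moved
    b.len()
} // the frame holding b is deallocated here, so the box's heap memory is freed exactly once

fn main() {
    println!("{}", make_and_move());
}
```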
Boxes are used by Rust data structures. If you move a variable, Rust will stop you from using that variable later. More generally, the compiler will enforce this principle:
- Moved heap data principle: if a variable x moves ownership of heap data to another variable y, then x cannot be used after the move. One way to avoid moving data is to clone it using the .clone() method.
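The effect of `.clone()` can be sketched as follows. The function `add_suffix` and the sample strings are assumptions made for this example only.

```rust
// Hypothetical function that takes ownership of a String and returns a new one.
fn add_suffix(mut name: String) -> String {
    name.push_str(" Jr.");
    name
}

fn main() {
    let first = String::from("Ferris");
    // Only the clone is moved into add_suffix, so `first` keeps its permissions.
    let full = add_suffix(first.clone());
    println!("{first} -> {full}");
}
```

Cloning copies the heap data, so it trades the move restriction for an extra allocation.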
Ownership is primarily a discipline of heap management:
- All heap data must be owned by exactly one variable.
- Rust deallocates heap data once its owner goes out of scope.
- Ownership can be transferred by moves, which happen on assignments and function calls.
- Heap data can only be accessed through its current owner, not a previous owner.
Ownership, boxes, and moves provide a foundation for safely programming with the heap. However, move-only APIs can be inconvenient to use. The expression &m1 uses the ampersand operator to create a reference to (or “borrow”) m1. The type of the greet parameter g1 is changed to &String, meaning “a reference to a String”.
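The program this paragraph refers to is not reproduced in these notes. A plausible reconstruction, assuming a `greet` function with two `&String` parameters, might look like this (the return value is an addition for illustration):

```rust
// Assumed shape of greet: it borrows both strings instead of taking ownership.
fn greet(g1: &String, g2: &String) -> String {
    format!("{g1} {g2}!")
}

fn main() {
    let m1 = String::from("Hello");
    let m2 = String::from("world");
    let greeting = greet(&m1, &m2); // &m1 and &m2 create references; nothing is moved
    println!("{greeting}");
    println!("{m1} {m2}");          // m1 and m2 are still usable after the call
}
```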
References are non-owning pointers, because they do not own the data they point to. The underlying mechanism is the dereference operator, written with an asterisk (*). You probably won’t see the dereference operator very often when you read Rust code. Rust implicitly inserts dereferences and references in certain cases, such as calling a method with the dot operator.
For example, this program shows two equivalent ways of calling the i32::abs and str::len (string length) functions:
let x: Box<i32> = Box::new(-1);
let x_abs1 = i32::abs(*x); // explicit dereference
let x_abs2 = x.abs(); // implicit dereference
assert_eq!(x_abs1, x_abs2);
let r: &Box<i32> = &x;
let r_abs1 = i32::abs(**r); // explicit dereference (twice)
let r_abs2 = r.abs(); // implicit dereference (twice)
assert_eq!(r_abs1, r_abs2);
let s = String::from("Hello");
let s_len1 = str::len(&s); // explicit reference
let s_len2 = s.len(); // implicit reference
assert_eq!(s_len1, s_len2);
Pointers are a powerful and dangerous feature because they enable aliasing. Aliasing is accessing the same data through different variables. But combined with mutation, we have a recipe for disaster. One variable can “pull the rug out” from another variable in many ways, for example:
- By deallocating the aliased data, leaving the other variable to point to deallocated memory.
- By mutating the aliased data, invalidating runtime properties expected by the other variable.
- By concurrently mutating the aliased data, causing a data race with nondeterministic behavior for the other variable.
Rust follows a basic principle:
- Pointer Safety Principle: data should never be aliased and mutated at the same time. Data can be aliased. Data can be mutated. But data cannot be both aliased and mutated.
For example, Rust enforces this principle for boxes (owned pointers) by disallowing aliasing. Assigning a box from one variable to another will move ownership, invalidating the previous variable. Owned data can only be accessed through the owner — no aliases.
However, because references are non-owning pointers, they need different rules than boxes to ensure the Pointer Safety Principle. By design, references are meant to temporarily create aliases. Rust ensures the safety of references through the borrow checker.
The core idea behind the borrow checker is that variables have three kinds of permissions on their data:
- Read (R): data can be copied to another location.
- Write (W): data can be mutated.
- Own (O): data can be moved or dropped.
These permissions don’t exist at runtime, only within the compiler. They describe how the compiler “thinks” about your program before the program is executed.
- By default, a variable has read/own permissions (RO) on its data. If a variable is annotated with let mut, then it also has the write permission (W). The key idea is that references can temporarily remove these permissions.
- The goal of these permissions is to ensure that data cannot be mutated if it is aliased. Creating a reference to data (“borrowing” it) causes that data to be temporarily read-only until the reference is no longer in use.
- Rust uses these permissions in its borrow checker. The borrow checker looks for potentially unsafe operations involving references.
- The references we have seen so far are read-only immutable references (also called shared references). Immutable references permit aliasing but disallow mutation. However, it is also useful to temporarily provide mutable access to data without moving it.
let mut v: Vec<i32> = vec![1, 2, 3]; // v +R +W +O
let num: &i32 = &v[2]; // v R -W -O
// num +R - +O
// *num +R - -
println!("Third element is {}", *num); // v R +W +O
// num -R - -O
// *num -R - -
v.push(4); // v -R -W -O
First, why do you see both num and *num? Because accessing data through a reference is not the same as manipulating the reference itself. For example, note that we can assign a different reference to x_ref but we cannot mutate the data it points to.
let x = 0; // x +R - +O
let mut x_ref = &x; // x R - -O
// x_ref +R +W +O
// *x_ref +R - -
More generally, permissions are defined on places and not just variables. A place is anything you can put on the left-hand side of an assignment.
Places include:
- Variables, like a.
- Dereferences of places, like *a.
- Array accesses of places, like a[0].
- Fields of places, like a.0 for tuples or a.field for structs (discussed next chapter).
- Any combination of the above, like *((*a)[0].1).
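Each kind of place in the list can appear on the left-hand side of an assignment. The sketch below uses hypothetical variable names and one assignment per kind.

```rust
fn assign_to_places() -> (i32, i32, i32, i32, i32) {
    let mut a = 1;
    a += 1;                 // variable place: a

    let mut b = Box::new(0);
    *b = 3;                 // dereference place: *b

    let mut arr = [0, 0];
    arr[0] = 5;             // array-access place: arr[0]

    let mut t = (0, 0);
    t.1 = 9;                // field place: t.1

    let mut nested = Box::new([(0, 0), (0, 0)]);
    (*nested)[0].1 = 7;     // combination: dereference, then index, then field

    (a, *b, arr[0], t.1, nested[0].1)
}

fn main() {
    println!("{:?}", assign_to_places());
}
```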
Second, why do places lose permissions when they become unused? Because some permissions are mutually exclusive. If you write num = &v[2], then v cannot be mutated or dropped while num is in use. But that doesn’t mean it’s invalid to use num again.
Recall the Pointer Safety Principle: data should not be aliased and mutated. The goal of these permissions is to ensure that data cannot be mutated if it is aliased. Creating a reference to data (“borrowing” it) causes that data to be temporarily read-only until the reference is no longer in use.
Rust uses these permissions in its borrow checker. The borrow checker looks for potentially unsafe operations involving references.
let mut v: Vec<i32> = vec![1, 2, 3]; // v +R +W +O
let num: &i32 = &v[2]; // v R -W -O
// num +R - +O
// *num +R - -
v.push(4); // the mutating operation v.push(4) requires that v is readable and writable. However v does not have
// write permission (it is borrowed by num). Trying to compile would result in cannot borrow v as mutable
// because it is also borrowed as immutable. The underlying issue is that num could be invalidated by push.
println!("Third element is {}", *num);
Mutable References Provide Unique and Non-Owning Access to Data
The references we have seen so far are read-only immutable references (also called shared references). Immutable references permit aliasing but disallow mutation. The mechanism for temporarily providing mutable access to data without moving it is called a mutable reference (also called a unique reference).
let mut v: Vec<i32> = vec![1, 2, 3]; // v +R +W +O
let num: &mut i32 = &mut v[2]; // v -R -W -O
// num +R - +O
// *num +R +W -
*num += 1; // requires R and W
println!("Third element is {}", *num); // v +R +W +O
// num -R - -O
// *num -R -W -
// after the previous statement only v permissions are still required.
println!("Vector is now {:?}", v); // v -R -W -O
// after the previous statement v permissions are no longer required.
A mutable reference is created with the &mut operator. The type of num is written as &mut i32. Compared to immutable references, you can see two important differences in the permissions:
- When num was an immutable reference, v still had the R permission. Now that num is a mutable reference, v has lost all permissions while num is in use.
- When num was an immutable reference, the place *num only had the R permission. Now that num is a mutable reference, *num has also gained the W permission.
The first observation is what makes mutable references safe. Mutable references allow mutation but prevent aliasing. The borrowed place v becomes temporarily unusable, so effectively not an alias.
The second observation is what makes mutable references useful. v[2] can be mutated through *num. For example, *num += 1 mutates v[2]. Note that *num has the W permission, but num does not. num refers to the mutable reference itself, e.g. num cannot be reassigned to a different mutable reference.
Mutable references can also be temporarily “downgraded” to read-only references.
let mut v: Vec<i32> = vec![1, 2, 3]; // v +R +W +O
let num: &mut i32 = &mut v[2]; // v -R -W -O
// *num +R +W -
// num +R - +O
let num2: &i32 = &*num; // *num R -W -
// *num2 +R - -
// num R - -O
// num2 +R - +O
println!("{} {}", *num, *num2);
// after the previous statement no permission is necessary
In this program, the borrow &*num removes the W permission from *num but not the R permission, so println!(..) can read both *num and *num2.
Permissions Are Returned at the End of a Reference's Lifetime
We said above that a reference changes permissions while it is “in use”. The phrase “in use” is describing a reference’s lifetime, or the range of code spanning from its birth (where the reference is created) to its death (the last time(s) the reference is used). In the previous examples, a lifetime has been a contiguous region of code. However, once we introduce control flow, this is not necessarily the case.
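A sketch of a lifetime that is not one contiguous region, assuming a hypothetical function `ascii_capitalize`. The reference `c` is last used at a different point in each branch, so `v` regains its permissions at different points too.

```rust
fn ascii_capitalize(v: &mut Vec<char>) {
    let c = &v[0];                       // c borrows from v
    if c.is_ascii_lowercase() {
        let up = c.to_ascii_uppercase(); // last use of c in this branch
        v[0] = up;                       // v is writable again here
    } else {
        // c is not used in this branch, so v is already readable here
        println!("Already capitalized: {:?}", v);
    }
}

fn main() {
    let mut v = vec!['h', 'i'];
    ascii_capitalize(&mut v);
    println!("{:?}", v);
}
```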
As a part of the Pointer Safety Principle, the borrow checker enforces that data must outlive any references to it. Rust enforces this property in two ways. The first way deals with references that are created and dropped within the scope of a single function. Since Rust knows how long those references live, it can check for mismatches in permissions. When references are either inputs to a function or outputs from a function, Rust needs a different enforcement mechanism: the flow permission F. The F permission is expected whenever an expression uses an input reference or returns an output reference. Unlike the RWO permissions, F does not change throughout the body of a function. A reference has the F permission if it’s allowed to be used (that is, to flow) in a particular expression. For example, let’s say we change first to a new function first_or that includes a default parameter:
fn first_or(strings: &Vec<String>, default: &String) -> &String {
    if strings.len() > 0 { // .len requires R F
        &strings[0] // requires R F
    } else {
        default // requires R F
    }
}
This function no longer compiles (expected named lifetime parameter error), because the expressions &strings[0] and default lack the necessary F permission to be returned. The message “missing lifetime specifier” is a bit mysterious, but the help message provides some useful context. If Rust just looks at the function signature, it doesn’t know whether the output &String is a reference to either strings or default.
If Rust didn't check this the following could cause undefined behavior:
fn main() {
    let strings = vec![];
    let default = String::from("default");
    let s = first_or(&strings, &default);
    drop(default);
    println!("{}", s); // unsafe, first_or allows default to flow into s but default was dropped. Rust would only
                       // allow this to compile if it was sure default could not flow into the return value.
}
To specify whether default can be returned, Rust provides a mechanism called lifetime parameters. For now, it’s enough to know that: (1) input/output references are treated differently than references within a function body, and (2) Rust uses a different mechanism, the F permission, to check the safety of those references. The flow permission also comes into play if you try to return a reference to a variable on the function's own stack frame. That reference would be invalidated when the function returns, so Rust rejects the code with a missing lifetime specifier error.
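For orientation only, here is one way first_or could be annotated so that it compiles. The lifetime name `'a` is an arbitrary choice, and tying both inputs to the same lifetime is one possible design rather than the only one. It tells Rust that the output may flow from either input.

```rust
// Both inputs and the output share the lifetime 'a, so the result may
// borrow from either strings or default.
fn first_or<'a>(strings: &'a Vec<String>, default: &'a String) -> &'a String {
    if strings.len() > 0 {
        &strings[0]
    } else {
        default
    }
}

fn main() {
    let strings: Vec<String> = vec![];
    let default = String::from("default");
    let s = first_or(&strings, &default);
    // drop(default); // would not compile: default is still borrowed by s
    println!("{s}");
}
```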
Summarizing:
- References provide the ability to read and write data without consuming ownership of it. References are created with borrows (& and &mut) and used with dereferences (*), often implicitly.
However, references can be easily misused. Rust’s borrow checker enforces a system of permissions that ensures references are used safely:
- All variables can read, own, and (optionally) write their data.
- Creating a reference will transfer permissions from the borrowed place to the reference.
- Permissions are returned once the reference’s lifetime has ended.
- Data must outlive all references that point to it.
fn return_a_string() -> &String {
    let s = String::from("Hello world");
    &s
}
Why is this program unsafe? If you want to pass around a reference to a string, you have to make sure that the underlying string lives long enough.
Possible solutions:
- Move ownership of the string out of the function by changing the return type from &String to String.
- Return a string literal, which lives forever (indicated by 'static). This applies if we never intend to change the string, in which case a heap allocation is unnecessary. It requires changing the return type from -> &String to -> &'static str.
- Defer borrow-checking to runtime by using garbage collection, for example with a reference-counted pointer. Rc::clone only clones a pointer to s and not the data itself. At runtime, Rc checks when the last Rc pointing to the data has been dropped, and then deallocates the data.
use std::rc::Rc;
fn return_a_string() -> Rc<String> {
    let s = Rc::new(String::from("Hello"));
    Rc::clone(&s) // clones the pointer, not the string data
}
- We can also have the caller providing a "slot" to put the string using a mutable reference:
fn return_a_string(output: &mut String) {
    output.replace_range(.., "Hello");
}
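The first two solutions in the list are described without code. A minimal sketch of both, using the hypothetical names `return_owned` and `return_static` so the two variants can coexist in one file:

```rust
// Solution 1: move ownership of the string out of the function.
fn return_owned() -> String {
    let s = String::from("Hello world");
    s
}

// Solution 2: return a string literal, which lives for the whole program.
fn return_static() -> &'static str {
    "Hello world"
}

fn main() {
    println!("{} / {}", return_owned(), return_static());
}
```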
Another common issue is trying to mutate read-only data, or trying to drop data behind a reference. For example, let’s say we tried to write a function stringify_name_with_title. This function is supposed to create a person’s full name from a vector of name parts, including an extra title.
fn stringify_name_with_title(name: &Vec<String>) -> String {
    name.push(String::from("Esq."));
    let full = name.join(" ");
    full
}
// ideally: ["Ferris", "Jr."] => "Ferris Jr. Esq."
This program is rejected because name.push requires the W permission, but name is an immutable reference. The code is unsafe because push could invalidate other references to name outside of the function (remember that push can cause reallocation).
Possible solutions:
- Change parameter type from &Vec to &mut Vec. But this is not a good solution! Functions should not mutate their inputs if the caller would not expect it.
- Take ownership of the name, by changing &Vec to Vec. But this is not a good solution! Rust functions should avoid taking ownership of heap-owning data structures like Vec and String. This change would make input name unusable.
- We use &Vec but change the body for example by cloning the input name so that we can mutate the local copy.
fn stringify_name_with_title(name: &Vec<String>) -> String {
    let mut name_clone = name.clone();
    name_clone.push(String::from("Esq."));
    let full = name_clone.join(" ");
    full
}
A better solution calls join first and appends the title afterwards, which avoids cloning the whole input vector:
fn stringify_name_with_title(name: &Vec<String>) -> String {
    let mut full = name.join(" "); // copies the data in name into the string full
    full.push_str(" Esq.");
    full
}
Another unsafe operation is using a reference to heap data that gets deallocated by another alias. For example, here’s a function that gets a reference to the largest string in a vector, and then uses it while mutating the vector:
fn add_big_strings(dst: &mut Vec<String>, src: &[String]) {
    let largest: &String =
        dst.iter().max_by_key(|s| s.len()).unwrap();
    for s in src {
        if s.len() > largest.len() {
            dst.push(s.clone());
        }
    }
}
This program is rejected by the borrow checker because let largest = .. removes the W permissions on dst. However, dst.push(..) requires the W permission. dst.push(..) could deallocate the contents of dst, invalidating the reference largest.
To fix the program, the key insight is that we need to shorten the lifetime of largest to not overlap with dst.push(..). One possibility is to clone largest:
fn add_big_strings(dst: &mut Vec<String>, src: &[String]) {
    let largest: String = dst.iter().max_by_key(|s| s.len()).unwrap().clone();
    for s in src {
        if s.len() > largest.len() {
            dst.push(s.clone());
        }
    }
}
However, this may cause a performance hit for allocating and copying the string data.
Another possibility is to perform all the length comparisons first, and then mutate dst afterwards:
fn add_big_strings(dst: &mut Vec<String>, src: &[String]) {
    let largest: &String = dst.iter().max_by_key(|s| s.len()).unwrap();
    let to_add: Vec<String> =
        src.iter().filter(|s| s.len() > largest.len()).cloned().collect();
    dst.extend(to_add);
}
However, this also causes a performance hit for allocating the vector to_add.
A final possibility is to copy out the length of largest, since we don’t actually need the contents of largest, just its length. This solution is arguably the most idiomatic and the most performant:
fn add_big_strings(dst: &mut Vec<String>, src: &[String]) {
    let largest_len: usize = dst.iter().max_by_key(|s| s.len()).unwrap().len();
    for s in src {
        if s.len() > largest_len {
            dst.push(s.clone());
        }
    }
}
A common confusion for Rust learners happens when copying data out of a collection, like a vector. For example, here’s a safe program that copies a number out of a vector:
let v: Vec<i32> = vec![0, 1, 2];
let n_ref: &i32 = &v[0]; // requires read
let n: i32 = *n_ref; // requires read
But what happens if we change the type of elements in the vector from i32 to String? Then it turns out we no longer have the necessary permissions:
let v: Vec<String> =
vec![String::from("Hello world")];
let s_ref: &String = &v[0];
let s: String = *s_ref; // no O permission
The issue is that the vector v owns the string “Hello world”. When we dereference s_ref, that tries to take ownership of the string from the vector. But references are non-owning pointers — we can’t take ownership through a reference. Therefore Rust complains that we “cannot move out of […] a shared reference”.
If this was allowed a double free could happen. After executing let s = *s_ref, both v and s think they own “Hello world”. After s is dropped, “Hello world” is deallocated. Then v is dropped, and undefined behavior happens when the string is freed a second time.
However, this undefined behavior does not happen when the vector contains i32 elements. The difference is that copying a String copies a pointer to heap data. Copying an i32 does not. In technical terms, Rust says that the type i32 implements the Copy trait, while String does not implement Copy.
In sum, if a value does not own heap data, then it can be copied without a move. For example:
- An i32 does not own heap data, so it can be copied without a move.
- A String does own heap data, so it cannot be copied without a move.
- An &String does not own heap data, so it can be copied without a move.
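The three cases above can be checked in one small sketch. The function name `copy_vs_move` is hypothetical, and the rejected line is commented out so the sketch compiles.

```rust
fn copy_vs_move() -> (i32, usize, usize) {
    let n: i32 = 5;
    let m = n;                  // i32 is Copy: n stays usable after this line

    let s = String::from("hi");
    let r1: &String = &s;
    let r2 = r1;                // &String is Copy: r1 and r2 are both usable
    let ref_lens = r1.len() + r2.len();

    let moved = s;              // String is not Copy: this is a move
    // println!("{s}");         // would not compile: borrow of moved value: `s`

    (n + m, ref_lens, moved.len())
}

fn main() {
    println!("{:?}", copy_vs_move());
}
```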
How do we safely get access to an element of the vector? Here’s a few different ways to safely do so. First, you can avoid taking ownership of the string and just use an immutable reference:
let v: Vec<String> = vec![String::from("Hello world")];
let s_ref: &String = &v[0];
println!("{s_ref}!");
Second, you can clone the data if you want to get ownership of the string while leaving the vector alone:
let v: Vec<String> = vec![String::from("Hello world")];
let mut s: String = v[0].clone();
s.push('!');
println!("{s}");
Finally, you can use a method like Vec::remove to move the string out of the vector:
let mut v: Vec<String> = vec![String::from("Hello world")];
let mut s: String = v.remove(0);
s.push('!');
println!("{s}");
assert!(v.len() == 0);
Rust may also reject safe programs. One common issue is that Rust tries to track permissions at a fine-grained level, but it may conflate two different places as the same place. For example, when a function borrows from its argument, Rust only looks at the type signature, which just says “some String in the input gets borrowed”. To work around these issues, we may need to inline the borrowing expression or defer borrow checking to runtime with cells.
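The program behind this paragraph is not included in these notes. As an assumed example, suppose a helper `get_first` returns a reference to the first field of a tuple. Calling it borrows the whole tuple, while inlining the borrow lets Rust see that only one field is borrowed.

```rust
// Hypothetical helper: the signature only says that some String
// inside `name` gets borrowed.
fn get_first(name: &(String, String)) -> &String {
    &name.0
}

fn inline_borrow() -> String {
    let mut name = (String::from("Ferris"), String::from("Rustacean"));

    // Rejected variant: get_first(&name) borrows all of `name`, so
    // mutating name.1 while `first` is still in use would not compile.
    // let first = get_first(&name);

    let first = &name.0;            // inlined: borrows only name.0
    name.1.push_str(", Esq.");      // name.1 is a different place, still writable
    format!("{first} {}", name.1)
}

fn main() {
    let name = (String::from("Ferris"), String::from("Rustacean"));
    println!("{}", get_first(&name)); // fine when nothing is mutated meanwhile
    println!("{}", inline_borrow());
}
```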
A similar kind of problem arises when we borrow elements of an array. For example, observe what places are borrowed when we take a mutable reference to an array:
let mut a = [0, 1, 2, 3];
let x = &mut a[1];
*x += 1;
println!("{a:?}");
Rust’s borrow checker does not contain different places for a[0], a[1], and so on. It uses a single place a[_] that represents all indexes of a. Rust does this because it cannot always determine the value of an index.
For cases like these, Rust often provides a function in the standard library that can work around the borrow checker. These functions are often implemented with unsafe blocks, which allow the use of raw pointers that are not checked by the borrow checker.
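One such standard library function is the slice method `split_at_mut`, which hands back two non-overlapping mutable slices. The sketch below assumes a hypothetical function `bump_second` that needs a mutable reference to one element and a shared reference to another at the same time.

```rust
// Adds a[2] into a[1] while holding references to both elements at once.
fn bump_second(a: &mut [i32; 4]) {
    // Splits into a[0..2] and a[2..4]; the borrow checker treats them as disjoint.
    let (left, right) = a.split_at_mut(2);
    let x = &mut left[1];  // mutable reference to the original a[1]
    let y = &right[0];     // shared reference to the original a[2]
    *x += *y;
}

fn main() {
    let mut a = [0, 1, 2, 3];
    bump_second(&mut a);
    println!("{a:?}");
}
```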