Introduction
Hello World
Let's get started by comparing C, C++, and Rust "Hello, World!" programs.
Hello World in C
Here's a hopefully familiar looking C version of "hello, world":
#include <stdio.h>
int main() {
printf("Hello, World!\n");
return 0;
}
Compilation
You can compile it with:
gcc -g -o hello hello_world.c
Or
clang -g -o hello hello_world.c
Or you can write a Makefile:
CC = gcc
CFLAGS = -g
RM = rm -f
default: all
all: hello
hello: hello_world.c
$(CC) $(CFLAGS) -o hello hello_world.c
clean veryclean:
$(RM) hello
Then type make to build it.
All of these produce a binary, hello, which prints "Hello, World!".
Takeaways
- The source code is short and to the point.
- You include stdio.h to pull in printf. This is a "copy paste"---the contents of stdio.h are included directly in your compilation.
- You either need to create a platform-specific build script (specifying your compiler), or use a tool like configure (or CMake---which we'll talk about in C++).
- Compilation is really fast.
- It's not specified anywhere, but you are depending upon your platform's C standard library (libc, glibc, etc.). Your program is dynamically linked with it. You need to have libc installed to run your program.
Hello World in C++
A "pure" C++ version (C++20; C++23 is adding std::print()) looks like this:
#include <iostream>
int main() {
std::cout << "Hello, World!" << std::endl;
return 0;
}
It's worth noting that a lot of C++ projects use printf anyway.
Compilation
You can also build this from the command-line:
g++ -g -o hello hello.cpp
Or with Clang:
clang++ -g -o hello hello.cpp
With CMake
You can build a Makefile, but a lot of modern C++ projects now use CMake. Here's the CMakeLists.txt file:
cmake_minimum_required(VERSION 3.5)
project(HelloWorld)
add_executable(hello hello.cpp)
Your build process then becomes:
mkdir build
cd build
cmake ..
make
As long as you have CMake and a build system it recognizes (Make, Ninja, etc.), it will build your project.
Takeaways
All of these produce a binary, hello, which prints "Hello, World!".
- You are including <iostream>. No .h required. It's still a copy/paste into your compiled source. (If you are lucky enough to have modules, that's not quite as true now.)
- You are using C++'s "streams" (cout) to stream data to the console.
- You are dynamically linking to the C++ standard library, it just isn't stated.
- CMake has made it easier to determine what compilers and build systems to use---but it's still not automatic.
- Compilation is still pretty fast.
Hello World in Rust
Here's "Hello World" in Rust:
fn main() {
    println!("Hello, World!");
}
You can actually run that from the Playground. To run it locally, you need a full Rust project. The easy way to make one is to open a terminal and type:
cargo new hello_world
By default, this even auto-generates the body of the "hello, world" program!
Cargo is Rust's swiss-army knife tool. It's a build system, a dependency manager, wraps a linter, can run unit tests and benchmarks---and it's extendable. In this case, we're asking Cargo to make us a new project named hello_world.
Cargo creates the following structure:
hello_world/
hello_world/src/
hello_world/src/main.rs
hello_world/Cargo.toml
hello_world/.git
hello_world/.gitignore
If you don't want to create a Git repository, add the flag --vcs none. If you are already inside a Git repository, Cargo won't try to nest another one inside.
The Cargo.toml file is the project manifest---outlining project metadata. It looks like this:
[package]
name = "hello_world"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
- The package section holds metadata about the "crate" itself (Rust packages things into crates, which are handled by Cargo). name specifies the output name, and also the name by which the project is referenced if other projects link to it. version is a semantic (major.minor.patch) version number. edition specifies which edition of the Rust language the project uses. Every 3-4 years, a new edition is released. Previous editions continue to compile, and editions are the opportunity for the Rust core teams to make changes to the base language. Things can/will be deprecated (and eventually removed), added, and syntax may change.
- dependencies can be ignored for now. If you depend on other packages, they are listed here. We'll talk about that later.
So how do we compile our program?
cd hello_world
cargo build
The executable is now in hello_world/target/debug/hello_world (with a .exe on the end if you are on Windows).
Takeaways
- You didn't include anything! Well, you did---but by default parts of the Rust standard library are automatically included in the current scope.
- Your binary was statically linked. The entire Rust standard library is in there! (That's why it's 4.3 MB in a debug build on Linux, as opposed to 17k for C++ and C. The C and C++ versions are also quite large---2-3 MB---if you statically link them.)
- You didn't need any tools outside of the Rust ecosystem. Cargo took care of it all, and installing with rustup gave you the entire package.
- Compilation was decently fast---you'd hope so for Hello World.
Let's talk a bit about syntax.
Rust Syntax
fn main() {
    println!("Hello, World!");
}
I often argue that "Hello, World" is the worst Rust program to start with. println! is a macro, and doesn't look much like normal Rust code. (In fairness, std::cout << "Hello, World!" << std::endl; isn't very normal, either.)
Macros in Rust have a ! at the end because they might be surprising: they are extensions to Rust's normal syntax.
So bear that in mind as we move forward.
Touring the Rust Language
Primitive Types
C is notoriously a little vague about type names:
#include <stdio.h>
int main() {
char c = 'a';
unsigned char uc = 12;
int i = 123;
unsigned int ui = 123;
short s = 123;
unsigned short us = 123;
long l = 123;
unsigned long ul = 123;
printf("%d %ld\n", c, sizeof(c));
printf("%d %ld\n", uc, sizeof(uc));
printf("%d %ld\n", i, sizeof(i));
printf("%d %ld\n", ui, sizeof(ui));
printf("%d %ld\n", s, sizeof(s));
printf("%d %ld\n", us, sizeof(us));
printf("%ld %ld\n", l, sizeof(l));
printf("%ld %ld\n", ul, sizeof(ul));
return 0;
}
The output will actually vary by platform. On my 64-bit Linux system I get:
97 1
12 1
123 4
123 4
123 2
123 2
123 8
123 8
Many C programmers prefer to be a bit more specific and use specifically sized types instead:
#include <stdio.h>
#include <stdint.h>
int main() {
int8_t c = 'a';
uint8_t uc = 12;
int32_t i = 123;
uint32_t ui = 123;
int16_t s = 123;
uint16_t us = 123;
int64_t l = 123;
uint64_t ul = 123;
printf("%d %ld\n", c, sizeof(c));
printf("%d %ld\n", uc, sizeof(uc));
printf("%d %ld\n", i, sizeof(i));
printf("%d %ld\n", ui, sizeof(ui));
printf("%d %ld\n", s, sizeof(s));
printf("%d %ld\n", us, sizeof(us));
printf("%ld %ld\n", l, sizeof(l));
printf("%ld %ld\n", ul, sizeof(ul));
return 0;
}
That's a bit more specific. If you're writing cross-platform code, it's often helpful to know exactly how large a variable is!
Additionally, size_t defines the "integral type"---an integer that matches the size of your platform. So on 64-bit Linux, size_t is 64 bits. On 32-bit Linux, it is 32 bits.
Rust Primitive Types
Rust only defines the explicitly sized types (and usize/isize for the integral types):
use std::mem::size_of;

fn main() {
    let c: i8 = 97;
    let uc: u8 = 10;
    let i: i32 = 123;
    let ui: u32 = 123;
    let s: i16 = 123;
    let us: u16 = 123;
    let l: i64 = 123;
    let ul: u64 = 123;
    let vl: i128 = 123;
    let uvl: u128 = 123;
    let is: isize = 123;
    let uz: usize = 123;
    let f: f32 = 123.4;
    let d: f64 = 123.4;
    println!("{c}, {}", size_of::<i8>());
    println!("{uc}, {}", size_of::<u8>());
    println!("{i}, {}", size_of::<i32>());
    println!("{ui}, {}", size_of::<u32>());
    println!("{s}, {}", size_of::<i16>());
    println!("{us}, {}", size_of::<u16>());
    println!("{l}, {}", size_of::<i64>());
    println!("{ul}, {}", size_of::<u64>());
    println!("{vl}, {}", size_of::<i128>());
    println!("{uvl}, {}", size_of::<u128>());
    println!("{is}, {}", size_of::<isize>());
    println!("{uz}, {}", size_of::<usize>());
    println!("{f}, {}", size_of::<f32>());
    println!("{d}, {}", size_of::<f64>());
}
We'll talk about use in more detail later. You are importing a single function from the std::mem namespace. There's no copy/paste. You could also type std::mem::size_of every time.
Some takeaways:
- Each type is explicitly defined as i<size> or u<size>.
- Bytes (u8/i8) are not chars! We'll talk about char in a bit. They are special!
- println! can take variable names (but not complex expressions) in {name}, or you can use {} as a placeholder and fill in the blanks as a parameter.
- println! defies Rust's normal syntax, which is why it's a macro!
Auto
C++ introduced auto, so you can do:
auto i = 5;
That makes the compiler figure out what i is, based on its usages (or a default). Rust has the same thing: you don't have to specify a type at each declaration. This is perfectly valid (and easier to read/type):
fn main() {
    let n = 123;
}
Rust also supports suffixes for specifying type:
fn main() {
    let i = 123i32;
    let f = 123.4f32;
    let d = 123.4f64;
}
Mutability
Coming from C and C++, the following is quite normal:
#include <stdio.h>
int main() {
int i = 5;
i += 1;
printf("%d\n", i);
return 0;
}
However, this doesn't work in Rust (there is also no ++ operator):
fn main() {
    let i = 5;
    i += 1;
    println!("{i}");
}
You get the compiler error cannot assign twice to immutable variable i. That's because let creates an immutable variable. It's the same as typing const int i = 5; in C or C++.
Making a Mutable Variable
To mark a variable as mutable, you have to explicitly declare it as such:
fn main() {
    let mut i = 5;
    i += 1;
    println!("{i}");
}
So why wouldn't you define everything as mutable?
- Your program gains some clarity from the knowledge that a variable won't change.
- Functional-style programming tends to prefer not reusing variables.
- If you accidentally mutate a variable later, the compiler will stop you.
You can make everything let mut, and then use the linter (cargo clippy or your IDE) to highlight the variables that don't need it---but that's a crutch and should be avoided as you gain experience.
Shadowing
You can also make use of shadowing. This is popular in many functional styles. It can also be confusing. I recommend adopting a style that suits you.
Take the following immutable code:
fn main() {
    let i = 5;
    let i_plus_one = i + 1;
    println!("{i_plus_one}");
}
Your variables are immutable, and you are making it clear what's going on in your algorithm by naming each subsequent step. That's great until you get to a big algorithm and start running into i_log10_times_3... so you'll often find that "shadowing" is used to remove previous versions of a variable name from circulation as the calculation progresses:
fn main() {
    let i = 5;
    let i = i + 1;
    println!("{i}");
}
Shadowing is useful with scope (which we'll talk about in a moment). Within a scope, you can shadow a parent-scope's variable names---and get them back at the end of the scope. For example:
fn main() {
    let i = 5;
    {
        let i = i + 1;
        println!("{i}");
    }
    println!("{i}");
}
Primitive Type Conversion
Take the following C program:
#include <stdio.h>
int main() {
int i = 500;
char c = i;
printf("%d\n", c);
return 0;
}
It compiles with no warnings, and outputs... -12. C lets you implicitly convert between types---even when doing so loses data.
An equivalent Rust program won't compile:
fn main() {
    let i: i32 = 500;
    let c: i8 = i;
    println!("{c}");
}
Just in case you thought Rust was just protecting you against an overflow, this won't compile either:
fn main() {
    let i: i32 = 500;
    let j: i64 = i;
    println!("{j}");
}
Rust is really explicit about types and type conversion. You can almost never implicitly convert types. That's a good thing for avoiding bugs: it requires that you acknowledge that there is a type mismatch and explicitly handle it.
Brute Force Conversion with as
The lowest-level (and most dangerous) form of conversion in Rust is the as keyword. You can tell Rust that you accept that a conversion is potentially dangerous, and to do it anyway:
fn main() {
    let i: i32 = 500;
    let c: i8 = i as i8;
    println!("{c}");
}
This also prints -12---so you have bug compatibility! You generally don't want to do this unless you are absolutely, positively sure that it's safe.
It's always safe to up-convert---you can be sure that a larger type of the same signedness will be able to hold your data:
fn main() {
    let i: i32 = 500;
    let i: i64 = i as i64;
    println!("{i}");
}
Mixing signed and unsigned, or converting to a smaller type, is potentially dangerous. (There's been regular discussion in the Rust world about whether as should sometimes be unsafe.)
Safe Conversion with into
Safe conversions between primitives are implemented with the into() function (itself part of the Into trait---traits are a much later topic). The compiler error messages earlier even suggested using it. Converting with into is simple:
fn main() {
    let i: i32 = 500;
    let i: i64 = i.into();
    println!("{i}");
}
into isn't implemented for the potentially unsafe conversions. This won't compile:
fn main() {
    let i: i64 = 500;
    let i: i32 = i.into();
    println!("{i}");
}
Fallible Conversion with try_into
Some conversions are possible, but may or may not work. This will work:
use std::convert::TryInto;

fn main() {
    let i: i64 = 500;
    let i: i32 = i.try_into().unwrap();
    println!("{i}");
}
And this will compile but crash at runtime:
use std::convert::TryInto;

fn main() {
    let i: i64 = 2_147_483_648;
    let i: i32 = i.try_into().unwrap();
    println!("{i}");
}
So what's going on with the unwrap? try_into returns a Result type. We'll talk a lot about how these work internally later. A Result is a Rust enumeration (which are a lot like tagged unions in C or C++) that contains either Ok(..) or Err(..)---where the .. is a generic type. unwrap() says "give me the value inside Ok(x), or crash if it wasn't Ok".
Obviously, crashing isn't a great choice---but we'll leave it there for now. Crashing is a better choice than corrupting your company's data because of a type conversion!
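As a minimal sketch of the non-crashing alternative, you can match on the Result instead of calling unwrap():

```rust
use std::convert::TryInto;

fn main() {
    let i: i64 = 2_147_483_648;
    // Inspect the Result explicitly instead of unwrapping it
    let result: Result<i32, _> = i.try_into();
    match result {
        Ok(value) => println!("Converted: {value}"),
        Err(e) => println!("Conversion failed: {e}"),
    }
}
```

This handles the failure gracefully rather than aborting the program.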
Numeric Overflow
Let's take a very simple C program:
#include <stdio.h>
int main() {
char a = 127;
a = a + 1;
printf("a = %d\n", a);
return 0;
}
If you haven't been programming for a while, you may be surprised that the output is -128. Your 8-bit signed integer (char) can only hold -128 through 127, in binary two's complement. A binary addition of 1 to 127 gives 10000000. Since the first bit in two's complement binary represents the sign---you get -128.
Rust's behavior for this program varies by how you compiled it. In the default debug mode:
fn main() {
    let mut a: i8 = 127;
    a += 1;
    println!("{a}");
}
This crashes the program with attempt to add with overflow. (If you compile in release mode with cargo run --release, it prints the wrapped number.)
Always test your builds in debug mode!
Explicitly Handling Wrapping
In C, you can detect this overflow with some additional code:
#include <stdio.h>
#include <limits.h>
int main() {
char a = 127;
char add = 1;
if (a > 0 && add > 0 && a > CHAR_MAX - add) {
printf("Overflow detected\n");
return 1;
}
a = a + add;
printf("a = %d\n", a);
return 0;
}
(You may also want to check for underflow)
Rust includes checked_ arithmetic for this purpose:
fn main() {
    let a: i8 = 127;
    let a = a.checked_add(1);
    println!("{a:?}");
}
This prints None. That's odd! checked_add returns an Option type, which is fundamentally Rust's alternative to null/nullptr. Just like a Result, an Option is a sum type that can either be None or Some(x).
Notice that I snuck in :? in the print. This is "debug printing", and it prints the contents of complicated types if they implement the appropriate trait.
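You can also match on the Option rather than debug-printing it---a small sketch of handling both arms explicitly:

```rust
fn main() {
    let a: i8 = 127;
    // checked_add returns Option<i8>: Some(sum), or None on overflow
    match a.checked_add(1) {
        Some(sum) => println!("Sum: {sum}"),
        None => println!("Overflow!"),
    }
}
```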
You can also unwrap options:
fn main() {
    let a: i8 = 127;
    let a = a.checked_add(1).unwrap();
    println!("{a}");
}
But I Want to Wrap!
Sometimes, wrapping is the desired behavior. It's used a lot in cryptographic functions, for example. Rust lets you opt in to the wrapping behavior:
fn main() {
    let a: i8 = 127;
    let a = a.wrapping_add(1);
    println!("{a}");
}
This won't crash on debug or release builds: you've explicitly told Rust (and whomever reads your code later) that wrapping was the intended behavior, and not a bug.
Saturating
Maybe you'd rather saturate at the maximum possible value?
fn main() {
    let a: i8 = 127;
    let a = a.saturating_add(1);
    println!("{a}");
}
This prints 127.
Other Operations
Checked, saturating and wrapping variants of addition, subtraction, multiplication and division are all provided (division checking checks for divide by zero).
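A quick sketch of those variants applied to multiplication and division (the values here were chosen to trigger each behavior):

```rust
fn main() {
    let a: i8 = 100;
    println!("{:?}", a.checked_mul(2));  // None: 200 doesn't fit in an i8
    println!("{}", a.wrapping_mul(2));   // -56: 200 wraps around
    println!("{}", a.saturating_mul(2)); // 127: clamped to i8::MAX
    println!("{:?}", a.checked_div(0));  // None: division by zero
}
```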
If you are sensing a theme, it's that Rust picks safe by default when possible---and gives you the chance to opt out. C and C++ tend to assume you know what you're doing, and offer the option of adding safety checks.
Control Flow
Rust offers similar control-flow options to C and C++.
If Statements
In C or C++, you are probably used to:
if (i == 5) {
// Do something
} else {
// Do something else
}
The Rust syntax is almost identical:
fn main() {
    let i = 6;
    if i == 5 {
        // Do something
        println!("5");
    } else {
        // Do something else
        println!("Other");
    }
}
Switch Statements
You're probably also used to:
int i = 5;
switch (i) {
case 5: printf("5\n"); break;
case 6: printf("6\n"); break;
default: printf("Something else\n"); break;
}
Rust's equivalent is called match, and it's a little different:
fn main() {
    let i = 5;
    match i {
        5 => println!("5"),
        6 => println!("6"),
        _ => println!("Something else"),
    }
}
match can do a lot more than that, but let's focus on what we have here:
- There's no break;---matched cases do not fall through.
- The syntax is different: (case) => (expression).
- default is replaced with _---which is Rust's general "something else" symbol.
If you need multiple lines, it's similar to C also:
fn main() {
    let i = 5;
    match i {
        5 => {
            println!("5");
            println!("5 is a good number.");
        }
        6 => println!("6"),
        _ => println!("Something else"),
    }
}
There's also a special "one case match" called "if let", but we're going to worry about that later.
Loops
Looping through data is pretty fundamental, so it shouldn't be a surprise that Rust supports loops.
For Loops
Take the following C code:
#include <stdio.h>
int main() {
int i;
for (i = 0; i < 10; i++) {
printf("%d\n", i);
}
return 0;
}
Unsurprisingly, this prints 0 through 9.
Here's a Rust equivalent:
fn main() {
    for i in 0..10 {
        println!("{i}");
    }
}
The output is the same, but the syntax is quite different:
- 0..10 is an exclusive range. It provides an iterator over every number in the range, excluding the last one. We'll worry about iterators later.
- i only exists inside the loop scope. (In C++ and later C editions you can do for (int i=0; i<10; i++) for the same effect.)
- You don't have any control over the operation that occurs for each iteration. Rust just ticks through each entry in the range.
If you prefer an inclusive range:
fn main() {
    for i in 0..=10 {
        println!("{i}");
    }
}
We'll look at for_each equivalency later.
While Loops
This C should look familiar, too:
#include <stdio.h>
int main() {
int i = 0;
while (i < 10) {
printf("%d\n", i);
i += 1;
}
return 0;
}
Equivalent Rust code looks like this:
fn main() {
    let mut i = 0;
    while i < 10 {
        println!("{i}");
        i += 1;
    }
}
Sadly, Rust doesn't protect you from an infinite while loop either!
loop loops
Rust adds one more type of loop, named loop. It runs forever, or until a break statement is hit.
fn main() {
    let mut i = 0;
    loop {
        println!("{i}");
        i += 1;
        if i > 9 {
            break;
        }
    }
}
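One nice property of loop worth a quick sketch: break can carry a value, so the whole loop becomes an expression you can assign from:

```rust
fn main() {
    let mut i = 0;
    let total = loop {
        i += 1;
        if i == 10 {
            // break with a value: this becomes the loop's result
            break i * 2;
        }
    };
    println!("{total}"); // prints 20
}
```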
Strings
Strings are an area of significant difference between C, C++ and Rust. None of them really agree on how strings should work.
The Basic In-Memory String
Let's start with some C (that also works in C++):
#include <stdio.h>
int main() {
const char * my_string = "Hello, World";
printf("%s\n", my_string);
return 0;
}
- This prints "Hello, World".
- You are storing my_string as a const char *. It's set aside as an area of memory, containing 8-bit ASCII for each character---and a zero at the end.
Here's a Rust equivalent:
fn main() {
    let my_string = "Hello, World";
    println!("{my_string}");
}
Or if you want to use a constant, which always explicitly states the type:
fn main() {
    const MY_STRING: &str = "Hello, World";
    println!("{MY_STRING}");
}
What's up with &str? str is a type that means "a string of characters in memory". Unlike C, it isn't suffixed with a zero; instead, the reference carries the string's length along with the pointer.
Let's Throw in Some Unicode!
#include <stdio.h>
int main() {
const char * my_string = "Hello, 🌎";
printf("%s\n", my_string);
return 0;
}
On reasonably recent GCC, this works. The compiler converts 🌎 to the appropriate UTF-8 - a series of bytes.
Rust works the same way:
fn main() {
    const MY_STRING: &str = "Hello, 🌎";
    println!("{MY_STRING}");
}
The only difference is that Rust strings are explicitly UTF-8, not ASCII. Rust's char type represents a full Unicode code point, and when encoded into a string, code points may occupy from 1 to 4 bytes! That makes handling code points easier, but it also means that strings aren't plain old 8-bit integers anymore.
How about std::string in C++?
Many C++ programmers have moved towards using std::string---it's generally easier to work with, and less prone to foot-guns.
#include <string>
#include <iostream>
int main() {
std::string my_string = std::string("Hello, World!");
std::cout << my_string << std::endl;
return 0;
}
This also prints Hello, World!. Nothing too revolutionary there.
String Concatenation
In C, you might combine two strings into a new string as follows:
#include <stdio.h>
#include <string.h>
int main() {
char buffer[64] = "Hello ";
const char * string2 = "World";
strcat(buffer, string2);
printf("%s", buffer);
return 0;
}
Make buffer too small and you are looking at a segmentation fault---or worse!
In C++, you can add some safety and do this:
#include <string>
#include <iostream>
int main() {
std::string my_string = std::string("Hello ");
std::string buffer = my_string + std::string("World");
std::cout << buffer << std::endl;
return 0;
}
No segmentation faults here!
Here's a Rust equivalent:
fn main() {
    let mut buffer = String::from("Hello ");
    buffer += "World";
    println!("{buffer}");
}
Two Types of String
Just like C++, Rust has two string types (and a few more we won't talk about until we cover FFI):
- &str---a reference to a collection of characters in memory. &str is immutable.
- String---a type holding a collection of characters. String can be mutated.
You can coerce a String into an &str by referencing it: &my_string.
Functions and Scopes
Functions are a mainstay of structured programming. C and C++ both support them:
#include <stdio.h>
void print() {
printf("Hello, World");
}
int main() {
print();
}
This does exactly what you expect: it prints "Hello, World". The equivalent Rust is similar:
fn print() {
    println!("Hello, World!");
}

fn main() {
    print();
}
Returning Data from Functions
#include <stdio.h>
int add_one(int i) {
return i+1;
}
int main() {
int x = add_one(5);
printf("%d", x);
}
Here you are declaring a function named add_one, with the return type int. You accept a parameter named i, and return i+1.
The Rust syntax is quite different:
fn add_one(i: i32) -> i32 {
    i + 1
}

fn main() {
    let x = add_one(5);
    println!("{x}");
}
The syntax differences are quite obvious:
- The return type goes on the end, prefixed with ->.
- Parameters are declared "name: type", rather than "type name".
- There's no return statement! By default, Rust functions return the result of their last expression. In idiomatic Rust, you'll usually see functions declared in this way.
If you miss return, it's still there:
fn add_one(i: i32) -> i32 {
    return i + 1;
}

fn main() {
    let x = add_one(5);
    println!("{x}");
}
Notice that to use return you need to add a semicolon---but the first version didn't have one! Lines with a semicolon are still expressions---but they evaluate to the "unit type" (()). So you can either omit the semicolon to have the expression "fall out" of the function, or you can use return with a semicolon. That's a little confusing, so let's look at some underlying Rust.
In Rust, everything returns.
fn foo() {}

fn main() {
    let i = foo();
    println!("{i:?}");
}
Notice we've used :?, the debug print, again.
The program prints (). That's because () is like void---but it has a value (admittedly not a very useful one). So if you assign the result of a statement that ends in a ;, you are setting it to the unit type---which is probably not what you wanted.
Rust also supports expression assignment:
fn main() {
    let i = 5;
    let i = if i < 5 { 1 } else { 0 };
    println!("{i}");
}
Rust doesn't have a ternary operator!
You can assign from an expression or conditional just by returning using the no-semicolon syntax. This works for scopes, too:
fn main() {
    let i = {
        let mut accumulator = 0;
        for i in 0..10 {
            accumulator += i;
        }
        accumulator
    };
    println!("{i}");
}
Note that you can't use the return keyword when you do this---return explicitly returns out of the current function.
How about if I want to return multiple potential values from a function?
You can either make sure that every branch implicitly returns:
fn test(i: i32) -> i32 {
    if i < 10 {
        0
    } else {
        1
    }
}

fn main() {
    println!("{}", test(5));
}
Or you can use early return:
fn test(i: i32) -> i32 {
    if i < 10 {
        return 0;
    }
    1
}

fn main() {
    println!("{}", test(5));
}
Structures
In regular C, you are used to grouping data with structs:
#include <stdio.h>
struct mystruct_t {
int a;
int b;
};
int main() {
struct mystruct_t value = {
.a=1,
.b=2
};
printf("%d, %d", value.a, value.b);
return 0;
}
C++ is very similar, albeit with more assignment options:
#include <iostream>
struct mystruct_t {
int a;
int b;
};
int main() {
mystruct_t value = { 1, 2 };
std::cout << value.a << ", " << value.b << std::endl;
return 0;
}
Rust is similar, too:
struct MyStruct {
    a: i32,
    b: i32,
}

fn main() {
    let value = MyStruct { a: 1, b: 2 };
    println!("{}, {}", value.a, value.b);
}
Rust will let you use a shortcut to debug print, too:
#[derive(Debug)]
struct MyStruct {
    a: i32,
    b: i32,
}

fn main() {
    let value = MyStruct { a: 1, b: 2 };
    println!("{value:?}");
}
You can even "pretty print":
#[derive(Debug)]
struct MyStruct {
    a: i32,
    b: i32,
}

fn main() {
    let value = MyStruct { a: 1, b: 2 };
    println!("{value:#?}");
}
#[derive] is another type of macro. The compiler iterates through the structure at compile time, generating a trait implementation of fmt::Debug for you (once again, we'll talk about traits later). It's not quite reflection, but it does a great job of faking it!
Structure Privacy
In C++, struct defaults to all-public, while class defaults to all-private. You can control individual members' privacy with public and private sections:
struct MyStruct {
uint32_t my_public;
private:
uint32_t my_private;
};
Rust doesn't have classes, and all members default to private unless you mark them as public with the pub or pub(crate) markers:
// Structure is private to the module
struct MyPrivateStruct {}

pub struct MyPublicStruct {
    my_private: u32,
    pub my_public: u32,
    pub(crate) my_public_but_not_exported_from_the_crate: u32,
}
Types of Structure
A "Marker Struct" (one you use to mark a type but that doesn't contain data) may be declared as:
struct MyMarker;

fn main() {
    let s = MyMarker;
}
A regular structure with named fields:
struct Named {
    my_field: i32,
}

fn main() {
    let s = Named { my_field: 3 };
    println!("{}", s.my_field);
}
And a tuple-structure:
struct TupleStruct(i32);

fn main() {
    let s = TupleStruct(3);
    println!("{}", s.0);
}
Structure Functions
Functions can be attached to structures, as either methods or associated functions.
Associated Functions
Associated functions use a structure as a namespace. They are similar to static C++ functions in a class/struct, in that they aren't associated with an instance of a structure---you use the structure as a namespace for accessing them.
Functions are associated with a structure in Rust with an impl block---an implementation block. Associated functions do not take a self parameter referring to an instance.
struct MyStruct {}

impl MyStruct {
    pub fn do_something() {
        // Function body
    }
}

fn main() {
    MyStruct::do_something();
}
Equivalent C++ looks like this:
#include <stdio.h>
#include <stdlib.h>
class MyClass {
public:
static void do_something() {
// Function body
}
};
int main() {
MyClass::do_something();
return 0;
}
You can use associated functions as constructors. Constructors aren't special, and you can define as many of them as you want---there's no rule of 3, 5, etc. A constructor is a convention---it's like any other associated function, and by convention it returns an instance of the type that houses it. You can also refer to the current type with the syntax sugar Self:
struct MyStruct {
    value: i32,
}

impl MyStruct {
    fn new() -> MyStruct {
        Self { value: 3 }
    }

    fn with_param(value: i32) -> Self {
        // Syntax sugar: if you are assigning from a variable of the same name
        // and type, you don't need to write "value: value".
        Self { value }
    }
}
A similar C++ constructor would look like this:
class MyClass {
public:
int value;
MyClass(int n) {
value = n;
}
};
int main() {
auto my_class = MyClass(3);
return 0;
}
There's no such thing as a move constructor or copy constructor. There's also no "default constructor", but there's a trait that accomplishes the same thing. Here's the short version using a "derive" (a macro that writes code for you):
#[derive(Default, Debug)]
struct MyStruct {
    a: i32,
    b: String,
}

fn main() {
    println!("{:?}", MyStruct::default());
}
You can also explicitly implement Default:
#[derive(Debug)]
struct MyStruct {
    a: i32,
    b: String,
}

impl Default for MyStruct {
    fn default() -> Self {
        Self {
            a: 3,
            b: "Hello".into(),
        }
    }
}

fn main() {
    println!("{:?}", MyStruct::default());
}
Methods
You can also define functions that operate on an instance of a structure, just like C++ methods. You annotate the function's access to the instance as its first parameter:
struct MyClass {
    a: i32,
}

impl MyClass {
    fn print_me(&self) {
        println!("{}", self.a);
    }
}

fn main() {
    let mc = MyClass { a: 42 };
    mc.print_me();
}
This is equivalent to the C++:
#include <stdio.h>
#include <iostream>
class MyClass {
public:
int value;
MyClass(int n) {
value = n;
}
void print_me() {
std::cout << this->value << "\n";
}
};
int main() {
auto my_class = MyClass(3);
my_class.print_me();
return 0;
}
You can replace &self with different types of access to the instance:
- &self is most common. It provides a read-only (constant) reference that can access the instance but not change it.
- &mut self grants mutable access via a reference. Your method can change the instance's contents.
- self moves the instance into the function---it will be consumed if you don't return it. This is useful for "builder pattern" setups. We'll talk about that when we get to the Rust memory model.
Destructors - Drop
Rust doesn't explicitly define destructors---there's no need to write one in most cases. So you won't encounter ~MyClass() functions.
That doesn't mean that Rust has abandoned RAII---Resource Acquisition Is Initialization. Rather, Rust has adopted it wholesale and associated destructors with a trait called Drop.
Drop is implemented for all of the container types, smart pointers, etc. Whenever a variable leaves scope, the Drop trait is called prior to the type being deleted. Let's explicitly implement Drop to demonstrate this:
struct MyStruct {}

impl Drop for MyStruct {
    fn drop(&mut self) {
        println!("I was dropped");
    }
}

fn main() {
    let a = MyStruct {};
}
Not too surprisingly---the drop function runs at the end of the program. This applies to local variables in a function, too:
struct MyStruct {}

impl Drop for MyStruct {
    fn drop(&mut self) {
        println!("I was dropped");
    }
}

fn do_something() {
    let a = MyStruct {};
}

fn main() {
    println!("Calling function");
    do_something();
    println!("Returned");
}
Dropping works on variables held by a structure, even if the structure doesn't itself explicitly implement `Drop`:
struct MyContainer {
    data: Vec<MyStruct>,
}

struct MyStruct {}

impl Drop for MyStruct {
    fn drop(&mut self) {
        println!("I was dropped");
    }
}

fn main() {
    let mc = MyContainer {
        data: vec![MyStruct {}, MyStruct {}, MyStruct {}],
    };
}
So---just like C++---RAII makes it very difficult to accidentally leak memory. We'll go over this in a lot more detail soon.
You can also explicitly drop anything:
struct MyContainer {
    data: Vec<MyStruct>,
}

struct MyStruct {}

impl Drop for MyStruct {
    fn drop(&mut self) {
        println!("I was dropped");
    }
}

fn main() {
    let mc = MyContainer {
        data: vec![MyStruct {}, MyStruct {}, MyStruct {}],
    };
    std::mem::drop(mc);
    // Accessing mc is now a compilation error
}
Tuples & Destructuring
Tuples in Rust are a bit easier to use than their C++ cousins. In C++:
std::tuple<std::string, double> tuple = {"Hello", 3.14};
auto s = std::get<0>(tuple);
auto n = std::get<1>(tuple);
In Rust, you can define a tuple with parentheses:
fn main() {
    let tuple = ("Hello".to_string(), 3.14);
    let n = tuple.1;
}
Rust also supports destructuring:
fn main() {
    let tuple = ("Hello".to_string(), 3.14);
    let (name, value) = tuple;
}
Enumerations
In C and C++, enums are tied to a value:
enum Level {
    Low, Medium, High
};

Level n = Low;
You can do the same thing in Rust:
#![allow(unused)]
fn main() {
    enum Level { Low, Medium, High }
    let n = Level::Low;
}
C and C++ let you assign specific values to enumerations and cast them into numeric types:
enum Level {
    Low = 1, Medium = 2, High = 3
};

Level n = Medium;
int o = n;
Rust lets you do something similar, though converting to a number requires an explicit `as` cast:

#![allow(unused)]
fn main() {
    enum Level {
        Low = 1,
        Medium = 2,
        High = 3,
    }
    let n = Level::Medium;
    let o = n as u8;
}
Rust lets you specify the underlying type for numeric Enums:
#![allow(unused)]
fn main() {
    #[repr(u8)]
    enum Level {
        Low = 1,
        Medium = 2,
        High = 3,
    }
    let n = Level::Medium;
}
Enumerations Can Contain Data
Rust enumerations can also contain data. They are effectively a tagged union, or variant: roughly the size of the largest possible member, plus the discriminant tag. The `match` statement is the best way to access data within an enumeration.
#![allow(unused)]
fn main() {
    enum Command {
        DoNothing,
        Count(i32),                              // A tuple-style enumeration entry
        Search { term: String, max_depth: i32 }, // A structure-style enumeration entry
    }

    let c = Command::DoNothing;
    let c = Command::Count(12);
    let c = Command::Search { term: "term".to_string(), max_depth: 12 };

    match c {
        Command::DoNothing => {}
        Command::Count(n) => {
            for i in 0..n {
                println!("{i}")
            }
        }
        Command::Search { term, max_depth } => {}
    }
}
It's important to remember that an enumeration only contains data for the assigned value---it is equal to exactly one option.
Remember `Option` and `Result`? We've used both already---they are enumerations that use generics.
Option
Rust doesn't have `null` values (they exist in FFI code, but not in "safe" Rust). Whenever a value may or may not be present, it is wrapped in an `Option`. Options are generic (we'll talk about that in the generics section)---they can contain pretty much anything. The declaration of `Option` looks like this:
#![allow(unused)]
fn main() {
    enum Option<T> {
        None,
        Some(T),
    }
}
You've seen `unwrap()` (an associated function that returns the option's content, or panics if there isn't one). You can also access an Option with a `match` statement:
#![allow(unused)]
fn main() {
    let my_option = Some(3);
    match my_option {
        Some(x) => {
            // Do something with x
            println!("{x}");
        }
        None => {}
    }
}
The "if let" statement is a one-option match
. You can use it to destructure an Option. It's often preferable, because there are only two possibilities:
#![allow(unused)]
fn main() {
    let my_option = Some(3);
    if let Some(x) = my_option {
        // You can use x here
        println!("{x}");
    } else {
        // If you want to do something else
    }
}
`if let` is conditional upon destructuring succeeding: `Some(x) = my_option` is a pattern match, and the `if let` body only fires if the pattern matches. You can also use `while let` to repeat the match against an iterator.
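Here's a minimal sketch of `while let` driving an iterator by hand:

```rust
fn main() {
    let values = vec![1, 2, 3];
    let mut iter = values.iter();

    // The loop body runs as long as next() returns Some(...)
    while let Some(x) = iter.next() {
        println!("{x}");
    }
    // next() returned None, so the loop ended
}
```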
Result
`Result` is also an enumeration, indicating a fallible action. It's very similar to the new `std::expected` in C++. Rust's `Result` is declared as:
#![allow(unused)]
fn main() {
    enum Result<T, E> {
        Ok(T),
        Err(E),
    }
}
They are generic (just like options)---effectively templated, in C++ parlance. When an operation may fail, it returns either `Ok(good_value)` or `Err(error_value)`. We'll talk a lot more about this in Error Handling.
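As a quick sketch of consuming a `Result`, here's the standard library's `str::parse`, which returns one:

```rust
fn main() {
    // parse() returns a Result: Ok on success, Err on failure
    let good: Result<i32, _> = "42".parse();
    let bad: Result<i32, _> = "not a number".parse();

    match good {
        Ok(n) => println!("Parsed {n}"),
        Err(e) => println!("Failed: {e}"),
    }

    // is_ok()/is_err() let you test without consuming the value
    println!("bad parse failed: {}", bad.is_err());
}
```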
Enumerations can have Associated Functions, too
You can use `impl` blocks with enumerations, too.
enum MyEnum { A, B }

impl MyEnum {
    // A constructor
    fn new() -> Self {
        MyEnum::A
    }

    fn print_me(&self) {
        match self {
            MyEnum::A => println!("The first option!"),
            MyEnum::B => println!("The second option!"),
        }
    }
}

fn main() {
    let e = MyEnum::new();
    e.print_me();
}
Containers
Rust implements a number of container types, similar to C++ standard library types.
Arrays
An array in Rust can be declared as follows:
fn main() {
    let my_array = [1, 2, 3, 4];           // Type is inferred
    let my_array: [u32; 4] = [1, 2, 3, 4]; // Type specified
}
Just like C and C++, arrays are stored on the stack. The equivalent C++:
#include <array>

int main() {
    int my_array[4] = {1, 2, 3, 4};
    std::array<int, 4> my_array_std = {1, 2, 3, 4};
    return 0;
}
Unlike C, Rust stores the length of the array and bounds-checks every access. Accessing `my_array[5]` will panic rather than expose adjacent memory. Compare with this C program:
#include <stdio.h>

int main() {
    int my_array[4] = {1, 2, 3, 4};
    printf("%d", my_array[5]);
    return 0;
}
Prints "32767" on my test system. The equivalent Rust fails to compile, but we can fool it by adding a little arithmetic:
fn main() {
    let array = [1, 2, 3, 4];
    for index in 2..6 {
        println!("{}", array[index]);
    }
}
This panics with the error message:
thread 'main' panicked at src/main.rs:4:24:
index out of bounds: the len is 4 but the index is 4
This is good, safe behavior. (A `get_unchecked` call exists for eliding the bounds check, and requires an `unsafe` block.)
Vectors
C programmers sometimes complain that in Rust, everything looks like a vector. They aren't wrong: vectors are everywhere. C++ programmers tend to have vectors everywhere too!
A vector is like an array, but it can grow, and it is stored on the heap. A C++ vector is typically a pointer to heap memory, plus a stored size and capacity (the element type is a template parameter). Rust vectors have the same layout and similar growth characteristics: when the capacity is exhausted, the allocation is grown (typically doubled) and the data is copied over.
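You can watch the capacity grow as you push. The exact growth factor is an implementation detail, so treat the printed numbers as illustrative rather than guaranteed:

```rust
fn main() {
    let mut v: Vec<i32> = Vec::new();
    let mut last_capacity = v.capacity();

    for i in 0..17 {
        v.push(i);
        if v.capacity() != last_capacity {
            // A reallocation happened: report the new capacity
            println!("len {:2} -> capacity {}", v.len(), v.capacity());
            last_capacity = v.capacity();
        }
    }
}
```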
Let's add some data to a vector and debug-print it:
fn main() {
    let mut my_vec = Vec::new();
    my_vec.push(1);
    my_vec.push(2);
    println!("{my_vec:?}");
}
This is the same as the C++:
#include <stdio.h>
#include <vector>

int main() {
    std::vector<int> my_vec;
    my_vec.push_back(1);
    my_vec.push_back(2);
    for (auto val : my_vec) {
        printf("%d", val);
    }
    return 0;
}
Rust is safe by default on bounds-checking:
fn main() {
    let my_vec = vec![1, 2]; // Helpful macro for initializing vectors
    println!("{}", my_vec[3]);
}
Panics with an out-of-bounds error. The direct-equivalent C++:
#include <stdio.h>
#include <vector>

int main() {
    std::vector<int> my_vec;
    my_vec.push_back(1);
    my_vec.push_back(2);
    printf("%d", my_vec[3]);
    return 0;
}
Prints "0" and terminates normally on my system. (You can use at
in C++ for a safe version; C++ typically defaults to unsafe, Rust to safe). Just like an array, get_unchecked
is available for unsafe access (with an unsafe
tag) if you really need to skip the bounds check.
If you need to pre-allocate a vector, you can use `with_capacity`:
fn main() {
    let mut my_vec = Vec::with_capacity(100);
    my_vec.push(1);
}
`with_capacity` generates an empty vector, but with pre-allocated room for the number of elements you expect to store.
HashMap
Rust also includes a `HashMap`. It doesn't offer any ordering guarantees, and is comparable to `std::unordered_map`. It is based on the "hashbrown" (SwissTable) implementation, and by default uses a DoS-resistant SipHash hasher---which is why performance-sensitive users sometimes swap in `FxHash` instead.
use std::collections::HashMap;

fn main() {
    let mut my_map = HashMap::new();
    my_map.insert("Hello".to_string(), 5);
    my_map.insert("World".to_string(), 6);
    if let Some(count) = my_map.get("Hello") {
        println!("{count}");
    }
}
Other Types
Rust implements many other containers:
- `VecDeque` - a double-ended queue, similar to `deque` in C++.
- `LinkedList` - a premade linked-list type (writing linked lists in safe Rust is notoriously hard).
- `BTreeMap` - a B-tree map that retains key order.
- `HashSet` - a direct equivalent to `unordered_set`.
- `BTreeSet` - a set implemented with a B-tree.
- `BinaryHeap` - a heap structure.
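A quick sketch of two of these in action---`VecDeque` as a queue, and `BTreeMap` iterating in key order:

```rust
use std::collections::{BTreeMap, VecDeque};

fn main() {
    // VecDeque: efficient push/pop at both ends
    let mut queue = VecDeque::new();
    queue.push_back(2);
    queue.push_back(3);
    queue.push_front(1);
    println!("{queue:?}"); // [1, 2, 3]

    // BTreeMap: iteration is always in sorted key order
    let mut map = BTreeMap::new();
    map.insert("banana", 2);
    map.insert("apple", 1);
    for (key, value) in &map {
        println!("{key}: {value}");
    }
}
```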
There are many more available through the crates infrastructure, which we'll cover when we get to dependencies.
Iterators
Just like C++, Rust uses iterators to provide a rich set of algorithms---and many crates, such as `itertools`, use this to build even more functionality.
For loops in Rust are iterators:
fn main() {
    // These are the same:
    let my_vec = vec![1, 2, 3, 4];
    for n in my_vec {
        println!("{n}");
    }

    let my_vec = vec![1, 2, 3, 4];
    for n in my_vec.into_iter() {
        println!("{n}");
    }
}
These are consuming iterators: they don't return references to the items in the collection, they move each item into the loop scope, where it is dropped at the end of each iteration. You can't use `my_vec` after iterating with a consuming iterator. To iterate over references instead:
fn main() {
    // These are the same:
    let my_vec = vec![1, 2, 3, 4];
    for n in &my_vec {
        println!("{n}");
    }

    // No need to recreate the vector now
    for n in my_vec.iter() {
        println!("{n}");
    }
}
If you prefer a more iterator-based approach, you can also do the following:
fn main() {
    let my_vec = vec![1, 2, 3, 4];
    my_vec.iter().for_each(|n| println!("{n}")); // We pass in a closure
}
This is equivalent to the C++:
#include <algorithm>
#include <stdio.h>
#include <vector>

int main() {
    std::vector<int> v = {1, 2, 3, 4};
    std::for_each(v.begin(), v.end(), [](int const& elem) {
        printf("%d\n", elem);
    });
    return 0;
}
Iterators are frequently chained together (much like the new C++ Ranges system). For example:
fn main() {
    let my_vec = vec![1, 2, 3, 4];

    // Create a vector of strings
    let my_new_vec: Vec<String> = my_vec
        .iter()
        .map(|n| n.to_string()) // Map converts each entry to another type
        .collect();

    let max = my_vec.iter().max();
    let min = my_vec.iter().min();
    let sum: u32 = my_vec.iter().sum();
    let count = my_vec.iter().count();
    let sum_with_fold = my_vec.iter().fold(0, |acc, x| acc + x);

    let all_the_numbers: Vec<(&u32, &String)> =
        my_vec.iter().zip(my_new_vec.iter()).collect();
    println!("{all_the_numbers:?}");

    my_vec
        .iter()
        .filter(|n| **n > 2) // Dereferencing because we have two layers of reference
        .for_each(|n| println!("{n}"));
}
In other words, most of the algorithms in C++ are implemented. There's a lot more on the Rust documentation site: https://doc.rust-lang.org/std/iter/trait.Iterator.html#
We'll talk about parallel iteration later.
Move by Default
Newcomers to Rust are always surprised by this one!
First, a quick C++ quiz. Do you really know what `std::move` does? When I've taught in person, there's a surprising amount of confusion. Does it create a copy? Can you use the value after you've moved out of it? It's not as clear as it could be.

`std::move` converts a value to an xvalue---an expiring value whose contents may be plundered by the moved-to function. The object itself remains valid, but its state is unspecified, and relying on its contents afterwards is a bug. `my_function(std::move(x));` leaves `x` in a messy state.
Rust is move-heavy---so much so that moving is the default for all non-primitive types (primitives are automatically copied, since they fit in a register). In most languages, you'd expect this to compile:
fn do_something(s: String) {
    // Something happens here
}

fn main() {
    let s = "Hello".to_string();
    do_something(s);
    println!("{s}");
}
Instead of compiling, you get a long error message, summarized as `error[E0382]: borrow of moved value: s`. The full error does a great job of explaining what happened:
error[E0382]: borrow of moved value: `s`
--> src/main.rs:8:15
|
6 | let s = "Hello".to_string();
| - move occurs because `s` has type `String`, which does not implement the `Copy` trait
7 | do_something(s);
| - value moved here
8 | println!("{s}");
| ^^^ value borrowed here after move
|
note: consider changing this parameter type in function `do_something` to borrow instead if owning the value isn't necessary
--> src/main.rs:1:20
|
1 | fn do_something(s: String) {
| ------------ ^^^^^^ this parameter takes ownership of the value
| |
| in this function
= note: this error originates in the macro `$crate::format_args_nl` which comes from the expansion of the macro `println` (in Nightly builds, run with -Z macro-backtrace for more info)
help: consider cloning the value if the performance cost is acceptable
|
7 | do_something(s.clone());
| ++++++++
The parameter `s: String` doesn't borrow or copy---it moves ownership into the function. The String now belongs to the function, and is dropped as soon as the function ends!
You could simply move it back (the compiler usually optimizes the extra move away, much like C++ return value optimization):
fn do_something(s: String) -> String {
    // Something happens here
    s
}

fn main() {
    let s = "Hello".to_string();
    let s = do_something(s);
    println!("{s}");
}
That's fine if it's the coding style you want, but it's not very ergonomic. What you usually want is to borrow the data---make a reference. That's up next.
Borrowing
References in Rust are explicit at both the caller and the callee. Our simple "do something" example with a reference works:
fn do_something(s: &String) {
    // Something happens here
}

fn main() {
    let s = "Hello".to_string();
    do_something(&s);
    println!("{s}");
}
Contrast this with C++, in which you don't specify the `&` at the call site. It's more typing, but you can't accidentally copy your data when the function signature changes.
If you want to allow the function to change/mutate the data, you need to mutably borrow it:
fn do_something(s: &mut String) {
    // Something happens here
    *s += " World";
}

fn main() {
    let mut s = "Hello".to_string();
    do_something(&mut s);
    println!("{s}");
}
Notice that this is even more pedantic: you have to make `s` mutable before it can be borrowed mutably, and then you have to explicitly mark the borrow `&mut`. There's no escaping it---Rust makes you state your intent when you lend data to another function.
Borrowing Strings
Strings have a special case. A `String` contains a buffer of characters, and you can immutably refer to those characters as `&str`---a string slice. So this also works:
fn do_something(s: &str) {
    // Something happens here
}

fn main() {
    let s = "Hello".to_string();
    do_something(&s);
    println!("{s}");
}
As we saw in the strings section, you can't modify an `str` buffer---but if all you need to do is print or use the string in a formatting expression (etc.), it can be quicker to just pass the slice.
Slices
Slices are analogous to a C++ "view" or "span" type. They refer to a contiguous area of memory, usually inside a collection. You can use iterators on a slice without needing to know the underlying details of the collection. For example:
fn sum(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let my_vec = vec![1, 2, 3, 4, 5];
    println!("{}", sum(&my_vec));
}
Vectors and arrays coerce into a slice when you borrow them. You can also use slices to look at just part of a collection of data:
fn sum(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let my_vec = vec![1, 2, 3, 4, 5];
    println!("{}", sum(&my_vec[0..3]));
}
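Slices can also be mutable, which lets a function modify part of a collection without knowing its concrete type. A sketch:

```rust
fn double_all(values: &mut [i32]) {
    for v in values.iter_mut() {
        *v *= 2;
    }
}

fn main() {
    let mut my_vec = vec![1, 2, 3, 4, 5];
    // Only double the first three entries
    double_all(&mut my_vec[0..3]);
    println!("{my_vec:?}"); // [2, 4, 6, 4, 5]
}
```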
Memory Management
Rust is a real systems language, with proper memory management. No garbage collector here! And with the great power of memory management, comes great responsibility to not accidentally leak your company's secrets or let remote users execute arbitrary code on your system.
Fortunately, Rust prioritizes safety---and also a safety culture, in which Rustaceans strive to create safe code.
C-style allocation and deallocation
Unless you are working on embedded, real-time or other really low-level systems, you probably won't need to manually allocate and de-allocate memory. Rust has very good memory management out-of-the-box, and you can get a long way without needing to worry about it. This section serves:
- To show you what you can do if you need it.
- To help you understand why `Box`, `Vec` and other types are so useful---and what they actually do.
The Stack and Primitives
"Primitive" types (such as u32
, i8
and usize
/isize
---whose size is the pointer size of your platform) are natively supported by CPUs. You can store them on the stack, copy them between functions and generally not worry about things like ownership, borrowing and lifetimes. In fact, it's often slower to borrow a u32
than it is to copy it. Borrowing creates a pointer, which might be 64-bits in size, whereas the u32
itself is only 32-bits.
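You can verify the size claim directly with `std::mem::size_of` (the reference size depends on your platform's pointer width):

```rust
fn main() {
    // A u32 is always 4 bytes; a reference is pointer-sized
    println!("u32:  {} bytes", std::mem::size_of::<u32>());
    println!("&u32: {} bytes", std::mem::size_of::<&u32>());
    // On a 64-bit platform, the reference is twice the size of the value it points at
}
```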
So when you are using primitives, you really don't have to worry. The stack will ensure that when a function ends, any variables on the stack will be cleaned up. Arrays on the stack are cleaned up, too.
The stack is comparatively small---8 MB by default for the main thread on Linux, and 2 MB for threads spawned by Rust's standard library. So you can't put everything there.
Manually Allocating & De-allocating Memory
The "heap" is a region of memory that is shared by your program, and doesn't have the size-restrictions of the stack. It is always allocated and de-allocated. In "managed" languages, the language runtime is still allocating to the heap---but it uses a garbage collector of some sort to de-allocate memory that is no longer needed. This has the advantage that you don't need to worry about it, and the disadvantages:
- You don't know for sure when memory will be allocated. Is it allocated up-front? That's great for systems with a fixed memory size, but not so good for systems where you want to allocate memory on-demand. Is it allocated on first use? That's great for systems where you don't know how much memory you need up-front, but not so good for systems where you want to allocate memory up-front.
- You don't know for sure when the memory will be de-allocated.
- You get the infamous "GC pauses" where the program stops for a while to do garbage collection. The pauses might be very short, but it's still an insurmountable problem if you are trying to control the braking system on a sports car!
- You often have to jump through hoops to use an exact heap size, causing issues on embedded systems.
On some embedded platforms, you pretty much get to start out with a libc
implementation (that may not be complete). On others, you get a platform definition file and have to do things the hard way --- we're not going that far!
libc_malloc example
This is in the `code/04_mem/libc_malloc` folder.
fn allocate_memory_with_libc() {
    unsafe {
        // Allocate memory with libc (one 32-bit integer)
        let my_num: *mut i32 = libc::malloc(std::mem::size_of::<i32>() as libc::size_t) as *mut i32;
        if my_num.is_null() {
            panic!("failed to allocate memory");
        }

        // Set the allocated variable - dereference the pointer and set to 42
        *my_num = 42;
        assert_eq!(42, *my_num);

        // Free the memory with libc - this is NOT automatic
        libc::free(my_num as *mut libc::c_void);
    }
}

fn main() {
    allocate_memory_with_libc();
}
So if you find yourself having to use `libc`, this is what you can expect: it looks a LOT like C! In your `unsafe` block, you call `malloc`, check that it gave you the memory you requested, set the value of the memory, and finally free it.
If you forget to call `free`, then---just like a C program---you've leaked memory.
Using Rust's Allocator
Using `malloc` isn't always as simple as it sounds: you need to worry about memory alignment (lining up memory blocks with your platform's "word size"). Rust provides an allocator API that you can use instead. It's similar, and still `unsafe`:
#![allow(unused)]
fn main() {
    fn allocate_memory_with_rust() {
        use std::alloc::{alloc, dealloc, Layout};
        unsafe {
            // Allocate memory with Rust. It's safer to force alignment.
            let layout = Layout::new::<u16>();
            let ptr = alloc(layout);

            // Set the allocated variable - dereference the pointer and set to 42
            *ptr = 42;
            assert_eq!(42, *ptr);

            // Free the memory - this is not automatic
            dealloc(ptr, layout);
        }
    }
}
You still have pretty much everything you expect from C: pointer arithmetic, `null` pointers, and the risk of forgetting to call `dealloc` and leaking memory. At this level, it's quite ugly.
RAII - Resource Acquisition is Initialization
This pattern can be combined with resources of any kind: memory, files, network sockets, etc. Wrap the resource in a type, and implement `Drop` to release it. C++ invented this paradigm, and it was an immediate improvement over C:

- No more `goto` cleanup blocks.
- No more forgetting to clean up resources.
This is why you haven't had to deal with memory or resource management so far: the RAII pattern is built into Rust, and `File`, `Mutex`, `Box`, `String` (etc.) all implement `Drop` in some way to ensure that you don't leak any memory or resources.
This example code is in `code/04_mem/smart_ptr`.
So let's take the memory allocation example and turn it into a "smart pointer"---a pointer that will clean up after itself.
use std::alloc::{alloc, dealloc, Layout};

struct SmartPointer<T> {
    ptr: *mut u8,
    data: *mut T,
    layout: Layout,
}

impl<T> SmartPointer<T> {
    fn new() -> SmartPointer<T> {
        println!("Allocating memory for SmartPointer");
        unsafe {
            let layout = Layout::new::<T>();
            let ptr = alloc(layout);
            SmartPointer {
                ptr,
                data: ptr as *mut T,
                layout,
            }
        }
    }

    fn set(&mut self, val: T) {
        unsafe {
            *self.data = val;
        }
    }

    fn get(&self) -> &T {
        unsafe { self.data.as_ref().unwrap() }
    }
}

impl<T> Drop for SmartPointer<T> {
    fn drop(&mut self) {
        println!("Deallocating memory from SmartPointer");
        unsafe {
            dealloc(self.ptr, self.layout);
        }
    }
}

fn main() {
    let mut my_num = SmartPointer::<i32>::new();
    my_num.set(12);
    println!("my_num = {}", my_num.get());
}
Box - Unique Pointer
C++ has the wonderful `unique_ptr` type: it wraps heap-allocated contents, which are automatically deleted when the pointer leaves scope. Rust has a type called `Box` that does the same thing.
struct MyStruct {
    n: i32,
}

fn main() {
    let boxed = Box::new(MyStruct { n: 12 });
}
The Rust `Box` type includes a huge number of options. These range from pinning memory in place (so it can't be moved) to `Box::from_raw`, which wraps an existing raw pointer in a `Box`.
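One classic reason to reach for `Box`: recursive types. Without indirection, the compiler can't compute a finite size for the type. A sketch (this `List` type is invented for illustration):

```rust
// Each node owns the next node through a Box (heap indirection),
// so the compiler can size List as one i32 plus one pointer.
enum List {
    Node(i32, Box<List>),
    End,
}

fn sum(list: &List) -> i32 {
    match list {
        List::Node(value, next) => value + sum(next),
        List::End => 0,
    }
}

fn main() {
    let list = List::Node(1, Box::new(List::Node(2, Box::new(List::End))));
    println!("{}", sum(&list)); // 3
}
```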
Rc and Arc - Shared Pointer
Sometimes, ownership becomes confusing. Particularly if you are sending data off to be processed in more than one thread, you can end up with shared ownership---and exactly when something should be dropped from memory becomes confusing.
Rust has `Rc` (for "reference counted") as a wrapper type for this. (There's also `Arc`---atomic reference counted---for multi-threaded situations.)
You can turn any variable into a reference-counted variable (on the heap) by wrapping it in `Rc`:
This is in `projects/part2/refcount`.
use std::rc::Rc;

struct MyStruct {}

impl Drop for MyStruct {
    fn drop(&mut self) {
        println!("Dropping");
    }
}

fn move_it(n: Rc<MyStruct>) {
    println!("Moved");
}

fn ref_it(n: &MyStruct) {
    // Do something
}

fn main() {
    let shared = Rc::new(MyStruct {});
    move_it(shared.clone());
    ref_it(&shared);
}
So we take a reference, move a clone (the `Rc` type is designed to have `clone()` called whenever you want a new shared pointer to the original)---and the data is only dropped once. It is shared between all the functions. You can use this to spread data widely between functions.
You can't mutate the contents of an `Rc` without some additional help.
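The usual "additional help" is interior mutability. A minimal sketch using `RefCell`, which moves the borrow check from compile time to run time:

```rust
use std::cell::RefCell;
use std::rc::Rc;

fn main() {
    let shared = Rc::new(RefCell::new(0));
    let also_shared = shared.clone();

    // borrow_mut() panics if another borrow is active - the borrowing
    // rules still apply, they are just checked at runtime
    *also_shared.borrow_mut() += 1;

    println!("{}", shared.borrow()); // 1
}
```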
Arc is the same thing, but it replaces the reference counter with an atomic---a guaranteed synchronized (and still very fast) thread-safe counter.
use std::sync::Arc;

struct MyStruct {}

impl Drop for MyStruct {
    fn drop(&mut self) {
        println!("Dropping");
    }
}

fn move_it(n: Arc<MyStruct>) {
    println!("Moved");
}

fn ref_it(n: &MyStruct) {
    // Do something
}

fn main() {
    let shared = Arc::new(MyStruct {});
    move_it(shared.clone());
    ref_it(&shared);
}
The Borrow Checker
The borrow checker gets a bad name from people who run into it and discover "I can't do anything!". The borrow checker does take a bit of getting used to - but in the medium term it really helps.
I went through a cycle going from C++ to Rust, and many people I've talked to went through the same:
- First week or two: I hate the borrow checker! This is awful! I can't do anything!
- Next: I see how to work within what it wants, I can live with this
- Then: Wow, I'm writing Rust-like C++ and Go now - and my code is failing less frequently.
The good news is that if you are familiar with Modern C++, you've run into a lot of the same issues that the borrow checker helps with. Let's work through some examples that show how life with Rust is different.
Immutable by Default
This one trips a few people up when they start with Rust. This won't compile:
fn main() {
    let i = 5;
    i += 1;
}
Variables are immutable by default. In C++ terms, you just tried to write:
int main() {
    const int i = 5;
    i += 1;
    return 0;
}
You can make `i` mutable and it works as you'd expect:
fn main() {
    let mut i = 5;
    i += 1;
}
In other words: C++ and Rust have exactly opposite defaults. In C++, everything is mutable unless you `const` it. In Rust, everything is immutable unless you `mut` it.
You could simply declare everything to be mutable. The linter will regularly remind you that things can be immutable. It's considered good Rust style to minimize mutability, so you aren't surprised by mutations.
Move by Default
Quick show of hands. Who knows what `std::move` does? Who really likes `std::move`?
This one surprises everyone. The following code does what you'd expect:
fn do_it(a: i32) {
    // Do something
}

fn main() {
    let a = 42;
    do_it(a);
    println!("{a}");
}
So why doesn't this work?
fn do_it(a: String) {
    // Do something
}

fn main() {
    let a = String::from("Hello");
    do_it(a);
    println!("{a}");
}
So why did this work with `i32`? `i32` is a primitive, and implements a trait named `Copy`. `Copy` is reserved for types that can be duplicated with a trivial bitwise copy---in practice, small values that fit in (or near) a register, where copying is faster than passing a pointer to their value. This is the same as C++ copying primitive types. When you work with a complex type (`String` and C++'s `std::string` are very similar: a size plus a heap-allocated buffer of characters, UTF-8 in Rust's case), copying would be expensive---so Rust moves instead.
The error message, `borrow of moved value`, comes with a long explanation and isn't as helpful as you might like.
The key is: Rust is move by default, and Rust is more strict about moving than C++. Here's what you wrote in C++ terms:
#include <string>

void do_it(std::string s) {
    // Do something
}

int main() {
    std::string s = "Hello";
    do_it(std::move(s));
    // s is now in a valid but unspecified state
    return 0;
}
What happens if you use `s` afterwards? Its state is unspecified---relying on it is a bug waiting to happen. `std::move` in C++ converts an object to an xvalue---a type that has "been moved out of", and may or may not be in a useful state. Rust takes this to its logical conclusion, and prevents any access to a "moved out of" value.
Moving Values Around
If you want to, you can move variables in and out of functions:
fn do_it(a: String) -> String {
    // Do something
    a
}

fn main() {
    let a = String::from("Hello");
    let a = do_it(a);
    println!("{a}");
}
This code is valid. Moving generates a `memcpy`, which is usually removed by compiler optimizations, and LLVM applies the same return-value optimizations as C++ when returning from a function.
Usually, I recommend moving out of a variable if you are genuinely done with it. Conceptually, you are giving ownership of the object to another function - it's not yours anymore, so you can't do much with it.
This is conceptually very similar to using `unique_ptr` in C++. The smart pointer owns the contained data. You can move it between functions, but you can't copy it.
Destructors and Moving
In C++, you can have move constructors---and moving structures around can require some thought as move constructors fire. Rust simplifies this. Moving a structure does not fire any sort of constructor. We haven't talked about destructors yet, so let's do that.
In Rust, destructors are implemented by a trait named `Drop`. You can add `Drop` to your own types. Let's use this to illustrate the lifetime of a type as we move it around:
The code is in `projects/part2/destructors`.
struct MyStruct {
    s: String,
}

impl Drop for MyStruct {
    fn drop(&mut self) {
        println!("Dropping: {}", self.s);
    }
}

fn do_it(a: MyStruct) {
    println!("do_it called");
}

fn move_it(a: MyStruct) -> MyStruct {
    println!("move_it called");
    a
}

fn main() {
    let a = MyStruct { s: "1".to_string() };
    do_it(a);
    // a no longer exists

    let b = MyStruct { s: "2".to_string() };
    let b = move_it(b);
    println!("{}", b.s);
}
As you can see, `Drop` is called when the structure ceases to be in scope:

- `do_it` receives ownership of the object; the destructor fires as soon as the function exits.
- `move_it` passes the object back, so it remains in scope; the destructor fires when `main` ends.
RAII is central to Rust's safety model. It's used everywhere. I try to remember to credit C++ with its invention every time I mention it!
Borrowing (aka References)
So with that in mind, what if you don't want to move your data around a lot (and pray that the optimizer removes as many `memcpy` calls as possible)? This introduces borrowing. Here's a very simple function that takes a borrowed parameter:
fn do_it(s: &String) {
    println!("{s}");
}

fn main() {
    let s = "42".to_string();
    do_it(&s);
}
Predictably, this prints `42`. The semantics are similar to C++: you indicate a borrow/reference with `&`. Unlike C++, you have to indicate that you are passing a reference at both the call site and the function signature---there's no ambiguity (which helps avoid accidentally passing by value/copying). This is the same as the following C++:
#include <string>
#include <iostream>

void do_it(const std::string &s) {
    std::cout << s << std::endl;
}

int main() {
    std::string s = "42";
    do_it(s);
    return 0;
}
Once again, notice that the reference is implicitly immutable.
If you want a mutable borrow---permitted to change the borrowed value---you have to indicate so.
fn do_it(s: &mut String) {
    s.push_str("1");
}

fn main() {
    let mut s = String::from("42");
    do_it(&mut s);
    println!("{s}");
}
Notice that you are:

- Making `s` mutable in the `let mut` declaration. You can't mutably lend an immutable variable.
- Explicitly decorating the lend as `&mut` at the call site.
- Explicitly borrowing as mutable in the parameters (`s: &mut String`).
Rust doesn't leave any room for ambiguity here. You have to mean it when you allow mutation!
Why Mutability Matters
The borrow checker enforces a very strict rule: at any moment, a variable may have either one mutable borrow or any number of immutable borrows---never both at once. There's only ever one current effective owner who can change the variable. This can take a little getting used to.
So this is invalid code:
fn main() {
    let mut i: i32 = 1;
    let ref_i = &mut i;
    let second_ref_i = &mut i;
    println!("{i}");
    println!("{ref_i}");
    println!("{second_ref_i}");
}
The print statements are included to prevent the optimizer from realizing that variables are unused and silently removing them.
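For contrast, any number of immutable borrows is fine---this compiles:

```rust
fn main() {
    let i: i32 = 1;
    // Several shared (immutable) borrows can coexist
    let ref_i = &i;
    let second_ref_i = &i;
    println!("{i}");
    println!("{ref_i}");
    println!("{second_ref_i}");
}
```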
Here's some code that triggers borrow-checker rage:
fn main() {
    let mut data = vec![1, 2, 3, 4, 5];
    for (idx, value) in data.iter().enumerate() {
        if *value > 3 {
            data[idx] = 3;
        }
    }
    println!("{data:?}");
}
Look at the error message:
error[E0502]: cannot borrow `data` as mutable because it is also borrowed as immutable
--> src/main.rs:5:13
|
3 | for (idx, value) in data.iter().enumerate() {
| -----------------------
| |
| immutable borrow occurs here
| immutable borrow later used here
4 | if *value > 3 {
5 | data[idx] = 3;
| ^^^^ mutable borrow occurs here
Using an iterator (with `.iter()`) immutably borrows each record in the vector in turn. But when we index into `data[idx]` to change a value, we're mutably borrowing. Since you can't mix a mutable borrow with other borrows, this is invalid.
You have to be careful to limit access. You could rewrite this code a few ways. The most Rustacean way is probably:
This is a good thing. Changing an underlying structure while you iterate it risks iterator invalidation.
Option 1: The Rustacean Iterators Way
fn main() {
    let mut data = vec![1, 2, 3, 4, 5];
    data.iter_mut().filter(|d| **d > 3).for_each(|d| *d = 3);
    println!("{data:?}");
}
This is similar to how you'd do it with ranges or the C++20 `ranges` feature. You are pipelining:

- You obtain a mutable iterator (it will pass an `&mut` reference to each entry in turn).
- You filter the target records with a predicate. `|d| **d > 3` is a closure (lambda function): `d` is the parameter, which arrives as a reference to a reference because the iterator yields an `&mut` reference and the filter then passes a reference to it. (Good news: the compiler cleans that up. I still think it's ugly!)
- Then you run `for_each` on the remaining entries.
That's great for problems that naturally fit into an iterator solution.
Option 2: Do the two-step
Another option is to separate the operations:
fn main() {
    let mut data = vec![1, 2, 3, 4, 5];
    let mut to_fix = Vec::new();
    for (idx, value) in data.iter().enumerate() {
        if *value > 3 {
            to_fix.push(idx);
        }
    }
    for idx in to_fix {
        // Note: no .iter(). We're *moving* out of to_fix, consuming it.
        data[idx] = 3;
    }
    println!("{data:?}");
}
This is pretty typical: you "beat" the borrow checker by breaking your task down into specific stages. In this case, we avoided a potential iterator invalidation. We also made it a lot easier for the compiler to perform static analysis and prevent data races.
Dangling Pointers
The borrow checker prevents a lot of dangling pointer and reference errors. For example:
fn main() {
    let s = String::from("Hello");
    let s_ref = &s;
    std::mem::drop(s);
    println!("{s_ref}");
}
Dropping `s` terminates its existence (it's similar to `delete`: destructors are still called). Trying to print `s` after it is dropped is a compiler error: `s` no longer exists. Try the same in C++ and you don't get any warning by default (though most static analysis tools will catch this):
#include <iostream>
int main() {
std::string * s = new std::string("Hello");
delete s;
std::cout << *s << std::endl;
}
Summary
The borrow checker does take some getting used to, but it's surprising how long you can go without running into it if you write idiomatic, straightforward code. It's especially hard coming from C++, which allows you to get away with a lot.
In this section, we've covered:
- Move by default, and Rust curing all "use after move" errors.
- Explicit borrowing, and no more "oops, I copied by value by mistake".
- Explicit mutability, to avoid surprises.
- The "one mutable access at a time" rule, which prevents hidden bugs like iterator invalidation.
- No more dangling pointers/references --- but still no garbage collector.
Now let's look at the second half of the borrow checker, lifetimes.
Lifetimes
The borrow checker not only tracks borrows, it attaches a lifetime to every borrow.
In very early versions of Rust, you had to annotate every reference with a lifetime. Be glad you don't have to do this anymore! Code could look like this:
fn do_it<'a>(s: &'a String) {
    println!("{s}");
}

fn main() {
    let s = String::from("Hello");
    do_it(&s);
}
This is still valid Rust, but in most cases Rust is able to deduce an "anonymous lifetime" for reference usage. Let's look at the annotated code:

- `do_it<'a>` introduces a new lifetime, named `'a`. You can name lifetimes whatever you want, but it's common to use short names.
- In the arguments, `s: &'a String` states that the borrowed `String` adheres to lifetime `'a`.
What's really happening here? Rust is tracking that when you call `do_it`, a lifetime is created. The object being pointed at must live at least as long as the borrow. Violating that is a compiler error.
Escaping References
Returning a reference to a local variable is a really common idiom in Go. The Go compiler will detect that you're referencing a local variable (via escape analysis), hoist it to the heap without telling you, and let you have your reference.
This compiles in C++:
#include <iostream>
using namespace std;
int& bar()
{
int n = 10;
return n;
}
int main() {
int& i = bar();
cout << i << endl;
return 0;
}
The code does generate a warning, but it actually functioned on 2 of the 3 systems I tried it on! Rust is not so forgiving:
fn do_it() -> &String {
    let s = String::from("Hello");
    &s
}

fn main() {
    let s = do_it();
}
Rust starts by telling you that you need a lifetime specifier, and suggests a special lifetime called `'static`. `'static` is a special lifetime in which you promise that a reference will live forever, so Rust need not worry about it. So let's try that:
fn do_it() -> &'static String {
    let s = String::from("Hello");
    &s
}

fn main() {
    let s = do_it();
}
It still doesn't compile, this time with the correct error: `cannot return a reference to local variable`.
The borrow checker prevents this problem.
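The usual fix is to hand ownership to the caller instead of borrowing a local. A minimal sketch:

```rust
// Instead of returning a reference to a local, return the String by value.
// Ownership moves to the caller, so no lifetime annotation is needed.
fn do_it() -> String {
    String::from("Hello")
}

fn main() {
    let s = do_it(); // `s` now owns the String
    println!("{s}");
}
```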
Returning References
What if you actually do want to return a valid reference? This function won't compile without lifetime specifiers.
fn largest<'a>(a: &'a i32, b: &'a i32) -> &'a i32 {
    if a > b { a } else { b }
}

fn main() {
    let a = 1;
    let b = 2;
    let ref_to_biggest = largest(&a, &b);
    println!("{ref_to_biggest}");
}
You have to clarify to Rust that the function can assume that both references will share a lifetime with the function output. So now for the returned reference to remain valid, both inputs also have to remain valid. (In this example, we're using a type that would be better off being copied anyway!)
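Since `i32` is a `Copy` type, a sketch of the simpler by-value version looks like this:

```rust
// For Copy types, passing and returning by value avoids lifetimes entirely.
fn largest(a: i32, b: i32) -> i32 {
    if a > b { a } else { b }
}

fn main() {
    let biggest = largest(1, 2);
    println!("{biggest}"); // prints 2
}
```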
Keeping References
Life starts to get complicated when you want to keep references around. Rust has to validate the lifetimes of each of these references.
struct Index {
    selected_string: &String,
}

fn main() {
    let strings = vec![
        String::from("A"),
        String::from("B"),
    ];
    let index = Index { selected_string: &strings[1] };
    println!("{}", index.selected_string);
}
This fails to compile, but the compiler error tells you what needs to be done. So we apply its suggestions:
struct Index<'a> {
    selected_string: &'a String,
}

fn main() {
    let strings = vec![
        String::from("A"),
        String::from("B"),
    ];
    let index = Index { selected_string: &strings[1] };
    println!("{}", index.selected_string);
}
And that works! You've tied the structure to the lifetime of the references it holds. If the strings table goes away, then the Index
is invalid. Rust won't let this compile:
struct Index<'a> {
    selected_string: &'a String,
}

fn main() {
    let index = {
        let strings = vec![
            String::from("A"),
            String::from("B"),
        ];
        let index = Index { selected_string: &strings[1] };
        index
    };
    println!("{}", index.selected_string);
}
The error message helpfully explains that `strings does not live long enough`, which is true. This is the primary purpose of the borrow checker: dangling references become a compile-time error, rather than a long head-scratching session at runtime.
Concurrency
Rust makes a big deal about advertising "fearless concurrency". What does this actually mean?
- Concurrency primitives that aren't too painful to work with.
- Data races will not compile.
Data-Race Protection
Rust makes the bold claim that it offers "fearless concurrency" and no more data races (within a program; it can't do much about remote calls). I've found the claim to be true so far: I'm much more likely to contemplate writing multi-threaded (and async) code in Rust now that I understand how it prevents me from shooting myself in the foot.
An Example of a Data Race
Here's a little modern C++ program with a very obvious data-racing problem (it's in the cpp/data_race
directory):
#include <thread>
#include <iostream>
int main() {
int counter = 0;
std::thread t1([&counter]() {
for (int i = 0; i < 1000000; ++i) {
++counter;
}
});
std::thread t2([&counter]() {
for (int i = 0; i < 1000000; ++i) {
++counter;
}
});
t1.join();
t2.join();
std::cout << counter << std::endl;
return 0;
}
The program compiled and ran without any warnings (although additional static analysis programs would probably flag this).
The program fires up two threads. Each loops, incrementing a counter. It joins the threads, and prints the result. The predictable result is that every time I run it, I get a different result: 1015717, 1028094, 1062030 from my runs.
This happens because incrementing an integer isn't a single-step operation:
- The CPU loads the current counter value into a register.
- The CPU increments the value in the register.
- The CPU writes the register back into memory.

There's no guarantee that one thread won't perform these steps while the other thread is midway through the same operation. The result is data corruption.
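Written out in code, those three steps look roughly like this (a sketch of the single-threaded case, not what the compiler literally emits):

```rust
fn main() {
    let mut counter = 0;
    // 1. Load the current value (into a "register").
    let tmp = counter;
    // 2. Increment the loaded value.
    let tmp = tmp + 1;
    // 3. Write the result back to memory.
    counter = tmp;
    println!("{counter}"); // prints 1
    // A second thread interleaving between steps 1 and 3 would
    // overwrite our update - that's the data race.
}
```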
Let's try the same thing in Rust. We'll use "scoped threads" (we'll be covering threading in a later session) to make life easier for ourselves. Don't worry about the semantics yet:
fn main() {
    let mut counter = 0;
    std::thread::scope(|scope| {
        let t1 = scope.spawn(|| {
            for _ in 0..1000000 {
                counter += 1;
            }
        });
        let t2 = scope.spawn(|| {
            for _ in 0..1000000 {
                counter += 1;
            }
        });
        let _ = t1.join();
        let _ = t2.join(); // let _ means "ignore" - we're ignoring the result type
    });
    println!("{counter}");
}
And now you see the beauty behind the "one mutable access at a time" rule: the borrow checker prevents the program from compiling, because the threads are mutably borrowing the shared variable. No data race here!
Atomics
If you've used `std::thread`, you've probably also run into atomic types. An atomic operation is guaranteed to complete as a single, indivisible CPU operation, optionally synchronized between cores. The following C++ program makes use of a `std::atomic_int` to always give the correct result:
#include <thread>
#include <iostream>
#include <atomic>
int main() {
std::atomic_int counter = 0;
std::thread t1([&counter]() {
for (int i = 0; i < 1000000; ++i) {
++counter;
}
});
std::thread t2([&counter]() {
for (int i = 0; i < 1000000; ++i) {
++counter;
}
});
t1.join();
t2.join();
std::cout << counter << std::endl;
return 0;
}
Rust gives you a similar option:
This code is in `projects/part2/atomics`.

use std::sync::atomic::Ordering::Relaxed;
use std::sync::atomic::AtomicU32;

fn main() {
    let counter = AtomicU32::new(0);
    std::thread::scope(|scope| {
        let t1 = scope.spawn(|| {
            for _ in 0..1000000 {
                counter.fetch_add(1, Relaxed);
            }
        });
        let t2 = scope.spawn(|| {
            for _ in 0..1000000 {
                counter.fetch_add(1, Relaxed);
            }
        });
        let _ = t1.join();
        let _ = t2.join(); // let _ means "ignore" - we're ignoring the result type
    });
    println!("{}", counter.load(Relaxed));
}
So Rust and C++ are equivalent in functionality. Rust is a bit more pedantic, making you specify the memory ordering (the orderings are taken from the C++ standard!). Rust's benefit is that the unsafe version generates a compiler error; otherwise the two are very similar.
Why Does This Work?
So how does Rust know that it isn't safe to share a plain integer between threads for mutation, but it is safe to share an atomic? Rust has two marker traits that are implemented automatically (and can be overridden in unsafe code): `Sync` and `Send`.

- A `Sync` type can be safely accessed from multiple threads through shared references, typically because it is immutable or contains a synchronization primitive.
- A `Send` type can be safely transferred between threads; it isn't going to do bizarre things because it is being accessed from multiple places.

A plain integer is actually both `Send` and `Sync`, but mutating it requires an exclusive (`&mut`) borrow, and the borrow checker won't let two threads hold one. An atomic integer goes further: it can be safely mutated through a shared reference.
Rust provides atomics for all of the primitive types, but does not provide a general atomic wrapper for other types. Rust's atomic primitives map nearly 1:1 onto CPU intrinsics, which don't generally offer atomic protection for complicated types.
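You can check these traits at compile time with a trivial generic helper (a sketch; `assert_send_sync` is a made-up name, not a standard library function):

```rust
use std::sync::atomic::AtomicU32;

// Compiles only when T implements both Send and Sync.
fn assert_send_sync<T: Send + Sync>() {}

fn main() {
    assert_send_sync::<AtomicU32>(); // fine: atomics are Send + Sync
    // assert_send_sync::<std::rc::Rc<u32>>(); // would not compile: Rc is neither
    println!("ok");
}
```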
Mutexes
If you want to provide similar thread-safety for complex types, you need a Mutex. Again, this is a familiar concept to C++ users.
Using a Mutex in C++ works like this:
#include <iostream>
#include <thread>
#include <mutex>
int main() {
std::mutex mutex;
int counter = 0;
std::thread t1([&counter, &mutex]() {
for (int i = 0; i < 1000000; ++i) {
std::lock_guard<std::mutex> guard(mutex);
++counter;
}
});
std::thread t2([&counter, &mutex]() {
for (int i = 0; i < 1000000; ++i) {
std::lock_guard<std::mutex> guard(mutex);
++counter;
}
});
t1.join();
t2.join();
std::cout << counter << std::endl;
return 0;
}
Notice how using the Mutex is a two-step process:

- You declare the mutex as a separate variable from the data you are protecting.
- You create a `lock_guard` by initializing the lock with `lock_guard`'s constructor, taking the mutex as a parameter.
- The lock is automatically released when the guard leaves scope, using RAII.
This works, and always gives the correct result. It has one inconvenience that can lead to bugs: nothing forces you to remember to take the lock. You can work around this by building your own type that encloses the update, but the compiler won't help you if you forget. For example, commenting out one of the `lock_guard` lines produces no compiler warnings at all.
Let's build the same thing, in Rust. The Rust version is a bit more complicated:
This code is in `projects/part2/mutex`.

use std::sync::{Arc, Mutex};

fn main() {
    let counter = Arc::new(Mutex::new(0));
    std::thread::scope(|scope| {
        let my_counter = counter.clone();
        let t1 = scope.spawn(move || {
            for _ in 0..1000000 {
                let mut lock = my_counter.lock().unwrap();
                *lock += 1;
            }
        });
        let my_counter = counter.clone();
        let t2 = scope.spawn(move || {
            for _ in 0..1000000 {
                let mut lock = my_counter.lock().unwrap();
                *lock += 1;
            }
        });
        let _ = t1.join();
        let _ = t2.join(); // let _ means "ignore" - we're ignoring the result type
    });
    let lock = counter.lock().unwrap();
    println!("{}", *lock);
}
Let's work through what's going on here:

- `let counter = Arc::new(Mutex::new(0));` is a little convoluted.
  - Mutexes in Rust wrap the data they are protecting, rather than being a separate entity. This makes it impossible to forget to lock the data: you don't have access to the interior without obtaining a lock.
  - `Mutex` only provides the `Sync` trait: it can be safely accessed from multiple locations, but it doesn't provide any safety for sending the data between threads.
  - To gain the `Send` trait, we also wrap the whole thing in an `Arc`. `Arc` is "atomic reference count": it's just like an `Rc`, but uses an atomic for the reference counter. Using an `Arc` ensures that there's only a single counter, with safe access to it from the outside.
  - Note that `counter` isn't mutable, despite the fact that it is mutated. This is called interior mutability. The exterior doesn't change, so it doesn't have to be mutable. The interior can be changed via the `Arc` and the `Mutex`, which are protected by the `Sync`+`Send` requirements.
- Before each thread is created, we call `let my_counter = counter.clone();`. We're making a clone of the `Arc`, which increments the reference count and returns a shared pointer to the enclosed data. `Arc` is designed to be cloned every time you want another reference to it.
- When we start the thread, we use the `let t1 = scope.spawn(move || {` pattern. Notice the `move`. We're telling the closure not to capture references, but instead to move captured variables into the closure. We've made our own clone of the `Arc`, and it's the only variable we are referencing, so it is moved into the thread's scope. This ensures that the borrow checker doesn't have to worry about trying to track access to the same reference across threads (which won't work). `Sync`+`Send` protections remain in place, and it's impossible to use the underlying data without locking the mutex, so all of the protections hold.
- `let mut lock = my_counter.lock().unwrap();` locks the mutex. It returns a `Result`, so we're unwrapping it (we'll talk about why later). The lock itself is mutable, because we'll be changing its contents.
- We access the interior variable by dereferencing the lock: `*lock += 1;`
So C++ wins slightly on ergonomics, and Rust wins on preventing you from making mistakes!
Summary
Rust's data race protection is very thorough. The borrow-checker prevents multiple mutable accesses to a variable, and the Sync+Send
system ensures that variables that are accessed in a threaded context can both be sent between threads and safely mutated from multiple locations. It's extremely hard to create a data race in safe Rust (you can use the unsafe
tag and turn off protections if you need to)---and if you succeed in making one, the Rust core team will file it as a bug.
All of these safety guarantees add up to create an environment in which common bugs are hard to create. You do have to jump through a few hoops, but once you are used to them---you can fearlessly write concurrent code knowing that Rust will make the majority of multi-threaded bugs a compilation error rather than a difficult debugging session.
Spawning Threads
In `main.rs`, replace the contents with the following:
fn hello_thread() {
    println!("Hello from thread!");
}

fn main() {
    println!("Hello from main thread!");
    let thread_handle = std::thread::spawn(hello_thread);
    thread_handle.join().unwrap();
}
Now run the program:
Hello from main thread!
Hello from thread!
So what's going on here? Let's break it down:
- The program starts in the main thread.
- The main thread prints a message.
- We create a thread using `std::thread::spawn` and tell it to run the function `hello_thread`.
- The return value is a "thread handle". You can use these to "join" threads: wait for them to finish.
- We call `join` on the thread handle, which waits for the thread to finish.
What happens if we don't join the thread?
Run the program a few times. Sometimes the secondary thread finishes, sometimes it doesn't. Threads don't outlive the main program, so if the main program exits before the thread finishes, the thread is killed.
Spawning Threads with Parameters
The `spawn` function takes a function without parameters. What if we want to pass parameters to the thread? We can use a closure:
fn hello_thread(n: u32) {
    println!("Hello from thread {n}!");
}

fn main() {
    let mut thread_handles = Vec::new();
    for i in 0..5 {
        let thread_handle = std::thread::spawn(move || hello_thread(i));
        thread_handles.push(thread_handle);
    }
    thread_handles.into_iter().for_each(|h| h.join().unwrap());
}
Notice three things:
- We're using a closure: an inline function that can capture variables from the surrounding scope.
- We've used the shorthand format for closures: `|| code`. Parameters live between the `||` (there aren't any here), and a single statement goes after them. You can write more complex closures with a block: `|x, y| { code block }`.
- The closure says `move`. Remember when we talked about ownership? You have to move variables into the closure, so the closure gains ownership of them. The ownership is then passed to the thread. Otherwise, you'd have to use some form of synchronization to ensure that data is independently accessed, to avoid race conditions.
The output will look something like this (the order of the threads will vary):
Hello from thread 0!
Hello from thread 2!
Hello from thread 1!
Hello from thread 4!
Hello from thread 3!
In this case, as we talked about last week in Rust Fundamentals, integers are copyable. So you don't have to do anything fancy to share them.
Returning Data from Threads
The thread handle will return any value returned by the thread. It's generic, so it can be of any type (that supports sync+send; we'll cover that later). Each thread has its own stack, and can make normal variables inside the thread---and they won't be affected by other threads.
Let's build an example:
fn do_math(i: u32) -> u32 {
    let mut n = i + 1;
    for _ in 0..10 {
        n *= 2;
    }
    n
}

fn main() {
    let mut thread_handles = Vec::new();
    for i in 0..10 {
        thread_handles.push(std::thread::spawn(move || do_math(i)));
    }
    for handle in thread_handles {
        println!("Thread returned: {}", handle.join().unwrap());
    }
}
This returns:
Thread returned: 1024
Thread returned: 2048
Thread returned: 3072
Thread returned: 4096
Thread returned: 5120
Thread returned: 6144
Thread returned: 7168
Thread returned: 8192
Thread returned: 9216
Thread returned: 10240
Notice that each thread is doing its own math, and returning its own value. The `join` function waits for the thread to finish, and returns the value from the thread.
Dividing Workloads
We can use threads to divide up a workload. Let's say we have a vector of numbers, and we want to add them all up. We can divide the vector into chunks, and have each thread add up its own chunk. Then we can add up the results from each thread.
fn main() {
    const N_THREADS: usize = 8;

    // Shorthand for building a vector [0, 1, 2 .. 4999]
    let to_add: Vec<u32> = (0..5000).collect();
    let mut thread_handles = Vec::new();

    // chunks() takes a chunk *size*, so divide the length by the number
    // of threads to get roughly one chunk per thread.
    let chunks = to_add.chunks(to_add.len() / N_THREADS);

    // Notice that each chunk is a *slice* - a reference - to part of the array.
    for chunk in chunks {
        // So we *copy* the chunk into its own vector, taking ownership and
        // passing that ownership to the thread. This adds a `memcpy` call
        // to your code, but avoids ownership issues.
        let my_chunk = chunk.to_owned();

        // Each thread sums its own chunk. You could use .sum() for this!
        thread_handles.push(std::thread::spawn(move || {
            let mut sum = 0;
            for i in my_chunk {
                sum += i;
            }
            sum
        }));
    }

    // Sum the sums from each thread.
    let mut sum = 0;
    for handle in thread_handles {
        sum += handle.join().unwrap();
    }
    println!("Sum is {sum}");
}
There's a lot to unpack here, so I've added comments:
- We use a constant to define how many threads we want to use. This is a good idea, because it makes it easy to change the number of threads later. We'll use 8 threads, because my laptop happens to have 8 cores.
- We create a vector of numbers to add up. We use the `collect` function to build a vector from an iterator. We'll cover iterators later; for now, just know that `collect` is a handy shorthand for turning any range into a vector.
- We create a vector of thread handles. We'll use this to join the threads later.
- We use the `chunks` function to divide the vector into chunks. It takes the desired chunk *size* and returns an iterator, so we can use it in a `for` loop. The chunks are as close to equal as possible; the last chunk may be smaller than the others.
- Now we hit a problem: `to_add` is a vector owned by the main thread.
  - Each chunk is a slice, a borrowed reference, to part of the vector.
  - We can't pass a borrowed reference to a thread, because the thread might outlive the main thread. There's no guarantee that the order of execution will ensure that the data is destroyed in a safe order.
  - Instead, we use `to_owned`, which creates an owned copy of each chunk. This is a `memcpy` operation, so it's not free, but it's safe.
This is a common pattern when working with threads. You'll often need to move data into the thread, rather than passing references.
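Here's the pattern in miniature (the variable names are just for illustration):

```rust
fn main() {
    let message = String::from("owned by the thread now");
    let handle = std::thread::spawn(move || {
        // `message` was moved in; this thread owns it outright,
        // so no lifetime or synchronization concerns arise.
        message.len()
    });
    let len = handle.join().unwrap();
    println!("The thread's string was {len} bytes long");
}
```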
Moving chunks like this works fine, but if you are using threads to divide up a heavy workload with a single answer --- there's an easier way!
Scoped Threads
In the previous example we divided our workload into chunks and then took a copy of each chunk. That works, but it adds some overhead. Rust has a mechanism to assist with this pattern (it's a very common pattern): scoped threads.
Let's build an example:
use std::thread;

fn main() {
    const N_THREADS: usize = 8;
    let to_add: Vec<u32> = (0..5000).collect();
    // chunks() takes a chunk size, so this yields roughly N_THREADS chunks.
    let chunks = to_add.chunks(to_add.len() / N_THREADS);

    let sum = thread::scope(|s| {
        let mut thread_handles = Vec::new();

        for chunk in chunks {
            let thread_handle = s.spawn(move || {
                let mut sum = 0;
                for i in chunk {
                    sum += i;
                }
                sum
            });
            thread_handles.push(thread_handle);
        }

        thread_handles
            .into_iter()
            .map(|handle| handle.join().unwrap())
            .sum::<u32>()
    });
    println!("Sum is {sum}");
}
This is quite similar to the previous example, but we're using scoped threads. When you use `thread::scope` you are creating a thread scope. Any threads you spawn with the `s` parameter are guaranteed to end when the scope ends. You can still treat each scoped thread just like a regular thread.
Because the threads are guaranteed to terminate, you can safely borrow data from the parent scope. This is a lifetime issue: a normal thread could keep running long after the scope that launched it ends, so borrowing data from that scope would be a bug (and a common cause of crashes and data corruption in other languages). Rust won't let you do that. But with the lifetime guarantee, you can borrow data from the parent scope without having to worry about it.
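A minimal sketch of borrowing from the parent scope:

```rust
fn main() {
    let data = vec![1, 2, 3];
    let sum = std::thread::scope(|s| {
        // Borrowing `data` is safe: the thread must finish before the scope ends.
        let handle = s.spawn(|| data.iter().sum::<i32>());
        handle.join().unwrap()
    });
    println!("{sum}"); // prints 6
}
```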
This pattern is perfect for when you want to fan out a workload to a set of calculation threads, and wait to combine them into an answer.
Making it Easy with Rayon
A library named "Rayon" is the gold-standard for easy thread-based concurrency in Rust. It actually uses another crate (crossbeam
) under the hood, but it provides a much simpler interface for the most common use cases. Rayon can help you with a lot of tasks. Let's work through using it.
Parallel Iterators
Let's start by adding Rayon to the project:
cargo add rayon
Probably the nicest addition Rayon brings is `par_iter`. The majority of things you can do with an iterator, you can auto-parallelize with `par_iter`. For example:
use rayon::prelude::*;

fn main() {
    let numbers: Vec<u64> = (0..1_000_000).collect();
    let sum = numbers.par_iter().sum::<u64>();
    println!("{sum}");
}
Rayon creates a thread pool (one thread per CPU) with a job queue. The queue implements work-stealing (no idle threads), and supports "sub-tasks": a task can wait for another task to complete. It really is as simple as using `par_iter()` (for an iterator of references), `par_iter_mut()` (for an iterator of mutable references), or `into_par_iter()` (for an iterator that moves the values).
Let's do another test, this time with nested tasks. We'll use a really inefficient function for finding prime numbers:
use std::time::Instant;
use rayon::prelude::*;

fn is_prime(n: u32) -> bool {
    (2..=n / 2).into_par_iter().all(|i| n % i != 0)
}

fn main() {
    // Find the primes below 10,000
    let now = Instant::now();
    let numbers: Vec<u64> = (2..10_000).collect();
    let mut primes: Vec<&u64> = numbers.par_iter().filter(|&n| is_prime(*n as u32)).collect();
    primes.sort();
    let elapsed = now.elapsed(); // Measure *after* the primes are found
    println!("{primes:?}");
    println!("It took {} us to find {} primes", elapsed.as_micros(), primes.len());
}
Workspaces, Crates, Programs, Libraries and Modules
Let's talk about some terminology:
- A `crate` is a Rust package. It can be either a program or a library: a package of code managed by Cargo.
- A `program` is an executable. A `crate` produces a program if it has a `main.rs` file, and usually a `main` function (you can change the main function name, but it does need an entry point).
- A `library` is a crate with a `lib.rs` file. It compiles as a static library by default; you can override this if you need dynamic libraries (Rust is very much oriented towards self-contained, statically linked systems).
- A `module` is a unit of work for the compiler. Programs and libraries are divided into modules.
- A `workspace` is a Cargo helper that lets you include multiple crates in one environment, with a shared compilation target directory and better incremental compilation.
This is quite unlike C++'s system. `#include` is almost a cut-and-paste; the new C++20 modules system is a bit more similar, but I had trouble getting it to work consistently across platforms.
Workspaces
The example code uses a workspace, and I'd encourage you to do the same. Workspaces are a great mechanism for storing related code together.
Let's create a workspace.
- `cd` to your parent directory.
- Create a new Rust project with `cargo new my_workspace`.
- `cd` into `my_workspace`.
- Edit `src/main.rs` to change "Hello, World!" to something like "You probably intended to run a workspace member". This is optional, but helps avoid confusion.
- While in `my_workspace`, create a new project: `cargo new hello`.
- Edit `my_workspace/Cargo.toml`:
[workspace]
members = [ "hello" ]
Now change directory to `my_workspace/hello` and run the program with `cargo run`.
Take a look at `my_workspace` and you will see that a `target` directory has appeared. Within a workspace, all compiler artifacts are shared. For large projects, this can save a huge amount of disk space. It also saves re-downloading dependencies, and only recompiles the portions of the workspace that have changed.
While working on Hands-on Rust, I initially had 55 projects in separate crates without a workspace. I noticed that my book's `code` folder was using nearly 6 gigabytes of disk space, which was crazy. So I added a workspace, and that shrunk to a few hundred megabytes. Every single project had been downloading and building all of the dependencies separately.
Workspaces are safe to upload to GitHub or your preferred Git host. You can even access dependencies within a workspace remotely (we'll cover that in dependencies).
Libraries
Let's work through creating our first library. Keep the `my_workspace` and `hello` projects.
Change directory back to the workspace root (`my_workspace/`). Create a new library project:

cargo new hello_library --lib

Notice the `--lib` flag. You are creating a library.
Open `my_workspace/Cargo.toml` and add `hello_library` as a workspace member:
[workspace]
members = [ "hello", "hello_library" ]
Now open `hello_library/src/lib.rs`. Notice that Rust has auto-generated an example unit test. We'll cover that in unit tests shortly. For now, delete it all and replace it with the following code:
pub fn say_hello() {
    println!("Hello, world!");
}
The `pub` marks the function as "public": available from outside the current module. Since it is in `lib.rs`, it will be exported from the library.
Now open `hello/Cargo.toml` and we'll add a dependency:
[dependencies]
hello_library = { path = "../hello_library" }
And open `hello/src/main.rs` and we'll use the dependency. Replace the default code with:
use hello_library::say_hello;

fn main() {
    say_hello();
}
Congratulations! You've made your first statically linked library.
Modules and Access
Rust can subdivide code into modules, which can both be and contain `public` and `private` items (private being the default). Coming from C++, I found this a little confusing. You can also create modules in-place (as namespaces) or in separate files. Let's work through some examples.
Inline Module (Namespace)
Open `hello_library/src/lib.rs`. Let's add a private module:

mod private {
    fn hi() {
        println!("Say Hi!");
    }
}

pub fn say_hello() {
    println!("Hello, world!");
}
If you try to use `private::hi()` in your `hello/src/main.rs` program, it won't work. The module and the function are both private:

use hello_library::say_hello;

fn main() {
    say_hello();
    hello_library::private::hi(); // Will not compile
}
You can fix this by changing the module to be public:
pub mod private {
    fn hi() {
        println!("Say Hi!");
    }
}

pub fn say_hello() {
    println!("Hello, world!");
}
And it still doesn't work! That's because making a module public only exposes the public members of the module. So you also need to decorate the function as public:
pub mod private {
    pub fn hi() {
        println!("Say Hi!");
    }
}

pub fn say_hello() {
    println!("Hello, world!");
}
So that allows you to make a public namespace---and include private parts in the namespace that aren't exposed to the world. What if you want to write a function in a module, and expose it in a different namespace?
pub mod private {
    pub fn hi() {
        println!("Say Hi!");
    }
}

pub use private::hi;

pub fn say_hello() {
    println!("Hello, world!");
}
The `use` statement, which imports something into the current namespace, can also be decorated with `pub` to re-export that import. You can use this with dependencies or your own modules. (It's common to make a `prelude` module and import all of the most likely to be useful functions and types into it for re-export.) Now your program can refer to `hello_library::hi` directly.
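A sketch of that `prelude` pattern (the module and function names here are made up for illustration):

```rust
pub mod shapes {
    pub fn area(width: f64, height: f64) -> f64 {
        width * height
    }
}

// A prelude re-exports the most commonly needed items in one place.
pub mod prelude {
    pub use crate::shapes::area;
}

// A consumer of the library would write `use my_library::prelude::*;`.
use crate::prelude::area;

fn main() {
    println!("{}", area(2.0, 3.0)); // prints 6
}
```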
File-based modules
If you're working in a team, it's usually a good idea to not all be trying to edit the same file at once. There are other advantages to using multiple files:
- Rust can compile multiple files at the same time.
- Organizing your code with files makes it a lot easier to find things.
- You can use conditional compilation to include different files based on compilation constraints.
Let's make a one-file module. In `hello_library/src` create a new file named `goodbye.rs`. In that file, write:

```rust
pub fn bye() {
    println!("Goodbye");
}
```
Simply having the file doesn't make it part of your project. In `hello_library/src/lib.rs`, add a line to include the module:

```rust
mod goodbye;
```
The module is now private, even though the `bye` function is public! You will be able to access `bye` elsewhere in your library, but not from consumer applications. You can use the same mechanisms as for inline modules to change that: `pub mod` exports it as `hello_library::goodbye` (the filename is the namespace), or you can `pub use goodbye::bye`.
Directory modules
The final type of module places the module in a directory. The directory must contain a `mod.rs` file to act as the module root---and can include other files or inline modules as above.

Create a new directory, `hello_library/src/dirmod`. In that directory, create `mod.rs`:

```rust
pub fn dir_hello() {
    println!("Hello from dir module");
}
```
Now in `hello_library/src/lib.rs` include the new module:

```rust
pub mod dirmod;
```
You can now access the module in your `hello` project with `hello_library::dirmod::dir_hello()`.
Traits
You've used traits a lot---they are an important part of Rust. But we haven't really talked about them.
Implementing Traits
Whenever you've used #[derive(Debug, Clone, Serialize)]
and similar---you are using procedural macros to implement traits. We're not going to dig into procedural macros---they are worthy of their own class---but we will look at what they are doing.
Debug
is a trait. The derive macro is implementing the trait for you (including identifying all of the fields to output). You can implement it yourself:
```rust
use std::fmt;

struct Point {
    x: i32,
    y: i32,
}

impl fmt::Debug for Point {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("Point")
            .field("x", &self.x)
            .field("y", &self.y)
            .finish()
    }
}
```
Traits are an interface. Each trait defines functions that must be implemented to apply the trait to a type. Once you implement the trait, you can use the trait's functions on the type---and you can also use the trait as a type.
Making a Trait
The code for this is in `code/04_mem/make_trait`.
Let's create a very simple trait:
```rust
trait Animal {
    fn speak(&self);
}
```
This trait has one function: `speak`. It takes a reference to `self` (the type implementing the trait) and returns nothing.
Note: trait parameters are also part of the interface, so if a trait entry needs `&self`, all implementations of it will need `&self`.
Now we can make a cat:
```rust
struct Cat;

impl Animal for Cat {
    fn speak(&self) {
        println!("Meow");
    }
}
```
Now you can run `speak()` on any `Cat`:

```rust
fn main() {
    let cat = Cat;
    cat.speak();
}
```
You could go on and implement as many speaking animals as you like.
Traits as Function Parameters
You can also create functions that require that a parameter implement a trait:
```rust
fn speak_twice(animal: &impl Animal) {
    animal.speak();
    animal.speak();
}
```
You can call it with `speak_twice(&cat)`---and it runs the trait's function twice.
Traits as Return Types
You can also return a trait from a function:
```rust
fn get_animal() -> impl Animal {
    Cat
}
```
The fun part here is that you no longer know the concrete type of the return value---you only know for sure that it implements `Animal`. So you can call `speak` on it, but if `Cat` implements other traits or functions, you can't call them.
Traits that Require Other Traits
You can require that every `Animal` type also implement `Debug`:

```rust
use std::fmt::Debug;

trait Animal: Debug {
    fn speak(&self);
}
```
Now `Cat` won't compile until you derive (or implement) `Debug`.
You can keep piling on the requirements:
```rust
trait DebuggableClonableAnimal: Animal + Debug + Clone {}
```
Let's make a Dog that complies with these rules:
```rust
#[derive(Debug, Clone)]
struct Dog;

impl Animal for Dog {
    fn speak(&self) {
        println!("Woof");
    }
}

impl DebuggableClonableAnimal for Dog {}
```
Now you can make a dog and call `speak` on it. You can also use `DebuggableClonableAnimal` as a parameter or return type, and be sure that all of the trait functions are available.
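To see the combined bound in action, here's a self-contained sketch. Note one deliberate change from the listings above: `speak` returns a `String` instead of printing, so the result can be verified; the helper name `clone_and_speak` is hypothetical:

```rust
use std::fmt::Debug;

trait Animal {
    fn speak(&self) -> String;
}

// The combined requirement: Animal + Debug + Clone.
trait DebuggableClonableAnimal: Animal + Debug + Clone {}

#[derive(Debug, Clone)]
struct Dog;

impl Animal for Dog {
    fn speak(&self) -> String {
        "Woof".to_string()
    }
}

impl DebuggableClonableAnimal for Dog {}

// Inside this function, cloning and speaking are both guaranteed to work.
fn clone_and_speak<T: DebuggableClonableAnimal>(animal: &T) -> String {
    let cloned = animal.clone();
    cloned.speak()
}

fn main() {
    assert_eq!(clone_and_speak(&Dog), "Woof");
}
```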
Dynamic Dispatch
All of the examples above can be resolved at compile time. The compiler knows the concrete type of the trait, and can generate the code for it. But what if you want to store a bunch of different types in a collection, and call a trait function on all of them?
You might want to try this:
```rust
let animals: Vec<impl Animal> = vec![Cat, Dog]; // Will not compile
```
And it won't work. The reason is that `Vec` stores identically sized entries, so it needs to know the size of each entry at compile time. Since cats and dogs might be different sizes, `Vec` can't store them directly.
You can get around this with dynamic dispatch. You've seen this once before, with `type GenericResult<T> = std::result::Result<T, Box<dyn std::error::Error>>;`. The `dyn` keyword means that the type is dynamically dispatched---the concrete type (and its size) isn't known at compile time.
Now think back to boxes. A `Box` is a smart pointer: it occupies the size of a pointer in memory, and that pointer tells you where the data actually lives on the heap. So you can make a vector of dynamic, boxed traits:
```rust
let animals: Vec<Box<dyn Animal>> = vec![Box::new(Cat), Box::new(Dog)];
```
Each vector entry is a pointer (with a type hint) to a trait implementation; the data itself is stored on the heap. Accessing each entry requires a pointer dereference and a virtual function call. (A vtable is used for dispatch, but it is often optimized away---LLVM is very good at devirtualizing calls when it can.)
In the threads class, someone asked if you could "send interfaces to channels". And yes, you can---you have to use dynamic dispatch to do it. This is valid:
```rust
let (tx, rx) = std::sync::mpsc::channel::<Box<dyn Animal>>();
```
This works with other pointer types like `Rc` and `Arc`, too. You can have a reference-counted, dynamic-dispatch pointer to a trait.
Using dynamic dispatch won't perform as well as static dispatch, because of pointer chasing (which reduces the likelihood of a memory cache hit).
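Tying the section together, here's a minimal, runnable sketch of dynamic dispatch. One change from the earlier listings: `speak` returns the sound as a `String` instead of printing it, so the results can be checked:

```rust
trait Animal {
    fn speak(&self) -> String;
}

struct Cat;
struct Dog;

impl Animal for Cat {
    fn speak(&self) -> String {
        "Meow".to_string()
    }
}

impl Animal for Dog {
    fn speak(&self) -> String {
        "Woof".to_string()
    }
}

fn main() {
    // Each entry is a pointer into the heap plus vtable information;
    // the concrete types (Cat, Dog) can differ in size.
    let animals: Vec<Box<dyn Animal>> = vec![Box::new(Cat), Box::new(Dog)];
    let sounds: Vec<String> = animals.iter().map(|a| a.speak()).collect();
    assert_eq!(sounds, vec!["Meow", "Woof"]);
}
```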
The Any Type
If you really, really need to find out the concrete type of a dynamically dispatched trait, you can use the std::any::Any
trait. It's not the most efficient design, but it's there if you really need it.
The easiest way to "downcast" is to require an `as_any` accessor in your trait:

```rust
use std::any::Any;

// The trait definition (elided in the original listing): downcastable
// animals must expose themselves as `&dyn Any`.
trait DowncastableAnimal: Animal {
    fn as_any(&self) -> &dyn Any;
}

struct Tortoise;

impl Animal for Tortoise {
    fn speak(&self) {
        println!("What noise does a tortoise make anyway?");
    }
}

impl DowncastableAnimal for Tortoise {
    fn as_any(&self) -> &dyn Any {
        self
    }
}
```
Then you can "downcast" to the concrete type:

```rust
let more_animals: Vec<Box<dyn DowncastableAnimal>> = vec![Box::new(Tortoise)];
for animal in more_animals.iter() {
    if let Some(_tortoise) = animal.as_any().downcast_ref::<Tortoise>() {
        println!("We have access to the tortoise");
    }
    animal.speak();
}
```
If you can avoid this pattern, you should. It's not very Rusty---it's pretending to be an object-oriented language. But it's there if you need it.
Implementing Operators
"Operator overloading" got a bad name from C++. You can abuse it, and decide that operators do bizarre things. Please don't. If you allow two types to be added together, please use an operation that makes sense to the code reader!
See the
04_mem/operator_overload
project.
You can implement operators for your types. Let's make a Point
type that can be added together:
```rust
use std::ops::Add;

struct Point {
    x: f32,
    y: f32,
}

impl Add for Point {
    type Output = Point;

    fn add(self, rhs: Self) -> Self::Output {
        Point {
            x: self.x + rhs.x,
            y: self.y + rhs.y,
        }
    }
}

fn main() {
    let a = Point { x: 1.0, y: 2.0 };
    let b = Point { x: 3.0, y: 4.0 };
    let c = a + b;
    println!("c.x = {}, c.y = {}", c.x, c.y);
}
```
There's a full range of operators you can overload: `+=` (`AddAssign`), `/` (`Div`), `*` (`Mul`), and so on. This is very powerful for expressing operations naturally (rather than remembering to add `x` and `y` each time)---but it can be abused horribly if you decide that `+` should mean "subtract" or something. Don't do that. Please.
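As a sketch of one of the other operators, here's `+=` via the `AddAssign` trait from `std::ops` (with `PartialEq` derived so the result can be compared):

```rust
use std::ops::AddAssign;

#[derive(Debug, PartialEq)]
struct Point {
    x: f32,
    y: f32,
}

impl AddAssign for Point {
    fn add_assign(&mut self, rhs: Self) {
        self.x += rhs.x;
        self.y += rhs.y;
    }
}

fn main() {
    let mut a = Point { x: 1.0, y: 2.0 };
    a += Point { x: 3.0, y: 4.0 }; // Calls our add_assign implementation
    assert_eq!(a, Point { x: 4.0, y: 6.0 });
}
```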
Generics
Generics are very closely tied to traits. "Generics" are meta-programming: a way to write "generic" code that works for multiple types. Traits are a way to specify the requirements for a generic type.
The simplest generic is a function that takes a generic type. Who's sick of typing `to_string()` all the time? I am! You can write a generic function that accepts any type that implements `ToString`---even `&str` (bare string slices) implements `ToString`:
```rust
fn print_it<T: ToString>(x: T) {
    println!("{}", x.to_string());
}
```
So now you can call `print_it` with `print_it("Hello")`, `print_it(my_string)`, or even `print_it(42)` (because integers implement `ToString`).
There's a second format for generics that's a bit longer but more readable when you start piling on the requirements:
```rust
fn print_it<T>(x: T)
where
    T: ToString,
{
    println!("{}", x.to_string());
}
```
You can combine requirements with `+`:

```rust
use std::fmt::Debug;

fn print_it<T>(x: T)
where
    T: ToString + Debug,
{
    println!("{:?}", x);
    println!("{}", x.to_string());
}
```
You can have multiple generic types:

```rust
fn print_it<T, U>(x: T, y: U)
where
    T: ToString + Debug,
    U: ToString + Debug,
{
    println!("{:?}", x);
    println!("{}", x.to_string());
    println!("{:?}", y);
    println!("{}", y.to_string());
}
```
The generics system is almost a programming language in and of itself---you really can build most things with it.
Traits with Generics
See the
04_mem/trait_generic
project.
Some traits use generics in their implementation. The `From` trait is particularly useful, so let's take a look at it:

```rust
struct Degrees(f32);
struct Radians(f32);

impl From<Radians> for Degrees {
    fn from(rad: Radians) -> Self {
        Degrees(rad.0 * 180.0 / std::f32::consts::PI)
    }
}

impl From<Degrees> for Radians {
    fn from(deg: Degrees) -> Self {
        Radians(deg.0 * std::f32::consts::PI / 180.0)
    }
}
```
Here we've defined a type for degrees and a type for radians. Then we've implemented `From` for each of them, allowing them to be converted from the other. This is a very common pattern in Rust. `From` is also one of the few pleasant surprises in Rust, because implementing it automatically implements `Into` for you. So you can use any of the following:
```rust
let behind_you = Degrees(180.0);
let behind_you_radians = Radians::from(behind_you);
let behind_you_radians2: Radians = Degrees(180.0).into();
```
You can even define a function that requires that an argument be convertible to a type:
```rust
fn sin(angle: impl Into<Radians>) -> f32 {
    let angle: Radians = angle.into();
    angle.0.sin()
}
```
And you've just made it impossible to accidentally use degrees for a calculation that requires Radians. This is called a "new type" pattern, and it's a great way to add constraints to prevent bugs.
You can also write the `sin` function with explicit generics:

```rust
fn sin<T: Into<Radians>>(angle: T) -> f32 {
    let angle: Radians = angle.into();
    angle.0.sin()
}
```
The impl
syntax is a bit newer, so you'll see the generic syntax more often.
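Putting the pieces of this section together, here's a runnable sketch of the new-type pattern (only the `Degrees`-to-`Radians` direction is needed for this example):

```rust
struct Degrees(f32);
struct Radians(f32);

impl From<Degrees> for Radians {
    fn from(deg: Degrees) -> Self {
        Radians(deg.0 * std::f32::consts::PI / 180.0)
    }
}

// Accept anything convertible to Radians; a bare f32 won't compile.
fn sin(angle: impl Into<Radians>) -> f32 {
    let angle: Radians = angle.into();
    angle.0.sin()
}

fn main() {
    // Degrees are converted automatically before the trigonometry runs.
    let result = sin(Degrees(90.0));
    assert!((result - 1.0).abs() < 1e-6);
}
```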
Generics and Structs
You can make generic structs and enums, too. In fact, you've seen lots of generic `enum` types already: `Option<T>`, `Result<T, E>`. You've seen plenty of generic structs, too: `Vec<T>`, `HashMap<K, V>`, etc.
Let's build a useful example. How often have you wanted to add entries to a `HashMap` and, instead of replacing whatever was there, keep a list of all of the values provided for a key?
The code for this is in `04_mem/hashmap_bucket`.
Let's start by defining the basic type:
```rust
use std::collections::HashMap;

struct HashMapBucket<K, V> {
    map: HashMap<K, Vec<V>>,
}
```
The type contains a `HashMap`, each key (of type `K`) referencing a vector of values (of type `V`). Let's make a constructor:

```rust
impl<K, V> HashMapBucket<K, V> {
    fn new() -> Self {
        HashMapBucket {
            map: HashMap::new(),
        }
    }
}
```

So far, so good. Let's add an `insert` function (inside the implementation block):

```rust
fn insert(&mut self, key: K, value: V) {
    let values = self.map.entry(key).or_insert(Vec::new());
    values.push(value);
}
```
Uh oh, that shows us an error. Fortunately, the error tells us exactly what to do---the key has to support Eq
(for comparison) and Hash
(for hashing). Let's add those requirements to the struct:
```rust
impl<K, V> HashMapBucket<K, V>
where
    K: Eq + std::hash::Hash,
{
    fn new() -> Self {
        HashMapBucket {
            map: HashMap::new(),
        }
    }

    fn insert(&mut self, key: K, value: V) {
        let values = self.map.entry(key).or_insert(Vec::new());
        values.push(value);
    }
}
```
So now we can insert into the map and print the results:
```rust
fn main() {
    let mut my_buckets = HashMapBucket::new();
    my_buckets.insert("hello", 1);
    my_buckets.insert("hello", 2);
    my_buckets.insert("goodbye", 3);
    println!("{:#?}", my_buckets.map);
}
```
In 21 lines of code, you've implemented a type that can store multiple values for a single key. That's pretty cool. Generics are a little tricky to get used to, but they can really supercharge your productivity.
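As a sketch of how you might extend the type, here's the same struct with a hypothetical `get` helper (not part of the original project) that returns all values for a key, or an empty slice:

```rust
use std::collections::HashMap;

struct HashMapBucket<K, V> {
    map: HashMap<K, Vec<V>>,
}

impl<K: Eq + std::hash::Hash, V> HashMapBucket<K, V> {
    fn new() -> Self {
        Self { map: HashMap::new() }
    }

    fn insert(&mut self, key: K, value: V) {
        self.map.entry(key).or_insert_with(Vec::new).push(value);
    }

    // Hypothetical helper: all values for a key, or an empty slice.
    fn get(&self, key: &K) -> &[V] {
        self.map.get(key).map(Vec::as_slice).unwrap_or(&[])
    }
}

fn main() {
    let mut buckets = HashMapBucket::new();
    buckets.insert("hello", 1);
    buckets.insert("hello", 2);
    assert_eq!(buckets.get(&"hello"), &[1, 2]);
    assert_eq!(buckets.get(&"missing"), &[] as &[i32]);
}
```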
Amazing Complexity
If you look at the Bevy game engine, or the Axum webserver, you'll find some truly mind-boggling combinations of generics and traits. Remember how in Axum you could do dependency injection by adding a layer containing a connection pool, and then every route could magically obtain one by taking it as a parameter? That's generics and traits at work.
In both cases:
- A function accepts a type that meets certain criteria. Axum layers are cloneable, and can be sent between threads.
- The function stores the layers as a generic type.
- Routes are also generic, and parameters match against a generic+trait requirement. The route is then stored as a generic function pointer.
There's even code that handles `<T1>`, `<T1, T2>`, and other lists of parameters (up to 16) with separate implementations to handle whatever you may have put in there!
It's beyond the scope of a foundations class to really dig into how that works---but you have the fundamentals.
Error Handling
Much of this section applies to both async and non-async code. Async code has a few extra considerations: you are probably managing large amounts of IO, and really don't want to stop the world when an error occurs!
Rust Error Handling
In previous examples, we've used `unwrap()` or `expect("my message")` to get the value out of a `Result`. If an error occurred, your program (or thread) crashes. That's not great for production code!
Aside: Sometimes, crashing is the right thing to do. If you can't recover from an error, crashing is preferable to trying to continue and potentially corrupting data.
So what is a Result?
A `Result` is an `enum`, just like we covered in week 1. It's a "sum type"---it can be one of two things, and never both. A `Result` is either `Ok(T)` or `Err(E)`. It's deliberately hard to ignore errors!
This differs from other languages:
Language | Description | Error Types |
---|---|---|
C | Errors are returned as a number, or even NULL. It's up to you to decipher what the library author meant. Convention indicates that returning <0 is an error, and >=0 is success. | int |
C++ | Exceptions, which are thrown and "bubble up the stack" until they are caught in a catch block. If an exception is uncaught, the program crashes. Exceptions can have performance problems. Many older C++ programs use the C style of returning an error code. Some newer C++ programs use std::expected and std::unexpected to make it easier to handle errors without exceptions. | std::exception , expected , int , anything you like! |
Java | Checked exceptions---which are like exceptions, but handling them is mandatory. Every function must declare what exceptions it can throw, and every caller must handle them. This is a great way to make sure you don't ignore errors, but it's also a great way to make sure you have a lot of boilerplate code. This can get a little silly, so you find yourself re-throwing exceptions to turn them into types you can handle. Java is also adding the Optional type to make it easier to handle errors without exceptions. | Exception , Optional |
Go | Functions can return both an error type and a value. The compiler won't let you forget to check for errors, but it's up to you to handle them. In-memory, you are often returning both the value and an empty error structure. | error |
Rust | Functions return an enum that is either Ok(T) or Err(E) . The compiler won't let you forget to check for errors, and it's up to you to handle them. Result is not an exception type, so it doesn't incur the overhead of throwing. You're always returning a value or an error, never both. | Result<T, E> |
So there's a wide range of ways to handle errors across the language spectrum. Rust's goal is to make it easy to work with errors, and hard to ignore them - without incurring the overhead of exceptions. However (there's always a however!), default standard-library Rust makes it harder than it should be.
Strongly Typed Errors: A Blessing and a Curse!
The code for this is in the
03_async/rust_errors1
directory.
Rust's errors are very specific, and can leave you with a lot of things to match. Let's look at a simple example:
```rust
use std::path::Path;

fn main() {
    let my_file = Path::new("mytile.txt");
    // This yields a Result type of String or an error
    let contents = std::fs::read_to_string(my_file);
    // Let's just handle the error by printing it out
    match contents {
        Ok(contents) => println!("File contents: {contents}"),
        Err(e) => println!("ERROR: {e:#?}"),
    }
}
```
This prints out the details of the error:
```
ERROR: Os {
    code: 2,
    kind: NotFound,
    message: "The system cannot find the file specified.",
}
```
That's great, but what if we want to do something different for different errors? We can match on the error type:
```rust
match contents {
    Ok(contents) => println!("File contents: {contents}"),
    Err(e) => match e.kind() {
        std::io::ErrorKind::NotFound => println!("File not found"),
        std::io::ErrorKind::PermissionDenied => println!("Permission denied"),
        _ => println!("ERROR: {e:#?}"),
    },
}
```
The _
is there because otherwise you end up with a remarkably exhaustive list:
```rust
match contents {
    Ok(contents) => println!("File contents: {contents}"),
    Err(e) => match e.kind() {
        std::io::ErrorKind::NotFound => println!("File not found"),
        std::io::ErrorKind::PermissionDenied => println!("Permission denied"),
        std::io::ErrorKind::ConnectionRefused => todo!(),
        std::io::ErrorKind::ConnectionReset => todo!(),
        std::io::ErrorKind::ConnectionAborted => todo!(),
        std::io::ErrorKind::NotConnected => todo!(),
        std::io::ErrorKind::AddrInUse => todo!(),
        std::io::ErrorKind::AddrNotAvailable => todo!(),
        std::io::ErrorKind::BrokenPipe => todo!(),
        std::io::ErrorKind::AlreadyExists => todo!(),
        std::io::ErrorKind::WouldBlock => todo!(),
        std::io::ErrorKind::InvalidInput => todo!(),
        std::io::ErrorKind::InvalidData => todo!(),
        std::io::ErrorKind::TimedOut => todo!(),
        std::io::ErrorKind::WriteZero => todo!(),
        std::io::ErrorKind::Interrupted => todo!(),
        std::io::ErrorKind::Unsupported => todo!(),
        std::io::ErrorKind::UnexpectedEof => todo!(),
        std::io::ErrorKind::OutOfMemory => todo!(),
        std::io::ErrorKind::Other => todo!(),
        _ => todo!(),
    },
}
```
Many of those errors aren't even relevant to opening a file! Worse, as the Rust standard library grows, more errors can appear---meaning a rustup update
run could break your program. That's not great! So when you are handling individual errors, you should always use the _
to catch any new errors that might be added in the future.
Pass-Through Errors
The code for this is in the
03_async/rust_errors2
directory.
If you are just wrapping some very simple functionality, you can make your function signature match the function you are wrapping:
```rust
use std::path::Path;

fn maybe_read_a_file() -> Result<String, std::io::Error> {
    let my_file = Path::new("mytile.txt");
    std::fs::read_to_string(my_file)
}

fn main() {
    match maybe_read_a_file() {
        Ok(text) => println!("File contents: {text}"),
        Err(e) => println!("An error occurred: {e:?}"),
    }
}
```
No need to worry about re-throwing, you can just return the result of the function you are wrapping.
The ? Operator
We mentioned earlier that Rust doesn't have exceptions. It does have the ability to pass errors up the call stack---but because they are handled explicitly in return
statements, they don't have the overhead of exceptions. This is done with the ?
operator.
Let's look at an example:
```rust
fn file_to_uppercase() -> Result<String, std::io::Error> {
    let contents = maybe_read_a_file()?;
    Ok(contents.to_uppercase())
}
```
This calls our `maybe_read_a_file` function and adds a `?` to the end. What does the `?` do?

- If the `Result` type is `Ok`, it extracts the wrapped value and evaluates to it---in this case, assigning the contents to `contents`.
- If an error occurred, it returns the error to the caller.
This is great for function readability---you don't lose the "flow" of the function amidst a mass of error handling. It's also good for performance, and if you prefer the "top down" error handling approach it's nice and clean---the error gets passed up to the caller, and they can handle it.
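The same pattern works with any `Result`-returning call. Here's a small, self-contained sketch using string parsing (the function name `parse_and_double` is hypothetical):

```rust
fn parse_and_double(s: &str) -> Result<i32, std::num::ParseIntError> {
    // If parsing fails, `?` returns the error to our caller immediately;
    // otherwise `n` holds the parsed value and the function continues.
    let n: i32 = s.parse()?;
    Ok(n * 2)
}

fn main() {
    assert_eq!(parse_and_double("21"), Ok(42));
    assert!(parse_and_double("not a number").is_err());
}
```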
What if I just want to ignore the error?
You must handle the error in some way. You can just call the function:
```rust
file_to_uppercase();
```
This will generate a compiler warning that there's a Result
type that must be used. You can silence the warning with an underscore:
```rust
let _ = file_to_uppercase();
```
_
is the placeholder symbol - you are telling Rust that you don't care. But you are explicitly not caring---you've told the compiler that ignoring the error is a conscious decision!
You can also use the `if let` pattern and simply not add an error handler:

```rust
if let Ok(contents) = file_to_uppercase() {
    println!("File contents: {contents}");
}
```
What About Different Errors?
The `?` operator is great, but it requires that the function's declared error type match the error you are passing upwards. Otherwise, the type system can't verify that errors are being handled.
Let's take an example that draws a bit from our code on day 1.
The code for this is in the
03_async/rust_errors3
directory.
Let's add Serde and Serde_JSON to our project:
```bash
cargo add serde -F derive
cargo add serde_json
```
And we'll quickly define a deserializable struct:
```rust
use std::path::Path;
use serde::Deserialize;

#[derive(Deserialize)]
struct User {
    name: String,
    password: String,
}

fn load_users() {
    let my_file = Path::new("users.json");
    let raw_text = std::fs::read_to_string(my_file)?;
    let users: Vec<User> = serde_json::from_str(&raw_text)?;
    Ok(users)
}
```
This isn't going to compile yet, because we aren't returning a type from the function. So we add a `Result`:

```rust
fn load_users() -> Result<Vec<User>, Error> {
```
Oh no! What do we put for `Error`? We have a problem! `read_to_string` returns an `std::io::Error` type, and `serde_json::from_str` returns a `serde_json::Error` type. We can't return both!
Boxing Errors
There's a lot of typing for a generic error type, but it works:
```rust
type GenericResult<T> = std::result::Result<T, Box<dyn std::error::Error>>;

fn load_users() -> GenericResult<Vec<User>> {
    let my_file = Path::new("users.json");
    let raw_text = std::fs::read_to_string(my_file)?;
    let users: Vec<User> = serde_json::from_str(&raw_text)?;
    Ok(users)
}
```
This works with every possible type of error. Let's add a main
function and see what happens:
```rust
fn main() {
    let users = load_users();
    match users {
        Ok(users) => {
            for user in users {
                println!("User: {}, {}", user.name, user.password);
            }
        }
        Err(err) => {
            println!("Error: {err}");
        }
    }
}
```
The result prints:
Error: The system cannot find the file specified. (os error 2)
You have the exact error message, but you really don't have any way to tell what went wrong programmatically. That may be ok for a simple program.
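If you do need to react programmatically, a boxed error can still be downcast back to a concrete type with `downcast_ref`. A sketch, using a hypothetical custom error type `NotFound`:

```rust
use std::error::Error;
use std::fmt;

// A hypothetical custom error type for demonstration.
#[derive(Debug)]
struct NotFound;

impl fmt::Display for NotFound {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "not found")
    }
}

impl Error for NotFound {}

fn fails() -> Result<(), Box<dyn Error>> {
    Err(Box::new(NotFound))
}

fn main() {
    if let Err(e) = fails() {
        // downcast_ref recovers the concrete type, if it matches.
        assert!(e.downcast_ref::<NotFound>().is_some());
    }
}
```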
Easy Boxing with Anyhow
There's a crate named anyhow
that makes it easy to box errors. Let's add it to our project:
```bash
cargo add anyhow
```
Then you can replace the `Box` definition with `anyhow::Error`:

```rust
fn anyhow_load_users() -> anyhow::Result<Vec<User>> {
    let my_file = Path::new("users.json");
    let raw_text = std::fs::read_to_string(my_file)?;
    let users: Vec<User> = serde_json::from_str(&raw_text)?;
    Ok(users)
}
```
It still functions the same way:
Error: The system cannot find the file specified. (os error 2)
In fact, `anyhow` is mostly just a convenience wrapper around `Box<dyn Error>`. But it's a very convenient wrapper!
Anyhow does make it a little easier to return your own error:
```rust
#[allow(dead_code)]
fn anyhow_load_users2() -> anyhow::Result<Vec<User>> {
    let my_file = Path::new("users.json");
    let raw_text = std::fs::read_to_string(my_file)?;
    let users: Vec<User> = serde_json::from_str(&raw_text)?;
    if users.is_empty() {
        anyhow::bail!("No users found");
    }
    if users.len() > 10 {
        return Err(anyhow::Error::msg("Too many users"));
    }
    Ok(users)
}
```
I've included the short way and the long way---they do the same thing. `bail!` is a handy macro for "error out with this message". If you miss Go's "return any error you like" approach, `anyhow` has your back!
As a rule of thumb:
anyhow
is great in client code, or code where you don't really care what went wrong---you care that an error occurred and should be reported.
Touring the Rust Ecosystem
Rust and C++ Tooling Equivalencies
This is a cheat sheet for you to refer to later.
Using Cargo
The `cargo` command is a swiss-army knife that handles building projects, testing them, controlling dependencies, and more. It is extensible: you can add features to it, and use it to install programs.
Cargo Command | C++ Equivalent | Purpose |
---|---|---|
Package Commands | | |
`cargo init` | | Creates a new Rust package in the current directory. |
Compilation | | |
`cargo build` | `make` | Builds your project, placing the output in the `target` directory. |
`cargo run` | `make ; ./my_program` | Runs `cargo build`, and then runs the resulting executable. |
`cargo check` | | Build only the source, and skip assembly and linking for a quick check of syntax. |
`cargo clean` | `make clean` | Removes all build artefacts and empties the `target` directory. |
`cargo rustc` | | Pass extra `rustc` commands to the build process. |
Formatting | | |
`cargo fmt` | | Formats your source code according to the Rust defaults. |
Testing | | |
`cargo test` | `make test` | Executes all unit tests in the current project. |
`cargo bench` | | Executes all benchmarks in the current project. |
Linting | | |
`cargo clippy` | | Runs the Clippy linter. |
`cargo fix` | | Applies all Clippy suggestions. |
Documentation | | |
`cargo doc` | | Builds a documentation website from the current project's source code. |
`cargo rustdoc` | | Run the documentation builder with extra command options. |
Dependencies | | |
`cargo fetch` | | Downloads all dependencies listed in `Cargo.toml` from the Internet. |
`cargo add` | | Add a dependency to the current project's `Cargo.toml`. |
`cargo remove` | | Remove a dependency from the current project's `Cargo.toml` file. |
`cargo update` | | Update dependencies to the latest version in `Cargo.toml`. |
`cargo tree` | | Draw a tree displaying all dependencies, and each dependency's dependencies. |
`cargo vendor` | | Download all dependencies, and provide instructions to modify your `Cargo.toml` to use the downloaded dependencies. |
Unit Tests
You saw an example unit test when you created a library. Rust/Cargo has a built-in unit testing system. Let's explore it a bit.
Let's build a very simple example, and examine how it works:
The code for this is in
projects/part2/unit_test
```rust
fn double(n: i32) -> i32 {
    n * 2
}

#[cfg(test)] // Conditional compilation: only build in `test` mode
mod test {
    // A module to hold the tests
    use super::*; // Include everything from the parent module/namespace

    #[test] // This is a test we want to include in our unit test runs
    fn two_times() {
        assert_eq!(4, double(2)); // Assert that 2*2 = 4
        assert!(5 != double(2)); // Assert that it doesn't equal 5
    }
}
```
You can run tests for the current project with `cargo test`. You can append `--all` to include all projects in the current workspace.
We'll talk about more complicated tests later.
Benchmarking
Cargo has built-in benchmarking, but using it requires the nightly unstable code channel. I generally don't recommend relying on nightly code! If you are writing performance-critical code, benchmarking is essential. Fortunately, Rust makes it relatively straightforward to include benchmarks with a bit of boilerplate.
Quick and Dirty Benchmarks
This example is in
project/simple_bench
A quick and dirty way to benchmark operations is to use `Instant` and `Duration`:
```rust
use std::time::Instant;

fn main() {
    let now = Instant::now();
    let mut i = 0;
    for j in 0..1_000 {
        i += j * j;
    }
    let elapsed = now.elapsed();
    println!("Time elapsed: {} nanos", elapsed.as_nanos());
    println!("{i}");
}
```
Criterion
This project is in
projects/part2/criterion_bench
In `Cargo.toml`, add:
```toml
[dev-dependencies]
criterion = { version = "0.4", features = [ "html_reports" ] }

[[bench]]
name = "my_benchmark"
harness = false
```
[dev-dependencies]
is new! This is a dependency that is only loaded by development tools, and isn't integrated into your final program. No space is wasted.
Create <project>/benches/my_benchmark.rs
:
```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn fibonacci(n: u64) -> u64 {
    match n {
        0 => 1,
        1 => 1,
        n => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

fn criterion_benchmark(c: &mut Criterion) {
    c.bench_function("fib 20", |b| b.iter(|| fibonacci(black_box(20))));
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
```
Run cargo bench
and see the result.
Go to target/criterion
and you have a full HTML report with statistics.
Flamegraphs
It pretty much requires Linux (and the perf
infrastructure), but it's worth looking at Cargo Flamegraphs if you are developing on that platform. It's an easy wrapper around perf
for generating flamegraphs to find your hotspots.
FFI: Linking Rust and C or C++
Rust behaves very well when talking to other languages---both as a library for other languages to consume, and as a consumer of other languages' libraries.
We'll refer to "C Libraries"---but we really mean any language that compiles to a C-friendly library format. C, C++, Go, Fortran, Haskell, and many others can all be consumed by Rust.
Consuming C Libraries
The code for this is in
04_mem/c_rust
(C Rust)
Let's start with a tiny C library:
```c
// A simple function that doubles a number
int double_it(int x) {
    return x * 2;
}
```
We'd like to compile this and include it in a Rust program. We can automate compilation by including the ability to compile C (and C++) libraries as part of our build process with the `cc` crate. Rather than adding it with `cargo add`, we want to add it as a build dependency. It won't be included in the final program; it's just used during compilation. Open `Cargo.toml`:
```toml
[package]
name = "c_rust"
version = "0.1.0"
edition = "2021"

[dependencies]

[build-dependencies]
cc = "1"
```
Now we can create a build.rs
file in the root of our project (not the src
directory). This file will be run as part of the build process, and can be used to compile C libraries. We'll use the cc
crate to do this:
```rust
fn main() {
    cc::Build::new()
        .file("src/crust.c")
        .compile("crust");
}
```
build.rs
is automatically compiled and executed when your Rust program builds. You can use it to automate any build-time tasks you want. The cc
calls will build the listed files and include the linked result in your final program as a static library.
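Beyond compiling C, a build script communicates with Cargo by printing `cargo:` directives on stdout. As a minimal sketch (the `rerun_directive` helper is hypothetical, just to make the format visible), a `build.rs` can tell Cargo to re-run only when the C source actually changes:

```rust
// Format Cargo's "re-run this build script if the file changed"
// directive; build scripts emit these lines on stdout.
fn rerun_directive(path: &str) -> String {
    format!("cargo:rerun-if-changed={path}")
}

fn main() {
    // Without this, Cargo may re-run the build script more often
    // than necessary.
    println!("{}", rerun_directive("src/crust.c"));
}
```

In the real project, this `println!` would sit alongside the `cc::Build` call in `main()`.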
Lastly, let's create some Rust to call the C:
```rust
// Do it by hand
extern "C" {
    fn double_it(x: i32) -> i32;
}

mod rust {
    pub fn double_it(x: i32) -> i32 {
        x * 2
    }
}
```
We've used an extern "C"
to specify linkage to an external C library. We've also created a Rust version of the same function, so we can compare the two.
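The same `extern "C"` mechanism works for any C symbol the linker can already see. As a quick standalone sketch (separate from the example project), you can declare and call `abs` from the C standard library without any `build.rs` step at all:

```rust
// Hand-written binding for a libc function; no build.rs is needed
// because the C standard library is already linked.
extern "C" {
    fn abs(input: i32) -> i32;
}

fn main() {
    // Every FFI call is unsafe: the compiler cannot verify that the
    // declared signature matches the real C definition.
    let result = unsafe { abs(-5) };
    assert_eq!(result, 5);
    println!("abs(-5) = {result}");
}
```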
Now let's use some unit tests to prove that it works:
```rust
#[cfg(test)]
mod test {
    use super::*;

    #[test]
    fn test_double_it() {
        assert_eq!(unsafe { double_it(2) }, 4);
    }

    #[test]
    fn test_c_rust() {
        assert_eq!(unsafe { double_it(2) }, rust::double_it(2));
    }
}
```
And it works when we run `cargo test`.
Header files and BindGen
You need LLVM installed (clang 5 or greater) to use this. On Windows,
winget install LLVM.LLVM
will work. Also set an environment variableLIBCLANG_PATH
to the location of the Clang install. On Windows,$Env:LIBCLANG_PATH="C:\Program Files\LLVM\bin"
Larger C examples will include header files. Let's add `crust.h`:
int double_it(int x);
And add C to require it:
#include "crust.h"
// A simple function that doubles a number
int double_it(int x) {
return x * 2;
}
We can add it to the build.rs
file, but it will be ignored (it's just a forward declaration).
Writing the extern "C"
for a large library could be time consuming. Let's use bindgen
to do it for us.
Add another build-dependency:
[build-dependencies]
cc = "1"
bindgen = "0"
Now in build.rs
we'll add some calls to use it:
```rust
use std::env;
use std::path::PathBuf;

// Inside build.rs's main(), after the cc::Build call:
let bindings = bindgen::Builder::default()
    .header("src/crust.h")
    .parse_callbacks(Box::new(bindgen::CargoCallbacks))
    .generate()
    .expect("Unable to generate bindings");

// Write the bindings to the $OUT_DIR/bindings.rs file.
let out_path = PathBuf::from(env::var("OUT_DIR").unwrap());
bindings
    .write_to_file(out_path.join("bindings.rs"))
    .expect("Couldn't write bindings!");
```
This is pretty much standard boilerplate, but there are a lot of options available.
Now run `cargo build`. You'll see a new file in `target/debug/build/c_rust-*/out/bindings.rs`. This is the automatically generated bindings file. Let's use it:
```rust
include!(concat!(env!("OUT_DIR"), "/bindings.rs"));
```
Your compile time has suffered, but now the header is parsed and Rust bindings are generated automatically. The unit tests should still work.
Calling Rust from Other Languages
The code for this is in
04_mem/rust_c
(Rust C)
You can also set up Rust functions and structures for export via a C API. You lose some of the richness of the Rust language---everything has to be C compatible---but you can still use Rust's safety and performance.
Start with some Cargo.toml
entries:
[package]
name = "rust_c"
version = "0.1.0"
edition = "2021"
[lib]
crate-type = ["staticlib"]
[dependencies]
libc = "0.2"
Providing a `[lib]` section with a `crate-type` entry lets you change compilation behavior. We're instructing Rust to build a C-compatible static library (you can also use `cdylib` for dynamic linkage).
Next, we'll build a single Rust function to export:
```rust
use std::ffi::CStr;

/// # Safety
/// Use a valid C-String!
#[no_mangle]
pub unsafe extern "C" fn hello(name: *const libc::c_char) {
    let name_cstr = unsafe { CStr::from_ptr(name) };
    let name = name_cstr.to_str().unwrap();
    println!("Hello {name}");
}
```
Notice that the function receives a raw pointer to `c_char`---a C-style string, just as the C ABI expects. `CStr` and `CString` provide Rust-friendly layers between the string types, allowing you to convert back and forth. C strings will never be as safe as Rust strings, but this is a good compromise.
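To see those conversion layers in isolation, here's a small sketch of the round trip from a Rust string to a raw C pointer and back (the `roundtrip` helper is hypothetical, purely for illustration):

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// String -> CString (adds the NUL terminator) -> raw pointer (what C
// sees) -> CStr -> &str. CString::new rejects interior NUL bytes.
fn roundtrip(s: &str) -> String {
    let c_string = CString::new(s).expect("no interior NUL bytes");
    let ptr: *const c_char = c_string.as_ptr();
    // SAFETY: `ptr` comes from a live CString, so it is valid and
    // NUL-terminated for the duration of this call.
    let back = unsafe { CStr::from_ptr(ptr) };
    back.to_str().expect("valid UTF-8").to_owned()
}

fn main() {
    assert_eq!(roundtrip("Hello FFI"), "Hello FFI");
    println!("round trip ok");
}
```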
We've turned off name mangling, making it easy for the linker to find the function.
The function is also "unsafe"---because it receives an unsafe C string type.
Build the project with `cargo build`, and you'll see that `target/debug/rust_c.lib` has been created on Windows (`librust_c.a` on Linux). This is the static library that we can link to from C.
Linkage via C requires a header file. In this case, it's pretty easy to just write one:
void hello(char *name);
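Functions that stick to C-compatible scalar types are even simpler to export---no conversion layer is needed. A hypothetical second export (not part of the example project) might look like this, with the matching header line `int add_numbers(int a, int b);`:

```rust
// Exported with an unmangled symbol and the C calling convention.
// Plain integers cross the FFI boundary as-is, so the body needs
// no unsafe code.
#[no_mangle]
pub extern "C" fn add_numbers(a: i32, b: i32) -> i32 {
    a + b
}

fn main() {
    // Callable from Rust as well as from C.
    assert_eq!(add_numbers(2, 3), 5);
    println!("2 + 3 = {}", add_numbers(2, 3));
}
```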
You can now use this in C or another language. In Go, it looks like this:
package main
/*
#cgo LDFLAGS: ./rust_c.a -ldl
#include "./lib/rust_c.h"
*/
import "C"
import "fmt"
import "time"
func main() {
start := time.Now()
fmt.Println("Hello from GoLang!")
duration := time.Since(start)
fmt.Println(duration)
start2 := time.Now()
C.hello(C.CString("from Rust!"))
duration2 := time.Since(start2)
fmt.Println(duration2)
}
(There are a few microseconds of delay in the Rust call, but it's pretty fast! Marshaling the C string in Go is the slowest part.)
Using CBindGen to Write the Header For You
Set up `cbindgen` as a build dependency:
[package]
name = "rust_c"
version = "0.1.0"
edition = "2021"
[lib]
crate-type = ["staticlib"]
[dependencies]
libc = "0.2"
[build-dependencies]
cbindgen = "0.24"
And once again, add a build.rs
file:
```rust
use std::env;
use std::path::PathBuf;
use cbindgen::Config;

fn main() {
    let crate_dir = env::var("CARGO_MANIFEST_DIR").unwrap();
    let package_name = env::var("CARGO_PKG_NAME").unwrap();
    let output_file = target_dir()
        .join(format!("{}.hpp", package_name))
        .display()
        .to_string();

    let config = Config {
        //namespace: Some(String::from("ffi")),
        ..Default::default()
    };

    cbindgen::generate_with_config(&crate_dir, config)
        .unwrap()
        .write_to_file(&output_file);
}

/// Find the location of the `target/` directory. Note that this may be
/// overridden by `cmake`, so we also need to check the `CARGO_TARGET_DIR`
/// variable.
fn target_dir() -> PathBuf {
    if let Ok(target) = env::var("CARGO_TARGET_DIR") {
        PathBuf::from(target)
    } else {
        PathBuf::from(env::var("CARGO_MANIFEST_DIR").unwrap()).join("target")
    }
}
```
This is boilerplate adapted from the `cbindgen` documentation.
Now run `cargo build` and a `target` directory appears, containing the generated header file.