Introduction

Hello World

Let's get started by comparing C, C++, and Rust versions of the classic "Hello, World!" program.

Hello World in C

Here's a hopefully familiar looking C version of "hello, world":

#include <stdio.h>
int main() {
   printf("Hello, World!\n");
   return 0;
}

Compilation

You can compile it with:

gcc -g -o hello hello_world.c

Or

clang -g -o hello hello_world.c

Or you can write a Makefile:

CC      = gcc
CFLAGS  = -g
RM      = rm -f

default: all

all: hello

hello: hello_world.c
	$(CC) $(CFLAGS) -o hello hello_world.c

clean veryclean:
	$(RM) hello

Then type make to build it.

All of these produce a binary, hello - which prints "Hello, World!".

Takeaways

  • The source code is short and to the point.
  • You include stdio.h to pull in printf. This is a "copy paste"---the contents of stdio.h are included directly in your compilation.
  • You either need to create a platform specific build script (specifying your compiler), or use a tool like configure (or CMake---which we'll talk about in C++).
  • Compilation is really fast.
  • It's not specified anywhere, but you are depending upon your platform's C standard library (libc, glibc, etc.). Your program is dynamically linked with it. You need to have libc installed to run your program.

Hello World in C++

A "pure" C++ version (C++20; C++23 adds std::print()) looks like this:

#include <iostream>

int main() {
    std::cout << "Hello, World!" << std::endl;
    return 0;
}

It's worth noting that a lot of C++ projects use printf anyway.

Compilation

You can also build this from the command-line:

g++ -g -o hello hello.cpp

Or with Clang:

clang++ -g -o hello hello.cpp

With CMake

You can build a Makefile, but a lot of modern C++ projects now use CMake. Here's the CMakeLists.txt file:

cmake_minimum_required(VERSION 3.5)
project(HelloWorld)

add_executable(hello hello.cpp)

Your build process then becomes:

mkdir build
cd build
cmake ..
make

As long as you have CMake and a build system it recognizes (Make, Ninja, etc.) it will build your project.

Takeaways

All of these produce a binary, hello - which prints "Hello, World!".

  • You are including <iostream>. No .h required. It's still a copy/paste into your compiled source. (If you are lucky enough to have modules, that's not quite as true now).
  • You are using C++'s "streams" cout to stream data to the console.
  • You are dynamically linking to the C++ standard library---it just isn't stated.
  • CMake has made it easier to determine what compilers and build systems to use---but it's still not automatic.
  • Compilation is still pretty fast.

Hello World in Rust

Here's "Hello World" in Rust:

fn main() {
    println!("Hello, World!");
}

You can run that from the Rust Playground. To run it locally, though, you need a full Rust project. The easy way to make this project is to open a terminal and type:

cargo new hello_world

By default, this even auto-generates the body of the "hello, world" program!

Cargo is Rust's swiss-army knife tool. It's a build system, a dependency manager, wraps a linter, can run unit tests and benchmarks --- it's also extendable. In this case, we're asking Cargo to make us a new project named hello_world.

Cargo creates the following structure:

hello_world/
hello_world/src/
hello_world/src/main.rs
hello_world/Cargo.toml
hello_world/.git
hello_world/.gitignore

If you don't want to create a Git repository, add the flag --vcs none. If you are already inside a Git repository, Cargo won't try to nest another one inside it.

The Cargo.toml file represents the project manifest---outlining project metadata. It looks like this:

[package]
name = "hello_world"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
  • The package section represents meta-data about the "crate" itself (Rust packages things into crates, which are handled by Cargo).
    • name specifies the output name, and also the name by which the project is referenced if other projects link to it.
    • version is a semantic (major.minor.patch) version number.
    • edition specifies which edition of the Rust language the project uses. A new edition is released every three years or so. Previous editions continue to compile, and editions are the Rust teams' opportunity to make changes to the base language: things can (and will) be deprecated and eventually removed, features added, and syntax changed.
  • dependencies can be ignored for now. If you depend on other packages, they are listed here. We'll talk about that later.

So how do we compile our program?

cd hello_world
cargo build

The executable is now in hello_world/target/debug/hello_world (with a .exe on the end if you are in Windows).

Takeaways

  • You didn't include anything! Well you did, but by default parts of the Rust standard library are automatically included in the current scope.
  • Your binary was statically linked. The entire Rust standard library is in there! (That's why a debug build is around 4.3 MB on Linux, as opposed to about 17 KB for the C and C++ versions. Statically linking the C or C++ standard library produces a large binary too---on the order of 2-3 MB.)
  • You didn't need any tools outside of the Rust ecosystem. Cargo took care of it all, and installing with rustup gave you the entire package.
  • Compilation was decently fast---you'd hope so for Hello World.

Let's talk a bit about syntax.

Rust Syntax

fn main() {
    println!("Hello, World!");
}

I often argue that "Hello, World" is the worst Rust program to start with. println! is a macro, and doesn't look much like normal Rust code. (In fairness, std::cout << "Hello, World!" << std::endl; isn't very normal, either).

Macros in Rust have a ! at the end because they might be surprising: they are extensions to Rust's normal syntax.

So bear that in mind as we move forward.

Touring the Rust Language

Primitive Types

C is notoriously a little vague about type names:

#include <stdio.h>

int main() {
    char c = 'a';
    unsigned char uc = 12;
    int i = 123;
    unsigned int ui = 123;
    short s = 123;
    unsigned short us = 123;
    long l = 123;
    unsigned long ul = 123;

    printf("%d    %ld\n", c, sizeof(c));
    printf("%d    %ld\n", uc, sizeof(uc));
    printf("%d    %ld\n", i, sizeof(i));
    printf("%d    %ld\n", ui, sizeof(ui));
    printf("%d    %ld\n", s, sizeof(s));
    printf("%d    %ld\n", us, sizeof(us));
    printf("%ld    %ld\n", l, sizeof(l));
    printf("%ld    %ld\n", ul, sizeof(ul));
    return 0;
}

The output will actually vary by platform. On my 64-bit Linux system I get:

97    1
12    1
123    4
123    4
123    2
123    2
123    8
123    8

Many C programmers prefer to be a bit more specific and use specifically sized types instead:

#include <stdio.h>
#include <stdint.h>

int main() {
    int8_t c = 'a';
    uint8_t uc = 12;
    int32_t i = 123;
    uint32_t ui = 123;
    int16_t s = 123;
    uint16_t us = 123;
    int64_t l = 123;
    uint64_t ul = 123;

    printf("%d    %ld\n", c, sizeof(c));
    printf("%d    %ld\n", uc, sizeof(uc));
    printf("%d    %ld\n", i, sizeof(i));
    printf("%d    %ld\n", ui, sizeof(ui));
    printf("%d    %ld\n", s, sizeof(s));
    printf("%d    %ld\n", us, sizeof(us));
    printf("%ld    %ld\n", l, sizeof(l));
    printf("%ld    %ld\n", ul, sizeof(ul));
    return 0;
}

That's a bit more specific. If you're writing cross-platform code, it's often helpful to know exactly how large a variable is!

Additionally, size_t defines the "integral type"---an integer whose size matches your platform's pointer size. On 64-bit Linux, size_t is 64 bits; on 32-bit Linux, it is 32 bits.

Rust Primitive Types

Rust only defines the explicitly sized types (and usize/isize for the integral type):

use std::mem::size_of;

fn main() {
    let c: i8 = 97;
    let uc: u8 = 10;
    let i: i32 = 123;
    let ui: u32 = 123;
    let s: i16 = 123;
    let us: u16 = 123;
    let l: i64 = 123;
    let ul: u64 = 123;
    let vl: i128 = 123;
    let uvl: u128 = 123;
    let is: isize = 123;
    let uis: usize = 123;
    let f: f32 = 123.4;
    let d: f64 = 123.4;

    println!("{c}, {}", size_of::<i8>());
    println!("{uc}, {}", size_of::<u8>());
    println!("{i}, {}", size_of::<i32>());
    println!("{ui}, {}", size_of::<u32>());
    println!("{s}, {}", size_of::<i16>());
    println!("{us}, {}", size_of::<u16>());
    println!("{l}, {}", size_of::<i64>());
    println!("{ul}, {}", size_of::<u64>());
    println!("{vl}, {}", size_of::<i128>());
    println!("{uvl}, {}", size_of::<u128>());
    println!("{is}, {}", size_of::<isize>());
    println!("{uis}, {}", size_of::<usize>());
    println!("{f}, {}", size_of::<f32>());
    println!("{d}, {}", size_of::<f64>());
}

We'll talk about the use in more detail later. You are importing a single function from the std::mem namespace. There's no copy/paste. You could also type std::mem::size_of every time.

Some takeaways:

  • Each type is explicitly defined as i<size> or u<size>.
  • Bytes (u8/i8) are not chars! We'll talk about char in a bit. They are special!
  • println! can take variable names (but not complex expressions) in {name}, or you can use {} as a placeholder and fill in the blanks with parameters. println! defies Rust's normal syntax, which is why it's a macro!
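To make the placeholder forms concrete, here's a small sampler (the variable names are just for illustration). The same rules apply to the format! macro, which returns a String instead of printing:

```rust
fn main() {
    let name = "World";
    let value = 123.456;

    println!("Hello, {name}!");   // named capture of a local variable
    println!("Hello, {}!", name); // positional placeholder
    println!("{value:.2}");       // format spec: two decimal places
    println!("{}", 2 + 3);        // expressions go in the argument list
}
```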

Auto

C++ introduced auto. So you can do:

auto i = 5;

That makes the compiler figure out what i is, based on its usage (or a default). Rust has the same feature: you don't have to specify a type at each declaration. This is perfectly valid (and easier to read and type):

fn main() {
    let n = 123;
}

Rust also supports suffixes for specifying type:

fn main() {
    let i = 123i32;
    let f = 123.4f32;
    let d = 123.4f64;
}

Mutability

Coming from C and C++, the following is quite normal:

#include <stdio.h>
int main() {
   int i = 5;
   i += 1;
   printf("%d\n", i);
   return 0;
}

However, the equivalent doesn't work in Rust (note that there is no ++ operator, so we use += 1):

fn main() {
    let i = 5;
    i += 1;
    println!("{i}");
}

You get the compiler error cannot assign twice to immutable variable i. That's because let creates an immutable variable. It's the same as typing const int i = 5; in C or C++.

Making a Mutable Variable

To mark a variable as mutable, you have to explicitly declare it as such:

fn main() {
    let mut i = 5;
    i += 1;
    println!("{i}");
}

So why wouldn't you define everything as mutable?

  • Your program gains some clarity from the knowledge that a variable won't change.
  • Functional-style programming tends to prefer not reusing variables.
  • If you accidentally mutate a variable later, the compiler will stop you.

You can make everything let mut, and then use the linter (cargo clippy or your IDE) to highlight the variables that don't need it---but that's a crutch and should be avoided as you gain experience.

Shadowing

You can also make use of shadowing. This is popular in many functional styles. It can also be confusing. I recommend adopting a style that suits you.

Take the following immutable code:

fn main() {
    let i = 5;
    let i_plus_one = i + 1;
    println!("{i_plus_one}");
}

Your variables are immutable, and you are making it clear what's going on in your algorithm by naming each subsequent step. That's great until you get to a big algorithm and start running into i_log10_times_3... so you'll often find that "shadowing" is used to remove previous versions of a variable name from circulation as the calculation progresses:

fn main() {
    let i = 5;
    let i = i + 1;
    println!("{i}");
}

Shadowing is useful with scope (which we'll talk about in a moment). Within a scope, you can shadow a parent-scope's variable names---and get them back at the end of the scope. For example:

fn main() {
    let i = 5;
    {
        let i = i + 1;
        println!("{i}");
    }
    println!("{i}");
}

Primitive Type Conversion

Take the following C program:

#include <stdio.h>

int main() {
   int i = 500;
   char c = i;
   printf("%d\n", c);
   return 0;
}

It compiles with no warnings, and outputs... -12. C lets you implicitly convert between types---even when doing so loses data.

An equivalent Rust program won't compile:

fn main() {
    let i: i32 = 500;
    let c: i8 = i;
    println!("{c}");
}

Just in case you thought Rust was just protecting you against an overflow, this won't compile either:

fn main() {
    let i: i32 = 500;
    let j: i64 = i;
    println!("{j}");
}

Rust is really explicit about types and type conversion. You can almost never implicitly convert types. That's a good thing for avoiding bugs: it requires that you acknowledge that there is a type mismatch and explicitly handle it.

Brute Force Conversion with as

The lowest level (and most dangerous) form of conversion in Rust is the as keyword. You can tell Rust that you accept that a conversion is potentially dangerous, and to do it anyway:

fn main() {
    let i: i32 = 500;
    let c: i8 = i as i8;
    println!("{c}");
}

This also prints out -12---so you have bug compatibility! You also generally don't want to do this unless you are absolutely, positively sure that doing so is safe.

It's always safe to up-convert---you can be sure that a larger type of the same signedness will be able to hold your data:

fn main() {
    let i: i32 = 500;
    let i: i64 = i as i64;
    println!("{i}");
}

Mixing signed and unsigned, or converting to a smaller type is potentially dangerous. (There's been regular discussion in the Rust world about whether as should sometimes be unsafe).
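A quick sketch of why these conversions deserve caution: as never fails, it simply reinterprets or truncates the bit pattern.

```rust
fn main() {
    // Signed-to-unsigned reinterprets the bit pattern:
    let x: i32 = -1;
    println!("{}", x as u32); // prints 4294967295 (all bits set)

    // Narrowing keeps only the low bits:
    let big: i32 = 500;        // 0x1F4
    println!("{}", big as u8); // prints 244 (0xF4)
}
```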

Safe Conversion with into

Safe conversions between primitives are implemented with the into() function (itself part of the Into trait---traits are a much later topic). The compiler error messages earlier even suggested using it. Converting with into is simple:

fn main() {
    let i: i32 = 500;
    let i: i64 = i.into();
    println!("{i}");
}

into isn't implemented for the potentially unsafe conversions. This won't compile:

fn main() {
    let i: i64 = 500;
    let i: i32 = i.into();
    println!("{i}");
}

Fallible Conversion with try_into

Some conversions are possible, but may or may not work. This will work:

use std::convert::TryInto;

fn main() {
    let i: i64 = 500;
    let i: i32 = i.try_into().unwrap();
    println!("{i}");
}

And this will compile but crash at runtime:

use std::convert::TryInto;

fn main() {
    let i: i64 = 2_147_483_648;
    let i: i32 = i.try_into().unwrap();
    println!("{i}");
}

So what's going on with the unwrap? try_into returns a Result type. We'll talk a lot about how they work internally later. A Result is a Rust enumeration (which are a lot like tagged unions in C or C++) that either contains Ok(..) or Err(..) - where the .. are a generic type. unwrap() says "give me the value of Ok(x), or crash if it wasn't ok".

Obviously, crashing isn't a great choice---but we'll leave it there for now. Crashing is a better choice than corrupting your company's data because of a type conversion!
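If crashing is unacceptable, you can handle the Result yourself with match instead of calling unwrap. A minimal sketch:

```rust
use std::convert::TryInto;

fn main() {
    let i: i64 = 2_147_483_648; // one more than i32::MAX

    // try_into returns a Result; match lets us handle both outcomes.
    let result: Result<i32, _> = i.try_into();
    match result {
        Ok(value) => println!("Converted: {value}"),
        Err(e) => println!("Conversion failed: {e}"),
    }
}
```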

Numeric Overflow

Let's take a very simple C program:

#include <stdio.h>

int main() {
   char a = 127;
   a = a + 1;
   printf("a = %d\n", a);
   return 0;
}

If you haven't been programming in C for a while, you may be surprised that the output is -128. Your 8-bit signed integer (char) can only hold -128 through 127, in binary two's complement. A binary addition of 1 to 127 gives 10000000---and since the first digit in two's complement binary represents the sign bit, you get -128. (Strictly speaking, signed overflow is undefined behavior in C; wrapping is simply what most platforms do.)

Rust's behavior for this program varies by how you compiled your program. In default debug mode:

fn main() {
    let mut a: i8 = 127;
    a += 1;
    println!("{a}");
}

Crashes the program with attempt to add with overflow. (If you compile in release mode with cargo run --release, it prints the wrapped number, -128.)

Always test your builds in debug mode!

Explicitly Handling Wrapping

In C, you can detect this overflow with some additional code:

#include <stdio.h>
#include <limits.h>

int main() {
   char a = 127;
   char add = 1;
   if (a > 0 && add > 0 && a > CHAR_MAX - add) {
      printf("Overflow detected\n");
      return 1;
   }
   a = a + add;
   printf("a = %d\n", a);
   return 0;
}

(You may also want to check for underflow)

Rust includes checked_ arithmetic for this purpose:

fn main() {
    let a: i8 = 127;
    let a = a.checked_add(1);
    println!("{a:?}");
}

This prints None. That's odd! checked_add returns an Option type, which is fundamentally Rust's alternative to null/nullptr. Just like a Result, an Option is a sum type that can either be None, or Some(x).

Notice that I snuck in :? in the print. This is "debug printing", and prints the contents of complicated types if they implement the appropriate trait.

You can also unwrap options:

fn main() {
    let a: i8 = 127;
    let a = a.checked_add(1).unwrap();
    println!("{a}");
}

But I Want to Wrap!

Sometimes, wrapping is the desired behavior. It's used a lot in cryptographic functions, for example. Rust lets you opt in to the wrapping behavior:

fn main() {
    let a: i8 = 127;
    let a = a.wrapping_add(1);
    println!("{a}");
}

This won't crash on debug or release builds: you've explicitly told Rust (and whomever reads your code later) that wrapping was the intended behavior, and not a bug.

Saturating

Maybe you'd rather saturate at the maximum possible value?

fn main() {
    let a: i8 = 127;
    let a = a.saturating_add(1);
    println!("{a}");
}

This prints 127.

Other Operations

Checked, saturating and wrapping variants of addition, subtraction, multiplication and division are all provided (the checked division variant guards against division by zero).
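For example (a quick sketch; the same pattern applies to the subtraction and division variants):

```rust
fn main() {
    let a: i8 = 100;
    println!("{:?}", a.checked_mul(2));    // None: 200 doesn't fit in an i8
    println!("{}", a.wrapping_mul(2));     // -56: wraps around
    println!("{}", a.saturating_mul(2));   // 127: clamps at i8::MAX
    println!("{:?}", 10i8.checked_div(0)); // None: division by zero
}
```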

If you are sensing a theme, it's that Rust picks safe by default when possible---and gives you the chance to opt out. C and C++ tend to assume you know what you're doing, and offer the option of adding safety checks.

Control Flow

Rust offers similar control-flow options to C and C++.

If Statements

In C or C++, you are probably used to:

if (i == 5) {
    // Do something
} else {
    // Do something else
}

The Rust syntax is almost identical:

fn main() {
    let i = 6;
    if i == 5 {
        // Do something
        println!("5");
    } else {
        // Do something else
        println!("Other");
    }
}

Switch Statements

You're probably also used to:

int i = 5;
switch (i) {
    case 5: printf("5\n"); break;
    case 6: printf("6\n"); break;
    default: printf("Something else\n"); break; 
}

Rust's equivalent is called match, and is a little different:

fn main() {
    let i = 5;
    match i {
        5 => println!("5"),
        6 => println!("6"),
        _ => println!("Something else"),
    }
}

match can do a lot more than that, but let's focus on what we have here:

  • There's no break;---matched cases do not fall through.
  • The syntax is different: (case) => (expression).
  • default is replaced with _ --- which is Rust's general "something else" symbol.

If you need multiple lines, it's similar to C also:

fn main() {
    let i = 5;
    match i {
        5 => {
            println!("5");
            println!("5 is a good number.");
        }
        6 => println!("6"),
        _ => println!("Something else"),
    }
}
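As a taste of what else match can do, arms can also cover ranges and multiple values. A small sketch (describe is a hypothetical helper):

```rust
fn describe(i: i32) -> &'static str {
    match i {
        0 => "zero",
        1..=4 => "small",       // inclusive range pattern
        5 | 6 => "five or six", // multiple values in one arm
        _ => "something else",
    }
}

fn main() {
    println!("{}", describe(5));
}
```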

There's also a special "one case match" called "if let", but we're going to worry about that later.

Loops

Looping through data is pretty fundamental, so it shouldn't be a surprise that Rust supports loops.

For Loops

Take the following C code:

#include <stdio.h>

int main() {
    int i;
    for (i = 0; i < 10; i++) {
        printf("%d\n", i);
    }
    return 0;
}

Unsurprisingly: this prints 0 through 9.

Here's a Rust equivalent:

fn main() {
    for i in 0..10 {
        println!("{i}");
    }
}

The output is the same, but the syntax is quite different:

  • 0..10 is an exclusive range. It provides an iterator over every number in the range, excluding the last one. We'll worry about iterators later.
  • i only exists inside the loop scope. (In C++ and later C standards you can do for (int i=0; i<10; i++) for the same effect).
  • You don't have any control over the operation that occurs for each iteration. Rust just ticks through each entry in the range.

If you prefer an inclusive range:

fn main() {
    for i in 0..=10 {
        println!("{i}");
    }
}

We'll look at for_each equivalency later.
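Because a range is just an iterator, you can also adapt it before looping---for example, reversing it or stepping by two. A quick sketch:

```rust
fn main() {
    // Count down from 9 to 0:
    for i in (0..10).rev() {
        println!("{i}");
    }

    // Even numbers only:
    for i in (0..10).step_by(2) {
        println!("{i}");
    }
}
```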

While Loops

This C should look familiar, too:

#include <stdio.h>

int main() {
    int i = 0;
    while (i < 10) {
        printf("%d\n", i);
        i += 1;
    }
    return 0;
}

Equivalent Rust code looks like this:

fn main() {
    let mut i = 0;
    while i < 10 {
        println!("{i}");
        i += 1;
    }
}

Sadly, Rust doesn't protect you from an infinite while loop either!

loop loops

Rust adds one more type of loop, named loop. loop runs forever, or until a break statement is hit.

fn main() {
    let mut i = 0;
    loop {
        println!("{i}");
        i += 1;
        if i > 9 {
            break;
        }
    }
}

Strings

Strings are an area of significant difference between C, C++ and Rust. None of them really agree on how strings work.

The Basic In-Memory String

Let's start with some C (that also works in C++):

#include <stdio.h>

int main() {
    const char * my_string = "Hello, World";
    printf("%s\n", my_string);
    return 0;
}

  • This prints "Hello, World".
  • You are storing my_string as a const char *. It points to an area of memory containing 8-bit ASCII for each character---and a zero at the end.

Here's a Rust equivalent:

fn main() {
    let my_string = "Hello, World";
    println!("{my_string}");
}

Or if you want to use a constant, which always explicitly states the type:

fn main() {
    const MY_STRING: &str = "Hello, World";
    println!("{MY_STRING}");
}

What's up with &str? str is a type that means "a string of characters in memory". Unlike C, it isn't terminated with a zero. Instead, the length is carried alongside the pointer---&str is a "fat pointer" holding both the address of the data and its length.
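Because the length travels with the reference, asking a &str for its size is a constant-time lookup rather than a scan for a terminator. A small sketch:

```rust
fn main() {
    const MY_STRING: &str = "Hello, World";
    // len() reads the stored length; no walk to find a trailing zero.
    println!("{} bytes", MY_STRING.len()); // prints "12 bytes"
}
```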

Let's Throw in Some Unicode!

#include <stdio.h>

int main() {
    const char * my_string = "Hello, 🌎";
    printf("%s\n", my_string);
    return 0;
}

On reasonably recent GCC, this works. The compiler converts 🌎 to the appropriate UTF-8 - a series of bytes.

Rust works the same way:

fn main() {
    const MY_STRING: &str = "Hello, 🌎";
    println!("{MY_STRING}");
}

The only difference is that Rust strings are explicitly UTF-8, not ASCII. Rust's char type is a four-byte Unicode scalar value, and each character in a UTF-8 string occupies anywhere from 1 to 4 bytes. That makes handling code points easier, but it also means that strings aren't collections of plain old 8-bit integers anymore.

How about std::string in C++?

Many C++ programmers have moved towards using std::string---it's generally easier to work with, and less prone to foot-guns.

#include <string>
#include <iostream>

int main() {
    std::string my_string = std::string("Hello, World!");
    std::cout << my_string << std::endl;
    return 0;
}

This also prints Hello, World!. Nothing too revolutionary there.

String Concatenation

In C, you might combine two strings into a new string as follows:

#include <stdio.h>
#include <string.h>

int main() {
    char buffer[64] = "Hello ";
    const char * string2 = "World";

    strcat(buffer, string2);
    printf("%s", buffer);
    return 0;
}

Make buffer too small and you are looking at a segmentation fault - or worse!

In C++, you can add some safety and do this:

#include <string>
#include <iostream>

int main() {
    std::string my_string = std::string("Hello ");
    std::string buffer = my_string + std::string("World");
    std::cout << buffer << std::endl;
    return 0;
}

No segmentation faults here!

Here's a Rust equivalent:

fn main() {
    let mut buffer = String::from("Hello ");
    buffer += "World";
    println!("{buffer}");
}

Two Types of String

Just like C++, Rust has two string types (and a few more we won't talk about until we cover FFI):

  • &str - a reference to a collection of characters in memory. &str is immutable.
  • String - a type holding a collection of characters. String can be mutated.

You can coerce a String into an &str by referencing it: &my_string.
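A common idiom falls out of this: accept &str in function signatures, and callers can pass either form. A sketch (shout is a hypothetical helper):

```rust
// Borrowing &str lets one function accept both string types:
fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    let owned = String::from("hello");
    println!("{}", shout(&owned));  // &String coerces to &str
    println!("{}", shout("world")); // literals are already &str
}
```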

Functions and Scopes

Functions are a mainstay of structured programming. C and C++ both support them:

#include <stdio.h>

void print() {
    printf("Hello, World");
}

int main() {
    print();
}

Does exactly what you expect: it prints "Hello, World". The equivalent Rust is similar:

fn print() {
    println!("Hello, World!");
}

fn main() {
    print();
}

Returning Data from Functions

#include <stdio.h>

int add_one(int i) {
    return i+1;
}

int main() {
    int x = add_one(5);
    printf("%d", x);
}

Here you are declaring a function named add_one, with the return type int. You accept a parameter named i, and return i+1.

The Rust syntax is quite different:

fn add_one(i: i32) -> i32 {
    i + 1
}

fn main() {
    let x = add_one(5);
    println!("{x}");
}

The syntax differences are quite obvious:

  • The return type goes on the end, prefixed with ->.
  • Parameters are declared "name: type", rather than "type name".
  • There's no return statement! By default, Rust functions always return the result of the last expression. In idiomatic Rust, you'll usually see functions declared in this way.

If you miss return, it's still there:

fn add_one(i: i32) -> i32 {
    return i + 1;
}

fn main() {
    let x = add_one(5);
    println!("{x}");
}

Notice that to use return you need to add a semicolon---but the first version didn't have one! Expressions ending with a semicolon become statements that evaluate to the "unit type" (()). So you can either omit the semicolon to have the final expression "fall out" of the function, or you can use return with a semicolon. That's a little confusing, so let's look at some underlying Rust.

In Rust, everything returns.

fn foo() {}

fn main() {
    let i = foo();
    println!("{i:?}");
}

Notice we've used :?, the debug print again.

The program prints (). That's because () is like void - but it has a value (admittedly not a very useful one). So if you assign the result of a statement that ends in a ;, you are setting it to the unit type---which is probably not what you wanted.

Rust also supports expression assignment:

fn main() {
    let i = 5;
    let i = if i < 5 { 1 } else { 0 };
    println!("{i}");
}

Rust doesn't have a ternary operator!

You can assign from an expression or conditional just by returning using the no-semicolon syntax. This works for scopes, too:

fn main() {
    let i = {
        let mut accumulator = 0;
        for i in 0..10 {
            accumulator += i;
        }
        accumulator
    };
    println!("{i}");
}

Note that you can't use the return keyword when you do this---return explicitly returns out of the current function.

How about if I want to return multiple potential values from a function?

You can either make sure that every branch implicitly returns:

fn test(i: i32) -> i32 {
    if i < 10 {
        0
    } else {
        1
    }
}

fn main() {
    println!("{}", test(5));
}

Or you can use early return:

fn test(i: i32) -> i32 {
    if i < 10 {
        return 0;
    }
    1
}

fn main() {
    println!("{}", test(5));
}

Structures

In regular C, you are used to grouping data with structs:

#include <stdio.h>

struct mystruct_t {
    int a;
    int b;
};

int main() {
    struct mystruct_t value = {
        .a=1,
        .b=2
    };
    printf("%d, %d", value.a, value.b);
    return 0;
}

C++ is very similar, albeit with more assignment options:

#include <iostream>

struct mystruct_t {
    int a;
    int b;
};

int main() {
    mystruct_t value = { 1, 2 };
    std::cout << value.a << ", " << value.b << std::endl;
    return 0;
}

Rust is similar, too:

struct MyStruct {
    a: i32,
    b: i32,
}

fn main() {
    let value = MyStruct {
        a: 1,
        b: 2,
    };
    println!("{}, {}", value.a, value.b);
}

Rust will let you use a shortcut to debug print, too:

#[derive(Debug)]
struct MyStruct {
    a: i32,
    b: i32,
}

fn main() {
    let value = MyStruct {
        a: 1,
        b: 2,
    };
    println!("{value:?}");
}

You can even "pretty print":

#[derive(Debug)]
struct MyStruct {
    a: i32,
    b: i32,
}

fn main() {
    let value = MyStruct {
        a: 1,
        b: 2,
    };
    println!("{value:#?}");
}

#[derive] is another type of macro. The compiler will iterate through the structure at compile time, generating a trait implementation of fmt::Debug for you (once again, we'll talk about traits later). It's not quite reflection, but it does a great job of faking it!

Structure Privacy

In C++, struct defaults to all-public, while class defaults to all private. You can control individual members' privacy with public and private sections:

struct MyStruct {
    uint32_t my_public;
private:
    uint32_t my_private;
};

Rust doesn't have classes, but all members default to being private unless you mark them as public with the pub or pub(crate) markers:

struct MyPrivateStruct {} // Structure is private to the module

pub struct MyPublicStruct {
    my_private: u32,
    pub my_public: u32,
    pub(crate) my_public_but_not_exported_from_the_crate: u32,
}

Types of Structure

A "Marker Struct" (one you use to mark a type but that doesn't contain data) may be declared as:

struct MyMarker;

fn main() {
    let s = MyMarker;
}

A regular structure with named fields:

struct Named {
    my_field: i32,
}

fn main() {
    let s = Named { my_field: 3 };
    println!("{}", s.my_field);
}

And a tuple-structure:

struct TupleStruct(i32);

fn main() {
    let s = TupleStruct(3);
    println!("{}", s.0);
}

Structure Functions

Functions can be attached to structures, as either methods or associated functions.

Associated Functions

Associated functions use a structure as a namespace. They are similar to static C++ functions in a class/struct, in that they aren't associated with an instance of a structure---you use the structure as a namespace for accessing them.

Functions are associated with a structure in Rust with an impl block---an implementation block. Associated functions do not take a self parameter referring to an instance.

struct MyStruct {}

impl MyStruct {
    pub fn do_something() {
        // Function body
    }
}

fn main() {
    MyStruct::do_something();
}

Equivalent C++ looks like this:

#include <stdio.h>
#include <stdlib.h>

class MyClass {
    public:
    
    static void do_something() {
        // Function body
    }
};

int main() {
    MyClass::do_something();
    return 0;
}

You can use associated functions as constructors. Constructors aren't special, and you can define as many of them as you want---there's no rule of 3, 5, etc. A constructor is a convention---it's like any other associated function, and by convention it returns an instance of the type that houses it. You can also refer to the current type with the syntax sugar Self:

struct MyStruct {
    value: i32,
}

impl MyStruct {
    fn new() -> MyStruct {
        Self { value: 3 }
    }

    fn with_param(value: i32) -> Self {
        // Syntax sugar: if you are assigning from a variable of the same name
        // and type, you don't need to write "value: value".
        Self { value }
    }
}

A similar C++ constructor would look like this:

class MyClass {
    public:
    
    int value;
    
    MyClass(int n) {
        value = n;
    }
};

int main() {
    auto my_class = MyClass(3);
    return 0;
}

There's no such thing as a move constructor or copy constructor. There's also no "default constructor", but there's a trait that accomplishes the same thing. Here's the short version using a "derive" (a macro that writes code for you):

#[derive(Default, Debug)]
struct MyStruct {
    a: i32,
    b: String
}

fn main() {
    println!("{:?}", MyStruct::default());
}

You can also explicitly implement Default:

#[derive(Debug)]
struct MyStruct {
    a: i32,
    b: String
}

impl Default for MyStruct {
    fn default() -> Self {
        Self {
            a: 3,
            b: "Hello".into(),
        }
    }
}

fn main() {
    println!("{:?}", MyStruct::default());
}

Methods

You can also define functions that operate on an instance of a structure, just like C++ methods. The first parameter declares how the method accesses the instance:

struct MyClass {
    a: i32
}

impl MyClass {
    fn print_me(&self) {
        println!("{}", self.a);
    }
}

fn main() {
    let mc = MyClass { a: 42 };
    mc.print_me();
}

This is equivalent to the C++:

#include <stdio.h>
#include <iostream>

class MyClass {
    public:
    
    int value;
    
    MyClass(int n) {
        value = n;
    }
    
    void print_me() {
        std::cout << this->value << "\n";
    }
};

int main() {
    auto my_class = MyClass(3);
    my_class.print_me();
    return 0;
}

You can replace &self with different types of access to the instance:

  • &self is most common. It provides a read-only (constant) reference that can access the instance but not change it.
  • &mut self grants mutable access via a reference. Your method can change the instance's contents.
  • self moves the instance into the function---it will be consumed if you don't return it. This is useful for "builder pattern" setups. We'll talk about that when we get to the Rust memory model.
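Here's a minimal sketch showing all three receiver kinds in one place (the Counter type and its method names are made up for illustration):

```rust
struct Counter {
    count: i32,
}

impl Counter {
    fn new() -> Self {
        Self { count: 0 }
    }

    // &self: read-only access to the instance
    fn value(&self) -> i32 {
        self.count
    }

    // &mut self: mutate the instance in place
    fn increment(&mut self) {
        self.count += 1;
    }

    // self: consume the instance and return a new one (builder style)
    fn with_count(mut self, count: i32) -> Self {
        self.count = count;
        self
    }
}

fn main() {
    let mut c = Counter::new().with_count(10);
    c.increment();
    println!("{}", c.value()); // prints 11
}
```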

Destructors - Drop

Rust doesn't explicitly define destructors---there's no need to define a destructor in most cases. So you won't encounter ~MyClass() functions.

That doesn't mean that Rust has abandoned RAII---Resource Acquisition Is Initialization. Rather, Rust has adopted it wholesale and associated destructors with a trait called Drop.

Drop is implemented for all of the container types, smart pointers, etc. Whenever a variable leaves scope, its drop function runs before the value is destroyed. Let's explicitly implement Drop to demonstrate this:

struct MyStruct {}

impl Drop for MyStruct {
    fn drop(&mut self) {
        println!("I was dropped");
    }
}

fn main() {
    let a = MyStruct {};
}

Not too surprisingly---the drop function runs at the end of the program. This applies to local variables in a function, too:

struct MyStruct {}

impl Drop for MyStruct {
    fn drop(&mut self) {
        println!("I was dropped");
    }
}

fn do_something() {
    let a = MyStruct{};
}

fn main() {
    println!("Calling function");
    do_something();
    println!("Returned");
}

Dropping works on variables held by a structure even if the structure doesn't itself explicitly implement Drop:

struct MyContainer { data: Vec<MyStruct> }

struct MyStruct {}

impl Drop for MyStruct {
    fn drop(&mut self) {
        println!("I was dropped");
    }
}

fn main() {
    let mc = MyContainer {
        data: vec![MyStruct{}, MyStruct{}, MyStruct{}]
    };
}

So---just like C++---RAII makes it very difficult to accidentally leak memory. We'll go over this in a lot more detail soon.

You can also explicitly drop anything:

struct MyContainer { data: Vec<MyStruct> }

struct MyStruct {}

impl Drop for MyStruct {
    fn drop(&mut self) {
        println!("I was dropped");
    }
}

fn main() {
    let mc = MyContainer {
        data: vec![MyStruct{}, MyStruct{}, MyStruct{}]
    };
    std::mem::drop(mc);
    // Accessing mc is now a compilation error
}

Tuples & Destructuring

Tuples in Rust are a bit easier to use than their C++ cousins. In C++:

std::tuple<std::string, double> tuple = {"Hello", 3.14};
auto s = std::get<0>(tuple);
auto n = std::get<1>(tuple);

In Rust, you can define a tuple with parentheses:

fn main() {
    let tuple = ( "Hello".to_string(), 3.14 );
    let n = tuple.1;
}

Rust also supports destructuring:

fn main() {
    let tuple = { "Hello".to_string(), 3.14 };
    let (name, value) = tuple;
}

Enumerations

In C and C++, enums are tied to a value:

enum Level {
    Low, Medium, High
};
Level n = Low;

You can do the same thing in Rust:

#![allow(unused)]
fn main() {
enum Level {
    Low, Medium, High
}
let n = Level::Low;
}

C and C++ let you assign specific values to enumerations and cast them into numeric types:

enum Level { Low=1, Medium=2, High=3 };
Level n = Medium;
int o = n;

Rust lets you do the same, although converting to a numeric type requires an explicit cast with as:

#![allow(unused)]
fn main() {
enum Level {
    Low = 1, Medium = 2, High = 3
}
let n = Level::Medium;
let o = n as u8;
}

Rust lets you specify the underlying type for numeric Enums:

#![allow(unused)]
fn main() {
#[repr(u8)]
enum Level {
    Low = 1, Medium = 2, High = 3
}
let n = Level::Medium;
}

Enumerations Can Contain Data

Rust enumerations can also contain data. They are effectively a tagged union (a variant, in C++ terms). They will be the size of the largest possible member, plus a discriminant (subject to layout optimizations). The match expression is the best way to access data within an enumeration.

#![allow(unused)]
fn main() {
enum Command {
    DoNothing,
    Count(i32), // A tuple-style enumeration entry
    Search{ term: String, max_depth: i32 }, // A structure-style enumeration entry
}

let c = Command::DoNothing;
let c = Command::Count(12);
let c = Command::Search { term: "term".to_string(), max_depth: 12 };

match c {
    Command::DoNothing => {}
    Command::Count(n) => {
        for _ in 0..n { println!("{n}") }
    }
    Command::Search{ term, max_depth } => {}
}
}

It's important to remember that an enumeration only contains data for the assigned value---it is equal to exactly one option.

Remember we've used Option and Result? They are enumerations, using generics.

Option

Rust doesn't have null values (they exist in FFI code, but not in "safe" Rust). Whenever a value may or may not be present, it is wrapped in an Option. Option is generic (we'll talk about that in the generics section)---it can contain pretty much anything. The declaration for Option looks like this:

#![allow(unused)]
fn main() {
enum Option<T> { None, Some(T) }
}

You've seen unwrap() (a method that returns the option's content, or panics if it's None). You can also access an Option with a match statement:

#![allow(unused)]
fn main() {
match my_option {
    Some(x) => { /* Do something with x */ }
    None => {}
}
}

The "if let" statement is a one-option match. You can use it to destructure an Option. It's often preferable, because there are only two possibilities:

#![allow(unused)]
fn main() {
if let Some(x) = my_option {
    // You can use x here
} else {
    // If you want to do something else
}
}

if let is conditional upon the destructuring succeeding: let Some(x) = my_option is a pattern match, and if it matches, the if let body runs with x bound. You can also use while let to repeat the match, for example against an iterator.
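Here's a quick sketch of while let driving an iterator by hand---the loop runs until next() returns None:

```rust
fn main() {
    let numbers = vec![1, 2, 3];
    let mut iter = numbers.iter();
    // next() returns Some(&item) until the iterator is exhausted, then None.
    while let Some(n) = iter.next() {
        println!("{n}");
    }
}
```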

Result

Result is also an enumeration, indicating a fallible action. It's very similar to the new std::expected in C++. Rust's Result is defined as:

#![allow(unused)]
fn main() {
enum Result<T, E> {
    Ok(T),
    Err(E),
}
}

They are generic (just like options), effectively templated in C++ parlance. When an operation may fail, it returns either Ok(good_value) or Err(error type). We'll talk a lot more about this in Error Handling.
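As a sketch of what a fallible function looks like in practice (divide is a made-up example, not a standard function):

```rust
// A hypothetical fallible operation: returns Ok on success, Err on failure.
fn divide(a: i32, b: i32) -> Result<i32, String> {
    if b == 0 {
        Err("division by zero".to_string())
    } else {
        Ok(a / b)
    }
}

fn main() {
    match divide(10, 2) {
        Ok(n) => println!("Result: {n}"),
        Err(e) => println!("Error: {e}"),
    }
}
```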

Enumerations can have Associated Functions, too

You can use impl blocks with enumerations, too.

enum MyEnum { A, B }

impl MyEnum {
    fn new() -> Self { MyEnum::A } // A constructor
    fn print_me(&self) {
        match self {
            MyEnum::A => println!("The first option!"),
            MyEnum::B => println!("The second option!"),
        }
    }
}

fn main() {
    let e = MyEnum::new();
    e.print_me();
}

Containers

Rust implements a number of container types, similar to C++ standard library types.

Arrays

An array in Rust can be declared as follows:

fn main() {
    let my_array = [1, 2, 3, 4]; // Type is inferred
    let my_array: [u32; 4] = [1, 2, 3, 4]; // Type specified
}

Just like C and C++, arrays are stored on the stack. Similar C++:

#include <array>

int main() {
    int my_array[4] = {1, 2, 3, 4};
    std::array<int, 4> my_array_std = {1, 2, 3, 4};
    return 0;
}

Unlike C, Rust stores the length of the array and bounds-checks accesses. An out-of-bounds access like my_array[5] panics at runtime (and with a constant index, the compiler rejects it outright), rather than exposing adjacent memory.

#include <stdio.h>

int main() {
    int my_array[4] = {1, 2, 3, 4};
    printf("%d", my_array[5]);
    return 0;
}

Prints "32767" on my test system. The equivalent Rust fails to compile, but we can fool it by adding a little arithmetic:

fn main() {
    let array = [1, 2, 3, 4];
    for index in 2..6 {
        println!("{}", array[index]);
    }
}

This panics with the error message:

thread 'main' panicked at src/main.rs:4:24:
index out of bounds: the len is 4 but the index is 4

This is good, safe behavior. (A get_unchecked call exists, and requires an unsafe block, to elide the bounds checking.)
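If you'd rather not panic, arrays and slices also provide get, which returns an Option instead of crashing:

```rust
fn main() {
    let array = [1, 2, 3, 4];
    // get() returns Option<&T>: Some(&value) in bounds, None out of bounds.
    if let Some(value) = array.get(2) {
        println!("{value}"); // prints 3
    }
    assert_eq!(array.get(10), None); // out of bounds: None, no panic
}
```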

Vectors

C programmers sometimes complain that in Rust, everything looks like a vector. They aren't wrong: vectors are everywhere. C++ programmers tend to have vectors everywhere too!

A vector is like an array, but: it can grow, and it is stored on the heap. A C++ vector is typically a pointer to an area of heap memory, plus a record of its size and capacity (the element type is part of the template). Rust vectors are laid out the same way, with the same growth characteristics: when the capacity is exhausted, they double in size.

Let's add some data to a vector and debug-print it:

fn main() {
    let mut my_vec = Vec::new();
    my_vec.push(1);
    my_vec.push(2);
    println!("{my_vec:?}");
}

This is the same as the C++:

#include <stdio.h>
#include <vector>

int main() {
    std::vector<int> my_vec;
    my_vec.push_back(1);
    my_vec.push_back(2);
    for (auto val : my_vec) {
        printf("%d", val);
    }
    return 0;
}

Rust is safe by default on bounds-checking:

fn main() {
    let my_vec = vec![1, 2]; // Helpful macro for initializing vectors
    println!("{}", my_vec[3]);
}

Panics with an out-of-bounds error. The direct-equivalent C++:

#include <stdio.h>
#include <vector>

int main() {
    std::vector<int> my_vec;
    my_vec.push_back(1);
    my_vec.push_back(2);
    printf("%d", my_vec[3]);
    return 0;
}

Prints "0" and terminates normally on my system. (You can use at in C++ for a safe version; C++ typically defaults to unsafe, Rust to safe). Just like an array, get_unchecked is available for unsafe access (with an unsafe tag) if you really need to skip the bounds check.

If you need to pre-allocate a vector, you can use with_capacity:

fn main() {
    let mut my_vec = Vec::with_capacity(100);
    my_vec.push(1);
}

with_capacity generates an empty vector, but with pre-allocated capacity for the number of elements you want to store.
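You can observe the difference directly: len counts elements, while capacity reports the allocated room.

```rust
fn main() {
    let mut my_vec: Vec<i32> = Vec::with_capacity(100);
    println!("len = {}", my_vec.len());           // 0 - no elements yet
    println!("capacity = {}", my_vec.capacity()); // at least 100
    my_vec.push(1);
    println!("len = {}", my_vec.len());           // 1 - still no reallocation
}
```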

HashMap

Rust also includes a HashMap. It doesn't offer any ordering guarantees, and is comparable to std::unordered_map. The standard HashMap is built on the "hashbrown" table implementation, and the default hasher (SipHash) is designed to resist hash-flooding denial-of-service attacks---for raw performance, people sometimes swap in FxHash instead.

use std::collections::HashMap;

fn main() {
    let mut my_map = HashMap::new();
    my_map.insert("Hello".to_string(), 5);
    my_map.insert("World".to_string(), 6);

    if let Some(count) = my_map.get("Hello") {
        println!("{count}");
    }
}

Other Types

Rust implements many other containers:

  • VecDeque - a vector that acts like a queue, similar to deque in C++.
  • LinkedList - a premade linked-list type (writing linked lists in Rust is notoriously hard)
  • BTreeMap - a binary tree map that retains order.
  • HashSet - a direct equivalent to unordered_set.
  • BTreeSet - a set implemented with a binary tree.
  • BinaryHeap - a heap structure.

There are many more available through the crates infrastructure, which we'll cover when we get to dependencies.
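As a quick taste of one of these, here's VecDeque used as a FIFO queue (push at the back, pop from the front):

```rust
use std::collections::VecDeque;

fn main() {
    let mut queue = VecDeque::new();
    queue.push_back("first");
    queue.push_back("second");
    // pop_front removes from the other end, giving first-in, first-out order.
    while let Some(item) = queue.pop_front() {
        println!("{item}");
    }
}
```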

Iterators

Just like C++, Rust uses iterators to provide a rich set of algorithms---and many crates such as IterTools use this to build even more functionality.

For loops in Rust are iterators:

fn main() {
    // These are the same:
    let my_vec = vec![1,2,3,4];
    for n in my_vec {
        println!("{n}");
    }

    let my_vec = vec![1,2,3,4];
    for n in my_vec.into_iter() {
        println!("{n}");
    }
}

These are also consuming iterators - they don't return a reference to the item in the collection, they move it into the loop scope and it is dropped at the end of the scope. You can't use my_vec after iterating with a consuming iterator. To iterate references:

fn main() {
    // These are the same:
    let my_vec = vec![1,2,3,4];
    for n in &my_vec {
        println!("{n}");
    }

    //let my_vec = vec![1,2,3,4];
    // No need to recreate the vector now
    for n in my_vec.iter() {
        println!("{n}");
    }
}

If you prefer a more iterator-based approach, you can also do the following:

fn main() {
    let my_vec = vec![1,2,3,4];
    my_vec.iter().for_each(|n| println!("{n}"));
    // We pass in a closure
}

This is equivalent to the C++:

#include <stdio.h>
#include <vector>
#include <algorithm>

int main() {
    std::vector<int> v = {1, 2, 3, 4};
    std::for_each(v.begin(), v.end(), [](int const& elem) {
        printf("%d\n", elem);
    });
    return 0;
}

Iterators are frequently chained together (much like the new C++ Ranges system). For example:

fn main() {
    let my_vec = vec![1,2,3,4];

    // Create a vector of strings
    let my_new_vec: Vec<String> = my_vec
        .iter()
        .map(|n| n.to_string()) // Map converts each entry to another type
        .collect();

    let max = my_vec.iter().max();
    let min = my_vec.iter().min();
    let sum: u32 = my_vec.iter().sum();
    let count = my_vec.iter().count();

    let sum_with_fold = my_vec.iter().fold(0, |acc, x| acc + x);

    let all_the_numbers: Vec<(&u32, &String)> = my_vec.iter().zip(my_new_vec.iter()).collect();
    println!("{all_the_numbers:?}");

    my_vec
        .iter()
        .filter(|n| **n > 2) // Dereferencing because we have two layers of reference
        .for_each(|n| println!("{n}"));
}

In other words, most of the algorithms in C++ are implemented. There's a lot more on the Rust documentation site: https://doc.rust-lang.org/std/iter/trait.Iterator.html#

We'll talk about parallel iteration later.

Move by Default

Newcomers to Rust are always surprised by this one!

First, a quick C++ quiz. Do you really know what std::move does? When I've taught in person, there's a surprising amount of confusion. Does it create a copy? Can you use the value after you've moved out of it? It's not as clear as it could be.

std::move converts a value to an xvalue---its contents may be plundered by the function it's moved into. The moved-from object is left in a valid but unspecified state: destroying or reassigning it is fine, but relying on its contents is a bug. my_function(std::move(x)); leaves x in a messy state.

Rust is move heavy---so much so that moving is the default for all non-primitive types (primitives are automatically copied, since they fit in a register). In most languages, you'd expect this to compile:

fn do_something(s: String) {
    // Something happens here
}

fn main() {
    let s = "Hello".to_string();
    do_something(s);
    println!("{s}");
}

Instead of compiling, you get a pretty long error message summarized as error[E0382]: borrow of moved value: s. The full error does a great job of explaining what happened:

error[E0382]: borrow of moved value: `s`
 --> src/main.rs:8:15
  |
6 |     let s = "Hello".to_string();
  |         - move occurs because `s` has type `String`, which does not implement the `Copy` trait
7 |     do_something(s);
  |                  - value moved here
8 |     println!("{s}");
  |               ^^^ value borrowed here after move
  |
note: consider changing this parameter type in function `do_something` to borrow instead if owning the value isn't necessary
 --> src/main.rs:1:20
  |
1 | fn do_something(s: String) {
  |    ------------    ^^^^^^ this parameter takes ownership of the value
  |    |
  |    in this function
  = note: this error originates in the macro `$crate::format_args_nl` which comes from the expansion of the macro `println` (in Nightly builds, run with -Z macro-backtrace for more info)
help: consider cloning the value if the performance cost is acceptable
  |
7 |     do_something(s.clone());
  |                   ++++++++

The parameter s: String doesn't borrow or copy---it moves ownership to the function. The String now belongs to the function. It is dropped as soon as the function ends!

You could simply move it back (the compiler will usually optimize the copy away, much as C++ return-value optimization does):

fn do_something(s: String) -> String {
    // Something happens here
    s
}

fn main() {
    let s = "Hello".to_string();
    let s = do_something(s);
    println!("{s}");
}

That's fine if it's the coding style you want, but it's not overly ergonomic. What you want is to borrow the data---make a reference. That's up next.

Borrowing

References in Rust are explicit at both the caller and the callee. Our simple "do something" example with a reference works:

fn do_something(s: &String) {
    // Something happens here
}

fn main() {
    let s = "Hello".to_string();
    do_something(&s);
    println!("{s}");
}

Contrast this with C++, in which you don't specify the & at the call site. It's more typing, but you can't accidentally copy your data when a function signature changes from reference to value.

If you want to allow the function to change/mutate the data, you need to mutably borrow it:

fn do_something(s: &mut String) {
    // Something happens here
    *s += " World";
}

fn main() {
    let mut s = "Hello".to_string();
    do_something(&mut s);
    println!("{s}");
}

Notice that this is even more pedantic: you have to make s mutable before it can be borrowed mutably. And then you have to explicitly mark the borrow &mut. There's no escaping it---Rust makes you specify your intent when you lend the data to another function.

Borrowing Strings

Strings have a special case. A String owns a buffer of characters, and you can immutably refer to that buffer as &str---a string slice. So this also works:

fn do_something(s: &str) {
    // Something happens here
}

fn main() {
    let s = "Hello".to_string();
    do_something(&s);
    println!("{s}");
}

As we saw in the strings section, you can't modify an str buffer---but if all you need to do is print or use the string in a formatting expression (etc.), it can be quicker to just pass the reference.
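One nice consequence: a &str parameter accepts both borrowed Strings (via deref coercion) and string literals. A small sketch (print_it is a made-up name):

```rust
fn print_it(s: &str) {
    println!("{s}");
}

fn main() {
    let owned = "Hello".to_string();
    print_it(&owned);  // &String coerces to &str
    print_it("World"); // string literals are already &str
}
```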

Slices

Slices are analogous to a C++ "view" or "span" type. They refer to a contiguous area of memory, usually inside a collection. You can use iterators on a slice without needing to know the underlying details of the collection. For example:

fn sum(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let my_vec = vec![1,2,3,4,5];
    println!("{}", sum(&my_vec));
}

Vectors and arrays coerce to a slice when you borrow them. You can also use slices to look at just part of a collection:

fn sum(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let my_vec = vec![1,2,3,4,5];
    println!("{}", sum(&my_vec[0..3]));
}

Memory Management

Rust is a real systems language, with proper memory management. No garbage collector here! And with the great power of memory management, comes great responsibility to not accidentally leak your company's secrets or let remote users execute arbitrary code on your system.

Fortunately, Rust prioritizes safety---and also a safety culture, in which Rustaceans strive to create safe code.

C-style allocation and deallocation

Unless you are working on embedded, real-time or other really low-level systems, you probably won't need to manually allocate and de-allocate memory. Rust has very good memory management out-of-the-box, and you can get a long way without needing to worry about it. This section serves:

  • To show you what you can do if you need it.
  • To help you understand why Box, Vec and other types are so useful---and what they actually do.

The Stack and Primitives

"Primitive" types (such as u32, i8 and usize/isize---whose size is the pointer size of your platform) are natively supported by CPUs. You can store them on the stack, copy them between functions and generally not worry about things like ownership, borrowing and lifetimes. In fact, it's often slower to borrow a u32 than it is to copy it. Borrowing creates a pointer, which might be 64-bits in size, whereas the u32 itself is only 32-bits.

So when you are using primitives, you really don't have to worry. The stack will ensure that when a function ends, any variables on the stack will be cleaned up. Arrays on the stack are cleaned up, too.

The stack is limited in size---typically 8 MB for the main thread on Linux, and 2 MB for threads spawned by a Rust program. So you can't put everything there.
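If a thread genuinely needs more stack, you can ask for it when spawning. This sketch uses the standard std::thread::Builder API; the sizes are illustrative:

```rust
fn main() {
    // Request a larger stack for a spawned thread (stack_size is in bytes).
    let handle = std::thread::Builder::new()
        .stack_size(8 * 1024 * 1024) // 8 MiB
        .spawn(|| {
            // A 1 MiB array on the stack - fine with the larger stack.
            let big_array = [0u8; 1024 * 1024];
            big_array.len()
        })
        .unwrap();
    println!("{}", handle.join().unwrap());
}
```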

Manually Allocating & De-allocating Memory

The "heap" is a region of memory that is shared by your program, and doesn't have the size-restrictions of the stack. It is always allocated and de-allocated. In "managed" languages, the language runtime is still allocating to the heap---but it uses a garbage collector of some sort to de-allocate memory that is no longer needed. This has the advantage that you don't need to worry about it, and the disadvantages:

  • You don't know for sure when memory will be allocated. Is it allocated up-front? That's great for systems with a fixed memory size, but not so good for systems where you want to allocate memory on-demand. Is it allocated on first use? That's great for systems where you don't know how much memory you need up-front, but not so good for systems where you want to allocate memory up-front.
  • You don't know for sure when the memory will be de-allocated.
  • You get the infamous "GC pauses" where the program stops for a while to do garbage collection. The pauses might be very short, but it's still an insurmountable problem if you are trying to control the braking system on a sports car!
  • You often have to jump through hoops to use an exact heap size, causing issues on embedded systems.

On some embedded platforms, you pretty much get to start out with a libc implementation (that may not be complete). On others, you get a platform definition file and have to do things the hard way --- we're not going that far!

libc_malloc example

This is in the code/04_mem/libc_malloc folder.

fn allocate_memory_with_libc() {
    unsafe {
        // Allocate memory with libc (one 32-bit integer)
        let my_num: *mut i32 = libc::malloc(std::mem::size_of::<i32>() as libc::size_t) as *mut i32;
        if my_num.is_null() {
            panic!("failed to allocate memory");
        }

        // Set the allocated variable - dereference the pointer and set to 42
        *my_num = 42;
        assert_eq!(42, *my_num);

        // Free the memory with libc - this is NOT automatic
        libc::free(my_num as *mut libc::c_void);
    }
}

fn main() {
    allocate_memory_with_libc();
}

So if you find yourself having to use libc, this is what you can expect: it looks a LOT like C! In your unsafe block, you are calling malloc, checking that it gave you the memory you requested, then setting the value of the memory and finally freeing it.

If you forget to call free, then just like a C program---you leaked memory.

Using Rust's Allocator

Using malloc isn't always as simple as it sounds: you need to worry about memory alignment (lining up memory blocks with your platform's "word size"). Rust provides an allocator setup that you can use instead. It's similar, and still unsafe:

#![allow(unused)]
fn main() {
fn allocate_memory_with_rust() {
    use std::alloc::{alloc, dealloc, Layout};

    unsafe {
        // Allocate memory with Rust. The Layout ensures correct size and alignment.
        let layout = Layout::new::<u16>();
        let ptr = alloc(layout) as *mut u16;
        assert!(!ptr.is_null(), "failed to allocate memory");

        // Set the allocated variable - dereference the pointer and set to 42
        *ptr = 42;
        assert_eq!(42, *ptr);

        // Free the memory - this is not automatic
        dealloc(ptr as *mut u8, layout);
    }
}
}

You have pretty much everything you expect from C: pointer arithmetic, null pointers, forgetting to call dealloc and leaking memory. At this level, it's quite ugly.

RAII - Resource Acquisition is Initialization

This pattern extends to resources in general: memory, files, network sockets, etc. You wrap the resource in a type, and implement Drop to release it. C++ invented this paradigm, and it was an immediate improvement over C:

  • No more goto to cleanup resources.
  • No more forgetting to cleanup resources.

This is why you haven't had to deal with memory or resource management: the RAII pattern is built into Rust, and File, Mutex, Box, String (and friends) all implement Drop in some way to ensure that you don't leak memory or resources.

This example code is in code/04_mem/smart_ptr.

So let's take the memory allocation example and turn it into a "smart pointer"---a pointer that will clean up after itself.

use std::alloc::{Layout, alloc, dealloc};

struct SmartPointer<T> {
    ptr: *mut u8,
    data: *mut T,
    layout: Layout
}

impl <T> SmartPointer<T> {
    fn new() -> SmartPointer<T> {
        println!("Allocating memory for SmartPointer");

        unsafe {
            let layout = Layout::new::<T>();
            let ptr = alloc(layout);

            SmartPointer {
                ptr,
                data: ptr as *mut T,
                layout
            }
        }
    }

    fn set(&mut self, val: T) {
        unsafe {
            *self.data = val;
        }
    }

    fn get(&self) -> &T {
        unsafe {
            self.data.as_ref().unwrap()
        }
    }
}

impl <T> Drop for SmartPointer<T> {
    fn drop(&mut self) {
        println!("Deallocating memory from SmartPointer");
        unsafe {
            dealloc(self.ptr, self.layout);
        }
    }
}

fn main() {
    let mut my_num = SmartPointer::<i32>::new();
    my_num.set(12);
    println!("my_num = {}", my_num.get());
}

Box - Unique Pointer

C++ has the wonderful unique_ptr type. A unique_ptr heap-allocates its contents, wraps them---and automatically deletes them when the pointer leaves scope. Rust has a type called Box that does the same thing.

struct MyStruct {
    n: i32,
}

fn main() {
    let boxed = Box::new(MyStruct { n: 12 });
}

The Rust Box type includes a huge number of options. These range from pinning memory in place (so it can't be rearranged) to building from_raw_parts to wrap an existing pointer in a Box.

Rc and Arc - Shared Pointer

Sometimes, ownership becomes confusing. Particularly if you are sending data off to be processed in more than one thread, you can end up with shared ownership---and exactly when something should be dropped from memory becomes confusing.

Rust has Rc (for "reference counted") as a wrapper type for this. (There's also Arc - atomic reference counted - for multi-threaded situations).

You can turn any variable into a reference-counted variable (on the heap) by wrapping it in Rc:

This is in projects/part2/refcount

use std::rc::Rc;

struct MyStruct {}

impl Drop for MyStruct {
    fn drop(&mut self) {
        println!("Dropping");
    }
}

fn move_it(n: Rc<MyStruct>) {
    println!("Moved");
}

fn ref_it(n: &MyStruct) {
    // Do something
}

fn main() {
    let shared = Rc::new(MyStruct{});
    move_it(shared.clone());
    ref_it(&shared);
}

So we take a reference, move a clone (the Rc type is designed to have clone() called whenever you want a new shared pointer to the original)---and the data is only dropped once. It is shared between all the functions. You can use this to spread data widely between functions.

You can't mutate the contents of an Rc without some additional help.
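That "additional help" is usually interior mutability---for example, wrapping the contents in RefCell from the standard library, which moves borrow checking to runtime. A minimal sketch:

```rust
use std::cell::RefCell;
use std::rc::Rc;

fn main() {
    // RefCell checks borrows at runtime, allowing mutation
    // through any shared Rc handle.
    let shared = Rc::new(RefCell::new(5));
    let other_handle = shared.clone();

    *other_handle.borrow_mut() += 1;
    println!("{}", shared.borrow()); // prints 6
}
```

If two mutable borrows overlap at runtime, RefCell panics rather than allowing aliased mutation.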

Arc is the same thing, but it replaces the reference counter with an atomic---a guaranteed synchronized (and still very fast) thread-safe counter.

use std::sync::Arc;

struct MyStruct {}

impl Drop for MyStruct {
    fn drop(&mut self) {
        println!("Dropping");
    }
}

fn move_it(n: Arc<MyStruct>) {
    println!("Moved");
}

fn ref_it(n: &MyStruct) {
    // Do something
}

fn main() {
    let shared = Arc::new(MyStruct{});
    move_it(shared.clone());
    ref_it(&shared);
}

The Borrow Checker

The borrow checker gets a bad name from people who run into it and discover "I can't do anything!". The borrow checker does take a bit of getting used to - but in the medium term it really helps.

I went through a cycle going from C++ to Rust, and many people I've talked to went through the same:

  • First week or two: I hate the borrow checker! This is awful! I can't do anything!
  • Next: I see how to work within what it wants, I can live with this
  • Then: Wow, I'm writing Rust-like C++ and Go now - and my code is failing less frequently.

The good news is that if you are familiar with Modern C++, you've run into a lot of the same issues that the borrow checker helps with. Let's work through some examples that show how life with Rust is different.

Immutable by Default

This one trips a few people up when they start with Rust. This won't compile:

fn main() {
    let i = 5;
    i += 1;
}

Variables are immutable by default. In C++ terms, you just tried to write:

int main() {
    const int i = 5;
    i += 1;
    return 0;
}

You can make i mutable and it works as you'd expect:

fn main() {
    let mut i = 5;
    i += 1;
}

In other words: C++ and Rust have exactly the opposite defaults. In C++, everything is mutable unless you const it. Rust, everything is immutable unless you mut it.

You could simply declare everything to be mutable. The linter will regularly remind you that things can be immutable. It's considered good Rust style to minimize mutability, so you aren't surprised by mutations.
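One idiom that reduces the need for mut is shadowing: rebinding a name with a fresh let, so each intermediate value stays immutable:

```rust
fn main() {
    let x = 5;       // immutable
    let x = x + 1;   // a new binding shadows the old one
    let x = x * 2;   // each step is still immutable
    println!("{x}"); // prints 12
}
```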

Move by Default

Quick show of hands. Who knows what std::move does? Who really likes std::move?

This one surprises everyone. The following code does what you'd expect:

fn do_it(a: i32) {
    // Do something
}

fn main() {
    let a = 42;
    do_it(a);
    println!("{a}");
}

So why doesn't this work?

fn do_it(a: String) {
    // Do something
}

fn main() {
    let a = String::from("Hello");
    do_it(a);
    println!("{a}");
}

So why did this work with i32? i32 is a primitive, and it implements a trait named Copy. Types implement Copy when a bitwise copy is cheap and valid---for a primitive that fits in a register, it's actually faster to copy it than to pass a pointer to it. This is the same as C++ copying primitive types. When you work with a complex type---String and C++'s std::string are very similar: a length, a capacity, and a heap-allocated buffer of characters (guaranteed UTF-8 in Rust's case)---move semantics take over instead.

The error message borrow of moved value, with a long explanation isn't as helpful as you might like.

The key is: Rust is move by default, and Rust is more strict about moving than C++. Here's what you wrote in C++ terms:

#include <string>

void do_it(std::string s) {
    // Do something
}

int main() {
    std::string s = "Hello";
    do_it(std::move(s));
    // s is now in a valid but unspecified state
    return 0;
}

What happens if you use s? The object is in a valid but unspecified state---relying on its contents is a bug. std::move in C++ converts an object to an xvalue---a type that has "been moved out of", and may or may not hold anything useful. Rust takes this to its logical conclusion, and prevents access to a "moved out of" value entirely.

Moving Values Around

If you want to, you can move variables in and out of functions:

fn do_it(a: String) -> String {
    // Do something
    a
}

fn main() {
    let a = String::from("Hello");
    let a = do_it(a);
    println!("{a}");
}

This code is valid. Moving generates a memcpy that is usually removed by compiler optimizations, and LLVM applies the same return-value optimizations as C++ when returning from a function.

Usually, I recommend moving out of a variable if you are genuinely done with it. Conceptually, you are giving ownership of the object to another function - it's not yours anymore, so you can't do much with it.

This is conceptually very similar to using unique_ptr in C++. The smart pointer owns the contained data. You can move it between functions, but you can't copy it.

Destructors and Moving

In C++, you can have move constructors---and moving structures around can require some thought as move constructors fire. Rust simplifies this. Moving a structure does not fire any sort of constructor. We haven't talked about destructors yet, so let's do that.

In Rust, destructors are implemented by a trait named Drop. You can add Drop to your own types. Let's use this to illustrate the lifetime of a type as we move it around:

The code is in projects/part2/destructors

struct MyStruct {
    s: String
}

impl Drop for MyStruct {
    fn drop(&mut self) {
        println!("Dropping: {}", self.s);
    }
}

fn do_it(a: MyStruct) {
    println!("do_it called");
}

fn move_it(a: MyStruct) -> MyStruct {
    println!("move_it called");
    a
}

fn main() {
    let a = MyStruct { s: "1".to_string() };
    do_it(a);
    // a no longer exists

    let b = MyStruct { s: "2".to_string() };
    let b = move_it(b);
    println!("{}", b.s);
}

As you can see, Drop is called when the structure ceases to be in scope:

  • do_it runs, and receives ownership of the object. The destructor fires as soon as the function exits.
  • move_it runs, and ownership is returned to the caller---the object remains in scope. The destructor fires when b goes out of scope at the end of main.

RAII is central to Rust's safety model. It's used everywhere. I try to remember to credit C++ with its invention every time I mention it!

Borrowing (aka References)

So with that in mind, what if you don't want to move your data around a lot (and pray that the optimizer removes as many memcpy calls as possible)? This introduces borrowing. Here's a very simple function that takes a borrowed parameter:

fn do_it(s: &String) {
    println!("{s}");
}

fn main() {
    let s = "42".to_string();
    do_it(&s);
}

Predictably, this prints 42. The semantics are similar to C++: you indicate a borrow/reference with &. Unlike C++, you have to indicate that you are passing a reference at both the call-site and the function signature---there's no ambiguity (which helps to avoid accidental passing by value/copying). This is the same as the following C++:

#include <string>
#include <iostream>

void do_it(const std::string &s) {
    std::cout << s << std::endl;
}

int main() {
    std::string s = "42";
    do_it(s);
    return 0;
}

Once again, notice that the reference is implicitly immutable.

If you want a mutable borrow---permitted to change the borrowed value---you have to indicate so.

fn do_it(s: &mut String) {
    s.push_str("1");
}

fn main() {
    let mut s = String::from("42");
    do_it(&mut s);
    println!("{s}");
}

Notice that you are:

  • Making s mutable in the let mut declaration. You can't mutably lend an immutable variable.
  • Explicitly decorating the lend as &mut at the call-site.
  • Explicitly borrowing as mutable in the parameters ((s: &mut String)).

Rust doesn't leave any room for ambiguity here. You have to mean it when you allow mutation!

Why Mutability Matters

The borrow checker enforces a very strict rule: a variable can only be borrowed mutably once at a time. You can have as many immutable borrows as you want---but only one current effective owner who can change the variable. This can take a little bit of getting used to.

So this is invalid code:

fn main() {
    let mut i: i32 = 1;
    let ref_i = &mut i;
    let second_ref_i = &mut i;
    println!("{i}");
    println!("{ref_i}");
    println!("{second_ref_i}");
}

The print statements are there to force the borrows to overlap: thanks to non-lexical lifetimes, a borrow ends at its last use, so without them the borrows would never conflict and the example would compile.

Here's an example of some code that triggers borrow-checker rage:

fn main() {
    let mut data = vec![1,2,3,4,5];
    for (idx, value) in data.iter().enumerate() {
        if *value > 3 {
            data[idx] = 3;
        }
    }
    println!("{data:?}");
}

Look at the error message:

error[E0502]: cannot borrow `data` as mutable because it is also borrowed as immutable
 --> src/main.rs:5:13
  |
3 |     for (idx, value) in data.iter().enumerate() {
  |                         -----------------------
  |                         |
  |                         immutable borrow occurs here
  |                         immutable borrow later used here
4 |         if *value > 3 {
5 |             data[idx] = 3;
  |             ^^^^ mutable borrow occurs here

Using an iterator (with .iter()) immutably borrows the vector for the duration of the loop. But when we index into data[idx] to change the value, we're mutably borrowing. Since you can't have a mutable borrow while other borrows exist, this is invalid.

You have to be careful to limit access. This is a good thing: changing an underlying structure while you iterate it risks iterator invalidation. You could rewrite this code a few ways. The most Rustacean way is probably:

Option 1: The Rustacean Iterators Way

fn main() {
    let mut data = vec![1,2,3,4,5];
    data.iter_mut().filter(|d| **d > 3).for_each(|d| *d = 3);
    println!("{data:?}");
}

This is similar to how you'd do it with the ranges-v3 library or the C++20 ranges feature. You are pipelining:

  • You obtain a mutable iterator (it will pass an &mut reference to each entry in turn).
  • You filter the target records with a predicate. |d| **d > 3 is a closure (lambda function) - d is the parameter, which will arrive as &&mut because the iterator takes a reference (&mut) and the filter then passes a reference to the reference. (Good news: the compiler cleans that up. I still think it's ugly!)
  • Then you run for_each on the remaining entries.

That's great for problems that naturally fit into an iterator solution.

Option 2: Do the two-step

Another option is to separate the operations:

fn main() {
    let mut data = vec![1,2,3,4,5];
    let mut to_fix = Vec::new();
    for (idx, value) in data.iter().enumerate() {
        if *value > 3 {
            to_fix.push(idx);
        }
    }
    for idx in to_fix { // Note: no .iter(). We're *moving* each entry out, consuming to_fix
        data[idx] = 3;
    }
    println!("{data:?}");
}

This is pretty typical: you "beat" the borrow checker by breaking your task down into specific stages. In this case, we avoided a potential iterator invalidation. We also made it a lot easier for the compiler to perform static analysis and prevent data races.

Dangling Pointers

The borrow checker prevents a lot of dangling pointer and reference errors. For example:

fn main() {
    let s = String::from("Hello");
    let s_ref = &s;
    std::mem::drop(s);
    println!("{s_ref}");
}

Dropping s ends its existence (it's analogous to delete in C++---the destructor still runs). Trying to use s_ref after s is dropped is a compiler error: s no longer exists. Try the same in C++ and you don't get any warning by default (though most static analysis tools will catch this):

#include <iostream>
#include <string>

int main() {
    std::string * s = new std::string("Hello");
    delete s;
    std::cout << *s << std::endl;
}

Summary

The borrow checker does take some getting used to, but it's surprising how long you can go without running into it if you write idiomatic, straightforward code. It's especially hard coming from C++, which allows you to get by with a lot.

In this section, we've covered:

  • Move by default, and Rust catching all "use after move" errors at compile time.
  • Explicit borrowing, and no more "oops, I copied by value by mistake".
  • Explicit mutability, to avoid surprises.
  • The "one mutable access at a time" rule, which prevents hidden bugs like iterator invalidation.
  • No more dangling pointers/references --- but still no garbage collector.

Now let's look at the second half of the borrow checker, lifetimes.

Lifetimes

The borrow checker not only tracks borrows, it attaches a lifetime to every borrow.

In very early versions of Rust, you had to annotate every reference with a lifetime. Be glad you don't have to do this anymore! Code could look like this:

fn do_it<'a>(s: &'a String) {
    println!("{s}");
}

fn main() {
    let s = String::from("Hello");
    do_it(&s);
}

This is still valid Rust, but in most cases Rust is able to elide the lifetime and deduce an "anonymous lifetime" for reference usage. Let's look at the annotated code:

  • do_it<'a> introduces a new lifetime, named 'a. You can name lifetimes whatever you want, but it's common to use short names.
  • In the arguments, s: &'a String states that the borrowed String adheres to lifetime 'a.

What's really happening here? Rust is tracking that when you call do_it, a lifetime is created for the borrow. The object being pointed at must live at least as long as that lifetime. Violating this is a compiler error.

Escaping References

Returning a reference to a local variable is a really common idiom in Go. The Go compiler will detect that you're referencing a local variable (via escape analysis), hoist it to the heap without telling you, and let you have your reference.

This compiles in C++:

#include <iostream>
using namespace std;

int& bar()
{
    int n = 10;
    return n;
}

int main() {
    int& i = bar();
    cout<<i<<endl;
    return 0;
}

The code does generate a warning, but it actually functioned on 2 of the 3 systems I tried it on! Rust is not so forgiving:

fn do_it() -> &String {
    let s = String::from("Hello");
    &s
}

fn main() {
    let s = do_it();
}

Rust starts by telling you that you need a lifetime specifier, and suggests a special lifetime called 'static. 'static is a special lifetime in which you promise that a reference will live forever, so Rust doesn't need to worry about it. So let's try that:

fn do_it() -> &'static String {
    let s = String::from("Hello");
    &s
}

fn main() {
    let s = do_it();
}

It still doesn't compile, this time with the correct error: cannot return a reference to local variable.

The borrow checker prevents this problem.

Returning References

What if you actually do want to return a valid reference? This function won't compile without lifetime specifiers.

fn largest<'a>(a: &'a i32, b: &'a i32) -> &'a i32 {
    if a > b {
        a
    } else {
        b
    }
}

fn main() {
    let a = 1;
    let b = 2;
    let ref_to_biggest = largest(&a, &b);
    println!("{ref_to_biggest}");
}

You have to clarify to Rust that the function can assume that both references will share a lifetime with the function output. So now for the returned reference to remain valid, both inputs also have to remain valid. (In this example, we're using a type that would be better off being copied anyway!)

Keeping References

Life starts to get complicated when you want to keep references around. Rust has to validate the lifetimes of each of these references.

struct Index {
    selected_string: &String
}

fn main() {
    let strings = vec![
        String::from("A"),
        String::from("B"),
    ];
    let index = Index {
        selected_string: &strings[1]
    };
    println!("{}", index.selected_string);
}

This fails to compile, but the compiler error tells you what needs to be done. So we apply its suggestions:

struct Index<'a> {
    selected_string: &'a String
}

fn main() {
    let strings = vec![
        String::from("A"),
        String::from("B"),
    ];
    let index = Index {
        selected_string: &strings[1]
    };
    println!("{}", index.selected_string);
}

And that works! You've tied the structure to the lifetime of the references it holds. If the strings table goes away, then the Index is invalid. Rust won't let this compile:

struct Index<'a> {
    selected_string: &'a String
}

fn main() {
    let index = {
        let strings = vec![
            String::from("A"),
            String::from("B"),
        ];
        let index = Index {
            selected_string: &strings[1]
        };
        index
    };
    println!("{}", index.selected_string);
}

The error message helpfully explains that strings does not live long enough---which is true. This is the primary purpose of the borrow checker: dangling references become a compile-time error, rather than a long head-scratching session at runtime.

Concurrency

Rust makes a big deal about advertising "fearless concurrency". What does this actually mean?

  • Concurrency primitives that aren't too painful to work with.
  • Data races will not compile.

Data-Race Protection

Rust makes the bold claim that it offers "fearless concurrency" and no more data-races (within a program; it can't do much about remote calls). That's a very bold claim, and one I've found to be true so far---I'm much more likely to contemplate writing multi-threaded (and async) code in Rust now that I understand how it prevents me from shooting myself in the foot.

An Example of a Data Race

Here's a little modern C++ program with a very obvious data-racing problem (it's in the cpp/data_race directory):

#include <thread>
#include <iostream>

int main() {
    int counter = 0;
    std::thread t1([&counter]() {
        for (int i = 0; i < 1000000; ++i) {
            ++counter;
        }
    });
    std::thread t2([&counter]() {
        for (int i = 0; i < 1000000; ++i) {
            ++counter;
        }
    });
    t1.join();
    t2.join();

    std::cout << counter << std::endl;

    return 0;
}

The program compiled and ran without any warnings (although additional static analysis programs would probably flag this).

The program fires up two threads. Each loops, incrementing a counter. It joins the threads, and prints the result. The predictable result is that every time I run it, I get a different result: 1015717, 1028094, 1062030 from my runs.

This happens because incrementing an integer isn't a single-step operation:

  1. The CPU loads the current counter value, into a register.
  2. The CPU increments the counter.
  3. The CPU writes the counter back into memory.

There's no guarantee that the two threads won't perform these operations while the other thread is also part-way through the same operation. The result is data corruption.

Let's try the same thing in Rust. We'll use "scoped threads" (we'll be covering threading in a later session) to make life easier for ourselves. Don't worry about the semantics yet:

fn main() {
    let mut counter = 0;
    std::thread::scope(|scope| {
        let t1 = scope.spawn(|| {
            for _ in 0 .. 1000000 {
                counter += 1;
            }
        });
        let t2 = scope.spawn(|| {
            for _ in 0 .. 1000000 {
                counter += 1;
            }
        });
        let _ = t1.join();
        let _ = t2.join(); // let _ means "ignore" - we're ignoring the result type
    });
    println!("{counter}");
}

And now you see the beauty behind the "single mutable access" rule: the borrow checker prevents the program from compiling, because the threads are mutably borrowing the shared variable. No data race here!

Atomics

If you've used std::thread, you've probably also run into atomic types. An atomic operation is guaranteed to complete as a single, indivisible operation, with optional ordering guarantees between cores. The following C++ program makes use of an std::atomic_int to always give the correct result:

#include <thread>
#include <iostream>
#include <atomic>

int main() {
    std::atomic_int counter = 0;
    std::thread t1([&counter]() {
        for (int i = 0; i < 1000000; ++i) {
            ++counter;
        }
    });
    std::thread t2([&counter]() {
        for (int i = 0; i < 1000000; ++i) {
            ++counter;
        }
    });
    t1.join();
    t2.join();

    std::cout << counter << std::endl;

    return 0;
}

Rust gives you a similar option:

This code is in projects/part2/atomics

use std::sync::atomic::Ordering::Relaxed;
use std::sync::atomic::AtomicU32;

fn main() {
    let counter = AtomicU32::new(0);
    std::thread::scope(|scope| {
        let t1 = scope.spawn(|| {
            for _ in 0 .. 1000000 {
                counter.fetch_add(1, Relaxed);
            }
        });
        let t2 = scope.spawn(|| {
            for _ in 0 .. 1000000 {
                counter.fetch_add(1, Relaxed);
            }
        });
        let _ = t1.join();
        let _ = t2.join(); // let _ means "ignore" - we're ignoring the result type
    });
    println!("{}", counter.load(Relaxed));
}

So Rust and C++ are equivalent in functionality. Rust is a bit more pedantic---making you specify the memory ordering (the orderings are taken from the C++ memory model!). Rust's benefit is that the unsafe, non-atomic version fails to compile---otherwise the two are very similar.

Why Does This Work?

So how does Rust know that it isn't safe to mutate a shared integer---but it is safe to share an atomic? Rust has two marker traits that are implemented automatically (and can be overridden in unsafe code): Sync and Send.

  • A Sync type can be safely accessed from multiple threads through shared references---typically because it has a synchronization primitive, or because it's immutable.
  • A Send type can safely be sent between threads---it isn't going to do bizarre things because ownership moved to another thread.

A regular integer is both Send and Sync, but mutating it requires an exclusive (&mut) borrow---and the borrow checker only permits one of those. An atomic integer can be modified through a shared reference, so multiple threads can safely update it at once.

Rust provides atomics for all of the primitive types, but does not provide a general Atomic wrapper for other types. Rust's atomic primitives are pretty much a 1:1 match with CPU intrinsics, which don't generally offer sync+send atomic protection for complicated types.

Mutexes

If you want to provide similar thread-safety for complex types, you need a Mutex. Again, this is a familiar concept to C++ users.

Using a Mutex in C++ works like this:

#include <iostream>
#include <thread>
#include <mutex>

int main() {
    std::mutex mutex;
    int counter = 0;
    std::thread t1([&counter, &mutex]() {
        for (int i = 0; i < 1000000; ++i) {
            std::lock_guard<std::mutex> guard(mutex);
            ++counter;
        }
    });
    std::thread t2([&counter, &mutex]() {
        for (int i = 0; i < 1000000; ++i) {
            std::lock_guard<std::mutex> guard(mutex);
            ++counter;
        }
    });
    t1.join();
    t2.join();

    std::cout << counter << std::endl;

    return 0;
}

Notice how using the Mutex is a multi-step process:

  1. You declare the mutex as a separate variable to the data you are protecting.
  2. You create a lock_guard by initializing the lock with lock_guard's constructor, taking the mutex as a parameter.
  3. The lock is automatically released when the guard leaves scope, using RAII.

This works, and always gives the correct result. It has one inconvenience that can lead to bugs: there's no enforcement that makes you remember to use the lock. You can get around this by building your own type and enclosing the update inside it---but the compiler won't help you if you forget. For example, commenting out one of the mutex locks won't give any compiler warnings.

Let's build the same thing, in Rust. The Rust version is a bit more complicated:

This code is in projects/part2/mutex

use std::sync::{Arc, Mutex};

fn main() {
    let counter = Arc::new(Mutex::new(0));
    std::thread::scope(|scope| {
        let my_counter = counter.clone();
        let t1 = scope.spawn(move || {
            for _ in 0 .. 1000000 {
                let mut lock = my_counter.lock().unwrap();
                *lock += 1;
            }
        });

        let my_counter = counter.clone();
        let t2 = scope.spawn(move || {
            for _ in 0 .. 1000000 {
                let mut lock = my_counter.lock().unwrap();
                *lock += 1;
            }
        });
        let _ = t1.join();
        let _ = t2.join(); // let _ means "ignore" - we're ignoring the result type
    });
    let lock = counter.lock().unwrap();
    println!("{}", *lock);
}

Let's work through what's going on here:

  1. let counter = Arc::new(Mutex::new(0)); is a little convoluted.
    1. Mutexes in Rust wrap the data they are protecting, rather than being a separate entity. This makes it impossible to forget to lock the data---you don't have access to the interior without obtaining a lock.
    2. Mutex provides the Sync trait---it can be safely accessed from multiple threads through shared references, because every access has to go through the lock. But a Mutex by itself still has a single owner.
    3. To share ownership between threads, we also wrap the whole thing in an Arc. Arc is "atomic reference count"---it's just like an Rc, but uses an atomic for the reference counter, so the count itself is thread-safe. Using an Arc ensures that there's only a single counter, with safe access to it from every clone.
    4. Note that counter isn't mutable---despite the fact that it is mutated. This is called interior mutability. The exterior doesn't change, so it doesn't have to be mutable. The interior can be changed via the Arc and the Mutex---which is protected by the Sync+Send requirement.
  2. Before each thread is created, we call let my_counter = counter.clone();. We're making a clone of the Arc, which increments the reference count and returns a shared pointer to the enclosed data. Arc is designed to be cloned every time you want another reference to it.
  3. When we start the thread, we use the let t1 = scope.spawn(move || { pattern. Notice the move. We're telling the closure not to capture references, but instead to move captured variables into the closure. We've made our own clone of the Arc, and it's the only variable we are referencing---so it is moved into the thread's scope. This ensures that the borrow checker doesn't have to worry about trying to track access to the same reference across threads (which won't work). Sync+Send protections remain in place, and it's impossible to use the underlying data without locking the mutex---so all of the protections are in place.
  4. let mut lock = my_counter.lock().unwrap(); locks the mutex. It returns a Result, so we're unwrapping it (we'll talk about why later). The lock itself is mutable, because we'll be changing its contents.
  5. We access the interior variable by dereferencing the lock: *lock += 1;

So C++ wins slightly on ergonomics, and Rust wins on preventing you from making mistakes!

Summary

Rust's data race protection is very thorough. The borrow-checker prevents multiple mutable accesses to a variable, and the Sync+Send system ensures that variables that are accessed in a threaded context can both be sent between threads and safely mutated from multiple locations. It's extremely hard to create a data race in safe Rust (you can use the unsafe tag and turn off protections if you need to)---and if you succeed in making one, the Rust core team will file it as a bug.

All of these safety guarantees add up to create an environment in which common bugs are hard to create. You do have to jump through a few hoops, but once you are used to them---you can fearlessly write concurrent code knowing that Rust will make the majority of multi-threaded bugs a compilation error rather than a difficult debugging session.

Spawning Threads

In main.rs, replace the contents with the following:

fn hello_thread() {
    println!("Hello from thread!");
}

fn main() {
    println!("Hello from main thread!");

    let thread_handle = std::thread::spawn(hello_thread);
    thread_handle.join().unwrap();
}

Now run the program:

Hello from main thread!
Hello from thread!

So what's going on here? Let's break it down:

  1. The program starts in the main thread.
  2. The main thread prints a message.
  3. We create a thread using std::thread::spawn and tell it to run the function hello_thread.
  4. The return value is a "thread handle". You can use these to "join" threads---wait for them to finish.
  5. We call join on the thread handle, which waits for the thread to finish.

What happens if we don't join the thread?

Run the program a few times. Sometimes the secondary thread finishes, sometimes it doesn't. Threads don't outlive the main program, so if the main program exits before the thread finishes, the thread is killed.

Spawning Threads with Parameters

The spawn function takes a function without parameters. What if we want to pass parameters to the thread? We can use a closure:

fn hello_thread(n: u32) {
    println!("Hello from thread {n}!");
}

fn main() {
    let mut thread_handles = Vec::new();
    for i in 0 .. 5 {
        let thread_handle = std::thread::spawn(move || hello_thread(i));
        thread_handles.push(thread_handle);
    }
    thread_handles.into_iter().for_each(|h| h.join().unwrap());
}

Notice three things:

  • We're using a closure---an inline function that can capture variables from the surrounding scope.
  • We've used the shorthand format for closure: || code - parameters live in the || (there aren't any), and a single statement goes after the ||. You can use complex closures with a scope: |x,y| { code block }.
  • The closure says move. Remember when we talked about ownership? You have to move variables into the closure, so the closure gains ownership of them. The ownership is then passed to the thread. Otherwise, you have to use some form of synchronization to ensure that data is independently accessed---to avoid race conditions.

The output will look something like this (the order of the threads will vary):

Hello from thread 0!
Hello from thread 2!
Hello from thread 1!
Hello from thread 4!
Hello from thread 3!

In this case, as we talked about last week in Rust Fundamentals, integers are copyable (they implement Copy). So you don't have to do anything too fancy to share them.

Returning Data from Threads

The thread handle will return any value returned by the thread. It's generic, so it can be of any type (that supports Send; we'll cover that later). Each thread has its own stack, and can make normal variables inside the thread---and they won't be affected by other threads.

Let's build an example:

fn do_math(i: u32) -> u32 {
    let mut n = i+1;
    for _ in 0 .. 10 {
        n *= 2;
    }
    n
}

fn main() {
    let mut thread_handles = Vec::new();
    for i in 0..10 {
        thread_handles.push(std::thread::spawn(move || {
            do_math(i)
        }));
    }

    for handle in thread_handles {
        println!("Thread returned: {}", handle.join().unwrap());
    }
}

This returns:

Thread returned: 1024
Thread returned: 2048
Thread returned: 3072
Thread returned: 4096
Thread returned: 5120
Thread returned: 6144
Thread returned: 7168
Thread returned: 8192
Thread returned: 9216
Thread returned: 10240

Notice that each thread is doing its own math, and returning its own value. The join function waits for the thread to finish, and returns the value from the thread.

Dividing Workloads

We can use threads to divide up a workload. Let's say we have a vector of numbers, and we want to add them all up. We can divide the vector into chunks, and have each thread add up its own chunk. Then we can add up the results from each thread.

fn main() {
    const N_THREADS: usize = 8;

    let to_add: Vec<u32> = (0..5000).collect(); // Shorthand for building a vector [0,1,2 .. 4999]
    let mut thread_handles = Vec::new();
    let chunks = to_add.chunks(to_add.len() / N_THREADS);

    // Notice that each chunk is a *slice* - a reference - to part of the array.    
    for chunk in chunks {
        // So we *move* the chunk into its own vector, taking ownership and
        // passing that ownership to the thread. This adds a `memcpy` call
        // to your code, but avoids ownership issues.
        let my_chunk = chunk.to_owned();

        // Each thread sums its own chunk. You could use .sum() for this!
        thread_handles.push(std::thread::spawn(move || {
            let mut sum = 0;
            for i in my_chunk {
                sum += i;
            }
            sum
        }));
    }

    // Sum the sums from each thread.
    let mut sum = 0;
    for handle in thread_handles {
        sum += handle.join().unwrap();
    }
    println!("Sum is {sum}");
}

There's a lot to unpack here, so I've added comments:

  1. We use a constant to define how many threads we want to use. This is a good idea, because it makes it easy to change the number of threads later. We'll use 8 threads, because my laptop happens to have 8 cores.
  2. We create a vector of numbers to add up. We use the collect function to build a vector from an iterator. We'll cover iterators later, but for now, just know that collect builds a vector from a range. This is a handy shorthand for turning any range into a vector.
  3. We create a vector of thread handles. We'll use this to join the threads later.
  4. We use the chunks function to divide the vector into chunks (note that chunks takes the desired chunk size, not the number of chunks). This returns an iterator, so we can use it in a for loop. Chunks aren't guaranteed to be of equal size, but they're as close to equal as possible; the last chunk may be smaller than the others.
  5. Now we hit a problem:
    • chunks is a vector owned by the main thread.
    • Each chunk is a slice --- a borrowed reference --- to part of the vector.
    • We can't pass a borrowed reference to a (non-scoped) thread, because the thread might outlive the data it borrows. There's no guarantee that the order of execution will ensure that the data is destroyed in a safe order.
    • Instead, we use to_owned which creates an owned copy of each chunk. This is a memcpy operation, so it's not free, but it's safe.

This is a common pattern when working with threads. You'll often need to move data into the thread, rather than passing references.

Moving chunks like this works fine, but if you are using threads to divide up a heavy workload with a single answer --- there's an easier way!

Scoped Threads

In the previous example we divided our workload into chunks and then took a copy of each chunk. That works, but it adds some overhead. Rust has a mechanism to assist with this pattern (it's a very common pattern): scoped threads.

Let's build an example:

use std::thread;

fn main() {
    const N_THREADS: usize = 8;

    let to_add: Vec<u32> = (0..5000).collect();
    let chunks = to_add.chunks(to_add.len() / N_THREADS);
    let sum = thread::scope(|s| {
        let mut thread_handles = Vec::new();

        for chunk in chunks {
            let thread_handle = s.spawn(move || {
                let mut sum = 0;
                for i in chunk {
                    sum += i;
                }
                sum
            });
            thread_handles.push(thread_handle);
        }

        thread_handles
            .into_iter()
            .map(|handle| handle.join().unwrap())
            .sum::<u32>()
    });
    println!("Sum is {sum}");
}

This is quite similar to the previous example, but we're using scoped threads. When you use thread::scope you are creating a thread scope. Any threads you spawn with the s parameter are guaranteed to end when the scope ends. You can still treat each spawned thread just like a normal thread.

Because the threads are guaranteed to terminate, you can safely borrow data from the parent scope. This is a lifetime issue: a normal thread could keep running long after the scope that launched it has ended---so borrowing data from that scope would be a bug (and a common cause of crashes and data corruption in other languages). Rust won't let you do that. But since scoped threads guarantee their lifetime, you can borrow data from the parent scope without having to worry about it.

This pattern is perfect for when you want to fan out a workload to a set of calculation threads, and wait to combine them into an answer.

Making it Easy with Rayon

A library named "Rayon" is the gold-standard for easy thread-based concurrency in Rust. It actually uses another crate (crossbeam) under the hood, but it provides a much simpler interface for the most common use cases. Rayon can help you with a lot of tasks. Let's work through using it.

Parallel Iterators

Let's start by adding Rayon to the project:

cargo add rayon

Probably the nicest addition Rayon brings is par_iter. The majority of things you can do with an iterator, you can auto-parallelize with par_iter. For example:

use rayon::prelude::*;

fn main() {
    let numbers: Vec<u64> = (0 .. 1_000_000).collect();
    let sum = numbers.par_iter().sum::<u64>();
    println!("{sum}");
}

Rayon creates a thread-pool (1 thread per CPU), with a job queue. The queue implements work-stealing (no idle threads), and supports "sub-tasks" - a task can wait for another task to complete. It really is as simple as using par_iter() (for an iterator of references), par_iter_mut() (for an iterator of mutable references), or into_par_iter() (for an owning iterator that moves the values).

Let's do another test, this time with nested tasks. We'll use a really inefficient function for finding prime numbers:

use std::time::Instant;
use rayon::prelude::*;

fn is_prime(n: u32) -> bool {
    (2 ..= n/2).into_par_iter().all(|i| n % i != 0 )
}

fn main() {
    // Print primes below 10,000
    let now = Instant::now();
    let numbers: Vec<u64> = (2 .. 10_000).collect();
    let mut primes: Vec<&u64> = numbers.par_iter().filter(|&n| is_prime(*n as u32)).collect();
    primes.sort();
    let elapsed = now.elapsed();
    println!("{primes:?}");
    println!("It took {} us to find {} primes", elapsed.as_micros(), primes.len());
}

Workspaces, Crates, Programs, Libraries and Modules

Let's talk about some terminology:

  • A crate is a Rust package. It can either be a program or a library---it's a package of code managed by Cargo.
  • A program is an executable. A crate produces a program if it has a main.rs file and an entry point---usually a main function (you can change the entry point's name, but there does need to be one).
  • A library is a crate with a lib.rs file. It compiles as a static library by default, you can override this if you need dynamic libraries (Rust is very much oriented towards self-contained statically linked systems).
  • A module is a unit-of-work for the compiler. Programs and libraries are divided into modules.
  • A workspace is a Cargo helper that lets you include multiple crates in one environment with a shared compilation target directory and better incremental compilation.

This is quite unlike C++'s system. #include is almost a cut-and-paste; the new C++20 modules system is a bit more similar---but I had trouble getting it to work consistently across platforms.

Workspaces

The example code uses a workspace, and I'd encourage you to do the same. Workspaces are a great mechanism for storing related code together.

Let's create a workspace.

  1. cd to your parent directory.
  2. Create a new Rust project with cargo new my_workspace.
  3. cd into my_workspace.
  4. Edit src/main.rs to change "Hello, World!" to something like "You probably intended to run a workspace member". This is optional, but helps avoid confusion.
  5. While in my_workspace, create a new project. cargo new hello.
  6. Edit my_workspace/Cargo.toml:
[workspace]
members = [ "hello" ]

Now change directory to my_workspace/hello and run the program with cargo run.

Take a look at my_workspace and you will see that a target directory has appeared. Within a workspace, all compiler artifacts are shared. For large projects, this can save a huge amount of disk space. It can also save on re-downloading dependencies, and will only recompile portions of the workspace that have changed.

While working on Hands-on Rust, I initially had 55 projects in separate crates without a workspace. I noticed that my book's code folder was using nearly 6 gigabytes of disk space, which was crazy. So I added a workspace, and that shrunk to a few hundred megabytes. Every single project was downloading all of the dependencies and building them separately.

Workspaces are safe to upload to GitHub or your preferred Git host. You can even access dependencies within a workspace remotely (we'll cover that in dependencies).

Libraries

Let's work through creating our first library. Keep the my_workspace and hello projects.

Change directory back to the workspace root (my_workspace/). Create a new library project:

cargo new hello_library --lib

Notice the --lib flag. You are creating a library.

Open my_workspace/Cargo.toml and add hello_library as a workspace member:

[workspace]
members = [ "hello", "hello_library" ]

Now open hello_library/src/lib.rs. Notice that Rust has auto-generated an example unit test system. We'll cover that in unit tests shortly. For now, delete it all and replace with the following code:

pub fn say_hello() {
    println!("Hello, world!");
}

The pub marks the function as "public"---available from outside the current module. Since it is in lib.rs, it will be exported in the library.

Now open hello/Cargo.toml and we'll add a dependency:

[dependencies]
hello_library = { path = "../hello_library" }

And open hello/src/main.rs and we'll use the dependency. Replace the default code with:

use hello_library::say_hello;

fn main() {
    say_hello();
}

Congratulations! You've made your first statically linked library.

Modules and Access

Rust can subdivide code into modules. Modules can be public or private (private is the default), and can themselves contain other public and private items. Coming from C++, I found this a little confusing. You can also create modules in-place (like namespaces) or in separate files. Let's work through some examples.

Inline Module (Namespace)

Open hello_library/src/lib.rs. Let's add a private module:

mod private {
    fn hi() {
        println!("Say Hi!");
    }
}

pub fn say_hello() {
    println!("Hello, world!");
}

If you try to use private::hi() in your hello/src/main.rs program---it won't work. The module and the function are both private:

use hello_library::say_hello;

fn main() {
    say_hello();
    hello_library::private::hi(); // Will not compile
}

You can fix this by changing the module to be public:

pub mod private {
    fn hi() {
        println!("Say Hi!");
    }
}

pub fn say_hello() {
    println!("Hello, world!");
}

And it still doesn't work! That's because making a module public only exposes the public members of the module. So you also need to decorate the function as public:

pub mod private {
    pub fn hi() {
        println!("Say Hi!");
    }
}

pub fn say_hello() {
    println!("Hello, world!");
}

So that allows you to make a public namespace---and include private parts in the namespace that aren't exposed to the world. What if you want to write a function in a module, and expose it in a different namespace?

pub mod private {
    pub fn hi() {
        println!("Say Hi!");
    }
}

pub use private::hi;

pub fn say_hello() {
    println!("Hello, world!");
}

The use statement---importing something into the current namespace---can also be decorated with pub to re-export that import. You can use this with dependencies or your own modules. (It's common to make a prelude module and re-export the most useful functions and types into it.) Now your program can refer to hello_library::hi directly.
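Here's a single-file sketch of that prelude pattern (the shapes module and area function are hypothetical names of my own, shown inside one binary for brevity):

```rust
mod shapes {
    pub fn area(w: f32, h: f32) -> f32 {
        w * h
    }
}

mod prelude {
    // `pub use` re-exports `area`, so consumers can write
    // `use prelude::*;` instead of digging into submodules.
    pub use crate::shapes::area;
}

use prelude::*;

fn main() {
    println!("{}", area(2.0, 3.0)); // prints 6
}
```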

File-based modules

If you're working in a team, it's usually a good idea to not all be trying to edit the same file at once. There are other advantages to using multiple files:

  • Rust can compile multiple files at the same time.
  • Organizing your code with files makes it a lot easier to find things.
  • You can use conditional compilation to include different files based on compilation constraints.

Let's make a one-file module. In hello_library/src create a new file named goodbye.rs. In that file, write:

pub fn bye() {
    println!("Goodbye");
}

Simply having the file doesn't make it do anything, or part of your project. In hello_library/src/lib.rs add a line to include the module:

mod goodbye;

The module is private, even though the bye function is public! You can access bye elsewhere in your library, but not from consumer applications. You can use the same mechanisms as for inline modules to change that: pub mod goodbye; exports it as hello_library::goodbye (the filename is the namespace), or you can pub use goodbye::bye.

Directory modules

The final type of module places the module in a directory. The directory must contain a mod.rs file to act as the module root---and can include other files or inline modules as above.

Create a new directory, hello_library/src/dirmod. In that directory, create mod.rs:

pub fn dir_hello() {
    println!("Hello from dir module");
}

Now in hello_library/src/lib.rs include the new module:

pub mod dirmod;

You can now access the module in your hello project, with hello_library::dirmod::dir_hello().

Traits

You've used traits a lot---they are an important part of Rust. But we haven't really talked about them.

Implementing Traits

Whenever you've used #[derive(Debug, Clone, Serialize)] and similar---you are using procedural macros to implement traits. We're not going to dig into procedural macros---they are worthy of their own class---but we will look at what they are doing.

Debug is a trait. The derive macro is implementing the trait for you (including identifying all of the fields to output). You can implement it yourself:

use std::fmt;

struct Point {
    x: i32,
    y: i32,
}

impl fmt::Debug for Point {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("Point")
         .field("x", &self.x)
         .field("y", &self.y)
         .finish()
    }
}

Traits are an interface. Each trait defines functions that must be implemented to apply the trait to a type. Once you implement the trait, you can use the trait's functions on the type---and you can also use the trait as a type.

Making a Trait

The code for this is in code/04_mem/make_trait.

Let's create a very simple trait:

trait Animal {
    fn speak(&self);
}

This trait has one function: speak. It takes a reference to self (the type implementing the trait) and returns nothing.

Note: trait parameters are also part of the interface, so if a trait entry needs &self---all implementations of it will need &self.

Now we can make a cat:

struct Cat;

impl Animal for Cat {
    fn speak(&self) {
        println!("Meow");
    }
}

Now you can run speak() on any Cat:

fn main() {
    let cat = Cat;
    cat.speak();
}

You could go on and implement as many speaking animals as you like.

Traits as Function Parameters

You can also create functions that require that a parameter implement a trait:

fn speak_twice(animal: &impl Animal) {
    animal.speak();
    animal.speak();
}

You can call it with speak_twice(&cat)---and it runs the trait's function twice.

Traits as Return Types

You can also return a trait from a function:

fn get_animal() -> impl Animal {
    Cat
}

The fun part here is that you no longer know the concrete type of the returned value---you only know for sure that it implements Animal. So you can call speak on it, but if Cat implements other traits or inherent functions, you can't call those.

Traits that Require Other Traits

You could require that all Animal types require Debug be also implemented:

use std::fmt::Debug;

trait Animal: Debug {
    fn speak(&self);
}

Now Cat won't compile until you derive (or implement) Debug.

You can keep piling on the requirements:

trait DebuggableClonableAnimal: Animal + Debug + Clone {}

Let's make a Dog that complies with these rules:

#[derive(Debug, Clone)]
struct Dog;

impl Animal for Dog {
    fn speak(&self) {
        println!("Woof");
    }
}

impl DebuggableClonableAnimal for Dog {}

Now you can make a dog and call speak on it. You can also use DebuggableClonableAnimal as a parameter or return type, and be sure that all of the trait functions are available.
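Here's a self-contained sketch of using the combined trait as a generic bound (the clone_and_debug helper is my own illustration, and speak returns a String here so the result can be checked):

```rust
use std::fmt::Debug;

trait Animal {
    fn speak(&self) -> String;
}

trait DebuggableClonableAnimal: Animal + Debug + Clone {}

#[derive(Debug, Clone)]
struct Dog;

impl Animal for Dog {
    fn speak(&self) -> String {
        "Woof".to_string()
    }
}

impl DebuggableClonableAnimal for Dog {}

// Thanks to the trait bounds, we can debug-print and clone any
// DebuggableClonableAnimal without knowing the concrete type.
fn clone_and_debug<T: DebuggableClonableAnimal>(animal: &T) -> T {
    println!("{animal:?}");
    animal.clone()
}

fn main() {
    let dog = Dog;
    let dog2 = clone_and_debug(&dog);
    println!("{}", dog2.speak()); // prints Woof
}
```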

Dynamic Dispatch

All of the examples above can be resolved at compile time. The compiler knows the concrete type of the trait, and can generate the code for it. But what if you want to store a bunch of different types in a collection, and call a trait function on all of them?

You might want to try this:

let animals: Vec<impl Animal> = vec![Cat, Dog];

And it won't work. The reason is that Vec stores its entries contiguously, so every entry must be the same size. Since cats and dogs might be different sizes, Vec can't store them directly.

You can get around this with dynamic dispatch. You've seen this once before, with type GenericResult<T> = std::result::Result<T, Box<dyn std::error::Error>>;. The dyn keyword means that the type is dynamically dispatched---the concrete type (and its size) is only known at runtime.

Now think back to boxes. Boxes are a smart pointer: they occupy the size of a pointer in memory, and that pointer tells you where the data actually lives on the heap. So you can make a vector of boxed trait objects:

let animals: Vec<Box<dyn Animal>> = vec![Box::new(Cat), Box::new(Dog)];

Each vector entry is a fat pointer: a pointer to the heap-allocated animal plus a pointer to the trait's vtable. Accessing each entry requires a pointer dereference and a virtual function call through the vtable (though LLVM is very good at devirtualizing calls and avoiding vtables when the concrete type can be determined).
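A runnable sketch of the heterogeneous vector (my own small example; speak returns a String here so the result can be checked):

```rust
trait Animal {
    fn speak(&self) -> String;
}

struct Cat;
struct Dog;

impl Animal for Cat {
    fn speak(&self) -> String {
        "Meow".to_string()
    }
}

impl Animal for Dog {
    fn speak(&self) -> String {
        "Woof".to_string()
    }
}

fn main() {
    // Each entry is a fat pointer (data + vtable) to a heap-allocated animal.
    let animals: Vec<Box<dyn Animal>> = vec![Box::new(Cat), Box::new(Dog)];
    for animal in animals.iter() {
        // The call is dispatched through the vtable at runtime.
        println!("{}", animal.speak());
    }
}
```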

In the threads class, someone asked if you could "send interfaces to channels". And yes, you can---you have to use dynamic dispatch to do it. This is valid:

let (tx, rx) = std::sync::mpsc::channel::<Box<dyn Animal>>();

This works with other pointer types like Rc, and Arc, too. You can have a reference-counted, dynamic dispatch pointer to a trait.
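Here's a sketch of actually sending a boxed trait object to another thread (my own example; note the extra Send bound, which is required once the box crosses a thread boundary):

```rust
use std::sync::mpsc;
use std::thread;

trait Animal {
    fn speak(&self) -> String;
}

struct Cat;

impl Animal for Cat {
    fn speak(&self) -> String {
        "Meow".to_string()
    }
}

fn main() {
    // The `+ Send` bound lets the boxed trait object move between threads.
    let (tx, rx) = mpsc::channel::<Box<dyn Animal + Send>>();

    let handle = thread::spawn(move || {
        // Receive the trait object and call through the vtable.
        let animal = rx.recv().unwrap();
        animal.speak()
    });

    tx.send(Box::new(Cat)).unwrap();
    println!("{}", handle.join().unwrap()); // prints Meow
}
```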

Using dynamic dispatch won't perform as well as static dispatch, because of the indirection---pointer chasing reduces the likelihood of a memory cache hit.

The Any Type

If you really, really need to find out the concrete type of a dynamically dispatched trait, you can use the std::any::Any trait. It's not the most efficient design, but it's there if you really need it.

The easiest way to "downcast" is to require Any in your type and an as_any function:

use std::any::Any;

trait DowncastableAnimal: Animal {
    fn as_any(&self) -> &dyn Any;
}

struct Tortoise;

impl Animal for Tortoise {
    fn speak(&self) {
        println!("What noise does a tortoise make anyway?");
    }
}

impl DowncastableAnimal for Tortoise {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

Then you can "downcast" to the concrete type:

let more_animals: Vec<Box<dyn DowncastableAnimal>> = vec![Box::new(Tortoise)];
for animal in more_animals.iter() {
    if let Some(_tortoise) = animal.as_any().downcast_ref::<Tortoise>() {
        println!("We have access to the tortoise");
    }
    animal.speak();
}

If you can avoid this pattern, you should. It's not very Rusty---it's pretending to be an object-oriented language. But it's there if you need it.

Implementing Operators

"Operator overloading" got a bad name from C++. You can abuse it, and decide that operators do bizarre things. Please don't. If you allow two types to be added together, please use an operation that makes sense to the code reader!

See the 04_mem/operator_overload project.

You can implement operators for your types. Let's make a Point type that can be added together:

use std::ops::Add;

struct Point {
    x: f32,
    y: f32,
}

impl Add for Point {
    type Output = Point;

    fn add(self, rhs: Self) -> Self::Output {
        Point {
            x: self.x + rhs.x, 
            y: self.y + rhs.y
        }
    }
}

fn main() {
    let a = Point { x: 1.0, y: 2.0 };
    let b = Point { x: 3.0, y: 4.0 };
    let c = a + b;
    println!("c.x = {}, c.y = {}", c.x, c.y);
}

There's a full range of operators you can overload. You can also overload the +=, /, * operators, and so on. This is very powerful for letting you express operations naturally (rather than remembering to add x and y each time)---but it can be abused horribly if you decide that + should mean "subtract" or something. Don't do that. Please.

Generics

Generics are very closely tied to traits. "Generics" are meta-programming: a way to write "generic" code that works for multiple types. Traits are a way to specify the requirements for a generic type.

The simplest generic is a function that takes a generic type. Who's sick of typing to_string() all the time? I am! You can write a generic function that accepts any type that implements ToString---even &str (bare strings) implements ToString:

fn print_it<T: ToString>(x: T) {
    println!("{}", x.to_string());
}

So now you can call print_it with print_it("Hello"), print_it(my_string) or even print_it(42) (because integers implement ToString).

There's a second format for generics that's a bit longer but more readable when you start piling on the requirements:

fn print_it<T>(x: T)
where
    T: ToString,
{
    println!("{}", x.to_string());
}

You can combine requirements with +:

use std::fmt::Debug;

fn print_it<T>(x: T)
where
    T: ToString + Debug,
{
    println!("{:?}", x);
    println!("{}", x.to_string());
}

You can have multiple generic types:

use std::fmt::Debug;

fn print_it<T, U>(x: T, y: U)
where
    T: ToString + Debug,
    U: ToString + Debug,
{
    println!("{:?}", x);
    println!("{}", x.to_string());
    println!("{:?}", y);
    println!("{}", y.to_string());
}

The generics system is almost a programming language in and of itself---you really can build most things with it.

Traits with Generics

See the 04_mem/trait_generic project.

Some traits use generics in their implementation. The From trait is particularly useful, so let's take a look at it:

struct Degrees(f32);
struct Radians(f32);

impl From<Radians> for Degrees {
    fn from(rad: Radians) -> Self {
        Degrees(rad.0 * 180.0 / std::f32::consts::PI)
    }
}

impl From<Degrees> for Radians {
    fn from(deg: Degrees) -> Self {
        Radians(deg.0 * std::f32::consts::PI / 180.0)
    }
}

Here we've defined a type for Degrees, and a type for Radians. Then we've implemented From for each of them, allowing them to be converted from the other. This is a very common pattern in Rust. From is also one of the few surprises in Rust, because it also implements Into for you. So you can use any of the following:

let behind_you = Degrees(180.0);
let behind_you_radians = Radians::from(behind_you);
let behind_you_radians2: Radians = Degrees(180.0).into();

You can even define a function that requires that an argument be convertible to a type:

fn sin(angle: impl Into<Radians>) -> f32 {
    let angle: Radians = angle.into();
    angle.0.sin()
}

And you've just made it impossible to accidentally use degrees for a calculation that requires radians. This is called the "newtype" pattern, and it's a great way to add constraints that prevent bugs.

You can also make the sin function with generics:

fn sin<T: Into<Radians>>(angle: T) -> f32 {
    let angle: Radians = angle.into();
    angle.0.sin()
}

The impl syntax is a bit newer, so you'll see the generic syntax more often.
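Putting the pieces together, here's a runnable sketch of the newtype guard (the 90-degree input is my example, not from the course code):

```rust
struct Degrees(f32);
struct Radians(f32);

// Converting Degrees -> Radians is the only way in.
impl From<Degrees> for Radians {
    fn from(deg: Degrees) -> Self {
        Radians(deg.0 * std::f32::consts::PI / 180.0)
    }
}

// Accept anything convertible to Radians; a bare f32 won't compile.
fn sin(angle: impl Into<Radians>) -> f32 {
    let angle: Radians = angle.into();
    angle.0.sin()
}

fn main() {
    // Degrees are converted automatically before the calculation.
    println!("{}", sin(Degrees(90.0))); // approximately 1.0
}
```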

Generics and Structs

You can make generic structs and enums, too. In fact, you've seen lots of generic enum types already: Option<T>, Result<T, E>. You've seen plenty of generic structs, too: Vec<T>, HashMap<K,V> etc.

Let's build a useful example. How often have you wanted to add entries to a HashMap and, instead of replacing whatever was there, keep a list of all of the values provided for that key?

The code for this is in 04_mem/hashmap_bucket.

Let's start by defining the basic type:

use std::collections::HashMap;

struct HashMapBucket<K,V>
{
    map: HashMap<K, Vec<V>>
}

The type contains a HashMap, each key (of type K) referencing a vector of values (of type V). Let's make a constructor:

impl <K,V> HashMapBucket<K,V>
{
    fn new() -> Self {
        HashMapBucket {
            map: HashMap::new()
        }
    }
}

So far, so good. Let's add an `insert` function (inside the implementation block):

fn insert(&mut self, key: K, value: V) {
    let values = self.map.entry(key).or_insert(Vec::new());
    values.push(value);
}

Uh oh, that shows us an error. Fortunately, the error tells us exactly what to do---the key has to support Eq (for comparison) and Hash (for hashing). Let's add those requirements to the struct:

impl <K,V> HashMapBucket<K,V>
where K: Eq + std::hash::Hash
{
    fn new() -> Self {
        HashMapBucket {
            map: HashMap::new()
        }
    }

    fn insert(&mut self, key: K, value: V) {
        let values = self.map.entry(key).or_insert(Vec::new());
        values.push(value);
    }
}

So now we can insert into the map and print the results:

fn main() {
    let mut my_buckets = HashMapBucket::new();
    my_buckets.insert("hello", 1);
    my_buckets.insert("hello", 2);
    my_buckets.insert("goodbye", 3);
    println!("{:#?}", my_buckets.map);
}

In 21 lines of code, you've implemented a type that can store multiple values for a single key. That's pretty cool. Generics are a little tricky to get used to, but they can really supercharge your productivity.

Amazing Complexity

If you look at the Bevy game engine, or the Axum webserver, you'll find mind-boggling combinations of generics and traits.

Remember how in Axum you could do dependency injection by adding a layer containing a connection pool, and then every route could magically obtain one by supporting it as a parameter? That's generics and traits at work.

In both cases:

  • A function accepts a type that meets certain criteria. Axum layers are cloneable, and can be sent between threads.
  • The function stores the layers as a generic type.
  • Routes are also generic, and parameters match against a generic+trait requirement. The route is then stored as a generic function pointer.

There's even code that handles <T1>, <T1, T2> and other lists of parameters (up to 16) with separate implementations to handle whatever you may have put in there!

It's beyond the scope of a foundations class to really dig into how that works---but you have the fundamentals.

Error Handling

Much of this section applies to both async and non-async code. Async code has a few extra considerations: you are probably managing large amounts of IO, and really don't want to stop the world when an error occurs!

Rust Error Handling

In previous examples, we've used unwrap() or expect("my message") to get the value out of a Result. If an error occurred, your program (or thread) crashes. That's not great for production code!

Aside: Sometimes, crashing is the right thing to do. If you can't recover from an error, crashing is preferable to trying to continue and potentially corrupting data.

So what is a Result?

A Result is an enum, just like we covered in week 1. It's a "sum type"---it can be one of two things---and never both. A Result is either Ok(T) or Err(E). It's deliberately hard to ignore errors!
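The two-variant shape is easy to see if you sketch it yourself (MyResult and the divide helper below are my own simplified illustration of what the standard library defines):

```rust
// A simplified sketch of the standard library's Result enum:
// exactly one of the two variants, never both.
enum MyResult<T, E> {
    Ok(T),
    Err(E),
}

fn divide(a: i32, b: i32) -> MyResult<i32, String> {
    if b == 0 {
        MyResult::Err("division by zero".to_string())
    } else {
        MyResult::Ok(a / b)
    }
}

fn main() {
    // The caller is forced to acknowledge both possibilities.
    match divide(10, 2) {
        MyResult::Ok(v) => println!("{v}"),
        MyResult::Err(e) => println!("error: {e}"),
    }
}
```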

This differs from other languages:

  • C: Errors are returned as a number, or even NULL. It's up to you to decipher what the library author meant. Convention indicates that returning <0 is an error, and >=0 is success. Error types: int.
  • C++: Exceptions, which are thrown and "bubble up the stack" until they are caught in a catch block. If an exception is uncaught, the program crashes. Exceptions can have performance problems, so many older C++ programs use the C style of returning an error code; some newer C++ programs use std::expected and std::unexpected to make it easier to handle errors without exceptions. Error types: std::exception, expected, int, anything you like!
  • Java: Checked exceptions---which are like exceptions, but handling them is mandatory. Every function must declare what exceptions it can throw, and every caller must handle them. This is a great way to make sure you don't ignore errors, but it also produces a lot of boilerplate; it can get a little silly, and you find yourself re-throwing exceptions just to turn them into types you can handle. Java is also adding the Optional type to make it easier to handle errors without exceptions. Error types: Exception, Optional.
  • Go: Functions can return both an error type and a value. The compiler won't let you forget to check for errors, but it's up to you to handle them. In memory, you are often returning both the value and an empty error structure. Error types: error.
  • Rust: Functions return an enum that is either Ok(T) or Err(E). The compiler won't let you forget to check for errors, and it's up to you to handle them. Result is not an exception type, so it doesn't incur the overhead of throwing. You're always returning a value or an error, never both. Error types: Result<T, E>.

So there's a wide range of ways to handle errors across the language spectrum. Rust's goal is to make it easy to work with errors, and hard to ignore them - without incurring the overhead of exceptions. However (there's always a however!), default standard-library Rust makes it harder than it should be.

Strongly Typed Errors: A Blessing and a Curse!

The code for this is in the 03_async/rust_errors1 directory.

Rust's errors are very specific, and can leave you with a lot of things to match. Let's look at a simple example:

use std::path::Path;

fn main() {
    let my_file = Path::new("mytile.txt");
    // This yields a Result type of String or an error
    let contents = std::fs::read_to_string(my_file);
    // Let's just handle the error by printing it out
    match contents {
        Ok(contents) => println!("File contents: {contents}"),        
        Err(e) => println!("ERROR: {e:#?}"),
    }
}

This prints out the details of the error:

ERROR: Os {
    code: 2,
    kind: NotFound,
    message: "The system cannot find the file specified.",
}

That's great, but what if we want to do something different for different errors? We can match on the error type:

match contents {
    Ok(contents) => println!("File contents: {contents}"),
    Err(e) => match e.kind() {
        std::io::ErrorKind::NotFound => println!("File not found"),
        std::io::ErrorKind::PermissionDenied => println!("Permission denied"),
        _ => println!("ERROR: {e:#?}"),
    },
}

The _ is there because otherwise you end up with a remarkably exhaustive list:

match contents {
    Ok(contents) => println!("File contents: {contents}"),
    Err(e) => match e.kind() {
        std::io::ErrorKind::NotFound => println!("File not found"),
        std::io::ErrorKind::PermissionDenied => println!("Permission denied"),
        std::io::ErrorKind::ConnectionRefused => todo!(),
        std::io::ErrorKind::ConnectionReset => todo!(),
        std::io::ErrorKind::ConnectionAborted => todo!(),
        std::io::ErrorKind::NotConnected => todo!(),
        std::io::ErrorKind::AddrInUse => todo!(),
        std::io::ErrorKind::AddrNotAvailable => todo!(),
        std::io::ErrorKind::BrokenPipe => todo!(),
        std::io::ErrorKind::AlreadyExists => todo!(),
        std::io::ErrorKind::WouldBlock => todo!(),
        std::io::ErrorKind::InvalidInput => todo!(),
        std::io::ErrorKind::InvalidData => todo!(),
        std::io::ErrorKind::TimedOut => todo!(),
        std::io::ErrorKind::WriteZero => todo!(),
        std::io::ErrorKind::Interrupted => todo!(),
        std::io::ErrorKind::Unsupported => todo!(),
        std::io::ErrorKind::UnexpectedEof => todo!(),
        std::io::ErrorKind::OutOfMemory => todo!(),
        std::io::ErrorKind::Other => todo!(),
        _ => todo!(),
    },
}

Many of those errors aren't even relevant to opening a file! Worse, as the Rust standard library grows, more error kinds can appear---meaning a simple rustup update could break an exhaustive match. So when you are handling individual errors, you should always include a _ arm to catch any variants that might be added in the future.

Pass-Through Errors

The code for this is in the 03_async/rust_errors2 directory.

If you are just wrapping some very simple functionality, you can make your function signature match the function you are wrapping:

use std::path::Path;

fn maybe_read_a_file() -> Result<String, std::io::Error> {
    let my_file = Path::new("mytile.txt");
    std::fs::read_to_string(my_file)
}

fn main() {
    match maybe_read_a_file() {
        Ok(text) => println!("File contents: {text}"),
        Err(e) => println!("An error occurred: {e:?}"),
    }
}

No need to worry about re-throwing, you can just return the result of the function you are wrapping.

The ? Operator

We mentioned earlier that Rust doesn't have exceptions. It does have the ability to pass errors up the call stack---but because they are handled explicitly in return statements, they don't have the overhead of exceptions. This is done with the ? operator.

Let's look at an example:

fn file_to_uppercase() -> Result<String, std::io::Error> {
    let contents = maybe_read_a_file()?;
    Ok(contents.to_uppercase())
}

This calls our maybe_read_a_file function and adds a ? to the end. What does the ? do?

  • If the Result type is Ok, it extracts the wrapped value and hands it to you---in this case assigning it to contents.
  • If an error occurred, it returns the error to the caller.

This is great for function readability---you don't lose the "flow" of the function amidst a mass of error handling. It's also good for performance, and if you prefer the "top down" error handling approach it's nice and clean---the error gets passed up to the caller, and they can handle it.
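A small self-contained example of how ? short-circuits (parse_and_double and parse_many are my own illustrative helpers, using parse errors rather than file IO so it runs anywhere):

```rust
use std::num::ParseIntError;

fn parse_and_double(s: &str) -> Result<i32, ParseIntError> {
    // `?` unwraps the Ok value, or returns the Err to the caller.
    let n = s.trim().parse::<i32>()?;
    Ok(n * 2)
}

fn parse_many(inputs: &[&str]) -> Result<Vec<i32>, ParseIntError> {
    let mut out = Vec::new();
    for s in inputs {
        // The first failure short-circuits the whole function.
        out.push(parse_and_double(s)?);
    }
    Ok(out)
}

fn main() {
    println!("{:?}", parse_many(&["1", "2", "3"])); // Ok([2, 4, 6])
    println!("{:?}", parse_many(&["1", "oops"]).is_err()); // true
}
```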

What if I just want to ignore the error?

You must handle the error in some way. You can just call the function:

file_to_uppercase();

This will generate a compiler warning that there's a Result type that must be used. You can silence the warning with an underscore:

let _ = file_to_uppercase();

_ is the placeholder symbol---you are telling Rust that you don't care. But you are explicitly not caring---you've told the compiler that ignoring the error is a conscious decision!

You can also use the if let pattern and simply not add an error handler:

if let Ok(contents) = file_to_uppercase() {
    println!("File contents: {contents}");
}

What About Different Errors?

The ? operator is great, but it requires that the function's return type match exactly the type of error that you are passing upwards. Otherwise, in a strongly typed language, you couldn't ensure that errors are being handled.

Let's take an example that draws a bit from our code on day 1.

The code for this is in the 03_async/rust_errors3 directory.

Let's add serde and serde_json to our project:

cargo add serde -F derive
cargo add serde_json

And we'll quickly define a deserializable struct:

use std::path::Path;
use serde::Deserialize;

#[derive(Deserialize)]
struct User {
    name: String,
    password: String,
}

fn load_users() {
    let my_file = Path::new("users.json");
    let raw_text = std::fs::read_to_string(my_file)?;
    let users: Vec<User> = serde_json::from_str(&raw_text)?;
    Ok(users)
}

This isn't going to compile yet, because we aren't returning a type from the function. So we add a Result:

fn load_users() -> Result<Vec<User>, Error> {
    // ...same body as before...
}

Oh no! What do we put for Error? We have a problem! read_to_string returns an std::io::Error type, and serde_json::from_str returns a serde_json::Error type. We can't return both!

Boxing Errors

There's a lot of typing for a generic error type, but it works:

type GenericResult<T> = std::result::Result<T, Box<dyn std::error::Error>>;

fn load_users() -> GenericResult<Vec<User>> {
    let my_file = Path::new("users.json");
    let raw_text = std::fs::read_to_string(my_file)?;
    let users: Vec<User> = serde_json::from_str(&raw_text)?;
    Ok(users)
}

This works with every possible type of error. Let's add a main function and see what happens:

fn main() {
    let users = load_users();
    match users {
        Ok(users) => {
            for user in users {
                println!("User: {}, {}", user.name, user.password);
            }
        },
        Err(err) => {
            println!("Error: {err}");
        }
    }
}

The result prints:

Error: The system cannot find the file specified. (os error 2)

You get the exact error message, but no easy way to tell programmatically what went wrong. That may be OK for a simple program.
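If you do need to inspect a boxed error, downcast_ref can recover the concrete type at runtime. Here's a minimal std-only sketch; MyError is a made-up error type for illustration, not part of the course code:

```rust
use std::error::Error;
use std::fmt;

// A hypothetical error type of our own, purely for illustration.
#[derive(Debug)]
struct MyError;

impl fmt::Display for MyError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "my error")
    }
}

impl Error for MyError {}

// Always fails, returning a boxed error.
fn fails() -> Result<(), Box<dyn Error>> {
    Err(Box::new(MyError))
}

fn main() {
    if let Err(e) = fails() {
        // downcast_ref recovers the concrete type from a Box<dyn Error>,
        // so boxed errors *can* still be inspected programmatically.
        if e.downcast_ref::<MyError>().is_some() {
            println!("It was MyError");
        } else if e.downcast_ref::<std::io::Error>().is_some() {
            println!("It was an I/O error");
        }
    }
}
```

It's clunkier than matching on a concrete error enum, which is why boxing suits programs that only report errors rather than branch on them.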

Easy Boxing with Anyhow

There's a crate named anyhow that makes it easy to box errors. Let's add it to our project:

cargo add anyhow

Then you can replace the Box definition with anyhow::Error:

fn anyhow_load_users() -> anyhow::Result<Vec<User>> {
    let my_file = Path::new("users.json");
    let raw_text = std::fs::read_to_string(my_file)?;
    let users: Vec<User> = serde_json::from_str(&raw_text)?;
    Ok(users)
}

It still functions the same way:

Error: The system cannot find the file specified. (os error 2)

In fact, anyhow is mostly just a convenience wrapper around Box<dyn Error>. But it's a very convenient wrapper!

Anyhow does make it a little easier to return your own error:

fn anyhow_load_users2() -> anyhow::Result<Vec<User>> {
    let my_file = Path::new("users.json");
    let raw_text = std::fs::read_to_string(my_file)?;
    let users: Vec<User> = serde_json::from_str(&raw_text)?;
    if users.is_empty() {
        anyhow::bail!("No users found");
    }
    if users.len() > 10 {
        return Err(anyhow::Error::msg("Too many users"));
    }
    Ok(users)
}

I've included the short way and the long way---they do the same thing. bail! is a handy macro for "error out with this message". If you miss Go's "return any error you like", anyhow has your back!

As a rule of thumb: anyhow is great in client code, or code where you don't really care what went wrong---you care that an error occurred and should be reported.
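The other side of that rule of thumb is library code, where callers want to match on what went wrong. A hand-rolled error enum---roughly what crates like thiserror generate for you---looks like this. Sketched with std only; LoadError and the file name are made up for illustration:

```rust
use std::fmt;

// A hand-rolled error enum; callers can match on the variant.
#[derive(Debug)]
enum LoadError {
    Io(std::io::Error),
    Parse(String),
}

impl fmt::Display for LoadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LoadError::Io(e) => write!(f, "I/O error: {e}"),
            LoadError::Parse(msg) => write!(f, "parse error: {msg}"),
        }
    }
}

impl std::error::Error for LoadError {}

// Implementing From lets the ? operator convert io::Error automatically.
impl From<std::io::Error> for LoadError {
    fn from(e: std::io::Error) -> Self {
        LoadError::Io(e)
    }
}

fn load() -> Result<String, LoadError> {
    // Hypothetical file name; a missing file exercises the Io variant.
    let text = std::fs::read_to_string("definitely_missing.json")?;
    if text.is_empty() {
        return Err(LoadError::Parse("file was empty".to_string()));
    }
    Ok(text)
}

fn main() {
    // Callers can now branch on *which* error occurred:
    match load() {
        Ok(_) => println!("loaded"),
        Err(LoadError::Io(_)) => println!("it was an I/O error"),
        Err(e) => println!("other error: {e}"),
    }
}
```

The From impl is what makes ? ergonomic here: each fallible call converts into your error type without explicit map_err calls.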

Touring the Rust Ecosystem

Rust and C++ Tooling Equivalencies

This is a cheat sheet for you to refer to later.

Using Cargo

The cargo command is a swiss-army knife that handles building projects, testing them, controlling dependencies, and more. It is also extensible: you can add subcommands to it and use it to install programs.

| Cargo Command | C++ Equivalent | Purpose |
|---------------|----------------|---------|
| Package Commands | | |
| cargo init | | Initializes a new Cargo project in the current directory. |
| Compilation | | |
| cargo build | make | Builds your project, placing the output in the target directory. |
| cargo run | make ; ./my_program | Runs cargo build, and then runs the resulting executable. |
| cargo check | | Builds only the source, skipping assembly and linking, for a quick syntax check. |
| cargo clean | make clean | Removes all build artifacts and empties the target directory. |
| cargo rustc | | Passes extra rustc options to the build process. |
| Formatting | | |
| cargo fmt | | Formats your source code according to the Rust defaults. |
| Testing | | |
| cargo test | make test | Executes all unit tests in the current project. |
| cargo bench | | Executes all benchmarks in the current project. |
| Linting | | |
| cargo clippy | | Runs the Clippy linter. |
| cargo fix | | Applies machine-applicable compiler suggestions (cargo clippy --fix does the same for Clippy lints). |
| Documentation | | |
| cargo doc | | Builds a documentation website from the current project's source code. |
| cargo rustdoc | | Runs the documentation builder with extra command options. |
| Dependencies | | |
| cargo fetch | | Downloads all dependencies listed in Cargo.toml from the Internet. |
| cargo add | | Adds a dependency to the current project's Cargo.toml. |
| cargo remove | | Removes a dependency from the current project's Cargo.toml. |
| cargo update | | Updates dependencies to the latest versions permitted by Cargo.toml. |
| cargo tree | | Draws a tree displaying all dependencies, and each dependency's dependencies. |
| cargo vendor | | Downloads all dependencies and provides instructions for modifying Cargo.toml to use the downloaded copies. |

Unit Tests

You saw an example unit test when you created a library. Rust/Cargo has a built-in unit testing system. Let's explore it a bit.

Let's build a very simple example, and examine how it works:

The code for this is in projects/part2/unit_test

fn double(n: i32) -> i32 {
    n * 2
}

#[cfg(test)] // Conditional compilation: only build in `test` mode
mod test { // Create a module to hold the tests
    use super::*; // Include everything from the parent module/namespace

    #[test] // Mark this function as a test to include in unit test runs
    fn two_times() {
        assert_eq!(4, double(2)); // Assert that double(2) == 4
        assert!(5 != double(2)); // Assert that it doesn't equal 5
    }
}

You can run tests for the current project with cargo test. You can append --all to include all projects in the current workspace.

We'll talk about more complicated tests later.
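One thing worth knowing now: assert!, assert_eq!, and assert_ne! are ordinary macros, not test-only magic, so you can experiment with them in any program. A quick sketch:

```rust
// The assertion macros used inside #[test] functions work anywhere,
// including main---a failing assertion panics with a descriptive message.
fn double(n: i32) -> i32 {
    n * 2
}

fn main() {
    assert_eq!(double(2), 4); // passes silently
    assert_ne!(double(2), 5); // the "not equal" variant
    // assert! accepts a custom failure message with format arguments:
    assert!(double(-3) < 0, "expected a negative result, got {}", double(-3));
    println!("all assertions passed");
}
```

In test mode, a panicking assertion marks that one test as failed; the remaining tests still run.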

Benchmarking

Cargo has built-in benchmarking, but using it requires the unstable nightly channel. I generally don't recommend relying on nightly-only features! If you are writing performance-critical code, benchmarking is essential. Fortunately, Rust makes it relatively straightforward to include benchmarks on the stable channel with a bit of boilerplate.

Quick and Dirty Benchmarks

This example is in project/simple_bench

A quick and dirty way to benchmark operations is to use Instant and Duration:

use std::time::Instant;

fn main() {
    let now = Instant::now();
    let mut i = 0;
    for j in 0 .. 1_000 {
        i += j*j;
    }
    let elapsed = now.elapsed();
    println!("Time elapsed: {} nanos", elapsed.as_nanos());
    println!("{i}");
}
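If you find yourself timing several code paths this way, the pattern factors nicely into a small generic helper. time_it below is a hypothetical convenience, not part of the course code:

```rust
use std::time::Instant;

// A tiny helper: time any closure, returning its result and the elapsed nanoseconds.
fn time_it<T>(f: impl FnOnce() -> T) -> (T, u128) {
    let start = Instant::now();
    let result = f();
    (result, start.elapsed().as_nanos())
}

fn main() {
    // Same workload as above: sum of squares for 0..1000.
    let (sum, nanos) = time_it(|| (0u64..1_000).map(|j| j * j).sum::<u64>());
    println!("sum = {sum} in {nanos} nanos");
}
```

Printing (or otherwise using) the result matters: without it, the optimizer may delete the whole loop, which is exactly the problem Criterion's black_box solves below.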

Criterion

This project is in projects/part2/criterion_bench

In Cargo.toml, add:

[dev-dependencies]
criterion = { version = "0.4", features = [ "html_reports" ] }

[[bench]]
name = "my_benchmark"
harness = false

[dev-dependencies] is new! This is a dependency that is only loaded by development tools, and isn't integrated into your final program. No space is wasted.

Create <project>/benches/my_benchmark.rs:

use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn fibonacci(n: u64) -> u64 {
    match n {
        0 => 1,
        1 => 1,
        n => fibonacci(n-1) + fibonacci(n-2),
    }
}

fn criterion_benchmark(c: &mut Criterion) {
    c.bench_function("fib 20", |b| b.iter(|| fibonacci(black_box(20))));
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);

Run cargo bench and see the result.

Go to target/criterion and you have a full HTML report with statistics.

Flamegraphs

It pretty much requires Linux (and the perf infrastructure), but it's worth looking at cargo-flamegraph if you are developing on that platform. It's an easy wrapper around perf for generating flamegraphs to find your hot spots.

FFI: Linking Rust and C or C++

Rust behaves very well when talking to other languages---both as a library for other languages to consume, and as a consumer of other languages' libraries.

We'll refer to "C Libraries"---but we really mean any language that compiles to a C-friendly library format. C, C++, Go, Fortran, Haskell, and many others can all be consumed by Rust.

Consuming C Libraries

The code for this is in 04_mem/c_rust (C Rust)

Let's start with a tiny C library:

// A simple function that doubles a number
int double_it(int x) {
    return x * 2;
}

We'd like to compile this and include it in a Rust program. The cc crate lets us compile C (and C++) sources as part of our build process. Rather than adding it with cargo add, we add it as a build dependency: it won't be included in the final program, it's only used during compilation. Open Cargo.toml:

[package]
name = "c_rust"
version = "0.1.0"
edition = "2021"

[dependencies]

[build-dependencies]
cc = "1"

Now we can create a build.rs file in the root of our project (not the src directory). This file will be run as part of the build process, and can be used to compile C libraries. We'll use the cc crate to do this:

fn main() {
    cc::Build::new()
        .file("src/crust.c")
        .compile("crust");
}

build.rs is automatically compiled and executed when your Rust program builds. You can use it to automate any build-time tasks you want. The cc calls will build the listed files and include the linked result in your final program as a static library.

Lastly, let's create some Rust to call the C:

// Do it by hand
extern "C" {
    fn double_it(x: i32) -> i32;
}

mod rust {
    pub fn double_it(x: i32) -> i32 {
        x * 2
    }
}

We've used an extern "C" to specify linkage to an external C library. We've also created a Rust version of the same function, so we can compare the two.

Now let's use some unit tests to prove that it works:

#[cfg(test)]
mod test {
    use super::*;

    #[test]
    fn test_double_it() {
        assert_eq!(unsafe { double_it(2) }, 4);
    }

    #[test]
    fn test_c_rust() {
        assert_eq!(unsafe { double_it(2) }, rust::double_it(2));
    }
}

And it works when we run cargo test.

Header files and BindGen

You need LLVM installed (clang 5 or greater) to use this. On Windows, winget install LLVM.LLVM will work. Also set an environment variable LIBCLANG_PATH to the location of the Clang install. On Windows, $Env:LIBCLANG_PATH="C:\Program Files\LLVM\bin"

Larger C examples will include header files. Let's add crust.h:

int double_it(int x);

And update the C file to include it:

#include "crust.h"

// A simple function that doubles a number
int double_it(int x) {
    return x * 2;
}

You could list the header in the build.rs compilation step, but it would be ignored---it's just a forward declaration.

Writing the extern "C" for a large library could be time consuming. Let's use bindgen to do it for us.

Add another build-dependency:

[build-dependencies]
cc = "1"
bindgen = "0"

Now in build.rs we'll add some calls to use it:

use std::env;
use std::path::PathBuf;

fn main() {
    cc::Build::new()
        .file("src/crust.c")
        .compile("crust");

    let bindings = bindgen::Builder::default()
        .header("src/crust.h")
        .parse_callbacks(Box::new(bindgen::CargoCallbacks))
        .generate()
        .expect("Unable to generate bindings");

    // Write the bindings to the $OUT_DIR/bindings.rs file.
    let out_path = PathBuf::from(env::var("OUT_DIR").unwrap());
    bindings
        .write_to_file(out_path.join("bindings.rs"))
        .expect("Couldn't write bindings!");
}

See the bindgen documentation for details.

This is pretty much standard boilerplate, but there are a lot of options available.

Now run cargo build. You'll see a new file in target/debug/build/c_rust-*/out/bindings.rs. This is the automatically generated bindings file. Let's use it:

include!(concat!(env!("OUT_DIR"), "/bindings.rs"));

Your compile time has suffered, but now the header is parsed and Rust bindings are generated automatically. The unit tests should still work.

Calling Rust from Other Languages

The code for this is in 04_mem/rust_c (Rust C)

You can also set up Rust functions and structures for export via a C API. You lose some of the richness of the Rust language---everything has to be C compatible---but you retain Rust's safety and performance.

Start with some Cargo.toml entries:

[package]
name = "rust_c"
version = "0.1.0"
edition = "2021"

[lib]
crate-type = ["staticlib"]

[dependencies]
libc = "0.2"

Providing a [lib] section with crate-type changes compilation behavior. Here we're instructing Rust to build a C-compatible static library (crate-type can also take cdylib for a C-compatible dynamic library).

Next, we'll build a single Rust function to export:

use std::ffi::CStr;

/// # Safety
/// Use a valid C-String!
#[no_mangle]
pub unsafe extern "C" fn hello(name: *const libc::c_char) {
    let name_cstr = unsafe { CStr::from_ptr(name) };
    let name = name_cstr.to_str().unwrap();
    println!("Hello {name}");
}

Notice that we're receiving a *const c_char pointer---just like the C ABI. CStr and CString provide Rust-friendly layers between the string types, allowing you to convert back and forth. C strings will never be as safe as Rust strings, but this is a good compromise.

We've turned off name mangling, making it easy for the linker to find the function.

The function is also "unsafe"---because it receives an unsafe C string type.
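The CString/CStr round trip can be exercised entirely from Rust, without the C side. In this sketch, greet is a hypothetical stand-in for the exported hello function (returning a String so the result is easy to check):

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// A stand-in for the exported function: it receives a raw C pointer
// and converts it back to a Rust &str via CStr.
fn greet(name: *const c_char) -> String {
    // SAFETY: the caller must pass a valid, NUL-terminated pointer.
    let name_cstr = unsafe { CStr::from_ptr(name) };
    format!("Hello {}", name_cstr.to_str().unwrap())
}

fn main() {
    // CString owns a NUL-terminated buffer; as_ptr() hands out a
    // *const c_char---exactly what a C caller would pass in.
    let name = CString::new("world").unwrap();
    let greeting = greet(name.as_ptr());
    println!("{greeting}");
}
```

Note that the CString must stay alive while the pointer is in use; a common FFI bug is calling as_ptr() on a temporary that is dropped immediately.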

Build the project with cargo build, and you'll see that target/debug/rust_c.lib (Windows) or target/debug/librust_c.a (Linux) has been created. This is the static library that we can link to from C.

Linkage via C requires a header file. In this case, it's pretty easy to just write one:

void hello(char *name);

You can now use this in C or another language. In Go, it looks like this:

package main

/*
#cgo LDFLAGS: ./rust_c.a -ldl
#include "./lib/rust_c.h"
*/
import "C"

import "fmt"
import "time"

func main() {
	start := time.Now()
    fmt.Println("Hello from GoLang!")
	duration := time.Since(start)
	fmt.Println(duration)
	start2 := time.Now()
	C.hello(C.CString("from Rust!"))
	duration2 := time.Since(start2)
	fmt.Println(duration2)
}

(There's a few microseconds of delay in the Rust call, but it's pretty fast! Marshaling the C string in Go is the slowest part.)

Using CBindGen to Write the Header For You

Set up cbindgen as a build dependency:

[package]
name = "rust_c"
version = "0.1.0"
edition = "2021"

[lib]
crate-type = ["staticlib"]

[dependencies]
libc = "0.2"

[build-dependencies]
cbindgen = "0.24"

And once again, add a build.rs file:

use std::env;
use std::path::PathBuf;
use cbindgen::Config;


fn main() {
    let crate_dir = env::var("CARGO_MANIFEST_DIR").unwrap();

    let package_name = env::var("CARGO_PKG_NAME").unwrap();
    let output_file = target_dir()
        .join(format!("{}.hpp", package_name))
        .display()
        .to_string();

    let config = Config {
        //namespace: Some(String::from("ffi")),
        ..Default::default()
    };

    cbindgen::generate_with_config(&crate_dir, config)
      .unwrap()
      .write_to_file(&output_file);
}

/// Find the location of the `target/` directory. Note that this may be 
/// overridden by `cmake`, so we also need to check the `CARGO_TARGET_DIR` 
/// variable.
fn target_dir() -> PathBuf {
    if let Ok(target) = env::var("CARGO_TARGET_DIR") {
        PathBuf::from(target)
    } else {
        PathBuf::from(env::var("CARGO_MANIFEST_DIR").unwrap()).join("target")
    }
}

This is standard boilerplate from the cbindgen documentation.

Now run cargo build and a rust_c.hpp header file appears in the target directory.