Introduction

Welcome to the Ardan Labs class on Rust FFI - the Foreign Function Interface.

This class is designed to teach you how to use Rust to interoperate with other languages. We will focus on C and C++ in this class, but the concepts can be applied to any language that supports a C FFI.

GitHub Reference

All of the files for this presentation are available here: https://github.com/thebracket/ArdanFfi

Building this Guide Locally

If you'd like to have a copy of this reference guide, clone the GitHub repo above and follow these steps in a terminal:

cargo install mdbook # If you aren't already an mdbook user
cd ArdanFfi/manual
mdbook serve

You can now open the URL http://localhost:3000 and have a locally served version of this guide.

Outline

Who is Herbert Wolverson?

I'm Herbert Wolverson, Rust Lead with Ardan Labs. I'm also the author of Hands-on Rust and Rust Brain Teasers with Pragmatic Programmers.

Language Wars!

Seriously, Come in Peace

The number 1 complaint about Rustaceans is that we tend to pop up and say "should've written it in Rust", or "rewrite it in Rust!".

That's not how you make friends or influence people. C has been around since 1972, and C++ since 1985. Rust draws from many of their mistakes --- but it also draws from many of their successes. By all means, rewrite things in Rust - but please, please, please - let's keep it friendly!

If You Get Stuck

I've included a Dockerfile, code/Dockerfile.ex01, so you can marvel at the splendour of Hello World in C!

You can run the Dockerfile with:

cd code/
docker build -t ffi . # You can also type "make"
docker run -it ffi

# You're now in a bash prompt. vim is available. You can run the examples in there.

Getting Started with FFI

FFI stands for "Foreign Function Interface". While that sounds rather alarming, it just means "stuff that wasn't written in Rust".

The "C ABI" --- Application Binary Interface, we're playing acronym soup --- is the lingua franca of modern operating systems. It's sometimes the only well-defined binary interface for calling functions and exchanging data. Unfortunately, you can't just drop a Rust library straight into a C++, Go, C#, etc. project and expect it to do much. The languages are different, and need a common language in order to communicate.

FFI is that commonality.

Rust is great at FFI; it was one of the original design goals. There's no performance penalty (unlike CGo, C# marshaling, etc.), but you do lose some of the richness of the type system.

How to Avoid Doing FFI

If your goal is not to do FFI, this may have been the wrong class! With that said, it's important to think about alternatives.

You could:

Wrap up the code you need in another language in an executable and run it with std::process::Command.
- Now every call requires that you set up its inputs.
- Every call requires that the OS load the program, allocate it, etc.
- Now you have to read the output.
- And worst of all - you aren't writing Rust!
You could put it on a microservice.
- Now you pay for a network call every time you need it.
- You still have to wrap the input/output, and
- You're still not writing Rust.
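To make the first alternative concrete, here's a minimal sketch of the "wrap it in an executable" approach. It assumes a Unix-like system with echo on the PATH; echo is only a stand-in for your real program, and run_helper is a name made up for this example.

```rust
use std::process::Command;

/// Runs a stand-in external program and returns its standard output.
/// `echo` plays the part of "code written in another language".
fn run_helper(input: &str) -> std::io::Result<String> {
    // Every call spawns a new process: the OS loads the program, we pass
    // the input as an argument, and then read the output back as bytes.
    let output = Command::new("echo").arg(input).output()?;
    Ok(String::from_utf8_lossy(&output.stdout).trim().to_string())
}

fn main() -> std::io::Result<()> {
    let reply = run_helper("hello from a child process")?;
    println!("{reply}");
    Ok(())
}
```

It works, but notice how much ceremony surrounds a single call: arguments in, bytes out, and a whole process launch in between.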

FFI and Safety

You're going to type unsafe a lot more than you're (hopefully) used to! There's even been a proposal for a "safe unsafe" flag!

The "safe unsafe" tag is horribly named, but the concept makes some sense. FFI involves a lot of "unsafe" tags, because you are stepping beyond what Rust can verify - rather than doing something actually unsafe. "Safe Unsafe" would be a way to tag that while this operation can't be verified, it has been extensively checked.

The Types of Unsafe

The unsafe tag isn't inherently bad. It's a way to tell the compiler that you're doing something that cannot be verified by Rust's safety rules. FFI - calling code outside of Rust - is inherently "unsafe" because Rust can't reach into the external code and verify that it, too, is safe.

You tend to run into a few types of "unsafe" code:

External Code: you're calling code that Rust cannot verify.
Dangerous Pointers: you're interacting with pointers in a way that Rust can't verify. For example, linked lists. You run into this inside many libraries. The unsafe tag serves as a marker---"check here".
But It's Faster: you're going for an optimization that Rust can't verify as safe. Don't do this unless you really have no alternative, and have profiled extensively.
YOLO: You Only Live Once, and really want to do something. Don't do this.
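As a small illustration of the "Dangerous Pointers" category, here's a sketch of a raw pointer read with a comment explaining why it is sound. The function name read_through_pointer is invented for this example.

```rust
/// Reads an `i32` through a raw pointer.
fn read_through_pointer(value: &i32) -> i32 {
    // Creating a raw pointer is safe; only dereferencing it is not.
    let ptr: *const i32 = value;
    // SAFETY: `ptr` was just created from a live reference, so it is
    // non-null, aligned, and points at an initialized `i32`.
    unsafe { *ptr }
}

fn main() {
    let n = 42;
    println!("{}", read_through_pointer(&n));
}
```

The unsafe block is tiny, and the comment tells the next reader exactly what to check.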

What's Unsafe?

In the 2024 edition, Rust has marked a few more things "unsafe". The class uses the Rust 2024 edition; if you're running an older compiler, you'll need to run rustup update to get the latest edition.

So... let's write some horribly unsafe code!

I promise, it's not really that unsafe!

What is FFI Anyway?

At the lowest level, FFI is a way to expose functions and data types in a way that other languages can understand. Just about every language out there supports some form of FFI.

You're using FFI, probably right now. C++, Java, Python, C#, etc. need it to call C functions. Every system call via libc that your browser makes is going through FFI!

So Why is it Needed?

Languages other than C mangle names. For example, a Rust library that exports example_function may well list it as _RNvCskwGfYPst2Cb_3foo16example_function (that is, foo::example_function once demangled) in the file header. A C++ function is equally mangled. It's even scarier with Go and other managed languages - they handle functions themselves, and won't even export the function until you tell them to. You can decipher that (rustfilt exists for it!) - but your link process just became pretty terrifying.
struct Foo { i: i32, j: i16 } looks pretty nice, and is probably struct Foo { int i; short j; }; in C. But it might not be. Rust is allowed to rearrange your fields. C++ can, too. Once again, it's even scarier with managed languages!

But don't despair: this is a well-trodden path, and one that can help you rewrite things in Rust rather than heckling on social media!

For example, here is the output from nm -an hello_world | grep main on a Rust binary:

0000000000007a90 t _ZN11hello_world4main17h6a1de0d75d6764daE

Notice that the main function is mangled? You can't call it from another language without knowing the mangling scheme - and Rust is allowed to change it at any time, including on recompilation of the same file.

So there's not really a sane way to avoid using FFI if you want to call Rust from another language - or vice versa.
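The struct layout point from earlier can be checked directly. This is a minimal sketch (the names FooRust and FooC are made up here) comparing Rust's default representation with #[repr(C)], which pins the fields to declaration order with C-style padding. It assumes a mainstream platform where i32 is 4-byte aligned.

```rust
use std::mem::{align_of, offset_of, size_of};

/// Default (Rust) representation: the compiler may reorder the fields.
#[allow(dead_code)]
struct FooRust {
    i: i32,
    j: i16,
}

/// C representation: fields stay in declaration order, padded like C.
#[allow(dead_code)]
#[repr(C)]
struct FooC {
    i: i32,
    j: i16,
}

fn main() {
    println!(
        "repr(C):    size = {}, align = {}, offset of j = {}",
        size_of::<FooC>(),
        align_of::<FooC>(),
        offset_of!(FooC, j)
    );
    // No promises here: whatever this prints is the compiler's choice.
    println!(
        "repr(Rust): size = {}, offset of j = {}",
        size_of::<FooRust>(),
        offset_of!(FooRust, j)
    );
}
```

Only the repr(C) numbers are something you can rely on when another language is reading the same bytes.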

A Gentle Start

Let's start gently, with a simple example.

You've all written:

fn main() {
    println!("Hello, world!");
}

Chances are pretty good that at some point in your career, you've written this, too:

#include <stdio.h>

int main() {
    printf("Hello, world!\n");
    return 0;
}

Let's combine the two, and make "Hello World" as a C library!

"Hello World" as a cloud managed service probably exists, too. It probably even uses Kubernetes.

Hello C

Let's start with a little bit of Rust. We'll make a new project.

cargo new hello_c

As usual, you'll have a project skeleton:

.
├── Cargo.toml
└── src
    └── main.rs

In the src directory, make a new file named hello.c. This is our revolutionary new C library:

#include <stdio.h>

void say_hello() {
    printf("Hello, world!\n");
}

Your directory structure should look like this:

.
├── Cargo.toml
└── src
    ├── hello.c
    └── main.rs

We're not going to bother making a header file, CMake, Makefile or anything else. Yet.

Hello Rust

Sadly, simply having a .c file in your project isn't enough to make it work. Maybe that'll be a future Cargo feature!

We need to tell Rust that the C function exists. In src/main.rs, add the following:

// In the 2024 edition, extern blocks must be marked `unsafe`.
unsafe extern "C" {
    fn say_hello();
}

There's a few things here to remember:

extern means that the function is defined elsewhere.
"C" is the calling convention. This is the default for C, and is typically the only one you'll use.
fn say_hello(); is the function signature. For now, it's up to you to match void say_hello() to fn say_hello() (easy enough - just wait...).

Next up, we need to call the function. In src/main.rs, add the following:

/// Safety: C is inherently unsafe!
fn main() {
    // Let's say "hello world" from C.
    unsafe {
        say_hello();
    }
}

There's a bit to mention here, too:

We have a Safety comment! If you are using unsafe code, Clippy really wants you to have one of these.
We're wrapping say_hello in unsafe.

For all the eager beavers who ran cargo run - it won't work yet! We haven't actually compiled the C code.

Pop Quiz: Why is `say_hello` wrapped in `unsafe`?

1. C is inherently unsafe.
2. Unsafe improves performance.
3. Unsafe is required for FFI.

The answer is 3, although I really want it to be 1.

unsafe doesn't actually mean that a function is unsafe! It means that the function falls outside what Rust can guarantee is safe. FFI functions are inherently unsafe, because you are calling code outside of Rust's control.
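A common way to live with this is to keep the unsafe call inside a small safe wrapper. The sketch below doesn't need a build script: it assumes your platform's C standard library is already linked into Rust programs (true on mainstream desktop and server targets) and borrows its abs function. The wrapper name c_abs is made up for this example, and the `unsafe extern` syntax needs a reasonably recent compiler.

```rust
use std::ffi::c_int;

// `abs` is defined in the C standard library, not in Rust.
unsafe extern "C" {
    fn abs(input: c_int) -> c_int;
}

/// A safe wrapper: callers don't need to write `unsafe` themselves.
fn c_abs(n: c_int) -> c_int {
    // abs(INT_MIN) is undefined behavior in C, so rule it out up front.
    assert!(n != c_int::MIN, "abs(INT_MIN) is undefined in C");
    // SAFETY: `abs` takes its argument by value, touches no memory, and
    // the one problematic input was rejected above.
    unsafe { abs(n) }
}

fn main() {
    println!("{}", c_abs(-3));
}
```

Rust still can't verify the C side, but the promise you are making is now written down in one place.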

Build the C Library

Sadly, simply having a .c file in your project isn't enough to make it work.

Note: this is where we find out if everyone's laptop has the right software installed!

There's a lot of ways to build a C library. It's one of the reasons to love Cargo! We're going to use the cc crate, which will try its very best to find a workable C compiler on your system, invoke it to build the library, and link it to your Rust program.

First, add the cc crate to your Cargo.toml:

[build-dependencies]
cc = "1.0"

Now we're going to make a build.rs file. If you haven't used build.rs before, it's a special file that Cargo will run before building your project.

In the root of your project (NOT the src directory), make a new file named build.rs:

fn main() {
    cc::Build::new()
        .file("src/hello.c")
        .compile("hello");
}

Your tree should look like this:

.
├── build.rs
├── Cargo.toml
└── src
    ├── hello.c
    └── main.rs

So, let's see what happens! If all goes well:

cargo run
Hello, world!

And hidden in your target/debug directory there's even a libhello.a - a static C library.

But What If It Doesn't Work?

This is the fun part. If you don't have a C compiler installed, or it's not a C compiler that Rust can find, you'll get a not overly helpful error message.

On Windows, you need the Build Tools for Visual Studio.
On Mac, you need to install the Xcode Command Line Tools. You can do this by running xcode-select --install in your terminal.
On Linux, you need to install build-essential. I personally like to install clang as well.

Extern C Gets Old, Fast

It's not really a big deal for "hello world" - but once you start linking with larger C libraries, you'll really start to hate writing extern "C" { ... } blocks with all the function signatures.

Let's pretend that Hello World is a real library, and add hello.h (next to hello.c in the src directory):

#ifndef HELLO_H
#define HELLO_H

void say_hello(); /* Function prototype for say_hello() */

#endif

Yes, I know that #pragma once is a great way to start fights in the C++ community.

We'll use a tool named bindgen to read the header file and build the Rust for us. Let's add bindgen to our build dependencies:

[build-dependencies]
cc = "1"
bindgen = "0"

Bindgen is proof that some projects never get to version 1. It's been 0.x since 2014!

Now we can update build.rs to generate the bindings for us:

use std::env;
use std::path::PathBuf;

fn main() {
    // Read the header
    let bindings = bindgen::Builder::default()
        .header("src/hello.h")
        .parse_callbacks(Box::new(bindgen::CargoCallbacks::new()))
        .generate()
        .expect("Unable to generate bindings");

    // Emit the bindings
    let out_path = PathBuf::from(env::var("OUT_DIR").unwrap());
    bindings
        .write_to_file(out_path.join("hello.rs"))
        .expect("Couldn't write bindings!");

    // Build the C code
    cc::Build::new()
        .file("src/hello.c")
        .compile("hello");
}

And update main.rs to use the generated bindings. Replace the entire extern "C" block with:

include!(concat!(env!("OUT_DIR"), "/hello.rs"));

Now, on every compilation, hello.h is read and the Rust bindings are regenerated from it (which is really handy if the C might change).

Nestled somewhere in your target directory, you'll find a hello.rs file that looks like this:

/* automatically generated by rust-bindgen 0.70.1 */

extern "C" {
    pub fn say_hello();
}

Stuck?

If it didn't compile, you might need to install clang and llvm on your system. Go to: https://rust-lang.github.io/rust-bindgen/requirements.html for a guide.

There's also a Dockerfile available:

cd workshops/
docker build -t ex02 -f Dockerfile.ex02 .
docker run -it ex02

How about some Rust from C?

Consuming C (and libraries with a C interface) from Rust is generally pretty straightforward -- although we'll look at some of the "ugly" cases in a bit. How about going the other way? You want to write some Rust, and make use of it from another language.

Note that Python has great support for calling Rust code. We'll touch on it at the end of the class, and hopefully it'll be its own class soon.

Create A Portable Rust Function

Since we did "Hello World" from C, let's do "Hello World" from Rust:

// In the 2024 edition, `no_mangle` must be written as an unsafe attribute.
#[unsafe(no_mangle)]
pub extern "C" fn say_hello() {
    println!("Hello, World from Rust.");
}

There's a bit of extra stuff here!

no_mangle disables "name mangling". Rust (just like C++) adds all kinds of stuff to your function names in libraries if you don't turn mangling off. This helps the compiler build Rust projects, but also makes it pretty much impossible to figure out the function name when linking from C. So you have to turn off mangling.
extern "C" in the function header. This sets the function's calling convention. The optimizer won't helpfully rearrange your arguments, decide to inline your function, or replace any arguments with registers. Without this, Rust reserves the right to do whatever it thinks might help. For once, we don't want it to be helpful.
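You can see the calling convention at work without leaving Rust. In this minimal sketch, rust_add_ints is a made-up exported function, and the variable callback has the type a C library would use for a function pointer. It assumes a compiler recent enough to accept the unsafe attribute syntax.

```rust
use std::ffi::c_int;

/// Exported with an unmangled name and the C calling convention.
#[unsafe(no_mangle)]
pub extern "C" fn rust_add_ints(a: c_int, b: c_int) -> c_int {
    // Wrapping keeps overflow from panicking across the FFI boundary.
    a.wrapping_add(b)
}

fn main() {
    // This is how the function looks to C: a plain C function pointer.
    let callback: extern "C" fn(c_int, c_int) -> c_int = rust_add_ints;
    println!("{}", callback(2, 3));
}
```

A regular Rust fn would not coerce to that pointer type, which is a handy compile-time check that you remembered extern "C".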

And Build It!

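Build it with cargo build. This assumes the [lib] section of your Cargo.toml asks for a C-compatible dynamic library (for example, crate-type = ["cdylib"]). In your target/debug directory, you should see something like: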

-rw-r--r--@   1 herbert  staff   160B Jan  8 12:16 libex03_rust_from_c.d
-rwxr-xr-x@   1 herbert  staff   397K Jan  8 12:16 libex03_rust_from_c.dylib

You'll see a .so on Linux, a .dylib on Mac and a .dll on Windows.

Let's Write some C

In a new directory, create a C file:

void say_hello();

int main() {
    say_hello();
    return 0;
}

Now copy libex03_rust_from_c.so (or .dylib etc) into your C source directory. On Mac/Linux, compile with:

cc rust_from_c.c -o rust_from_c -L. -lex03_rust_from_c

Run ./rust_from_c and it prints "Hello, World from Rust."

On Linux, you may need LD_LIBRARY_PATH=. ./rust_from_c. Linux doesn't look in the current directory by default!

This is Painful - Automate!

There's a script in the repo "c" directory called build_linux.sh. It makes this a lot easier:

#!/bin/bash

# Setting the CARGO_TARGET_DIR lets you specify where the build will go
CARGO_TARGET_DIR="tmp" cargo build
# Since we know the output, we can copy it
cp tmp/debug/libex03_rust_from_c.so .
# Clean up afterwards!
CARGO_TARGET_DIR="tmp" cargo clean
# Invoke the C compiler
cc rust_from_c.c -o rust_from_c -L. -lex03_rust_from_c
# Run it, including the LD_LIBRARY_PATH
LD_LIBRARY_PATH=. ./rust_from_c

Static Linkage

Let's do the same thing, but statically linked. It's a lot easier.

This makes some UNIX people grumpy. When you statically link, every binary carries a copy of its dependencies. So you're using more disk space, and you can't update a single file to fix all the binaries that use it (useful for security). On the other hand, your binary is self-contained, there are no more LD_LIBRARY_PATH fights, and it fits better with the Rust way. So be prepared to be flexible!

In your Cargo.toml, change the crate-type:

[lib]
crate-type = ["staticlib"] # Will create .a on Linux & Mac, .lib on Windows

Now when you build, you produce a .a file (or a .lib on Windows, which likes to be special).

Now you can simplify your build_linux.sh script:

#!/bin/bash
# Constrain the build location
CARGO_TARGET_DIR="tmp" cargo build
# You need the .a file now
cp tmp/debug/libex03a_rust_from_c_static.a .
# Cleaning up makes people happy
CARGO_TARGET_DIR="tmp" cargo clean
# Include the .a like any other C .a file in your build command
cc rust_from_c.c -o rust_from_c_static libex03a_rust_from_c_static.a
# Runs with no linkage magic
./rust_from_c_static

Size

Your statically linked rust_from_c_static binary is 4.5 MB. Your dynamically linked rust_from_c is 16 KB (with a 3.7 MB dynamic library in tow). We're not doing any sort of optimization, but that's why the UNIX people worry.

The original PDP-11 maxed out at 4 MB of RAM. The PDP-7 that ran a mini-UNIX had a maximum of 8 KB!

Auto-generating a C Header

Manually writing headers or function entries for every Rust function you create is ok for a tiny single-function project, but gets pretty painful when you are exporting a larger project.

Note that I often maintain the headers myself, especially when dealing with languages other than C. So this is optional, but can save you some time. We've also switched to ex04_rust_headers in the repo.

Add cbindgen to Cargo.toml:

[build-dependencies]
cbindgen = "0.24"

And add a build.rs:

fn main() {
    // Generate the header file
    cbindgen::generate(".")
        .expect("Unable to generate bindings")
        .write_to_file("bindings.h");
}

When you build the project, as well as your dynamic library - you will find a bindings.h file:

Note: This requires libclang to work. That can be a little painful on Windows! There's a Docker version if you're stuck.

#ifndef MY_PROJECT_BINDINGS_H
#define MY_PROJECT_BINDINGS_H

#include <stdarg.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

void say_hello(void);

#endif /* MY_PROJECT_BINDINGS_H */

That Was Easy! Let's Go Home

Obviously, there's a lot more to cover. But that really was the basics of FFI---and in a lot of cases, that's enough to get you going.

Now we're going to start diving into some of the bad and ugly parts!

Working with C in Rust

It's really common to have some functionality in C that you want to use from Rust. It's quite common for this to be part of an effort to port the code over to Rust using a pattern like this:

Get the C linked in and working.
Write some unit tests to ensure that the C is doing what you think it's doing.
Write a Rust version, possibly line-by-line porting.
Run the same unit tests on the Rust version.
Stop using C, announce to the Ministry of Defence that your code is now safe, and roll around in a bathtub of grant money.

So for this section, we'll act like we're porting some simple C to Rust - and get a feel for the process, and some of the pitfalls you might not have expected.

Our Porting Environment

I recommend grabbing this from the repo (ex05_porting) rather than retyping the whole thing. We'll use that as a starting point.

Let's take a quick look around what we have:

src
- clib.h - the C header for the library.
- clib.c - the C library we're going to be working with.
- lib.rs - a Rust library you are building, to port your amazing C library.
build.rs - the same Bindgen and CC setup you used before.
Cargo.toml - has bindgen and cc as build dependencies.

If you look at clib.h, we're starting with a really simple C function:

char double_byte(char n);

In lib.rs, we've imported the source and set up a simple unit test:

mod c_lib {
    include!(concat!(env!("OUT_DIR"), "/clib.rs"));
}

#[cfg(test)]
mod tests {
    use super::c_lib::*;

    #[test]
    fn test_double() {
        unsafe {
            assert_eq!(double_byte(2), 4);
        }
    }
}

You can even run it, and the unit tests pass. Hooray! C and Rust agree, 2 times 2 equals 4.

Porting Our First Function

So we have the devilishly hard C function:

char double_byte(char n) {
    return n * 2;
}

Let's stretch our Rust abilities and write a Rust equivalent. We'll put it in an rs module, just to help keep the two separated.

mod rs {
    pub fn double_byte(n: i8) -> i8 { n * 2 }
}

pub use rs::*;

Nothing special here. We are putting our ported Rust into a module (you'd probably use separate files in a real project), and since it's a library - we're exporting things. Mostly because squiggly "unused" warnings annoy me.

A Simple Test

There's almost no point in testing something this simple....

#[test]
fn test_double() {
    unsafe {
        assert_eq!(double_byte(2), 4);
    }
    assert_eq!(super::rs::double_byte(2), 4);
}

Run the test, and 2 times 2 equals 4.

That Wasn't a Gotcha - Let's Test a Bit More!

Let's write a much more comprehensive test, since we're dealing with such a small range of values:

#[test]
fn range_test_double() {
    use std::ffi::c_char;

    // `..=` includes c_char::MAX, so every possible value is tested
    for n in c_char::MIN..=c_char::MAX {
        let c_result = unsafe { double_byte(n) };
        let rust_result = super::rs::double_byte(n);
        assert_eq!(c_result, rust_result);
    }
}

Run the test, and it panics. And it panics on the Rust side. How could doubling a byte possibly be unsafe? Overflow.

There's another gotcha here! On an M1 Mini in Docker, char is unsigned. This is a great example of the perils of FFI! Let's fire up the Docker version and see what happens...
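If you'd like to see how much of the input range is affected without tripping the panic, here's a small sketch. It uses checked_mul to scan every i8 value; count_overflowing_inputs is a name invented for this example. Keep in mind that the n * 2 panic comes from overflow checks, which are on by default in debug builds.

```rust
/// Counts how many `i8` inputs cannot be doubled without overflowing.
fn count_overflowing_inputs() -> usize {
    (i8::MIN..=i8::MAX)
        // `checked_mul` returns None instead of panicking on overflow.
        .filter(|n| n.checked_mul(2).is_none())
        .count()
}

fn main() {
    println!(
        "{} of 256 possible inputs overflow when doubled",
        count_overflowing_inputs()
    );
}
```

Half of all possible inputs are affected, so this isn't an obscure corner case.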

Overflow

In C, numeric operations overflow (it can be undefined behavior on some editions of C, but the behavior is consistent enough that everyone expects it). So for a signed 8-bit number, 64 * 2 = -128 (the first bit is used to indicate sign - so it overflows and sets the negative bit).

Now, as the Rust porter - you have to try to figure out what the C programmer was thinking.

Did they WANT overflow?

Some algorithms use numeric overflow. Rust can do that:

pub fn double_byte(n: i8) -> i8 { n.wrapping_mul(2) }

Now your program won't panic, and it is clearly documented that you intend to wrap. That's more important than you might think. In a few decades when Rust programmers are grumbling that some new language is taking our jobs - it's now obvious to the maintainer what you intended. More reasonably, it's obvious to you when you come back in a few weeks.
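Wrapping is only one of the overflow behaviors Rust lets you spell out. The sketch below compares the standard options for 64 * 2 on an i8, and adds a helper (c_style_double, a made-up name) that mimics what C typically does on mainstream compilers: widen to int, multiply, then truncate back to 8 bits.

```rust
/// Mimics the usual C behavior: promote to a wider int, multiply,
/// then truncate back down to 8 bits.
fn c_style_double(n: i8) -> i8 {
    (i32::from(n) * 2) as i8
}

/// The explicit Rust spelling of the same intent.
fn wrapping_double(n: i8) -> i8 {
    n.wrapping_mul(2)
}

fn main() {
    let n: i8 = 64;
    assert_eq!(n.wrapping_mul(2), -128); // wraps around
    assert_eq!(n.checked_mul(2), None); // reports the overflow
    assert_eq!(n.saturating_mul(2), 127); // clamps to i8::MAX
    assert_eq!(n.overflowing_mul(2), (-128, true)); // wraps and flags it

    // The two helpers agree for every possible input.
    for n in i8::MIN..=i8::MAX {
        assert_eq!(c_style_double(n), wrapping_double(n));
    }
    println!("all overflow checks passed");
}
```

Whichever one you pick, the method name documents the decision for whoever reads the code next.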

Did they NOT want overflow?

Sometimes, you are lucky and you will find a C programmer who remembered to add some guard code to handle unintended overflow. If you are really lucky, you'll see something like this:

#include <limits.h>
#include <stdbool.h>

bool double_byte(char n, char *result) {
    // Check for overflow before multiplying
    if (n > CHAR_MAX / 2 || n < CHAR_MIN / 2) {
        return false; // Indicate overflow
    }
    *result = n * 2;
    return true; // Indicate success
}

Note: there are many different approaches to this in common use, some of them compiler specific, and some of them don't actually work...

Usage for this is fun, because regular C doesn't support tuples or other multi-returns.

#include <stdio.h>

int main() {
    char n = 64;
    char result;

    if (double_byte(n, &result)) {
        printf("Doubled value: %d\n", result);
    } else {
        printf("Overflow occurred!\n");
    }

    return 0;
}

A literal port would give you:

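fn double_byte(n: i8, result: &mut i8) -> bool {
    // Check for overflow before multiplying
    if n > i8::MAX / 2 || n < i8::MIN / 2 {
        return false; // Indicate overflow
    }
    *result = n * 2;
    true // Indicate success
}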

You could port it like that, but if you can deduce the intent --- you are far better off porting to idiomatic Rust:

fn double_byte(n: i8) -> Option<i8> {
    n.checked_mul(2)
}
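Here's a sketch of how calling the idiomatic version might look, mirroring the C main from earlier. The describe helper is made up for this example; the point is that the caller can't forget to handle the overflow case.

```rust
/// Doubles a byte, returning None if the result doesn't fit in an i8.
fn double_byte(n: i8) -> Option<i8> {
    n.checked_mul(2)
}

/// Turns the result into the same messages the C program printed.
fn describe(n: i8) -> String {
    match double_byte(n) {
        Some(result) => format!("Doubled value: {result}"),
        None => String::from("Overflow occurred!"),
    }
}

fn main() {
    println!("{}", describe(21));
    println!("{}", describe(64));
}
```

There's no out-parameter and no boolean to ignore: the success value only exists inside Some.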

What if you have NO IDEA what they wanted?

This happens a LOT more than you might think. The bad news is that you need to either:

Pick one.
Trace the original function usage, and find out if anything relies upon wrapping behavior.

It's quite common to discover that your decades old code has had a bug for years and years. It's sadly common to discover that you have been relying on that bug!

C Types

C is a simple language, just like a high-level assembly, right? Let's talk a bit about C types - and continue to port some code.

Pop Quiz

The C type int. Raise your hand if you believe it means:

A signed integer.

A signed 32-bit integer.

A type that is at least 16 bits wide, but the actual width is implementation defined. It is guaranteed to be at least as large as a short int, is not guaranteed to be smaller than a long and is frequently, but not always, the native bit-size of your target platform.

This is a trick question, because the C standard has been adding requirements to integers. When C first came along, it didn't have to be signed - because not every platform supported it. The most recent C++ standard even requires two's complement encoding.

C Types

signed char
short int (or short)
int
long int (or long)
long long int (or long long)
unsigned char
unsigned short int (or unsigned short)
unsigned int
unsigned long int (or unsigned long)
unsigned long long int (or unsigned long long)
float
double
long double
char (can be either signed or unsigned, depending on the implementation, but is a distinct type from signed char and unsigned char).
void

With the exception of void (and char in the most recent C standard) - all of these are platform dependent. This is mostly because C has been around for a while!

For example:

The TI DSP TMS320 uses a 16 bit char!
The old CDC 6000 series had char available in 6, 9 and 12 bit types.
You really should check CHAR_BIT in <limits.h> for exotic platforms.

Matching C Types with FFI Types

When you are dealing with C libraries and may need to support multiple platforms, Rust includes a bunch of "c types" to assist you.
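You can ask these aliases what they look like on the machine in front of you. This is a minimal sketch; the helper name c_type_sizes is made up, and the numbers it prints depend on your platform (which is rather the point).

```rust
use std::ffi::{c_char, c_int, c_long, c_longlong, c_short};
use std::mem::size_of;

/// Returns the size in bytes of a few C types on the current target.
fn c_type_sizes() -> [(&'static str, usize); 5] {
    [
        ("char", size_of::<c_char>()),
        ("short", size_of::<c_short>()),
        ("int", size_of::<c_int>()),
        ("long", size_of::<c_long>()),
        ("long long", size_of::<c_longlong>()),
    ]
}

fn main() {
    for (name, bytes) in c_type_sizes() {
        println!("{name}: {bytes} byte(s)");
    }
}
```

Try it on Linux and on Windows: long is a classic example of a type that changes size between the two on 64-bit systems.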

Let's rewrite our Rust implementation of double_byte to use one of Rust's FFI types. These are defined for each platform to match the equivalent C code - they are type aliases.

mod rs {
    use std::ffi::c_char;

    pub fn double_byte(n: c_char) -> c_char { n.wrapping_mul(2) }
}

There's a few implications here to ponder:

You no longer know exactly how big your char is on your target platform. So you can no longer make assumptions beyond the lowest common denominator: the C standard does now require that a char have at least 8 bits.
Could you safely assume that it's an i8, and write accordingly? In a lot of cases, the answer is "yes". It's up to you.
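If you're unsure which way your target leans, you can check. This sketch asks whether c_char is signed on the current platform, and shows a conversion that behaves the same either way. Both function names are invented for this example.

```rust
use std::ffi::c_char;

/// Reports whether C's plain `char` is signed on this target.
fn c_char_is_signed() -> bool {
    // Widen first, so the same code compiles cleanly whether
    // c_char is an alias for i8 or for u8.
    i32::from(c_char::MIN) < 0
}

/// Reinterprets a C char as a raw byte, regardless of signedness.
fn to_byte(c: c_char) -> u8 {
    c as u8
}

fn main() {
    println!("c_char is signed here: {}", c_char_is_signed());
    println!("'A' as a byte: {}", to_byte(b'A' as c_char));
}
```

If your Rust port only works when this prints true, that's worth writing down somewhere more visible than a comment.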

Let's go and see what bindgen created:

pub fn double_byte(n: ::std::os::raw::c_char) -> ::std::os::raw::c_char;

Oh dear. We have std::os::raw and std::ffi???

Some of Rust's Murkier Corners

When you dive deeply into the Rust standard library, a few things become less than clear. Here's the definition of std::os::raw::c_char:

macro_rules! alias_core_ffi {
    ($($t:ident)*) => {$(
        #[stable(feature = "raw_os", since = "1.1.0")]
        #[doc = include_str!(concat!("../../../../core/src/ffi/", stringify!($t), ".md"))]
        #[doc(cfg(all()))]
        pub type $t = core::ffi::$t;
    )*}
}

alias_core_ffi! {
    c_char c_schar c_uchar
    c_short c_ushort
    c_int c_uint
    c_long c_ulong
    c_longlong c_ulonglong
    c_float
    c_double
    c_void
}

That's a very long way of defining type aliases into the FFI types we used. So that's ok then! We used the same type. How does FFI define it?

type_alias! { "c_char.md", c_char = c_char_definition::c_char; #[doc(cfg(all()))] }

Oh boy, another macro! And now it points into a thing called c_char_definition? Following the definition, you see some of the magic that makes Rust work. In .rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/ffi/mod.rs (whew!), there's a platform specific definition that maps each platform's C types to the FFI types, and the std::os::raw types are in turn aliases of those.

Isn't it nice to not have to worry about this, most of the time? Since the C ABI is the lingua-franca between languages, every single Rust implementation has to provide these.
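If you'd rather have the compiler confirm that the two paths name the same type, here's a quick sketch. The helper same_type is made up for this example; it compares TypeIds, which are identical for a type and its aliases.

```rust
use std::any::TypeId;

/// Returns true if `A` and `B` are the very same type.
fn same_type<A: 'static, B: 'static>() -> bool {
    TypeId::of::<A>() == TypeId::of::<B>()
}

fn main() {
    // Both paths lead to one underlying type.
    assert!(same_type::<std::os::raw::c_char, std::ffi::c_char>());
    assert!(same_type::<std::os::raw::c_int, std::ffi::c_int>());

    // Because they are aliases, values move between them without a cast.
    let from_os_raw: std::os::raw::c_int = 7;
    let as_ffi: std::ffi::c_int = from_os_raw;
    println!("{as_ffi}");

    // And c_char itself is just i8 or u8, depending on the platform.
    assert!(same_type::<std::ffi::c_char, i8>() || same_type::<std::ffi::c_char, u8>());
}
```

So it doesn't matter that bindgen writes std::os::raw while you write std::ffi: the signatures still match.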

Using stdint.h Types

Safety standards for C - such as MISRA C (pronounced "miserable" by many programmers) - find the C type standards as unpalatable as I do. Much OS development, such as Linux, also recommends specifying exactly what type you want to use. C supports this with stdint.h.

Let's modify our clib to use it:

First the header:

#pragma once

#include <stdint.h>

int8_t double_byte(int8_t n);

Then the body:

#include "clib.h"

int8_t double_byte(int8_t n) {
    return n * 2;
}

Aren't you glad that Rust doesn't make you specify everything twice?

From the Rust point-of-view, this is much nicer. You can be sure that an int8_t is the same as an i8, and a uint32_t is a u32. No more figuring out what a long long int means on this platform! And we can go back to some familiar Rust:

mod rs {
    pub fn double_byte(n: i8) -> i8 { n.wrapping_mul(2) }
}

If the code-base you are porting supports stdint, you can expect to enjoy reduced mental gymnastics. My experience is split: some shops adopted it enthusiastically, some didn't. It causes less of a fight than most C standard enhancements...

There Has To Be a Downside...

Let's look at what bindgen has given us. We'll scroll for a while:

#![allow(unused)]
fn main() {
/* automatically generated by rust-bindgen 0.70.1 */

pub const __WORDSIZE: u32 = 64;
pub const __has_safe_buffers: u32 = 1;
pub const __DARWIN_ONLY_64_BIT_INO_T: u32 = 1;
pub const __DARWIN_ONLY_UNIX_CONFORMANCE: u32 = 1;
pub const __DARWIN_ONLY_VERS_1050: u32 = 1;
pub const __DARWIN_UNIX03: u32 = 1;
pub const __DARWIN_64_BIT_INO_T: u32 = 1;
pub const __DARWIN_VERS_1050: u32 = 1;
pub const __DARWIN_NON_CANCELABLE: u32 = 0;
pub const __DARWIN_SUF_EXTSN: &[u8; 14] = b"$DARWIN_EXTSN\0";
pub const __DARWIN_C_ANSI: u32 = 4096;
pub const __DARWIN_C_FULL: u32 = 900000;
pub const __DARWIN_C_LEVEL: u32 = 900000;
pub const __STDC_WANT_LIB_EXT1__: u32 = 1;
pub const __DARWIN_NO_LONG_LONG: u32 = 0;
pub const _DARWIN_FEATURE_64_BIT_INODE: u32 = 1;
pub const _DARWIN_FEATURE_ONLY_64_BIT_INODE: u32 = 1;
pub const _DARWIN_FEATURE_ONLY_VERS_1050: u32 = 1;
pub const _DARWIN_FEATURE_ONLY_UNIX_CONFORMANCE: u32 = 1;
pub const _DARWIN_FEATURE_UNIX_CONFORMANCE: u32 = 3;
pub const __has_ptrcheck: u32 = 0;
pub const USE_CLANG_TYPES: u32 = 0;
pub const __PTHREAD_SIZE__: u32 = 8176;
pub const __PTHREAD_ATTR_SIZE__: u32 = 56;
pub const __PTHREAD_MUTEXATTR_SIZE__: u32 = 8;
pub const __PTHREAD_MUTEX_SIZE__: u32 = 56;
pub const __PTHREAD_CONDATTR_SIZE__: u32 = 8;
pub const __PTHREAD_COND_SIZE__: u32 = 40;
pub const __PTHREAD_ONCE_SIZE__: u32 = 8;
pub const __PTHREAD_RWLOCK_SIZE__: u32 = 192;
pub const __PTHREAD_RWLOCKATTR_SIZE__: u32 = 16;
pub const INT8_MAX: u32 = 127;
pub const INT16_MAX: u32 = 32767;
pub const INT32_MAX: u32 = 2147483647;
pub const INT64_MAX: u64 = 9223372036854775807;
pub const INT8_MIN: i32 = -128;
pub const INT16_MIN: i32 = -32768;
pub const INT32_MIN: i32 = -2147483648;
pub const INT64_MIN: i64 = -9223372036854775808;
pub const UINT8_MAX: u32 = 255;
pub const UINT16_MAX: u32 = 65535;
pub const UINT32_MAX: u32 = 4294967295;
pub const UINT64_MAX: i32 = -1;
pub const INT_LEAST8_MIN: i32 = -128;
pub const INT_LEAST16_MIN: i32 = -32768;
pub const INT_LEAST32_MIN: i32 = -2147483648;
pub const INT_LEAST64_MIN: i64 = -9223372036854775808;
pub const INT_LEAST8_MAX: u32 = 127;
pub const INT_LEAST16_MAX: u32 = 32767;
pub const INT_LEAST32_MAX: u32 = 2147483647;
pub const INT_LEAST64_MAX: u64 = 9223372036854775807;
pub const UINT_LEAST8_MAX: u32 = 255;
pub const UINT_LEAST16_MAX: u32 = 65535;
pub const UINT_LEAST32_MAX: u32 = 4294967295;
pub const UINT_LEAST64_MAX: i32 = -1;
pub const INT_FAST8_MIN: i32 = -128;
pub const INT_FAST16_MIN: i32 = -32768;
pub const INT_FAST32_MIN: i32 = -2147483648;
pub const INT_FAST64_MIN: i64 = -9223372036854775808;
pub const INT_FAST8_MAX: u32 = 127;
pub const INT_FAST16_MAX: u32 = 32767;
pub const INT_FAST32_MAX: u32 = 2147483647;
pub const INT_FAST64_MAX: u64 = 9223372036854775807;
pub const UINT_FAST8_MAX: u32 = 255;
pub const UINT_FAST16_MAX: u32 = 65535;
pub const UINT_FAST32_MAX: u32 = 4294967295;
pub const UINT_FAST64_MAX: i32 = -1;
pub const INTPTR_MAX: u64 = 9223372036854775807;
pub const INTPTR_MIN: i64 = -9223372036854775808;
pub const UINTPTR_MAX: i32 = -1;
pub const SIZE_MAX: i32 = -1;
pub const RSIZE_MAX: i32 = -1;
pub const WINT_MIN: i32 = -2147483648;
pub const WINT_MAX: u32 = 2147483647;
pub const SIG_ATOMIC_MIN: i32 = -2147483648;
pub const SIG_ATOMIC_MAX: u32 = 2147483647;
pub type int_least8_t = i8;
pub type int_least16_t = i16;
pub type int_least32_t = i32;
pub type int_least64_t = i64;
pub type uint_least8_t = u8;
pub type uint_least16_t = u16;
pub type uint_least32_t = u32;
pub type uint_least64_t = u64;
pub type int_fast8_t = i8;
pub type int_fast16_t = i16;
pub type int_fast32_t = i32;
pub type int_fast64_t = i64;
pub type uint_fast8_t = u8;
pub type uint_fast16_t = u16;
pub type uint_fast32_t = u32;
pub type uint_fast64_t = u64;
pub type __int8_t = ::std::os::raw::c_schar;
pub type __uint8_t = ::std::os::raw::c_uchar;
pub type __int16_t = ::std::os::raw::c_short;
pub type __uint16_t = ::std::os::raw::c_ushort;
pub type __int32_t = ::std::os::raw::c_int;
pub type __uint32_t = ::std::os::raw::c_uint;
pub type __int64_t = ::std::os::raw::c_longlong;
pub type __uint64_t = ::std::os::raw::c_ulonglong;
pub type __darwin_intptr_t = ::std::os::raw::c_long;
pub type __darwin_natural_t = ::std::os::raw::c_uint;
pub type __darwin_ct_rune_t = ::std::os::raw::c_int;
#[repr(C)]
#[derive(Copy, Clone)]
pub union __mbstate_t {
    pub __mbstate8: [::std::os::raw::c_char; 128usize],
    pub _mbstateL: ::std::os::raw::c_longlong,
}
#[allow(clippy::unnecessary_operation, clippy::identity_op)]
const _: () = {
    ["Size of __mbstate_t"][::std::mem::size_of::<__mbstate_t>() - 128usize];
    ["Alignment of __mbstate_t"][::std::mem::align_of::<__mbstate_t>() - 8usize];
    ["Offset of field: __mbstate_t::__mbstate8"]
        [::std::mem::offset_of!(__mbstate_t, __mbstate8) - 0usize];
    ["Offset of field: __mbstate_t::_mbstateL"]
        [::std::mem::offset_of!(__mbstate_t, _mbstateL) - 0usize];
};
pub type __darwin_mbstate_t = __mbstate_t;
pub type __darwin_ptrdiff_t = ::std::os::raw::c_long;
pub type __darwin_size_t = ::std::os::raw::c_ulong;
pub type __darwin_va_list = __builtin_va_list;
pub type __darwin_wchar_t = ::std::os::raw::c_int;
pub type __darwin_rune_t = __darwin_wchar_t;
pub type __darwin_wint_t = ::std::os::raw::c_int;
pub type __darwin_clock_t = ::std::os::raw::c_ulong;
pub type __darwin_socklen_t = __uint32_t;
pub type __darwin_ssize_t = ::std::os::raw::c_long;
pub type __darwin_time_t = ::std::os::raw::c_long;
pub type __darwin_blkcnt_t = __int64_t;
pub type __darwin_blksize_t = __int32_t;
pub type __darwin_dev_t = __int32_t;
pub type __darwin_fsblkcnt_t = ::std::os::raw::c_uint;
pub type __darwin_fsfilcnt_t = ::std::os::raw::c_uint;
pub type __darwin_gid_t = __uint32_t;
pub type __darwin_id_t = __uint32_t;
pub type __darwin_ino64_t = __uint64_t;
pub type __darwin_ino_t = __darwin_ino64_t;
pub type __darwin_mach_port_name_t = __darwin_natural_t;
pub type __darwin_mach_port_t = __darwin_mach_port_name_t;
pub type __darwin_mode_t = __uint16_t;
pub type __darwin_off_t = __int64_t;
pub type __darwin_pid_t = __int32_t;
pub type __darwin_sigset_t = __uint32_t;
pub type __darwin_suseconds_t = __int32_t;
pub type __darwin_uid_t = __uint32_t;
pub type __darwin_useconds_t = __uint32_t;
pub type __darwin_uuid_t = [::std::os::raw::c_uchar; 16usize];
pub type __darwin_uuid_string_t = [::std::os::raw::c_char; 37usize];
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct __darwin_pthread_handler_rec {
    pub __routine: ::std::option::Option<unsafe extern "C" fn(arg1: *mut ::std::os::raw::c_void)>,
    pub __arg: *mut ::std::os::raw::c_void,
    pub __next: *mut __darwin_pthread_handler_rec,
}
#[allow(clippy::unnecessary_operation, clippy::identity_op)]
const _: () = {
    ["Size of __darwin_pthread_handler_rec"]
        [::std::mem::size_of::<__darwin_pthread_handler_rec>() - 24usize];
    ["Alignment of __darwin_pthread_handler_rec"]
        [::std::mem::align_of::<__darwin_pthread_handler_rec>() - 8usize];
    ["Offset of field: __darwin_pthread_handler_rec::__routine"]
        [::std::mem::offset_of!(__darwin_pthread_handler_rec, __routine) - 0usize];
    ["Offset of field: __darwin_pthread_handler_rec::__arg"]
        [::std::mem::offset_of!(__darwin_pthread_handler_rec, __arg) - 8usize];
    ["Offset of field: __darwin_pthread_handler_rec::__next"]
        [::std::mem::offset_of!(__darwin_pthread_handler_rec, __next) - 16usize];
};
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct _opaque_pthread_attr_t {
    pub __sig: ::std::os::raw::c_long,
    pub __opaque: [::std::os::raw::c_char; 56usize],
}
#[allow(clippy::unnecessary_operation, clippy::identity_op)]
const _: () = {
    ["Size of _opaque_pthread_attr_t"][::std::mem::size_of::<_opaque_pthread_attr_t>() - 64usize];
    ["Alignment of _opaque_pthread_attr_t"]
        [::std::mem::align_of::<_opaque_pthread_attr_t>() - 8usize];
    ["Offset of field: _opaque_pthread_attr_t::__sig"]
        [::std::mem::offset_of!(_opaque_pthread_attr_t, __sig) - 0usize];
    ["Offset of field: _opaque_pthread_attr_t::__opaque"]
        [::std::mem::offset_of!(_opaque_pthread_attr_t, __opaque) - 8usize];
};
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct _opaque_pthread_cond_t {
    pub __sig: ::std::os::raw::c_long,
    pub __opaque: [::std::os::raw::c_char; 40usize],
}
#[allow(clippy::unnecessary_operation, clippy::identity_op)]
const _: () = {
    ["Size of _opaque_pthread_cond_t"][::std::mem::size_of::<_opaque_pthread_cond_t>() - 48usize];
    ["Alignment of _opaque_pthread_cond_t"]
        [::std::mem::align_of::<_opaque_pthread_cond_t>() - 8usize];
    ["Offset of field: _opaque_pthread_cond_t::__sig"]
        [::std::mem::offset_of!(_opaque_pthread_cond_t, __sig) - 0usize];
    ["Offset of field: _opaque_pthread_cond_t::__opaque"]
        [::std::mem::offset_of!(_opaque_pthread_cond_t, __opaque) - 8usize];
};
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct _opaque_pthread_condattr_t {
    pub __sig: ::std::os::raw::c_long,
    pub __opaque: [::std::os::raw::c_char; 8usize],
}
#[allow(clippy::unnecessary_operation, clippy::identity_op)]
const _: () = {
    ["Size of _opaque_pthread_condattr_t"]
        [::std::mem::size_of::<_opaque_pthread_condattr_t>() - 16usize];
    ["Alignment of _opaque_pthread_condattr_t"]
        [::std::mem::align_of::<_opaque_pthread_condattr_t>() - 8usize];
    ["Offset of field: _opaque_pthread_condattr_t::__sig"]
        [::std::mem::offset_of!(_opaque_pthread_condattr_t, __sig) - 0usize];
    ["Offset of field: _opaque_pthread_condattr_t::__opaque"]
        [::std::mem::offset_of!(_opaque_pthread_condattr_t, __opaque) - 8usize];
};
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct _opaque_pthread_mutex_t {
    pub __sig: ::std::os::raw::c_long,
    pub __opaque: [::std::os::raw::c_char; 56usize],
}
#[allow(clippy::unnecessary_operation, clippy::identity_op)]
const _: () = {
    ["Size of _opaque_pthread_mutex_t"][::std::mem::size_of::<_opaque_pthread_mutex_t>() - 64usize];
    ["Alignment of _opaque_pthread_mutex_t"]
        [::std::mem::align_of::<_opaque_pthread_mutex_t>() - 8usize];
    ["Offset of field: _opaque_pthread_mutex_t::__sig"]
        [::std::mem::offset_of!(_opaque_pthread_mutex_t, __sig) - 0usize];
    ["Offset of field: _opaque_pthread_mutex_t::__opaque"]
        [::std::mem::offset_of!(_opaque_pthread_mutex_t, __opaque) - 8usize];
};
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct _opaque_pthread_mutexattr_t {
    pub __sig: ::std::os::raw::c_long,
    pub __opaque: [::std::os::raw::c_char; 8usize],
}
#[allow(clippy::unnecessary_operation, clippy::identity_op)]
const _: () = {
    ["Size of _opaque_pthread_mutexattr_t"]
        [::std::mem::size_of::<_opaque_pthread_mutexattr_t>() - 16usize];
    ["Alignment of _opaque_pthread_mutexattr_t"]
        [::std::mem::align_of::<_opaque_pthread_mutexattr_t>() - 8usize];
    ["Offset of field: _opaque_pthread_mutexattr_t::__sig"]
        [::std::mem::offset_of!(_opaque_pthread_mutexattr_t, __sig) - 0usize];
    ["Offset of field: _opaque_pthread_mutexattr_t::__opaque"]
        [::std::mem::offset_of!(_opaque_pthread_mutexattr_t, __opaque) - 8usize];
};
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct _opaque_pthread_once_t {
    pub __sig: ::std::os::raw::c_long,
    pub __opaque: [::std::os::raw::c_char; 8usize],
}
#[allow(clippy::unnecessary_operation, clippy::identity_op)]
const _: () = {
    ["Size of _opaque_pthread_once_t"][::std::mem::size_of::<_opaque_pthread_once_t>() - 16usize];
    ["Alignment of _opaque_pthread_once_t"]
        [::std::mem::align_of::<_opaque_pthread_once_t>() - 8usize];
    ["Offset of field: _opaque_pthread_once_t::__sig"]
        [::std::mem::offset_of!(_opaque_pthread_once_t, __sig) - 0usize];
    ["Offset of field: _opaque_pthread_once_t::__opaque"]
        [::std::mem::offset_of!(_opaque_pthread_once_t, __opaque) - 8usize];
};
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct _opaque_pthread_rwlock_t {
    pub __sig: ::std::os::raw::c_long,
    pub __opaque: [::std::os::raw::c_char; 192usize],
}
#[allow(clippy::unnecessary_operation, clippy::identity_op)]
const _: () = {
    ["Size of _opaque_pthread_rwlock_t"]
        [::std::mem::size_of::<_opaque_pthread_rwlock_t>() - 200usize];
    ["Alignment of _opaque_pthread_rwlock_t"]
        [::std::mem::align_of::<_opaque_pthread_rwlock_t>() - 8usize];
    ["Offset of field: _opaque_pthread_rwlock_t::__sig"]
        [::std::mem::offset_of!(_opaque_pthread_rwlock_t, __sig) - 0usize];
    ["Offset of field: _opaque_pthread_rwlock_t::__opaque"]
        [::std::mem::offset_of!(_opaque_pthread_rwlock_t, __opaque) - 8usize];
};
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct _opaque_pthread_rwlockattr_t {
    pub __sig: ::std::os::raw::c_long,
    pub __opaque: [::std::os::raw::c_char; 16usize],
}
#[allow(clippy::unnecessary_operation, clippy::identity_op)]
const _: () = {
    ["Size of _opaque_pthread_rwlockattr_t"]
        [::std::mem::size_of::<_opaque_pthread_rwlockattr_t>() - 24usize];
    ["Alignment of _opaque_pthread_rwlockattr_t"]
        [::std::mem::align_of::<_opaque_pthread_rwlockattr_t>() - 8usize];
    ["Offset of field: _opaque_pthread_rwlockattr_t::__sig"]
        [::std::mem::offset_of!(_opaque_pthread_rwlockattr_t, __sig) - 0usize];
    ["Offset of field: _opaque_pthread_rwlockattr_t::__opaque"]
        [::std::mem::offset_of!(_opaque_pthread_rwlockattr_t, __opaque) - 8usize];
};
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct _opaque_pthread_t {
    pub __sig: ::std::os::raw::c_long,
    pub __cleanup_stack: *mut __darwin_pthread_handler_rec,
    pub __opaque: [::std::os::raw::c_char; 8176usize],
}
#[allow(clippy::unnecessary_operation, clippy::identity_op)]
const _: () = {
    ["Size of _opaque_pthread_t"][::std::mem::size_of::<_opaque_pthread_t>() - 8192usize];
    ["Alignment of _opaque_pthread_t"][::std::mem::align_of::<_opaque_pthread_t>() - 8usize];
    ["Offset of field: _opaque_pthread_t::__sig"]
        [::std::mem::offset_of!(_opaque_pthread_t, __sig) - 0usize];
    ["Offset of field: _opaque_pthread_t::__cleanup_stack"]
        [::std::mem::offset_of!(_opaque_pthread_t, __cleanup_stack) - 8usize];
    ["Offset of field: _opaque_pthread_t::__opaque"]
        [::std::mem::offset_of!(_opaque_pthread_t, __opaque) - 16usize];
};
pub type __darwin_pthread_attr_t = _opaque_pthread_attr_t;
pub type __darwin_pthread_cond_t = _opaque_pthread_cond_t;
pub type __darwin_pthread_condattr_t = _opaque_pthread_condattr_t;
pub type __darwin_pthread_key_t = ::std::os::raw::c_ulong;
pub type __darwin_pthread_mutex_t = _opaque_pthread_mutex_t;
pub type __darwin_pthread_mutexattr_t = _opaque_pthread_mutexattr_t;
pub type __darwin_pthread_once_t = _opaque_pthread_once_t;
pub type __darwin_pthread_rwlock_t = _opaque_pthread_rwlock_t;
pub type __darwin_pthread_rwlockattr_t = _opaque_pthread_rwlockattr_t;
pub type __darwin_pthread_t = *mut _opaque_pthread_t;
pub type intmax_t = ::std::os::raw::c_long;
pub type uintmax_t = ::std::os::raw::c_ulong;
extern "C" {
    pub fn double_byte(n: i8) -> i8;
}
pub type __builtin_va_list = *mut ::std::os::raw::c_char;
}

Oh boy. Now we have a Rust definition of every single type exported by stdint on your platform, sitting inside your auto-generated blob. You're compiling it (and discarding most of it) every time. Down at the bottom, you have the part you actually care about:

extern "C" {
    pub fn double_byte(n: i8) -> i8;
}

It's not doing any real harm, other than making your IDE and compiler work harder (it's also one reason I often write the imports by hand for simpler setups). But what if you DON'T want to generate amazingly large imports? (For fun, try bindgen on some Linux headers...) If this really bothers you, the easiest approach is to adopt an explicit list of what you want to import:

In your build.rs:

fn main() {
    let bindings = bindgen::Builder::default()
        .header("src/clib.h")
        .parse_callbacks(Box::new(bindgen::CargoCallbacks::new()))
        // Only generate bindings for the one function we actually call
        .allowlist_function("double_byte")
        .generate()
        .expect("Unable to generate bindings");

    // ...then write `bindings` out exactly as your build.rs did before.
}

Bindgen has an amazing number of options. If you want to, you can allow everything declared in a particular header with allowlist_file, or match whole families of names by passing a regular expression to allowlist_function. The downside is that now you have to list what you want to import. For a big library, that may be painful.

C Structures

We've covered most of the primitive types (we're avoiding strings for now!). So how about structs? This will naturally lead us towards the part we're all dreading --- pointers (and strings!). For now, let's have a gentle start.

First of all, some good news. Rust and C structs are really similar. These will have the same representation in memory (assuming you're on a sane platform with i32 for int, and i8 for char!).

In C:

struct MyStruct {
    int field;
    int field2;
    char field3;
};

In Rust:

#[repr(C)]
struct MyStruct {
    field: i32,
    field2: i32,
    field3: i8,
}

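Note the #[repr(C)]. If you don't specify this, Rust reserves the right to reorder your struct's fields however it likes. That can sometimes improve performance by reducing padding. Forgetting the repr can ruin your day by sometimes working: the default layout may happen to match C's today and then stop matching later.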
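If you'd like to see what #[repr(C)] actually pins down, here's a minimal sketch that asks the compiler for the layout. The helper name my_struct_layout is mine, not part of the class code, and the numbers in the comments assume a mainstream target where i32 is 4-byte aligned.

```rust
use std::mem::{align_of, size_of};

// Mirrors the C declaration:
// struct MyStruct { int field; int field2; char field3; };
#[repr(C)]
pub struct MyStruct {
    pub field: i32,
    pub field2: i32,
    pub field3: i8,
}

/// Hypothetical helper: returns (size, alignment, field offsets) of MyStruct.
pub fn my_struct_layout() -> (usize, usize, [usize; 3]) {
    (
        size_of::<MyStruct>(),
        align_of::<MyStruct>(),
        [
            std::mem::offset_of!(MyStruct, field),
            std::mem::offset_of!(MyStruct, field2),
            std::mem::offset_of!(MyStruct, field3),
        ],
    )
}

fn main() {
    let (size, align, offsets) = my_struct_layout();
    // With 4-byte aligned i32: 9 bytes of fields, padded up to 12.
    println!("size = {}, align = {}, offsets = {:?}", size, align, offsets);
}
```

Fields stay in declaration order, and the trailing padding rounds the size up to a multiple of the alignment, which is the same rule a C compiler follows for this struct.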

Sticking with "pure" C (as opposed to C++) for now, you can use structs a lot like you use them in Rust --- with some differences.

#include <stdio.h>
#include <stdlib.h>

struct MyStruct {
    int field;
    int field2;
    char field3;
};

void print(struct MyStruct s) {
    printf("%d, %d, %d\n", s.field, s.field2, s.field3);
}

void print_ptr(struct MyStruct *s) {
    printf("%d, %d, %d\n", s->field, s->field2, s->field3);
}

int main() {
    struct MyStruct s = {
        1, 2, 3
    };
    print(s);
    
    // "s" is not invalidated because it was copied. C doesn't move!
    printf("%d, %d, %d\n", s.field, s.field2, s.field3);
    
    // Grab a pointer to s and it works like a reference
    print_ptr(&s);
    return 0;
}

Notice how C doesn't have Rust's "move by default"---it copies. You can take a pointer, and use it.

Simple Struct Usage

Let's add some C to our testbed header and code file:

struct MyStruct {
    int integer;
    char byte;
};

int copy_struct(struct MyStruct s);
struct MyStruct return_struct(int n, char c);

int copy_struct(struct MyStruct s) {
    return s.integer;
}

struct MyStruct return_struct(int n, char c) {
    struct MyStruct s = { n, c };
    return s;
}

Bindgen creates a struct that should look familiar. By default, it'll implement Debug, Clone and Copy when possible.

#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct MyStruct {
    pub integer: ::std::os::raw::c_int,
    pub byte: ::std::os::raw::c_char,
}

With that in mind, let's add some tests to our library to use these:

#[test]
fn test_copy_struct() {
    let s = MyStruct { integer: 12, byte: 3 };
    let n = unsafe { copy_struct(s) };
    assert_eq!(n, s.integer);
}

#[test]
fn test_return_struct() {
    let s = unsafe { return_struct(11, 2) };
    assert_eq!(s.byte, 2);
    assert_eq!(s.integer, 11);
}

The tests work, and I don't have any gotcha moments for you! This is one of the best parts: as long as you remember the representation, and are using primitives---C to Rust (and vice versa) just works.

Simple Struct Pointers

Yes, I just used "C", "Pointers" and "Simple" in the same sentence. I'll doubtless regret that.

Let's add another item to our header and C body:

int reference_struct(struct MyStruct *s);

int reference_struct(struct MyStruct *s) {
    return s->integer;
}

This is the easiest case - we're pointing to a structure that already exists. The Rust test is pretty straightforward:

#[test]
fn test_reference_struct() {
    let mut s = MyStruct { integer: 9, byte: 7 };
    let n = unsafe { reference_struct(&mut s) };
    assert_eq!(n, 9);
}

And here we have a whole bunch of things to spot!

s has to be mutable! We didn't specify const in C-land, so the function is free to do whatever it feels like to your data.
We have to borrow with &mut for the same reason.

You can fix that if you can adjust the C. It's remarkable how many C programmers fight doing this, or just forget.

int reference_struct_const(const struct MyStruct *s);

int reference_struct_const(const struct MyStruct *s) {
    return s->integer;
}

And now, a Rust test:

#[test]
fn test_reference_struct_const() {
    let s = MyStruct { integer: 9, byte: 7 };
    let n = unsafe { reference_struct_const(&s) };
    assert_eq!(n, 9);
}
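To see why the mutable borrow matters, here's a small sketch with both pointer flavours side by side. The two functions are Rust stand-ins that I've made up (bump_integer and read_integer are not in the class code); they use the C ABI so the example runs without linking any C.

```rust
use std::os::raw::{c_char, c_int};

#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct MyStruct {
    pub integer: c_int,
    pub byte: c_char,
}

// Hypothetical stand-in for `int bump_integer(struct MyStruct *s)`.
// No `const`, so it is allowed to change your data. And it does.
pub unsafe extern "C" fn bump_integer(s: *mut MyStruct) -> c_int {
    unsafe {
        (*s).integer += 1;
        (*s).integer
    }
}

// Hypothetical stand-in for `int read_integer(const struct MyStruct *s)`.
pub unsafe extern "C" fn read_integer(s: *const MyStruct) -> c_int {
    unsafe { (*s).integer }
}

/// Calls both functions and returns what each one reported.
pub fn demo() -> (c_int, c_int) {
    let mut s = MyStruct { integer: 9, byte: 7 };
    // `&s` coerces to `*const MyStruct`: a shared borrow is enough.
    let read = unsafe { read_integer(&s) };
    // `&mut s` coerces to `*mut MyStruct`: Rust makes you admit it may change.
    let bumped = unsafe { bump_integer(&mut s) };
    (read, bumped)
}

fn main() {
    println!("{:?}", demo());
}
```

The borrow you write at the call site documents what the C signature promises, which is a good reason to push for const on the C side whenever you can.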

You see? That wasn't so bad! Just wait...

Out Variables

Since plain C doesn't have tuples or another easy way to return multiple things at once, a LOT of C uses patterns like this one:

#include <stdbool.h> // Yes, you really do need a header to use a bool!
int is_byte_twelve(const struct MyStruct* s, bool* val);

int is_byte_twelve(const struct MyStruct* s, bool* val) {
    int errnum;
    if (!s) {
        errnum = -1;
        goto error; // Goto is really common in C for error handling.
    }
    *val = s->byte == 12 ? true : false;
    return 0;

error:
    return errnum;
}

The idea here is that something more complicated than checking for 12 occurs, and either true or false is returned. But the function is fallible! In Rust terms, it's Result<bool, MyError> - where MyError is in this case an int referencing some pretend documentation.

So what's the issue here?

We have to think about the ownership of val. In this case, it's up to us to create it and pass a pointer. Passing NULL will ruin your day.
It's a bit messy to follow the logic if you aren't used to it.
It's not really a Rust paradigm, so you'll dance around a bit.

Here's a Rust test for it:

#[test]
fn test_is_byte_twelve() {
    let s = MyStruct { integer: 0, byte: 12 };
    let mut result = false; // Rust won't let you not initialize it
    let retval = unsafe { is_byte_twelve(&s, &mut result) };
    assert_eq!(0, retval);
    assert_eq!(true, result);

    let s = MyStruct { integer: 0, byte: 13 };
    let retval = unsafe { is_byte_twelve(&s, &mut result) };
    assert_eq!(0, retval);
    assert_eq!(false, result);
}
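Since the function is really a Result in disguise, one option is to do the dance once, inside a safe wrapper. This is only a sketch: byte_is_twelve is a name I've invented, and is_byte_twelve here is a Rust stand-in with the same signature and behaviour as the C version, so the example runs on its own.

```rust
use std::os::raw::{c_char, c_int};

#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct MyStruct {
    pub integer: c_int,
    pub byte: c_char,
}

// Stand-in for the C function: returns 0 on success and -1 for a null struct.
pub unsafe extern "C" fn is_byte_twelve(s: *const MyStruct, val: *mut bool) -> c_int {
    if s.is_null() {
        return -1;
    }
    unsafe { *val = (*s).byte == 12 };
    0
}

/// Hypothetical safe wrapper: the out variable and error code become a Result.
/// Passing `None` sends a null pointer, to exercise the error path.
pub fn byte_is_twelve(s: Option<&MyStruct>) -> Result<bool, c_int> {
    let mut out = false;
    let ptr = s.map_or(std::ptr::null(), |r| r as *const MyStruct);
    // SAFETY: `ptr` is either null (which the callee checks for) or comes
    // from a live reference, and `out` is a valid local to write into.
    let code = unsafe { is_byte_twelve(ptr, &mut out) };
    if code == 0 {
        Ok(out)
    } else {
        Err(code)
    }
}

fn main() {
    let s = MyStruct { integer: 0, byte: 12 };
    println!("{:?}", byte_is_twelve(Some(&s)));
    println!("{:?}", byte_is_twelve(None));
}
```

Callers now get the ? operator and never see the out variable. In a real project you'd probably map the integer code onto an error enum rather than exposing a bare c_int.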

OK, that was a long one. Drink some coffee, and we're going to dive into ownership.

Pointers and Ownership

We lost the word "simple". Sorry everyone, buckle up!

Pure C programs often allocate a bunch of structs, and pass pointers around like crazy. This can be especially challenging while you replace parts of a C program in Rust!

Let's start with an all-too-common one. The C program makes a pointer to some data, and it's up to you to dispose of it.

// Header
#include <stdlib.h>

struct MyStruct * factory();

// Body
struct MyStruct * factory() {
    struct MyStruct * s = (struct MyStruct *)malloc(sizeof(struct MyStruct));
    return s;
}

Ignore the hundreds of warnings that just appeared from bindgen including links to the C standard library.

So we have a C function handing us a pointer into the heap, the authors didn't provide us with a handy "free" function, and this isn't C++ - so no RAII.

We have a few options. My favourite is to take a moment:

Good Old Free

Just in case you were missing the "good old days" of C (!), let's look at some C-like options.

It's surprising how C-like low-level Rust can get!

Let's start by looking at the interface bindgen gives us:

extern "C" {
    pub fn factory() -> *mut MyStruct;
}

That looks a lot like the C interface. A naked, mutable pointer (unsafe if you use it!) to a MyStruct.

We're dealing with C, so null pointers exist!

Rust actually has null pointers, just not in normal/idiomatic Rust. Step 1 is to invoke the factory, and make sure that the pointer we received is actually valid:

#[test]
fn test_factory_free() {
    let object = unsafe { factory() };
    // No null pointers for us!
    assert_ne!(object, std::ptr::null_mut());
}

Otherwise, accessing the null pointer would do exactly what it does in C-land: crash your program with a segmentation fault. We're looking to interoperate with C, not emulate it!
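Outside of a test, you probably want an Option rather than an assertion. Here's a minimal sketch using two standard library tools, NonNull::new and the raw pointer as_ref method. The helper names non_null and read_integer are hypothetical.

```rust
use std::os::raw::{c_char, c_int};
use std::ptr::NonNull;

#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct MyStruct {
    pub integer: c_int,
    pub byte: c_char,
}

/// Wraps a raw pointer from C; `None` means the pointer was null.
pub fn non_null(ptr: *mut MyStruct) -> Option<NonNull<MyStruct>> {
    NonNull::new(ptr)
}

/// Reads `integer` through the pointer, or returns `None` for null.
///
/// # Safety
/// `ptr` must be null or point to a live, initialized `MyStruct`.
pub unsafe fn read_integer(ptr: *const MyStruct) -> Option<c_int> {
    // `as_ref` performs the null check and hands back Option<&MyStruct>.
    unsafe { ptr.as_ref() }.map(|s| s.integer)
}

fn main() {
    let mut s = MyStruct { integer: 5, byte: 1 };
    println!("{:?}", unsafe { read_integer(&s) });
    println!("{:?}", unsafe { read_integer(std::ptr::null()) });
    println!("{}", non_null(&mut s).is_some());
    println!("{}", non_null(std::ptr::null_mut()).is_none());
}
```

A null check only catches null. A dangling or uninitialized pointer still sails straight through, which is why read_integer stays unsafe.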

C, the Land of the Free (and the Malloc)

And let's call free, just as if we were using C.

#[test]
fn test_factory_free() {
    let object = unsafe { factory() };
    // No null pointers for us!
    assert_ne!(object, std::ptr::null_mut());

    // We can call libc directly to free the memory
    // In this case, we've linked libc via the c_lib crate, sometimes
    // you have to import libc directly
    unsafe { free(object as *mut std::os::raw::c_void) };
}

This comes with a lot of caveats:

If the C library is using a custom allocator, don't do this. The C library pretty much has to expose a de-allocator function for you to use, or the memory is going to be leaked.
If the object contains other pointers, you have to go through and free them all in order. That can be really error-prone.
As-is, you have a *mut pointer to the object. Unless you're only going to be interacting with the C side of the world, you want to create an idiomatic Rust solution!
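When the library does expose a de-allocator, a small owning wrapper gets you RAII without guessing which allocator is in play. This is a sketch under assumptions: my_struct_free is a hypothetical release function, and both it and factory are Rust stand-ins here (with a counter bolted on) so the example runs without any C.

```rust
use std::os::raw::{c_char, c_int};
use std::ptr::NonNull;
use std::sync::atomic::{AtomicUsize, Ordering};

#[repr(C)]
pub struct MyStruct {
    pub integer: c_int,
    pub byte: c_char,
}

// Counts live allocations so the demo can watch the release happen.
static LIVE: AtomicUsize = AtomicUsize::new(0);

pub fn live_count() -> usize {
    LIVE.load(Ordering::SeqCst)
}

// Stand-in for the C factory.
pub extern "C" fn factory() -> *mut MyStruct {
    LIVE.fetch_add(1, Ordering::SeqCst);
    Box::into_raw(Box::new(MyStruct { integer: 0, byte: 0 }))
}

// Stand-in for a hypothetical C `void my_struct_free(struct MyStruct *s)`.
pub unsafe extern "C" fn my_struct_free(s: *mut MyStruct) {
    if !s.is_null() {
        LIVE.fetch_sub(1, Ordering::SeqCst);
        drop(unsafe { Box::from_raw(s) });
    }
}

/// Owns a pointer from `factory` and releases it with the matching function.
pub struct Owned(NonNull<MyStruct>);

impl Owned {
    pub fn new() -> Option<Owned> {
        NonNull::new(factory()).map(Owned)
    }

    pub fn set_byte(&mut self, b: c_char) {
        // SAFETY: the pointer is non-null and we are its only owner.
        unsafe { self.0.as_mut().byte = b }
    }

    pub fn byte(&self) -> c_char {
        // SAFETY: as above.
        unsafe { self.0.as_ref().byte }
    }
}

impl Drop for Owned {
    fn drop(&mut self) {
        // Hand the pointer back to the side that allocated it.
        unsafe { my_struct_free(self.0.as_ptr()) }
    }
}

fn main() {
    {
        let mut object = Owned::new().expect("factory returned null");
        object.set_byte(12);
        println!("byte = {}, live = {}", object.byte(), live_count());
    } // `object` is dropped here, which calls my_struct_free
    println!("live after drop = {}", live_count());
}
```

The wrapper is the only place that touches the raw pointer, so nested pointers can be handled in the same Drop, in the right order, exactly once.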

Make a Box

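Rust gives you another, more convenient option (that again, only works with normal allocators). Box has a function to consume a "naked" pointer and wrap it in a Rust box - so you get automatic deallocation when you are done with it. One caution: the standard library only documents Box::from_raw as sound for memory that came from Rust's global allocator with the matching layout (typically via Box::into_raw). Wrapping a malloc'd pointer works when Rust's allocator happens to be the system malloc, so treat it as a platform-dependent shortcut rather than a guarantee.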

#[test]
fn test_factory_box() {
    let object = unsafe { factory() };
    // No null pointers for us!
    assert_ne!(object, std::ptr::null_mut());

    let mut object = unsafe { Box::from_raw(object) };
    object.byte = 12;
    assert_eq!(object.byte, 12);

    // Let's check that we can still work with it
    let mut result = false;
    let retval = unsafe { is_byte_twelve(object.as_mut(), &mut result) };
    assert_eq!(0, retval);
    assert_eq!(true, result);
}

This still gets messy if you have nested objects - you'll need to make your own constructor that wraps nested objects in Boxes - but you've come a long way! You can safely consume a pointer to an object, and dispose of it using normal RAII rules.

Bonus! Box::from_raw doesn't do any copy or move operations. It takes ownership of the pointer, from the Rust view of the world. Since you are literally just reinterpreting some memory and attaching a Box to it, this is very fast.

C Strings

We've deliberately saved this up - we needed to cover pointers (there will be more on that) first.

C strings are a pointer to a set of memory whose bytes are assumed to be ASCII characters (1 byte each). They might actually be utf8 encoded, but that's not standard. "Best" of all, C strings don't store a length. Instead, they are assumed to keep going until the next 0 in memory.

So:

const char * hello = "Hello"; // The compiler appends the terminating \0 for you

Is guaranteed to be:

Letter	Byte
H	72
e	101
l	108
l	108
o	111
\0	0

So Why Is This a Problem?

History has shown that this was a relatively terrible idea.

Functions like strcpy keep reading until they find a zero. If there isn't one, you can go straight past the end of the buffer. Isn't it nice when sensitive data a few variables down a struct appears in your string?
Since the exact size of the string isn't readily available, you have to be really careful when copying strings into buffers. There are thousands of CVE (Common Vulnerabilities and Exposures) reports resulting from this.

Pascal in 1970 (2 years before C!) figured this out, and strings consisted of a LENGTH and the data. Admittedly, length was a byte (no 256 character strings; Turbo Pascal fixed that).

Rust actually makes it a little more confusing!

Rust has several types of string. The big ones are:

&str - a borrowed view of UTF-8 string data stored elsewhere. It carries a pointer and a length. No null terminator!
String - an owned, growable buffer of UTF-8 bytes (a Vec<u8> underneath).
Cow<str> - a copy-on-write string.
There are a few more for things like Path buffers and dealing with C.

On top of that, a Rust char is not the same as a C char.

A C char is always one byte by definition (sizeof(char) is 1), although whether it's signed depends on the platform.
A Rust char is 4 bytes, unless it's inside a string. A char on its own holds one Unicode scalar value and always takes 4 bytes. Rust strings don't store chars, though: they store UTF-8 bytes, and each character takes anywhere from 1 to 4 of them.

Try this:

fn main() {
    let s = "I🩷🦀".to_string();
    println!("Length: {}", s.len());
    println!("Character length: {}", s.chars().count());

    // Print byte values
    println!("Bytes:");
    for (i, byte) in s.bytes().enumerate() {
        println!("Byte {}: {}", i, byte);
    }
}

That yields:

Length: 9
Character length: 3
Bytes:
Byte 0: 73
Byte 1: 240
Byte 2: 159
Byte 3: 169
Byte 4: 183
Byte 5: 240
Byte 6: 159
Byte 7: 166
Byte 8: 128

So...

The string is technically incompatible with the C standard, but most compilers will take UTF-8 (whether it will display is another question).
The string is definitely incompatible with the original ASCII (which stops at 127), but everyone really uses ANSI now.
There isn't a null terminator in sight.
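If you want the char-versus-byte sizes in one place, here's a tiny sketch. The function name char_facts is made up for this example; everything it calls is in the standard library.

```rust
/// Returns (size of a `char`, UTF-8 bytes for 'I', UTF-8 bytes for the crab,
/// byte length of the whole string used in the example above).
pub fn char_facts() -> (usize, usize, usize, usize) {
    (
        std::mem::size_of::<char>(), // a char on its own is always 4 bytes
        'I'.len_utf8(),              // ASCII characters encode to 1 byte
        '🦀'.len_utf8(),             // the crab needs all 4 bytes
        "I🩷🦀".len(),                // 1 + 4 + 4 bytes, and no terminator
    )
}

fn main() {
    println!("{:?}", char_facts());
}
```

The last number is the one C cares about: it's the byte count you'd need a buffer for, plus one more byte for the terminator that Rust doesn't store.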

Sending Strings to C Functions

Let's go back to our project. Let's add another C function.

// Header
int string_length(const char* s);

// Body
#include <string.h>

int string_length(const char* s) {
    return strlen(s);
}

This function will explode nicely if we don't null-terminate our string - and yield a length otherwise. Let's handle some ways to make a string to pass to it:

For constants, my favourite is the relatively new c literal syntax.

#[test]
fn test_string_length() {
    // Use a Rust "C string literal" to create a null-terminated string
    let s = c"Hello!";
    let len = unsafe { string_length(s.as_ptr()) };
    assert_eq!(6, len);
}

Before C string literals arrived (in Rust 1.77), you'd need:

let s = "Hello!";
let s_c = std::ffi::CString::new(s).unwrap();
let len = unsafe { string_length(s_c.as_ptr()) };
assert_eq!(6, len);

Go ahead and test that, too. You can use that with any Rust type that breaks down to &str, so now you can make C strings dynamically.

The unwrap is there because CString checks that you haven't included any null/0 bytes in the string. That would be bad on the C side. It also guarantees that it will add the zero to the end. So you're adding safety!
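Here's a small sketch of that check in action, handling the error instead of unwrapping. The helper name to_c_string is hypothetical; CString::new and NulError::nul_position are standard library calls.

```rust
use std::ffi::CString;

/// Converts Rust text to a C string. On failure, reports the position of
/// the interior zero byte that C would have mistaken for the end.
pub fn to_c_string(text: &str) -> Result<CString, usize> {
    CString::new(text).map_err(|e| e.nul_position())
}

fn main() {
    let ok = to_c_string("Hello!").expect("no interior zero bytes");
    // The terminating zero was added for us.
    println!("{:?}", ok.as_bytes_with_nul());

    // A zero in the middle is rejected rather than silently truncating.
    println!("{:?}", to_c_string("Hel\0lo"));
}
```

Without the check, C would see "Hel" and quietly ignore the rest, which is exactly the kind of bug that's miserable to track down later.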

Ownership 1

So what if the C function you are calling takes ownership of the string you pass - by deleting it?

C doesn't have a formal concept of ownership, and often the way to discover that you need to jump through this hoop is to read the C code. It's caused nightmares for large C projects!

C will happily let you allocate a buffer on your local stack, pass a pointer to it as a string and then die horribly when that function frees it.
C won't tell you that you allocated a string and never freed it.
And so on. Sit an old C programmer down with some free beer if you want to hear more. It'll be a long night.

Let's add another function to the C:

// Header
int string_length_and_delete(char* s);

// Body
int string_length_and_delete(char* s) {
    int len = strlen(s);
    free(s);
    return len;
}

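Fortunately, CString has an into_raw function that gives up ownership and hands you the raw pointer. One caution: the standard library documentation says a pointer from into_raw should come back through CString::from_raw rather than go to C's free, so letting C free it only works where Rust's allocator is the system malloc. Treat this test as a demonstration of the ownership transfer, not as a portable pattern: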

#[test]
fn test_string_length_and_delete() {
    // `into_raw` consumes the CString and returns a raw pointer
    let s = std::ffi::CString::new("Hello!").unwrap();
    let len = unsafe { string_length_and_delete(s.into_raw()) };
    assert_eq!(6, len);

    // Uncomment to show that this preserves Rust safety
    // println!("{:?}", s); // This won't compile: `s` was moved by into_raw
}

If you try and use s after into_raw - it won't compile. Safety is preserved.
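For comparison, here's a sketch of the round trip the standard library documents: lend the pointer out with into_raw, then reclaim it with from_raw so Rust frees it. The helper name lend_and_reclaim is made up, and string_length is a Rust stand-in for the C function (one that reads the string but doesn't free it).

```rust
use std::ffi::{CStr, CString};
use std::os::raw::{c_char, c_int};

// Stand-in for the C `string_length`: reads up to the terminator, frees nothing.
pub unsafe extern "C" fn string_length(s: *const c_char) -> c_int {
    unsafe { CStr::from_ptr(s) }.to_bytes().len() as c_int
}

/// Hypothetical helper: hands the string to "C", then takes it back.
/// Returns `None` if `text` contains an interior zero byte.
pub fn lend_and_reclaim(text: &str) -> Option<(c_int, String)> {
    let owned = CString::new(text).ok()?;

    // Rust gives up ownership; from here on we must not lose this pointer.
    let raw: *mut c_char = owned.into_raw();
    let len = unsafe { string_length(raw) };

    // SAFETY: `raw` came from `into_raw` above and the callee did not
    // free it or change its length.
    let reclaimed = unsafe { CString::from_raw(raw) };
    Some((len, reclaimed.into_string().ok()?))
}

fn main() {
    println!("{:?}", lend_and_reclaim("Hello!"));
}
```

If the C side genuinely needs to free the string itself, the portable route is to allocate the buffer with the C library's own allocator in the first place.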

Taking Ownership of a String

Let's go the other way. A C function allocates a string on the heap, and sends you a pointer to it. The C programmer helpfully noted that you need to delete it when you are done (we may be in a parallel universe). Let's add another C function:

// Header
char * return_hello();

// Body
char * return_hello() {
    char * s = (char *)malloc(6);
    strcpy(s, "Hello");
    return s;
}

And let's test it on the Rust side:

#[test]
fn test_return_hello() {
    let s = unsafe { return_hello() };

    // Convert the C string to a Rust string.
    // from_raw takes ownership of the pointer and rebuilds a CString around it
    let s = unsafe { std::ffi::CString::from_raw(s) };
    let s = s.to_str().unwrap();
    assert_eq!("Hello", s);
}

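So what's up with the to_str, and it being fallible? It fails if the bytes aren't valid UTF-8. ASCII is - but who knows what will appear in the buffer? A missing null terminator is worse: from_raw can't detect that, and will simply read past the end of the buffer looking for one. Also note that CString::from_raw is only documented for pointers that came from CString::into_raw, so adopting a malloc'd buffer this way relies on Rust using the system allocator.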
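A more conservative pattern is to borrow the C string, copy what you need into a Rust String, and then give the pointer back to whoever allocated it. The sketch below assumes the library offers a release function. release_hello is a hypothetical name, and both functions are Rust stand-ins so the example is self-contained.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// Stand-in for the C `return_hello`.
pub extern "C" fn return_hello() -> *mut c_char {
    CString::new("Hello").expect("no interior zero bytes").into_raw()
}

// Stand-in for a hypothetical C `void release_hello(char *s)`.
pub unsafe extern "C" fn release_hello(s: *mut c_char) {
    if !s.is_null() {
        drop(unsafe { CString::from_raw(s) });
    }
}

/// Copies the string out, then returns the buffer to its allocator.
/// `None` means a null pointer or bytes that were not valid UTF-8.
pub fn fetch_hello() -> Option<String> {
    let raw = return_hello();
    if raw.is_null() {
        return None;
    }
    // SAFETY: `raw` is non-null and the callee promises a terminator.
    // CStr only borrows the bytes; it does not take ownership.
    let copied = unsafe { CStr::from_ptr(raw) }
        .to_str()
        .ok()
        .map(str::to_owned);
    // The copy is ours now, so the original can go back where it came from.
    unsafe { release_hello(raw) };
    copied
}

fn main() {
    println!("{:?}", fetch_hello());
}
```

You pay for one copy, and in exchange it no longer matters which allocator the C side used.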

When I started helping with the LibreQoS project, a friend of mine - a Linux kernel contributor - expressed concern that my Rust code was passing strings around like candy. I showed him how hard Rust makes it to leak a string, or perform the other nightmare scenarios - and to his credit, he tried to learn Rust!

Callbacks and Functions

A lot of C code passes function pointers around. A lot of Rust code does, too - although it's often behind nice syntax sugar with closures. It's a really powerful pattern.

C Calling Back into Rust

Let's add a C function that calls another function.

// Header
int call_me(int (*maybe)(int));

// Body
int call_me(int (*maybe)(int)) {
    int sum = 0;
    for (int i = 0; i < 10; i++) {
        sum += maybe(i);
    }
    return sum;
}

bindgen generates the appropriate Rust bindings:

extern "C" {
    pub fn call_me(
        maybe: ::std::option::Option<
            unsafe extern "C" fn(arg1: ::std::os::raw::c_int) -> ::std::os::raw::c_int,
        >,
    ) -> ::std::os::raw::c_int;
}

Aren't you glad you didn't type that yourself?

You can set up a Rust test to run this:

#[test]
fn test_call_me_maybe() {
    #[no_mangle]
    unsafe extern "C" fn maybe(_n: i32) -> i32 {
        1
    }

    let n = unsafe { call_me(Some(maybe)) };
    assert_eq!(10, n); // The callback returns 1 on each of the 10 calls
}

Bindgen has helpfully replaced the possibility of null with an Option, and you're using the function export syntax from earlier.

This is remarkably useful when you are linking library services. One side can (possibly asynchronously) iterate through a series of results and pass them across the FFI boundary individually.
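Here's a sketch of why that Option costs nothing: an Option around a function pointer is the same size as the bare pointer, with None standing in for NULL. The call_me below is a Rust stand-in, not the C one, and unlike the C version it checks for a missing callback. double_it and option_is_pointer_sized are made-up names.

```rust
use std::mem::size_of;
use std::os::raw::c_int;

/// The callback shape bindgen generated for `int (*maybe)(int)`.
pub type Callback = unsafe extern "C" fn(c_int) -> c_int;

// Stand-in mirroring the C `call_me`, plus a null check the C version lacks.
pub unsafe extern "C" fn call_me(maybe: Option<Callback>) -> c_int {
    let callback = match maybe {
        Some(f) => f,
        None => return -1, // the C original would crash on a NULL pointer
    };
    let mut sum = 0;
    for i in 0..10 {
        sum += unsafe { callback(i) };
    }
    sum
}

// A callback that uses its argument: doubles whatever it is given.
pub unsafe extern "C" fn double_it(n: c_int) -> c_int {
    n * 2
}

/// `None` is represented as the null pointer, so the Option adds no space.
pub fn option_is_pointer_sized() -> bool {
    size_of::<Option<Callback>>() == size_of::<Callback>()
}

fn main() {
    println!("{}", unsafe { call_me(Some(double_it)) });
    println!("{}", unsafe { call_me(None) });
    println!("{}", option_is_pointer_sized());
}
```

That layout guarantee is what lets bindgen use Option for a nullable C function pointer without changing the ABI.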

I used this in a client's C# client for LanceDb. The Rust library lazily initializes a Tokio instance, and passes commands from C calls into the executor via a channel. Callbacks for results are passed along, allowing asynchronous functions to run - and stream data to the client. A "oneshot" calls back to the original function to indicate completion and provide a result code.

And Now... Some Pain

When you're dealing with C --- a language that's been around since 1972, and has been through multiple iterations and best practices --- it's really hard to avoid some pain. Sorry.

Make sure you have some of your preferred beverage, we're going to take a dive into some of the murkier corners.

Accidentally Including The World

We actually did this one earlier (somewhat on purpose, really), and we've seen what happens. Let's go ahead and clean it up real quick.

Our header file needs only to import:

#pragma once

#include <stdbool.h>

// Remainder unchanged

Our C body file gains the removed includes:

#include <stdlib.h>
#include <string.h>
#include "clib.h"

You won't always get the luxury of cleaning up other people's messes. But when you can, it helps a lot. The warnings that remain come from functions the library never actually uses, and a couple of places where mut wasn't needed.

Macros

C macros are nothing like Rust macros. You can #define true false if you want to ruin someone's day (and make it conditional!). Rust macros are nice and hygienic.

A remarkable number of production C++ projects include #define private public in their unit test code to make it easier to test the innards of classes!

The problem is - the C world runs on macros. Take a look at the Linux source code sometime, the macros there will make you weep.

Macros as Constants

C has perfectly good const support, and even static const if you want it. A lot of C code doesn't do that, preferring instead to #define constants. Even better, macros in C don't have types!

So let's add a couple of #define style constants to our header:

#define MY_CLIB_VERSION 1
#define MY_CLIB_VERSION_STRING "1.0"

We're on solid ground for the integer. Bindgen creates an i32:

#![allow(unused)]
fn main() {
#[test]
fn test_constants() {
    assert_eq!(MY_CLIB_VERSION, 1);
}
}

How about the string? Bindgen has, unfortunately, exported it as:

#![allow(unused)]
fn main() {
pub const MY_CLIB_VERSION_STRING: &[u8; 4] = b"1.0\0";
}

You can test it by explicitly turning it into a CStr (it's static):

#![allow(unused)]
fn main() {
#[test]
fn test_constants() {
    assert_eq!(MY_CLIB_VERSION, 1);
    // from_bytes_with_nul is a checked, safe constructor: no unsafe block needed
    let c_str = CStr::from_bytes_with_nul(MY_CLIB_VERSION_STRING).unwrap();
    let s = c_str.to_str().unwrap();
    assert_eq!(s, "1.0");
}
}

So it's there - it's just not really friendly. Fortunately, this is good enough to handle the most common usages such as this:

#define BUFFER_SIZE 256
void process_data(char buffer[BUFFER_SIZE], size_t data_length);

Macros as Types

Yes, people really do this. typedef is easier...

#define CALLBACK int (*callback)(int)
int call_me_with_callback(CALLBACK);

If you search bindgen's output, you won't find a type defined for CALLBACK. It has rolled it into the function signature for you:

#![allow(unused)]
fn main() {
extern "C" {
    pub fn call_me_with_callback(
        callback: ::std::option::Option<
            unsafe extern "C" fn(arg1: ::std::os::raw::c_int) -> ::std::os::raw::c_int,
        >,
    ) -> ::std::os::raw::c_int;
}
}

So even though it's a potentially evil footgun, it works just fine.

Macros as Code

You will sometimes run into code where programmers leaned on the preprocessor a little too much.

#define SQUARE(x) ((x) * (x))

And now the bad news. Bindgen won't translate function-like macros for you. SQUARE is nowhere in the generated output. You have to port these on your own.

One of the hurdles the Rust for Linux people have faced is that Linux does this more than you might expect. Bindgen couldn't handle the macro magic, and chunks have to be ported by hand.
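Porting a macro like SQUARE by hand is usually painless. Here's one possible sketch, offered as two options: a generic function, and a macro_rules! version for when you really want a macro. Neither comes from bindgen, and the names are my own choice.

```rust
/// Hand-written port of `#define SQUARE(x) ((x) * (x))` as a generic function.
fn square<T: std::ops::Mul<Output = T> + Copy>(x: T) -> T {
    x * x
}

/// The same thing as a Rust macro. It binds the argument to a local first,
/// so the argument expression is evaluated only once.
macro_rules! square {
    ($x:expr) => {{
        let v = $x;
        v * v
    }};
}

#[test]
fn ported_square() {
    assert_eq!(square(3), 9);
    assert_eq!(square(1.5_f64), 2.25);
    assert_eq!(square!(2 + 1), 9);

    // The C macro pastes its argument in twice, so SQUARE(next()) would call
    // next() twice. The Rust macro calls it once.
    let mut calls = 0;
    let mut next = || {
        calls += 1;
        4
    };
    assert_eq!(square!(next()), 16);
    assert_eq!(calls, 1);
}
```

The function is normally the better choice: it's typed, and it shows up in documentation. Keep the macro for cases where you genuinely need to accept arbitrary expressions.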

Into the Void

If you're familiar with some older-style C (or just "clever" C), you may have run into:

void into_the_void(void * black_hole, int len);

You may have even run into void ** or void ***!

This is a very un-Rustacean way to write things, but the C-world generally doesn't care. So we have to make it work!

What does this mean?

A void pointer is literally a memory address with no assertion as to what it points at. It's really common in older C, and you still find it. I ran into it a bunch when working with libxdp --- which is actively maintained!

There's a few reasons it might exist:

You genuinely want to pass a number of types, often with a flag indicating what you're passing. This is basically poor-man's C++ inheritance.
Your code-base hasn't adopted the C type system in any real detail.
Your code-base predates the C type system.
You have to work with a really broken C compiler for some obscure platform. It's terrifying how often that happens.
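The first case (a void pointer plus a flag saying what it points at) might look like this from the Rust side. This is only a sketch: the TAG_ constants and read_tagged are hypothetical names, and a real library will have its own tag values.

```rust
use std::os::raw::{c_int, c_void};

// Hypothetical tags that both sides of the interface have agreed on.
const TAG_INT: c_int = 0;
const TAG_DOUBLE: c_int = 1;

/// Interpret `ptr` according to `tag`, widening the result to f64.
/// Returns None for a null pointer or an unknown tag.
///
/// # Safety
/// `ptr` must point to a valid, aligned value of the type named by `tag`.
unsafe fn read_tagged(tag: c_int, ptr: *const c_void) -> Option<f64> {
    if ptr.is_null() {
        return None;
    }
    match tag {
        // The tag is the ONLY thing telling us what the cast should be
        TAG_INT => Some(unsafe { *(ptr as *const c_int) } as f64),
        TAG_DOUBLE => Some(unsafe { *(ptr as *const f64) }),
        _ => None,
    }
}

#[test]
fn tagged_void_pointer() {
    let i: c_int = 42;
    let d: f64 = 1.5;
    let i_ptr = &i as *const c_int as *const c_void;
    let d_ptr = &d as *const f64 as *const c_void;
    assert_eq!(unsafe { read_tagged(TAG_INT, i_ptr) }, Some(42.0));
    assert_eq!(unsafe { read_tagged(TAG_DOUBLE, d_ptr) }, Some(1.5));
    assert_eq!(unsafe { read_tagged(TAG_INT, std::ptr::null()) }, None);
    assert_eq!(unsafe { read_tagged(99, i_ptr) }, None);
}
```

Nothing stops the caller from passing TAG_DOUBLE with a pointer to an int. That's why the function has to stay unsafe: the compiler can't check the contract for you.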

Let's try it!

Let's add another C function:

// header
int extract_from_the_void(void *s);

// body
int extract_from_the_void(void *s) {
    return ((struct MyStruct *)s)->integer;
}

Let's see what bindgen makes:

#![allow(unused)]
fn main() {
extern "C" {
    pub fn extract_from_the_void(s: *mut ::std::os::raw::c_void) -> ::std::os::raw::c_int;
}
}

Oh wow... Rust actually has a void type, too (c_void). And JUST like C, you can cast a pointer to it into other things:

#![allow(unused)]
fn main() {
#[test]
fn test_extract_from_the_void() {
    let my_struct = MyStruct { integer: 12, byte: 3 };
    let n = unsafe { extract_from_the_void(&my_struct as *const MyStruct as *mut c_void) };
    assert_eq!(n, 12);
}
}

Unions

Unions are pretty much the definition of what Rust doesn't allow in safe code: an area of memory that can be interpreted to mean multiple things. A union is as big as its largest field (rounded up for alignment). They are also really useful!

So let's add a C union to our C header file:

union MyUnion {
    int integer;
    char byte;
};

What does bindgen come up with?

#![allow(unused)]
fn main() {
#[repr(C)]
#[derive(Copy, Clone)]
pub union MyUnion {
    pub integer: ::std::os::raw::c_int,
    pub byte: ::std::os::raw::c_char,
}
}

Rust actually supports unions! It's just unsafe to do very much with them.

You can create and write to a Rust union safely. As long as you ONLY set the one field, your code is both safe and sound:

#![allow(unused)]
fn main() {
#[test]
fn testing_unions() {
    let u = MyUnion { integer: 12 };
    let u = MyUnion { byte: 1 };
}
}

Reading a union field is always unsafe, because the compiler has no idea which field was written last. It works, though:

#![allow(unused)]
fn main() {
#[test]
fn testing_unions() {
    // This kinda makes sense
    let u = MyUnion { integer: 12 };
    assert_eq!(unsafe { u.integer }, 12);
    assert_eq!(unsafe { u.byte }, 12); // Not UB in Rust (every byte is a valid c_char), but you only get 12 on little-endian
}
}

And this kinda works:

#![allow(unused)]
fn main() {
// This is getting weird
let u = MyUnion { integer: 512 };
assert_eq!(unsafe { u.integer }, 512);
assert_eq!(unsafe { u.byte }, 0); // You're just reading the first byte (on little-endian)!
}

And please don't do this:

#![allow(unused)]
fn main() {
// This is just wrong
let u = MyUnion { byte: 1 };
assert_eq!(unsafe { u.integer }, 1); // Only 1 of the 4 bytes was ever written: reading the rest is undefined behavior
}
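If the 12-versus-512 results look like magic, the byte layout explains them. Here's a small sketch with no unions at all, using the standard to_le_bytes, to_be_bytes and to_ne_bytes methods. The helper first_byte_in_memory is a made-up name for illustration.

```rust
/// Return the byte that sits first in memory for an i32 on this machine.
/// That is the byte the union's `byte` field overlaps.
fn first_byte_in_memory(n: i32) -> u8 {
    n.to_ne_bytes()[0]
}

#[test]
fn why_512_reads_as_zero() {
    // 12 is 0x0000_000C: the interesting byte is the LOWEST one
    assert_eq!(12i32.to_le_bytes(), [0x0C, 0x00, 0x00, 0x00]);
    // 512 is 0x0000_0200: the lowest byte is zero
    assert_eq!(512i32.to_le_bytes(), [0x00, 0x02, 0x00, 0x00]);
    // On a big-endian machine the bytes are stored the other way around
    assert_eq!(512i32.to_be_bytes(), [0x00, 0x00, 0x02, 0x00]);

    if cfg!(target_endian = "little") {
        assert_eq!(first_byte_in_memory(12), 12);
        assert_eq!(first_byte_in_memory(512), 0);
    }
}
```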

Unions are REALLY useful

Even though they are unsafe (even C++ tried to restrict them, but the userbase said NO), unions can be super useful. Here's the Linux definition of an IPv6 address:

struct in6_addr
{
        union 
        {
                __u8    u6_addr8[16];
                __be16  u6_addr16[8];
                __be32  u6_addr32[4];
        } in6_u;
};

It's 128 bits of data, but you can access it as bytes, 16-bit or 32-bit numbers. Likewise, it's common to use a union of 4 bytes and a 32-bit integer (big endian) for an IPv4 address: access all the octets individually, or as a single number.
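Here's a rough Rust equivalent of that union, as a sketch. In6Addr and its field names are my own simplification rather than bindgen output. Every field covers all 16 bytes, so reading through any view is sound no matter which field was written.

```rust
use std::net::Ipv6Addr;

#[repr(C)]
#[derive(Clone, Copy)]
union In6Addr {
    bytes: [u8; 16],
    words: [u16; 8],  // stored big-endian (network order)
    dwords: [u32; 4], // stored big-endian (network order)
}

/// Build the IPv6 loopback address, ::1
fn loopback() -> In6Addr {
    let mut bytes = [0u8; 16];
    bytes[15] = 1;
    In6Addr { bytes }
}

/// Read the eight 16-bit groups, converted to host byte order.
fn groups(addr: In6Addr) -> [u16; 8] {
    // Sound: all 16 bytes are initialized and any bit pattern is a valid u16
    let raw = unsafe { addr.words };
    raw.map(u16::from_be)
}

#[test]
fn ipv6_views() {
    assert_eq!(std::mem::size_of::<In6Addr>(), 16);
    assert_eq!(groups(loopback()), [0, 0, 0, 0, 0, 0, 0, 1]);
    // The standard library agrees
    assert_eq!(groups(loopback()), Ipv6Addr::LOCALHOST.segments());
    // The same memory viewed as four 32-bit numbers
    let dwords = unsafe { loopback().dwords };
    assert_eq!(u32::from_be(dwords[3]), 1);
}
```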

Working with Rust in C

We're not going to heavily belabor this, but I'd like to make sure you're equipped to offer your awesome Rust to other languages. Except for languages with nifty binding systems (Python in particular, NodeJS too)---you're going to need to use the C ABI.

Rust doesn't have a stable ABI. At all. It's not even guaranteed that a binary will be exactly the same if you compile it twice on the same system. So you may even find yourself using a C ABI in Rust if you want to support a plugin system.

You can also pay for a commercially supported toolchain such as Ferrocene and have a stable ABI, eventually! You won't be updating your Rust version very often, but it's possible.

Slices

Rust has really nice handling of slices of memory.

Let's create a new project:

cargo new ex18 --lib

Now setup a tree like this:

.
├── c
│   ├── build_linux.sh
│   ├── ex18.c
├── Cargo.toml
└── src
    └── lib.rs

Don't forget that Cargo.toml needs to specify your crate type:

[lib]
crate-type = ["staticlib"] # Will create a .a on Linux and Mac, .lib on Windows (MSVC)

Our Rust file (lib.rs) is designed to illustrate working with slices:

#![allow(unused)]
fn main() {
// We're using "no_mangle" to ensure function names are preserved
#[no_mangle]
// Don't forget extern "C"!
pub extern "C" fn sum_byte_slice(
    ptr: *const u8, // A raw pointer to bytes.
    length: usize, // Length, in bytes.
) -> i32 {
    // Always a good idea to do a null pointer check
    if ptr.is_null() || length == 0 {
        return 0;
    }
    // Slices ARE a pointer and a length!
    let slice = unsafe { std::slice::from_raw_parts(ptr, length) };

    // Now we can work as normal safe code
    let mut sum: i32 = 0;
    for &num in slice {
        sum += num as i32;
    }
    sum
}
}

Notice that the ONLY unsafe part is the from_raw_parts. There are a few caveats with from_raw_parts: the pointer has to be correctly aligned for the element type, the memory has to stay valid (and unmodified) for as long as the slice is alive, and the slice can't span multiple allocations. And of course, if the C program sends you an invalid length then you can expect all manner of trouble.
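You don't need a C compiler to try the function out. A Rust test can play the part of the caller, since a Vec hands out exactly the pointer-and-length pair that from_raw_parts wants. This sketch repeats the function (minus #[no_mangle], and with an iterator instead of the loop) so that it stands alone.

```rust
// Same logic as sum_byte_slice above; #[no_mangle] is omitted because
// nothing outside this file needs to find the symbol.
pub extern "C" fn sum_byte_slice(ptr: *const u8, length: usize) -> i32 {
    if ptr.is_null() || length == 0 {
        return 0;
    }
    // Sound here because the caller passes a pointer and length taken
    // from a live Vec: aligned, initialized, and one single allocation.
    let slice = unsafe { std::slice::from_raw_parts(ptr, length) };
    slice.iter().map(|&byte| i32::from(byte)).sum()
}

#[test]
fn sum_byte_slice_from_rust() {
    let data = vec![1u8, 2, 3, 250];
    // 250 would overflow a u8 sum, which is why the function widens to i32
    assert_eq!(sum_byte_slice(data.as_ptr(), data.len()), 256);
    // The null check means a bad pointer is survivable
    assert_eq!(sum_byte_slice(std::ptr::null(), 4), 0);
    // A zero length is fine too
    assert_eq!(sum_byte_slice(data.as_ptr(), 0), 0);
}
```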

Slices with Structs

You often want to pass a blob of structs into Rust.

Let's add a type to our lib.rs file:

#![allow(unused)]
fn main() {
#[repr(C)]
#[derive(Debug)]
pub struct MyData {
    pub a: i32,
    pub b: i16,
    pub c: i8,
}
}

If you forget repr(C), you can expect bizarre things to happen.
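To see what repr(C) is buying you, you can ask Rust where it put each field. The sketch below uses a small hypothetical helper, offset_of_field, built on plain pointer arithmetic. With repr(C) the answers match what a C compiler produces for the same struct; without it, Rust is free to reorder the fields.

```rust
use std::mem::{align_of, size_of};

#[repr(C)]
#[derive(Debug, Default)]
pub struct MyData {
    pub a: i32,
    pub b: i16,
    pub c: i8,
}

/// Byte offset of a field from the start of the struct that contains it.
fn offset_of_field<T, F>(base: &T, field: &F) -> usize {
    field as *const F as usize - base as *const T as usize
}

#[test]
fn mydata_layout_matches_c() {
    let d = MyData::default();
    // Fields appear in declaration order, just like C
    assert_eq!(offset_of_field(&d, &d.a), 0);
    assert_eq!(offset_of_field(&d, &d.b), 4);
    assert_eq!(offset_of_field(&d, &d.c), 6);
    // 4 + 2 + 1 = 7 bytes of data, padded up to the 4-byte alignment
    assert_eq!(align_of::<MyData>(), 4);
    assert_eq!(size_of::<MyData>(), 8);
}
```

That trailing padding byte matters when you pass arrays around: each element is 8 bytes apart, not 7.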

Slicing Raw Bytes

There's a lot of ways to do this. The bytemuck and zerocopy crates are popular if you are diving deeply into this.

Let's make a Rust function that takes whatever C gives it as a byte array, performs some safety checks, and treats the byte blob as a slice of MyData:

#![allow(unused)]
fn main() {
#[no_mangle]
pub extern "C" fn print_slice_of_mydata(ptr: *const u8, length: usize) {
    // Null checks are needed
    if ptr.is_null() || length == 0 {
        return;
    }
    // Check that the alignment means this is possible
    assert_eq!(ptr.align_offset(std::mem::align_of::<MyData>()), 0, "Pointer is not aligned");    
    // If the number of bytes isn't a multiple of the struct size,
    // it's probably not valid
    assert_eq!(length % std::mem::size_of::<MyData>(), 0, "Length is not a multiple of MyData size");

    // Make the slice. Note that we're CASTING the pointer, just like C. `from_raw_parts` likes
    // the number of ELEMENTS, just like C pointer math.
    let slice = unsafe { std::slice::from_raw_parts(ptr as *const MyData, length / std::mem::size_of::<MyData>()) };

    // Work with it normally
    for data in slice {
        println!("{data:?}");
    }
}
}

Now let's write some C to use it:

#include <stdio.h>
#include <stdint.h> // For easy types

// The struct definition is a 1:1 match
struct MyData {
    int32_t a;
    int16_t b;
    int8_t c;
};

// You can use cbindgen, but this is an "easy" one!
void print_slice_of_mydata(struct MyData* data, size_t len);

int main() {
    // Declare some data (on the stack)
    struct MyData data[] = {
        {1, 2, 3},
        {4, 5, 6},
        {7, 8, 9}
    };
    printf("Raw byte slice:\n");
    // Call our function
    print_slice_of_mydata(data, sizeof(data));
    printf("\n");

    return 0;
}

Typed Slices

Assuming everything is a byte array is very early-C like, but more modern C likes types. So let's make a more idiomatic version. In lib.rs:

#![allow(unused)]
fn main() {
#[no_mangle]
pub extern "C" fn print_slice_nicely(ptr: *const MyData, num_elements: usize) {
    // Still null checking
    if ptr.is_null() || num_elements == 0 {
        return;
    }
    // Still alignment checking
    assert_eq!(ptr.align_offset(std::mem::align_of::<MyData>()), 0, "Pointer is not aligned");
    // No need for sizeof anymore
    let slice = unsafe { std::slice::from_raw_parts(ptr, num_elements) };
    // And now it's just a slice
    for data in slice {
        println!("{data:?}");
    }
}
}

There's a bunch of wins here:

It's now really obvious what our function expects.
There's a little less error checking needed on the Rust side.

The C program changes a little:

printf("Nicely formatted slice:\n");
print_slice_nicely(data, sizeof(data) / sizeof(data[0]));

That's right - the C grows! sizeof(data) is in bytes, so you have to divide it by the element size. It's six of one, half a dozen of the other: one side is going to be doing that!

Important: make a convention. Are you passing lengths as number of elements or bytes? Number of elements is generally more intuitive, but the C creatures may disagree. Ask them. Nicely.
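Whichever convention you pick, it can help to do the conversion in exactly one place. Here's a minimal sketch of such a helper; element_count is a hypothetical name, not something from the class code.

```rust
/// Convert a length in BYTES into a number of `T` elements.
/// Returns None if the byte count isn't an exact multiple of the
/// element size (or if T is zero-sized, where the question is meaningless).
fn element_count<T>(byte_len: usize) -> Option<usize> {
    let size = std::mem::size_of::<T>();
    if size == 0 || byte_len % size != 0 {
        return None;
    }
    Some(byte_len / size)
}

#[test]
fn bytes_to_elements() {
    // 12 bytes is three i32s
    assert_eq!(element_count::<i32>(12), Some(3));
    // 10 bytes is two and a half i32s: somebody passed the wrong number
    assert_eq!(element_count::<i32>(10), None);
    // For bytes, both conventions agree
    assert_eq!(element_count::<u8>(7), Some(7));
}
```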

Concurrency!

We haven't really done much that shows Rust shining, yet. So let's fix that.

In lib.rs, let's build:

#![allow(unused)]
fn main() {
use rayon::prelude::*;

#[no_mangle]
pub extern "C" fn is_prime_slow(n: i32) -> bool {
    if n < 2 {
        return false;
    }
    for i in 2..=n / 2 {
        if n % i == 0 {
            return false;
        }
    }
    true
}

#[no_mangle]
pub extern "C" fn count_primes(slice: *const i32, len: usize) -> usize {
    // Safety, as much as we can
    assert!(!slice.is_null());
    assert!(len > 0);

    // Get the slice
    let slice = unsafe { std::slice::from_raw_parts(slice, len) };
    slice.par_iter().filter(|n| is_prime_slow(**n)).count()
}
}

And add Rayon to your dependencies with cargo add rayon. We've made a deliberately SLOW prime number detector, and then a function that uses Rayon to auto-parallelize it across all your CPUs and count the result.
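If you can't take on a dependency, the standard library can get you most of the way there. The sketch below swaps Rayon for std::thread::scope and splits the slice into one chunk per core. It's a different technique from Rayon's work-stealing (an unlucky chunk full of big primes will finish last), and count_primes_scoped is my own name for it.

```rust
use std::thread;

fn is_prime_slow(n: i32) -> bool {
    if n < 2 {
        return false;
    }
    for i in 2..=n / 2 {
        if n % i == 0 {
            return false;
        }
    }
    true
}

/// Count primes by giving each available core one chunk of the slice.
fn count_primes_scoped(numbers: &[i32]) -> usize {
    if numbers.is_empty() {
        return 0;
    }
    let threads = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    // Round up, so that every element lands in some chunk
    let chunk_size = (numbers.len() + threads - 1) / threads;

    // Scoped threads may borrow `numbers`: the scope joins them all before returning
    thread::scope(|scope| {
        let handles: Vec<_> = numbers
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || chunk.iter().filter(|&&n| is_prime_slow(n)).count()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<usize>()
    })
}

#[test]
fn counts_primes_in_parallel() {
    let numbers: Vec<i32> = (0..20).collect();
    // 2, 3, 5, 7, 11, 13, 17, 19
    assert_eq!(count_primes_scoped(&numbers), 8);
    assert_eq!(count_primes_scoped(&[]), 0);
}
```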

Now for the C. Let's start by making sure it works:

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdbool.h>
#include <time.h>

#define ARR_SIZE 1000

// Link to the Rust versions
bool is_prime_slow(int32_t n);
size_t count_primes(const int32_t *arr, size_t len); // Rust returns usize, which is size_t

int main() {
    time_t start, end;

    // Allocate memory for the array
    printf("Allocating memory for the array...\n");
    int32_t *arr = (int32_t *)malloc(ARR_SIZE * sizeof(int32_t));

    // Populate the array with random numbers
    printf("Populating the array with random numbers...\n");
    for (int32_t i = 0; i < ARR_SIZE; i++) {
        arr[i] = rand();
    }

    // Count the primes in the array with Rust (cheating: we're using Rayon)
    start = time(NULL);
    size_t sum_rust = count_primes(arr, ARR_SIZE);
    end = time(NULL);
    printf("Count (from Rust, Parallel): %zu. Seconds: %ld\n", sum_rust, (long)(end - start));

    // Free the allocated memory
    free(arr);
    return 0;
}

And here's a long version that tests all of it:

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdbool.h>
#include <time.h>

#define ARR_SIZE 1000

// Link to the Rust versions
bool is_prime_slow(int32_t n);
size_t count_primes(const int32_t *arr, size_t len); // Rust returns usize, which is size_t

// A native C version of the same thing
bool is_prime_slow_c(int32_t n) {
    if (n < 2) return false;
    // <= so that n/2 itself is tested, matching the Rust loop (otherwise 4 counts as prime)
    for (int32_t i = 2; i <= n/2; i++) {
        if (n % i == 0) return false;
    }
    return true;
}

int main() {
    time_t start, end;

    // Allocate memory for the array
    printf("Allocating memory for the array...\n");
    int32_t *arr = (int32_t *)malloc(ARR_SIZE * sizeof(int32_t));

    // Populate the array with random numbers
    printf("Populating the array with random numbers...\n");
    for (int32_t i = 0; i < ARR_SIZE; i++) {
        arr[i] = rand();
    }

    // Count the primes in the array with Rust (cheating: we're using Rayon)
    start = time(NULL);
    size_t sum_rust = count_primes(arr, ARR_SIZE);
    end = time(NULL);
    printf("Count (from Rust, Parallel): %zu. Seconds: %ld\n", sum_rust, (long)(end - start));

    // Count the primes in the array with a C loop
    printf("Calculating the sum of the array in C...\n");
    start = time(NULL);
    int64_t sum = 0;
    for (int32_t i = 0; i < ARR_SIZE; i++) {
        if (is_prime_slow(arr[i])) sum++;
    }
    end = time(NULL);
    printf("Count (from C): %ld. Seconds: %ld\n", sum, end - start);

    // Count the primes in the array with a C loop using native code
    printf("Calculating the sum of the array in C (native)...\n");
    start = time(NULL);
    sum = 0;
    for (int32_t i = 0; i < ARR_SIZE; i++) {
        if (is_prime_slow_c(arr[i])) sum++;
    }
    end = time(NULL);
    printf("Count (from native C): %ld. Seconds: %ld\n", sum, end - start);

    // Free the allocated memory
    free(arr);
    return 0;
}

On my system at the office:

Rust completes in 4 seconds.
A C loop to the Rust function completes in 33 seconds.
A C loop to the native C function completes in 33 seconds.

So we've learned:

Rayon makes it easy to get a big win if you have multiple CPUs - and Rust makes concurrency much less scary.
There really isn't a performance penalty for calling into Rust.

Yes, You Can Do That in C!

But you might not like it.

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdbool.h>
#include <time.h>
#include <pthread.h>
#include <unistd.h>     // For sysconf (on POSIX systems)

#define ARR_SIZE 1000

bool is_prime_slow_c(int32_t n) {
    if (n < 2) return false;
    // <= so that n/2 itself is tested (otherwise 4 counts as prime)
    for (int32_t i = 2; i <= n / 2; i++) {
        if (n % i == 0) return false;
    }
    return true;
}

// Structure to hold thread-specific data
typedef struct {
    int32_t *arr;       // Pointer to the array
    int start;          // Start index (inclusive)
    int end;            // End index (exclusive)
    long partial_sum;   // This thread's count of primes
} ThreadData;

// Thread function: count how many numbers in [start, end) are prime
void *thread_func(void *arg) {
    ThreadData *data = (ThreadData *)arg;
    long local_count = 0;

    for (int i = data->start; i < data->end; i++) {
        if (is_prime_slow_c(data->arr[i])) {
            local_count++;
        }
    }

    // Store result in the struct
    data->partial_sum = local_count;
    return NULL;
}

int main() {
    // Start timing
    time_t start_time = time(NULL);

    // Allocate memory for the array
    printf("Allocating memory for the array...\n");
    int32_t *arr = (int32_t *)malloc(ARR_SIZE * sizeof(int32_t));

    // Populate the array with random numbers
    printf("Populating the array with random numbers...\n");
    for (int32_t i = 0; i < ARR_SIZE; i++) {
        arr[i] = rand();
    }

    // -----------------------------------------------------------
    // Pthread-based parallel prime counting
    // -----------------------------------------------------------

    // 1. Determine the number of CPUs
    int num_cpus = (int)sysconf(_SC_NPROCESSORS_ONLN);
    if (num_cpus < 1) {
        fprintf(stderr, "Could not determine number of CPUs; defaulting to 1.\n");
        num_cpus = 1;
    }

    // 2. Create arrays to hold thread data and pthread handles
    ThreadData *thread_data = (ThreadData *)malloc(num_cpus * sizeof(ThreadData));
    pthread_t *threads = (pthread_t *)malloc(num_cpus * sizeof(pthread_t));

    // Calculate how many elements per thread
    int chunk_size = ARR_SIZE / num_cpus;

    // 3. Initialize per-thread data and create threads
    for (int i = 0; i < num_cpus; i++) {
        thread_data[i].arr = arr;
        thread_data[i].start = i * chunk_size;
        // Last chunk might take the "remainder" if ARR_SIZE not perfectly divisible
        thread_data[i].end = (i == num_cpus - 1) ? ARR_SIZE : (i + 1) * chunk_size;
        thread_data[i].partial_sum = 0;

        pthread_create(&threads[i], NULL, thread_func, &thread_data[i]);
    }

    // 4. Join threads and accumulate partial sums
    long sum = 0;
    for (int i = 0; i < num_cpus; i++) {
        pthread_join(threads[i], NULL);
        sum += thread_data[i].partial_sum;
    }

    // 5. Clean up
    free(thread_data);
    free(threads);

    // End timing
    time_t end_time = time(NULL);
    printf("Count (from C, Parallel): %ld. Seconds: %ld\n", sum, (long)(end_time - start_time));

    // Free the allocated memory
    free(arr);

    return 0;
}

On my office system, this completes in 6-7 seconds - slightly slower than Rayon. And what a lot of work that was!

Async!

A surprisingly large amount of cool stuff in Rust uses async. That's an argument we should probably avoid unless everyone's feeling fighty!

I recently worked on a project that wanted to use LanceDb (a vector embedding database) in C#. C# doesn't have the greatest FFI story, but it does have great async. Unfortunately, it's not the same async Rust uses - so you can't just magically call async functions. There really isn't a great story for creating bindings other than the C ABI, either. We're not going to look directly at that project (it's huge, but it's at https://github.com/thebracket/LanceDbCSharp/ if you're bored). But we can look at how it works - and how you can use it to bridge the FFI gap and still use async.

So there's an "external" set of functions:

When you make a Connection, if the runtime isn't initialized it lazily spawns a thread and launches Tokio on it.
- It's fun to make sure that Tokio is ready, so a "oneshot" channel calls back to say "I'm here!".
- We just hand out connection handles (id numbers) and keep the connections in a state table.
When you try to perform a database operation with the connection:
- You call a function in the external (C ABI).
- The function puts everything you want to do into an enum.
  - This includes callbacks:
    - For function completion.
    - For passing results back to the client when they are available.
- That enum is passed into Tokio as a channel call.
- A big match statement spawns tokio tasks to handle the actual call.
- Each task returns results by calling the provided callback. Rust owns the data - C# has to copy it (in this case that's unavoidable anyway, the format has to be marshalled).
- When the function finishes, it replies on a oneshot to the calling function in external - which can then return.

It's not perfect, but it works---and you can use this pattern. Meta even use it with their async C++ and async Rust to bridge the divide between the two. There was a talk on that here last year.
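Before bringing Tokio into it, the shape of the pattern can be sketched with nothing but the standard library. Here std::sync::mpsc stands in for the Tokio channels and a plain worker thread stands in for the runtime, so commands are handled one at a time instead of being spawned as tasks. All of the names (Command::Generate, start_worker, generate_blocking) are hypothetical.

```rust
use std::sync::atomic::{AtomicI32, Ordering};
use std::sync::mpsc;
use std::thread;

// Everything the caller wants done goes into an enum, callbacks included.
enum Command {
    Generate {
        callback: extern "C" fn(i32, i32), // streams results to the caller
        complete: mpsc::Sender<()>,        // plays the role of the "oneshot"
        n: i32,
    },
}

/// Start the worker thread and hand back the sender used to submit commands.
fn start_worker() -> mpsc::Sender<Command> {
    let (tx, rx) = mpsc::channel::<Command>();
    thread::spawn(move || {
        // Runs until every Sender has been dropped
        for cmd in rx {
            match cmd {
                Command::Generate { callback, complete, n } => {
                    for i in 0..3 {
                        callback(i, n);
                    }
                    let _ = complete.send(());
                }
            }
        }
    });
    tx
}

/// The "external" function: submit the work, then block until it is done.
fn generate_blocking(worker: &mpsc::Sender<Command>, callback: extern "C" fn(i32, i32), n: i32) {
    let (done_tx, done_rx) = mpsc::channel();
    worker.send(Command::Generate { callback, complete: done_tx, n }).unwrap();
    done_rx.recv().unwrap();
}

static TOTAL: AtomicI32 = AtomicI32::new(0);

// A callback with the C ABI, standing in for the C or C# side
extern "C" fn record(value: i32, id: i32) {
    TOTAL.fetch_add(value + id, Ordering::SeqCst);
}

#[test]
fn callbacks_finish_before_return() {
    let worker = start_worker();
    generate_blocking(&worker, record, 10);
    // (0 + 10) + (1 + 10) + (2 + 10) = 33
    assert_eq!(TOTAL.load(Ordering::SeqCst), 33);
}
```

The important property is the last one: by the time generate_blocking returns, every callback has already run, so the caller never sees a result arrive after it has moved on.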

Let's Try It!

Create a new project (cargo new ex21 --lib). The Cargo.toml looks like this:

[package]
name = "ex21_async"
version = "0.1.0"
edition = "2021"

[lib]
crate-type = ["staticlib"] # Will create a .a on Linux and Mac, .lib on Windows (MSVC)

[dependencies]
tokio = { version = "1.43.0", features = ["full"] }

Let's get started on lib.rs. Some superstructure:

#![allow(unused)]
fn main() {
use std::{sync::OnceLock, thread};
use tokio::sync::oneshot;

enum Command {
    AsyncGenerator { callback: extern "C" fn(i32, i32), complete: oneshot::Sender<()>, n: i32 },
}

static COMMAND_TX: OnceLock<tokio::sync::mpsc::Sender<Command>> = OnceLock::new();
}

We're making an enum containing the commands we want to be able to pass into Tokio-land. We've also created a OnceLock that will contain a channel sender.

Now we can make a function to start Tokio:

#![allow(unused)]
fn main() {
#[no_mangle]
pub extern "C" fn start_generator() {
    // Oneshot: so we know when Tokio is alive
    let (tx, rx) = oneshot::channel();

    // Command channel
    let (cmd_tx, mut cmd_rx) = tokio::sync::mpsc::channel(100);
    COMMAND_TX.set(cmd_tx).unwrap();

    // In a thread, so that thread "blocks on" forever...
    thread::spawn(move || {
        // Start Tokio runtime
        let rt = tokio::runtime::Runtime::new().unwrap();
        rt.block_on(async {
            // Tokio is now alive!
            tx.send(()).unwrap();

            // Process commands
            while let Some(cmd) = cmd_rx.recv().await {
                match cmd {
                    Command::AsyncGenerator { callback, complete, n } => {
                        tokio::spawn(generator(callback, complete, n));
                    }
                }
            }
        });
    });

    // Back on the calling thread: wait until Tokio reports that it's alive
    rx.blocking_recv().unwrap();
}
}

Let's build the generator function:

#![allow(unused)]
fn main() {
async fn generator(callback: extern "C" fn(i32, i32), complete: oneshot::Sender<()>, n: i32) {
    for i in 0..10 {
        // Simulate async work
        tokio::time::sleep(std::time::Duration::from_millis(100)).await;

        // Call the callback
        callback(i, n);
    }

    // Send the result back to the main thread
    complete.send(()).unwrap();
}
}

And finally, an interface function to allow it to be called externally:

#![allow(unused)]
fn main() {
#[no_mangle]
pub extern "C" fn async_generator(callback: extern "C" fn(i32, i32), n: i32) {
    // Oneshot: so we know when the generator is done
    let (tx, rx) = oneshot::channel();

    // Send the command to Tokio
    let _ = COMMAND_TX
        .get()
        .unwrap()
        .blocking_send(Command::AsyncGenerator {
            callback,
            complete: tx,
            n,
        });

    // Wait for the response
    let _ = rx.blocking_recv().unwrap();
}
}

Now let's make a C directory. We're going to use C++ (so pthreads don't drive us insane). Here's the build script:

#!/bin/bash
CARGO_TARGET_DIR="tmp" cargo build
cp tmp/debug/libex21_async.a .
CARGO_TARGET_DIR="tmp" cargo clean
c++ -std=c++17 ex21.cc -o ex21 libex21_async.a
./ex21

And here's the C++ file ex21.cc:

#include <thread>
#include <vector>
#include <stdio.h>

// Notice that C++ requires the extern "C" block - it has name
// mangling, too.
extern "C" {
    void start_generator();
    void async_generator(void (*callback)(int, int), int n);
}

void thread_function(int i) {
    async_generator([](int generated, int thread_id) {
        printf("[%d] Called with %d\n", thread_id, generated);
    }, i);
}

int main() {
    // Launch the async Rust
    start_generator();

    // Create and start threads
    std::thread t1(thread_function, 1);
    std::thread t2(thread_function, 2);

    // Wait for all threads to finish
    t1.join();
    t2.join();
    
    return 0;
}

If you run this, you'll see that the threads are calling into the async runtime and values are being yielded. It's super-efficient, because the threads are put to sleep by the channel call (on the Rust side), and everything wakes up as needed.

A simple arithmetic generator isn't all that useful, but if you have Rust async code that uses databases, the network, or other naturally async properties---you can now link it into your C or C++.

C++

First of all: Everything we've learned about C also works with C++, as long as you remember to wrap the declarations in extern "C" blocks on the C++ side.

C++ started as "C with Classes" in 1979, and picked up the name C++ in 1983! The first C++ standard launched in 1998, and the standard library as we know it today appeared in 2005. In 2011, C++11 added working smart pointers. Containers added safe range-checked accessors, strings gained a safe wrapper, and we haven't had any memory errors since.

Snarkiness aside, we --- the Rust community --- owe a lot to C++. If you run into Mr. Stroustrup, shake his hand and be nice!

In particular:

C++ Feature	Rust Feature
RAII	`Drop`
`unique_ptr`	`Box` - it's basically the same thing
`shared_ptr`	`Arc` and `Rc`
`std::string`	`String`
`std::vector`	`Vec`
Exceptions	Ok, we dodged that bullet
Terrifying template meta-programming	Often-terrifying Generics
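The first three rows are easy to see in action. Here's a small sketch in which Noisy is a made-up type whose destructor writes its name to a log, so you can watch the order in which things are freed.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Records its name in a shared log when dropped, like a chatty C++ destructor.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        // Like std::unique_ptr: one owner, freed at the end of the scope
        let _boxed = Box::new(Noisy { name: "boxed", log: Rc::clone(&log) });

        // Like std::shared_ptr: freed when the LAST owner goes away
        let shared = Rc::new(Noisy { name: "shared", log: Rc::clone(&log) });
        let second_owner = Rc::clone(&shared);
        drop(shared);
        // Still alive, because second_owner is holding on
        assert!(log.borrow().is_empty());
        drop(second_owner);
    } // _boxed is dropped here (RAII)

    let result = log.borrow().clone();
    result
}

#[test]
fn destructors_run_in_order() {
    assert_eq!(drop_order(), vec!["shared", "boxed"]);
}
```

Swap Rc for Arc and the behaviour is the same, except that the reference count becomes atomic and the pointer can cross threads (provided the contents can too).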

The C++ world has been in a bit of a panic, at least in America. The Department of Defense declared that new code has to be created in a memory safe language --- and a lot of defence contractors have legacy C++ code that's nearly as old as I am. It's not at all uncommon to go to train a company who have millions of lines of C++, gradually accumulated (and gradually adopting new language features, sometimes) over the decades.

So when you go into these companies, you face:

The team may not actually want to learn Rust. In fact, there's quite a bit of open hostility. r/cpp on Reddit says some really mean things about us. (We can help by not saying "rewrite it in Rust" quite so often).
There's always That One Guy who has been writing C++ since the time of the dinosaurs and is going to make your life as miserable as possible. Witness the Rust for Linux team facing a long-time Linux contributor screaming "YOU CAN'T MAKE ME LEARN RUST!" while the presenter tried to explain that they weren't asking him to.
Grief goes through Denial, Anger, Bargaining, Depression and Acceptance (Kübler-Ross). Sudden career-path changes are hard, it's natural. Unfortunately, sometimes the Rust community is on the wrong end of the processing chain---but be a good human and help!

A fellow named Sean Baxter recently proposed a "Safe C++" that implements:

A borrow checker.
Range-checked containers by default.
An unsafe tag, and requiring its use for naked pointer operations.

Sadly, the C++ standards committee reacted very badly. So Mr. Baxter came up with a second proposal that would tie C++ and Rust together, ensuring a baseline level of compatibility. Unfortunately, that went down so well that Mr. Baxter has changed careers.

Interop Would be Nice

I promise not to sing it, but Imagine by John Lennon should accompany this slide.

In an ideal world, your C++ and Rust code would work really nicely together. C++ classes would be readily accessible from Rust, and complex Rust types would be readily available from C++. The types that are basically the same would work together. This is an ongoing area of work for the Rust Foundation, and is in great demand.

LLVM (and GCC) can compile both Rust and C++. Yet there is no common way to link them together, other than the C ABI. You can do a lot with that, but you lose a fair amount of the richness of both languages. It's two highly sophisticated individuals talking through a string telephone.

Unfortunately, we're not quite there yet. It's getting better.

CXX.RS

DTolnay - the author of everything that BurntSushi didn't write as far as I can tell - took a good stab at C++ interop with the cxx.rs project.

There's a fair amount of boilerplate involved, but with CXX.RS you can have a Rust project that directly works with C++. Not everything works, but it's a great step forward in using old C++ code in your new and shiny Rust project.

Walkthrough: instantiate a C++ class and use it

Create a new Rust project and add cxx as a dependency:

[package]
name = "simple_class"
version = "0.1.0"
edition = "2021"

[dependencies]
cxx = "1.0"

[build-dependencies]
cxx-build = "1.0"

Now let's build a header file. In include/simple_class.h:

#pragma once
#include <cstdint> // For uint64_t
#include <memory>

class SimpleClass {
    public:
    SimpleClass();
    void say_hello() const;
    ~SimpleClass();

    // An example of mutable class methods, which are a little harder.
    void set_counter(uint64_t value);

    private:
    uint64_t counter;
};

std::unique_ptr<SimpleClass> create_simple_class();

And in src/simple_class.cpp:

#include "simple_class.h"
#include <iostream>

SimpleClass::SimpleClass() {
    std::cout << "SimpleClass constructor\n";
    this->counter = 1;
}

SimpleClass::~SimpleClass() {
    std::cout << "SimpleClass destructor\n";
}

void SimpleClass::set_counter(uint64_t value) {
    this->counter = value;
}

void SimpleClass::say_hello() const {
    for (uint64_t i = 0; i < this->counter; i++) {
        std::cout << "Hello from SimpleClass run (" << i << ")\n";
    }
}

std::unique_ptr<SimpleClass> create_simple_class() {
    return std::make_unique<SimpleClass>();
}

So we've built a simple class that tells you when it is constructed or destructed. We've also exposed a function that creates an instance of the class in a unique_ptr.

Let's write out main.rs to use it:

#[cxx::bridge]
mod ffi {
    unsafe extern "C++" {
        // List each header to include
        include!("simple_class.h");

        // Declare each C++ class as an opaque type
        type SimpleClass;

        // Const methods are easiest, you can just use &self
        fn say_hello(&self);

        // Mutable methods require the Pin system below for `self`
        fn set_counter(self: Pin<&mut SimpleClass>, counter: u64);

        // Free function that creates an instance of the class.
        fn create_simple_class() -> UniquePtr<SimpleClass>;
    }

    extern "Rust" {
        // This is where we'll put functions to go the other way
    }
}

fn main() {
    // We create a unique ptr class. You'll see the constructor run.
    let mut simple_class = ffi::create_simple_class();

    // Calling say_hello is easy - it's immutable
    simple_class.say_hello();

    // Mutable methods need a pinned reference, so Rust can't move the C++ object in memory.
    simple_class.pin_mut().set_counter(2);
    simple_class.say_hello();

    // The destructor fires
}

There are two halves here:

The #[cxx::bridge] attribute triggers the CXX library to generate intermediary types. We include the header file, declare SimpleClass as an opaque C++ type, and list the signatures of the functions we want to call.
In main(), we actually run the program.
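The Pin requirement on mutable methods can be puzzling at first, so here is a minimal sketch of the same calling pattern using only the standard library. The SimpleClass below is a plain Rust stand-in I made up for illustration, not the type CXX generates, and Pin<Box<T>> is only playing the role of UniquePtr<T>.

```rust
use std::pin::Pin;

// Hypothetical Rust stand-in for the C++ class. Not a CXX-generated type.
struct SimpleClass {
    counter: u64,
}

impl SimpleClass {
    // "Const" method: a shared reference is enough.
    fn say_hello(&self) -> Vec<String> {
        (0..self.counter)
            .map(|i| format!("Hello from SimpleClass run ({})", i))
            .collect()
    }

    // Mutating method: takes a pinned mutable reference, the same shape
    // as the set_counter signature in the bridge.
    fn set_counter(self: Pin<&mut Self>, counter: u64) {
        // This stand-in is Unpin, so get_mut() is allowed here. Opaque
        // C++ types in CXX are not Unpin, which is what stops safe Rust
        // from moving or swapping them.
        self.get_mut().counter = counter;
    }
}

fn main() {
    // Pin<Box<T>> stands in for UniquePtr<T> in this sketch.
    let mut simple_class: Pin<Box<SimpleClass>> =
        Box::pin(SimpleClass { counter: 1 });

    for line in simple_class.say_hello() {
        println!("{}", line);
    }

    // as_mut() here is the analog of pin_mut() on a UniquePtr.
    simple_class.as_mut().set_counter(2);

    for line in simple_class.say_hello() {
        println!("{}", line);
    }
}
```

The point of the pin is that a C++ object may rely on its own address staying put. Handing out a plain &mut would let safe Rust code swap or move it, so the mutable path goes through Pin instead.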

We still have to actually compile the thing. So we need a build.rs file:

// build.rs

fn main() {
    cxx_build::bridge("src/main.rs")
        .file("src/simple_class.cpp")
        .include("include")
        .std("c++14")
        .compile("simple_class");

    println!("cargo:rerun-if-changed=src/main.rs");
    println!("cargo:rerun-if-changed=src/simple_class.cpp");
    println!("cargo:rerun-if-changed=include/simple_class.h");
}

Running the program with cargo run compiles both the Rust and the C++, joins them together, and prints:

SimpleClass constructor
Hello from SimpleClass run (0)
Hello from SimpleClass run (0)
Hello from SimpleClass run (1)
SimpleClass destructor

It works!
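The destructor line appears last because the UniquePtr is dropped when main() ends, and dropping it runs the C++ destructor. Here is a rough Rust-only sketch of that ordering. The Tracked type and its event log are hypothetical stand-ins of mine, with Rust's Drop playing the part of the C++ destructor.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical stand-in for UniquePtr<SimpleClass>. It records lifecycle
// events in a shared log instead of printing from C++.
struct Tracked {
    log: Rc<RefCell<Vec<String>>>,
}

impl Tracked {
    fn new(log: Rc<RefCell<Vec<String>>>) -> Self {
        log.borrow_mut().push("constructor".to_string());
        Tracked { log }
    }

    fn say_hello(&self) {
        self.log.borrow_mut().push("hello".to_string());
    }
}

impl Drop for Tracked {
    // Runs automatically when the owner goes out of scope.
    fn drop(&mut self) {
        self.log.borrow_mut().push("destructor".to_string());
    }
}

fn run(log: Rc<RefCell<Vec<String>>>) {
    let tracked = Tracked::new(log);
    tracked.say_hello();
    // `tracked` goes out of scope here, so Drop runs after say_hello.
}

fn main() {
    let log = Rc::new(RefCell::new(Vec::new()));
    run(Rc::clone(&log));
    for event in log.borrow().iter() {
        println!("{}", event);
    }
}
```

There is no explicit delete or free call anywhere. Ownership decides when cleanup happens, on both sides of the bridge.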

CXX.RS The Other Way Around

Walkthrough: adding Rust to C++

Adding Rust to your C++ can also be done through cxx.rs.

In your main.rs file, fill in the extern "Rust" block inside the cxx::bridge module:

extern "Rust" {
    pub fn callback();
}

You can expose types in the same way. Declare type MyType; in the extern block.

Then below main, add the implementation:

pub fn callback() {
    println!("Callback called!");
}

In the simple_class.cpp file, include the generated header and call the Rust function from the constructor:

// This is auto-generated on build
#include "simple_callback/src/main.rs.h"

SimpleClass::SimpleClass() {
    std::cout << "SimpleClass constructor\n";
    this->counter = 1;
    callback();
}

Run the program and you get:

SimpleClass constructor
Callback called!
Hello from SimpleClass run (0)
Hello from SimpleClass run (0)
Hello from SimpleClass run (1)
SimpleClass destructor
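If you're wondering what a call in this direction boils down to, here is a hand-written sketch using nothing but a C-ABI function pointer. This is not the glue CXX generates. The foreign_constructor function is a hypothetical stand-in for the C++ constructor, and the CALLS counter exists only so the sketch can show that the callback really ran.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Counts how often the callback ran. Illustration only.
static CALLS: AtomicU32 = AtomicU32::new(0);

// A Rust function with the C calling convention, so foreign code can
// call it through a plain function pointer.
extern "C" fn callback() {
    CALLS.fetch_add(1, Ordering::SeqCst);
    println!("Callback called!");
}

// Hypothetical stand-in for the C++ constructor. All it knows about the
// Rust side is the C function-pointer type.
fn foreign_constructor(cb: extern "C" fn()) {
    println!("SimpleClass constructor");
    cb();
}

fn main() {
    foreign_constructor(callback);
    println!("callback ran {} time(s)", CALLS.load(Ordering::SeqCst));
}
```

With CXX you don't pass the pointer around yourself. The generated header declares callback() for the C++ side, and the build links the two halves together.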
