Embedded Challenges
There are a lot of different types of embedded systems out there. "Embedded" can mean anything from a full-featured Raspberry Pi 4 to a tiny microcontroller, and different platforms have differing levels of support for embedded Rust. LLVM currently bounds which platforms you can target; Rust on GCC is advancing rapidly but isn't ready for production yet.
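For example, you can see which cross-compilation targets your installed toolchain supports and add one. The thumbv7em-none-eabihf triple below (a Cortex-M4/M7 class microcontroller) is just an illustrative choice, not one used elsewhere in this section:
rustup target list
rustup target add thumbv7em-none-eabihf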
Minimizing Binary Size
For size-constrained builds, Rust has a lot of options:
Optimize for Size
In Cargo.toml, you can specify optimization levels by profile. Add this to the Cargo.toml file:
[profile.release]
opt-level = "s"
Run cargo build --release. It'll take a moment: it has to recompile every dependency and optimize each one for size as well.
On Windows, the resulting binary is now 510,976 bytes (499 kb). A small improvement.
There's also an optimization level named "z". Let's see if it does any better.
[profile.release]
opt-level = "z"
It weighs in at 509,440 bytes (497.5 kb). A very tiny improvement.
Strip the Binary
In Cargo.toml, let's also strip the binary of symbols.
[profile.release]
opt-level = "z"
strip = true # Automatically strip symbols
Compiling again reduces the binary to 508,928 bytes (497 kb).
Enable LTO
In Cargo.toml, let's enable link-time optimization. This optimizes across crate boundaries, at the expense of a SLOW compile.
[profile.release]
opt-level = "z"
strip = true # Automatically strip symbols
lto = true
We're down to 438,272 bytes (428 kb). Getting better!
Reduce Codegen Units
By default, Rust parallelizes builds across all of your CPU cores, which can prevent some optimizations. Let's make our compilation even slower in the name of a small binary:
[profile.release]
opt-level = "z"
strip = true # Automatically strip symbols
lto = true
codegen-units = 1
You may have to run cargo clean before building this.
Our binary is now 425,472 bytes (415 kb). Another small improvement.
Abort on Panic
A surprising amount of a Rust binary is the "panic handler". Similar to an exception handler in C++, it adds some hefty code to unwind the stack and provide detailed traces on crashes. We can turn this behavior off:
[profile.release]
opt-level = "z"
strip = true # Automatically strip symbols
lto = true
codegen-units = 1
panic = "abort"
This reduces my binary to 336,896 bytes (329 kb). That's a big improvement! The downside is that if your program panics, it won't be able to tell you all the details about how it died.
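Here's a minimal sketch of that trade-off (illustrative code, not part of the measurements above). With the default unwinding panics, std::panic::catch_unwind can intercept a panic and the program keeps running; rebuild with panic = "abort" and the same program terminates at the panic instead of printing anything:
use std::panic;

fn main() {
    // Indexing an empty Vec panics. With the default panic = "unwind",
    // catch_unwind intercepts the panic and execution continues below.
    // With panic = "abort", the process terminates right here instead.
    let result = panic::catch_unwind(|| {
        let v: Vec<i32> = Vec::new();
        v[0]
    });
    println!("caught a panic: {}", result.is_err());
}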
Heavy Measures: Optimize the Standard Library for Size
If you don't have nightly installed, you will need it:
rustup toolchain install nightly
rustup component add rust-src --toolchain nightly
Then find out your current build target:
rustc -vV
And use that target (the host line in the output; replace x86_64-apple-darwin below with your own triple) to issue a build that compiles the standard library from source:
cargo +nightly build -Z build-std=std,panic_abort -Z build-std-features=panic_immediate_abort --target x86_64-apple-darwin --release
The binary goes into target/(platform)/release. There's a pretty substantial size improvement: 177,152 bytes (173 kb).
That's about as small as it gets without using a different standard library. Let's see what we can do about the dependencies.
Using Cargo Bloat
Install the tool with cargo install cargo-bloat, and run cargo bloat to see exactly where your binary size is going.
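For example, this gives a per-crate breakdown of the release build (these flags come from cargo-bloat's own options; double-check them against your installed version):
cargo bloat --release --crates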
Building Without the Standard Library
If you are on a platform without standard library support (or for really small builds), you can combine these steps with adding #![no_std] to your binary. You can still opt in to parts of the standard library through core, depending upon what is available. This can also be useful for WASM builds in the browser. You can also use extern crate alloc to opt in to Rust's heap allocation support:
#![no_std]

extern crate alloc;
This allows you to use Vec and similar collections in your code. You don't have the full standard library, but it's a pretty pleasant environment.
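Here's a minimal sketch of what that looks like (the squares function is hypothetical, and a real no_std binary also needs a #[global_allocator] and a #[panic_handler], which are omitted here):
#![no_std]

extern crate alloc;

use alloc::vec::Vec;

// Hypothetical helper: builds a heap-allocated Vec without the full standard library.
pub fn squares(count: u32) -> Vec<u32> {
    let mut out = Vec::new();
    for i in 0..count {
        out.push(i * i);
    }
    out
}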
Using a Different Allocator
Rust defaults to using your platform's allocator. It used to use jemallocator, but that didn't work properly on all platforms. jemalloc is amazing: it offers memory usage profiling, a pool-based system that minimizes the penalty for reallocation, and it can improve the performance of real-time systems significantly. The LibreQoS project adopted it for real-time packet analysis and saw runtime performance improvements of up to 15%.
To opt in to jemalloc, add the following to Cargo.toml:
[target.'cfg(any(target_arch = "x86", target_arch = "x86_64"))'.dependencies]
jemallocator = "0.5"
And add this to your main.rs file (outside of any functions):
// Use JemAllocator only on supported platforms
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
use jemallocator::Jemalloc;

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;
The rest of the Rust system will pick up on these changes and use jemalloc. There are quite a few other allocation systems available.
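For instance, here's a sketch of wiring up the mimalloc crate instead (the version number is an assumption; check crates.io for the current release). In Cargo.toml:
[dependencies]
mimalloc = "0.1"
And in main.rs:
// Assumes the mimalloc crate; swaps the global allocator the same way as jemalloc above.
use mimalloc::MiMalloc;

#[global_allocator]
static GLOBAL: MiMalloc = MiMalloc;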