Optimization Patterns

Once you have profiled and identified a bottleneck, you need concrete techniques to fix it. This topic covers the patterns that experienced Rust developers reach for: reducing allocation, reusing memory, choosing the right data structures, and knowing when optimization is premature.

Avoid Allocation: Borrow Instead of Own

The cheapest allocation is the one that never happens. When a function only needs to read data, accept a reference:

// BAD: forces the caller to allocate a String
fn greet(name: String) -> String {
    format!("Hello, {}", name)
}

// GOOD: borrows, no allocation required at the call site
fn greet(name: &str) -> String {
    format!("Hello, {}", name)
}

The same applies to collections:

// BAD: requires a Vec
fn sum(values: Vec<i64>) -> i64 {
    values.iter().sum()
}

// GOOD: accepts any slice
fn sum(values: &[i64]) -> i64 {
    values.iter().sum()
}

Prefer &str over String and &[T] over Vec<T> for parameters that are only read. This is one of the simplest and most broadly useful optimization patterns in Rust. It avoids forcing an allocation at the call site and communicates intent: "I only need to read."
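
To make the benefit at the call site concrete, here is a minimal sketch of the two borrowing signatures being called with differently owned data. The caller function call_with_different_owners and its sample values are hypothetical and exist only for illustration.

```rust
// Borrowing signatures accept many kinds of caller-owned data.
fn greet(name: &str) -> String {
    format!("Hello, {}", name)
}

fn sum(values: &[i64]) -> i64 {
    values.iter().sum()
}

// Hypothetical caller, named for illustration only.
fn call_with_different_owners() -> (String, String, i64, i64, i64) {
    let owned_name = String::from("Ada");
    let owned_values: Vec<i64> = vec![1, 2, 3];
    let array: [i64; 2] = [10, 20];

    (
        greet("Grace"),          // string literal: &'static str
        greet(&owned_name),      // &String derefs to &str
        sum(&owned_values),      // &Vec<i64> derefs to &[i64]
        sum(&array),             // &[i64; 2] coerces to &[i64]
        sum(&owned_values[1..]), // sub-slice, still no copy of the data
    )
}
```

None of these calls copies the caller's data. With the owning signatures, the literal, the array, and the sub-slice would each need to be converted into a freshly allocated String or Vec first.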

Reuse Allocations

When you must allocate, reuse the allocation across iterations:

// BAD: allocates a new Vec every iteration
fn process_batches(data: &[Vec<i32>]) -> Vec<i32> {
    let mut results = Vec::new();
    for batch in data {
        let processed: Vec<i32> = batch.iter().map(|x| x * 2).collect();
        results.extend_from_slice(&processed);
    }
    results
}

// GOOD: reuse the buffer
fn process_batches(data: &[Vec<i32>]) -> Vec<i32> {
    let mut results = Vec::new();
    let mut buffer = Vec::new();
    for batch in data {
        buffer.clear(); // reuse the allocation
        buffer.extend(batch.iter().map(|x| x * 2));
        results.extend_from_slice(&buffer);
    }
    results
}

Vec::clear() sets the length to zero but keeps the allocated memory. On the next iteration, extend fills the existing allocation instead of requesting new memory.
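
The claim about clear() can be checked directly by comparing the capacity before and after the call. The helper name capacity_around_clear is hypothetical; the sketch assumes nothing beyond the standard library.

```rust
// Returns (capacity before clear, length after clear, capacity after clear).
fn capacity_around_clear(n: usize) -> (usize, usize, usize) {
    let mut buffer: Vec<usize> = (0..n).collect();
    let before = buffer.capacity();

    // Drops the elements but leaves the allocation in place.
    buffer.clear();

    (before, buffer.len(), buffer.capacity())
}
```

The length drops to zero while the capacity is unchanged, which is why the next extend on the same buffer does not need to ask the allocator for memory unless the new batch is larger than anything seen so far.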

For strings:

use std::fmt::Write; // required for write! into a String

let mut output = String::with_capacity(1024);
for item in items {
    output.clear();
    write!(output, "Item: {}", item).unwrap();
    send(&output);
}
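
A self-contained version of the same pattern might look like the sketch below. The function name send_formatted and the closure parameter standing in for send are assumptions made so the example compiles on its own.

```rust
use std::fmt::Write;

// Formats each item into one reused String and hands it to `send`.
// `send` is a hypothetical stand-in for whatever consumes the text.
fn send_formatted<F: FnMut(&str)>(items: &[u32], mut send: F) {
    let mut output = String::with_capacity(64);
    for item in items {
        output.clear(); // keep the allocation, drop the old contents
        write!(output, "Item: {}", item).expect("writing to a String cannot fail");
        send(&output);
    }
}
```

Because the consumer receives &str, it decides whether it needs its own copy. A consumer that writes straight to a socket or file never allocates per item.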

Pre-Allocate with Capacity

When you know the final size, tell the allocator upfront:

// BAD: starts with capacity 0, grows multiple times
let mut results = Vec::new();
for i in 0..10_000 {
    results.push(i * 2);
}

// GOOD: one allocation, no reallocation
let mut results = Vec::with_capacity(10_000);
for i in 0..10_000 {
    results.push(i * 2);
}

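Without with_capacity, Vec grows geometrically: the current implementation roughly doubles its capacity each time it runs out of space, although the exact strategy is not guaranteed. For 10,000 elements, that is on the order of a dozen reallocations, each of which may copy every element pushed so far. With with_capacity, there is one allocation and no reallocation.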

The same applies to String::with_capacity and HashMap::with_capacity.
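
Since the growth strategy is an implementation detail, it is easier to observe than to predict. The sketch below counts how often the capacity changes while pushing; the function name count_capacity_changes is hypothetical.

```rust
// Counts how many times the Vec's capacity changes while pushing `n` items,
// starting from the given initial capacity.
fn count_capacity_changes(n: usize, initial_capacity: usize) -> usize {
    let mut values: Vec<usize> = Vec::with_capacity(initial_capacity);
    let mut last_capacity = values.capacity();
    let mut changes = 0;

    for i in 0..n {
        values.push(i);
        if values.capacity() != last_capacity {
            changes += 1;
            last_capacity = values.capacity();
        }
    }
    changes
}
```

Calling it with an initial capacity of 10,000 reports zero changes for 10,000 pushes, while starting from zero reports a small but non-zero number that depends on the standard library version.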

SmallVec for Small Collections

Many collections hold just a few elements. SmallVec stores them inline, avoiding heap allocation entirely:

use smallvec::SmallVec;

// Stores up to 4 elements inline, spills to heap beyond that
fn get_tags(item: &Item) -> SmallVec<[&'static str; 4]> {
    let mut tags = SmallVec::new();
    if item.is_featured {
        tags.push("featured");
    }
    if item.is_new {
        tags.push("new");
    }
    tags
}

If most items have 4 or fewer tags, this never allocates. The elements live inline in the SmallVec value itself (on the stack, for a local variable). Only when the count exceeds the inline capacity does it fall back to heap allocation. Note that the elements here are &'static str: a SmallVec of String would still allocate once per String, even while the SmallVec's own buffer stays inline.

Use SmallVec when profiling shows that a Vec with typically few elements is a hot allocation site.
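
The inline-then-spill idea can be illustrated with the standard library alone. The type below, TinyVec4, is a hypothetical and heavily simplified sketch of the concept; it is not how the smallvec crate is implemented and it only supports u32.

```rust
// Hypothetical illustration: up to 4 values live inline in the enum itself,
// and the fifth push moves everything to a heap-allocated Vec.
enum TinyVec4 {
    Inline { items: [u32; 4], len: usize },
    Heap(Vec<u32>),
}

impl TinyVec4 {
    fn new() -> Self {
        TinyVec4::Inline { items: [0; 4], len: 0 }
    }

    fn push(&mut self, value: u32) {
        match self {
            TinyVec4::Inline { items, len } => {
                if *len < 4 {
                    // Still room inline: no allocation.
                    items[*len] = value;
                    *len += 1;
                } else {
                    // Inline storage is full: spill to the heap once.
                    let mut spilled = Vec::with_capacity(8);
                    spilled.extend_from_slice(&items[..*len]);
                    spilled.push(value);
                    *self = TinyVec4::Heap(spilled);
                }
            }
            TinyVec4::Heap(vec) => vec.push(value),
        }
    }

    fn as_slice(&self) -> &[u32] {
        match self {
            TinyVec4::Inline { items, len } => &items[..*len],
            TinyVec4::Heap(vec) => vec,
        }
    }

    fn is_inline(&self) -> bool {
        matches!(self, TinyVec4::Inline { .. })
    }
}
```

The trade-off is visible in the type: every value carries the full inline array whether or not it is used, so a large inline capacity makes the value expensive to move.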

Cow for Flexible Ownership

Cow (Clone on Write) defers allocation until mutation is needed:

use std::borrow::Cow;

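fn normalize_name(name: &str) -> Cow<'_, str> {
    // Trimming only narrows the borrowed slice; it never allocates
    let trimmed = name.trim();
    if trimmed.chars().any(char::is_uppercase) {
        // Only allocate when the text itself must change
        Cow::Owned(trimmed.to_lowercase())
    } else {
        // No allocation, just borrow (a sub-slice of) the input
        Cow::Borrowed(trimmed)
    }
}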

fn main() {
    let a = normalize_name("alice");    // Borrowed, no allocation
    let b = normalize_name("  Bob  ");  // Owned, allocates
    println!("{} {}", a, b);
}

Cow is ideal for functions that usually return the input unchanged but occasionally need to modify it. It avoids unnecessary allocation in the common case.
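
Callers can tell which path was taken by matching on the variant. The sketch below uses a different, hypothetical function (expand_tabs) to show the same borrow-unless-changed shape.

```rust
use std::borrow::Cow;

// Replaces each tab with four spaces, allocating only if a tab is present.
fn expand_tabs(input: &str) -> Cow<'_, str> {
    if input.contains('\t') {
        Cow::Owned(input.replace('\t', "    "))
    } else {
        Cow::Borrowed(input)
    }
}
```

A Cow<str> dereferences to &str, so most callers can use the result without caring which variant it is. Calling into_owned() yields a String, cloning only in the borrowed case.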

Avoid Unnecessary Cloning

clone() is a red flag in performance-sensitive code. Each clone potentially allocates and copies:

// BAD: clones the String value on every lookup
fn get_value(map: &HashMap<String, String>, key: &str) -> Option<String> {
    map.get(key).cloned()
}

// BETTER: return a reference
fn get_value<'a>(map: &'a HashMap<String, String>, key: &str) -> Option<&'a str> {
    map.get(key).map(|s| s.as_str())
}

When you find yourself cloning to satisfy the borrow checker, consider:

Can you restructure to use references?
Can you use Rc or Arc for shared ownership?
Is the data small enough that cloning is actually cheap (e.g., i64, small structs)?
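
For the shared-ownership option, a minimal sketch with Rc shows that cloning the handle does not copy the data. The function name share_rows and the sample values are hypothetical; Arc works the same way when the data crosses threads.

```rust
use std::rc::Rc;

// Returns (number of owners, whether both handles point at the same data, sum).
fn share_rows() -> (usize, bool, u64) {
    let original: Rc<Vec<u64>> = Rc::new(vec![1, 2, 3]);

    // Rc::clone bumps a reference count; the Vec itself is not copied.
    let shared = Rc::clone(&original);

    let total: u64 = shared.iter().sum();
    (
        Rc::strong_count(&original),
        Rc::ptr_eq(&original, &shared),
        total,
    )
}
```

The cost moves from copying the data to maintaining a reference count, which is usually cheaper for large values but not free, so it is still worth trying plain references first.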

HashMap Optimization

HashMap is a common bottleneck. A few techniques help:

use std::collections::HashMap;

// Pre-allocate
let mut map = HashMap::with_capacity(expected_size);

// Use entry API to avoid double lookups
map.entry(key.to_string())
    .and_modify(|count| *count += 1)
    .or_insert(1);

// Use &str for lookups when keys are String
// This avoids allocating a String just to look up a value
let value = map.get("key"); // works because String: Borrow<str>
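
Put together, the entry API and &str lookups might be used as in this word-count sketch. The function name count_words is hypothetical.

```rust
use std::collections::HashMap;

// Counts whitespace-separated words using a single lookup per word.
fn count_words(text: &str) -> HashMap<String, usize> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for word in text.split_whitespace() {
        // entry() finds or creates the slot in one hash lookup.
        *counts.entry(word.to_string()).or_insert(0) += 1;
    }
    counts
}
```

Lookups such as counts.get("a") take a &str and allocate nothing. Note that entry(word.to_string()) still allocates a String for every word, including repeats; if a profile shows that to be significant, checking with get_mut first and inserting only on a miss trades a second lookup for fewer allocations.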

For very hot HashMaps, consider rustc_hash::FxHashMap, which uses a faster hash function that offers no resistance to hash flooding:

use rustc_hash::FxHashMap;

let mut map = FxHashMap::default();
map.insert("key", "value");

FxHash is typically much faster than the default SipHash for small keys such as integers and short strings. The exact factor depends on the workload, so measure. Use it only when you do not need protection against hash-flooding attacks.
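
Swapping the hash function is possible because the standard HashMap is generic over its hasher. The sketch below plugs in a hand-written FNV-1a hasher using only the standard library; the names Fnv1a, FnvHashMap, and build_index are hypothetical, and FNV-1a is a stand-in chosen for brevity, not the algorithm FxHash uses.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

// Hypothetical 64-bit FNV-1a hasher: fast for short keys, not flood-resistant.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &byte in bytes {
            self.0 ^= u64::from(byte);
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

// Same HashMap type, different hasher.
type FnvHashMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

fn build_index() -> FnvHashMap<&'static str, u32> {
    let mut map = FnvHashMap::default();
    map.insert("alpha", 1);
    map.insert("beta", 2);
    map
}
```

Everything else about the map (entry API, with_capacity_and_hasher, borrowed lookups) stays the same, which makes it cheap to try an alternative hasher and measure.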

String Optimization

Strings are one of the most common allocation sources:

// BAD: allocates, then may reallocate as each piece is appended
let msg = "Hello, ".to_string() + &name + "!";

// GOOD: clearer, and format! estimates the capacity up front
let msg = format!("Hello, {}!", name);

// BETTER for hot paths: write into a buffer
use std::fmt::Write;
let mut msg = String::with_capacity(7 + name.len() + 1);
write!(msg, "Hello, {}!", name).unwrap();

For static strings that are known at compile time, use &'static str or const:

const ERROR_MSG: &str = "Something went wrong";

No allocation at all. The string lives in the binary.

SIMD with std::simd

For CPU-bound inner loops processing arrays of numbers, SIMD (Single Instruction, Multiple Data) processes multiple elements per instruction:

#![feature(portable_simd)]
use std::simd::prelude::*;

fn sum_simd(data: &[f32]) -> f32 {
    let (prefix, chunks, suffix) = data.as_simd::<8>();

    let prefix_sum: f32 = prefix.iter().sum();
    let suffix_sum: f32 = suffix.iter().sum();

    let simd_sum = chunks
        .iter()
        .copied()
        .reduce(|a, b| a + b)
        .unwrap_or(f32x8::splat(0.0));

    prefix_sum + simd_sum.reduce_sum() + suffix_sum
}

This processes 8 floats at a time. On modern hardware, SIMD can deliver severalfold speedups for numerical code, although compilers already auto-vectorize many simple loops. However, std::simd is still nightly-only. For stable Rust, write std::arch intrinsics directly or use a crate such as wide.

SIMD is a last resort. Most code does not need it. Profile first.
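
On stable Rust, one low-risk step before reaching for intrinsics is to shape the loop so the compiler has a chance to vectorize it. The sketch below keeps 8 independent accumulators; the function name sum_chunked is hypothetical, and whether the compiler actually emits SIMD instructions depends on the target and optimization level, so check the profile or the generated assembly.

```rust
// Sums a slice using 8 independent accumulators, one per "lane".
fn sum_chunked(data: &[f32]) -> f32 {
    const LANES: usize = 8;
    let mut acc = [0.0f32; LANES];

    let chunks = data.chunks_exact(LANES);
    let remainder = chunks.remainder(); // the final len % 8 elements

    for chunk in chunks {
        // Fixed-length inner loop with no cross-lane dependency.
        for lane in 0..LANES {
            acc[lane] += chunk[lane];
        }
    }

    acc.iter().sum::<f32>() + remainder.iter().sum::<f32>()
}
```

As with the explicit SIMD version, the additions happen in a different order than in a plain sequential sum, so floating-point results can differ slightly for inputs that are not exactly representable.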

When Optimization Matters

Not all code needs optimization. Focus on:

Hot loops that run millions of times
Allocation-heavy paths in request handlers
Serialization and deserialization in network services
Inner loops in data processing pipelines

Do not optimize:

Startup code that runs once
Error paths that rarely execute
Code where the gain is marginal and readability would suffer

A rough rule of thumb: estimate a function's total cost as calls per second times time per call. At 1,000 calls per second and 10 microseconds per call, a function uses about 1% of one core, and optimizing it is premature. Spend your time on architecture and correctness instead.

Common Pitfalls

Optimizing without profiling. You will optimize the wrong thing. Measure first.
Sacrificing readability for micro-gains. A 2% speedup that makes code unreadable is rarely worth it.
Using unsafe for performance. Most performance wins come from better algorithms and fewer allocations, not from bypassing safety checks. unsafe should be the last tool, not the first.
Ignoring algorithmic complexity. No amount of allocation optimization fixes an O(n^2) algorithm. Fix the algorithm first.
Over-using SmallVec and Cow. These add complexity. Use them only when profiling shows the allocation is actually a bottleneck.

Key Takeaways

Borrow instead of owning: &str over String, &[T] over Vec<T>.
Reuse allocations with clear() instead of creating new collections.
Pre-allocate with with_capacity when you know the size.
SmallVec avoids heap allocation for collections that usually fit within its inline capacity.
Cow defers allocation until mutation is needed.
FxHashMap is usually faster than the default HashMap; use it only for non-adversarial inputs.
Profile before optimizing. Fix algorithms before fixing allocations.