File I/O & Streams

CLI tools live and die by how they handle files and streams. Reading stdin, writing stdout, processing large files without loading them into memory, and giving useful error messages when a file does not exist — these are the basics that separate a script from a tool.

`std::fs` for File Operations

The std::fs module provides synchronous file operations:

use std::fs;
use std::path::Path;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Read entire file into a String
    let content = fs::read_to_string("config.toml")?;
    println!("Config: {} bytes", content.len());

    // Read entire file into bytes
    let bytes = fs::read("image.png")?;
    println!("Image: {} bytes", bytes.len());

    // Write a string to a file (creates or overwrites)
    fs::write("output.txt", "Hello, world!\n")?;

    // Append to a file
    use std::io::Write;
    let mut file = fs::OpenOptions::new()
        .append(true)
        .create(true)
        .open("log.txt")?;
    writeln!(file, "New log entry")?;

    // Create directories
    fs::create_dir_all("output/reports/2024")?;

    // Copy and rename
    fs::copy("output.txt", "output_backup.txt")?;
    fs::rename("output_backup.txt", "backup.txt")?;

    // Check existence
    if Path::new("config.toml").exists() {
        println!("Config file found");
    }

    Ok(())
}

fs::read_to_string and fs::write are convenient for small files. For anything larger, use buffered I/O.
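The existence check in the example above is a separate step from the read, so the file can disappear between the two. One way to close that gap is to attempt the read and inspect the error kind. This is a minimal sketch; read_optional is a hypothetical helper name, not part of std::fs.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: returns the file's contents, or `None` when the
/// file does not exist. Every other I/O error is passed to the caller.
fn read_optional(path: &Path) -> io::Result<Option<String>> {
    match fs::read_to_string(path) {
        Ok(text) => Ok(Some(text)),
        // A missing file is an expected case here, not a failure
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(e) => Err(e),
    }
}
```

A caller can then fall back to defaults on None while still reporting real failures such as permission errors.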

BufReader & BufWriter

Unbuffered I/O makes a system call for every read or write. Buffered I/O batches operations, which is dramatically faster for line-by-line processing:

use std::fs::File;
use std::io::{BufRead, BufReader, BufWriter, Write};

fn count_lines(path: &str) -> Result<usize, Box<dyn std::error::Error>> {
    let file = File::open(path)?;
    let reader = BufReader::new(file);
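    // Propagate I/O errors instead of silently counting them as lines
    let mut count = 0;
    for line in reader.lines() {
        line?;
        count += 1;
    }
    Ok(count)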
}

fn write_numbered_lines(
    input_path: &str,
    output_path: &str,
) -> Result<(), Box<dyn std::error::Error>> {
    let input = File::open(input_path)?;
    let reader = BufReader::new(input);

    let output = File::create(output_path)?;
    let mut writer = BufWriter::new(output);

    for (i, line) in reader.lines().enumerate() {
        let line = line?;
        writeln!(writer, "{:4}: {}", i + 1, line)?;
    }

    writer.flush()?;
    Ok(())
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let lines = count_lines("src/main.rs")?;
    println!("Lines: {}", lines);

    write_numbered_lines("src/main.rs", "numbered.txt")?;
    println!("Wrote numbered output");

    Ok(())
}

lines() comes from the BufRead trait, which BufReader implements, and returns an iterator of io::Result<String>. Each line is read on demand and has its line terminator stripped, so the entire file is never in memory at once.
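lines() allocates a fresh String for every line. When that cost matters, BufRead::read_line can fill one reusable buffer instead. The sketch below assumes a hypothetical helper, longest_line_len, made generic over BufRead so it also accepts in-memory data.

```rust
use std::io::{self, BufRead};

/// Hypothetical helper: length in bytes of the longest line, excluding
/// the line terminator. One String buffer is reused for every line.
fn longest_line_len<R: BufRead>(mut reader: R) -> io::Result<usize> {
    let mut line = String::new();
    let mut longest = 0usize;

    loop {
        // read_line appends to the buffer, so clear it before each call
        line.clear();
        if reader.read_line(&mut line)? == 0 {
            break; // zero bytes read means end of input
        }
        // Unlike lines(), read_line keeps the trailing "\n" or "\r\n"
        let trimmed = line.trim_end_matches(|c: char| c == '\n' || c == '\r');
        longest = longest.max(trimmed.len());
    }

    Ok(longest)
}
```

To use it on a file, pass BufReader::new(File::open(path)?) as the reader.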

Reading stdin & Writing stdout

CLI tools should work with pipes. Read from stdin when no file argument is provided:

use std::io::{self, BufRead, Write, BufWriter};
use std::fs::File;
use std::path::PathBuf;

fn get_reader(path: Option<&PathBuf>) -> Result<Box<dyn BufRead>, io::Error> {
    match path {
        Some(p) => {
            let file = File::open(p)?;
            Ok(Box::new(io::BufReader::new(file)))
        }
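        // Stdin is already buffered, so the lock is a BufRead on its own
        // (a 'static lock requires Rust 1.61 or newer)
        None => Ok(Box::new(io::stdin().lock())),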
    }
}

fn process(reader: Box<dyn BufRead>) -> Result<(), io::Error> {
    let stdout = io::stdout();
    let mut writer = BufWriter::new(stdout.lock());

    for line in reader.lines() {
        let line = line?;
        let upper = line.to_uppercase();
        writeln!(writer, "{}", upper)?;
    }

    // Flush explicitly so a write error is reported instead of lost on drop
    writer.flush()?;
    Ok(())
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // In a real tool, this would come from clap.
    // args_os() accepts non-UTF-8 arguments; args() would panic on them.
    let path: Option<PathBuf> = std::env::args_os().nth(1).map(PathBuf::from);
    let reader = get_reader(path.as_ref())?;
    process(reader)?;
    Ok(())
}

$ echo "hello world" | tool
HELLO WORLD

$ tool input.txt
CONTENTS IN UPPERCASE

Lock stdout once and wrap it in BufWriter for performance. Without this, every println! acquires and releases the stdout lock, and because stdout is line-buffered, each line is also flushed separately. That overhead is measurable when processing millions of lines.
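Box<dyn BufRead> is one way to abstract over the input. Another is to make the processing logic generic over BufRead and Write. The sketch below assumes that logic is pulled out into a hypothetical function named uppercase_lines. The same code can then run against stdin, a file, or an in-memory buffer in a test.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical helper: copies every line from `reader` to `writer`
/// in uppercase, then flushes so that write errors reach the caller.
fn uppercase_lines<R: BufRead, W: Write>(reader: R, mut writer: W) -> io::Result<()> {
    for line in reader.lines() {
        let line = line?;
        writeln!(writer, "{}", line.to_uppercase())?;
    }
    writer.flush()
}
```

In main, this could be called with io::stdin().lock() as the reader and BufWriter::new(io::stdout().lock()) as the writer.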

Path Handling

Use PathBuf and Path instead of raw strings. They handle platform differences (forward vs backward slashes) and provide useful methods:

use std::path::{Path, PathBuf};

fn process_path() {
    let path = PathBuf::from("/Users/alice/documents/report.pdf");

    println!("File name: {:?}", path.file_name());    // Some("report.pdf")
    println!("Extension: {:?}", path.extension());     // Some("pdf")
    println!("Stem: {:?}", path.file_stem());          // Some("report")
    println!("Parent: {:?}", path.parent());           // Some("/Users/alice/documents")

    // Build paths safely
    let mut output = path.parent().unwrap().to_path_buf();
    output.push("processed");
    output.push("report_v2.pdf");
    println!("Output: {}", output.display());
    // /Users/alice/documents/processed/report_v2.pdf

    // Check properties
    let p = Path::new("src/main.rs");
    println!("Is absolute: {}", p.is_absolute());       // false
    println!("Has extension rs: {}", p.extension() == Some("rs".as_ref()));
}

For joining paths, use .join() instead of string concatenation:

let base = PathBuf::from("/var/data");
let full = base.join("2024").join("report.csv");
// /var/data/2024/report.csv
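The same methods combine into helpers that derive an output path from an input path without unwrap. This is a sketch under assumed naming: processed_path, the "processed" directory, and the "out" extension are illustrative choices, not a convention from any library.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: maps `dir/name.ext` to `dir/processed/name.out`.
/// Returns `None` when the input has no file name (for example `/`).
fn processed_path(input: &Path) -> Option<PathBuf> {
    let name = input.file_name()?;
    let dir = input.parent()?;
    // with_extension replaces the existing extension, or adds one if absent
    Some(dir.join("processed").join(name).with_extension("out"))
}
```

Returning Option lets the caller decide how to report an unusable input, where parent().unwrap() would panic.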

Error Context for File Operations

Raw io::Error says "No such file or directory" without telling you which file. Add context:

use std::fs;
use std::path::Path;

// Without context — unhelpful error message
fn bad_read(path: &Path) -> Result<String, Box<dyn std::error::Error>> {
    Ok(fs::read_to_string(path)?)
    // Error: No such file or directory (os error 2)
}

// With context using anyhow
use anyhow::{Context, Result};

fn good_read(path: &Path) -> Result<String> {
    fs::read_to_string(path)
        .with_context(|| format!("failed to read {}", path.display()))
    // Error: failed to read config/settings.toml
    //   Caused by: No such file or directory (os error 2)
}

// With context using manual wrapping (no extra dependency)
fn manual_context(path: &Path) -> Result<String, String> {
    fs::read_to_string(path)
        .map_err(|e| format!("failed to read {}: {}", path.display(), e))
}

Always include the file path in error messages. Users need to know which file failed, not just that some file operation failed.
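When callers need to inspect the underlying error and adding a dependency is not an option, a small error type can carry both the path and the original io::Error. The following is a sketch using only the standard library; ReadError and read_with_path are hypothetical names.

```rust
use std::error::Error;
use std::fmt;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical error type: an I/O failure plus the path it happened on.
#[derive(Debug)]
struct ReadError {
    path: PathBuf,
    source: io::Error,
}

impl fmt::Display for ReadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to read {}", self.path.display())
    }
}

impl Error for ReadError {
    // Expose the original io::Error so callers can walk the error chain
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

fn read_with_path(path: &Path) -> Result<String, ReadError> {
    fs::read_to_string(path).map_err(|source| ReadError {
        path: path.to_path_buf(),
        source,
    })
}
```

Unlike the String-based version, this keeps the io::ErrorKind available, so a caller can still tell a missing file from a permission problem.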

Processing Large Files Line by Line

For files that do not fit in memory, process them as streams:

use std::fs::File;
use std::io::{BufRead, BufReader};
use std::collections::HashMap;

fn word_frequency(path: &str) -> Result<HashMap<String, usize>, Box<dyn std::error::Error>> {
    let file = File::open(path)?;
    let reader = BufReader::new(file);
    let mut counts: HashMap<String, usize> = HashMap::new();

    for line in reader.lines() {
        let line = line?;
        for word in line.split_whitespace() {
            let word = word.to_lowercase();
            *counts.entry(word).or_insert(0) += 1;
        }
    }

    Ok(counts)
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let counts = word_frequency("large_file.txt")?;

    // Print top 10
    let mut top: Vec<_> = counts.into_iter().collect();
    top.sort_by(|a, b| b.1.cmp(&a.1));

    for (word, count) in top.iter().take(10) {
        println!("{:>8} {}", count, word);
    }

    Ok(())
}

This processes a multi-gigabyte file with a fixed-size read buffer (BufReader currently defaults to 8 KiB), plus the current line and the HashMap, which grows with the number of distinct words. The BufReader handles the buffering; lines() yields one line at a time.

For binary data, use read with a fixed buffer:

use std::fs::File;
use std::io::Read;

fn count_bytes(path: &str) -> Result<u64, Box<dyn std::error::Error>> {
    let mut file = File::open(path)?;
    let mut buf = [0u8; 8192];
    let mut total = 0u64;

    loop {
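        let n = match file.read(&mut buf) {
            Ok(n) => n,
            // A signal interrupted the call; the Read docs say to retry
            Err(e) if e.kind() == std::io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e.into()),
        };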
        if n == 0 {
            break;
        }
        total += n as u64;
    }

    Ok(total)
}
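When the bytes only need to move from one place to another, std::io::copy runs the read and write loop itself, without holding the whole input in memory. The sketch below assumes a hypothetical helper, save_stream, that writes any reader (stdin, for example) to a file.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Read, Write};
use std::path::Path;

/// Hypothetical helper: streams `input` into the file at `dst` and
/// returns the number of bytes written.
fn save_stream<R: Read>(mut input: R, dst: &Path) -> io::Result<u64> {
    let mut output = BufWriter::new(File::create(dst)?);
    // io::copy reads and writes in chunks until the reader reaches EOF
    let copied = io::copy(&mut input, &mut output)?;
    // Flush explicitly so a late write error is not lost on drop
    output.flush()?;
    Ok(copied)
}
```

For a plain file-to-file copy, fs::copy is simpler. A helper like this is mainly useful when the source is a pipe or another stream.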

Common Pitfalls

Reading entire files into memory: fs::read_to_string loads everything at once. For files larger than available RAM, the read can fail, the process can be killed by the OS, or the system can thrash swap. Use BufReader for large files.
Forgetting to flush BufWriter: BufWriter flushes on drop, but if an error occurs during that flush, it is silently ignored. Call .flush() explicitly before the writer goes out of scope.
String paths instead of PathBuf: String cannot represent non-UTF-8 paths (which exist on Linux). Use PathBuf and Path for correctness.
No error context: "Permission denied" is useless without the file path. Always include the path in error messages.
Unbuffered stdout in loops: each println! locks and unlocks stdout. For tight loops, lock once and use BufWriter.
Not handling stdin: a CLI tool that only accepts file arguments cannot read piped input. Always support stdin as a fallback.
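The flush pitfall is easy to reproduce with a writer that always fails. In this sketch, FullDisk is a made-up stand-in for a full disk or a closed pipe, and write_report is a hypothetical function. The buffered write succeeds, and the error only appears at the explicit flush.

```rust
use std::io::{self, BufWriter, Write};

/// Made-up sink that rejects every write, standing in for a full disk.
struct FullDisk;

impl Write for FullDisk {
    fn write(&mut self, _buf: &[u8]) -> io::Result<usize> {
        Err(io::Error::new(io::ErrorKind::Other, "disk full"))
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

fn write_report<W: Write>(sink: W) -> io::Result<()> {
    let mut writer = BufWriter::new(sink);
    // This short line fits in BufWriter's buffer, so the sink is not touched yet
    writeln!(writer, "total: 42")?;
    // The sink is written to here, so this is where a failure is reported.
    // Without this call, the flush on drop would discard the error.
    writer.flush()
}
```

Calling write_report(FullDisk) returns the "disk full" error. Without the final flush, the function would have to end in Ok(()), which reports success even though nothing was written.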

Key Takeaways

Use BufReader and BufWriter for all non-trivial file I/O. Batching many small reads and writes into fewer system calls is usually much faster.
Support both file arguments and stdin. Use Box<dyn BufRead> to abstract over the source.
Always include the file path in error messages. Use anyhow::Context or manual formatting.
Process large files line by line with reader.lines(). Never load a file of unknown size into memory.
Lock stdout once and use BufWriter when writing many lines in a loop.
Use PathBuf and Path, not String, for file paths.