Threads

A thread is a lightweight unit of execution within a process. Threads within the same process share the same address space but have independent execution stacks and register state.

Thread vs Process

| Aspect | Process | Thread |
|---|---|---|
| Address space | Own (isolated) | Shared with other threads |
| Creation cost | Heavy (fork, COW pages) | Light (just stack + registers) |
| Context switch | Expensive (TLB flush) | Cheap (no address-space switch) |
| Communication | IPC (pipes, sockets, shm) | Shared memory (direct) |
| Failure isolation | Crash doesn't affect others | Crash kills entire process |
| Memory overhead | Full address space | Just stack (~2-8 MB default) |

What Threads Share

  • Code (text segment)
  • Global data, heap
  • Open file descriptors
  • Signal handlers
  • Current working directory
  • User/group ID

What Threads Have Privately

  • Thread ID
  • Program counter and registers
  • Stack (each thread has its own)
  • Signal mask
  • errno (on many systems)
  • Thread-local storage (TLS)

Threading Models

User-Level Threads (M:1)

Many user threads mapped to one kernel thread. The thread library manages threads entirely in user space.

User space:  [T1] [T2] [T3] [T4]
                    |
Kernel:          [KT1]

Advantages: fast creation and switching (no syscalls); customizable scheduling.
Disadvantages: one blocking syscall blocks ALL threads; can't use multiple cores (the kernel sees only one thread).

Examples: Early green threads, cooperative multitasking libraries.

Kernel-Level Threads (1:1)

Each user thread maps to one kernel thread. The kernel manages all threads.

User space:  [T1] [T2] [T3] [T4]
              |    |    |    |
Kernel:     [KT1][KT2][KT3][KT4]

Advantages: true parallelism on multi-core; blocking one thread doesn't block the others.
Disadvantages: thread creation requires a syscall; every context switch goes through the kernel.

Examples: Linux (NPTL), Windows, macOS. This is the standard model today.

Linux implements threads as lightweight processes using clone() with shared address space.

Hybrid (M:N)

M user threads mapped to N kernel threads (M > N typically).

User space:  [T1] [T2] [T3] [T4] [T5] [T6]
              |    |    |    |    |    |
Kernel:     [KT1]   [KT2]   [KT3]

Advantages: combines fast user-level switching with kernel-level parallelism; supports far more user threads than kernel threads; flexible scheduling.
Disadvantages: complex implementation; tricky coordination between the user and kernel schedulers.

Examples: Go goroutines (M goroutines on N OS threads), Erlang (M processes on N schedulers), early Solaris, Windows UMS.

POSIX Threads (pthreads)

The standard threading API on UNIX/Linux.

Thread Lifecycle

#include <pthread.h>
#include <stdio.h>

void *worker(void *arg) {
    int id = *(int *)arg;
    printf("Thread %d running\n", id);
    return NULL;
}

int main(void) {
    pthread_t threads[4];
    int ids[4] = {0, 1, 2, 3};

    for (int i = 0; i < 4; i++) {
        pthread_create(&threads[i], NULL, worker, &ids[i]);
    }
    for (int i = 0; i < 4; i++) {
        pthread_join(threads[i], NULL);  // wait for completion
    }
    return 0;
}

Rust Threads

use std::thread;

fn main() {
    let mut handles = Vec::new();
    for id in 0..4 {
        // `move` transfers ownership of `id` into the thread's closure
        handles.push(thread::spawn(move || {
            println!("Thread {id} running");
        }));
    }
    for handle in handles {
        handle.join().unwrap();  // wait for completion
    }
}

Rust's ownership system prevents data races at compile time.

Thread Pools

Pre-create a fixed number of threads. Queue tasks for execution by pool threads.

// Parallel iteration over a thread pool (e.g. with the rayon crate)
let results: Vec<_> = (0..1000).into_par_iter()
    .map(|x| expensive_computation(x))
    .collect();

Advantages: Avoid thread creation overhead. Limit resource usage. Automatic load balancing.

Sizing guidelines:

  • CPU-bound tasks: num_threads ≈ num_cores
  • I/O-bound tasks: num_threads ≈ num_cores × (1 + wait_time / compute_time)

Thread-Local Storage (TLS)

Variables that are private to each thread despite being "global" in code.

use std::cell::RefCell;

// Thread-local variable: each thread has its own copy
thread_local! {
    static CACHE: RefCell<Vec<String>> = RefCell::new(Vec::new());
}

// Access this thread's copy of the cache
CACHE.with(|c| c.borrow_mut().push("item".to_string()));

Use cases: Per-thread caches, errno (on some systems), random number generators, memory allocators (per-thread arenas).

Green Threads

User-space threads managed by a runtime, not the OS kernel.

Go Goroutines

go func() {
    fmt.Println("Hello from goroutine")
}()

  • Initial stack: ~2 KB (grows dynamically, up to 1 GB)
  • M:N scheduling: goroutines multiplexed onto OS threads
  • Millions of goroutines on a few OS threads
  • Mostly cooperative scheduling (yields at function calls, channel operations, syscalls); since Go 1.14 the runtime can also preempt long-running goroutines
  • Work-stealing scheduler: Idle threads steal goroutines from busy threads' run queues

Fibers

Cooperative user-space threads. Must explicitly yield. Used in game engines, coroutine libraries.

Fiber A: compute → yield → compute → yield
Fiber B:           compute → yield → compute

Coroutines

Generalizations of functions that can suspend and resume. Stackful (like fibers/green threads) or stackless (like Rust async).

Rust async/await: Stackless coroutines compiled to state machines. Very lightweight — each future is typically 100-200 bytes.

Threading Challenges

Data Races

Two threads access the same memory, at least one writes, without synchronization.

// Data race example — rejected at compile time by the borrow checker
let mut data = vec![1, 2, 3];
thread::spawn(|| {
    data.push(4);       // ERROR: closure may outlive the current function
});                     // and borrows `data` mutably from another thread
println!("{:?}", data);  // while it's still accessed here

Rust's type system (Send + Sync traits) prevents data races at compile time. Other languages rely on programmer discipline + runtime tools (ThreadSanitizer).

Deadlock

Thread A holds lock X, waits for lock Y. Thread B holds lock Y, waits for lock X. Both wait forever.

Priority Inversion

A low-priority thread holds a lock needed by a high-priority thread. The high-priority thread is blocked by the low-priority thread.

Solutions: Priority inheritance (boost the lock-holding thread's priority), priority ceiling.

Thread Safety

A function/data structure is thread-safe if it can be safely called from multiple threads simultaneously.

Strategies: Locks, lock-free algorithms, immutability, thread-local storage, message passing.

Comparison

| Technology | Creation Cost | Memory | Parallelism | Scheduling |
|---|---|---|---|---|
| OS process | ~1 ms | ~MB+ | Yes (multi-core) | Preemptive (kernel) |
| OS thread | ~10-100 μs | ~2-8 MB stack | Yes (multi-core) | Preemptive (kernel) |
| Green thread (Go) | ~1 μs | ~2-8 KB stack | Yes (M:N) | Cooperative + preemptive |
| Async task (Rust) | ~100 ns | ~100-200 B | Yes (on thread pool) | Cooperative |
| Fiber | ~1 μs | ~KB | No (single thread) | Cooperative |

Applications in CS

  • Web servers: Thread per connection (Apache), thread pool (Tomcat), async (nginx, Tokio).
  • Databases: Thread per connection (PostgreSQL backend processes), thread pool (MySQL).
  • GUI applications: Main thread for UI, worker threads for computation. UI must be updated from the main thread.
  • Game engines: Main thread (game logic), render thread, audio thread, worker pool (tasks).
  • Parallel algorithms: Parallel sort, parallel search, MapReduce workers.
  • Mobile apps: Main thread (UI) + background threads (network, computation). Android: strict main-thread rules.