5 min read
On this page

Threads

A thread is a lightweight unit of execution within a process. Threads within the same process share the same address space but have independent execution stacks and register state.

Thread vs Process

Aspect Process Thread
Address space Own (isolated) Shared with other threads
Creation cost Heavy (fork, COW pages) Light (just stack + registers)
Context switch Expensive (TLB flush) Cheap (no address space switch)
Communication IPC (pipes, sockets, shm) Shared memory (direct)
Failure isolation Crash doesn't affect others Crash kills entire process
Memory overhead Full address space Just stack (~2-8 MB default)

What Threads Share

  • Code (text segment)
  • Global data, heap
  • Open file descriptors
  • Signal handlers
  • Current working directory
  • User/group ID

What Threads Have Privately

  • Thread ID
  • Program counter and registers
  • Stack (each thread has its own)
  • Signal mask
  • errno (on many systems)
  • Thread-local storage (TLS)

Threading Models

User-Level Threads (M:1)

Many user threads mapped to one kernel thread. The thread library manages threads entirely in user space.

User space:  [T1] [T2] [T3] [T4]
                    |
Kernel:          [KT1]

Advantages: Fast creation and switching (no syscalls). Customizable scheduling. Disadvantages: One blocking syscall blocks ALL threads. Can't use multiple cores (kernel sees one thread).

Examples: Early green threads, cooperative multitasking libraries.

Kernel-Level Threads (1:1)

Each user thread maps to one kernel thread. The kernel manages all threads.

User space:  [T1] [T2] [T3] [T4]
              |    |    |    |
Kernel:     [KT1][KT2][KT3][KT4]

Advantages: True parallelism on multi-core. Blocking one thread doesn't block others. Disadvantages: Thread creation requires syscall. Context switch goes through kernel.

Examples: Linux (NPTL), Windows, macOS. This is the standard model today.

Linux implements threads as lightweight processes using clone() with shared address space.

Hybrid (M:N)

M user threads mapped to N kernel threads (M > N typically).

User space:  [T1] [T2] [T3] [T4] [T5] [T6]
              |    |    |    |    |    |
Kernel:     [KT1]   [KT2]   [KT3]

Advantages: Best of both worlds. More user threads than kernel threads. Flexible scheduling. Disadvantages: Complex implementation. Tricky synchronization between user and kernel schedulers.

Examples: Go goroutines (M goroutines on N OS threads), Erlang (M processes on N schedulers), early Solaris, Windows UMS.

POSIX Threads (pthreads)

The standard threading API on UNIX/Linux.

Thread Lifecycle

#include <pthread.h>

void *worker(void *arg) {
    int id = *(int *)arg;
    printf("Thread %d running\n", id);
    return NULL;
}

int main() {
    pthread_t threads[4];
    int ids[4] = {0, 1, 2, 3};

    for (int i = 0; i < 4; i++) {
        pthread_create(&threads[i], NULL, worker, &ids[i]);
    }
    for (int i = 0; i < 4; i++) {
        pthread_join(threads[i], NULL);  // wait for completion
    }
}

Rust Threads

PROCEDURE MAIN()
    handles ← empty list
    FOR id ← 0 TO 3
        h ← SPAWN_THREAD(PROCEDURE()
            PRINT "Thread ", id, " running"
        )
        APPEND h TO handles

    FOR EACH handle IN handles
        JOIN(handle)

Rust's ownership system prevents data races at compile time.

Thread Pools

Pre-create a fixed number of threads. Queue tasks for execution by pool threads.

// Using a thread pool for parallel iteration
results ← PARALLEL_MAP(0..1000, x -> EXPENSIVE_COMPUTATION(x))

Advantages: Avoid thread creation overhead. Limit resource usage. Automatic load balancing.

Sizing: Typically:

  • CPU-bound tasks: num_threads ≈ num_cores
  • I/O-bound tasks: num_threads ≈ num_cores × (1 + wait_time / compute_time)

Thread-Local Storage (TLS)

Variables that are private to each thread despite being "global" in code.

// Thread-local variable: each thread has its own copy
THREAD_LOCAL CACHE ← empty list

// Access the thread-local cache
APPEND "item" TO CACHE

Use cases: Per-thread caches, errno (on some systems), random number generators, memory allocators (per-thread arenas).

Green Threads

User-space threads managed by a runtime, not the OS kernel.

Go Goroutines

go func() {
    fmt.Println("Hello from goroutine")
}()
  • Initial stack: ~2 KB (grows dynamically, up to 1 GB)
  • M:N scheduling: goroutines multiplexed onto OS threads
  • Millions of goroutines on a few OS threads
  • Cooperative scheduling (yield at function calls, channel operations, syscalls)
  • Work-stealing scheduler: Idle threads steal goroutines from busy threads' run queues

Fibers

Cooperative user-space threads. Must explicitly yield. Used in game engines, coroutine libraries.

Fiber A: compute → yield → compute → yield
Fiber B:           compute → yield → compute

Coroutines

Generalizations of functions that can suspend and resume. Stackful (like fibers/green threads) or stackless (like Rust async).

Rust async/await: Stackless coroutines compiled to state machines. Very lightweight — each future is typically 100-200 bytes.

Threading Challenges

Data Races

Two threads access the same memory, at least one writes, without synchronization.

// Data race example — prevented at compile time by ownership rules
data ← [1, 2, 3]
SPAWN_THREAD(PROCEDURE()
    APPEND 4 TO data      // ERROR: can't modify data from another thread
)
PRINT data                 // while it's accessed here

Rust's type system (Send + Sync traits) prevents data races at compile time. Other languages rely on programmer discipline + runtime tools (ThreadSanitizer).

Deadlock

Thread A holds lock X, waits for lock Y. Thread B holds lock Y, waits for lock X. Both wait forever.

Priority Inversion

A low-priority thread holds a lock needed by a high-priority thread. The high-priority thread is blocked by the low-priority thread.

Solutions: Priority inheritance (boost the lock-holding thread's priority), priority ceiling.

Thread Safety

A function/data structure is thread-safe if it can be safely called from multiple threads simultaneously.

Strategies: Locks, lock-free algorithms, immutability, thread-local storage, message passing.

Comparison

Technology Creation Cost Memory Parallelism Scheduling
OS process ~1 ms ~MB+ Yes (multi-core) Preemptive (kernel)
OS thread ~10-100 μs ~2-8 MB stack Yes (multi-core) Preemptive (kernel)
Green thread (Go) ~1 μs ~2-8 KB stack Yes (M:N) Cooperative + preemptive
Async task (Rust) ~100 ns ~100-200 B Yes (on thread pool) Cooperative
Fiber ~1 μs ~KB No (single thread) Cooperative

Applications in CS

  • Web servers: Thread per connection (Apache), thread pool (Tomcat), async (nginx, Tokio).
  • Databases: Thread per connection (PostgreSQL backend processes), thread pool (MySQL).
  • GUI applications: Main thread for UI, worker threads for computation. UI must be updated from the main thread.
  • Game engines: Main thread (game logic), render thread, audio thread, worker pool (tasks).
  • Parallel algorithms: Parallel sort, parallel search, MapReduce workers.
  • Mobile apps: Main thread (UI) + background threads (network, computation). Android: strict main-thread rules.