Queueing Theory

Queueing theory mathematically models waiting lines (queues). It is fundamental to capacity planning, performance engineering, and system design in computer science.

Basic Concepts

A queueing system consists of:

  • Arrivals: Customers/requests arriving over time
  • Service: Processing of customers by servers
  • Queue: Buffer where customers wait
  • Discipline: Rule for selecting the next customer (FIFO, LIFO, priority, etc.)

Performance Metrics

  • λ: Arrival rate (customers per unit time)
  • μ: Service rate (customers per unit time per server)
  • ρ = λ/(sμ): Utilization (fraction of time servers are busy), s = number of servers
  • L: Average number of customers in the system (queue + service)
  • Lq: Average number of customers in the queue (waiting)
  • W: Average time in the system (waiting + service)
  • Wq: Average time in the queue (waiting only)
  • P₀: Probability system is empty

Stability condition: ρ < 1. If ρ ≥ 1, the queue grows without bound.

Kendall Notation

Queues are described by A/B/s/K/N/D:

| Position | Meaning | Common Values |
|---|---|---|
| A | Arrival process | M (Markovian/Poisson), D (deterministic), G (general) |
| B | Service distribution | M, D, G |
| s | Number of servers | 1, 2, ..., ∞ |
| K | System capacity | ∞ (default, omitted) |
| N | Population size | ∞ (default, omitted) |
| D | Queue discipline | FIFO (default, omitted) |

M = Memoryless = exponential inter-arrival/service times.

Common models: M/M/1, M/M/c, M/G/1, M/D/1, G/G/1.

Little's Law

The most important result in queueing theory:

L = λW
  • L = average number in system
  • λ = average arrival rate
  • W = average time in system

Similarly: Lq = λWq.

And: W = Wq + 1/μ (total time = waiting time + service time).

Remarkably general: Holds for any queueing system in steady state, regardless of arrival distribution, service distribution, number of servers, or queue discipline.

Example: A web server handles λ = 100 requests/sec with average response time W = 50ms. Average requests in the system: L = 100 × 0.05 = 5.
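The web-server example above is a one-line calculation; a minimal sketch (the variable names are illustrative):

```python
# Little's Law: L = lambda * W, applied to the web-server example.
arrival_rate = 100.0    # lambda: requests per second
response_time = 0.050   # W: average time in system, in seconds (50 ms)

L = arrival_rate * response_time
print(L)  # 5.0 requests in the system on average
```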

M/M/1 Queue

The simplest non-trivial queue. Poisson arrivals (rate λ), exponential service (rate μ), single server.

Steady-State Results

ρ = λ/μ                    (utilization)
P₀ = 1 - ρ                 (probability system is empty)
Pₙ = (1-ρ)ρⁿ              (probability of n customers — geometric!)
L = ρ/(1-ρ) = λ/(μ-λ)     (average number in system)
Lq = ρ²/(1-ρ) = λ²/(μ(μ-λ))  (average number in queue)
W = 1/(μ-λ)                (average time in system)
Wq = ρ/(μ-λ) = λ/(μ(μ-λ))   (average waiting time)

Key Observations

  • As ρ → 1, L → ∞ and W → ∞ (system becomes overwhelmed)
  • At ρ = 0.5: L = 1, W = 2/μ
  • At ρ = 0.9: L = 9, W = 10/μ
  • At ρ = 0.99: L = 99, W = 100/μ

The nonlinearity near ρ = 1 is critical: Going from 90% to 99% utilization increases latency by 10×. This is why systems should not be run near full capacity.
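The M/M/1 formulas and the latency blow-up near ρ = 1 can be checked directly with a small helper (a sketch; the function name is mine):

```python
def mm1_metrics(lam: float, mu: float) -> dict:
    """Steady-state M/M/1 metrics; requires lam < mu for stability."""
    assert lam < mu, "unstable: rho >= 1"
    rho = lam / mu
    L = rho / (1 - rho)        # average number in system
    W = 1 / (mu - lam)         # average time in system
    return {"rho": rho, "L": L, "Lq": rho * L, "W": W, "Wq": rho * W}

# Latency blow-up near rho = 1 (with mu = 1, W is in units of mean service time):
for lam in (0.5, 0.9, 0.99):
    m = mm1_metrics(lam, 1.0)
    print(f"rho={m['rho']:.2f}  L={m['L']:.0f}  W={m['W']:.0f}")
```

With μ = 1 this reproduces the table above: ρ = 0.9 gives L = 9 and W = 10, ρ = 0.99 gives L = 99 and W = 100.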

Response Time Distribution

The time in system T follows an exponential distribution:

P(T > t) = e^(-(μ-λ)t)

Percentiles: p99 = -ln(0.01)/(μ-λ) ≈ 4.6 × mean.

M/M/c Queue

Poisson arrivals (rate λ), exponential service (rate μ), c servers.

ρ = λ/(cμ)     (per-server utilization, must be < 1)

Erlang C Formula

Probability that an arriving customer must wait:

C(c, A) = P(wait) = [A^c/(c!(1-ρ))] / [Σₖ₌₀^(c-1) A^k/k! + A^c/(c!(1-ρ))],   where A = λ/μ = cρ

Performance Metrics

Wq = C(c, λ/μ) / (cμ - λ)
W = Wq + 1/μ
Lq = λWq
L = λW

Pooling effect: One queue feeding c servers (M/M/c) outperforms c separate M/M/1 queues with the same total capacity. Pooling reduces average waiting time because no server sits idle while customers wait in another queue.
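The Erlang C formula and the pooling effect can be computed numerically (a sketch; function names are mine):

```python
import math

def erlang_c(c: int, a: float) -> float:
    """Erlang C: probability an arrival waits. Offered load a = lam/mu, needs a < c."""
    assert a < c, "unstable"
    rho = a / c
    top = a**c / math.factorial(c) / (1 - rho)
    bottom = sum(a**k / math.factorial(k) for k in range(c)) + top
    return top / bottom

def mmc_wq(lam: float, mu: float, c: int) -> float:
    """Average waiting time in an M/M/c queue."""
    return erlang_c(c, lam / mu) / (c * mu - lam)

# Pooling: one M/M/2 vs. two separate M/M/1 queues at the same per-server load.
lam, mu = 1.8, 1.0
pooled = mmc_wq(lam, mu, 2)                    # one shared queue, 2 servers
separate = (lam / 2) / (mu * (mu - lam / 2))   # each M/M/1 sees lam/2; Wq = lam/(mu(mu-lam))
print(pooled, separate)  # the pooled wait is lower
```

At ρ = 0.9 per server, the pooled system waits about 4.3 time units on average versus 9 for the separate queues.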

M/G/1 Queue

Poisson arrivals, general service distribution (only mean 1/μ and variance σ² needed), single server.

Pollaczek-Khinchine Formula

Lq = ρ²(1 + C²_s) / (2(1-ρ))

where C²_s = σ²μ² is the squared coefficient of variation of service time.

Wq = Lq/λ
W = Wq + 1/μ
L = λW

Insight: Higher service time variability (larger C²_s) → longer queues. This is why consistent performance (low variance) matters more than just fast average performance.

| Service Distribution | C²_s | Lq |
|---|---|---|
| Deterministic (M/D/1) | 0 | ρ²/(2(1-ρ)) |
| Exponential (M/M/1) | 1 | ρ²/(1-ρ) |
| Hyperexponential | > 1 | Even worse |

M/D/1 has half the queue length of M/M/1 at the same utilization!
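The Pollaczek-Khinchine formula makes the comparison a one-liner (a sketch; the function name is mine):

```python
def mg1_lq(rho: float, cs2: float) -> float:
    """Pollaczek-Khinchine: mean queue length for M/G/1.

    cs2 is the squared coefficient of variation of service time."""
    return rho**2 * (1 + cs2) / (2 * (1 - rho))

rho = 0.9
print(mg1_lq(rho, 0.0))  # M/D/1 (deterministic): 4.05
print(mg1_lq(rho, 1.0))  # M/M/1 (exponential):   8.1, exactly double
```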

Birth-Death Processes

A continuous-time Markov chain where transitions only occur between adjacent states (n → n+1 or n → n-1).

  • Birth rates: λₙ (arrival rate when n customers are present)
  • Death rates: μₙ (departure rate when n customers are present)

Balance equations: λₙPₙ = μₙ₊₁Pₙ₊₁.

Steady-state: Pₙ = P₀ · Πₖ₌₀^(n-1) λₖ/μₖ₊₁, with P₀ = 1 / Σₙ₌₀^∞ Πₖ₌₀^(n-1) λₖ/μₖ₊₁ (the n = 0 term is the empty product, 1).

M/M/1, M/M/c, M/M/1/K are all birth-death processes with specific λₙ, μₙ.
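For a finite-state chain such as M/M/1/K, the steady-state distribution follows directly from the product formula (a sketch; the function name is mine):

```python
def birth_death_dist(lams, mus, n_max):
    """Steady-state P_n for a finite birth-death chain.

    lams[n] is the birth rate in state n; mus[n] is the death rate
    in state n (mus[0] is unused)."""
    products = [1.0]  # the n = 0 term is the empty product
    for n in range(1, n_max + 1):
        products.append(products[-1] * lams[n - 1] / mus[n])
    total = sum(products)
    return [p / total for p in products]

# M/M/1/K with K = 3, lam = 1, mu = 2: constant rates in every state.
p = birth_death_dist([1, 1, 1], [None, 2, 2, 2], 3)
print(p)  # proportional to (1/2)^n, normalized: [8/15, 4/15, 2/15, 1/15]
```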

Erlang Formulas

Erlang B (M/M/c/c — Loss System)

No queue — customers arriving when all c servers are busy are lost (blocked).

Blocking probability (Erlang B formula):

B(c, A) = (A^c/c!) / Σₖ₌₀^c (A^k/k!)

where A = λ/μ is the offered load (in Erlangs).

Application: Telephone networks (blocked calls are dropped).
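Computing A^c/c! directly overflows for large c; the standard numerically stable recursion B(k) = A·B(k-1)/(k + A·B(k-1)) avoids this (a sketch; the function name is mine):

```python
def erlang_b(c: int, a: float) -> float:
    """Erlang B blocking probability via the stable recursion:
    B(0) = 1;  B(k) = a*B(k-1) / (k + a*B(k-1))."""
    b = 1.0
    for k in range(1, c + 1):
        b = a * b / (k + a * b)
    return b

print(erlang_b(5, 3.0))  # ~0.11: 5 lines, 3 Erlangs offered -> ~11% of calls blocked
```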

Erlang C (M/M/c — Delay System)

Customers arriving when all servers are busy wait in queue.

The probability of waiting is the Erlang C formula (given above in M/M/c).

Application: Call centers (callers wait on hold).

Jackson Networks

A network of M/M/cᵢ queues where:

  • External Poisson arrivals at each node
  • Markovian routing (probabilistic routing between nodes)
  • Each node behaves as an independent M/M/c queue (with appropriate arrival rate)

Jackson's theorem: In steady state, each queue behaves as an independent M/M/c queue. The joint distribution is the product of individual queue distributions.

Burke's theorem: The departure process from an M/M/c queue in steady state is Poisson.

Open Jackson Networks

External arrivals and departures. Solve for arrival rates using traffic equations:

λᵢ = γᵢ + Σⱼ λⱼ pⱼᵢ

where γᵢ is external arrival rate and pⱼᵢ is routing probability from j to i.
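The traffic equations are linear and can be solved by fixed-point iteration, which converges for open networks (a sketch with a hypothetical two-node network; the function name and rates are mine):

```python
def traffic_rates(gamma, P, iters=1000):
    """Solve lam_i = gamma_i + sum_j lam_j * P[j][i] by fixed-point iteration."""
    n = len(gamma)
    lam = list(gamma)
    for _ in range(iters):
        lam = [gamma[i] + sum(lam[j] * P[j][i] for j in range(n))
               for i in range(n)]
    return lam

# Hypothetical network: external arrivals only at node 0; node 0 routes
# half its output to node 1; node 1 feeds 20% back to node 0.
gamma = [2.0, 0.0]
P = [[0.0, 0.5],
     [0.2, 0.0]]
lam = traffic_rates(gamma, P)
print(lam)  # lam0 = 2/0.9 ~ 2.222, lam1 = 0.5*lam0 ~ 1.111
```

With these λᵢ in hand, each node is then analyzed as an independent M/M/cᵢ queue per Jackson's theorem.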

Closed Jackson Networks

Fixed number of customers circulating. No external arrivals or departures.

Analyzed using Mean Value Analysis (MVA) or convolution algorithms.

Applications in CS

Capacity Planning

  • Web servers: How many servers needed to keep p99 latency under 200ms at peak load?
  • Database connection pools: Pool size vs. latency tradeoff.
  • Load balancers: M/M/c model with pooling shows benefits of shared queues.

System Design Guidelines

From queueing theory:

  1. Keep utilization below 70-80%: Latency increases superlinearly with utilization.
  2. Reduce variance: Consistent service times (M/D/1) are much better than variable (M/M/1).
  3. Pool resources: One large queue is better than many small ones.
  4. Add slack: Small utilization headroom prevents catastrophic latency spikes.
  5. Bound queue size: Finite queues (M/M/1/K) prevent memory exhaustion (at the cost of dropped requests).

Specific Applications

  • Thread pools: Fixed pool of c threads serving incoming requests → M/M/c.
  • Network routers: Packet buffers as queues. Buffer sizing uses queueing theory.
  • Disk I/O scheduling: Request queue for disk access. M/G/1 with various service distributions.
  • CPU scheduling: Ready queue modeled as M/G/1 or G/G/1.
  • Cloud auto-scaling: Scale servers to maintain target utilization/latency using queueing models.
  • Microservices: Each service is a queue. Jackson networks model service chains.
  • Message queues: Kafka, RabbitMQ — producer-consumer with buffering.
  • SLA/SLO setting: Use queueing theory to set achievable latency targets: "p99 < X ms at Y requests/sec."