Queueing Theory

Queueing theory mathematically models waiting lines (queues). It is fundamental to capacity planning, performance engineering, and system design in computer science.

Basic Concepts

A queueing system consists of:

  • Arrivals: Customers/requests arriving over time
  • Service: Processing of customers by servers
  • Queue: Buffer where customers wait
  • Discipline: Rule for selecting the next customer (FIFO, LIFO, priority, etc.)

Performance Metrics

  • λ: Arrival rate (customers per unit time)
  • μ: Service rate (customers per unit time per server)
  • ρ = λ/(sμ): Utilization (fraction of time servers are busy), s = number of servers
  • L: Average number of customers in the system (queue + service)
  • Lq: Average number of customers in the queue (waiting)
  • W: Average time in the system (waiting + service)
  • Wq: Average time in the queue (waiting only)
  • P₀: Probability system is empty

Stability condition: ρ < 1. If ρ ≥ 1, the queue grows without bound.

Kendall Notation

Queues are described by A/B/s/K/N/D:

| Position | Meaning | Common Values |
|---|---|---|
| A | Arrival process | M (Markovian/Poisson), D (deterministic), G (general) |
| B | Service distribution | M, D, G |
| s | Number of servers | 1, 2, ..., ∞ |
| K | System capacity | ∞ (default, omitted) |
| N | Population size | ∞ (default, omitted) |
| D | Queue discipline | FIFO (default, omitted) |

M = Memoryless = exponential inter-arrival/service times.

Common models: M/M/1, M/M/c, M/G/1, M/D/1, G/G/1.

Little's Law

The most important result in queueing theory:

L = λW
  • L = average number in system
  • λ = average arrival rate
  • W = average time in system

Similarly: Lq = λWq.

And: W = Wq + 1/μ (total time = waiting time + service time).

Remarkably general: Holds for any queueing system in steady state, regardless of arrival distribution, service distribution, number of servers, or queue discipline.

Example: A web server handles λ = 100 requests/sec with average response time W = 50ms. Average requests in the system: L = 100 × 0.05 = 5.
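The web-server example above is a one-line calculation; a minimal sketch (the variable names are illustrative):

```python
# Little's Law: L = lambda * W, applied to the web-server example.
arrival_rate = 100.0    # lambda: requests per second
response_time = 0.050   # W: average time in system, in seconds (50 ms)

L = arrival_rate * response_time
print(L)  # 5.0 requests in the system on average
```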

M/M/1 Queue

The simplest non-trivial queue. Poisson arrivals (rate λ), exponential service (rate μ), single server.

Steady-State Results

ρ = λ/μ                    (utilization)
P₀ = 1 - ρ                 (probability system is empty)
Pₙ = (1-ρ)ρⁿ              (probability of n customers — geometric!)
L = ρ/(1-ρ) = λ/(μ-λ)     (average number in system)
Lq = ρ²/(1-ρ) = λ²/(μ(μ-λ))  (average number in queue)
W = 1/(μ-λ)                (average time in system)
Wq = ρ/(μ-λ) = λ/(μ(μ-λ))   (average waiting time)

Key Observations

  • As ρ → 1, L → ∞ and W → ∞ (system becomes overwhelmed)
  • At ρ = 0.5: L = 1, W = 2/μ
  • At ρ = 0.9: L = 9, W = 10/μ
  • At ρ = 0.99: L = 99, W = 100/μ

The nonlinearity near ρ = 1 is critical: Going from 90% to 99% utilization increases latency by 10×. This is why systems should not be run near full capacity.
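The M/M/1 formulas and the latency blow-up near ρ = 1 can be checked directly with a small helper (a sketch; the function name is mine):

```python
def mm1_metrics(lam: float, mu: float) -> dict:
    """Steady-state M/M/1 metrics; requires lam < mu for stability."""
    assert lam < mu, "unstable: rho >= 1"
    rho = lam / mu
    L = rho / (1 - rho)        # average number in system
    W = 1 / (mu - lam)         # average time in system
    return {"rho": rho, "L": L, "Lq": rho * L, "W": W, "Wq": rho * W}

# Latency blow-up near rho = 1 (with mu = 1, W is in units of mean service time):
for lam in (0.5, 0.9, 0.99):
    m = mm1_metrics(lam, 1.0)
    print(f"rho={m['rho']:.2f}  L={m['L']:.0f}  W={m['W']:.0f}")
```

With μ = 1 this reproduces the table above: ρ = 0.9 gives L = 9 and W = 10, ρ = 0.99 gives L = 99 and W = 100.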

Response Time Distribution

The time in system T follows an exponential distribution:

P(T > t) = e^(-(μ-λ)t)

Percentiles: p99 = -ln(0.01)/(μ-λ) ≈ 4.6 × mean.

M/M/c Queue

Poisson arrivals (rate λ), exponential service (rate μ), c servers.

ρ = λ/(cμ)     (per-server utilization, must be < 1)

Erlang C Formula

Probability that an arriving customer must wait:

C(c, A) = P(wait) = [A^c/(c!(1-ρ))] / [Σₖ₌₀^(c-1) A^k/k! + A^c/(c!(1-ρ))],   where A = λ/μ = cρ

Performance Metrics

Wq = C(c, λ/μ) / (cμ - λ)
W = Wq + 1/μ
Lq = λWq
L = λW

Pooling effect: One queue feeding c servers (M/M/c) outperforms c separate M/M/1 queues with the same total capacity. Pooling reduces average waiting time because no server sits idle while customers wait in another queue.
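The Erlang C formula and the pooling effect can be computed numerically (a sketch; function names are mine):

```python
import math

def erlang_c(c: int, a: float) -> float:
    """Erlang C: probability an arrival waits. Offered load a = lam/mu, needs a < c."""
    assert a < c, "unstable"
    rho = a / c
    top = a**c / math.factorial(c) / (1 - rho)
    bottom = sum(a**k / math.factorial(k) for k in range(c)) + top
    return top / bottom

def mmc_wq(lam: float, mu: float, c: int) -> float:
    """Average waiting time in an M/M/c queue."""
    return erlang_c(c, lam / mu) / (c * mu - lam)

# Pooling: one M/M/2 vs. two separate M/M/1 queues at the same per-server load.
lam, mu = 1.8, 1.0
pooled = mmc_wq(lam, mu, 2)                    # one shared queue, 2 servers
separate = (lam / 2) / (mu * (mu - lam / 2))   # each M/M/1 sees lam/2; Wq = lam/(mu(mu-lam))
print(pooled, separate)  # the pooled wait is lower
```

At ρ = 0.9 per server, the pooled system waits about 4.3 time units on average versus 9 for the separate queues.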

M/G/1 Queue

Poisson arrivals, general service distribution (only mean 1/μ and variance σ² needed), single server.

Pollaczek-Khinchine Formula

Lq = ρ²(1 + C²_s) / (2(1-ρ))

where C²_s = σ²μ² is the squared coefficient of variation of service time.

Wq = Lq/λ
W = Wq + 1/μ
L = λW

Insight: Higher service time variability (larger C²_s) → longer queues. This is why consistent performance (low variance) matters more than just fast average performance.

| Service Distribution | C²_s | Lq |
|---|---|---|
| Deterministic (M/D/1) | 0 | ρ²/(2(1-ρ)) |
| Exponential (M/M/1) | 1 | ρ²/(1-ρ) |
| Hyperexponential | > 1 | Even worse |

M/D/1 has half the queue length of M/M/1 at the same utilization!
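The Pollaczek-Khinchine formula makes the comparison a one-liner (a sketch; the function name is mine):

```python
def mg1_lq(rho: float, cs2: float) -> float:
    """Pollaczek-Khinchine: mean queue length for M/G/1.

    cs2 is the squared coefficient of variation of service time."""
    return rho**2 * (1 + cs2) / (2 * (1 - rho))

rho = 0.9
print(mg1_lq(rho, 0.0))  # M/D/1 (deterministic): 4.05
print(mg1_lq(rho, 1.0))  # M/M/1 (exponential):   8.1, exactly double
```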

Birth-Death Processes

A continuous-time Markov chain where transitions only occur between adjacent states (n → n+1 or n → n-1).

  • Birth rates: λₙ (arrival rate when n customers are present)
  • Death rates: μₙ (departure rate when n customers are present)

Balance equations: λₙPₙ = μₙ₊₁Pₙ₊₁.

Steady-state: Pₙ = P₀ · Πₖ₌₀^(n-1) λₖ/μₖ₊₁, with P₀ = 1 / Σₙ₌₀^∞ Πₖ₌₀^(n-1) λₖ/μₖ₊₁ (the n = 0 term is the empty product, 1).

M/M/1, M/M/c, M/M/1/K are all birth-death processes with specific λₙ, μₙ.
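For a finite-state chain such as M/M/1/K, the steady-state distribution follows directly from the product formula (a sketch; the function name is mine):

```python
def birth_death_dist(lams, mus, n_max):
    """Steady-state P_n for a finite birth-death chain.

    lams[n] is the birth rate in state n; mus[n] is the death rate
    in state n (mus[0] is unused)."""
    products = [1.0]  # the n = 0 term is the empty product
    for n in range(1, n_max + 1):
        products.append(products[-1] * lams[n - 1] / mus[n])
    total = sum(products)
    return [p / total for p in products]

# M/M/1/K with K = 3, lam = 1, mu = 2: constant rates in every state.
p = birth_death_dist([1, 1, 1], [None, 2, 2, 2], 3)
print(p)  # proportional to (1/2)^n, normalized: [8/15, 4/15, 2/15, 1/15]
```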

Erlang Formulas

Erlang B (M/M/c/c — Loss System)

No queue — customers arriving when all c servers are busy are lost (blocked).

Blocking probability (Erlang B formula):

B(c, A) = (A^c/c!) / Σₖ₌₀^c (A^k/k!)

where A = λ/μ is the offered load (in Erlangs).

Application: Telephone networks (blocked calls are dropped).
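Computing A^c/c! directly overflows for large c; the standard numerically stable recursion B(k) = A·B(k-1)/(k + A·B(k-1)) avoids this (a sketch; the function name is mine):

```python
def erlang_b(c: int, a: float) -> float:
    """Erlang B blocking probability via the stable recursion:
    B(0) = 1;  B(k) = a*B(k-1) / (k + a*B(k-1))."""
    b = 1.0
    for k in range(1, c + 1):
        b = a * b / (k + a * b)
    return b

print(erlang_b(5, 3.0))  # ~0.11: 5 lines, 3 Erlangs offered -> ~11% of calls blocked
```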

Erlang C (M/M/c — Delay System)

Customers arriving when all servers are busy wait in queue.

The probability of waiting is the Erlang C formula (given above in M/M/c).

Application: Call centers (callers wait on hold).

Jackson Networks

A network of M/M/cᵢ queues where:

  • External Poisson arrivals at each node
  • Markovian routing (probabilistic routing between nodes)
  • Each node behaves as an independent M/M/c queue (with appropriate arrival rate)

Jackson's theorem: In steady state, each queue behaves as an independent M/M/c queue. The joint distribution is the product of individual queue distributions.

Burke's theorem: The departure process from an M/M/c queue in steady state is Poisson.

Open Jackson Networks

External arrivals and departures. Solve for arrival rates using traffic equations:

λᵢ = γᵢ + Σⱼ λⱼ pⱼᵢ

where γᵢ is external arrival rate and pⱼᵢ is routing probability from j to i.
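The traffic equations are linear and can be solved by fixed-point iteration, which converges for open networks (a sketch with a hypothetical two-node network; the function name and rates are mine):

```python
def traffic_rates(gamma, P, iters=1000):
    """Solve lam_i = gamma_i + sum_j lam_j * P[j][i] by fixed-point iteration."""
    n = len(gamma)
    lam = list(gamma)
    for _ in range(iters):
        lam = [gamma[i] + sum(lam[j] * P[j][i] for j in range(n))
               for i in range(n)]
    return lam

# Hypothetical network: external arrivals only at node 0; node 0 routes
# half its output to node 1; node 1 feeds 20% back to node 0.
gamma = [2.0, 0.0]
P = [[0.0, 0.5],
     [0.2, 0.0]]
lam = traffic_rates(gamma, P)
print(lam)  # lam0 = 2/0.9 ~ 2.222, lam1 = 0.5*lam0 ~ 1.111
```

With these λᵢ in hand, each node is then analyzed as an independent M/M/cᵢ queue per Jackson's theorem.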

Closed Jackson Networks

Fixed number of customers circulating. No external arrivals or departures.

Analyzed using Mean Value Analysis (MVA) or convolution algorithms.

Applications in CS

Capacity Planning

  • Web servers: How many servers needed to keep p99 latency under 200ms at peak load?
  • Database connection pools: Pool size vs. latency tradeoff.
  • Load balancers: M/M/c model with pooling shows benefits of shared queues.

System Design Guidelines

From queueing theory:

  1. Keep utilization below 70-80%: Latency increases superlinearly with utilization.
  2. Reduce variance: Consistent service times (M/D/1) are much better than variable (M/M/1).
  3. Pool resources: One large queue is better than many small ones.
  4. Add slack: Small utilization headroom prevents catastrophic latency spikes.
  5. Bound queue size: Finite queues (M/M/1/K) prevent memory exhaustion (at the cost of dropped requests).

Specific Applications

  • Thread pools: Fixed pool of c threads serving incoming requests → M/M/c.
  • Network routers: Packet buffers as queues. Buffer sizing uses queueing theory.
  • Disk I/O scheduling: Request queue for disk access. M/G/1 with various service distributions.
  • CPU scheduling: Ready queue modeled as M/G/1 or G/G/1.
  • Cloud auto-scaling: Scale servers to maintain target utilization/latency using queueing models.
  • Microservices: Each service is a queue. Jackson networks model service chains.
  • Message queues: Kafka, RabbitMQ — producer-consumer with buffering.
  • SLA/SLO setting: Use queueing theory to set achievable latency targets: "p99 < X ms at Y requests/sec."