Transport Layer

The transport layer provides end-to-end communication between applications on different hosts. Its two main protocols — TCP and UDP — offer fundamentally different service models.

UDP (User Datagram Protocol)

Connectionless, unreliable, minimal protocol. It adds little more than port numbers and a checksum to IP's best-effort delivery.

UDP Header (8 bytes)

[Source Port (16)] [Dest Port (16)] [Length (16)] [Checksum (16)]
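As a rough illustration of the layout above, the sketch below packs and parses the four 16-bit fields with Python's `struct` module. The helper names `build_udp_header` and `parse_udp_header` are hypothetical, and the checksum is left at 0 rather than computed.

```python
import struct

# Four unsigned 16-bit fields in network (big-endian) byte order.
UDP_HEADER = struct.Struct("!HHHH")

def build_udp_header(src_port, dst_port, payload, checksum=0):
    # The Length field covers the 8-byte header plus the payload.
    length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(src_port, dst_port, length, checksum)

def parse_udp_header(datagram):
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    return {"src_port": src_port, "dst_port": dst_port,
            "length": length, "checksum": checksum}

payload = b"hello"
header = build_udp_header(50000, 53, payload)
parsed = parse_udp_header(header + payload)
print(len(header), parsed)
```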

Properties

  • No connection setup: Send immediately. No handshake latency.
  • No reliability: No ACKs, no retransmission. Packets may be lost, duplicated, or reordered.
  • No flow/congestion control: Sender can transmit at any rate.
  • Message-oriented: Each send() produces exactly one datagram.
  • Low overhead: 8-byte header (vs TCP's 20+ bytes).
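A minimal loopback sketch of the message-oriented property listed above, assuming UDP on 127.0.0.1 is available. The function name `udp_loopback_demo` is illustrative. Each `sendto()` call arrives as its own datagram, with no handshake beforehand.

```python
import socket

def udp_loopback_demo(messages):
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    receiver.settimeout(2.0)          # avoid blocking forever if a datagram is lost
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        for msg in messages:
            # No connection setup: the first packet on the wire carries data.
            sender.sendto(msg, receiver.getsockname())
        # Each recvfrom() returns exactly one datagram, never a merged stream.
        return [receiver.recvfrom(2048)[0] for _ in messages]
    finally:
        sender.close()
        receiver.close()

received = udp_loopback_demo([b"first", b"second"])
print(received)
```

A TCP socket given the same two writes could legitimately deliver them as one 11-byte read, because TCP preserves byte order but not message boundaries.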

Applications

  • DNS: Small queries/responses. Retransmit on timeout.
  • Video/audio streaming: Real-time. A retransmitted packet would arrive too late to play, so waiting for it is worse than skipping it.
  • Gaming: Low latency critical. Application handles reliability if needed.
  • DHCP: Client doesn't have an IP yet → can't do TCP.
  • SNMP: Simple queries.
  • NTP: Time synchronization.

TCP (Transmission Control Protocol)

Connection-oriented, reliable, ordered byte stream.

TCP Header (20-60 bytes)

[Src Port (16)][Dst Port (16)][Sequence Number (32)][Ack Number (32)]
[Offset(4)][Reserved(4)][Flags(8)][Window(16)][Checksum(16)][Urgent(16)]
[Options (0-40 bytes)]

Flags: SYN, ACK, FIN, RST, PSH, URG, ECE, CWR.
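The fixed 20-byte part of the header can be decoded along these lines. This is a sketch only: `parse_tcp_header` is a hypothetical helper, options are ignored, and the sample segment is made up for illustration.

```python
import struct

# src port, dst port, seq, ack, offset/reserved/flags, window, checksum, urgent
TCP_FIXED = struct.Struct("!HHIIHHHH")

# Flag bits in the low byte of the offset/reserved/flags word.
FLAG_BITS = [("CWR", 0x80), ("ECE", 0x40), ("URG", 0x20), ("ACK", 0x10),
             ("PSH", 0x08), ("RST", 0x04), ("SYN", 0x02), ("FIN", 0x01)]

def parse_tcp_header(segment):
    (src, dst, seq, ack, off_flags,
     window, checksum, urgent) = TCP_FIXED.unpack_from(segment)
    return {
        "src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
        # Data offset is the top 4 bits, counted in 32-bit words.
        "header_len": (off_flags >> 12) * 4,
        "flags": [name for name, bit in FLAG_BITS if off_flags & bit],
        "window": window,
    }

# Made-up SYN-ACK: offset = 5 words (no options), flags = SYN | ACK.
sample = TCP_FIXED.pack(443, 50000, 1000, 2001, (5 << 12) | 0x12, 65535, 0, 0)
parsed = parse_tcp_header(sample)
print(parsed)
```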

Three-Way Handshake (Connection Setup)

[Figure: TCP three-way handshake and four-way teardown]

Client                Server
  │── SYN (seq=x) ──────→│
  │                        │
  │←── SYN-ACK ───────────│  (seq=y, ack=x+1)
  │                        │
  │── ACK (ack=y+1) ─────→│
  │                        │
  │  Connection established │

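Why three-way? Each side must learn the other's initial sequence number and confirm that its own was received. The third segment also lets the server discard old duplicate SYNs instead of establishing phantom connections.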
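The sequence and acknowledgment numbers in the exchange above can be traced with a small sketch. The function `three_way_handshake` and the dictionary layout are illustrative, not a real stack's data structures. Sequence numbers wrap modulo 2^32.

```python
SEQ_SPACE = 2 ** 32  # sequence numbers are 32-bit and wrap around

def three_way_handshake(client_isn, server_isn):
    syn = {"flags": "SYN", "seq": client_isn}
    # The SYN consumes one sequence number, so the server acks x + 1.
    syn_ack = {"flags": "SYN-ACK", "seq": server_isn,
               "ack": (syn["seq"] + 1) % SEQ_SPACE}
    # The client acks the server's SYN the same way: y + 1.
    ack = {"flags": "ACK", "seq": syn_ack["ack"],
           "ack": (syn_ack["seq"] + 1) % SEQ_SPACE}
    return [syn, syn_ack, ack]

segments = three_way_handshake(client_isn=100, server_isn=300)
for segment in segments:
    print(segment)
```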

Four-Way Teardown (Connection Close)

Client                Server
  │── FIN ─────────────→│
  │←── ACK ─────────────│
  │                      │  (server may send more data)
  │←── FIN ─────────────│
  │── ACK ─────────────→│
  │                      │
  │ TIME_WAIT (2×MSL)    │  (client waits to ensure final ACK received)

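TIME_WAIT: The side that closes first (the client in the diagram) stays in TIME_WAIT for 2×MSL (Maximum Segment Lifetime). RFC 793 sets MSL to 2 minutes; implementations often use less, and Linux holds TIME_WAIT for a fixed 60 seconds. The wait lets the closer re-send the final ACK if the peer retransmits its FIN, and lets old segments from this connection expire before the same 4-tuple is reused.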

Sequence Numbers and Acknowledgments

Byte-oriented: Sequence numbers count bytes, not segments.

Seq=100, data="Hello" (5 bytes) → receiver ACKs with Ack=105
Seq=105, data="World" (5 bytes) → receiver ACKs with Ack=110

Cumulative ACK: Ack=N means "I've received all bytes up to N-1. Send byte N next."

Selective ACK (SACK): TCP option. Acknowledges non-contiguous blocks (e.g., "I have bytes 100-199 and 300-399, missing 200-299"). Enables selective retransmission.
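One way to picture how a receiver derives both values: merge the byte ranges it holds, advance the cumulative ACK through any range that starts at or before the next expected byte, and report the rest as SACK blocks. The function `ack_state` and the half-open `(start, end)` range convention are assumptions made for this sketch.

```python
def ack_state(next_expected, received):
    """received: list of half-open (start, end) byte ranges held by the receiver."""
    # Merge overlapping or adjacent ranges.
    merged = []
    for start, end in sorted(received):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))

    ack = next_expected
    sack_blocks = []
    for start, end in merged:
        if start <= ack:
            ack = max(ack, end)              # contiguous: cumulative ACK advances
        else:
            sack_blocks.append((start, end)) # hole before it: report via SACK
    return ack, sack_blocks

# Bytes 100-199 and 300-399 arrived; 200-299 is missing.
print(ack_state(100, [(100, 200), (300, 400)]))
```

Once the missing range arrives, the same function returns `(400, [])`: the hole closes and the cumulative ACK jumps past the buffered data.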

Flow Control: Sliding Window

Receiver window (rwnd): Advertised by the receiver in the Window field of every segment it sends. Tells the sender how much buffer space is available.

Sender can send: min(cwnd, rwnd) - bytes_in_flight

Window = 0: Receiver is full. Sender stops and sends periodic window probes until window opens.

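Window scaling (RFC 7323, which obsoletes RFC 1323): TCP option negotiated in the SYN segments. Allows windows larger than 65535 bytes by multiplying the 16-bit field by 2^scale, with scale at most 14. Essential for high-bandwidth, high-latency links where the bandwidth-delay product (BDP) exceeds 64 KB.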
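A quick arithmetic sketch of why scaling matters: compute the bandwidth-delay product of a path and the smallest shift that lets the 16-bit field cover it. The helper names and the example link (1 Gbit/s, 100 ms RTT) are illustrative assumptions.

```python
MAX_UNSCALED_WINDOW = 65535  # largest value of the 16-bit Window field
MAX_SHIFT = 14               # largest scale factor the option allows

def bdp_bytes(bandwidth_bps, rtt_ms):
    # bits/s * ms / 1000 gives bits in flight; divide by 8 for bytes.
    return bandwidth_bps * rtt_ms // 8000

def min_window_scale(required_window_bytes):
    shift = 0
    while (MAX_UNSCALED_WINDOW << shift) < required_window_bytes and shift < MAX_SHIFT:
        shift += 1
    return shift

bdp = bdp_bytes(1_000_000_000, 100)   # 1 Gbit/s path with 100 ms RTT
shift = min_window_scale(bdp)
print(bdp, shift, MAX_UNSCALED_WINDOW << shift)
```

Without scaling, this path could carry at most 65535 bytes per round trip, roughly 5 Mbit/s on a 1 Gbit/s link.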

Congestion Control

Prevent the sender from overwhelming the network (vs flow control which prevents overwhelming the receiver).

Congestion window (cwnd): Sender-side limit on bytes in flight.

Effective window = min(cwnd, rwnd)
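Combining the two limits, the amount a sender may still put on the wire can be sketched as below. `usable_window` is a hypothetical helper, and all values are in bytes.

```python
def usable_window(cwnd, rwnd, bytes_in_flight):
    # The tighter of the network limit (cwnd) and the receiver limit (rwnd),
    # minus what is already sent but not yet acknowledged.
    return max(0, min(cwnd, rwnd) - bytes_in_flight)

print(usable_window(cwnd=20_000, rwnd=65_535, bytes_in_flight=8_000))  # network-limited
print(usable_window(cwnd=20_000, rwnd=4_000, bytes_in_flight=1_000))   # receiver-limited
print(usable_window(cwnd=20_000, rwnd=0, bytes_in_flight=0))           # zero window
```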

Slow Start

Start with a small cwnd, classically 1 MSS (Maximum Segment Size, typically 1460 bytes on Ethernet); modern stacks may start at up to 10 MSS (RFC 6928). Double cwnd each RTT (exponential growth) until reaching ssthresh.

RTT 1: cwnd = 1 MSS
RTT 2: cwnd = 2 MSS
RTT 3: cwnd = 4 MSS
RTT 4: cwnd = 8 MSS
...until ssthresh

Congestion Avoidance

After cwnd ≥ ssthresh: increase cwnd by 1 MSS per RTT (linear growth — additive increase).

Multiplicative Decrease

On packet loss (detected by timeout or triple duplicate ACKs):

  • Timeout: ssthresh = cwnd/2 (computed from the pre-loss cwnd), then cwnd = 1 MSS. Restart slow start. (TCP Tahoe does this on every loss; Reno only on timeout.)
  • Triple duplicate ACK: ssthresh = cwnd/2, then cwnd = ssthresh. Continue with congestion avoidance. (TCP Reno fast recovery)

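AIMD (Additive Increase, Multiplicative Decrease): The fundamental TCP congestion control strategy. Flows sharing a bottleneck converge toward equal shares, because each multiplicative decrease shrinks the gap between them while additive increase leaves it unchanged.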
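The convergence claim can be checked with a toy model: two flows share one bottleneck, both add 1 MSS per round while the link has room, and both halve when the total exceeds capacity. The synchronized-loss assumption, the function name `aimd_two_flows`, and the numbers are simplifications chosen for illustration.

```python
def aimd_two_flows(w1, w2, capacity, rounds):
    history = [(w1, w2)]
    for _ in range(rounds):
        if w1 + w2 > capacity:
            # Multiplicative decrease: both halve, so the gap halves too.
            w1, w2 = w1 / 2, w2 / 2
        else:
            # Additive increase: both grow by 1 MSS, gap unchanged.
            w1, w2 = w1 + 1, w2 + 1
        history.append((w1, w2))
    return history

history = aimd_two_flows(w1=1, w2=80, capacity=100, rounds=300)
start_gap = abs(history[0][0] - history[0][1])
final_gap = abs(history[-1][0] - history[-1][1])
print(start_gap, final_gap)
```

The gap between the flows only ever shrinks, which is why the unequal starting point (1 versus 80 MSS) ends up close to an even split.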

Fast Retransmit

If 3 duplicate ACKs received (for the same byte) → retransmit the missing segment immediately without waiting for timeout.

Fast Recovery (TCP Reno)

After fast retransmit: cwnd = cwnd/2 (not reset to 1). Skip slow start — go directly to congestion avoidance. Faster recovery than Tahoe.
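The rules for slow start, congestion avoidance, and loss response can be combined into a per-RTT trace of cwnd. This is a simplified sketch in units of MSS: `simulate_cwnd` and its event names are hypothetical, slow start is capped at ssthresh, and the temporary window inflation real Reno applies during fast recovery is omitted.

```python
def simulate_cwnd(events, ssthresh=8, variant="reno"):
    """events: per-RTT sequence of 'ack_rtt', 'dupacks', or 'timeout'."""
    cwnd = 1
    trace = [cwnd]
    for event in events:
        if event == "ack_rtt":
            if cwnd < ssthresh:
                cwnd = min(cwnd * 2, ssthresh)  # slow start: double per RTT
            else:
                cwnd += 1                       # congestion avoidance: +1 MSS per RTT
        elif event in ("dupacks", "timeout"):
            ssthresh = max(cwnd // 2, 2)        # halve the threshold on any loss
            if event == "dupacks" and variant == "reno":
                cwnd = ssthresh                 # fast recovery: skip slow start
            else:
                cwnd = 1                        # Tahoe, or any timeout: start over
        else:
            raise ValueError(f"unknown event: {event}")
        trace.append(cwnd)
    return trace

events = ["ack_rtt"] * 6 + ["dupacks"] + ["ack_rtt"] * 2
reno = simulate_cwnd(events, variant="reno")
tahoe = simulate_cwnd(events, variant="tahoe")
print(reno)
print(tahoe)
```

After the same triple-duplicate-ACK event, the Reno trace resumes from half the old window while the Tahoe trace restarts from 1 MSS.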

Modern Congestion Control

TCP CUBIC (Linux default since 2.6.19): Cubic function of time since last loss event. More aggressive window growth after recovery. Better for high-BDP networks.

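W(t) = C(t - K)³ + W_max

where t is the time since the last loss event, W_max is the window just before that loss, C is a scaling constant (0.4 in RFC 8312), β is the multiplicative decrease factor (0.7, so the window drops to β × W_max after a loss), and K = ∛(W_max × (1 - β) / C) is the time the curve takes to climb back to W_max.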
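A small numeric sketch of the cubic curve, following the RFC 8312 convention in which the window falls to β × W_max after a loss, so that K = ∛(W_max × (1 - β) / C). The constants C = 0.4 and β = 0.7 are that RFC's defaults; the function names are illustrative, and windows are in MSS with time in seconds.

```python
def cubic_k(w_max, c=0.4, beta=0.7):
    # Time needed to grow from beta * w_max back up to w_max.
    return (w_max * (1 - beta) / c) ** (1 / 3)

def cubic_window(t, w_max, c=0.4, beta=0.7):
    k = cubic_k(w_max, c, beta)
    return c * (t - k) ** 3 + w_max

w_max = 100
k = cubic_k(w_max)
for t in (0, k / 2, k, 2 * k):
    print(round(t, 2), round(cubic_window(t, w_max), 2))
```

The curve starts at β × W_max, flattens as it approaches W_max (probing cautiously near the last known loss point), then accelerates beyond it.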

TCP BBR (Google, 2016): Model-based congestion control. Estimates bottleneck bandwidth and RTT, then sends at the estimated rate. Doesn't rely on loss as a congestion signal.

BBR advantages: Better throughput on lossy links (wireless). Lower latency (keeps queues short). Better utilization of high-BDP paths.

BBR concerns: Fairness issues with competing CUBIC flows. Potential bufferbloat in some scenarios.
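The model described above boils down to two running estimates: the highest recent delivery rate and the lowest recent RTT. The sketch below is a heavily simplified assumption, not BBR's actual state machine: the class `PathModel`, its sample-count windows, and the sample values are all invented for illustration.

```python
from collections import deque

class PathModel:
    def __init__(self, window=10):
        # Keep only the most recent samples so stale estimates age out.
        self.bw_samples = deque(maxlen=window)
        self.rtt_samples = deque(maxlen=window)

    def on_ack(self, delivered_bytes, interval_s, rtt_s):
        self.bw_samples.append(delivered_bytes / interval_s)
        self.rtt_samples.append(rtt_s)

    def bottleneck_bw(self):
        return max(self.bw_samples)   # bytes/s: the best rate seen recently

    def min_rtt(self):
        return min(self.rtt_samples)  # seconds: RTT without queueing delay

    def bdp(self):
        # Data the path can hold without building a queue.
        return self.bottleneck_bw() * self.min_rtt()

model = PathModel()
model.on_ack(delivered_bytes=125_000, interval_s=0.1, rtt_s=0.05)
model.on_ack(delivered_bytes=100_000, interval_s=0.1, rtt_s=0.04)
print(model.bottleneck_bw(), model.min_rtt(), model.bdp())
```

Because neither estimate reacts to an isolated packet loss, random loss on a wireless link does not shrink the sending rate the way it would under a loss-based algorithm.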

QUIC Protocol

QUIC began at Google, where the name stood for "Quick UDP Internet Connections"; the IETF standard (RFC 9000) treats QUIC as a name rather than an acronym. HTTP/3 is built on QUIC.

Key Features

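  • Built on UDP: Sidesteps the ossification of TCP in kernels and middleboxes. Usually implemented in user space.
  • 0-RTT connection: A first connection takes 1-RTT. On resumption, cached credentials from the previous connection let the client send data in its first flight (0-RTT). Early data can be replayed by an attacker, so it suits idempotent requests only.
  • Multiplexed streams: Multiple independent streams in one connection. A lost packet stalls only the streams whose data it carried (unlike HTTP/2 over TCP, where one lost packet blocks ALL streams).
  • Built-in TLS 1.3: Encryption is mandatory and integrated (not layered on top).
  • Connection migration: Connection survives IP address changes (identified by connection ID, not 4-tuple). Mobile-friendly.
  • Improved loss recovery: Packet numbers are never reused, so a retransmission carries a new number and every ACK maps to exactly one transmission. This gives better RTT estimation (no retransmission ambiguity). ACK frames can also report multiple received ranges, similar to SACK.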

QUIC vs TCP+TLS

| Aspect | TCP + TLS 1.3 | QUIC |
|---|---|---|
| Handshake | 1-RTT (TCP) + 1-RTT (TLS) = 2-RTT | 1-RTT (combined), 0-RTT resume |
| Head-of-line blocking | Yes (TCP is single stream) | No (independent streams) |
| Connection migration | No (tied to IP:port) | Yes (connection ID) |
| Implementation | Kernel (hard to update) | User space (easy to update) |
| Encryption | Optional (TLS on top) | Mandatory (integrated) |
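To put the handshake row in concrete terms, the sketch below estimates time to the first response byte, assuming one further round trip for the request and response and ignoring server processing time. The dictionary keys and the function name are hypothetical labels.

```python
# Round trips spent on connection setup before the request can be sent.
HANDSHAKE_RTTS = {
    "tcp+tls1.3": 2,    # 1 RTT for TCP, 1 RTT for TLS 1.3
    "quic": 1,          # combined transport and crypto handshake
    "quic-resumed": 0,  # 0-RTT: request rides in the first flight
}

def time_to_first_response_ms(rtt_ms, transport):
    # Setup round trips plus one round trip for request/response.
    return (HANDSHAKE_RTTS[transport] + 1) * rtt_ms

for transport in HANDSHAKE_RTTS:
    print(transport, time_to_first_response_ms(100, transport))
```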

SCTP (Stream Control Transmission Protocol)

Message-oriented reliable transport. Supports multi-homing (multiple IP addresses per endpoint) and multi-streaming.

Used in telecommunications (SS7 signaling over IP), WebRTC data channels.

Reliable Data Transfer Principles

Stop-and-Wait ARQ

Send one segment, wait for ACK. Retransmit on timeout.

Utilization = 1/(1 + 2a), where a = propagation delay / transmission delay. Terrible for high-latency links: with a = 100, the sender is busy less than 0.5% of the time.
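The utilization formula can be evaluated directly. In this sketch the function name and the example link (1500-byte frames, 1 Gbit/s, 50 ms one-way delay) are illustrative, and ACK transmission time is treated as negligible.

```python
def stop_and_wait_utilization(frame_bits, bandwidth_bps, one_way_delay_s):
    transmission_delay = frame_bits / bandwidth_bps  # time to push the frame out
    a = one_way_delay_s / transmission_delay         # propagation / transmission
    return 1 / (1 + 2 * a)

# 1500-byte frame on a 1 Gbit/s link with 50 ms one-way propagation delay.
utilization = stop_and_wait_utilization(12_000, 1_000_000_000, 0.05)
print(f"{utilization:.6%}")
```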

Go-Back-N

Window of N unacknowledged segments. On loss or timeout, retransmit every segment from the lost one onward. The receiver discards out-of-order segments, so it needs no reordering buffer.

Selective Repeat

Window of N. Only retransmit lost segments. Receiver buffers out-of-order segments.

TCP uses a hybrid: cumulative ACKs (like GBN) + SACK (like SR) + fast retransmit.
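The cost difference between the two pure schemes can be seen by counting transmissions. This is a round-based simplification, not a timing-accurate simulation: the sender transmits a full window per round, each packet in `lost_first_try` is dropped only on its first transmission, and the function names are hypothetical.

```python
def go_back_n(total, window, lost_first_try):
    lost = set(lost_first_try)
    base = transmissions = 0
    while base < total:
        end = min(base + window, total)
        transmissions += end - base          # (re)send the whole window
        delivered = base
        for seq in range(base, end):
            if seq in lost:
                lost.discard(seq)            # the retransmission will get through
                break                        # receiver drops everything after the gap
            delivered += 1
        base = delivered                     # cumulative ACK
    return transmissions

def selective_repeat(total, window, lost_first_try):
    lost = set(lost_first_try)
    received = set()
    base = transmissions = 0
    while base < total:
        for seq in range(base, min(base + window, total)):
            if seq in received:
                continue                     # already buffered out of order
            transmissions += 1
            if seq in lost:
                lost.discard(seq)
            else:
                received.add(seq)
        while base in received:
            base += 1                        # slide past in-order data
    return transmissions

print(go_back_n(8, 4, {2}), selective_repeat(8, 4, {2}))
```

With one loss in eight packets, Go-Back-N resends the lost packet plus the one that followed it in the window, while Selective Repeat resends only the lost packet.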

Port Numbers

| Range | Type | Examples |
|---|---|---|
| 0-1023 | Well-known (privileged) | HTTP(80), HTTPS(443), SSH(22), DNS(53) |
| 1024-49151 | Registered | PostgreSQL(5432), MySQL(3306), Redis(6379) |
| 49152-65535 | Dynamic/Ephemeral | Client-side source ports |
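The three ranges translate into a simple lookup. `classify_port` is a hypothetical helper written for this sketch.

```python
def classify_port(port):
    if not 0 <= port <= 65535:
        raise ValueError("port must fit in 16 bits")
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic"

for port in (22, 443, 5432, 6379, 50000):
    print(port, classify_port(port))
```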

Socket: (IP address, port) pair. A connection is identified by a 4-tuple: (src_IP, src_port, dst_IP, dst_port).
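Demultiplexing by 4-tuple can be sketched as a dictionary lookup. Two clients talking to the same server port get separate connection state because their source address or port differs. The `FourTuple` and `ConnectionTable` names and the documentation-range IP addresses are illustrative.

```python
from typing import NamedTuple

class FourTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

class ConnectionTable:
    def __init__(self):
        self._connections = {}

    def deliver(self, key, payload):
        # Look up (or create) the state for this exact 4-tuple.
        state = self._connections.setdefault(key, {"bytes_received": 0})
        state["bytes_received"] += len(payload)
        return state

    def __len__(self):
        return len(self._connections)

table = ConnectionTable()
a = FourTuple("192.0.2.10", 50001, "198.51.100.5", 443)
b = FourTuple("192.0.2.10", 50002, "198.51.100.5", 443)  # same host, new source port
table.deliver(a, b"GET /")
table.deliver(b, b"GET /index.html")
table.deliver(a, b" HTTP/1.1")
print(len(table), table.deliver(a, b""), table.deliver(b, b""))
```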

Applications in CS

  • Web performance: Understanding TCP congestion control, handshake latency, and QUIC explains why pages load fast or slow.
  • API design: Choose TCP (reliability needed) vs UDP (low latency, application handles loss) vs QUIC (reliable multiplexed streams with faster connection setup).
  • Database connections: Connection pooling, TCP keepalives, connection timeouts.
  • Streaming: UDP/RTP for real-time, TCP for buffered playback, QUIC (via HTTP/3) for adaptive bitrate streaming.
  • Load balancing: L4 load balancers operate at TCP/UDP level. Connection draining, health checks.
  • Debugging: Wireshark, tcpdump — understanding TCP segments and flags is essential.