Transport Layer
The transport layer provides end-to-end communication between applications on different hosts. Its two main protocols — TCP and UDP — offer fundamentally different service models.
UDP (User Datagram Protocol)
A connectionless, unreliable, minimal protocol. It adds little to IP beyond port numbers (for multiplexing between applications) and a checksum.
UDP Header (8 bytes)
[Source Port (16)] [Dest Port (16)] [Length (16)] [Checksum (16)]
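The four 16-bit fields above map directly onto a fixed binary layout. The following is a minimal sketch, assuming network byte order and a checksum left at 0 (which in IPv4 means "no checksum computed"); the function names are illustrative and not part of any library.

```python
import struct

# Four unsigned 16-bit fields in network (big-endian) byte order: 8 bytes total.
UDP_HEADER = struct.Struct("!HHHH")

def build_udp_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Pack a UDP header followed by the payload (hypothetical helper)."""
    length = UDP_HEADER.size + len(payload)  # Length field covers header + data
    checksum = 0  # assumption: checksum omitted for brevity
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload

def parse_udp_datagram(datagram: bytes) -> dict:
    """Split a raw datagram back into its header fields and payload."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "length": length,
        "checksum": checksum,
        "payload": datagram[UDP_HEADER.size:length],
    }

datagram = build_udp_datagram(5353, 53, b"hi")
fields = parse_udp_datagram(datagram)
```

Here `datagram` is 10 bytes long: the 8-byte header plus the 2-byte payload.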
Properties
- No connection setup: Send immediately. No handshake latency.
- No reliability: No ACKs, no retransmission. Packets may be lost, duplicated, or reordered.
- No flow/congestion control: Sender can transmit at any rate.
- Message-oriented: Each send() produces exactly one datagram.
- Low overhead: 8-byte header (vs TCP's 20+ bytes).
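The "no connection setup" and "message-oriented" properties can be seen with two sockets on the loopback interface. This is a sketch under the assumption that loopback networking is available; `udp_loopback_roundtrip` is a hypothetical name, and the timeout guards against the (unlikely on loopback) case of a dropped datagram.

```python
import socket

def udp_loopback_roundtrip(message: bytes) -> bytes:
    """Send one datagram to ourselves and return what arrives."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        receiver.settimeout(2.0)          # UDP gives no delivery guarantee
        # No connect() and no handshake: the first packet on the wire is the data.
        sender.sendto(message, receiver.getsockname())
        # One recvfrom() returns exactly one datagram (message boundaries kept).
        data, _addr = receiver.recvfrom(65535)
        return data

received = udp_loopback_roundtrip(b"ping")
```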
Applications
- DNS: Small queries/responses. Retransmit on timeout.
- Video/audio streaming: Real-time. A packet that arrives after its playback deadline is as useless as a lost one, so retransmission rarely helps.
- Gaming: Low latency critical. Application handles reliability if needed.
- DHCP: The client has no IP address yet and must broadcast to find a server, which a TCP connection cannot do.
- SNMP: Simple queries.
- NTP: Time synchronization.
TCP (Transmission Control Protocol)
Connection-oriented, reliable, ordered byte stream.
TCP Header (20-60 bytes)
[Src Port (16)][Dst Port (16)][Sequence Number (32)][Ack Number (32)]
[Offset(4)][Reserved(4)][Flags(8)][Window(16)][Checksum(16)][Urgent(16)]
[Options (0-40 bytes)]
Flags: SYN, ACK, FIN, RST, PSH, URG, ECE, CWR.
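To make the layout concrete, here is a small sketch that decodes the 20-byte fixed header and the flag bits. It assumes the standard bit order (FIN is the least significant flag bit, CWR the most significant); `parse_tcp_header` is an illustrative name, and options are not parsed.

```python
import struct

# src port, dst port, seq, ack, offset/reserved byte, flags byte,
# window, checksum, urgent pointer: 20 bytes in network byte order.
TCP_FIXED_HEADER = struct.Struct("!HHIIBBHHH")

# Flag names indexed by bit position (bit 0 = least significant).
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG", "ECE", "CWR"]

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed part of a TCP header (options are ignored)."""
    (src_port, dst_port, seq, ack, offset_byte, flag_byte,
     window, checksum, urgent) = TCP_FIXED_HEADER.unpack_from(segment)
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        # Data offset is the upper 4 bits, counted in 32-bit words.
        "header_len": (offset_byte >> 4) * 4,
        "flags": [name for bit, name in enumerate(FLAG_NAMES)
                  if flag_byte & (1 << bit)],
        "window": window,
    }

# A hand-built SYN-ACK: offset = 5 words (no options), flags = SYN | ACK = 0x12.
raw = TCP_FIXED_HEADER.pack(443, 50000, 1000, 2001, 5 << 4, 0x12, 65535, 0, 0)
header = parse_tcp_header(raw)
```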
Three-Way Handshake (Connection Setup)

Client                       Server
  │── SYN (seq=x) ───────────────→│
  │                               │
  │←── SYN-ACK (seq=y, ack=x+1) ──│
  │                               │
  │── ACK (ack=y+1) ─────────────→│
  │                               │
  │    Connection established     │
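Why three-way? Each side must acknowledge the other's initial sequence number, which confirms that both directions work. The third message also prevents old duplicate SYNs from establishing phantom connections: the server does not consider the connection open until the client confirms the SYN-ACK.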
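The sequence and acknowledgment arithmetic of the handshake can be sketched as a pure function. This is only a model of the numbers exchanged, not a network implementation; `three_way_handshake` and the dictionary layout are assumptions made for illustration.

```python
SEQ_SPACE = 2 ** 32  # sequence numbers are 32 bits and wrap around

def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the three handshake segments as dictionaries.

    A SYN consumes one sequence number, so each side acknowledges
    the other's initial sequence number (ISN) plus one.
    """
    return [
        {"from": "client", "flags": "SYN",
         "seq": client_isn, "ack": None},
        {"from": "server", "flags": "SYN-ACK",
         "seq": server_isn, "ack": (client_isn + 1) % SEQ_SPACE},
        {"from": "client", "flags": "ACK",
         "seq": (client_isn + 1) % SEQ_SPACE,
         "ack": (server_isn + 1) % SEQ_SPACE},
    ]

segments = three_way_handshake(client_isn=1000, server_isn=5000)
```

With these ISNs the SYN-ACK carries ack=1001 and the final ACK carries ack=5001.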
Four-Way Teardown (Connection Close)
Client                       Server
  │── FIN ───────────────────────→│
  │←── ACK ───────────────────────│
  │                               │  (server may send more data)
  │←── FIN ───────────────────────│
  │── ACK ───────────────────────→│
  │                               │
  │ TIME_WAIT (2×MSL)             │  (client waits to ensure final ACK received)
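TIME_WAIT: The side that closes first (the client here) stays in TIME_WAIT for 2×MSL (Maximum Segment Lifetime). RFC 793 sets MSL to 2 minutes, but implementations use shorter values; Linux, for example, holds TIME_WAIT for a fixed 60 seconds. The wait lets the client re-send the final ACK if the server retransmits its FIN, and lets old segments from this connection expire before the same 4-tuple is reused.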
Sequence Numbers and Acknowledgments
Byte-oriented: Sequence numbers count bytes, not segments.
Seq=100, data="Hello" (5 bytes) → receiver ACKs with Ack=105
Seq=105, data="World" (5 bytes) → receiver ACKs with Ack=110
Cumulative ACK: Ack=N means "I've received all bytes up to N-1. Send byte N next."
Selective ACK (SACK): TCP option. Acknowledges non-contiguous blocks (e.g., "I have bytes 100-199 and 300-399, missing 200-299"). Enables selective retransmission.
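The relationship between the cumulative ACK and SACK blocks can be sketched as follows. The function is a simplified receiver-side model: byte ranges are half-open `(start, end)` pairs, and `ack_state` is a hypothetical name, not a real API.

```python
def ack_state(next_expected: int, received: list) -> tuple:
    """Compute (cumulative_ack, sack_blocks) from received byte ranges.

    `received` holds half-open (start, end) ranges, in any order.
    The cumulative ACK is the first byte not yet received in order;
    SACK blocks describe data held beyond the first gap.
    """
    sack_blocks = []
    for start, end in sorted(received):
        if start <= next_expected:
            # Contiguous with in-order data: advance the cumulative ACK.
            next_expected = max(next_expected, end)
        elif sack_blocks and start <= sack_blocks[-1][1]:
            # Touches or overlaps the previous out-of-order block: merge.
            prev_start, prev_end = sack_blocks[-1]
            sack_blocks[-1] = (prev_start, max(prev_end, end))
        else:
            sack_blocks.append((start, end))
    return next_expected, sack_blocks

# Bytes 100-199 and 300-399 arrived; 200-299 is missing.
with_gap = ack_state(100, [(100, 200), (300, 400)])
# The missing range arrives later and fills the hole.
filled = ack_state(100, [(100, 200), (300, 400), (200, 300)])
```

In the first case the receiver sends Ack=200 with one SACK block covering bytes 300-399; once the hole is filled the cumulative ACK jumps to 400 and no SACK block remains.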
Flow Control: Sliding Window
Receiver window (rwnd): Advertised by the receiver in each ACK. Tells the sender how much buffer space is available.
Sender can send: min(cwnd, rwnd) - bytes_in_flight
Window = 0: Receiver is full. Sender stops and sends periodic window probes until window opens.
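Window scaling (RFC 1323, obsoleted by RFC 7323): TCP option negotiated in the SYN segments. Allows windows larger than 65,535 bytes by left-shifting the 16-bit field by a scale factor of up to 14 (window = field × 2^scale). Essential for high-bandwidth, high-latency links where the bandwidth-delay product (BDP) exceeds 64 KB.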
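A quick calculation shows why the 16-bit window is too small for fast long-distance links. The sketch below computes the bandwidth-delay product and the smallest scale factor whose maximum window covers it; both function names are illustrative, and the 14-bit shift limit is the one defined for the window scale option.

```python
MAX_UNSCALED_WINDOW = 65535   # largest value of the 16-bit window field
MAX_WINDOW_SHIFT = 14         # largest scale factor the option allows

def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product: bytes needed in flight to fill the pipe."""
    return bandwidth_bps * rtt_ms // 8000  # bits*ms -> bytes

def min_window_scale(window_bytes: int) -> int:
    """Smallest shift such that 65535 << shift covers `window_bytes`."""
    for shift in range(MAX_WINDOW_SHIFT + 1):
        if (MAX_UNSCALED_WINDOW << shift) >= window_bytes:
            return shift
    raise ValueError("window exceeds what TCP window scaling can express")

# A 1 Gbit/s path with a 100 ms round-trip time.
bdp = bdp_bytes(1_000_000_000, 100)
scale = min_window_scale(bdp)
```

For this path the BDP is 12,500,000 bytes, roughly 190 times the unscaled maximum, so a scale factor of 8 is the minimum that lets the sender keep the link full.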
Congestion Control
Prevent the sender from overwhelming the network (vs flow control which prevents overwhelming the receiver).
Congestion window (cwnd): Sender-side limit on bytes in flight.
Effective window = min(cwnd, rwnd)
Slow Start
Start with a small cwnd, classically 1 MSS (Maximum Segment Size, typically 1460 bytes on Ethernet); modern stacks commonly start at 10 MSS (RFC 6928). Double cwnd each RTT (exponential growth) until reaching ssthresh.
RTT 1: cwnd = 1 MSS
RTT 2: cwnd = 2 MSS
RTT 3: cwnd = 4 MSS
RTT 4: cwnd = 8 MSS
...until ssthresh
Congestion Avoidance
After cwnd ≥ ssthresh: increase cwnd by 1 MSS per RTT (linear growth — additive increase).
Multiplicative Decrease
On packet loss (detected by timeout or triple duplicate ACKs):
- Timeout: ssthresh = cwnd/2 (half the window at the time of the loss), then cwnd = 1 MSS. Restart slow start. (TCP Tahoe does this on every loss; Reno still does it on timeout.)
- Triple duplicate ACK: ssthresh = cwnd/2, then cwnd = ssthresh. Continue with congestion avoidance. (TCP Reno fast recovery)
AIMD (Additive Increase, Multiplicative Decrease): The fundamental TCP congestion control strategy. Flows sharing a bottleneck converge toward an equal share of its capacity, at least when their RTTs are similar.
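The interplay of slow start, congestion avoidance, and the two loss reactions can be sketched as a per-RTT simulation. This is a deliberately simplified model, not a real TCP implementation: cwnd is counted in whole MSS units, it starts at 1 MSS, slow start is capped at ssthresh, and ssthresh never drops below 2 MSS. The name `simulate_cwnd` and the `losses` mapping are assumptions made for illustration.

```python
def simulate_cwnd(rtts: int, ssthresh: int, losses: dict,
                  fast_recovery: bool = True) -> list:
    """Return cwnd (in MSS) at the start of each RTT.

    `losses` maps an RTT number to "timeout" or "dupack".
    With fast_recovery=False every loss is handled Tahoe-style.
    """
    cwnd = 1
    trace = []
    for rtt in range(1, rtts + 1):
        trace.append(cwnd)
        event = losses.get(rtt)
        if event is not None:
            # Multiplicative decrease: remember half the window at loss time.
            ssthresh = max(cwnd // 2, 2)
            if event == "dupack" and fast_recovery:
                cwnd = ssthresh          # Reno: skip slow start
            else:
                cwnd = 1                 # timeout (or Tahoe): restart slow start
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start: double per RTT
        else:
            cwnd += 1                        # congestion avoidance: +1 MSS per RTT
    return trace

reno_trace = simulate_cwnd(8, ssthresh=8, losses={6: "dupack"})
timeout_trace = simulate_cwnd(8, ssthresh=8, losses={6: "timeout"})
```

With a loss in RTT 6, the triple-duplicate-ACK case yields `[1, 2, 4, 8, 9, 10, 5, 6]`, while the timeout case yields `[1, 2, 4, 8, 9, 10, 1, 2]`: both halve ssthresh, but only the timeout sends the window back to 1 MSS.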
Fast Retransmit
If 3 duplicate ACKs received (for the same byte) → retransmit the missing segment immediately without waiting for timeout.
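The trigger condition can be sketched as a counter over the stream of incoming ACK numbers. The first ACK for a given byte is not a duplicate, so the retransmission fires on the fourth ACK carrying the same number. `fast_retransmit_points` is a hypothetical helper for illustration only.

```python
def fast_retransmit_points(acks: list, threshold: int = 3) -> list:
    """Return (index, ack_number) pairs where fast retransmit would fire.

    `index` is the position in `acks` of the duplicate that reaches the
    threshold; `ack_number` is the first byte of the missing segment.
    """
    retransmits = []
    last_ack = None
    dup_count = 0
    for index, ack in enumerate(acks):
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:
                retransmits.append((index, ack))
        else:
            # A new ACK number resets the duplicate counter.
            last_ack = ack
            dup_count = 0
    return retransmits

# Ack=200 arrives once, then three more times as later segments arrive
# out of order; the segment starting at byte 200 is presumed lost.
events = fast_retransmit_points([100, 200, 200, 200, 200, 500])
```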
Fast Recovery (TCP Reno)
After fast retransmit: cwnd = cwnd/2 (not reset to 1). Skip slow start — go directly to congestion avoidance. Faster recovery than Tahoe.
Modern Congestion Control
TCP CUBIC (Linux default since 2.6.19): Cubic function of time since last loss event. More aggressive window growth after recovery. Better for high-BDP networks.
W(t) = C(t - K)³ + W_max
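where K = ∛(W_max × (1 − β) / C) is the time the window takes to climb back to W_max, β = 0.7 is the multiplicative decrease factor (the window drops to β × W_max on loss), and C = 0.4 is a scaling constant (values from RFC 8312).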
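The shape of the CUBIC curve is easy to check numerically. The sketch below follows the RFC 8312 convention, in which β = 0.7 is the fraction of W_max kept after a loss, so K = ∛(W_max(1 − β)/C) and the curve starts at β × W_max. The defaults C = 0.4 and β = 0.7 come from that RFC; the function names are illustrative, the window is measured in MSS and t in seconds.

```python
def cubic_k(w_max: float, c: float = 0.4, beta: float = 0.7) -> float:
    """Time (seconds) for the window to grow back to w_max after a loss."""
    return (w_max * (1 - beta) / c) ** (1 / 3)

def cubic_window(t: float, w_max: float,
                 c: float = 0.4, beta: float = 0.7) -> float:
    """Window size t seconds after the last loss: W(t) = C(t - K)^3 + W_max."""
    k = cubic_k(w_max, c, beta)
    return c * (t - k) ** 3 + w_max

w_max = 100.0
k = cubic_k(w_max)
start = cubic_window(0.0, w_max)      # right after the loss
plateau = cubic_window(k, w_max)      # back at the previous maximum
probing = cubic_window(k + 2.0, w_max)  # beyond it, growth speeds up again
```

The curve is concave up to t = K (fast growth that flattens near W_max) and convex afterwards (cautious at first, then increasingly aggressive probing for new bandwidth).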
TCP BBR (Google, 2016): Model-based congestion control. Estimates bottleneck bandwidth and RTT, then sends at the estimated rate. Doesn't rely on loss as a congestion signal.
BBR advantages: Better throughput on lossy links (wireless). Lower latency (keeps queues short). Better utilization of high-BDP paths.
BBR concerns: Fairness issues with competing CUBIC flows. Potential bufferbloat in some scenarios.
QUIC Protocol
QUIC began at Google as "Quick UDP Internet Connections"; the IETF standardized it as RFC 9000, where QUIC is simply a name rather than an acronym. HTTP/3 is built on QUIC.
Key Features
- Built on UDP: Bypasses TCP's ossification. Implemented in user space.
- 0-RTT connection: Cached credentials from previous connection → send data immediately (1-RTT for first connection, 0-RTT for subsequent).
- Multiplexed streams: Multiple independent streams in one connection. No head-of-line blocking (unlike HTTP/2 over TCP where one lost packet blocks ALL streams).
- Built-in TLS 1.3: Encryption is mandatory and integrated (not layered on top).
- Connection migration: Connection survives IP address changes (identified by connection ID, not 4-tuple). Mobile-friendly.
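- Improved loss recovery: Packet numbers increase monotonically and are never reused, so a retransmission is never confused with the original (no retransmission ambiguity) and RTT estimates are more accurate. ACK frames can also report many more received ranges than TCP SACK.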
QUIC vs TCP+TLS
| Aspect | TCP + TLS 1.3 | QUIC |
|---|---|---|
| Handshake | 1-RTT (TCP) + 1-RTT (TLS) = 2-RTT | 1-RTT (combined), 0-RTT resume |
| Head-of-line blocking | Yes (TCP is single stream) | No (independent streams) |
| Connection migration | No (tied to IP:port) | Yes (connection ID) |
| Implementation | Kernel (hard to update) | User space (easy to update) |
| Encryption | Optional (TLS on top) | Mandatory (integrated) |
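The handshake row translates directly into time to first response. The sketch below is a back-of-the-envelope model that assumes one extra round trip for the request and response themselves and ignores server processing time, packet loss, and bandwidth limits; the dictionary and function names are illustrative.

```python
# Round trips spent on connection setup before the request can be sent.
HANDSHAKE_RTTS = {
    "tcp+tls1.3": 2,   # 1 RTT for TCP, 1 RTT for TLS 1.3
    "quic-1rtt": 1,    # combined transport and crypto handshake
    "quic-0rtt": 0,    # resumed connection: data rides with the first packet
}

def time_to_first_response_ms(protocol: str, rtt_ms: float) -> float:
    """Setup round trips plus one round trip for the request/response."""
    return (HANDSHAKE_RTTS[protocol] + 1) * rtt_ms

# On a 100 ms path:
latencies = {name: time_to_first_response_ms(name, 100.0)
             for name in HANDSHAKE_RTTS}
```

On a 100 ms path this gives 300 ms for TCP + TLS 1.3, 200 ms for a fresh QUIC connection, and 100 ms for a resumed one.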
SCTP (Stream Control Transmission Protocol)
Message-oriented reliable transport. Supports multi-homing (multiple IP addresses per endpoint) and multi-streaming.
Used in telecommunications (SS7 signaling over IP), WebRTC data channels.
Reliable Data Transfer Principles
Stop-and-Wait ARQ
Send one segment, wait for ACK. Retransmit on timeout.
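Utilization = 1/(1 + 2a), where a = propagation delay / transmission time (the time needed to put one segment on the wire). Terrible for high-latency links: when a is large, the sender sits idle for almost the whole round trip.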
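Plugging in numbers shows how severe the idle time is. This is a small sketch of the formula above, assuming error-free transmission and negligible ACK size; the function name and the example link parameters are made up for illustration.

```python
def stop_and_wait_utilization(frame_bits: float, link_bps: float,
                              propagation_s: float) -> float:
    """Fraction of time the sender is actually transmitting."""
    transmission_s = frame_bits / link_bps   # time to put one frame on the wire
    a = propagation_s / transmission_s       # ratio used in U = 1 / (1 + 2a)
    return 1 / (1 + 2 * a)

# A 1500-byte frame on a 1 Gbit/s link with 15 ms one-way propagation delay.
utilization = stop_and_wait_utilization(
    frame_bits=1500 * 8, link_bps=1e9, propagation_s=0.015)
```

Here transmission takes 12 µs while the round trip takes 30 ms, so a is about 1250 and utilization is roughly 0.04%.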
Go-Back-N
Window of N unacknowledged segments. On a loss or timeout, the sender retransmits the lost segment and every segment sent after it; the receiver discards out-of-order segments.
Selective Repeat
Window of N. Only retransmit lost segments. Receiver buffers out-of-order segments.
TCP uses a hybrid: cumulative ACKs (like GBN) + SACK (like SR) + fast retransmit.
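The difference in retransmission cost between the two window schemes can be sketched with a toy model. It assumes all segments in the window were sent once and the sender learns exactly which ones were lost; `retransmissions` is a hypothetical helper, not a protocol implementation.

```python
def retransmissions(in_flight: list, lost: set, scheme: str) -> list:
    """Segments the sender re-sends for one window, per ARQ scheme.

    `in_flight` lists sequence numbers in send order; `lost` is the set
    of those that never arrived; `scheme` is "gbn" or "sr".
    """
    lost_positions = [i for i, seq in enumerate(in_flight) if seq in lost]
    if not lost_positions:
        return []
    if scheme == "gbn":
        # Go-Back-N: everything from the first loss onward is sent again.
        return in_flight[lost_positions[0]:]
    if scheme == "sr":
        # Selective Repeat: only the lost segments are sent again.
        return [in_flight[i] for i in lost_positions]
    raise ValueError(f"unknown scheme: {scheme!r}")

window = list(range(8))   # segments 0..7 in flight
gbn = retransmissions(window, {2, 5}, "gbn")
sr = retransmissions(window, {2, 5}, "sr")
```

With segments 2 and 5 lost out of 8, Go-Back-N re-sends six segments (2 through 7) while Selective Repeat re-sends two, at the cost of receiver-side buffering.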
Port Numbers
| Range | Type | Examples |
|---|---|---|
| 0-1023 | Well-known (privileged) | HTTP(80), HTTPS(443), SSH(22), DNS(53) |
| 1024-49151 | Registered | PostgreSQL(5432), MySQL(3306), Redis(6379) |
| 49152-65535 | Dynamic/Ephemeral | Client-side source ports |
Socket: (IP address, port) pair. A connection is identified by a 4-tuple: (src_IP, src_port, dst_IP, dst_port).
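The 4-tuple is what lets one server port carry many simultaneous connections. The following sketch models demultiplexing with a dictionary keyed by the 4-tuple; the addresses are example values and the names `FourTuple` and `demultiplex` are illustrative, not taken from any real stack.

```python
from typing import NamedTuple

class FourTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int

def demultiplex(segments: list) -> dict:
    """Group incoming payloads by the connection they belong to."""
    connections = {}
    for key, payload in segments:
        connections.setdefault(key, []).append(payload)
    return connections

# Two clients, and two connections from the same client, all to port 80.
a1 = FourTuple("192.0.2.10", 49152, "198.51.100.1", 80)
a2 = FourTuple("192.0.2.10", 49153, "198.51.100.1", 80)
b1 = FourTuple("192.0.2.20", 49152, "198.51.100.1", 80)

table = demultiplex([(a1, b"GET /"), (b1, b"GET /img"),
                     (a2, b"POST /"), (a1, b"Host: x")])
```

All three connections share the same destination IP and port, yet each has its own entry because at least one element of the 4-tuple differs.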
Applications in CS
- Web performance: Understanding TCP congestion control, handshake latency, and QUIC explains why pages load fast or slow.
- API design: Choose TCP (reliability needed) vs UDP (low latency, loss tolerated) vs QUIC (reliable streams with faster setup and no head-of-line blocking between streams).
- Database connections: Connection pooling, TCP keepalives, connection timeouts.
- Streaming: UDP/RTP for real-time media, HTTP over TCP or QUIC for buffered and adaptive-bitrate playback.
- Load balancing: L4 load balancers operate at TCP/UDP level. Connection draining, health checks.
- Debugging: Wireshark, tcpdump — understanding TCP segments and flags is essential.