Networking Fundamentals
Every system design interview and every production architecture depends on the network. Understanding protocols, latency, and how data moves between machines is foundational. This file covers what every system designer needs to know, without going deeper than necessary.
TCP & UDP
TCP (Transmission Control Protocol)
TCP provides reliable, ordered delivery of bytes between two endpoints.
- Three-way handshake to establish a connection (SYN, SYN-ACK, ACK)
- Retransmits lost packets automatically
- Flow control and congestion control prevent overwhelming the receiver or network
- Higher latency due to handshake and retransmission overhead
Use for: HTTP/HTTPS, database connections, file transfers, email — anything where losing data is unacceptable.
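The properties above can be seen in a minimal sketch using Python's standard `socket` module. The echo server, the loopback address, and the helper names (`run_echo_server`, `tcp_round_trip`) are illustrative assumptions, not part of any particular system.

```python
import socket
import threading


def run_echo_server(server_sock: socket.socket) -> None:
    """Accept one client and echo its bytes back until it stops sending."""
    # accept() returns a connection whose three-way handshake is already done.
    conn, _addr = server_sock.accept()
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:  # empty read: the client closed its sending side
                break
            conn.sendall(data)


def tcp_round_trip(message: bytes) -> bytes:
    """Send message over TCP on loopback and return what the server echoes."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]
    worker = threading.Thread(target=run_echo_server, args=(server_sock,), daemon=True)
    worker.start()

    chunks = []
    # create_connection() performs the SYN / SYN-ACK / ACK exchange.
    with socket.create_connection(("127.0.0.1", port), timeout=5) as client:
        client.sendall(message)
        client.shutdown(socket.SHUT_WR)  # tell the server no more data is coming
        while True:
            chunk = client.recv(1024)
            if not chunk:
                break
            chunks.append(chunk)

    worker.join(timeout=5)
    server_sock.close()
    return b"".join(chunks)
```

TCP is a byte stream, so the client reads in a loop until the peer closes rather than assuming one `recv` call returns the whole message.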
UDP (User Datagram Protocol)
UDP sends datagrams with no connection setup, no ordering guarantees, and no retransmission.
- No handshake — lower latency for first message
- No head-of-line blocking — lost packets don't stall subsequent ones
- Application must handle reliability if needed
Use for: DNS lookups, video streaming, voice/video calls, online games — where speed matters more than perfect delivery.
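Because UDP offers no retransmission, any reliability has to live in the application. The sketch below shows one simple approach, a request with a timeout and a bounded number of retries, against a one-shot echo server on loopback. The helper names, retry count, and timeout are assumptions chosen for illustration.

```python
import socket
import threading


def udp_echo_once(server_sock: socket.socket) -> None:
    """Receive a single datagram and send it straight back to its sender."""
    data, addr = server_sock.recvfrom(2048)
    server_sock.sendto(data, addr)


def request_with_retry(payload: bytes, server_addr, retries: int = 3,
                       timeout_s: float = 0.5) -> bytes:
    """Send a datagram and wait for a reply, resending if none arrives in time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.settimeout(timeout_s)
        for _attempt in range(retries):
            client.sendto(payload, server_addr)  # no handshake before the first send
            try:
                reply, _addr = client.recvfrom(2048)
                return reply
            except socket.timeout:
                continue  # the request or the reply was lost; try again
    raise TimeoutError(f"no reply after {retries} attempts")


def demo() -> bytes:
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server_sock.bind(("127.0.0.1", 0))
    worker = threading.Thread(target=udp_echo_once, args=(server_sock,), daemon=True)
    worker.start()
    try:
        return request_with_retry(b"ping", server_sock.getsockname())
    finally:
        worker.join(timeout=2)
        server_sock.close()
```

A retry like this is only safe when the request is idempotent, which is one reason DNS queries suit UDP well.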
TCP vs UDP at a Glance
| Property | TCP | UDP |
| --- | --- | --- |
| Connection | Yes (handshake) | No |
| Ordering | Guaranteed | Not guaranteed |
| Reliability | Retransmission | None (app handles) |
| Latency | Higher | Lower |
| Use cases | HTTP, DB, email | DNS, streaming, gaming |
HTTP & HTTPS
HTTP is the application-layer protocol that powers the web. HTTPS is HTTP over TLS, adding encryption and authentication.
HTTP/1.1
- One request at a time per TCP connection (pipelining exists but is poorly supported), so browsers open several connections in parallel
- Text-based headers
- Keep-alive connections reduce handshake overhead for multiple requests
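Keep-alive can be observed with the standard library alone. The following sketch starts a throwaway local server and sends two requests over one `http.client.HTTPConnection`. The handler, paths, and function name are hypothetical. An unchanged client-side port indicates that the same TCP connection was reused.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class OkHandler(BaseHTTPRequestHandler):
    # Speaking HTTP/1.1 makes the server keep the connection open by default.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        # Content-Length marks the end of the body without closing the socket.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_twice_on_one_connection():
    """Return (status, body, local_port) for two requests on one connection."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), OkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    results = []
    try:
        for path in ("/first", "/second"):
            conn.request("GET", path)
            response = conn.getresponse()
            body = response.read()  # drain the body before reusing the connection
            local_port = conn.sock.getsockname()[1]
            results.append((response.status, body, local_port))
    finally:
        conn.close()
        server.shutdown()
        server.server_close()
    return results
```

Only the first request pays for the TCP handshake; the second reuses the established connection.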
HTTP/2
- Multiplexing: many requests and responses share a single TCP connection
- Binary framing: more efficient parsing
- Header compression (HPACK) reduces overhead for repeated headers
- Server push: server can send resources before the client asks (rarely used in practice; Chrome has since removed support)
HTTP/3
- Runs over QUIC (which runs over UDP) instead of TCP
- Eliminates TCP head-of-line blocking: a lost packet in one stream doesn't stall others
- Faster connection establishment: the transport and TLS handshakes are combined into 1 round trip, with 0-RTT possible when resuming a previous session
- Built-in encryption (TLS 1.3 is mandatory)
HTTPS & TLS
TLS (Transport Layer Security) encrypts traffic between client and server.
- A full TLS 1.2 handshake adds 2 round trips (1 with session resumption); TLS 1.3 adds 1 round trip (0 with 0-RTT resumption)
- Certificates verify server identity, preventing man-in-the-middle attacks
- Always use HTTPS in production — the performance cost is negligible with modern hardware
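On the client side, certificate verification is what gives protection against man-in-the-middle attacks. A minimal sketch with Python's `ssl` module follows; the function name and the choice of TLS 1.2 as the minimum version are assumptions for illustration.

```python
import ssl


def make_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context that verifies the server's identity."""
    # create_default_context() loads the system CA certificates and turns on
    # certificate verification and hostname checking.
    context = ssl.create_default_context()
    # Refuse protocol versions older than TLS 1.2 (an assumed policy).
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


# Typical use, not executed here because it needs a real server:
#   with socket.create_connection((host, 443)) as raw_sock:
#       with make_client_context().wrap_socket(raw_sock, server_hostname=host) as tls:
#           ...
```

Disabling `check_hostname` or setting `verify_mode` to `ssl.CERT_NONE` keeps the encryption but removes the authentication, so the connection can be intercepted.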
HTTP Methods & Status Codes
| Method | Purpose | Typical success status |
| --- | --- | --- |
| GET | Read a resource | 200 OK |
| POST | Create a resource | 201 Created |
| PUT | Replace a resource | 200 OK / 204 No Content |
| PATCH | Partially update a resource | 200 OK |
| DELETE | Remove a resource | 204 No Content |

| Status class | Meaning (examples) |
| --- | --- |
| 4xx | Client error (400 Bad Request, 401 Unauthorized, 404 Not Found, 429 Too Many Requests) |
| 5xx | Server error (500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable) |
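The status classes above map directly onto numeric ranges. As a small sketch, the helpers below (hypothetical names) classify a code and pair it with the standard reason phrase from Python's `http.HTTPStatus`.

```python
from http import HTTPStatus


def status_family(code: int) -> str:
    """Map a status code to its class by numeric range."""
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirection"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "other"  # includes 1xx informational responses


def describe(code: int) -> str:
    """Return a line such as '404 Not Found: client error'."""
    return f"{code} {HTTPStatus(code).phrase}: {status_family(code)}"
```

The distinction matters operationally: a 4xx means the caller should change the request, while a 5xx means the server failed on a request that may have been valid.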
DNS (Domain Name System)
DNS translates domain names to IP addresses. It is the phone book of the internet.
Resolution Process
1. Browser checks local cache
2. OS checks its cache
3. Query goes to a recursive resolver (usually the ISP's, or a public one such as 8.8.8.8)
4. Resolver queries root name servers -> TLD servers -> authoritative name server
5. Authoritative server returns the IP address
6. Result cached at every level with a TTL
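The caching in step 6 can be sketched as a small TTL cache placed in front of an upstream lookup. The class name, the `upstream` callable, and the injectable clock are assumptions made so the behaviour is easy to test. A real resolver also handles multiple record types, negative caching, and more.

```python
import time
from typing import Callable, Dict, Tuple


class TtlDnsCache:
    """Cache name -> IP answers until their TTL expires."""

    def __init__(self, upstream: Callable[[str], Tuple[str, float]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._upstream = upstream  # returns (ip, ttl_seconds) for a name
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (ip, expires_at)
        self.upstream_queries = 0

    def resolve(self, name: str) -> str:
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and entry[1] > now:
            return entry[0]  # still fresh: answer from cache
        ip, ttl = self._upstream(name)  # expired or unknown: ask the next level
        self.upstream_queries += 1
        self._entries[name] = (ip, now + ttl)
        return ip
```

With a 60-second TTL, a changed record can take up to 60 seconds to reach clients of this cache, which is the failover cost that TTL choices trade against query volume.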
Record Types
- A: Maps domain to IPv4 address
- AAAA: Maps domain to IPv6 address
- CNAME: Alias from one domain to another
- MX: Mail server for the domain
- TXT: Arbitrary text (used for SPF, DKIM, domain verification)
- NS: Delegates a zone to a name server
DNS in System Design
- Low TTL (30-60 seconds) for services that need fast failover
- High TTL (hours/days) for stable services to reduce DNS query load
- GeoDNS returns different IPs based on client location for global load balancing
- DNS is a single point of failure if your provider goes down — use multiple DNS providers
WebSockets
WebSockets provide full-duplex, persistent communication over a single TCP connection.
How They Work
- Client sends an HTTP Upgrade request
- Server agrees and the connection switches to the WebSocket protocol
- Both sides can send messages at any time without new HTTP requests
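One concrete piece of the upgrade handshake is how the server proves it understood the request. Per RFC 6455, it appends a fixed GUID to the client's `Sec-WebSocket-Key`, hashes the result with SHA-1, and returns the base64 digest as `Sec-WebSocket-Accept`. A sketch of that computation (the function name is an assumption):

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client's handshake key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

In practice a WebSocket library performs this step; it is shown here only to make the handshake less opaque.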
When to Use
- Real-time features: chat, live notifications, collaborative editing
- Streaming data: stock tickers, sports scores, live dashboards
- Any scenario where the server needs to push data to the client frequently
When Not to Use
- Simple request-response APIs — regular HTTP is simpler and better supported
- Infrequent updates — Server-Sent Events (SSE) are simpler for one-way server-to-client
Scaling WebSockets
WebSocket connections are stateful and long-lived. This complicates load balancing:
- Use sticky sessions or a connection registry so messages reach the right server
- Consider a pub/sub backbone (Redis Pub/Sub, Kafka) so any server can broadcast to any connected client
- Monitor connection counts — each connection consumes memory and a file descriptor
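The pub/sub approach can be sketched in-process. `InMemoryBus` below is a hypothetical stand-in for Redis Pub/Sub or Kafka, and each connection is modelled as a list of delivered messages rather than a real socket. Every server publishes to the bus, and only the server holding the user's connection delivers.

```python
from typing import Callable, Dict, List


class InMemoryBus:
    """Hypothetical stand-in for a pub/sub backbone; delivers synchronously."""

    def __init__(self) -> None:
        self._handlers: List[Callable[[str, str], None]] = []

    def subscribe(self, handler: Callable[[str, str], None]) -> None:
        self._handlers.append(handler)

    def publish(self, user_id: str, message: str) -> None:
        for handler in self._handlers:
            handler(user_id, message)


class WsServer:
    """One WebSocket server instance with its own local connection registry."""

    def __init__(self, name: str, bus: InMemoryBus) -> None:
        self.name = name
        self.connections: Dict[str, List[str]] = {}  # user_id -> delivered messages
        self._bus = bus
        bus.subscribe(self._on_bus_message)

    def connect(self, user_id: str) -> None:
        self.connections[user_id] = []

    def send_to_user(self, user_id: str, message: str) -> None:
        # Publish instead of delivering directly: the user may be on another server.
        self._bus.publish(user_id, message)

    def _on_bus_message(self, user_id: str, message: str) -> None:
        outbox = self.connections.get(user_id)
        if outbox is not None:  # only the server holding the connection delivers
            outbox.append(message)
```

The trade-off is that every server sees every message; at larger scale, per-user or per-room channels narrow the fan-out.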
Latency
Latency is the time a message takes to travel from sender to receiver. Round-trip time (RTT) is the time there and back, which is what a request-response exchange actually pays.
Latency Numbers Every Designer Should Know
| Operation | Latency |
| --- | --- |
| L1 cache reference | 1 ns |
| L2 cache reference | 4 ns |
| Main memory reference | 100 ns |
| SSD random read | 15,000 ns (15 us) |
| HDD random read | 5,000,000 ns (5 ms) |
| Round trip within same datacenter | 500,000 ns (0.5 ms) |
| Round trip US coast to coast | 40,000,000 ns (40 ms) |
| Round trip US to Europe | 80,000,000 ns (80 ms) |
| Round trip US to Asia | 150,000,000 ns (150 ms) |
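The long-distance figures are bounded by physics rather than by hardware. As a rough sketch, light in optical fiber travels at about two thirds of its vacuum speed, roughly 200,000 km/s, which sets a floor on round-trip time. The 4,000 km coast-to-coast distance used below is an assumed round figure.

```python
# Approximate speed of light in optical fiber (about 2/3 of c in vacuum).
SPEED_IN_FIBER_KM_PER_S = 200_000


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over fiber, ignoring routing and queuing."""
    return 2 * distance_km * 1000 / SPEED_IN_FIBER_KM_PER_S


# Assumed straight-line distance for a US coast-to-coast path.
coast_to_coast_ms = min_round_trip_ms(4_000)
```

With these assumptions the bound comes out at 40 ms, in line with the coast-to-coast figure. No amount of server tuning removes this cost, so the only fix is moving the data closer to the user.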
Reducing Latency
- CDN: Serve content from edge nodes close to users
- Caching: Avoid round trips to the database
- Connection pooling: Reuse TCP connections instead of establishing new ones
- Async processing: Don't make the user wait for work that can happen in the background
- Regional deployment: Run services in multiple regions
Bandwidth
Bandwidth is the maximum rate of data transfer over a network link.
Estimation
When designing systems, estimate bandwidth needs:
Example: Video streaming service
- 1 million concurrent viewers
- 5 Mbps per stream (1080p)
- Total bandwidth: 5 Tbps outbound
Example: Chat application
- 10 million active users
- 50 messages/day/user, average 200 bytes
- Daily data volume: ~100 GB, or roughly 9 Mbps averaged over the day (very modest)
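The two estimates above are simple enough to script, which makes it easy to vary the inputs. The function names below are illustrative, and decimal units are assumed throughout (1 Mbps = 10^6 bits/s, 1 GB = 10^9 bytes).

```python
SECONDS_PER_DAY = 86_400


def streaming_egress_bps(concurrent_viewers: int, mbps_per_stream: int) -> int:
    """Total outbound bits per second for a streaming service."""
    return concurrent_viewers * mbps_per_stream * 1_000_000


def chat_daily_bytes(active_users: int, messages_per_user: int, avg_message_bytes: int) -> int:
    """Total message payload bytes sent per day."""
    return active_users * messages_per_user * avg_message_bytes


def average_bps(daily_bytes: int) -> float:
    """Average bits per second if the daily volume were spread evenly."""
    return daily_bytes * 8 / SECONDS_PER_DAY


video_bps = streaming_egress_bps(1_000_000, 5)      # 5 * 10^12 bits/s = 5 Tbps
chat_bytes = chat_daily_bytes(10_000_000, 50, 200)  # 10^11 bytes = 100 GB
chat_avg_bps = average_bps(chat_bytes)              # a little over 9 Mbps
```

Real traffic is bursty, so peak rates are typically several times the daily average. These figures also count payload only and ignore protocol overhead.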
Bandwidth vs Latency
High bandwidth doesn't mean low latency. A geostationary satellite link can have high bandwidth (Gbps) but high latency (about 600 ms round trip). For interactive applications, latency matters more than throughput.
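A simple model makes the distinction concrete: fetch time is roughly one round trip plus the payload size divided by bandwidth. The sketch below compares the satellite link with a hypothetical 50 Mbps, 20 ms terrestrial link. Those terrestrial numbers are assumptions, and the model ignores handshakes and congestion control.

```python
def fetch_time_s(size_bytes: int, bandwidth_bps: float, rtt_s: float) -> float:
    """Rough time to request and receive a payload: one RTT plus transfer time."""
    return rtt_s + size_bytes * 8 / bandwidth_bps


SMALL = 10_000         # a 10 KB API response
LARGE = 1_000_000_000  # a 1 GB file

satellite_small = fetch_time_s(SMALL, 1e9, 0.600)      # dominated by the 600 ms RTT
terrestrial_small = fetch_time_s(SMALL, 50e6, 0.020)   # about 22 ms in total
satellite_large = fetch_time_s(LARGE, 1e9, 0.600)      # about 8.6 s
terrestrial_large = fetch_time_s(LARGE, 50e6, 0.020)   # about 160 s
```

For the small response the low-latency link wins by a wide margin; for the bulk transfer the high-bandwidth link wins. Interactive workloads are mostly made of small responses.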
Content Delivery Networks (CDNs)
CDNs are geographically distributed caches that serve content from the edge node closest to the user.
How CDNs Work
1. User requests image.jpg from cdn.example.com
2. DNS resolves to nearest CDN edge node
3. Edge node checks cache:
- Cache hit: return immediately (~5ms)
- Cache miss: fetch from origin server, cache it, return (~100ms+)
4. Subsequent requests from nearby users are cache hits
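The hit/miss flow in step 3 can be sketched as a tiny cache in front of an origin fetch. `EdgeNode` and the `fetch_from_origin` callable are hypothetical names. This sketch deliberately leaves out TTLs, eviction, and cache keys beyond the path.

```python
from typing import Callable, Dict


class EdgeNode:
    """A CDN edge reduced to its core: serve from cache, else fetch and store."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch_from_origin = fetch_from_origin
        self._cache: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> bytes:
        if path in self._cache:
            self.hits += 1  # cache hit: no trip to the origin
            return self._cache[path]
        self.misses += 1  # cache miss: pay the origin round trip once
        body = self._fetch_from_origin(path)
        self._cache[path] = body
        return body
```

The first request for each object at each edge is always a miss, so the hit ratio depends on how many users near that edge request the same content.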
What to Put on a CDN
- Static assets (images, CSS, JS, fonts, videos)
- API responses with appropriate Cache-Control headers (short TTL)
- HTML pages for static or semi-static sites
Cache-Control Headers
| Header | Use for |
| --- | --- |
| Cache-Control: public, max-age=31536000 | Static assets (1 year, versioned filenames) |
| Cache-Control: public, max-age=60 | Semi-dynamic API responses (1 minute) |
| Cache-Control: no-store | Sensitive or highly dynamic data |
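One way to apply these three policies is a small function that picks the header value from the request path. The URL conventions below (a hex content hash in static filenames, a `/api/public/` prefix for cacheable API responses) are assumptions for illustration only.

```python
import re

# Assumed convention: versioned static files carry a hex hash, e.g. app.3f9a1c2b.js
_VERSIONED_ASSET = re.compile(r"\.[0-9a-f]{8,}\.(css|js|png|jpg|woff2)$")


def cache_control_for(path: str) -> str:
    """Choose a Cache-Control value for a response based on its path."""
    if _VERSIONED_ASSET.search(path):
        # The filename changes whenever the content does, so caching for a year is safe.
        return "public, max-age=31536000"
    if path.startswith("/api/public/"):
        # Semi-dynamic data: tolerate up to a minute of staleness.
        return "public, max-age=60"
    # Default to the safe choice for anything sensitive or unrecognized.
    return "no-store"
```

Defaulting to `no-store` means a missing rule costs performance, not privacy.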
Real-World: Cloudflare
Cloudflare operates over 300 PoPs (Points of Presence) worldwide. Beyond caching, it provides DDoS protection, WAF, and Workers (edge compute). Many companies use Cloudflare as both CDN and security layer.
Real-World: Netflix Open Connect
Netflix built its own CDN called Open Connect. Dedicated appliances are placed inside ISP networks, so most streaming traffic can be served without leaving the ISP's network. This reduces latency and transit costs.
Common Pitfalls
- Ignoring DNS TTL. Setting a TTL too high makes failover slow. Setting it too low increases DNS query volume. Match TTL to your failover requirements.
- Not using HTTP/2 or HTTP/3. Multiplexing eliminates the need for HTTP/1.1 workarounds such as domain sharding and sprite sheets. Modern clients and servers support both protocols, so enable them.
- WebSockets for everything. WebSockets add statefulness and complexity. Use them only when the server needs to push data frequently. For occasional updates, SSE or polling is simpler.
- Underestimating latency. Five sequential network calls at 40 ms each cost 200 ms in round trips before any work is done. Run independent calls in parallel and minimize serial round trips.
- No CDN for static assets. Serving images from your origin server wastes bandwidth and increases latency. A CDN is one of the easiest performance wins.
- Ignoring bandwidth costs. Egress bandwidth from cloud providers is expensive. CDNs and compression (gzip, Brotli) reduce both latency and cost.
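The "Underestimating latency" pitfall above can be demonstrated with simulated calls. In this sketch `asyncio.sleep` stands in for a 40 ms network round trip, and the function names are hypothetical. Running the calls in parallel only works when they do not depend on each other's results.

```python
import asyncio
import time


async def fake_call(latency_s: float) -> None:
    """Stand-in for a network call that takes latency_s to complete."""
    await asyncio.sleep(latency_s)


async def run_sequential(calls: int, latency_s: float) -> float:
    start = time.perf_counter()
    for _ in range(calls):
        await fake_call(latency_s)  # each call waits for the previous one
    return time.perf_counter() - start


async def run_parallel(calls: int, latency_s: float) -> float:
    start = time.perf_counter()
    # gather() starts all calls at once, so total time is about one latency.
    await asyncio.gather(*(fake_call(latency_s) for _ in range(calls)))
    return time.perf_counter() - start


def compare(calls: int = 5, latency_s: float = 0.04):
    """Return (sequential_seconds, parallel_seconds) for the simulated calls."""
    sequential = asyncio.run(run_sequential(calls, latency_s))
    parallel = asyncio.run(run_parallel(calls, latency_s))
    return sequential, parallel
```

The sequential total grows with the number of calls, while the parallel total stays close to the latency of the slowest single call.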
Key Takeaways
- TCP provides reliability; UDP provides speed. Most web systems use TCP (via HTTP). Real-time systems may need UDP (via WebRTC, QUIC).
- HTTP/2 and HTTP/3 are significant improvements over HTTP/1.1. Use them.
- DNS resolution comes before the first connection to any hostname. Understand TTLs, GeoDNS, and the resolution chain.
- WebSockets enable real-time bidirectional communication but add complexity. Use them only when needed.
- Latency is the enemy of user experience. CDNs, caching, connection reuse, and regional deployment are your tools.
- Always estimate bandwidth when designing data-heavy systems like streaming or analytics pipelines.