Case Study: Real-Time Chat System
A real-time chat system enables users to exchange messages instantly, supporting one-on-one conversations, group chats, and potentially channels or rooms. Services like WhatsApp, Slack, and Discord each represent different points in the design space, balancing features like message persistence, media sharing, and end-to-end encryption against scalability and latency requirements.
What makes a chat system particularly interesting from a system design perspective is the requirement for bidirectional, low-latency communication. Unlike request-response APIs, chat demands persistent connections between clients and servers, typically via WebSockets. This fundamentally changes how the system handles connection management, load balancing, and server state, since each server must maintain awareness of which users are connected to it.
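To make the server-state point concrete, here is a minimal sketch of a connection registry that records which server holds each user's connections. The class name `ConnectionRegistry`, its methods, and the server IDs are hypothetical names chosen for illustration. The sketch also assumes a single in-memory dictionary, whereas a real fleet would probably keep this mapping in a shared store that every server can consult.

```python
class ConnectionRegistry:
    """Tracks which server holds each of a user's open connections.

    Hypothetical in-memory sketch; not a production design.
    """

    def __init__(self) -> None:
        # user_id -> {connection_id: server_id}
        self._connections: dict[str, dict[str, str]] = {}

    def connect(self, user_id: str, connection_id: str, server_id: str) -> None:
        """Record that a connection for user_id was accepted by server_id."""
        self._connections.setdefault(user_id, {})[connection_id] = server_id

    def disconnect(self, user_id: str, connection_id: str) -> None:
        """Forget one connection; drop the user once no connections remain."""
        connections = self._connections.get(user_id)
        if not connections:
            return
        connections.pop(connection_id, None)
        if not connections:
            del self._connections[user_id]

    def servers_for(self, user_id: str) -> set[str]:
        """Return the servers a message for user_id must be routed to."""
        return set(self._connections.get(user_id, {}).values())

    def is_online(self, user_id: str) -> bool:
        """A user counts as online while at least one connection is open."""
        return user_id in self._connections
```

Keying by connection rather than by user lets one person stay connected from several devices, possibly on different servers, without one device's disconnect marking the others offline.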
The challenges compound with scale. Message ordering must be preserved within conversations even when messages arrive at different servers. Presence detection (showing who is online) requires efficient fan-out to potentially thousands of contacts. Push notifications must reach offline users across multiple platforms. And the entire system must remain responsive even when millions of users are simultaneously connected.
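One common way to keep ordering consistent within a conversation is to assign each message a per-conversation sequence number and have receivers release messages only in sequence. The sketch below illustrates that idea under simplifying assumptions: `ConversationSequencer` and `ReorderBuffer` are hypothetical names, the sequencer is a single in-process counter (a distributed system would need one authoritative sequencer per conversation), and gaps are never timed out.

```python
import itertools
from collections import defaultdict


class ConversationSequencer:
    """Hands out increasing sequence numbers, independently per conversation."""

    def __init__(self) -> None:
        # Each conversation gets its own counter starting at 1.
        self._counters = defaultdict(lambda: itertools.count(1))

    def next_seq(self, conversation_id: str) -> int:
        return next(self._counters[conversation_id])


class ReorderBuffer:
    """Receiver-side buffer that releases one conversation's messages in order."""

    def __init__(self) -> None:
        self._next_expected = 1
        self._pending: dict[int, str] = {}

    def receive(self, seq: int, body: str) -> list[str]:
        """Accept a message and return every message that is now deliverable."""
        if seq < self._next_expected or seq in self._pending:
            return []  # duplicate: already delivered or already buffered
        self._pending[seq] = body
        ready = []
        # Drain the buffer for as long as the next expected message is present.
        while self._next_expected in self._pending:
            ready.append(self._pending.pop(self._next_expected))
            self._next_expected += 1
        return ready
```

Because the buffer ignores sequence numbers it has already seen, the same mechanism also absorbs redelivered messages, which is useful when the transport only guarantees at-least-once delivery.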
Key Challenges
- WebSocket connection management: Maintaining millions of persistent connections across a server fleet, handling graceful reconnection, and load balancing long-lived connections rather than short-lived HTTP requests.
- Presence detection: Tracking and broadcasting online/offline status efficiently, avoiding thundering-herd problems when popular users change state, and tolerating brief disconnections without flapping.
- Message ordering and delivery guarantees: Ensuring messages appear in a consistent order within each conversation, approximating exactly-once delivery (typically at-least-once delivery combined with deduplication on the receiving side), and synchronizing message history across multiple devices.
- Push notifications: Delivering messages to offline users via platform-specific notification services (APNs, FCM), managing device tokens, and respecting user preferences and rate limits.
- Group chat fan-out: Efficiently distributing a single message to hundreds or thousands of group members, some online and some offline, without creating write amplification bottlenecks.
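The fan-out and push-notification challenges above can be sketched together. The example plans the delivery of one group message by batching online members per server and routing offline members to a push-notification path. `FanOutPlan`, `plan_fan_out`, and the `server_by_user` lookup are hypothetical names for illustration, and the sketch assumes each online user has exactly one connection.

```python
from dataclasses import dataclass, field


@dataclass
class FanOutPlan:
    """Where a single group message needs to go."""

    # server_id -> online recipients connected to that server
    by_server: dict[str, list[str]] = field(default_factory=dict)
    # offline recipients to reach through a push service such as APNs or FCM
    push_targets: list[str] = field(default_factory=list)


def plan_fan_out(
    sender_id: str,
    member_ids: list[str],
    server_by_user: dict[str, str],
) -> FanOutPlan:
    """Split group members into per-server batches and offline push targets."""
    plan = FanOutPlan()
    for member in member_ids:
        if member == sender_id:
            continue  # the sender already has the message
        server = server_by_user.get(member)
        if server is None:
            plan.push_targets.append(member)
        else:
            plan.by_server.setdefault(server, []).append(member)
    return plan
```

Batching by server means the message crosses the network once per server rather than once per recipient, which is one way to limit amplification in large groups.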
Prerequisites
- 01-fundamentals -- Networking basics, protocols, and the client-server model that underpin persistent connection architecture.
- 02-scalability -- Horizontal scaling patterns and stateful service considerations for handling millions of concurrent connections.
- 07-messaging-systems -- Message queues, pub/sub patterns, and event-driven architecture used for message routing and delivery.