Design a Real-Time Chat System

A real-time chat system delivers messages between users with minimal latency, handles presence (online/offline status), supports group conversations, & ensures messages are never lost. WhatsApp, Slack, & Telegram each handle billions of messages daily with delivery times under 200ms.

This document covers the full design from requirements through scaling strategies.

Real-time chat system architecture

Functional Requirements

  • One-on-one messaging between two users
  • Group chats with up to 500 members
  • Message delivery with at-least-once guarantee
  • Real-time delivery via persistent connections
  • Presence indicators (online, offline, last seen)
  • Push notifications for offline users
  • Message history & search
  • Read receipts & typing indicators

Non-Functional Requirements

  • Low latency — messages delivered within 200ms for online users
  • High availability — messaging must work even during partial failures
  • Message ordering — messages within a conversation appear in consistent order
  • Durability — no message loss, even during server failures
  • Scale to 500 million active users, 50 billion messages per day

Estimation

Traffic

  • 500 million daily active users (DAU)
  • 50 billion messages/day = ~580,000 messages/second average
  • Peak: ~1.5 million messages/second
  • Average message size: 200 bytes (text); media averages ~50 KB but is stored in object storage & served via CDN, so the message itself carries only a reference

Storage

  • 50 billion messages x 200 bytes = 10 TB/day for text messages
  • Metadata (receipts, timestamps, sender info): ~100 bytes per message = 5 TB/day
  • Over 1 year: ~5.5 PB of message data (before compression)
  • With compression (messages compress well): ~2 PB/year

Connections

  • 500 million DAU with ~200 million concurrent at peak
  • Each connected device maintains one WebSocket connection (about one per concurrent user; multi-device users hold several)
  • 200 million persistent connections distributed across thousands of servers
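The estimates above can be reproduced with a few lines of arithmetic. This is a minimal sketch in decimal units; the 50K-100K connections-per-server range is taken from the gateway scaling section later in this document, & the variable names are illustrative only.

```python
# Back-of-envelope check of the estimates (decimal units: 1 TB = 10**12 bytes).
SECONDS_PER_DAY = 86_400

messages_per_day = 50_000_000_000
text_bytes = 200          # average text payload per message
metadata_bytes = 100      # receipts, timestamps, sender info per message
concurrent_connections = 200_000_000

avg_msgs_per_sec = messages_per_day / SECONDS_PER_DAY
daily_text_tb = messages_per_day * text_bytes / 10**12
daily_meta_tb = messages_per_day * metadata_bytes / 10**12
yearly_pb = (daily_text_tb + daily_meta_tb) * 365 / 1_000

# Per-server connection limits assumed from the scaling section (50K-100K).
gateways_low = concurrent_connections // 100_000
gateways_high = concurrent_connections // 50_000

print(f"{avg_msgs_per_sec:,.0f} messages/s on average")
print(f"{daily_text_tb:.0f} TB/day text + {daily_meta_tb:.0f} TB/day metadata")
print(f"~{yearly_pb:.1f} PB/year before compression")
print(f"{gateways_low:,} to {gateways_high:,} gateway servers")
```

Note that the yearly figure is before compression & before replication; a replication factor of 3 roughly triples the raw footprint.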

High-Level Design

Components

  • Client Apps — mobile & web clients maintaining WebSocket connections
  • WebSocket Gateway — stateful servers managing persistent connections
  • Chat Service — routes messages to the correct recipient(s)
  • Message Store — durable storage for all messages
  • Presence Service — tracks which users are online
  • Group Service — manages group membership & fan-out
  • Push Notification Service — delivers notifications to offline users
  • Media Service — handles image/video/file uploads via object storage & CDN

Message Flow: One-on-One

User A (sender)
  -> WebSocket Gateway (A's server)
  -> Chat Service
  -> Message Store (persist)
  -> Chat Service looks up User B's connection
  -> WebSocket Gateway (B's server)
  -> User B (recipient)

If User B is offline:
  -> Push Notification Service -> APNs / FCM -> User B's device

Message Flow: Group Chat

User A sends message to Group G
  -> WebSocket Gateway -> Chat Service -> Message Store (persist once)
  -> Group Service (fetch member list for Group G)
  -> Fan-out: for each online member, route through their WebSocket Gateway
  -> For offline members, queue push notifications

Detailed Design

WebSocket Connections

WebSockets provide full-duplex communication over a single TCP connection, eliminating the overhead of HTTP polling.

Connection Lifecycle

1. Client opens HTTPS connection to a gateway endpoint
2. HTTP Upgrade handshake -> WebSocket established
3. Client authenticates with a token over the WebSocket
4. Server registers the connection in a connection registry
5. Bidirectional messaging begins
6. On disconnect, server removes the entry & updates presence

Connection Registry

Every WebSocket Gateway instance tracks its local connections. A distributed registry (Redis cluster) maps user IDs to gateway instances.

Connection Registry (Redis)
user_id -> { device_id -> { gateway_host: "ws-gateway-042", connected_at: "...", platform: "ios" } }

When a message arrives for User B, the Chat Service queries the registry to find which gateway holds B's connection. If B has multiple devices, each device has its own entry.
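A minimal in-memory sketch of a per-device registry layout. A plain dict stands in for the Redis cluster (in Redis this might be one hash per user with one field per device, which is an assumption, not a prescribed schema), & `register`, `unregister`, & `gateways_for` are hypothetical helper names.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Connection:
    gateway_host: str
    connected_at: float
    platform: str

class ConnectionRegistry:
    """Maps user_id -> {device_id -> Connection}; stands in for a Redis cluster."""

    def __init__(self) -> None:
        self._entries: dict[int, dict[str, Connection]] = defaultdict(dict)

    def register(self, user_id: int, device_id: str, conn: Connection) -> None:
        # Called by a gateway once the client has authenticated.
        self._entries[user_id][device_id] = conn

    def unregister(self, user_id: int, device_id: str) -> None:
        # Called on disconnect; drop the user key once no devices remain.
        devices = self._entries.get(user_id)
        if devices is None:
            return
        devices.pop(device_id, None)
        if not devices:
            del self._entries[user_id]

    def gateways_for(self, user_id: int) -> dict[str, str]:
        # The Chat Service uses this to reach every device the user has online.
        return {d: c.gateway_host for d, c in self._entries.get(user_id, {}).items()}

registry = ConnectionRegistry()
registry.register(42, "ios-1", Connection("ws-gateway-042", 0.0, "ios"))
registry.register(42, "web-1", Connection("ws-gateway-117", 5.0, "web"))
print(registry.gateways_for(42))  # {'ios-1': 'ws-gateway-042', 'web-1': 'ws-gateway-117'}
```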

Handling Disconnects & Reconnects

  • Clients send periodic heartbeats (every 30 seconds)
  • If 3 heartbeats are missed, the server closes the connection & cleans up the registry
  • On reconnect, the client provides a last_received_message_id to fetch missed messages
  • The server replays any messages the client has not acknowledged
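A sketch of the two server-side checks above, under stated assumptions: timestamps are plain seconds, "3 missed heartbeats" means 3 × 30 s without one, & message IDs are integers that sort in send order (as Snowflake IDs do). `is_stale` & `messages_to_replay` are hypothetical names.

```python
import bisect

HEARTBEAT_INTERVAL_S = 30
MAX_MISSED = 3

def is_stale(last_heartbeat_at: float, now: float) -> bool:
    """True once MAX_MISSED heartbeat intervals have passed without a heartbeat."""
    return now - last_heartbeat_at >= HEARTBEAT_INTERVAL_S * MAX_MISSED

def messages_to_replay(stored_ids: list[int], last_received_message_id: int) -> list[int]:
    """Return the IDs newer than the client's last_received_message_id.

    stored_ids must be sorted ascending; a conversation partition clustered
    by message_id already returns them in this order.
    """
    start = bisect.bisect_right(stored_ids, last_received_message_id)
    return stored_ids[start:]

print(is_stale(last_heartbeat_at=0, now=95))            # True
print(messages_to_replay([101, 105, 110, 120], 105))    # [110, 120]
```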

Message Storage

Messages need durable storage with fast writes & efficient range queries (fetching conversation history).

Schema Design

messages
+-----------------+------------------------------------------------+
| conversation_id | hash of sorted user IDs or group ID            |
| message_id      | snowflake ID (time-sortable)                   |
| client_msg_id   | sender-generated UUID (idempotency key)        |
| sender_id       | bigint                                         |
| content         | text (encrypted at rest)                       |
| content_type    | enum: text, image, video, file                 |
| created_at      | timestamp (derived from message_id)            |
| status          | enum: sent, delivered, read                    |
+-----------------+------------------------------------------------+
primary key: (conversation_id, message_id)

conversations
+-----------------+------------------------------------------------+
| conversation_id | PK                                             |
| type            | enum: one_on_one, group                        |
| members         | list of user IDs                               |
| last_message_at | timestamp (for sorting inbox)                  |
+-----------------+------------------------------------------------+

Database Choice

  • Cassandra or ScyllaDB — excellent for this workload. Partition by conversation_id, cluster by message_id (time-sorted). Reads for a conversation are sequential within a single partition.
  • Write-optimized LSM-tree storage sustains ~580K writes/sec on average (~1.5M at peak) across a large cluster
  • Time-window compaction (Cassandra's TimeWindowCompactionStrategy) works well since old messages are rarely updated

Message ID Generation

Use a Snowflake-like ID that embeds a timestamp. This gives:

  • Global uniqueness without per-message coordination (only machine IDs must be assigned uniquely)
  • Time-based ordering (messages sort correctly by ID)
  • Efficient range queries (fetch messages after ID X)
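
Snowflake ID (64 bits):
| 1 bit: unused (sign) | 41 bits: timestamp (ms since a custom epoch) | 10 bits: machine ID | 12 bits: sequence |

41 bits of milliseconds cover about 69 years, which is why the epoch is set near the service's launch rather than at 1970.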
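A minimal single-process sketch of a Snowflake-style generator. It uses the common Twitter-style layout (1 unused sign bit, 41 bits of milliseconds since a custom epoch, 10 bits of machine ID, 12 bits of sequence); the epoch value is a hypothetical choice, machine IDs are assumed to be assigned out of band, & the clock is injectable so the example is deterministic.

```python
import threading
import time

CUSTOM_EPOCH_MS = 1_704_067_200_000   # hypothetical epoch: 2024-01-01T00:00:00Z
MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_MACHINE_ID = (1 << MACHINE_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1

class SnowflakeGenerator:
    def __init__(self, machine_id: int, clock=lambda: int(time.time() * 1000)):
        if not 0 <= machine_id <= MAX_MACHINE_ID:
            raise ValueError("machine_id must fit in 10 bits")
        self.machine_id = machine_id
        self.clock = clock            # returns wall-clock milliseconds
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to issue IDs")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:        # 4,096 IDs used this ms: wait for the next ms
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            return (((now - CUSTOM_EPOCH_MS) << (MACHINE_BITS + SEQUENCE_BITS))
                    | (self.machine_id << SEQUENCE_BITS) | self.sequence)

def decode(snowflake_id: int) -> tuple[int, int, int]:
    """Split an ID back into (unix_ms, machine_id, sequence)."""
    sequence = snowflake_id & MAX_SEQUENCE
    machine_id = (snowflake_id >> SEQUENCE_BITS) & MAX_MACHINE_ID
    unix_ms = (snowflake_id >> (MACHINE_BITS + SEQUENCE_BITS)) + CUSTOM_EPOCH_MS
    return unix_ms, machine_id, sequence

ticks = iter([1_704_067_200_005, 1_704_067_200_005, 1_704_067_200_006])
gen = SnowflakeGenerator(machine_id=42, clock=lambda: next(ticks))
a, b, c = gen.next_id(), gen.next_id(), gen.next_id()
assert a < b < c
print(decode(b))  # (1704067200005, 42, 1)
```

Refusing to issue IDs when the clock goes backwards is one possible policy; waiting out the skew is another.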

Delivery Guarantees

The system must guarantee at-least-once delivery. Effectively exactly-once behavior is achieved at the application level by deduplicating retries & redeliveries.

Write Path

1. Client sends message with a client-generated UUID (idempotency key)
2. Chat Service writes to Message Store
3. Chat Service sends ACK back to sender ("message stored")
4. Chat Service forwards message to recipient's gateway
5. Recipient's client sends ACK back ("message received")
6. If no ACK within 5 seconds, retry delivery
7. If recipient is offline, queue for push notification & later sync
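Steps 4-6 can be sketched as a small tracker of unacknowledged deliveries. Assumptions: an injected clock in seconds, the 5-second timeout from step 6, & hypothetical names (`PendingAcks`, `due_for_retry`). A real Chat Service would also cap retries & fall back to step 7.

```python
import heapq

ACK_TIMEOUT_S = 5.0

class PendingAcks:
    """Tracks (recipient_device, message_id) deliveries still awaiting a client ACK."""

    def __init__(self) -> None:
        self._deadlines: list[tuple[float, str, int]] = []  # min-heap of (deadline, device, id)
        self._pending: set[tuple[str, int]] = set()

    def sent(self, device: str, message_id: int, now: float) -> None:
        self._pending.add((device, message_id))
        heapq.heappush(self._deadlines, (now + ACK_TIMEOUT_S, device, message_id))

    def acked(self, device: str, message_id: int) -> None:
        # Heap entries for acknowledged deliveries are skipped lazily below.
        self._pending.discard((device, message_id))

    def due_for_retry(self, now: float) -> list[tuple[str, int]]:
        due = []
        while self._deadlines and self._deadlines[0][0] <= now:
            _, device, message_id = heapq.heappop(self._deadlines)
            if (device, message_id) in self._pending:
                due.append((device, message_id))
        return due

tracker = PendingAcks()
tracker.sent("bob-ios", 1001, now=0.0)
tracker.sent("bob-web", 1001, now=0.0)
tracker.acked("bob-ios", 1001)
print(tracker.due_for_retry(now=5.0))  # [('bob-web', 1001)]
```

The caller resends each returned delivery & calls `sent` again, which schedules a fresh deadline.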

Deduplication

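The client-generated UUID prevents duplicate messages if the sender retries. The Message Store stores the UUID with each message & skips any insert whose UUID it has already seen; the check & insert must be atomic (e.g., a conditional write), or two concurrent retries could both succeed. Recipients also deduplicate by message_id in case the same message is delivered twice.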
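A sketch of both deduplication layers. An in-memory dict stands in for the Message Store (a real store needs the check & insert to be atomic, for example Cassandra's `INSERT ... IF NOT EXISTS`), & the `next_message_id` callable is a hypothetical stand-in for the Snowflake generator. Keying the check by `(sender_id, client_uuid)` means a retry gets back the ID assigned the first time instead of creating a second row.

```python
import itertools
import uuid

class MessageStore:
    """Idempotent writes: the same (sender_id, client_uuid) always maps to one message."""

    def __init__(self, next_message_id) -> None:
        self._next_message_id = next_message_id
        self._by_client_key: dict[tuple[int, uuid.UUID], int] = {}
        self.rows: dict[int, str] = {}

    def insert(self, sender_id: int, client_uuid: uuid.UUID, content: str) -> int:
        key = (sender_id, client_uuid)
        if key in self._by_client_key:       # sender retried: return the original ID
            return self._by_client_key[key]
        message_id = self._next_message_id()
        self._by_client_key[key] = message_id
        self.rows[message_id] = content
        return message_id

class RecipientInbox:
    """Client-side dedup: ignore a message_id that was already delivered."""

    def __init__(self) -> None:
        self.seen: set[int] = set()
        self.messages: list[str] = []

    def deliver(self, message_id: int, content: str) -> bool:
        if message_id in self.seen:
            return False
        self.seen.add(message_id)
        self.messages.append(content)
        return True

store = MessageStore(next_message_id=itertools.count(1000).__next__)
key = uuid.uuid4()
first = store.insert(7, key, "hi")
retry = store.insert(7, key, "hi")      # a timeout made the client resend
assert first == retry and len(store.rows) == 1

inbox = RecipientInbox()
inbox.deliver(first, "hi")
inbox.deliver(first, "hi")              # redelivered after a lost ACK
print(inbox.messages)                   # ['hi']
```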

Message Ordering

Ordering is straightforward for one-on-one chats — Snowflake IDs are monotonically increasing per server, and a single conversation typically flows through a consistent server.

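For group chats, messages from different senders may arrive at different servers. The message_id (Snowflake) still gives every recipient the same total order: IDs compare by their millisecond timestamp first, & ties (same millisecond) are broken by machine ID & sequence number. That order only approximates true send time, because clock skew between servers can swap two messages sent very close together, but every client sees the same sequence.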

Clients display messages ordered by message_id. If a message arrives out of order (a later ID appears before an earlier one), the client inserts it in the correct position.
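On the client, keeping the list sorted by message_id is a binary-search insert. A minimal sketch using Python's `bisect` module; it also ignores duplicates, which at-least-once delivery can produce. `ConversationView` is a hypothetical name.

```python
import bisect

class ConversationView:
    """Messages kept sorted by message_id regardless of arrival order."""

    def __init__(self) -> None:
        self.ids: list[int] = []
        self.bodies: dict[int, str] = {}

    def on_message(self, message_id: int, body: str) -> int:
        """Insert in sorted position; return the render index, or -1 for a duplicate."""
        if message_id in self.bodies:
            return -1
        index = bisect.bisect_left(self.ids, message_id)
        self.ids.insert(index, message_id)
        self.bodies[message_id] = body
        return index

    def rendered(self) -> list[str]:
        return [self.bodies[i] for i in self.ids]

view = ConversationView()
view.on_message(300, "see you at 6")
view.on_message(100, "dinner tonight?")   # arrived late, lands first
view.on_message(200, "sure")
print(view.rendered())  # ['dinner tonight?', 'sure', 'see you at 6']
```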

Presence & Online Status

Presence must balance accuracy with scalability. Sending a status update for every connect/disconnect across 200 million users would generate enormous traffic.

Approach: Heartbeat-Based Presence

1. Client sends heartbeat every 30 seconds over WebSocket
2. Presence Service updates a Redis entry with TTL of 90 seconds
3. If TTL expires without renewal, user is marked offline

Presence Store (Redis):
user_id -> { status: "online", last_seen: "2026-03-15T10:30:00Z" }
TTL: 90 seconds
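The three steps map onto a key with an expiry. This sketch emulates Redis-style TTL semantics in memory with an injected clock (a real deployment might refresh the key with `SET key value EX 90`); `PresenceStore` & its methods are hypothetical. One assumption worth flagging: last_seen is kept outside the expiring entry so it survives expiry, since the entry shown above would disappear along with its TTL.

```python
HEARTBEAT_TTL_S = 90

class PresenceStore:
    def __init__(self) -> None:
        self._expires_at: dict[int, float] = {}
        self._last_seen: dict[int, float] = {}

    def heartbeat(self, user_id: int, now: float) -> None:
        # Each heartbeat pushes the expiry 90 s into the future.
        self._expires_at[user_id] = now + HEARTBEAT_TTL_S
        self._last_seen[user_id] = now

    def status(self, user_id: int, now: float) -> dict:
        expires = self._expires_at.get(user_id)
        online = expires is not None and now < expires
        return {"status": "online" if online else "offline",
                "last_seen": self._last_seen.get(user_id)}

presence = PresenceStore()
presence.heartbeat(7, now=0)
presence.heartbeat(7, now=30)
print(presence.status(7, now=100))   # {'status': 'online', 'last_seen': 30}
print(presence.status(7, now=120))   # {'status': 'offline', 'last_seen': 30}
```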

Fan-Out Problem

If User A has 500 contacts, fetching all 500 presence states on app open is expensive. Mitigations:

  • Lazy loading — fetch presence only for visible contacts on screen
  • Subscribe to presence changes only for the current conversation's members
  • For group chats, show presence only for the most recent active members
  • Batch presence queries (fetch 50 at a time)
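Lazy loading & batching combine into a small helper: request presence only for the contacts currently on screen, 50 IDs per round trip. `fetch_batch` is a hypothetical callable (for example, a wrapper around a multi-key read such as Redis `MGET`); here it is a stub.

```python
from typing import Callable, Iterator

BATCH_SIZE = 50

def chunks(ids: list[int], size: int = BATCH_SIZE) -> Iterator[list[int]]:
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

def visible_presence(visible_ids: list[int],
                     fetch_batch: Callable[[list[int]], dict[int, str]]) -> dict[int, str]:
    """One presence round trip per batch of visible contacts; off-screen contacts are skipped."""
    result: dict[int, str] = {}
    for batch in chunks(visible_ids):
        result.update(fetch_batch(batch))
    return result

calls = []
def fake_fetch(batch: list[int]) -> dict[int, str]:
    calls.append(len(batch))
    return {uid: "online" if uid % 2 else "offline" for uid in batch}

on_screen = list(range(1, 121))          # 120 contacts currently in view
statuses = visible_presence(on_screen, fake_fetch)
print(calls)                              # [50, 50, 20]
```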

Group Chats

Small Groups (under 100 members)

Write the message once to the Message Store. Fan-out on read is acceptable — each member fetches from the same conversation_id partition.

Fan-out on delivery (pushing to each online member's WebSocket) is handled in-process. The Chat Service iterates over the member list, looks up each user's gateway, & sends the message.
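The in-process fan-out is a loop over the member list. A minimal sketch, assuming the registry is a dict from user ID to that user's gateway hosts & that `send_to_gateway` / `queue_push` are hypothetical callables. For brevity the sender is skipped entirely (syncing the sender's other devices is left out), & members with no registered device go to the push path.

```python
from typing import Callable

def fan_out(message_id: int, sender_id: int, members: list[int],
            registry: dict[int, list[str]],
            send_to_gateway: Callable[[str, int, int], None],
            queue_push: Callable[[int, int], None]) -> None:
    """Deliver one stored message to every other member of a small group."""
    for user_id in members:
        if user_id == sender_id:
            continue
        gateways = registry.get(user_id, [])
        if not gateways:
            queue_push(user_id, message_id)          # offline: push notification + later sync
            continue
        for host in gateways:                        # one delivery per connected device
            send_to_gateway(host, user_id, message_id)

sent, pushed = [], []
fan_out(
    message_id=9001, sender_id=1, members=[1, 2, 3],
    registry={1: ["ws-gateway-001"], 2: ["ws-gateway-042", "ws-gateway-117"]},
    send_to_gateway=lambda host, uid, mid: sent.append((host, uid)),
    queue_push=lambda uid, mid: pushed.append(uid),
)
print(sent)    # [('ws-gateway-042', 2), ('ws-gateway-117', 2)]
print(pushed)  # [3]
```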

Large Groups (100-500 members)

Fan-out on delivery becomes expensive. Optimizations:

  • Publish to a message queue once & let consumers handle delivery (e.g., a Kafka topic keyed by group_id, which sends all of a group's messages to the same partition & preserves their order)
  • Batch gateway lookups
  • Rate-limit push notifications for high-traffic groups (digest mode)
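Batching gateway lookups means resolving the whole member list in one multi-key registry read, then sending each gateway a single batch instead of one call per member. A sketch under those assumptions; `lookup_many` stands in for the batched registry read & is hypothetical, as are the gateway names.

```python
from collections import defaultdict
from typing import Callable

def plan_deliveries(members: list[int],
                    lookup_many: Callable[[list[int]], dict[int, list[str]]]
                    ) -> tuple[dict[str, list[int]], list[int]]:
    """Group online recipients by gateway host; return (batches, offline_members)."""
    locations = lookup_many(members)          # one round trip for the whole member list
    batches: dict[str, list[int]] = defaultdict(list)
    offline: list[int] = []
    for user_id in members:
        hosts = locations.get(user_id, [])
        if not hosts:
            offline.append(user_id)
        for host in hosts:
            batches[host].append(user_id)
    return dict(batches), offline

registry = {10: ["gw-a"], 11: ["gw-b"], 12: ["gw-a", "gw-b"]}
batches, offline = plan_deliveries(
    [10, 11, 12, 13],
    lookup_many=lambda ids: {i: registry[i] for i in ids if i in registry},
)
print(batches)   # {'gw-a': [10, 12], 'gw-b': [11, 12]}
print(offline)   # [13]
```

Each entry in `batches` then becomes one message to that gateway, so the per-message cost grows with the number of gateways involved rather than the number of members.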

Membership Changes

When a user is added or removed, the Group Service updates the member list & publishes an event. All current members receive a system message. The new member receives the last N messages for context.

Push Notifications

For offline users, the system must deliver a push notification.

1. Chat Service determines recipient is offline (not in connection registry)
2. Chat Service publishes to a notification queue
3. Push Notification Service consumes the event
4. Looks up the user's device tokens (APNs for iOS, FCM for Android)
5. Sends the push notification via the platform's push service
6. Respects user notification preferences (muted chats, DND hours)

Key considerations:

  • Collapse multiple messages from the same conversation into one notification
  • Do not send notification content for encrypted chats — use a generic "New message" text
  • Handle token expiry & rotation (devices re-register tokens periodically)
  • Rate limit notifications per user to avoid spamming
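Collapsing, mute checks, generic text for encrypted chats, & a per-user rate limit can share one batching step. A sketch under stated assumptions: events arrive already grouped into a batching window, `muted` is a set of (user, conversation) pairs, `MAX_PER_WINDOW` is a hypothetical value, & DND hours & the APNs/FCM calls themselves are out of scope.

```python
from dataclasses import dataclass

MAX_PER_WINDOW = 3   # hypothetical per-user cap on notifications per batching window

@dataclass
class Event:
    user_id: int
    conversation_id: str
    preview: str        # plaintext snippet; never used when encrypted is True
    encrypted: bool

def build_notifications(events: list[Event],
                        muted: set[tuple[int, str]]) -> list[tuple[int, str, str]]:
    """Collapse events into at most one (user_id, conversation_id, text) per conversation."""
    grouped: dict[tuple[int, str], list[Event]] = {}
    for event in events:
        key = (event.user_id, event.conversation_id)
        if key in muted:                       # respect per-chat mute preferences
            continue
        grouped.setdefault(key, []).append(event)

    notifications, sent_per_user = [], {}
    for (user_id, conversation_id), group in grouped.items():
        if sent_per_user.get(user_id, 0) >= MAX_PER_WINDOW:
            continue                           # per-user rate limit
        sent_per_user[user_id] = sent_per_user.get(user_id, 0) + 1
        if len(group) > 1:
            text = f"{len(group)} new messages"
        elif group[0].encrypted:
            text = "New message"               # encrypted chat: generic text only
        else:
            text = group[0].preview
        notifications.append((user_id, conversation_id, text))
    return notifications

events = [
    Event(7, "c1", "lunch?", encrypted=False),
    Event(7, "c1", "at 1pm", encrypted=False),
    Event(7, "c2", "secret plan", encrypted=True),
    Event(7, "c3", "muted chatter", encrypted=False),
]
print(build_notifications(events, muted={(7, "c3")}))
# [(7, 'c1', '2 new messages'), (7, 'c2', 'New message')]
```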

End-to-End Encryption (Brief Overview)

WhatsApp-style E2E encryption uses the Signal Protocol:

  • Each device has a public/private key pair
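  • Sender & recipient devices agree on session keys (X3DH key agreement, then the Double Ratchet), so each message is encrypted with a fresh symmetric key that only the recipient's device can derive
  • The server stores only ciphertext & cannot read message content
  • Group messages use per-sender "sender keys", distributed to each member over pairwise encrypted sessions & rotated when membership changes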

The system design accommodates encryption by treating message content as an opaque blob. Search & indexing happen on-device, not server-side.

Trade-Offs & Alternatives

Push vs Pull for Message Delivery

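+------------------+-----------------+-------------------------------+------------+
| Approach         | Latency         | Server Load                   | Complexity |
+------------------+-----------------+-------------------------------+------------+
| WebSocket (push) | Lowest (~100ms) | High (persistent connections) | High       |
| Long polling     | Low (~500ms)    | Medium                        | Medium     |
| Short polling    | High (seconds)  | Highest (wasted requests)     | Low        |
+------------------+-----------------+-------------------------------+------------+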

WebSockets are the clear winner for a real-time chat system. Long polling is a reasonable fallback for environments where WebSockets are blocked.

Fan-Out on Write vs Fan-Out on Read

  • Fan-out on write: when a message is sent, copy it to each recipient's inbox. Fast reads but expensive writes for large groups.
  • Fan-out on read: store the message once, recipients query the conversation. Cheap writes but read amplification for conversations with many messages.

For chat, fan-out on read (single write, partition by conversation) is preferred because conversations are typically read sequentially & Cassandra handles this well.

Message Storage: Cassandra vs Other Options

  • Cassandra/ScyllaDB: best fit for write-heavy, partition-scannable workloads. Production-proven for chat at very large scale (Discord, for example, stores its messages in ScyllaDB after migrating off Cassandra).
  • HBase: similar properties but more operational overhead (HDFS dependency).
  • PostgreSQL with Citus: works at moderate scale & is better for complex queries, but harder to scale to billions of messages/day.

Bottlenecks & Scaling

WebSocket Gateway Scaling

Each server can hold ~50,000-100,000 concurrent connections (limited by file descriptors & memory). For 200 million concurrent users: 2,000-4,000 gateway servers.

  • Use consistent hashing to assign users to gateways
  • When a gateway fails, clients reconnect to another (automatic via load balancer)
  • Stateless routing: the connection registry decouples message routing from specific gateway instances
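The consistent-hashing bullet can be illustrated with a hash ring. A minimal sketch, assuming SHA-256 truncated to 64 bits as the ring hash, 100 virtual nodes per gateway, & hypothetical gateway names; the property it demonstrates is that removing one gateway only remaps the users that were on it.

```python
import bisect
import hashlib

def ring_hash(key: str) -> int:
    # First 8 bytes of SHA-256 as an evenly distributed 64-bit ring position.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        # Each gateway gets `vnodes` points on the ring to even out load.
        self._ring = sorted((ring_hash(f"{node}#{i}"), node)
                            for node in nodes for i in range(vnodes))
        self._positions = [pos for pos, _ in self._ring]

    def node_for(self, user_id: int) -> str:
        # First ring point clockwise from the user's hash, wrapping past the end.
        index = bisect.bisect(self._positions, ring_hash(str(user_id))) % len(self._ring)
        return self._ring[index][1]

gateways = [f"ws-gateway-{n:03d}" for n in range(1, 6)]
before = HashRing(gateways)
after = HashRing([g for g in gateways if g != "ws-gateway-003"])   # one gateway fails

moved = [u for u in range(10_000) if before.node_for(u) != after.node_for(u)]
# Only users previously on ws-gateway-003 change gateways.
assert all(before.node_for(u) == "ws-gateway-003" for u in moved)
print(f"{len(moved)} of 10000 users remapped")
```

With a connection registry in place, the ring only decides where a user lands on reconnect; message routing still goes through the registry lookup.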

Message Store Scaling

  • Partition by conversation_id — keeps all messages for a conversation on the same node
  • Time-based tiering: recent messages on SSD, older messages on HDD or object storage
  • At ~15 TB/day of text & metadata (~45 TB/day of raw writes at replication factor 3, before compression), plan for a large Cassandra cluster (hundreds of nodes)

Presence Service Scaling

  • Redis cluster with sharding by user_id
  • At 200 million online users, each entry ~100 bytes = 20 GB in Redis (fits in memory)
  • Heartbeat updates at 30-second intervals: ~6.7 million writes/second to Redis (sharded across nodes)

Hot Groups

A celebrity's group chat or a busy company-wide channel can sit at the 500-member cap with a high message rate, so every message fans out to hundreds of members simultaneously. Mitigations:

  • Dedicated Kafka partitions for high-traffic groups
  • Cache the member list aggressively (it changes rarely)
  • Stagger push notification delivery to avoid thundering herd

Common Pitfalls

  • Using HTTP polling instead of WebSockets — polling adds latency, wastes bandwidth, & increases server load by orders of magnitude. Always use persistent connections for real-time chat.
  • Storing messages in a relational database without partitioning — a single PostgreSQL instance cannot handle 580K writes/sec. Use a distributed database or shard aggressively.
  • Broadcasting presence updates to all contacts — this creates an O(users x contacts) fan-out storm. Use lazy presence loading & subscribe only to active conversations.
  • Not handling message ordering in group chats — without globally sortable IDs (Snowflake), messages from different senders can appear in different orders for different recipients.
  • Ignoring reconnection & message sync — mobile networks are unreliable. The client must track last_received_message_id & sync missed messages on reconnect.
  • Sending push notification content for encrypted chats — this leaks message content through Apple/Google push infrastructure. Send a generic notification & let the app decrypt locally.

Key Takeaways

  • WebSockets are essential for real-time delivery. Maintain a distributed connection registry so any service can route messages to any connected user.
  • Use Snowflake IDs for messages — they provide uniqueness, ordering, & efficient range queries in a single identifier.
  • Guarantee at-least-once delivery with ACKs & retries. Achieve effective exactly-once through client-generated UUIDs & deduplication.
  • Presence is a fan-out problem. Solve it with heartbeat-based TTLs in Redis & lazy loading on the client side.
  • Group chat fan-out scales differently than one-on-one. Use message queues for large groups & rate-limit push notifications.
  • Partition message storage by conversation, not by user. This keeps conversation history reads fast & sequential.