WhatsApp System Design: Messaging, Ordering & Storage

01Real-Time: WebSocket vs Polling

02WebSocket Gateway Internals

03Cross-Server Routing

04Message Ordering Problem

05TIMEUUID Solution

06Per-Conversation vs Global Order

07Offline Sync Flow

08Client-Server State Sync

09Multi-Region Architecture

10Cross-Region Messaging

OVERVIEW

The Four Hardest Problems in Chat Design

Designing a chat system at WhatsApp's scale is one of the most common system design interview questions — and the one where candidates lose the most points. Requirements and estimation were covered in Design WhatsApp. Today we go deep into the four problems that interviewers specifically probe for: Real-Time Messaging Architecture, Message Ordering, Offline Sync, and Multi-Region Replication.

Concurrent Users

200M

~4,000 WebSocket gateway servers needed

Message Rate

1.2M/s

Cassandra writes at LOCAL_QUORUM

Connections/Server

50K

Each WebSocket gateway server

Cross-Region Latency

~300ms

India → US end-to-end message delivery

Deep Dive 1

Real-Time Messaging Architecture

TRANSPORT

WebSocket vs Long Polling: Why WebSocket Wins

Real-time messaging requires the server to push messages to the client the instant they arrive — the client cannot be polling. Three approaches exist: short polling (wasteful), long polling (acceptable), and WebSocket (optimal). WhatsApp, Slack, Discord, and every major chat system use WebSocket because it provides true bidirectional, full-duplex communication over a single persistent TCP connection.

Long Polling (HTTP request held open, reconnect after each response) vs WebSocket (persistent bidirectional connection)

Figure 1: Long Polling — HTTP request held open, reconnect after each response — vs WebSocket — persistent bidirectional TCP connection

Why Not SSE (Server-Sent Events)?

Server-Sent Events are server-to-client only. Chat requires client-to-server messaging too (sending messages). SSE would require a separate HTTP POST channel for sends, creating two connections per user. WebSocket handles both directions on one connection. Never suggest SSE for a chat system in an interview.

GATEWAY

WebSocket Gateway: Internal Architecture

The WebSocket Gateway is the most critical component in a chat system. Each gateway server manages approximately 50,000 concurrent WebSocket connections. With 200 million concurrent users, you need approximately 4,000 gateway servers. Each server runs four internal components working in concert.

WebSocket Gateway internals — Connection Manager (50K connections), Heartbeat Monitor, Session Registry, Message Router

Figure 2: WebSocket Gateway internals — Connection Manager, Heartbeat Monitor (25s ping), Session Registry updater, Message Router

Component	Responsibility	Key Detail
Connection Manager	Tracks all active WebSocket connections in memory	In-memory map: `user_id → socket_fd`. ~50K entries per server.
Heartbeat Monitor	Detects disconnected clients	Sends ping frame every 25 seconds. No pong = client dropped = clean up session.
Session Registry Updater	Writes user → server mapping to Redis	On connect: `SET session:{user_id} gateway-42 EX 300`. On disconnect: `DEL`.
Message Router	Delivers messages locally or forwards cross-server	If recipient on same server: direct delivery. Else: gRPC to correct gateway.

Interview Narration: Gateway Architecture

"Each WebSocket gateway manages 50K connections. When a connection arrives, the Connection Manager maps the user_id to the socket file descriptor in memory. Simultaneously, the Session Registry writes user_id → gateway_server_id to Redis with a 5-minute TTL (refreshed by heartbeats). The Heartbeat Monitor pings every 25 seconds — no pong means the connection is dead, we clean up Redis, and the client will reconnect. The Message Router checks if the recipient is local (direct delivery) or remote (gRPC to their gateway)."

ROUTING

Cross-Server Message Routing: The Key Challenge

The central challenge of distributed chat: User A is connected to Gateway Server 42. User B is connected to Gateway Server 789. How does a message from A reach B? The answer is the Redis session store: a mapping of user_id → gateway_server_id. The Chat Service looks up B's gateway, then routes the message directly to Gateway 789 via gRPC (~1ms) or through an internal Kafka topic (~5ms).

User A on Gateway 42 sends to User B on Gateway 789 — Redis session store enables cross-server routing via Chat Service

Figure 3: Cross-server routing — Redis session store maps user_id → gateway_server_id. Chat Service routes to the correct gateway via gRPC (~1ms)

The Key Insight Interviewers Look For

The Redis session store (user_id → gateway_server_id) is the most important component in a distributed chat system. Without it, you'd need to broadcast messages to all gateways — O(n) instead of O(1). Every strong-hire answer mentions this lookup by name. If an interviewer asks "how does the message get to the right server?", this is the answer.

Deep Dive 2

Message Ordering — The Problem and Its Solution

PROBLEM

Why Ordering Is Harder Than It Looks

Message ordering seems trivial but is one of the hardest problems in distributed messaging. Due to network jitter, packet retransmission, or load balancer routing, "How are you?" might arrive at the server before "Hello." If the system stores and delivers messages in arrival order, User B sees them out of order.

Figure 4: Network delay causes messages to arrive out of order. Server-assigned TIMEUUID restores correct ordering at storage time.

SOLUTION

Server-Assigned TIMEUUID: The Correct Fix

The server (not the client) assigns a monotonically increasing sequence number to each message as it arrives. In Cassandra, this is a TIMEUUID — a UUID that encodes the server's timestamp with microsecond precision plus a random component to break ties. The TIMEUUID is used as the clustering key in Cassandra, which stores messages sorted by this key within each partition.

Why Server-Assigned, Not Client-Assigned?

Client clocks are unreliable: User A's phone might be 5 minutes ahead. User B's phone might be 3 minutes behind. Client-assigned timestamps would produce ordering chaos. Server-assigned TIMEUUIDs use the server's NTP-synchronized clock, providing a consistent time reference within each conversation. This is a non-negotiable design choice.

Interview Narration: Message Ordering

"When a message arrives at the Chat Service, I assign it a TIMEUUID — this is a Cassandra-native UUID type that encodes the current server timestamp at microsecond precision, plus a random component to break ties within the same microsecond. I store this as the clustering key in Cassandra's messages table, partitioned by conversation_id. Cassandra physically sorts rows by clustering key, so all messages in a conversation are stored in chronological order automatically. I never trust client-side timestamps — client clocks drift and can be manipulated."

SCOPE

Per-Conversation Ordering vs Global Ordering

Per-conversation ordering (Cassandra partition key per conversation) vs global ordering (not needed for chat — bottleneck)

Figure 5: Per-conversation ordering via Cassandra partition key — all WhatsApp needs. Global ordering would be a bottleneck at 1.2M msgs/sec.

WhatsApp needs only per-conversation ordering — users compare message order within a single chat, never across different conversations. Global ordering would require a single coordination point across all 1.2 million messages/second — a fatal bottleneck. Cassandra's partition key (conversation_id) naturally provides per-conversation ordering with no coordination overhead.

Approach	Mechanism	Scalability	Use When
Per-conversation	Cassandra partition: `conversation_id`, clustering: TIMEUUID	Horizontal — each partition independent	Chat (WhatsApp, Slack, Discord) ✓
Global ordering	Single Kafka partition or Zookeeper counter	Single bottleneck — max ~500K/sec	Financial ledgers, audit logs

Deep Dive 3

Offline Sync — Retrieving Missed Messages on Reconnect

RECONNECT

The Offline Sync Flow: 6 Steps

While offline, messages continue arriving and are stored in Cassandra. When the user reconnects, the system must efficiently synchronize all missed messages without re-sending messages the user has already seen. The key mechanism is per-conversation cursors — the client tracks the last message_id it has seen in each conversation.

Offline sync timeline — user was online (last seq=42), goes offline (15 messages arrive), reconnects and syncs the delta

Figure 6: Offline sync timeline — user last saw sequence 42, 15 messages arrived while offline, reconnect fetches only the delta

User B opens the app. The client establishes a WebSocket connection to the nearest gateway server.
The client sends a sync request: {user_id, last_sync_timestamp, per_conversation_cursors}. Each cursor is the message_id of the last message B has seen in each conversation.
The server computes the delta: SELECT * FROM messages WHERE conversation_id = ? AND message_id > last_seen_cursor
The server first pushes unread counts per conversation (instant badge numbers), then streams actual messages in batches of 50, most recent first.
The client stores messages in its local SQLite database, updates cursors, and sends delivery acknowledgments.
If the user typed messages while offline (outbox), the client sends them now. The server deduplicates by client-assigned message_id.

STATE

Client-Server State Synchronization

Figure 7: Complete state synchronization model — client SQLite + cursors + outbox syncing with server Cassandra + Redis

State	Location	Details
Message history	Client: SQLite \| Server: Cassandra	Cassandra is source of truth. Client is a cache of recent messages.
Per-conversation cursors	Client: local storage	Last seen `message_id` per conversation. Drives delta query on reconnect.
Pending outbox	Client: SQLite queue	Messages typed while offline. Sent on reconnect. Server deduplicates by client UUID.
Unread counts	Server: Redis	`HINCRBY unread:{user_id} {conv_id} 1` on every message arrival. Pushed first on sync.
Session state	Server: Redis	`user_id → gateway_server_id`. Cleared on disconnect, set on connect.

Why Batches of 50 (Most Recent First)?

If a user was offline for 3 days with 2,000 missed messages across 40 conversations, dumping all messages at once would block the UI. Streaming in batches of 50, most-recent-first, means the user sees the newest messages immediately while older messages load in the background. The unread count badge is pushed before any messages, so the UI is responsive from the first render.

Deep Dive 4

Multi-Region Replication — Global Architecture

ARCHITECTURE

Three-Region Deployment

WhatsApp serves 2 billion users across every country. Multi-region deployment places infrastructure in 3+ regions. Messages within a region take ~100ms. Cross-region messages require ~200ms for the inter-region hop. Each region contains a complete, independent stack — no single-region dependency for normal operations.

Figure 8: Three-region architecture — India, US East, Europe. Full stack per region. Async Cassandra replication + Kafka MirrorMaker between regions.

Region 1

India (Primary)

WebSocket Gateway cluster, Chat Service, Cassandra nodes, Redis, Kafka. Serves India + South Asia.

Region 2

US East

Complete independent stack. Replicates from India async. Serves Americas. LOCAL_QUORUM consistency.

Region 3

Europe

GDPR-compliant stack. European user data stored locally. Async replication for delivery routing.

FLOW

Cross-Region Message Flow: 8 Steps, ~300ms

Figure 9: Cross-region message — User A (India) sends to User B (US). 8 steps, ~300ms total end-to-end latency.

Message stored in India's Cassandra using LOCAL_QUORUM (~5ms)
Chat Service checks India Redis — User B not connected locally (no session found)
Message published to Kafka cross-region topic in India
Kafka MirrorMaker replicates to US Kafka cluster (~100–200ms)
US Chat Service consumes message from Kafka
US Chat Service looks up User B's session in US Redis → finds Gateway 789
Message routed via gRPC to Gateway 789 (~1ms)
Gateway 789 delivers to User B's WebSocket connection. Total: ~300ms

Design Decision	Choice	Reason	When to Switch
Read consistency	LOCAL_QUORUM	All reads are local. No cross-region read latency needed for chat history.	Never — cross-region reads add 200ms latency with no benefit
Write replication	Local + async replication	Cassandra async replication for durability. Kafka for real-time cross-region delivery.	Sync if financial data requiring global consistency
Conflict resolution	Last-Write-Wins (TIMEUUID)	Safe because chat messages are append-only — no updates.	CRDT if mutable data (e.g., presence, shared docs)
Session store	Per-region, NOT replicated	Avoids global session replication complexity. User is online in one region at a time.	Never — cross-region session replication adds 200ms per connect

CHECKLIST

12-Point Design Checklist

Figure 10: 12-point design checklist — use this to verify your answer covers all four deep-dive areas before wrapping up

Class Summary: The Four Hard Problems, Solved

Real-Time Messaging: WebSocket (not SSE, not polling). 50K connections/server. ~4,000 servers for 200M concurrent users. Sticky LB routing required. Gateway runs Connection Manager + Heartbeat Monitor (25s ping) + Session Registry updater + Message Router.
Cross-Server Routing: Redis session store maps user_id → gateway_server_id. Chat Service lookups Redis, routes via gRPC (~1ms) or Kafka (~5ms). This is the #1 insight interviewers look for.
Message Ordering: Server-assigned TIMEUUIDs (never client timestamps). Cassandra: partition key = conversation_id, clustering key = TIMEUUID. Per-conversation ordering only — global ordering is a bottleneck at 1.2M msgs/sec.
Offline Sync: Per-conversation cursors on reconnect. Delta query: message_id > last_seen_cursor. Unread counts pushed first (Redis HINCRBY). Messages streamed in batches of 50, most recent first. Outbox pattern + client UUID deduplication.
Multi-Region: 3 regions, full independent stack each. LOCAL_QUORUM reads and writes. Async Cassandra replication. Kafka MirrorMaker for cross-region real-time delivery. Session store per-region (not replicated globally). LWW via TIMEUUID for conflict resolution.

Want to Land at Google, Microsoft or Apple?

Watch Pranjal Jain's free 30-min training — the exact GROW Strategy that helped 1,572+ engineers go from TCS/Infosys to top product companies with a 3–5X salary hike.

DSA + System Design roadmap 1:1 mentorship from ex-Microsoft 1,572+ placed · 4.9★ rated

Watch Free Training →

Pre-Class Post-Class → coming soon

WhatsApp System Design
Real-Time Messaging, Ordering & Offline Storage Deep Dive

Contents