500M DAU · 1.2M writes/sec · <300ms delivery latency · 36 PB storage
Part 1

Design WhatsApp

Real-Time Messaging • WebSocket • Cassandra • Kafka Fan-Out • Presence • Read Receipts

Section 1

Requirements & Scale

WHY

The #2 Interview Question

WhatsApp is the second most commonly asked system design question after URL Shortener. It tests fundamentally different skills: real-time bidirectional communication (WebSocket vs REST), massive write throughput (1.2M writes/sec vs 40/sec), offline message handling, group messaging fan-out, presence tracking, and read receipts.

If the URL shortener tests your caching and read-optimization skills, WhatsApp tests your real-time messaging and write-scaling skills.

Figure 1: WhatsApp requirements — 6 functional features, 7 scale numbers. Write-heavy at 1.2M writes/sec with real-time delivery.
Dimension | URL Shortener | WhatsApp
Read/Write ratio | 100:1 (read-heavy) | ~1:1 (write-heavy)
Protocol | REST (stateless) | WebSocket (stateful, persistent)
Write throughput | ~40 writes/sec | 1.2M writes/sec
Storage | 3 TB (5 years) | 36 PB
Key challenge | Cache optimization | Real-time routing + write scale
Database | PostgreSQL (relational) | Cassandra (wide-column, LSM-tree)
The Single Most Important Insight

These two systems are architecturally opposite. URL shortener = stateless REST + read cache. WhatsApp = stateful WebSocket + write throughput. When an interviewer asks you to design a messaging system, the first thing to establish is: "This is a write-heavy, real-time system — so our architecture will be fundamentally different from a URL shortener." Saying this immediately demonstrates architectural maturity.

Section 2

High-Level Design

COMPONENTS

Six Components, One Key Insight

Figure 2: WhatsApp HLD — WebSocket Gateway, Chat Service, Redis session store, Cassandra, Kafka, Push Notification, Presence Service
Component | Responsibility | Key Detail
WebSocket Gateway Cluster | Maintains persistent WebSocket connections with clients. Handles connect, heartbeat, disconnect, and message routing. | 200M concurrent connections ÷ 50K per server = ~4,000 gateway servers
Chat Service | Business logic: validate message, store in Cassandra, look up recipient’s gateway, route the message. | Stateless — any instance handles any message. Horizontally scalable.
Session Store (Redis) | Maps user_id → gateway_server_id. Updated on every connect/disconnect. | The key insight — this is how the Chat Service routes a message to the correct gateway. Sub-millisecond lookups.
Message Store (Cassandra) | All messages persisted, partitioned by conversation_id, clustered by timestamp. Write-once, read-once. | LSM-tree architecture handles 1.2M writes/sec across the cluster. 36 PB total storage.
Kafka | Group message fan-out, push notification delivery, analytics events. Decouples message receipt from delivery. | One group message → one Kafka event → fan-out worker delivers to up to 256 members.
Presence Service (Redis) | Tracks online/offline status. Each client sends a heartbeat every 25 seconds. | Key: presence:{user_id} with 30-second TTL. Key expiry = user offline.
Why Redis Session Store Is the Key Insight

In a WebSocket-based architecture, persistent connections make routing non-trivial. When User A sends a message to User B, the Chat Service cannot broadcast to all 4,000 gateway servers — that would be 4,000 unnecessary deliveries. Instead, it does a single Redis lookup: GET session:{user_b_id} returns the specific gateway server ID. One targeted delivery. This Redis session store is what makes the architecture work at scale, and it’s what interviewers are listening for.
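The routing logic is tiny once the session store exists. A minimal Python sketch, using a plain dict as a stand-in for Redis (`on_connect`, `on_disconnect`, and `route` are illustrative names, not a real API):

```python
# In production this dict is Redis: SET session:{user_id} on connect,
# DEL on disconnect, GET on every message send.
sessions = {}  # user_id -> gateway_server_id

def on_connect(user_id, gateway_id):
    sessions[user_id] = gateway_id

def on_disconnect(user_id):
    sessions.pop(user_id, None)

def route(recipient_id):
    # One targeted lookup instead of broadcasting to ~4,000 gateways.
    return sessions.get(recipient_id)  # None => recipient is offline
```

The whole point is that `route` costs one O(1) lookup, independent of the number of gateway servers.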

Section 3

Message Delivery: Online, Offline & Group

PATHS

Three Delivery Paths for Three Scenarios

Figure 3: Three delivery paths — online (~100–200ms via WebSocket), offline (store + push notification), group (Kafka fan-out)
Path 1: Online Delivery (~100–200ms)
  1. User A sends a message via their WebSocket connection to the gateway.
  2. Gateway forwards to Chat Service.
  3. Chat Service stores in Cassandra (write-ahead for durability).
  4. Chat Service looks up User B’s gateway server in Redis (GET session:{user_b_id}).
  5. Routes the message to B’s specific gateway server.
  6. Gateway pushes message to B’s device via existing WebSocket connection.
  7. B’s device ACKs receipt → triggers “delivered” status update back to A.
Path 2: Offline Delivery (Store + Push)

If User B is not in the session store (not connected), the Chat Service stores the message in Cassandra and sends a push notification via APNS (iOS) or FCM (Android). When B opens the app and reconnects, the client fetches all unread messages: SELECT * FROM messages WHERE conversation_id = ? AND message_id > last_seen_id. This gives B all missed messages in order, regardless of how long they were offline.
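The online/offline branch reduces to "store first, then pick a path". A hedged sketch in Python, where dicts stand in for Cassandra and Redis and the `push_to_gateway` / `send_push` callbacks are hypothetical, not WhatsApp's real interfaces:

```python
def deliver(message, message_store, sessions, push_to_gateway, send_push):
    """Store first for durability, then choose the delivery path."""
    # 1. Persist before any delivery attempt (the Cassandra write).
    message_store.setdefault(message["conversation_id"], []).append(message)

    # 2. Session lookup decides the path (the Redis GET).
    gateway = sessions.get(message["to"])
    if gateway is not None:
        push_to_gateway(gateway, message)   # Path 1: live WebSocket push
        return "online"
    send_push(message["to"])                # Path 2: APNS/FCM notification
    return "offline"
```

Note the ordering: the durable write happens before the session lookup, so a crash between the two steps loses a delivery attempt, never a message.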

Path 3: Group Delivery (Kafka Fan-Out)

For a group message (up to 256 members), the Chat Service publishes the message to Kafka (topic: group.{group_id}). A Group Fan-Out Worker consumes the event, reads the group’s member list, and delivers to each member:

  • Online members: message routed to their WebSocket gateway via the session store.
  • Offline members: push notification via APNS/FCM.

One Kafka event triggers up to 256 deliveries. Without Kafka, the Chat Service would need to make 256 synchronous calls, blocking for each one.
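The fan-out worker's consume loop is conceptually simple. A sketch under the same assumptions as above (`deliver_online` / `deliver_push` are hypothetical helpers; a real worker would poll these events from the Kafka topic):

```python
def fan_out(event, members, sessions, deliver_online, deliver_push):
    """Consume one group-message event and deliver to every member."""
    online = offline = 0
    for member in members:
        if member == event["sender"]:
            continue                      # don't echo back to the sender
        gateway = sessions.get(member)    # session-store lookup per member
        if gateway is not None:
            deliver_online(gateway, event["message"])
            online += 1
        else:
            deliver_push(member)          # APNS/FCM for offline members
            offline += 1
    return online, offline
```

Because the worker runs asynchronously behind Kafka, the Chat Service returns to the sender after one publish, not after up to 256 deliveries.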

Section 4

Message Schema & Read Receipts

SCHEMA

Cassandra Schema & Receipt State Machine

Figure 4: Cassandra schema (8 columns, partition by conversation_id) and read receipt state machine: sent ✓ → delivered ✓✓ → read (blue)
Cassandra Messages Table
CREATE TABLE messages (
  conversation_id  UUID,
  message_id       TIMEUUID,          -- time-ordered, globally unique
  sender_id        UUID,
  content          TEXT,
  content_type     TEXT,              -- text | image | video | audio
  status           TEXT,              -- sent | delivered | read
  created_at       TIMESTAMP,
  is_deleted       BOOLEAN,
  PRIMARY KEY (conversation_id, message_id)
)
WITH CLUSTERING ORDER BY (message_id DESC)   -- newest first
AND default_time_to_live = 7776000;          -- 90-day auto-expiry

Why Cassandra over PostgreSQL? At 1.2M writes/sec, PostgreSQL’s B-tree index maintenance under heavy write load becomes a bottleneck. Cassandra’s LSM-tree writes are always sequential (append to MemTable, flush to SSTable) — no random I/O, no index contention. Partition key conversation_id ensures all messages for a conversation land on the same node, making chat history reads a single-partition scan.
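Python's standard library can illustrate why a TIMEUUID makes a good clustering key: `uuid.uuid1()` embeds a 60-bit timestamp, so IDs generated later carry a larger time field, while the node and clock-sequence bits keep them unique. (This is an analogy for Cassandra's TIMEUUID, not its exact byte layout.)

```python
import uuid

# uuid1 = timestamp + clock sequence + node: time-ordered and unique.
first = uuid.uuid1()
second = uuid.uuid1()

# The embedded timestamp is non-decreasing within a process...
assert first.time <= second.time
# ...and the IDs stay globally distinct even at the same clock tick.
assert first != second
```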

Receipt State | Trigger | Symbol | Implementation
Sent | Server stored the message in Cassandra | ✓ (single grey tick) | Chat Service updates status, sends event back to sender’s WebSocket
Delivered | Recipient’s device ACKed receipt via WebSocket | ✓✓ (double grey tick) | Gateway receives ACK, Chat Service updates Cassandra, notifies sender
Read | Recipient opened the chat | ✓✓ (blue) | Client sends read event, Chat Service updates Cassandra, notifies sender
Group Receipt Batching

For groups, generating one receipt event per member per message would create a message storm — a 256-member group exchanging 100 messages/minute would produce 25,600 delivery-receipt events/minute. Instead, receipts are batched: one “delivered to 180/256” counter per message, updated as each member ACKs. The sender sees an aggregate count, not 256 individual events.
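The batching logic amounts to keeping one set of ACKs per message and reporting its size. A minimal sketch (`ReceiptBatcher` is an illustrative name, not a real component):

```python
class ReceiptBatcher:
    """Collapse per-member delivery ACKs into one aggregate per message."""

    def __init__(self, group_size):
        self.group_size = group_size
        self._acks = {}  # message_id -> set of member_ids that ACKed

    def ack(self, message_id, member_id):
        # A set makes duplicate ACKs (retries, reconnects) idempotent.
        self._acks.setdefault(message_id, set()).add(member_id)

    def status(self, message_id):
        # One counter event for the sender, not one event per member.
        n = len(self._acks.get(message_id, ()))
        return f"delivered to {n}/{self.group_size}"
```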

Section 5

Common Interview Follow-Ups

DEPTH

Five Questions with Production-Grade Answers

Figure 5: Five common follow-up questions — presence, group scaling, message ordering, media delivery, end-to-end encryption
Follow-Up Question | Production-Grade Answer
How does presence work at scale? | Redis key presence:{user_id} with 30-second TTL. A heartbeat every 25 seconds resets the TTL; key expiry = offline. Only show presence to contacts — not all 500M users — filtered in the application layer. ~10 contacts × 500M users = 5B presence queries/day, mostly served from Redis at sub-millisecond latency.
How do groups scale beyond 256 members? | WhatsApp caps groups at 256 members specifically because fan-out cost is O(n). For larger audiences (Channels, Broadcast Lists), switch to a pub/sub model: members subscribe to a topic, the server publishes once, and recipients pull on open. Do not push to all subscribers synchronously.
How do you guarantee message ordering? | Cassandra TIMEUUID as the clustering key: time-ordered and globally unique. Ordering is per-conversation, not global. Clients sort by the TIMEUUID — they never trust their local clock. Within the same millisecond, the TIMEUUID’s unique suffix breaks ties.
How is media delivered? | Media is never sent through the Chat Service. The sender uploads to an S3-equivalent object store and gets a CDN URL. The message payload contains only the CDN URL + metadata (thumbnail, file size); the recipient downloads from the CDN directly. This keeps the Chat Service lean — it handles metadata, never binaries.
How does end-to-end encryption work? | Signal Protocol: each device has a public/private key pair, and the server stores only public keys. Messages are encrypted client-side for the recipient. The server never has the plaintext — it stores and forwards encrypted blobs and cannot read your messages even if subpoenaed.
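The presence mechanism in the first row reduces to "a key that expires unless refreshed". A self-contained Python sketch with explicit timestamps standing in for Redis TTLs (`heartbeat` / `is_online` are illustrative names):

```python
TTL_SECONDS = 30
HEARTBEAT_INTERVAL = 25  # refresh with 5s of slack before expiry

presence = {}  # user_id -> expiry time (Redis: SET presence:{id} 1 EX 30)

def heartbeat(user_id, now):
    presence[user_id] = now + TTL_SECONDS

def is_online(user_id, now):
    # An expired (or absent) key means the client stopped heartbeating.
    return presence.get(user_id, float("-inf")) > now
```

The 25s/30s gap is the design choice: a heartbeat can be lost once without the user flickering offline, yet a dead client shows offline within 30 seconds.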
Part 2

Kafka Deep Dive

Internals • Partitions • Consumer Groups • Replication • 8 Use Cases • Cheat Sheet

Section 6

Kafka Internal Architecture

INTERNALS

Why Kafka Handles 1M+ Messages/Sec

Apache Kafka is the industry-standard event streaming platform, used by LinkedIn (1 trillion messages/day), Netflix, Uber, and virtually every large-scale system. It appears in nearly every system design answer: for event-driven architecture, async processing, analytics pipelines, and message delivery. Understanding why it’s fast gives you the authority to recommend it confidently in interviews.

Figure 6: Kafka internal architecture — broker cluster with partitions, controller for metadata, and five performance mechanisms
# | Mechanism | How It Works | Speedup
1 | Append-Only Commit Log | Each partition is an append-only log. Messages are never modified — only appended. All writes are sequential. Sequential disk I/O achieves 600+ MB/s on modern SSDs. | ~6× faster than random I/O (~100 MB/s)
2 | Log Segments | The log is divided into segments (default 1 GB). The active segment receives writes; older segments are immutable. Time-based (7 days default) or size-based retention limits disk usage. | Bounded disk use + replay capability
3 | Zero-Copy | sendfile() transfers data from the disk page cache directly to the network socket, bypassing the user-space copy. Traditional: disk → kernel → user → socket. Zero-copy: disk → kernel → network. | 50%+ CPU reduction for consumer reads
4 | Page Cache | Kafka uses the OS page cache instead of the JVM heap. Writes go to the page cache first (RAM). Consumer reads of recent data (99% of reads) are served at memory speed, not disk speed. | Memory-speed reads for active data
5 | Batching + Compression | Producers batch multiple messages before sending. Compression (Snappy, LZ4, Zstd) is applied per batch. Per-message overhead becomes per-batch overhead. | 60–80% network reduction
The One Thing to Memorize About Kafka’s Speed

Kafka is fast because of sequential I/O. A traditional message queue does random I/O (update message status, delete after consumption). Kafka only appends. The OS is highly optimized for sequential writes — it can pre-fetch ahead, write in large pages, and use the full bandwidth of the storage device. This is why Kafka can sustain 1M+ messages/sec on commodity hardware. Everything else (zero-copy, page cache, batching) amplifies this core advantage.
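A toy in-memory version makes the model concrete: records are only ever appended, each append returns its offset, and consumers read forward from an offset they track themselves. (Illustrative class; real Kafka segments live on disk and are served through the page cache.)

```python
class PartitionLog:
    """Append-only log for one partition: writes only ever go at the end."""

    def __init__(self):
        self._records = []

    def append(self, record) -> int:
        self._records.append(record)     # strictly sequential writes
        return len(self._records) - 1    # offset of the record just written

    def read(self, offset: int, max_records: int = 100):
        # Consumers poll from their own offset; the log is never mutated.
        return self._records[offset:offset + max_records]
```

Nothing is updated or deleted on consumption, which is exactly what lets the OS stream the underlying file sequentially.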

Section 7

Partitions & Consumer Groups

SCALE

The Foundation of Kafka’s Scalability

Figure 7: 6 partitions consumed by two independent consumer groups — Chat Delivery (3 consumers) and Analytics (2 consumers)

Kafka’s partition model is the foundation of its scalability and its most commonly tested concept in interviews. A topic is divided into N partitions. Each partition is an ordered log. Messages with the same partition key always go to the same partition (hash(key) % N). Within a consumer group, each partition is assigned to exactly one consumer.
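The key-to-partition rule fits in two lines. Kafka's default partitioner actually uses murmur2; `zlib.crc32` below is a stand-in chosen for the standard library, used to show the property that matters: the same key always maps to the same partition.

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    # hash(key) % N: deterministic, so per-key ordering is preserved.
    return zlib.crc32(key.encode()) % num_partitions

# Same conversation => same partition => messages stay ordered.
p = partition_for("conversation-42", 6)
assert p == partition_for("conversation-42", 6)
assert 0 <= p < 6
```

This is also why rule 4 below bites: changing `num_partitions` remaps existing keys, breaking the "same key, same partition" guarantee for historical data.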

The Five Rules of Partitions (Memorize These)
  1. Max parallelism = number of partitions. If you have 6 partitions and 10 consumers in a group, 4 consumers sit idle.
  2. Ordering is per-partition only. No global ordering across partitions. If you need all messages from one user in order, use the user ID as the partition key.
  3. Multiple consumer groups are independent. The same topic can be consumed by a Chat Delivery group AND an Analytics group simultaneously, each at their own offset.
  4. Cannot reduce partitions without recreating the topic. Start with 12–64 partitions. Over-provision — it’s cheap.
  5. Partition key determines ordering. For WhatsApp: partition by conversation_id so all messages in a conversation go to the same partition, preserving ordering.
Scenario | Partition Key Choice | Why
Chat messages | conversation_id | All messages in a conversation stay ordered on one partition
URL click events | short_code | All clicks for one URL are processed by the same consumer — no cross-partition aggregation needed
User activity events | user_id | Ensures per-user event ordering for session reconstruction
Log aggregation | null (round-robin) | No ordering needed; maximize throughput across all partitions
Section 8

Replication & Durability

DURABILITY

acks=all — Never Lose a Message

Figure 8: Partition replication across 3 brokers — Leader handles reads/writes, ISR followers replicate synchronously. Three acks levels explained.

Kafka replicates each partition across multiple brokers (replication factor, typically 3). One replica is the Leader (handles all reads and writes) and the others are Followers in the In-Sync Replica set (ISR). The producer’s acks setting controls the durability guarantee.

acks setting | Durability | Latency | Use When
acks=0 | None — fire and forget. Message lost if the broker crashes before writing. | Lowest (~1 ms) | Metrics/logs where some loss is acceptable
acks=1 | Leader writes to disk, then ACKs. Lost if the leader crashes before followers replicate. | ~5 ms | Moderate durability, lower latency
acks=all | All ISR replicas write to disk before ACK. No data loss if any single broker fails. | ~10 ms | Critical data: messages, payments, orders
For WhatsApp Messages: Always Use acks=all

Use acks=all with min.insync.replicas=2. Every message is stored on at least 2 brokers before the producer gets an ACK. If one broker dies, the message survives on the other. The added latency (~10ms vs ~5ms) is negligible compared to network latency between client and server (~100ms). Messages are too important to lose. In an interview: “For this topic, I’d use replication factor 3, acks=all, min.insync.replicas=2 to ensure zero message loss at a negligible latency cost.”
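As a concrete reference, these are Kafka's real configuration property names, shown here as plain dicts rather than through any particular client library's API (a sketch of the settings, not a runnable producer):

```python
# Producer-side: only count a send as done when every in-sync replica has it.
producer_settings = {
    "acks": "all",          # leader + all ISR followers must persist the write
    "retries": 2147483647,  # retry transient failures instead of dropping
}

# Topic-side: three copies, and at least two must acknowledge each write.
topic_settings = {
    "replication.factor": 3,
    "min.insync.replicas": 2,  # acks=all fails fast if the ISR shrinks below 2
}
```

The interplay is the point: acks=all alone is meaningless if the ISR can shrink to just the leader; min.insync.replicas=2 is what forces every ACKed message onto at least two brokers.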

Section 9

Kafka Use Cases in System Design

APPLY

8 Use Cases That Appear in Every Interview

Figure 9: Eight Kafka use cases that appear in system design interviews
Use Case | How Kafka Helps | Example
Event-Driven Architecture | Services communicate via events instead of direct calls. The producer doesn’t know or care who consumes. | Order placed → Kafka → inventory, billing, notification services consume independently
Message / Task Queue | Competing consumers within a group: one message processed by exactly one consumer. Built-in retry + Dead Letter Queue. | Email sending, video transcoding, background jobs
Stream Processing | Real-time ETL with Kafka Streams or Flink. Windowed aggregations, joins, transformations on live data. | Fraud detection (sliding window over card transactions), real-time dashboards
Log Aggregation | Collect logs from thousands of servers into centralized topics. ELK stack or Splunk consumes downstream. | Microservice logs → Kafka → Elasticsearch for search and alerting
Change Data Capture (CDC) | Debezium reads the DB transaction log and publishes every row change to Kafka. Downstream systems stay in sync without polling. | PostgreSQL changes → Kafka → Elasticsearch (search sync), Redis (cache invalidation)
Notification Fan-Out | One event triggers multiple notification channels without blocking the main flow. | User action → Kafka → push notification + email + SMS consumers in parallel
Analytics Pipeline | Click/event data flows through Kafka to ClickHouse or Druid for aggregation and dashboarding. | URL click events → Kafka → Flink → ClickHouse → analytics dashboard
Chat / Feed Fan-Out | One message triggers delivery to multiple recipients asynchronously. | Group message → Kafka → fan-out worker delivers to 256 members
Summary

Pre-Class Summary & Kafka Cheat Sheet

RECAP

Everything You Need Before Class

WhatsApp: Key Facts to Know Cold
  • Scale: 500M DAU, 100B messages/day, 1.2M writes/sec, <300ms delivery latency, 36 PB storage.
  • Protocol: WebSocket (not REST) — bidirectional, persistent, required for real-time push.
  • Routing secret: Redis Session Store maps user_id → gateway_server_id. This is the key insight.
  • Storage: Cassandra, partitioned by conversation_id. LSM-tree handles 1.2M writes/sec.
  • Group delivery: Kafka fan-out worker. One event → up to 256 deliveries.
  • Presence: Redis key with 30s TTL, heartbeat every 25s. Key expiry = offline.
  • Receipts: sent ✓ → delivered ✓✓ → read (blue ✓✓). Groups batch receipts to avoid message storms.
Kafka: Key Facts to Know Cold
  • Why fast: Append-only commit log = sequential I/O (600+ MB/s). Plus: zero-copy, page cache, batching + compression.
  • Partitions: Max parallelism = partition count. Ordering within partition only. Partition key for per-entity ordering.
  • Consumer groups: Independent groups consume the same topic at their own offset. Multiple use cases from one topic.
  • Durability: RF=3, acks=all, min.insync.replicas=2 for critical data. Zero data loss at ~10ms extra latency.
  • Scale: Start with 12–64 partitions. Cannot reduce without recreating. Over-provision up front.
  • 8 use cases: Event-driven architecture, task queue, stream processing, log aggregation, CDC, notification fan-out, analytics pipeline, chat fan-out.
Figure 10: Complete Kafka interview cheat sheet — 12 key facts to know cold for any system design interview

Want to Land at Google, Microsoft or Apple?

Watch Pranjal Jain's free 30-min training — the exact GROW Strategy that helped 1,572+ engineers go from TCS/Infosys to top product companies with a 3–5X salary hike.

DSA + System Design roadmap · 1:1 mentorship from ex-Microsoft · 1,572+ placed · 4.9★ rated
Watch Free Training →