Concept 1

Synchronous vs Asynchronous Communication

Why Async: The Fundamental Shift

In a synchronous system, Service A calls Service B and waits for a response before continuing. This is simple but creates tight coupling: A depends on B's availability, B's latency, and B's capacity. If B is slow, A is slow. If B is down, A fails. If B cannot keep up with traffic, requests pile up and both services degrade.

In an asynchronous system, Service A sends a message to a queue and immediately moves on. It does not wait for Service B. Service B processes the message whenever it is ready — seconds, minutes, or even hours later. The queue acts as a buffer between them, absorbing traffic spikes, surviving temporary failures, and allowing each service to scale independently.

Synchronous vs asynchronous communication — blocking vs non-blocking
Figure 1: Synchronous — A waits for B (blocked). Asynchronous — A sends to queue and moves on (never blocked).
AspectSynchronous (HTTP/gRPC)Asynchronous (Message Queue)
CouplingTight (A must know B's address)Loose (A only knows the queue)
BlockingA waits for B's responseA continues immediately
Failure handlingIf B is down, A failsQueue buffers; B processes when back
Traffic spikesB overwhelmed → A failsQueue absorbs spike; B processes at steady pace
LatencyA's latency includes B's latencyA's latency is just the queue write (~5ms)
Response to callerImmediate response with resultImmediate ACK, result available later
ComplexitySimple (standard HTTP/gRPC)More complex (queue infra, consumer logic)
Best forReads, queries, real-time responsesWrites, tasks, events, heavy processing
The Flash Sale Problem

An e-commerce site runs a flash sale. In 1 second, 100,000 users click 'Buy.'

Synchronous: Order Service calls Payment, Inventory, and Email Services sequentially. Payment Service handles 5,000 req/sec. Result: 95,000 users get timeout errors.

Asynchronous: Order Service publishes 100,000 order.created events to Kafka in 1 second. Payment, Inventory, and Email services consume at their own pace (5K/sec each). All 100,000 orders processed within 20 seconds. Zero failures.

Interview Tip: Identify the Async Opportunity

In any system design, ask: 'Does the user need to wait for this operation?' If no, make it async.

  • User uploads a video? Return 'processing' immediately, encode in background.
  • User places an order? Return 'order received' immediately, process payment async.
  • User sends a message? Return 'sent' immediately, deliver async.

Concept 2

Message Queue Architecture

A message queue has three components: producers (send messages), the broker (stores and delivers messages), and consumers (process messages). The producer serializes a message (JSON, Protobuf) and sends it to a named queue or topic on the broker. The broker persists the message to disk and delivers it to one or more consumers. The consumer processes the message and sends an acknowledgment (ACK) to the broker, which then removes or marks the message as consumed.

Message queue architecture — producers, broker, consumers, ACK flow
Figure 2: Message queue architecture — producers send to broker; consumers pull, process, and ACK; broker removes on ACK
01

Key Properties

Durability: Messages are persisted to disk on the broker. Even if the broker restarts, messages are not lost. Kafka retains messages for a configurable period (default 7 days) even after consumption, allowing replay.

Ordering: Messages within a single partition (Kafka) or queue (RabbitMQ) are delivered in FIFO order. Across partitions, ordering is not guaranteed. Use a partition key to ensure all related messages go to the same partition.

At-Least-Once Delivery: The default guarantee. The broker delivers each message at least once. If the consumer crashes before sending an ACK, the broker redelivers the message. This means consumers may see duplicates and must handle them (idempotent processing).

Scalability: Add more partitions (Kafka) or queues (RabbitMQ) for parallel processing. Add more consumers to increase throughput. Kafka can handle 1 million+ messages per second on a 3-node cluster.

Concept 3

Six Core Use Cases

Six use cases for message queues
Figure 3: Six core use cases — decouple services, buffer spikes, async processing, event sourcing, fan-out, and task scheduling
1. Decouple Services

Order Service publishes one "order.created" event. Email, Inventory, Analytics each consume independently. Adding a new consumer requires zero changes to the producer.

2. Buffer Traffic Spikes

Queue absorbs 100x spike. Messages queue up; consumers process at max sustainable throughput. User gets immediate "order received." Queue is a shock absorber.

3. Async Heavy Processing

Video encoding, PDF generation, ML inference — API returns immediately ("processing"), worker consumes message and does the heavy work in background.

4. Event Sourcing

Store every state change as an immutable event in Kafka. Complete audit trail, time-travel debugging, rebuild state at any point in time. Used in banking and trading.

5. Fan-Out (Pub/Sub)

One Instagram post event → Feed Service, Notification Service, Analytics Service, Search Service. One publish, four consumers, zero coupling between them.

6. Task Scheduling & Retries

Delayed tasks ("send email in 30 min"), scheduled jobs ("daily reports"), and retry logic (exponential backoff). DLQ for permanent failures after max retries.

Concept 4

Messaging Models: P2P and Pub/Sub

Point-to-point vs pub/sub messaging models
Figure 4: Point-to-Point — each message goes to exactly one consumer. Pub/Sub — each message goes to all subscriber groups.
02

Point-to-Point (Work Queue / Competing Consumers)

Each message is delivered to exactly one consumer. If there are 3 workers, message 1 goes to worker A, message 2 to worker B, message 3 to worker C (round-robin). Workers compete for messages, distributing workload evenly. Use for task processing where each task should be executed once: sending an email, processing a payment, encoding a video. RabbitMQ queues and Kafka with a single consumer group both implement this model.

03

Publish-Subscribe (Fan-Out / Broadcast)

Each message is delivered to ALL subscribers. If there are 3 subscriber groups (Email, Analytics, Search), all three get a copy of every message. Each subscriber group is independent — it processes the message at its own pace and maintains its own offset. Use for events that multiple services need to react to: order created, user signed up, product updated.

AspectPoint-to-PointPub/Sub
DeliveryOne consumer per messageAll subscribers get a copy
Use caseTask processing (one executor)Event broadcasting (many reactors)
Consumer relationshipCompeting (work distribution)Independent (each gets everything)
Kafka implementationSingle consumer groupMultiple consumer groups
ExampleSend email, process paymentOrder event → email + analytics + feed
Kafka Supports Both Models Natively

A Kafka topic with one consumer group acts as a work queue (P2P). The same topic with multiple consumer groups acts as pub/sub. This is why Kafka is the default choice in system design interviews — it handles both patterns simultaneously without any configuration change.

Concept 5

Apache Kafka Deep Dive

Apache Kafka is a distributed event streaming platform. Unlike traditional message queues that delete messages after consumption, Kafka retains messages for a configurable period (default 7 days), allowing consumers to replay events. Kafka is designed for high throughput (1M+ messages/sec), durability (replicated across brokers), and horizontal scalability (add partitions for parallelism). It is the backbone of data infrastructure at LinkedIn, Netflix, Uber, and Airbnb.

Kafka architecture — topics, partitions, producers, consumer groups, brokers
Figure 5: Kafka architecture — producers write to topic partitions; each partition assigned to exactly one consumer in a group; messages retained for replay
04

Topics and Partitions

A topic is a named stream of messages (e.g., "orders", "user-events"). Each topic is divided into partitions — ordered, immutable logs of messages. When a producer sends a message, it specifies a partition key. Kafka hashes the key to determine which partition receives the message. Messages with the same key always go to the same partition, guaranteeing ordering for that key.

05

Consumer Groups and Parallelism

A consumer group is a set of consumers that cooperatively process a topic. Each partition is assigned to exactly one consumer in the group. If the topic has 6 partitions and the consumer group has 3 consumers, each consumer handles 2 partitions. If you add a 4th consumer, partitions are redistributed (rebalanced). If you have more consumers than partitions, extra consumers sit idle.

Rule: Max Parallelism = Number of Partitions

The maximum number of consumers that can actively process a topic in parallel equals the number of partitions. Adding a 7th consumer to a 6-partition topic wastes a consumer. Partition count is also hard to change later — start with more partitions than you think you need (12–64) to allow future scaling.

06

Offsets and Replays

Each message in a partition has a sequential offset (0, 1, 2, 3…). Each consumer tracks its current offset — the position of the last successfully processed message. If a consumer crashes and restarts, it resumes from its last committed offset, ensuring no messages are skipped. Because Kafka retains messages (unlike RabbitMQ which deletes after ACK), you can reset a consumer's offset to replay historical events. This is invaluable for debugging, backfilling data, and rebuilding state.

07

Partitions, Ordering, and Parallelism

Kafka partitions and ordering — guaranteed within partition, not across partitions
Figure 6: Ordering is guaranteed within a partition but NOT across partitions. Use partition keys for per-entity ordering with cross-entity parallelism.

This is the most important Kafka concept for interviews: messages are ordered within a partition but NOT across partitions. If you need order processing events (created → paid → shipped → delivered) to be processed in order for each order, use order_id as the partition key. All events for order #42 go to the same partition and are processed in sequence. Events for different orders go to different partitions and are processed in parallel — giving you both ordering (per entity) and throughput (across entities).

Kafka Partitioning at Uber

Uber's ride events use rider_id as the partition key. All events for rider #42 (request → match → pickup → dropoff → payment) go to the same partition and are processed in order. Events for different riders are processed in parallel across 256 partitions, achieving 100K+ events/sec.

If they used a random partition key, events might be processed out of order (payment before pickup), causing incorrect state.

Concept 6

Delivery Guarantees & Error Handling

Three delivery guarantees — at-most-once, at-least-once, exactly-once
Figure 7: At-most-once (may lose), at-least-once (may duplicate — the default), exactly-once (neither — most expensive)

At-Most-Once (Fire and Forget): The producer sends a message and does not wait for ACK. If the message is lost, it is gone — no retries. Fastest but no durability. Use for non-critical data: metrics, logging, analytics where occasional loss is acceptable. Analogous to UDP.

At-Least-Once (The Default): The producer retries until it receives an ACK. The broker retries delivery until the consumer ACKs. If the consumer processes but crashes before ACKing, the broker redelivers — causing a duplicate. The consumer must be idempotent: processing the same message twice must produce the same result. Use an idempotency key (message ID) to detect and skip duplicates. This is the default for 90% of production systems.

Exactly-Once (The Holy Grail): Each message is processed exactly once — no loss, no duplicates. Kafka achieves this with idempotent producers (deduplicates retries by sequence number), transactional writes (atomic writes across multiple partitions), and exactly-once consumer semantics (commit offset and process atomically). This is expensive and complex — only use for financial transactions where duplicates cause real-world harm (double charges, double transfers).

GuaranteeLost Messages?Duplicates?PerformanceUse Case
At-most-oncePossibleNeverFastestMetrics, logs, telemetry
At-least-onceNeverPossibleGood (default)Most systems (+ idempotent consumer)
Exactly-onceNeverNeverSlowestPayments, financial transactions
Interview Tip: Default to At-Least-Once + Idempotent Consumer

'I use at-least-once delivery with idempotent consumers. Each message has a unique message_id. The consumer checks a deduplication table (Redis SET with message_id) before processing. If the ID exists, skip it. This gives effectively-once semantics without the overhead of true exactly-once.' This is how Netflix, Uber, and most production systems operate.

08

Dead Letter Queue: Handling Permanent Failures

Dead letter queue — failed messages after max retries with exponential backoff
Figure 8: Messages that fail processing after max retries (with exponential backoff) are moved to the DLQ for manual review and replay

Not every message can be processed successfully. The message format might be corrupted, a downstream service might have a permanent bug, or the data might be invalid. After a configurable number of retries with exponential backoff (1s → 2s → 4s → 8s), the message is moved to a Dead Letter Queue (DLQ). The DLQ is a separate queue where failed messages are stored for manual inspection. An alert notifies the on-call engineer, who fixes the issue and replays the messages from the DLQ.

Retry Strategy: Exponential Backoff with Jitter
On failure, wait an exponentially increasing amount of time before retrying: 1s, 2s, 4s, 8s, 16s. This prevents a failing consumer from hammering a downstream service that is already struggling. Add random jitter to the backoff to prevent multiple consumers from retrying at exactly the same time (thundering herd).

Always Monitor Your DLQ

An unmonitored DLQ is a graveyard. Set up alerts: if the DLQ receives more than N messages per hour, page the on-call engineer. Regularly review DLQ messages — they reveal bugs, data quality issues, and integration problems. A growing DLQ is a sign that something in the system is broken.

Concept 7

Kafka vs RabbitMQ vs SQS

Kafka vs RabbitMQ vs SQS comparison
Figure 9: Three major messaging systems — Kafka (event streaming), RabbitMQ (traditional broker), SQS (managed AWS service)
FeatureApache KafkaRabbitMQAWS SQS
ModelDistributed log (append-only)Message broker (queues)Managed queue service
Message retentionConfigurable (7 days default)Deleted after ACK14 days max
ReplayYes (reset offset)NoNo
OrderingPer-partition FIFOPer-queue FIFOFIFO queues available
Throughput1M+ msg/sec50K–100K msg/secUnlimited (auto-scales)
RoutingTopic + partition keyExchanges + routing keysStandard or FIFO queues
OperationsSelf-managed or Confluent CloudSelf-managed or CloudAMQPFully managed by AWS
Best forEvent streaming, data pipelines, logsTask queues, RPC, complex routingSimple tasks on AWS, serverless
Used byLinkedIn, Netflix, Uber, AirbnbMany smaller/medium systemsAWS-native applications
Quick Decision Framework
  • Default for system design interviews: Kafka. Handles both work queues (single consumer group) and pub/sub (multiple groups), supports replay, and is the industry standard for event-driven architectures at scale.
  • Simple task queue with complex routing: RabbitMQ. If you need messages routed based on headers, patterns, or priority, RabbitMQ's exchange system is more flexible.
  • Managed, minimal ops, AWS-native: SQS. Zero infrastructure management. Pairs well with Lambda for serverless architectures.

Pre-Class Summary

Decision Guide & Cheat Sheet

Async messaging decision guide — does the caller need an immediate response?
Figure 10: Decision guide — start with 'does the caller need an immediate response?' then choose P2P or pub/sub based on consumer count

The decision to use async messaging starts with one question: Does the caller need an immediate response? If yes (user waiting for search results, login, a read query), use synchronous HTTP/gRPC. If no (the user can continue while work happens), use a message queue. Then choose the model: if one service should process each message, use a work queue (point-to-point). If multiple services need to react, use pub/sub (fan-out).

Sync vs Async: Synchronous (HTTP/gRPC) is simple but creates tight coupling. Asynchronous (message queue) decouples services, absorbs traffic spikes, and enables independent scaling. Use async for anything the user does not need to wait for.

Message Queue Architecture: Producers → broker (Kafka/RabbitMQ/SQS) → consumers. Broker stores durably, delivers to consumers, consumers ACK. Key properties: durability, ordering (within partition), at-least-once delivery, horizontal scalability.

Point-to-Point vs Pub/Sub: P2P delivers each message to one consumer (work distribution). Pub/Sub delivers to all subscribers (event broadcast). Kafka supports both via consumer groups.

Kafka Architecture: Topics split into partitions (ordered logs). Partition key determines routing and ordering. Each partition assigned to one consumer in a group. Offsets track position. Messages retained for replay. Max parallelism = number of partitions.

Delivery Guarantees: At-most-once (may lose), at-least-once (may duplicate, DEFAULT), exactly-once (expensive). Default to at-least-once with idempotent consumers using message ID deduplication.

Dead Letter Queue: Failed messages go to DLQ after max retries with exponential backoff. Monitor DLQ with alerts. Fix and replay. Essential for production reliability.

ScenarioPatternSystemWhy
User places orderAsync (event)KafkaDecouple from payment, inventory, email
User uploads videoAsync (task queue)Kafka / SQSHeavy processing; return 'processing' immediately
Send welcome emailAsync (task queue)SQS / RabbitMQUser doesn't wait for email delivery
Cache invalidationPub/Sub (fan-out)KafkaMultiple services need to invalidate their caches
Real-time analyticsAsync (stream)KafkaHigh-volume event ingestion + processing
Retry failed paymentAsync + DLQKafka / SQSExponential backoff; DLQ for permanent failures
User loginSynchronousHTTP / gRPCUser needs immediate yes/no response
Search querySynchronousHTTP / gRPCUser is waiting for search results
Health checkSynchronousHTTPImmediate response required for monitoring

Track Your DSA Progress — It's Free

Stop solving random questions. Start with the right 206 questions across 16 patterns — structured, curated, and completely free.

206 curated questions 16 patterns covered Google login · Free forever
Create Free Account →