System design is the single biggest differentiator between SDE1 and SDE2 offers at Indian product companies. DSA gets you in the door; system design determines your level and compensation. Yet most engineers prepare for it the wrong way — memorizing "design Twitter" answers without understanding the underlying patterns that repeat across every question.
This guide teaches you the 12 core patterns that appear in 90% of system design questions at Google, Amazon, Flipkart, Swiggy, Razorpay, Zomato, and other Indian product companies. Learn these patterns, and you can design any system — not just the ones you memorized.
1. The Universal System Design Framework (Use This Every Time)
Before learning patterns, internalize this 7-step framework. Every system design interview, regardless of company or problem, should follow this structure. It shows the interviewer you think like a senior engineer — requirements first, implementation last.
Clarify Requirements (5 min)
Functional requirements: what must the system do? Non-functional: scale (DAU, QPS), latency SLAs, availability target (99.9% vs 99.999%). Ask: "Is this read-heavy or write-heavy? Strong or eventual consistency?"
Capacity Estimation (3 min)
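Back-of-envelope: QPS = DAU × requests per user per day ÷ 86,400 (seconds in a day). Storage = entities × size per entity × retention period. Bandwidth = QPS × payload size. These are averages, so size the system for peak traffic, which is usually a multiple of the average. These numbers drive every architectural decision.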
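A quick way to sanity-check these formulas is to script them. The sketch below is illustrative only: the inputs (10M DAU, 20 requests per user per day, a 3× peak factor, 1 KB entities) are made-up assumptions, not figures from any real system.

```python
SECONDS_PER_DAY = 86_400


def average_qps(dau: int, requests_per_user_per_day: float) -> float:
    """Average queries per second across a whole day."""
    return dau * requests_per_user_per_day / SECONDS_PER_DAY


def storage_bytes(entities_per_day: int, bytes_per_entity: int, retention_days: int) -> int:
    """Raw storage before replication, indexes, or compression."""
    return entities_per_day * bytes_per_entity * retention_days


def bandwidth_bytes_per_sec(qps: float, payload_bytes: int) -> float:
    """Outbound bandwidth if every response carries payload_bytes."""
    return qps * payload_bytes


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to illustrate the arithmetic.
    qps = average_qps(dau=10_000_000, requests_per_user_per_day=20)
    peak_qps = qps * 3  # assumed peak-to-average ratio
    print(f"average QPS ~ {qps:,.0f}, assumed peak ~ {peak_qps:,.0f}")
    print(f"storage/year ~ {storage_bytes(1_000_000, 1_000, 365) / 1e9:,.0f} GB")
    print(f"bandwidth ~ {bandwidth_bytes_per_sec(qps, 2_000) / 1e6:,.1f} MB/s")
```

In an interview you would round aggressively (86,400 ≈ 100K) and state the peak factor as an explicit assumption.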
Define Core APIs (5 min)
Identify the 3–5 most important endpoints. Define request/response structure. This forces clarity on what the system actually does before you design how it does it.
Design the Data Model (5 min)
Identify core entities and relationships. Choose SQL vs NoSQL with justification. Define the primary key and most important indexes. Schema decisions have long-term consequences — make them explicit.
High-Level Architecture (15 min)
Draw the system: Client → CDN → Load Balancer → Services → Cache → DB → Message Queue. Explain every component choice. Don't just list technologies — explain why you chose Redis over Memcached, or Kafka over RabbitMQ.
Deep Dive (15 min)
The interviewer will guide you to 1–2 specific components to design in detail. This is where you demonstrate pattern depth. Common deep-dives: the caching strategy, the database sharding approach, the message queue design.
Trade-offs & Failure Scenarios (5 min)
What breaks first under load? What's your single point of failure? How do you handle a database failure? What's the consistency trade-off you've accepted? Senior engineers think about failure modes — this section shows maturity.
2. The 12 Core System Design Patterns
Most user-facing systems are 80–90% reads. The pattern: put a cache layer in front of the database. Cache-aside (lazy loading) is the most common strategy — check cache first, miss → fetch from DB → populate cache.
- L1 Cache: In-process memory cache (Guava Cache, Caffeine) — sub-millisecond, but per-instance, inconsistent
- L2 Cache: Redis cluster — shared across instances, millisecond latency, supports rich data types
- L3 Cache: CDN (CloudFront, Akamai) — for static assets and API responses that change rarely
- Cache Invalidation Strategies: TTL (simple, stale risk) | Write-through (consistent, extra write) | Write-behind (async, faster writes, data loss risk)
- Cache Stampede Problem: Thundering herd on cache miss → use probabilistic early expiration or request coalescing
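The cache-aside flow with TTL expiry can be sketched in a few lines. This is a single-process stand-in, assuming a plain dict in place of Redis and a caller-supplied `load_from_db` function; the class name and the injectable `clock` parameter are hypothetical, added so the behaviour is easy to test.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    """Check the cache first; on a miss, load from the DB and populate the cache."""

    def __init__(self, load_from_db: Callable[[str], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: fall through to the source of truth.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Call after a write so the next read refetches fresh data."""
        self._entries.pop(key, None)
```

Note what the sketch leaves out: it does nothing about stampedes. If many requests miss the same key at once, each one calls `load_from_db`, which is exactly the case request coalescing is meant to handle.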
When writes burst (order placement, payment events, user actions) you can't synchronously write to all downstream systems. Decouple with a message queue: write fast to the queue, process asynchronously.
- Kafka (preferred for high-throughput, durable, replay): Order events, payment events, analytics pipelines. Retention enables replay for debugging and backfill.
- RabbitMQ (preferred for task queues, routing complexity): Email/SMS sending, background jobs, flexible message routing with exchanges.
- Fan-out Pattern: One event → multiple consumers. Example: "order placed" → [inventory service, notification service, analytics service, warehouse service] all consume independently.
- Backpressure: If consumers are slow, queue grows. Solutions: auto-scaling consumers, circuit breaker on producer side, dead-letter queue for failed messages.
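To make fan-out and dead-lettering concrete, here is a toy in-process broker. It is only an analogy for Kafka consumer groups or RabbitMQ exchanges, not how either is implemented; `FanOutBroker`, `drain`, and the retry count are hypothetical names and values chosen for illustration.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


class FanOutBroker:
    """Each subscriber gets its own queue, so a slow or failing consumer never blocks the others."""

    def __init__(self, max_attempts: int = 3):
        self._queues: Dict[str, Deque[dict]] = {}
        self._handlers: Dict[str, Callable[[dict], None]] = {}
        self._max_attempts = max_attempts
        self.dead_letters: List[Tuple[str, dict]] = []

    def subscribe(self, consumer: str, handler: Callable[[dict], None]) -> None:
        self._queues[consumer] = deque()
        self._handlers[consumer] = handler

    def publish(self, event: dict) -> None:
        # The producer only enqueues; it never waits for consumers to finish.
        for queue in self._queues.values():
            queue.append(dict(event))

    def drain(self, consumer: str) -> int:
        """Process everything queued for one consumer; return the number of successes."""
        queue, handler = self._queues[consumer], self._handlers[consumer]
        processed = 0
        while queue:
            event = queue.popleft()
            for attempt in range(1, self._max_attempts + 1):
                try:
                    handler(event)
                    processed += 1
                    break
                except Exception:
                    if attempt == self._max_attempts:
                        # Retries exhausted: park the event for later inspection.
                        self.dead_letters.append((consumer, event))
        return processed
```

Publishing one "order placed" event with inventory and notification subscribers gives each its own copy; if the notification handler keeps failing, its event lands in `dead_letters` while inventory is unaffected.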
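When a single database can't handle the load even after indexing, read replicas, and vertical scaling (often somewhere beyond ~10K write QPS, or once the dataset outgrows one machine; exact thresholds depend on hardware and workload), horizontally partition data across multiple shards. Each shard holds a subset of the data.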
- Range-based sharding: Users A–M on shard 1, N–Z on shard 2. Simple but creates hotspots (most users might be A–M).
- Hash-based sharding: shard = hash(userId) % N. Uniform distribution but rebalancing when adding shards is expensive.
- Consistent hashing: Virtual ring, where adding/removing a shard only affects neighboring shards. Preferred for dynamic scaling.
- Shard Key Selection: Choose a key with high cardinality and even distribution. Bad shard key = hot shard. E.g., sharding tweets by userId distributes load evenly; sharding by creation date creates temporal hot shards.
- Cross-shard queries: JOINs across shards are expensive. Denormalize or maintain a global index for cross-shard lookups.
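The difference between modulo hashing and consistent hashing is easiest to see in code. Below is a minimal sketch of a hash ring with virtual nodes. It assumes SHA-256 as the hash and 100 virtual nodes per shard; both are arbitrary choices for illustration, and the class and method names are hypothetical.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


class ConsistentHashRing:
    def __init__(self, shards: Iterable[str], vnodes: int = 100):
        self._vnodes = vnodes
        self._hashes: List[int] = []      # sorted positions on the ring
        self._owners: Dict[int, str] = {}  # ring position -> shard name
        for shard in shards:
            self.add_shard(shard)

    @staticmethod
    def _hash(value: str) -> int:
        # First 8 bytes of SHA-256 as a 64-bit ring position.
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def add_shard(self, shard: str) -> None:
        for i in range(self._vnodes):
            position = self._hash(f"{shard}#{i}")
            bisect.insort(self._hashes, position)
            self._owners[position] = shard

    def remove_shard(self, shard: str) -> None:
        self._hashes = [h for h in self._hashes if self._owners[h] != shard]
        self._owners = {h: s for h, s in self._owners.items() if s != shard}

    def shard_for(self, key: str) -> str:
        if not self._hashes:
            raise LookupError("ring has no shards")
        # First virtual node clockwise from the key, wrapping around at the end.
        idx = bisect.bisect_right(self._hashes, self._hash(key)) % len(self._hashes)
        return self._owners[self._hashes[idx]]
```

With this ring, adding a fourth shard moves only the keys that now belong to the new shard; every other key stays where it was. With `hash(userId) % N`, changing N from 3 to 4 would remap most keys.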
Rate limiting controls how many requests a client can make in a time window. Essential for API protection, fair usage, DDoS mitigation.
- Token Bucket: Each user's bucket refills at N tokens/second, up to a maximum bucket size. Each request consumes 1 token; a request that finds the bucket empty is rejected. Allows bursts up to the bucket size. Most commonly used.
- Sliding Window: Count requests in the last N seconds using a rolling window. More accurate than fixed window. The log variant stores one timestamp per request (a sorted set in Redis per user) and so needs more storage; the counter variant saves space by weighting the previous fixed window's count.
- Fixed Window Counter: Reset counter every minute. Simple but boundary attack: 100 req in last second of minute 1 + 100 req in first second of minute 2 = 200 req in 2 seconds.
- Distributed Rate Limiting: Use Redis with atomic INCR + TTL. For multi-node: central Redis cluster acts as the rate limit state store. Lua scripts for atomic check-and-increment.
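A token bucket is small enough to sketch in full. The version below is single-process and not thread-safe; a distributed limiter would keep the same state in Redis and update it atomically, as described above. The class name and the injectable `clock` are hypothetical, added for testability.

```python
import time
from typing import Callable


class TokenBucket:
    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self._rate = rate_per_sec
        self._capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start full so an initial burst is allowed
        self._last_refill = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and consume tokens if the request fits, else False."""
        now = self._clock()
        elapsed = now - self._last_refill
        # Refill lazily on each call instead of running a background timer.
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        self._last_refill = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

You would keep one bucket per client, for example in a dict keyed by user ID or API key, and answer with HTTP 429 when `allow()` returns False.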
For live data (chat messages, order tracking, stock prices, collaborative docs), polling every few seconds is wasteful. Use persistent connections for server-push.
- WebSocket: Full-duplex persistent connection. Client and server can both send messages anytime. Best for: chat, real-time gaming, collaborative editing, live bidding.
- Server-Sent Events (SSE): Server pushes to client over HTTP. Client can't send back. Best for: live dashboards, order status updates, notification streams. Simpler than WebSocket.
- Long Polling: Client makes request, server holds it open until data is available, then responds. Client immediately makes another request. Simple fallback when WebSocket isn't available.
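- Connection management at scale: Each WebSocket connection is a stateful socket pinned to one server. At a very conservative 1K connections/server, 1M concurrent connections would need ~1000 servers; well-tuned servers commonly hold tens of thousands of connections each, which shrinks the fleet accordingly. Use a connection service with Redis for routing messages to the right server.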
When users type a query and need to find matching documents across millions of records, traditional SQL LIKE queries don't scale. Use a search engine with an inverted index.
- Inverted Index: Maps each word → list of documents containing that word. Query "red sneakers" → intersect documents containing "red" and documents containing "sneakers".
- Elasticsearch: Distributed inverted index built on Lucene. Handles full-text search, faceted filtering (price range, brand), geo-spatial queries, and aggregations.
- Relevance Ranking: TF-IDF (term frequency × inverse document frequency) as baseline; current Elasticsearch versions default to BM25, a refinement of the same idea. Layer on: freshness boost, personalisation signals, click-through rates.
- Synchronization challenge: Primary data lives in MySQL/PostgreSQL. Sync to Elasticsearch via CDC (Change Data Capture) with Debezium → Kafka → Elasticsearch consumer. Eventual consistency is acceptable for search.
- Spell correction: Edit distance (Levenshtein) for "did you mean?" features. Trie for autocomplete prefix matching.
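The inverted-index idea fits in a few lines. This is a toy sketch under simplifying assumptions (lowercase tokenisation, AND semantics, no ranking); it is not how Lucene or Elasticsearch store postings, and the class name is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


class InvertedIndex:
    def __init__(self) -> None:
        self._postings: Dict[str, Set[int]] = defaultdict(set)  # term -> doc IDs

    @staticmethod
    def _tokenize(text: str) -> List[str]:
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id: int, text: str) -> None:
        for term in self._tokenize(text):
            self._postings[term].add(doc_id)

    def search(self, query: str) -> Set[int]:
        """Return IDs of documents that contain every term in the query."""
        terms = self._tokenize(query)
        if not terms:
            return set()
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return result
```

Indexing "Red running sneakers", "Blue sneakers", and "Red cotton shirt" as documents 1 to 3 and searching for "red sneakers" intersects {1, 3} with {1, 2}, leaving only document 1.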
In distributed systems, network failures cause retries. Without idempotency, a retry can cause double charges, duplicate orders, or double sends. Every financial/critical operation must be idempotent.
- Idempotency Key: Client generates a unique key per operation (UUID). Server stores processed keys in Redis with TTL. On retry with same key: return cached result instead of reprocessing.
- Pattern in Payment: Client sends POST /payment { amount: 100, idempotency_key: "uuid-123" }. Server processes payment and stores uuid-123 → {success, txn_id}. On retry: return the stored result, so there is no double charge.
- Exactly-Once in Kafka: Kafka supports exactly-once semantics (EOS) via transactions. Producer assigns a transactional ID; broker tracks processed offsets. Consume-transform-produce pipeline is atomic.
- Saga Pattern: For distributed transactions across services (order → payment → inventory), use a saga: sequence of local transactions with compensating transactions on failure. Avoids distributed locks.
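The idempotency-key flow can be sketched as follows. This is a simplification: a dict stands in for Redis, there is no TTL, and `PaymentService` and its fields are hypothetical names. In a real service the check-and-store step must be atomic (for example, a Redis SET with the NX option or a unique constraint in the database), otherwise two concurrent retries can both pass the check.

```python
from typing import Dict


class PaymentService:
    def __init__(self) -> None:
        self._results: Dict[str, dict] = {}  # idempotency_key -> stored response
        self.charges_made = 0

    def pay(self, idempotency_key: str, amount: int) -> dict:
        # Retry with a key we have already processed: replay the stored response.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        # First time we see this key: perform the side effect exactly once.
        self.charges_made += 1
        response = {
            "status": "success",
            "txn_id": f"txn-{self.charges_made}",
            "amount": amount,
        }
        self._results[idempotency_key] = response
        return response
```

Calling `pay("uuid-123", 100)` twice returns the same `txn_id` both times and increments `charges_made` only once, which is the property the client relies on when it retries after a timeout.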
When a user posts content that needs to appear in all their followers' feeds, you face the fan-out problem. Two strategies, each with trade-offs.
- Fan-out on Write (Push): When user A posts → immediately write to all followers' feed caches. Pro: fast feed reads. Con: huge write amplification for celebrities (Sachin Tendulkar has 20M followers — one tweet = 20M cache writes). Solution: hybrid approach below.
- Fan-out on Read (Pull): When user reads feed → fetch posts from all followees in real-time. Pro: simple writes. Con: slow reads for users following many people.
- Hybrid (used by Twitter/Instagram): Fan-out on write for regular users (<1K followers). Fan-out on read for celebrity accounts (verified/high-follower users). Merge the two at read time.
- Timeline Cache: Pre-built feed stored in a Redis sorted set (post_id as the member, timestamp as the score). Feed read = ZREVRANGE on the sorted set, which returns the newest posts first. Extremely fast.
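The hybrid strategy can be sketched with in-memory structures. Everything here is an illustrative assumption: the 1K threshold is taken from the text above, the dicts stand in for Redis and a posts table, and `FeedService` is a hypothetical name. It does not describe how Twitter or Instagram actually implement their feeds.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Post = Tuple[int, str]  # (timestamp, post_id)


class FeedService:
    def __init__(self, celebrity_threshold: int = 1000):
        self._threshold = celebrity_threshold
        self.followers: Dict[str, Set[str]] = defaultdict(set)   # author -> followers
        self.following: Dict[str, Set[str]] = defaultdict(set)   # user -> authors
        self.feed_cache: Dict[str, List[Post]] = defaultdict(list)
        self.posts_by_author: Dict[str, List[Post]] = defaultdict(list)

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def _is_celebrity(self, author: str) -> bool:
        return len(self.followers[author]) >= self._threshold

    def post(self, author: str, post_id: str, timestamp: int) -> None:
        self.posts_by_author[author].append((timestamp, post_id))
        if not self._is_celebrity(author):
            # Fan-out on write: push into every follower's cached feed.
            for follower in self.followers[author]:
                self.feed_cache[follower].append((timestamp, post_id))

    def read_feed(self, user: str, limit: int = 10) -> List[str]:
        # Start from the pushed entries, then pull celebrity posts at read time.
        entries: Set[Post] = set(self.feed_cache[user])
        for author in self.following[user]:
            if self._is_celebrity(author):
                entries.update(self.posts_by_author[author])
        return [post_id for _, post_id in heapq.nlargest(limit, entries)]
```

The write cost for a celebrity post is one append regardless of follower count; the price is paid by readers, who merge celebrity posts into their feed on every read.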
Finding the nearest driver, restaurant, or store requires efficient geo-spatial queries. Standard SQL queries on (lat, lng) don't scale — you need spatial indexing.
- Geohash: Encode (lat, lng) into a short string (e.g., "ttnq"). Nearby points usually share a prefix, though points on either side of a cell boundary may not. Redis supports geohash-based indexing natively: GEOADD, GEODIST, GEORADIUS (deprecated since Redis 6.2 in favor of GEOSEARCH).
- Quadtree: Recursively divide the map into 4 quadrants. Each node represents a region. Leaf nodes hold location data. Good for static data; expensive to update for moving objects.
- Real-Time Location Updates: Driver sends GPS location every 5 seconds → Kafka → Location service → Redis geohash. Rider app queries Redis for drivers within N km. For 1M drivers: 1M updates/5s = 200K writes/sec to Redis — use Redis Cluster.
- Proximity Search on Maps: Grid-based partitioning → divide the city into N×N cells. Assign entities to cells. Query: find all entities in current cell + adjacent 8 cells.
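The grid-based approach from the last bullet is simple to sketch. The cell size of 0.01 degrees (roughly 1 km of latitude) and the names `GridIndex`, `update`, and `nearby` are assumptions for illustration; a production system would also filter candidates by true distance.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


class GridIndex:
    def __init__(self, cell_size_deg: float = 0.01):
        self._cell_size = cell_size_deg
        self._cells: Dict[Cell, Dict[str, Tuple[float, float]]] = defaultdict(dict)
        self._where: Dict[str, Cell] = {}  # entity -> cell it currently occupies

    def _cell_of(self, lat: float, lng: float) -> Cell:
        return (math.floor(lat / self._cell_size), math.floor(lng / self._cell_size))

    def update(self, entity_id: str, lat: float, lng: float) -> None:
        """Insert or move an entity, e.g. on each GPS ping from a driver."""
        new_cell = self._cell_of(lat, lng)
        old_cell = self._where.get(entity_id)
        if old_cell is not None and old_cell != new_cell:
            del self._cells[old_cell][entity_id]
        self._cells[new_cell][entity_id] = (lat, lng)
        self._where[entity_id] = new_cell

    def nearby(self, lat: float, lng: float) -> List[str]:
        """Candidates in the caller's cell plus the 8 adjacent cells."""
        row, col = self._cell_of(lat, lng)
        found: List[str] = []
        for d_row in (-1, 0, 1):
            for d_col in (-1, 0, 1):
                found.extend(self._cells.get((row + d_row, col + d_col), {}))
        return found
```

Checking the 8 neighbouring cells is what avoids the boundary problem: a driver just across a cell edge is still returned even though they fall in a different cell from the rider.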
Distributed systems need globally unique IDs for orders, users, transactions. Auto-increment in a single DB doesn't scale. UUID is unique but not sortable by time.
- UUID v4: 128-bit, of which 122 bits are random. Collisions are possible in theory but so improbable that IDs are treated as globally unique. Downside: not sortable by time, large storage, bad for DB index locality.
- Twitter Snowflake: 64-bit ID = 1 unused sign bit + 41 bits timestamp (milliseconds since a custom epoch) + 10 bits machine ID + 12 bits sequence. Sortable by creation time. 2^12 = 4096 IDs/millisecond per machine. Used by Twitter, Discord, many others.
- ULID (Universally Unique Lexicographically Sortable ID): 128-bit, URL-safe, sortable. Better than UUID for databases — maintains insert locality.
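- Clock Skew Problem: Distinct machine IDs keep two correctly configured machines from colliding, but if NTP steps one machine's clock backwards, that machine can reissue a timestamp + sequence pair it has already used, and IDs from different machines are only roughly time-ordered. Solutions: refuse to generate (or wait) until the clock catches up, use a logical clock, or use a centralized ID service (expensive but safe).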
- Instagram Approach: Postgres stored procedure generates IDs using epoch + shard ID + sequence. Avoids centralized bottleneck while maintaining sortability.
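A Snowflake-style generator following the 41/10/12 bit layout above might look like this. It is a sketch, not Twitter's implementation: the custom epoch value, the class name, and the choice to raise an error when the clock moves backwards are all assumptions made here for illustration.

```python
import threading
import time
from typing import Callable, Optional

CUSTOM_EPOCH_MS = 1_700_000_000_000  # hypothetical epoch; any fixed past instant works
MACHINE_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095


class SnowflakeGenerator:
    def __init__(self, machine_id: int, clock_ms: Optional[Callable[[], int]] = None):
        if not 0 <= machine_id < (1 << MACHINE_BITS):
            raise ValueError("machine_id must fit in 10 bits")
        self._machine_id = machine_id
        self._clock_ms = clock_ms or (lambda: int(time.time() * 1000))
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = self._clock_ms()
            if now < self._last_ms:
                # Clock stepped backwards: generating now could duplicate an ID.
                raise RuntimeError("clock moved backwards; refusing to generate IDs")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # All 4096 sequence values used this millisecond: wait for the next.
                    while now <= self._last_ms:
                        now = self._clock_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - CUSTOM_EPOCH_MS
            return ((timestamp << (MACHINE_BITS + SEQUENCE_BITS))
                    | (self._machine_id << SEQUENCE_BITS)
                    | self._sequence)
```

Because the timestamp occupies the high bits, sorting IDs numerically sorts them by creation time to millisecond precision, which is what makes them index-friendly.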
In microservices, one slow service can cascade failures across the entire system. Circuit breakers prevent this — they stop calling a failing service and return a fallback response instead.
- Circuit Breaker States: CLOSED (normal) → OPEN (too many failures, fast-fail all requests) → HALF-OPEN (let a few through to test if service recovered).
- Failure Threshold: Open circuit when error rate > N% in last M requests, or when last N requests all failed.
- Fallback Strategies: Return cached result (stale but available); return a default/empty response; return a simplified response; redirect to a backup service.
- Timeout: Never let a service call run indefinitely. Set timeouts at every layer (HTTP client timeout, DB query timeout, Kafka consumer timeout).
- Bulkhead Pattern: Limit the thread pool size for each downstream service. If service A is slow, it can only consume its allocated threads — it won't starve calls to services B and C.
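The three-state machine is easier to reason about once written down. The sketch below uses the simpler "N consecutive failures" threshold rather than an error-rate window, is not thread-safe, and lets any call through in the HALF-OPEN state; the class name, defaults, and injectable `clock` are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self._threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = 0.0
        self.state = "CLOSED"

    def call(self, func: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.state == "OPEN":
            if self._clock() - self._opened_at >= self._reset_timeout:
                self.state = "HALF_OPEN"  # allow a trial request through
            else:
                return fallback()  # fast-fail without touching the service
        try:
            result = func()
        except Exception:
            self._on_failure()
            return fallback()
        self._on_success()
        return result

    def _on_failure(self) -> None:
        self._failures += 1
        # A failed trial in HALF_OPEN reopens immediately.
        if self.state == "HALF_OPEN" or self._failures >= self._threshold:
            self.state = "OPEN"
            self._opened_at = self._clock()

    def _on_success(self) -> None:
        self._failures = 0
        self.state = "CLOSED"
```

The fallback is where the strategies listed above plug in: return a cached value, a default response, or a call to a backup service.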
Analytics and ML features need both real-time data (for live dashboards) and historical data (for trend analysis, model training). Lambda Architecture handles both.
- Lambda Architecture: Batch layer (historical accuracy) + Speed layer (real-time, approximate) + Serving layer (merges both). Pro: robust. Con: dual code paths, complex maintenance.
- Kappa Architecture: Stream-only with reprocessing. Keep all events in Kafka (long retention). Reprocess historical data by replaying from the beginning. Simpler but requires Kafka storage for months/years.
- Batch Processing: Apache Spark (large-scale ETL, daily reporting). Run nightly or hourly. High latency but can process petabytes.
- Stream Processing: Apache Flink or Spark Streaming (real-time analytics). Process events as they arrive. Use for: live dashboards, real-time fraud detection, surge pricing.
- Data Lake vs Data Warehouse: Lake = raw events in S3/HDFS (schema-on-read, cheap, flexible). Warehouse = processed, structured data in Redshift/BigQuery (schema-on-write, fast queries, expensive).
3. Company-Specific System Design Questions in India 2026
| Company | Most Asked System Design Questions | Key Patterns to Emphasize |
|---|---|---|
| Google India | Design Google Search, Design YouTube, Design Google Maps, Design Distributed File System | MapReduce, Bigtable, consistent hashing, inverted index, geo-spatial |
| Microsoft India | Design OneDrive, Design Teams Chat, Design Azure Service Bus, Design CI/CD Pipeline | WebSockets, distributed storage, message queues, microservices |
| Amazon India | Design Order Management, Design Product Search, Design DynamoDB, Design Rate Limiter | Exactly-once, idempotency, consistent hashing, Dynamo paper concepts |
| Flipkart | Design Cart & Checkout, Design Big Billion Days infrastructure, Design Delivery Tracking | Flash sale queue, inventory locking, event-driven, geo-spatial |
| Swiggy / Zomato | Design Delivery Partner Matching, Design Restaurant Search, Design Surge Pricing, Design ETA prediction | Geo-spatial, real-time location, ML pipeline, Kafka fan-out |
| Razorpay / PhonePe | Design Payment Gateway, Design UPI system, Design Reconciliation Engine, Design Fraud Detection | Idempotency, saga pattern, exactly-once, circuit breaker, real-time ML |
| Walmart Global Tech | Design Inventory Management, Design Price Engine, Design Store Locator, Design Supply Chain | Caching, sharding, geo-spatial, batch processing, eventual consistency |
| Salesforce India | Design Multi-Tenant CRM, Design Workflow Engine, Design API Rate Limiter, Design Audit Trail | Multi-tenancy, row-level security, event sourcing, rate limiting |
4. HLD vs LLD — What Each Level Requires
| Aspect | HLD (High-Level Design) | LLD (Low-Level Design) |
|---|---|---|
| Focus | System architecture, component interactions, technology choices | Class hierarchy, methods, design patterns, API contracts |
| Output | Architecture diagram with services, DBs, queues, CDN | Class diagram, interface definitions, database schema |
| Scale concern | How does this handle 10M requests/day? | How does the OrderProcessor class handle concurrent requests? |
| Tools discussed | Redis, Kafka, Elasticsearch, Kubernetes, S3, CDN | Design patterns, SOLID principles, thread safety, OOP |
| Level required | SDE2 and above, all companies | SDE1+ at Flipkart (MCR), SDE2+ at most companies |
| Example question | "Design Swiggy's order tracking at 1M orders/day" | "Design the class structure for a parking lot system" |
5. The 6 Most Common System Design Interview Mistakes in India
The #1 failure. "Design Twitter" → immediately starts drawing microservices. But is it 1000 users or 100M? Read-heavy or write-heavy? Without asking, your design has no foundation and the interviewer will redirect you anyway — wasting time and creating a bad first impression.
"I'll use Kafka, Redis, Elasticsearch, Kubernetes, Cassandra, and gRPC." Why? What problem does each solve? A design with fewer, well-justified technologies beats a buzzword salad. Interviewers test: "Why Kafka over RabbitMQ here?" If you can't answer, the technology choice is a red flag.
Most candidates design the happy path but never discuss: What happens if Redis goes down? What's the consistency model when two users edit simultaneously? What's the latency SLA and how do you enforce it? Senior engineers obsess over failure modes. NFRs are where you demonstrate seniority.
"I'll use a CQRS event-sourced microservices architecture with a two-phase commit distributed transaction." That's over-engineered for a startup. Great engineers design the simplest system that meets the requirements, not the most technically impressive one. Ask: "What's the MVP architecture, and what would we add at 10× scale?"
Verbal descriptions of complex systems are impossible to follow. Always draw — even in a virtual interview using a whiteboard tool. A diagram forces clarity, helps the interviewer follow your thinking, and gives you a shared reference for the deep-dive discussion.
Design the whole system, then say "any questions?" — and discover the interviewer wanted you to go deeper on the search component, not the notification system. Check in after each step: "Does this approach make sense? Should I go deeper anywhere here?" Interviewers want collaboration, not a monologue.