Concept 1

Why Caching and How It Works

The Single Most Impactful Optimization

Caching is the technique of storing frequently accessed data in a faster storage layer so that future requests for that data are served more quickly. It is the single most impactful performance optimization in system design. A well-implemented caching layer can reduce database load by 95%, cut response times from 50ms to 1ms, and allow a system to handle 10x–100x more traffic without adding more database servers.

The fundamental principle is the memory hierarchy: data that is closer to the CPU is faster to access. An L1 cache read takes 1 nanosecond. A Redis read takes 0.5 milliseconds. A database read from SSD takes 5 milliseconds. A cross-network API call takes 50–200 milliseconds. Caching moves data from slow layers (disk, network) to fast layers (memory, edge servers).

Figure 1: The memory hierarchy — each layer is orders of magnitude slower than the one above it. Caching moves data up the hierarchy.
Caching Impact: Real Numbers

Without caching: 10,000 requests/sec → 50ms per DB query = 500 concurrent DB connections = database at capacity.

With Redis cache (95% hit rate): 500 requests/sec hit DB (5% misses) → 50ms = 25 concurrent DB connections. The same database now serves 20x more traffic. Redis handles the other 9,500 requests/sec at <1ms each.
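The arithmetic above is just Little's law: average concurrency equals arrival rate times time spent in the system. A quick sketch (the numbers are the ones from the example):

```python
def concurrent_connections(requests_per_sec: float, latency_sec: float) -> float:
    """Little's law: average concurrency = arrival rate * time in system."""
    return requests_per_sec * latency_sec

# Without a cache, every request hits the database:
no_cache = concurrent_connections(10_000, 0.050)           # ~500 concurrent connections

# With a 95% hit rate, only the 5% of misses reach the database:
with_cache = concurrent_connections(10_000 * 0.05, 0.050)  # ~25 concurrent connections
```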

Concept 2

Cache Read/Write Patterns


Cache-Aside (Lazy Loading): The Default Pattern

Cache-aside is the most common caching pattern and should be your default in system design interviews. The application is responsible for reading from and writing to the cache. On a read: the app checks the cache first. If the data is there (cache hit), it returns immediately. If not (cache miss), the app queries the database, stores the result in the cache, and then returns. The cache is populated lazily — only data that is actually requested gets cached.

When to use: Read-heavy workloads (90%+ reads). This is the default for most web applications: user profiles, product catalogs, API responses, configuration data. Used by virtually every production system.

How it handles writes: On a write, the application updates the database and then either deletes the cache key (most common, called cache invalidation) or updates the cache key (less common, called cache refresh). Deleting is preferred because it avoids the risk of the cache and DB getting out of sync if one write fails.
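A minimal single-process sketch of cache-aside, using plain dicts to stand in for Redis and the database (the key names and functions are illustrative, not a real client API):

```python
cache = {}                               # stands in for Redis
database = {"user:1": {"name": "Ada"}}   # stands in for PostgreSQL

def read(key):
    # 1. Check the cache first; a hit returns immediately.
    if key in cache:
        return cache[key]
    # 2. Cache miss: fall back to the database...
    value = database.get(key)
    # 3. ...and populate the cache lazily for future reads.
    if value is not None:
        cache[key] = value
    return value

def write(key, value):
    # Update the source of truth first...
    database[key] = value
    # ...then invalidate (delete, not update) the cached copy.
    cache.pop(key, None)
```

Note that `write` deletes rather than refreshes the key: the next read repopulates it, which avoids the cache and database diverging if a refresh write fails.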

Figure 2: Cache-aside (lazy loading) — the application checks cache first, queries DB on miss, and populates cache for future requests

Write-Through and Write-Behind

Write-Through: In write-through caching, every write goes to both the cache and the database synchronously. The write is not confirmed to the client until both the cache and database have been updated. This guarantees the cache is always consistent with the database — there is never stale data. The trade-off is higher write latency (two writes per operation) and the fact that data may be cached that is never read.

Use when: Data consistency is critical and writes are relatively infrequent. User profiles, permissions, configuration that must never be stale. Often combined with cache-aside for reads.

Write-Behind (Write-Back): In write-behind caching, writes go only to the cache. The cache then asynchronously flushes changes to the database in batches. The client gets an immediate response (cache-speed write) without waiting for the database. This dramatically improves write performance but introduces the risk of data loss: if the cache crashes before flushing, uncommitted writes are lost.

Use when: Write speed matters more than durability. View counts, like counts, analytics metrics, real-time dashboards. The data is valuable, but losing a few seconds of it is not catastrophic.
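A minimal sketch of write-behind in a single process (class and method names are illustrative; a real implementation would flush on a timer or from a background thread):

```python
class WriteBehindCache:
    """Writes land in the cache immediately; pending writes are batched
    to the database later. Here flush() is invoked manually."""

    def __init__(self):
        self.cache = {}
        self.dirty = {}      # writes accepted but not yet in the database
        self.database = {}

    def write(self, key, value):
        self.cache[key] = value   # client gets a cache-speed response
        self.dirty[key] = value   # remember it still needs flushing

    def flush(self):
        # Batch all pending writes to the database in one pass.
        # Anything in self.dirty is lost if the process dies before this runs.
        self.database.update(self.dirty)
        self.dirty.clear()
```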

Read-Through: Read-through is similar to cache-aside but the cache itself is responsible for loading data from the database on a miss (rather than the application). The application always reads from the cache; the cache transparently fetches from the DB when needed. This simplifies the application code but couples the cache to the data source.

Figure 3: Write-through (synchronous, consistent) vs write-behind (asynchronous, fast writes with data loss risk)
| Pattern | Read Path | Write Path | Consistency | Best For |
|---|---|---|---|---|
| Cache-Aside | App checks cache, then DB | App writes DB, deletes cache key | Eventual (stale window = TTL) | Most read-heavy systems (default) |
| Write-Through | App reads from cache | App writes cache + DB synchronously | Strong (always in sync) | Data that must never be stale |
| Write-Behind | App reads from cache | App writes cache; async flush to DB | Weak (data loss risk) | Write-heavy, loss-tolerant metrics |
| Read-Through | App reads cache (auto-loads) | Varies (combine with write-through) | Depends on write strategy | Simplified application code |
Interview Tip: Default to Cache-Aside

'I implement cache-aside with Redis. On read: check Redis, on miss query PostgreSQL and populate Redis with a 1-hour TTL. On write: update PostgreSQL, then delete the Redis key so the next read gets fresh data.'

Concept 3

Cache Invalidation & Eviction Policies


Cache Invalidation: The Hardest Problem

Phil Karlton famously said: "There are only two hard things in Computer Science: cache invalidation and naming things." Cache invalidation is hard because you must decide when cached data is stale and needs to be refreshed. Too aggressive (short TTL) and you lose the caching benefit. Too lax (long TTL) and users see outdated data.

TTL-Based Expiry: Set a Time-To-Live (TTL) on each cached key. After the TTL expires, the key is automatically deleted, and the next read triggers a cache miss that repopulates from the database. This is the simplest and most common strategy. Choose the TTL based on how much staleness the data can tolerate: product prices (5 minutes), user profiles (1 hour), static content (24 hours).
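TTL expiry can be sketched with a timestamp stored next to each value; like Redis, this version expires keys lazily, on the next read (class and method names are illustrative):

```python
import time

class TTLCache:
    def __init__(self):
        self.store = {}  # key -> (value, absolute expiry time)

    def set(self, key, value, ttl_seconds):
        self.store[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None:
            return None           # never cached
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self.store[key]   # lazy expiry on read, as Redis does
            return None           # caller treats this as a cache miss
        return value
```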

Event-Based Invalidation: When the database changes, explicitly delete the corresponding cache key. The next read triggers a cache miss and repopulates with fresh data. This is more accurate than TTL-based expiry because the cache is invalidated immediately when data changes, not after an arbitrary timeout. The challenge is ensuring every write path knows which cache keys to invalidate.

Figure 4: Cache invalidation — TTL-based expiry (simple, predictable) vs event-based invalidation (accurate, immediate)
The Cache Stampede Problem

When a popular cache key expires, hundreds or thousands of concurrent requests suddenly find an empty cache and all query the database simultaneously. This can overwhelm the database, causing a cascade of failures.

Solutions: use a lock so only one request repopulates the cache while others wait (request coalescing / lock-and-load), add a random jitter to TTLs so keys do not expire simultaneously, or use a "stale-while-revalidate" approach where the cache serves the stale value while one request fetches fresh data in the background.
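Two of those mitigations fit in a few lines. A sketch of TTL jitter and lock-based request coalescing in a single process (function names are illustrative; across multiple servers the lock would be a distributed lock, e.g. in Redis):

```python
import random
import threading

_refill_lock = threading.Lock()

def jittered_ttl(base_ttl: float, jitter_fraction: float = 0.1) -> float:
    """Add up to 10% random jitter so hot keys don't all expire together."""
    return base_ttl * (1 + random.uniform(0, jitter_fraction))

def get_with_coalescing(cache: dict, key, load_from_db):
    """Lock-and-load: on a miss, only one caller repopulates the key."""
    value = cache.get(key)
    if value is not None:
        return value
    with _refill_lock:
        value = cache.get(key)    # re-check: another caller may have filled it
        if value is None:
            value = load_from_db(key)
            cache[key] = value
    return value
```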

Production example: A major e-commerce site had a product page cached with TTL=300s. During a flash sale, the key expired and 50,000 users hit the uncached page simultaneously. All 50,000 queries hit the database, which crashed, taking down the entire site.


Eviction Policies

When a cache is full and a new key needs to be stored, the cache must evict (remove) an existing key. The eviction policy determines which key gets removed. Choosing the right policy has a significant impact on cache hit rate.
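LRU, the most common policy, can be sketched with Python's `OrderedDict`, which keeps keys in insertion/access order (a minimal illustration, not a production cache):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()    # keys ordered oldest -> newest access

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)   # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used key
```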

Figure 5: Eviction policies — LRU (least recently used), LFU (least frequently used), FIFO, and random eviction compared
| Policy | Evicts | Best For | Weakness | Redis Config |
|---|---|---|---|---|
| LRU | Oldest access | Most web workloads | Can evict popular but temporarily idle keys | allkeys-lru |
| LFU | Fewest total accesses | Stable hot datasets | New items start cold (need time to build count) | allkeys-lfu |
| FIFO | Oldest insertion | Simple, predictable | Ignores access patterns entirely | N/A (manual) |
| Random | Random key | Massive scale, low overhead | Unpredictable, may evict hot keys | allkeys-random |
| TTL-based | Keys closest to expiry | Mixed TTL workloads | Only evicts keys with TTL set | volatile-ttl |
| No eviction | Nothing (returns error) | When data loss is unacceptable | Cache becomes useless when full | noeviction |
Interview Tip: Redis Configuration

'I configure Redis with maxmemory 16GB and allkeys-lru eviction policy. I also set TTLs on all keys: 1 hour for user data, 24 hours for product data.' Never use noeviction (the default) in production — it causes Redis to return errors when full instead of gracefully evicting old data.
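The tip above corresponds to two directives in `redis.conf` (the 16 GB value is the example figure from the tip, not a universal recommendation):

```
maxmemory 16gb
maxmemory-policy allkeys-lru
```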

Concept 4

Caching Layers in a System


Six Layers of Caching

A production system uses multiple layers, each catching a percentage of requests and reducing the load on layers below. Together they achieve ~98% cache hit rate.

Figure 6: Six caching layers — browser → CDN → reverse proxy → in-process → Redis → database buffer pool. Each layer catches a portion of requests.
  • Layer 1: Browser Cache (~50% of requests) — Controlled via HTTP Cache-Control headers. Cache-Control: max-age=3600 stores the response locally for 1 hour. Zero server resources required.
  • Layer 2: CDN Edge (~25% of remaining) — CDN edge servers (Cloudflare, CloudFront) cache responses at 300+ global locations, returning responses in 1–5ms without reaching your origin server.
  • Layer 3: Reverse Proxy Cache (~10%) — Nginx or Varnish cache entire HTTP responses at the reverse proxy level. Useful for API responses that are the same for many users.
  • Layer 4: Application In-Process Cache (~5%) — A small LRU cache within the application process itself (Python dict, Java ConcurrentHashMap, Go sync.Map). Extremely fast (<1ms, no network hop) but not shared across instances.
  • Layer 5: Distributed Cache – Redis/Memcached (~8%) — Shared, distributed cache accessible by all application instances. Sub-millisecond latency. Primary application cache for most systems.
  • Layer 6: Database Buffer Pool (~2%) — The database itself caches recently accessed data pages in its buffer pool (PostgreSQL's shared_buffers, MySQL's InnoDB buffer pool). Automatic, no application changes required.
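Layer 4, the in-process cache, is often a one-liner in practice. In Python, `functools.lru_cache` gives a bounded per-process LRU cache (the function and its cost counter here are illustrative):

```python
from functools import lru_cache

CALLS = {"n": 0}                       # counts how often the real lookup runs

@lru_cache(maxsize=1024)               # bounded, per-process LRU cache
def get_config(name: str) -> str:
    CALLS["n"] += 1                    # stands in for an expensive DB lookup
    return f"value-for-{name}"

get_config("timeout")   # first call: cache miss, function body runs
get_config("timeout")   # second call: served from the in-process cache
```

Remember the caveat from the list above: this cache is per-instance, so each application server warms and invalidates its own copy independently.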
Facebook's Caching Architecture

Over 1,000 Memcached servers caching hundreds of terabytes of data. Overall cache hit rate exceeds 99%, meaning less than 1% of read requests ever reach a database.

Concept 5

Content Delivery Networks (CDN)


CDN: The Global Caching Layer

A CDN is a geographically distributed network of edge servers that caches content close to users. CDNs reduce latency by 80–90%, offload 60–80% of traffic from your origin, and provide DDoS protection.

Figure 7: CDN edge servers distributed globally — users are served from the nearest edge, reducing latency from 200ms to 5ms

What CDNs Cache:

  • Static assets (images, CSS, JS, fonts, videos) — long TTLs (24 hours to 1 year), use fingerprinted filenames for cache busting.
  • Dynamic API responses (product listings, trending content, public feeds) — shorter TTLs (1–60 minutes).
  • Full HTML pages — for content-heavy sites; eliminates all server-side processing. Edge-side Includes (ESI) allow caching templates with dynamic fragments.
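The fingerprinting mentioned for static assets is just a content hash embedded in the filename, so every deploy produces a new URL and stale cached copies are never served. A sketch (the helper name and hash length are illustrative):

```python
import hashlib

def fingerprint(filename: str, content: bytes) -> str:
    """Embed a short content hash in the filename, e.g. app.js -> app.3f2a9c1d.js,
    so the file can be cached for a year and busted by deploying a new name."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    name, dot, ext = filename.rpartition(".")
    return f"{name}.{digest}.{ext}" if dot else f"{filename}.{digest}"
```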

Pull CDN vs Push CDN

Figure 8: Pull CDN (fetches from origin on first request) vs Push CDN (content pre-positioned at edges before requests arrive)
| Aspect | Pull CDN | Push CDN |
|---|---|---|
| How it works | Edge fetches from origin on first request | You upload content to edges proactively |
| First request | Slow (cache miss, fetches from origin) | Fast (content pre-positioned) |
| Setup effort | Minimal (just point DNS to CDN) | More work (upload pipeline needed) |
| Storage cost | Only caches what's requested | Caches everything you upload |
| Best for | Web apps, APIs, dynamic content | Video streaming, large file downloads |
| Invalidation | Purge API or wait for TTL | Re-upload new version |
| Examples | Cloudflare, CloudFront (default) | Netflix Open Connect, Akamai |
Netflix's Push CDN – Open Connect

Netflix pre-positions content to edge servers inside ISP networks worldwide before a new show launches. When you press Play, the video streams from a server inside your ISP's network, not from a Netflix data center. This is how Netflix serves 250 million subscribers without pushing every stream through its own centralized data centers.

Concept 6

HTTP Cache Headers

Figure 9: HTTP cache headers — how browsers and CDNs decide what to cache, for how long, and when to revalidate
| Header | What It Does | Example | Use Case |
|---|---|---|---|
| Cache-Control: max-age=N | Cache for N seconds (browser + CDN) | max-age=3600 | Static assets, stable API responses |
| Cache-Control: s-maxage=N | CDN-only cache time (overrides max-age) | s-maxage=600 | Different TTL for CDN vs browser |
| Cache-Control: no-cache | Cache but revalidate before serving | no-cache | Dynamic content, must-be-fresh |
| Cache-Control: no-store | Never cache this response | no-store | Passwords, payment data, PII |
| Cache-Control: private | Only browser can cache, not CDN | private, max-age=300 | User-specific data, dashboards |
| ETag / Last-Modified | Content fingerprint for revalidation | ETag: "abc123" | 304 Not Modified (save bandwidth) |
Common Caching Mistake

Cache-Control: no-cache does NOT mean "do not cache" — it means "cache but revalidate." To truly prevent caching, use Cache-Control: no-store. This is one of the most common interview pitfalls.

Interview Tip: Cache Headers Strategy

'Static assets get Cache-Control: public, max-age=31536000 with fingerprinted filenames. API responses for product listings get Cache-Control: public, s-maxage=300. User-specific data gets Cache-Control: private, no-store.'
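That strategy can be written down as a small lookup, e.g. in middleware that stamps responses by category (the category names are illustrative; the header values are the ones from the tip and the table above):

```python
def cache_control(kind: str) -> str:
    """Map a response category to its Cache-Control policy."""
    policies = {
        # Fingerprinted filenames make a 1-year TTL safe.
        "static_asset": "public, max-age=31536000, immutable",
        # Shared product listings: CDN caches for 5 minutes.
        "public_api":   "public, s-maxage=300",
        # User-specific or sensitive data: never cached anywhere.
        "user_private": "private, no-store",
    }
    return policies[kind]
```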

Figure 10: Caching decision tree — use this framework in interviews to systematically choose the right pattern, TTL, and eviction policy

Pre-Class Summary

Caching Cheat Sheet

Cache Patterns: Cache-aside (lazy loading) is the default for 90% of systems. Write-through for consistency-critical data. Write-behind for write-heavy metrics.

Invalidation: TTL-based is simplest. Event-based is most accurate. Watch out for cache stampede on popular key expiry — use jitter and lock-based refill.

Eviction: LRU is the best default. LFU for stable hot datasets. Always configure Redis maxmemory and eviction policy. Never use noeviction (default) in production.

Cache Layers: Browser → CDN → reverse proxy → in-process → Redis/Memcached → database buffer pool. Together they achieve 98%+ cache hit rate.

CDN: Pull CDNs (Cloudflare, CloudFront) auto-fetch from origin. Push CDNs (Netflix Open Connect) pre-position content. Always use no-store for sensitive data.

| Data Type | Where to Cache | TTL | Pattern |
|---|---|---|---|
| Static assets (JS, CSS, images) | CDN + Browser | 1 year (fingerprinted) | Cache-Control: immutable |
| Product catalog | CDN + Redis | 1–24 hours | Cache-aside, event invalidation |
| User profiles | Redis | 30–60 minutes | Cache-aside, delete on update |
| Shopping cart | Redis (persistent) | No TTL (session-bound) | Direct Redis read/write |
| Session tokens | Redis | 30 minutes (extend on activity) | Direct Redis read/write |
| Rate limit counters | Redis | Fixed window (1 minute) | Atomic INCR + TTL |
| Search results | CDN + Redis | 5–15 minutes | Cache-aside, short TTL |
| Real-time inventory | Redis | 10 seconds (or pub/sub) | Short TTL, verify at checkout |
| Passwords / payment data | NEVER CACHE | N/A | Cache-Control: no-store |
