
AI System Design & Interview Prep

ML System Design · Recommendation Systems · ML Theory Q&A · Interview Strategy

Interview Module · 3 Weeks · 8 Lessons · Prepflix AI Roadmap
ML System Design Framework

6-Step Design Framework

  1. Clarify requirements — scale, latency, accuracy tradeoff, constraints
  2. Define the ML objective — what are we optimizing? (clicks, revenue, safety)
  3. Data — sources, collection, labeling, volume, freshness
  4. Feature engineering — what signals matter? How to compute them?
  5. Model selection — simple baseline → complexity justified by gains
  6. Serving & monitoring — latency, throughput, drift detection, A/B testing

Key Design Tradeoffs

  • Precision vs Recall: context-dependent (fraud: high recall; ads: high precision)
  • Latency vs Accuracy: simple model online + complex model offline
  • Real-time vs Batch: streaming features are expensive, use wisely
  • Freshness vs Cost: how stale can features/model be?
  • Exploration vs Exploitation: multi-armed bandit or ε-greedy for recommendations
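The exploration/exploitation tradeoff in the last row can be sketched with a minimal ε-greedy bandit. This is an illustrative toy (the function names and incremental-mean update are my own framing, not from any specific library):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """With probability epsilon explore a random arm; otherwise
    exploit the arm with the highest estimated value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])

def update(q_values, counts, arm, reward):
    """Incremental mean: pull the arm's estimate toward the observed reward."""
    counts[arm] += 1
    q_values[arm] += (reward - q_values[arm]) / counts[arm]
```

In a recommender, each "arm" might be a candidate item slate; ε controls how much traffic you sacrifice to learn about under-served items.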
The interviewer wants to see you think about business impact, not just model accuracy. Always tie technical choices back to the business metric.
Recommendation System Design

Architecture (Netflix/YouTube style)

  1. Candidate Generation — narrow 10M+ items to ~1000 (recall over precision)
  2. Ranking — score 1000 candidates with rich features (ML model)
  3. Re-ranking / Business rules — diversity, freshness, safety filters
  • Collaborative Filtering: matrix factorization (similar users, similar items)
  • Content-based: item features (metadata, embeddings)
  • Two-tower model: separate user/item encoders, dot-product similarity
  • ANN lookup: FAISS/ScaNN for fast nearest-neighbor search at scale
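The two-tower retrieval step reduces to a dot product between a user embedding and every item embedding. A brute-force sketch (toy data; in production FAISS/ScaNN replaces this linear scan with approximate nearest-neighbor search):

```python
def dot(u, v):
    """Dot product between two equal-length embedding vectors."""
    return sum(a * b for a, b in zip(u, v))

def top_k_items(user_emb, item_embs, k=2):
    """Score every item against the user tower's output and return
    the k highest-scoring item ids. item_embs: {item_id: vector}."""
    scored = sorted(item_embs.items(),
                    key=lambda kv: dot(user_emb, kv[1]),
                    reverse=True)
    return [item_id for item_id, _ in scored[:k]]
```

The point of the two-tower factorization is exactly this: once both towers are trained, candidate generation needs only precomputed item vectors plus one cheap similarity lookup per request.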

Key Challenges

  • Cold start: New users/items have no history → use content-based fallback or popular items
  • Data sparsity: Most user-item pairs unobserved → implicit feedback (watch time > explicit ratings)
  • Position bias: Users click top results regardless of quality → correct with inverse propensity scoring
  • Feedback loop: Model only learns from what it already recommends → add exploration
  • Popularity bias: Popular items crowd out niche ones → add diversity penalty
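For the position-bias bullet, inverse propensity scoring can be sketched as reweighting each logged impression by the estimated probability the user even examined that position. A toy estimator (the log format and propensity table here are assumptions for illustration; real propensities come from randomization experiments or a click model):

```python
def ips_ctr(logs, propensity):
    """Position-debiased CTR estimate.
    logs: list of (position, clicked) pairs from the serving log.
    propensity[p]: estimated probability a user examines position p.
    Clicks at low-exposure positions are up-weighted by 1/propensity."""
    total = sum(clicked / propensity[pos] for pos, clicked in logs)
    return total / len(logs)
```

The naive CTR over the same logs would under-credit items shown low on the page; dividing by the examination probability corrects that in expectation.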
Proxy metric vs business metric: CTR is easy to optimize but doesn't equal revenue. Watch time ≠ satisfaction. Always connect your loss to the ultimate goal.
Search & Ranking Systems

Search Pipeline

  1. Query understanding (intent, entity extraction, spell correction)
  2. Document retrieval (BM25 + dense vector search = hybrid)
  3. Learning to Rank (LTR) — pointwise, pairwise, listwise
  4. Re-ranking with business rules
  • BM25: sparse keyword matching; fast, no embeddings needed
  • Dense Retrieval: semantic search via embeddings (FAISS)
  • Hybrid Search: BM25 + dense with weighted fusion (e.g., RRF)
  • Cross-encoder: slow but accurate reranker (BERT over query+doc)
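The RRF fusion mentioned for hybrid search is simple enough to show directly. A minimal sketch of reciprocal rank fusion (k=60 is the conventional constant from the original RRF paper; everything else here is illustrative):

```python
def rrf(rankings, k=60):
    """Reciprocal rank fusion: each ranked list contributes
    1 / (k + rank) per document (rank is 1-indexed); sum the
    contributions across lists and sort documents by fused score."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF only uses ranks, it sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales.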

Learning to Rank Losses

Pointwise

Predict relevance score for each doc independently. MSE/logistic loss. Simple but ignores ranking structure.

Pairwise

Predict which of two docs is more relevant. RankNet, LambdaRank. Better but O(n²) pairs.
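The pairwise idea can be made concrete with the RankNet loss: logistic loss on the score difference for a pair where doc i is labeled more relevant than doc j (a minimal sketch, ignoring RankNet's temperature parameter and the model that produces the scores):

```python
import math

def ranknet_loss(s_i, s_j):
    """RankNet pairwise loss for an ordered pair (i more relevant than j):
    -log sigmoid(s_i - s_j) = log(1 + exp(-(s_i - s_j))).
    Small when the model scores i well above j; large when inverted."""
    return math.log(1.0 + math.exp(-(s_i - s_j)))
```

LambdaRank keeps this pairwise gradient but rescales each pair by the NDCG change from swapping the two docs, which is what connects it to the listwise metrics below.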

Listwise (LambdaMART)

Optimize ranking metrics directly (NDCG, MAP). Best results but complex. Used by Microsoft, Yahoo.

NDCG@K = DCG@K / IDCG@K, where DCG@K = Σᵢ₌₁ᴷ (2^relᵢ − 1) / log₂(i + 1) and IDCG@K is the DCG@K of the ideal (relevance-sorted) ordering.
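The formula above translates directly into code. A minimal sketch using the exponential-gain convention shown (some libraries default to linear gain rel_i instead):

```python
import math

def dcg_at_k(rels, k):
    """DCG@K with gain (2^rel - 1) and discount log2(i + 1), i 1-indexed."""
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(rels[:k], start=1))

def ndcg_at_k(rels, k):
    """Normalize by the DCG of the ideal (relevance-sorted) ordering."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0
```

A perfectly ordered list scores 1.0; putting a highly relevant doc low in the list is penalized more than misplacing a marginal one, which is why LambdaMART targets NDCG rather than raw accuracy.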
Top ML Theory Interview Questions
Q: Why does L1 produce sparse weights but L2 doesn't?
L1's gradient magnitude is constant (±λ) regardless of weight size, so it keeps pushing small weights toward zero; with subgradient or proximal methods they land exactly at 0. L2's gradient scales with the weight (2λw), so small weights shrink geometrically but never reach exactly 0.
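This is easy to see in one-dimensional regularizer-only update steps (a toy sketch: the L1 update is the proximal soft-threshold operator, the L2 update is plain gradient descent on λw²):

```python
def l1_step(w, lam, lr):
    """Soft-threshold (proximal) update for the L1 penalty lam*|w|:
    any weight within lr*lam of zero snaps to exactly 0."""
    thresh = lr * lam
    if w > thresh:
        return w - thresh
    if w < -thresh:
        return w + thresh
    return 0.0

def l2_step(w, lam, lr):
    """Gradient step on the L2 penalty lam*w**2: multiplicative
    shrinkage, so a nonzero weight never becomes exactly 0."""
    return w * (1.0 - 2.0 * lr * lam)
```

Iterating these makes the interview answer concrete: L1 produces exact zeros (sparsity), L2 produces small-but-nonzero weights.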
Q: What's the difference between generative and discriminative models?
Generative models learn P(X,Y) = P(X|Y)P(Y) (e.g., Naive Bayes, GANs). Discriminative models learn P(Y|X) directly (e.g., Logistic Regression, SVM, Neural Nets). Discriminative usually has better accuracy; generative can generate new samples.
Q: Explain vanishing gradients and how to fix them.
In deep networks, gradients get multiplied through many layers. With sigmoid/tanh (max derivative <1), gradients approach 0 and early layers don't learn. Fix: ReLU activations, residual connections, batch norm, gradient clipping.
Q: Why is cross-entropy better than MSE for classification?
MSE with sigmoid has a flat gradient when the output is near 0 or 1 (saturated), causing slow learning. Cross-entropy's gradient w.r.t. the logit is (ŷ − y), always proportional to the error. Also, MSE is not a proper scoring rule for probabilities.
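The saturation argument can be checked numerically. Writing out both gradients w.r.t. the pre-sigmoid logit (standard calculus, function names mine):

```python
def ce_grad(y_hat, y):
    """d(cross-entropy)/d(logit) through a sigmoid output: (y_hat - y)."""
    return y_hat - y

def mse_grad(y_hat, y):
    """d(MSE)/d(logit) through a sigmoid output: the chain rule adds
    a y_hat*(1 - y_hat) factor that vanishes when the sigmoid saturates."""
    return (y_hat - y) * y_hat * (1.0 - y_hat)
```

For a confidently wrong prediction (ŷ = 0.999, y = 0), cross-entropy still delivers a large gradient while MSE's is nearly zero, which is exactly the slow-learning failure mode the answer describes.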
Q: What is the kernel trick in SVMs?
Computing K(x,x') = φ(x)·φ(x') without explicitly computing the (potentially infinite-dimensional) feature map φ(x). The algorithm only needs inner products, not the actual coordinates. This enables non-linear classification at the cost of a kernel computation.
Q: How does attention in Transformers differ from RNN memory?
RNNs compress the entire past into a fixed-size hidden state (lossy, sequential). Attention accesses all past positions directly with learned weights, with O(1) path length between any two positions. No sequential dependency = parallelizable training.
Mock Interview Strategy

Behavioral Questions (STAR)

  • Situation — set the context briefly (1-2 sentences)
  • Task — your specific responsibility
  • Action — what YOU did (not the team)
  • Result — quantify impact (X% improvement, $Y saved)

Common ML Engineer Behavioral Q's

  • "Tell me about a model you built from scratch"
  • "Describe a time your model failed in production"
  • "How did you handle a dataset with severe class imbalance?"
  • "Tell me about a time you disagreed with stakeholders on metrics"

ML System Design Interview Tips

  • Always start by asking clarifying questions
  • Propose a simple baseline before complex models
  • Explicitly discuss data collection and labeling challenges
  • Show awareness of production: latency, throughput, monitoring
  • Discuss failure modes and how to detect them
  • End with A/B testing strategy
Red Flags to Avoid: Jumping to neural networks without justification. Ignoring data quality issues. Not discussing class imbalance. Forgetting about online vs batch serving tradeoffs. Skipping monitoring/drift detection.
The Rule of Three: For any ML system design, have answers ready for: (1) how to evaluate offline, (2) how to evaluate online (A/B), and (3) how to monitor in production. Interviewers always ask all three.