
AI System Design & Interview Prep

ML System Design · Recommendation Systems · ML Theory Q&A · Interview Strategy

Interview Module · 3 Weeks · 8 Lessons · Prepflix AI Roadmap
ML System Design Framework

6-Step Design Framework

  1. Clarify requirements — scale, latency, accuracy tradeoff, constraints
  2. Define the ML objective — what are we optimizing? (clicks, revenue, safety)
  3. Data — sources, collection, labeling, volume, freshness
  4. Feature engineering — what signals matter? How to compute them?
  5. Model selection — simple baseline → complexity justified by gains
  6. Serving & monitoring — latency, throughput, drift detection, A/B testing

Key Design Tradeoffs

  • Precision vs Recall: context-dependent (fraud: high recall; ads: high precision)
  • Latency vs Accuracy: simple model online + complex model offline
  • Real-time vs Batch: streaming features are expensive, use wisely
  • Freshness vs Cost: how stale can features/model be?
  • Exploration vs Exploitation: multi-armed bandit or ε-greedy for recommendations
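The exploration/exploitation tradeoff in the last row can be sketched with a minimal ε-greedy bandit. This is an illustrative toy (the function names and incremental-mean update are my own framing, not from any specific library):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1, rng=random):
    """With probability epsilon explore a random arm; otherwise
    exploit the arm with the highest estimated value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])

def update(q_values, counts, arm, reward):
    """Incremental mean: pull the arm's estimate toward the observed reward."""
    counts[arm] += 1
    q_values[arm] += (reward - q_values[arm]) / counts[arm]
```

In a recommender, each "arm" might be a candidate item slate; ε controls how much traffic you sacrifice to learn about under-served items.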
The interviewer wants to see you think about business impact, not just model accuracy. Always tie technical choices back to the business metric.
Recommendation System Design

Architecture (Netflix/YouTube style)

  1. Candidate Generation — narrow 10M+ items to ~1000 (recall over precision)
  2. Ranking — score 1000 candidates with rich features (ML model)
  3. Re-ranking / Business rules — diversity, freshness, safety filters
  • Collaborative Filtering: matrix factorization (similar users, similar items)
  • Content-based: item features (metadata, embeddings)
  • Two-tower model: separate user/item encoders, dot-product similarity
  • ANN lookup: FAISS/ScaNN for fast nearest-neighbor search at scale
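The two-tower retrieval step reduces to a dot product between a user embedding and every item embedding. A brute-force sketch (toy data; in production FAISS/ScaNN replaces this linear scan with approximate nearest-neighbor search):

```python
def dot(u, v):
    """Dot product between two equal-length embedding vectors."""
    return sum(a * b for a, b in zip(u, v))

def top_k_items(user_emb, item_embs, k=2):
    """Score every item against the user tower's output and return
    the k highest-scoring item ids. item_embs: {item_id: vector}."""
    scored = sorted(item_embs.items(),
                    key=lambda kv: dot(user_emb, kv[1]),
                    reverse=True)
    return [item_id for item_id, _ in scored[:k]]
```

The point of the two-tower factorization is exactly this: once both towers are trained, candidate generation needs only precomputed item vectors plus one cheap similarity lookup per request.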

Key Challenges

  • Cold start: New users/items have no history → use content-based fallback or popular items
  • Data sparsity: Most user-item pairs unobserved → implicit feedback (watch time > explicit ratings)
  • Position bias: Users click top results regardless of quality → correct with inverse propensity scoring
  • Feedback loop: Model only learns from what it already recommends → add exploration
  • Popularity bias: Popular items crowd out niche ones → add diversity penalty
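For the position-bias bullet, inverse propensity scoring can be sketched as reweighting each logged impression by the estimated probability the user even examined that position. A toy estimator (the log format and propensity table here are assumptions for illustration; real propensities come from randomization experiments or a click model):

```python
def ips_ctr(logs, propensity):
    """Position-debiased CTR estimate.
    logs: list of (position, clicked) pairs from the serving log.
    propensity[p]: estimated probability a user examines position p.
    Clicks at low-exposure positions are up-weighted by 1/propensity."""
    total = sum(clicked / propensity[pos] for pos, clicked in logs)
    return total / len(logs)
```

The naive CTR over the same logs would under-credit items shown low on the page; dividing by the examination probability corrects that in expectation.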
Proxy metric vs business metric: CTR is easy to optimize but doesn't equal revenue. Watch time ≠ satisfaction. Always connect your loss to the ultimate goal.
Search & Ranking Systems

Search Pipeline

  1. Query understanding (intent, entity extraction, spell correction)
  2. Document retrieval (BM25 + dense vector search = hybrid)
  3. Learning to Rank (LTR) — pointwise, pairwise, listwise
  4. Re-ranking with business rules
  • BM25: sparse keyword matching; fast, no embeddings needed
  • Dense Retrieval: semantic search via embeddings (FAISS)
  • Hybrid Search: BM25 + dense with weighted fusion (e.g., RRF)
  • Cross-encoder: slow but accurate reranker (BERT over query+doc)
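The RRF fusion mentioned for hybrid search is simple enough to show directly. A minimal sketch of reciprocal rank fusion (k=60 is the conventional constant from the original RRF paper; everything else here is illustrative):

```python
def rrf(rankings, k=60):
    """Reciprocal rank fusion: each ranked list contributes
    1 / (k + rank) per document (rank is 1-indexed); sum the
    contributions across lists and sort documents by fused score."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF only uses ranks, it sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales.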

Learning to Rank Losses

Pointwise

Predict relevance score for each doc independently. MSE/logistic loss. Simple but ignores ranking structure.

Pairwise

Predict which of two docs is more relevant. RankNet, LambdaRank. Better but O(n²) pairs.
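The pairwise idea can be made concrete with the RankNet loss: logistic loss on the score difference for a pair where doc i is labeled more relevant than doc j (a minimal sketch, ignoring RankNet's temperature parameter and the model that produces the scores):

```python
import math

def ranknet_loss(s_i, s_j):
    """RankNet pairwise loss for an ordered pair (i more relevant than j):
    -log sigmoid(s_i - s_j) = log(1 + exp(-(s_i - s_j))).
    Small when the model scores i well above j; large when inverted."""
    return math.log(1.0 + math.exp(-(s_i - s_j)))
```

LambdaRank keeps this pairwise gradient but rescales each pair by the NDCG change from swapping the two docs, which is what connects it to the listwise metrics below.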

Listwise (LambdaMART)

Optimize ranking metrics directly (NDCG, MAP). Best results but complex. Used by Microsoft, Yahoo.

NDCG@K = DCG@K / IDCG@K, where DCG@K = Σᵢ₌₁ᴷ (2^relᵢ − 1) / log₂(i + 1) and IDCG@K is the DCG@K of the ideal (relevance-sorted) ordering.
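The formula above translates directly into code. A minimal sketch using the exponential-gain convention shown (some libraries default to linear gain rel_i instead):

```python
import math

def dcg_at_k(rels, k):
    """DCG@K with gain (2^rel - 1) and discount log2(i + 1), i 1-indexed."""
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(rels[:k], start=1))

def ndcg_at_k(rels, k):
    """Normalize by the DCG of the ideal (relevance-sorted) ordering."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0
```

A perfectly ordered list scores 1.0; putting a highly relevant doc low in the list is penalized more than misplacing a marginal one, which is why LambdaMART targets NDCG rather than raw accuracy.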
Top ML Theory Interview Questions
Q: Why does L1 produce sparse weights but L2 doesn't?
L1's gradient magnitude is constant (±λ) regardless of weight size, so it keeps pushing small weights toward zero; with subgradient or proximal methods they land exactly at 0. L2's gradient scales with the weight (2λw), so small weights shrink geometrically but never reach exactly 0.
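This is easy to see in one-dimensional regularizer-only update steps (a toy sketch: the L1 update is the proximal soft-threshold operator, the L2 update is plain gradient descent on λw²):

```python
def l1_step(w, lam, lr):
    """Soft-threshold (proximal) update for the L1 penalty lam*|w|:
    any weight within lr*lam of zero snaps to exactly 0."""
    thresh = lr * lam
    if w > thresh:
        return w - thresh
    if w < -thresh:
        return w + thresh
    return 0.0

def l2_step(w, lam, lr):
    """Gradient step on the L2 penalty lam*w**2: multiplicative
    shrinkage, so a nonzero weight never becomes exactly 0."""
    return w * (1.0 - 2.0 * lr * lam)
```

Iterating these makes the interview answer concrete: L1 produces exact zeros (sparsity), L2 produces small-but-nonzero weights.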
Q: What's the difference between generative and discriminative models?
Generative models learn P(X,Y) = P(X|Y)P(Y) (e.g., Naive Bayes, GANs). Discriminative models learn P(Y|X) directly (e.g., Logistic Regression, SVM, Neural Nets). Discriminative usually has better accuracy; generative can generate new samples.
Q: Explain vanishing gradients and how to fix them.
In deep networks, gradients get multiplied through many layers. With sigmoid/tanh (max derivative <1), gradients approach 0 and early layers don't learn. Fix: ReLU activations, residual connections, batch norm, gradient clipping.
Q: Why is cross-entropy better than MSE for classification?
MSE with sigmoid has a flat gradient when the output is near 0 or 1 (saturated), causing slow learning. Cross-entropy's gradient w.r.t. the logit is (ŷ − y), always proportional to the error. Also, MSE is not a proper scoring rule for probabilities.
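The saturation argument can be checked numerically. Writing out both gradients w.r.t. the pre-sigmoid logit (standard calculus, function names mine):

```python
def ce_grad(y_hat, y):
    """d(cross-entropy)/d(logit) through a sigmoid output: (y_hat - y)."""
    return y_hat - y

def mse_grad(y_hat, y):
    """d(MSE)/d(logit) through a sigmoid output: the chain rule adds
    a y_hat*(1 - y_hat) factor that vanishes when the sigmoid saturates."""
    return (y_hat - y) * y_hat * (1.0 - y_hat)
```

For a confidently wrong prediction (ŷ = 0.999, y = 0), cross-entropy still delivers a large gradient while MSE's is nearly zero, which is exactly the slow-learning failure mode the answer describes.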
Q: What is the kernel trick in SVMs?
Computing K(x,x') = φ(x)·φ(x') without explicitly computing the (potentially infinite-dimensional) feature map φ(x). The algorithm only needs inner products, not the actual coordinates. This enables non-linear classification at the cost of a kernel computation.
Q: How does attention in Transformers differ from RNN memory?
RNNs compress the entire past into a fixed-size hidden state (lossy, sequential). Attention accesses all past positions directly with learned weights, with O(1) path length between any two positions. No sequential dependency = parallelizable training.
Mock Interview Strategy

Behavioral Questions (STAR)

  • Situation — set the context briefly (1-2 sentences)
  • Task — your specific responsibility
  • Action — what YOU did (not the team)
  • Result — quantify impact (X% improvement, $Y saved)

Common ML Engineer Behavioral Q's

  • "Tell me about a model you built from scratch"
  • "Describe a time your model failed in production"
  • "How did you handle a dataset with severe class imbalance?"
  • "Tell me about a time you disagreed with stakeholders on metrics"

ML System Design Interview Tips

  • Always start by asking clarifying questions
  • Propose a simple baseline before complex models
  • Explicitly discuss data collection and labeling challenges
  • Show awareness of production: latency, throughput, monitoring
  • Discuss failure modes and how to detect them
  • End with A/B testing strategy
Red Flags to Avoid: Jumping to neural networks without justification. Ignoring data quality issues. Not discussing class imbalance. Forgetting about online vs batch serving tradeoffs. Skipping monitoring/drift detection.
The Rule of Three: For any ML system design, have answers ready for: (1) how to evaluate offline, (2) how to evaluate online (A/B), and (3) how to monitor in production. Interviewers always ask all three.