CAP theorem is one of the most cited — and most misunderstood — concepts in system design interviews. Candidates recite it like a mantra but cannot use it to make an actual design decision. This post fixes that.
By the end, you will be able to explain exactly what CAP theorem says, where it is misapplied, which real-world databases sit on which side of the trade-off, and — most importantly — how to use it fluently when an interviewer asks you to design a distributed system at Google, Amazon, or any product company.
What Is the CAP Theorem? (And Why Every Engineer Gets It Wrong)
The CAP theorem was proposed by computer scientist Eric Brewer in 2000 and formally proved by Seth Gilbert and Nancy Lynch in 2002. It states that a distributed data store can provide at most two of the following three guarantees simultaneously: Consistency, Availability, and Partition tolerance.
Here is where most engineers go wrong: they treat it as a free choice among three options. It is not. In any real distributed system running over a network, partition tolerance is not optional. Networks drop packets. Nodes go unreachable. Your switches fail. The question is never "do I want partition tolerance?" — the question is always "when a partition happens, what do I sacrifice: consistency or availability?"
CAP does not mean you make this trade-off once at design time and forget about it. Modern systems like DynamoDB and Cassandra let you tune it per operation, choosing strong or eventual consistency based on what each use case demands. Understanding this nuance is what separates a good system design answer from a great one.
Breaking Down Each Property: Consistency, Availability, Partition Tolerance
Let's define each property precisely, because interviewers will probe the definitions themselves.
C — Consistency (Linearizability)
Every read receives the most recent write, or an error. All nodes in the distributed system see the same data at the same time. If you write a value to Node A, any subsequent read from Node B must return that updated value — not a stale one.
This is not the same as ACID consistency in relational databases. CAP consistency is specifically about linearizability — a much stricter, operation-level ordering guarantee across distributed nodes.
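To make the definition tangible, here is a minimal sketch of the state linearizability forbids. The `Replica`, `write`, and node names are invented for illustration; the point is that with asynchronous replication, a read from Node B can observe a value older than the most recent write to Node A.

```python
import time

# Toy model of two replicas with asynchronous replication.
class Replica:
    def __init__(self):
        self.value = None
        self.timestamp = 0.0

def write(replica, value):
    replica.value = value
    replica.timestamp = time.time()

node_a, node_b = Replica(), Replica()

write(node_a, "v2")   # client writes to Node A
# ...replication to Node B has not happened yet...

# A linearizable system must never let a client observe this state:
print(node_a.value)   # "v2"  (the most recent write)
print(node_b.value)   # None  (stale -- violates CAP consistency)
```

A linearizable system would either block the read on Node B until replication caught up, or route it somewhere that has the latest write.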
A — Availability
Every request receives a response — not an error — though the response may contain stale data. An available system guarantees that no request is ever left hanging indefinitely. The system stays operational and responds, even if some nodes are down.
Crucially, "availability" in the CAP theorem does not mean "high availability" in the SRE sense (five nines, etc.). It means specifically that every non-failing node returns a response to every request within a reasonable time.
P — Partition Tolerance
The system continues operating even when arbitrary network messages are dropped or delayed between nodes — i.e., when a network partition splits the cluster into two groups that cannot communicate.
As stated above, you cannot build a distributed system that is not partition tolerant. If you assume the network is perfect, you are not building a distributed system — you are building a local one.
Why You Can Only Choose Two (The Real Explanation)
Here is a concrete scenario. You have two database nodes, Node A and Node B, replicating data between each other. A network partition occurs — they cannot communicate.
A write comes in to Node A. Now you have a decision to make:
- Option 1 — Prioritize Consistency (CP): Refuse the write (or return an error) until Node B is reachable and you can guarantee both nodes agree. The system is consistent but temporarily unavailable.
- Option 2 — Prioritize Availability (AP): Accept the write to Node A, serve reads from both nodes even though Node B has stale data. The system is available but inconsistent — until the partition heals and nodes sync.
There is no third option. You cannot simultaneously tell Node A to accept the write and guarantee Node B also reflects it immediately when they cannot communicate. This is not a software limitation — it is a fundamental constraint of distributed computing, formalised in Gilbert and Lynch's proof of the CAP theorem.
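The two options above can be sketched as code. This is a toy model under assumed names (`cp_write`, `ap_write`, `PartitionError` are all invented): the CP node refuses a write it cannot replicate, while the AP node accepts it and defers reconciliation.

```python
class PartitionError(Exception):
    """Raised by a CP node that cannot reach its peer."""

def cp_write(local, peer_reachable, value):
    # Option 1: consistency first -- fail the write rather than diverge.
    if not peer_reachable:
        raise PartitionError("peer unreachable; refusing write")
    local["value"] = value

def ap_write(local, peer_reachable, value):
    # Option 2: availability first -- accept the write, sync later.
    local["value"] = value
    return "accepted; will reconcile after the partition heals"

node_a = {"value": "old"}

try:
    cp_write(node_a, peer_reachable=False, value="new")
except PartitionError:
    pass                      # CP: unavailable, but still consistent

ap_write(node_a, peer_reachable=False, value="new")
print(node_a["value"])        # AP: "new" on Node A, stale on Node B
```

Notice the CP path leaves `node_a` untouched — the error is the price of consistency — while the AP path succeeds locally and pushes the divergence problem to a later sync.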
CP vs AP Systems: Real-World Examples from Companies
Memorising which database sits in which category is valuable, but understanding why matters far more. Here is the definitive breakdown:
| System | Type | Why |
|---|---|---|
| Apache Cassandra | AP | Uses eventual consistency by default. Writes are accepted on any available node. During a partition, nodes continue serving stale reads. Tunable consistency (ONE, QUORUM, ALL) lets you shift toward CP per query, but the default and design intent is availability. |
| Amazon DynamoDB | AP | Eventual consistency is the default read model. Strongly consistent reads are available on request but at higher latency and cost. DynamoDB's design prioritises being always-on over strict consistency. |
| Apache HBase | CP | Built on HDFS and ZooKeeper. Guarantees strong consistency — a write must be confirmed by the master before it is acknowledged. During a partition, HBase prefers to become unavailable rather than serve stale data. Used at Facebook for messages where row-level consistency was critical. |
| Apache ZooKeeper | CP | ZooKeeper is a coordination service (distributed locks, leader election, config management). Correctness is everything — if two clients see different leaders simultaneously, the entire cluster breaks. ZooKeeper uses ZAB (ZooKeeper Atomic Broadcast), a Paxos-like consensus protocol, and stops processing writes when a quorum is not reachable. It chooses correctness over availability. |
| etcd | CP | The key-value store backing Kubernetes. Uses the Raft consensus algorithm. Like ZooKeeper, it requires a quorum to make progress. Kubernetes cannot afford split-brain scenarios where two nodes think they are the leader simultaneously. |
| MongoDB | CP | In its default configuration with a replica set and majority write concern, MongoDB is CP. A Raft-like consensus protocol elects the primary, and reads from the primary with the appropriate read concern are linearisable. Reading from secondaries introduces eventual consistency, making it effectively tunable. |
| CouchDB / Couchbase | AP | CouchDB uses multi-version concurrency control (MVCC) and allows conflicting writes: it deterministically picks a winning revision and retains the conflicts for the application to resolve later. Designed for offline-first scenarios and mobile sync — availability is paramount. |
| Redis (Cluster mode) | AP | Redis Cluster can accept writes on a primary and acknowledge before replication to replicas completes. During a partition, acknowledged writes can be lost if the primary fails. Redis prioritises low latency and availability over strict consistency. |
| Google Spanner | CP | Spanner is externally consistent (a stronger form of linearisability) using TrueTime — GPS and atomic clocks to bound clock uncertainty. It achieves high availability through massive hardware redundancy, but the consistency model is strict CP. |
| Traditional RDBMS (single node) | CA | A standalone PostgreSQL or MySQL instance has no network partitions, so it can offer both consistency and availability. The moment you add replication across nodes, you are in CAP trade-off territory. |
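The "tunable consistency" in the Cassandra and DynamoDB rows comes down to quorum arithmetic: with N replicas, W write acknowledgements, and R read acknowledgements, a read is guaranteed to overlap the latest write exactly when R + W > N. The function name below is illustrative, not any library's API.

```python
# Quorum overlap rule behind tunable consistency (Cassandra, Dynamo-style
# stores): if R + W > N, any read quorum intersects any write quorum,
# so at least one replica in the read set has the latest write.

def is_strongly_consistent(n, r, w):
    return r + w > n

N = 3
print(is_strongly_consistent(N, r=1, w=1))   # False: ONE/ONE is eventual
print(is_strongly_consistent(N, r=2, w=2))   # True: QUORUM/QUORUM
print(is_strongly_consistent(N, r=1, w=3))   # True: read ONE after write ALL
```

This is why Cassandra's `QUORUM` reads and writes (2 of 3) behave strongly consistent per key, while `ONE`/`ONE` trades that guarantee for latency and availability.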
How Interviewers Actually Ask About CAP Theorem (With Model Answers)
CAP theorem questions in FAANG and product company interviews rarely come as a direct "explain CAP theorem" prompt. They are embedded in system design scenarios. Here are the patterns you will encounter and how to handle each.
Preparing for System Design Interviews?
Prepflix's System Design curriculum covers distributed systems fundamentals, real architecture case studies, and mock interview practice — taught by Pranjal Jain (ex-Microsoft, IIT Kanpur).
PACELC: The More Realistic Extension of CAP
In 2010, Daniel Abadi proposed PACELC as a more complete model. It says: if there is a Partition (P), a system must trade off Availability versus Consistency (A vs C); Else (E), in normal operation, it must trade off Latency versus Consistency (L vs C).
This is the insight CAP theorem misses: even without a partition — in steady-state operation — there is a trade-off between latency and consistency. To serve a strongly consistent read, you often need to contact multiple nodes or wait for acknowledgements. That costs time. To serve a low-latency read, you read from the nearest replica, which may be slightly stale.
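A back-of-the-envelope sketch makes the steady-state trade-off visible. The round-trip times below are assumed numbers, not measurements: a quorum read must wait for the second-fastest of three replicas, while a nearest-replica read returns as soon as the closest one answers.

```python
# Illustrative numbers only: why strong consistency costs latency even
# when there is no partition.

replica_rtts_ms = [2, 45, 80]   # assumed round-trips to 3 replicas

local_read_ms = min(replica_rtts_ms)          # nearest replica, maybe stale
quorum_read_ms = sorted(replica_rtts_ms)[1]   # wait for 2 of 3 responses

print(local_read_ms)    # 2  -- low latency, eventual consistency
print(quorum_read_ms)   # 45 -- strong consistency, higher latency
```

The gap widens with geo-distribution: once replicas sit on different continents, the second-fastest response is bounded by speed-of-light round trips, which is the latency PACELC's "EL" systems refuse to pay.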
Real-world PACELC classifications:
- Cassandra: PA/EL — Chooses availability during partitions, and latency (eventual consistency) in normal operation.
- HBase: PC/EC — Chooses consistency during partitions, and consistency (at the cost of latency) in normal operation.
- DynamoDB: PA/EL (default) or PC/EC (with strong consistency) — Tunable per operation.
- Google Spanner: PC/EC — Always consistent. Achieves low latency through sheer infrastructure investment (co-located data centres, TrueTime).
- MySQL with async replication: PA/EL — Primary accepts writes without waiting for replica sync. Replicas may lag.
Mentioning PACELC in an interview signals that you think in terms of real operating conditions, not just theoretical failure modes. Use it when discussing latency requirements alongside consistency requirements.
CAP Theorem in Practice: How to Apply It in a System Design Interview
Here is a concrete decision framework for applying CAP reasoning in your answers. Walk through these steps out loud — interviewers want to see your thinking process.
Step 1: Identify the data and its access pattern
What kind of data are you storing? Who reads it, who writes it, and how often? A user's bank balance has different requirements from a user's "last seen" timestamp. The data type largely determines the consistency requirement.
Step 2: Ask "what is the cost of stale data?"
This is the real question behind the CP vs AP decision. If stale data leads to monetary loss, double-booking, or data corruption — you need CP. If stale data just means a slightly delayed feed refresh or an old profile photo — AP is fine and gives you better performance.
Examples by domain:
- Needs CP: Financial transactions, inventory reservations (hotel/flight booking), distributed locking, leader election, user authentication tokens.
- Fine with AP: Social media feeds, product recommendations, view/like counts, user activity logs, DNS records, CDN cached content, shopping cart (in most e-commerce contexts).
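Step 2 can be encoded as a simple lookup — hedged heavily: the categories and labels below are this post's examples, not an exhaustive rule, and all names are invented for illustration.

```python
# Toy encoding of "what is the cost of stale data?" as a CP/AP decision.
STALENESS_COST = {
    "bank_balance": "high",
    "inventory_reservation": "high",
    "distributed_lock": "high",
    "social_feed": "low",
    "like_count": "low",
    "profile_photo": "low",
}

def pick_consistency_model(data_kind):
    cost = STALENESS_COST.get(data_kind)
    if cost is None:
        return "unknown -- reason about the cost of stale data first"
    return "CP" if cost == "high" else "AP"

print(pick_consistency_model("bank_balance"))   # CP
print(pick_consistency_model("social_feed"))    # AP
```

The `unknown` branch is the honest one: in an interview, data that does not map cleanly to a known category is exactly the data you should reason about out loud.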
Step 3: State the trade-off explicitly
Do not just say "I'll use Cassandra." Say: "I'll use Cassandra here because this is an AP use case — the cost of occasionally stale data is low, and we need to handle high write throughput globally with minimal downtime. If there is a partition, I prefer that users get a slightly stale response over getting an error."
Step 4: Acknowledge the edge cases
Real systems are not purely CP or AP. Show this awareness. Cassandra can be made strongly consistent for critical operations. MongoDB can read from secondaries for lower latency. DynamoDB offers strongly consistent reads. Acknowledge this configurability — it shows real-world system design thinking.
Step 5: Bring in PACELC if latency is a requirement
If the system design has a latency SLA (p99 < 100ms, for example), use PACELC to show that even without partitions, there is a consistency vs latency trade-off you are making. This demonstrates you think beyond failure modes to steady-state performance.
Common Mistakes Candidates Make When Discussing CAP Theorem
After coaching hundreds of engineers through system design interviews, here are the mistakes I see most often:
Mistake 1: Treating partition tolerance as optional
Saying "I want a CA system" in a distributed context is a red flag. It tells the interviewer you do not understand that network partitions are inevitable. Always assume partitions will happen. The choice is CP or AP, not CA vs CP vs AP.
Mistake 2: Confusing CAP consistency with ACID consistency
ACID consistency is about database invariants — foreign keys, constraints, transactions leaving the database in a valid state. CAP consistency (linearisability) is about all nodes seeing the same data at the same time. These are completely different properties. Mixing them up in an interview signals surface-level knowledge.
Mistake 3: Applying CAP to non-partitioned subsystems
CAP applies to distributed systems with network communication between nodes. It does not apply to a single-node Redis instance, a local in-memory cache, or a single PostgreSQL database. Applying CAP reasoning to these systems is a category error.
Mistake 4: Picking CP "because correctness is always more important"
This is a lazy default. CP means your system becomes unavailable during partitions. For a high-traffic social media application, that trade-off is catastrophic — you would rather serve slightly stale data than return errors to millions of users. The right answer always depends on the specific business context and the cost of each failure mode.
Mistake 5: Not knowing the internals of why a system is CP or AP
Knowing that "Cassandra is AP" is table stakes. Knowing why — leaderless replication, gossip protocol, tunable consistency levels, last-write-wins conflict resolution — is what earns respect in a senior engineering interview. Always be able to go one level deeper than the label.
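As one example of "one level deeper": last-write-wins, Cassandra's default conflict resolution, is a two-line merge rule. This sketch (invented names, assumed timestamps) ignores clock skew — which is precisely why LWW can silently discard a concurrent write.

```python
# Last-write-wins (LWW) conflict resolution: each version carries a
# timestamp, and the merge keeps the newest one.

def lww_merge(a, b):
    """Each version is a (value, timestamp) pair; newer timestamp wins."""
    return a if a[1] >= b[1] else b

node_a_version = ("alice@new.com", 1700000005)   # assumed timestamps
node_b_version = ("alice@old.com", 1700000001)

winner = lww_merge(node_a_version, node_b_version)
print(winner[0])   # alice@new.com -- Node B's version is discarded
```

Being able to explain this — and its failure mode under skewed clocks — is the difference between citing the label "AP" and understanding it.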
Mistake 6: Never mentioning PACELC
Sticking only to CAP makes it sound like you learned the concept from a YouTube summary. PACELC extends CAP to normal operating conditions and shows you understand that distributed systems design is about steady-state trade-offs, not just partition recovery.
Key Takeaways
- Partitions will happen — CP or AP, not CA.
- The real question: what is the cost of stale data vs the cost of unavailability?
- Most real systems are tunable — know how.
- Use PACELC to discuss latency vs consistency in steady state.
- Go one level deeper than labels: know the protocols (Raft, ZAB, gossip, quorum).
Get Interview-Ready with Prepflix
This is the level of depth that clears FAANG and top product company interviews. Prepflix's program covers System Design and DSA end-to-end, with real mock interviews and feedback — taught by Pranjal Jain (IIT Kanpur, ex-Microsoft, ex-Samsung).