An Idiot’s Explanation Of The CAP Theorem

The CAP theorem is a beautiful thing. However, I initially misunderstood it, and I believe other people might have as well. After surfing the web for a while, I could not find a source that explained in plain terms, for mere mortals like myself, how the CAP theorem should be applied in practice. So here is my attempt at doing so.

CAP Review

If you google the CAP theorem you will find great explanations of what it is (I personally like the Wikipedia one). It essentially boils down to the following statement: when a network partition hits a distributed system, you can expect an operation on it to be either consistent or available, but not both.

The main reason I misunderstood the theorem was that I thought I should categorize distributed systems into CAP corners, meaning that I would label each system as either CA, CP or AP.

In practice, the theorem should be applied per operation. A distributed system is actually capable of displaying all CAP properties, just not at the same time. I’ll repeat myself: a distributed system is actually capable of displaying all CAP properties, just not at the same time. Therefore, categorizing a whole system into a single CAP corner is fundamentally wrong.
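
To make the per-operation idea concrete, here is a minimal Python sketch. Everything in it is hypothetical and invented for illustration: the `Cluster` class, its `partitioned` flag and the `consistent` parameter do not belong to any real database client. It models one remote primary and one local replica, and shows the same system answering one read CP-style and another AP-style.

```python
class PartitionError(Exception):
    """Raised when the primary cannot be reached (illustrative only)."""


class Cluster:
    """Toy model: one remote primary and one local, possibly stale, replica."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.partitioned = False  # True while the primary is unreachable

    def write(self, key, value):
        # Writes always go through the primary in this sketch.
        if self.partitioned:
            raise PartitionError("primary unreachable")
        self.primary[key] = value
        self.replica[key] = value  # replication, simplified to an instant copy

    def read(self, key, consistent):
        if consistent:
            # CP-style operation: refuse to answer rather than risk stale data.
            if self.partitioned:
                raise PartitionError("primary unreachable")
            return self.primary[key]
        # AP-style operation: always answer from the local replica.
        return self.replica[key]


cluster = Cluster()
cluster.write("stock", 5)
cluster.partitioned = True
cluster.primary["stock"] = 4  # a write accepted on the far side of the partition

stale = cluster.read("stock", consistent=False)  # AP: answers, value is stale
try:
    cluster.read("stock", consistent=True)       # CP: refuses to answer
    cp_answered = True
except PartitionError:
    cp_answered = False
```

The point of the sketch is that the choice is made by the caller for each read, not once for the whole cluster.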

A Practical Explanation

CA

CA means that you want data in your system to be both consistent and available, which means giving up partition tolerance. An example of this in practice is locating your application alongside its master databases. Your application can then read from and write to these databases and enjoy both consistency and availability.

CP

CP means that you want consistent data at all costs. As such, if there is a network partition, you would rather wait and hence forgo availability. The only reason to choose CP over CA is that you want your application to be distributed across multiple locations.

The following are 3 examples of the CP corner in practice:

  1. All reads go to the master node, no matter where you are. If there is a network partition, a read will not be possible until you can talk to the master node again.

  2. Wait for the master node to replicate the latest data to a slave before performing a read. In this scenario, the read can hang while waiting for replication to catch up.

  3. Make the writes hang until data is copied to the relevant slaves. This means writes will wait in the event of a partition.

In all of the scenarios above, a network partition stalls the system. In addition, even when there is no partition, the system still pays the round-trip latency between locations (data centers).
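
The third scenario can be sketched roughly as follows. This is a toy under stated assumptions, not how any particular database does it: `Replica`, `synchronous_write` and `ReplicationTimeout` are hypothetical names, replicas live in the same process, and a timeout stands in for "hanging forever" so that the example terminates.

```python
import time


class ReplicationTimeout(Exception):
    """Raised when replicas do not acknowledge a write in time."""


class Replica:
    def __init__(self):
        self.data = {}
        self.reachable = True  # set to False to simulate a partition

    def apply(self, key, value):
        """Store the value and acknowledge, unless a partition cuts us off."""
        if not self.reachable:
            return False
        self.data[key] = value
        return True


def synchronous_write(primary, replicas, key, value, timeout=0.05, retry_delay=0.01):
    """Write to the primary, then wait until every replica acknowledges."""
    # Simplification: a real system would also have to decide what happens
    # to the primary's copy if replication never completes.
    primary[key] = value
    deadline = time.monotonic() + timeout
    pending = list(replicas)
    while pending:
        # Keep only the replicas that have not acknowledged yet.
        pending = [r for r in pending if not r.apply(key, value)]
        if pending and time.monotonic() >= deadline:
            raise ReplicationTimeout(f"{len(pending)} replica(s) unreachable")
        if pending:
            time.sleep(retry_delay)
```

With every replica reachable the call returns almost immediately. Once one replica has `reachable = False`, the caller waits for the full timeout and then gets an error, which is the loss of availability described above.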

AP

AP essentially means that you can tolerate eventual consistency, or that you do not require data to be up to date. As such, you can stand up read-only nodes everywhere your application runs and just read from them. If there is a network partition, data will not be up to date, but you will still be able to serve traffic, and serve it fast.
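
Here is a small Python sketch of that behavior, assuming a pull-based replication log. The `Primary` and `ReadReplica` classes and the `partitioned` flag are hypothetical names made up for this post. The replica keeps answering reads during a partition and catches up once the partition heals.

```python
class Primary:
    """Accepts writes and keeps an ordered log of them for replicas to pull."""

    def __init__(self):
        self.log = []  # list of (key, value) pairs, oldest first

    def write(self, key, value):
        self.log.append((key, value))


class ReadReplica:
    """Serves reads locally and pulls from the primary when it can."""

    def __init__(self, primary):
        self.primary = primary
        self.data = {}
        self.applied = 0          # number of log entries already applied
        self.partitioned = False  # True while the primary is unreachable

    def sync(self):
        """Apply any log entries not seen yet; a no-op during a partition."""
        if self.partitioned:
            return
        for key, value in self.primary.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.primary.log)

    def read(self, key):
        # Never blocks on the network: answers with whatever is stored locally.
        return self.data.get(key)


primary = Primary()
replica = ReadReplica(primary)
primary.write("price", 10)
replica.sync()

replica.partitioned = True
primary.write("price", 12)
replica.sync()
during_partition = replica.read("price")  # still the old value

replica.partitioned = False
replica.sync()
after_heal = replica.read("price")        # caught up with the primary
```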

Conclusion

The behavior of the distributed systems your application interacts with might lean toward one CAP corner (some might favor CA over CP, others CP over CA, you get the point). Regardless, it is possible for distributed systems to support all CAP modes of operation. Therefore, the best way to evaluate a system is to ask which operations fall into which CAP corner and whether the system can serve your application’s needs.
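
One way to run that evaluation is to write the question down as data. The sketch below is purely illustrative: the operation names describe an imaginary web shop, and `SUPPORTED` is a made-up description of a candidate data store, not of any real product.

```python
from enum import Enum


class Corner(Enum):
    CA = "CA"
    CP = "CP"
    AP = "AP"


# Hypothetical operations and the CAP corner each one needs.
REQUIRED = {
    "place_order": Corner.CP,        # must never act on stale stock counts
    "show_product_page": Corner.AP,  # slightly stale data is acceptable
    "update_price": Corner.CA,       # admin tool running next to the master
}

# Hypothetical modes of operation offered by a candidate data store.
SUPPORTED = {Corner.CP, Corner.AP}


def unmet_operations(required, supported):
    """Return the operations whose required corner the store cannot serve."""
    return sorted(op for op, corner in required.items() if corner not in supported)


gaps = unmet_operations(REQUIRED, SUPPORTED)
```

An empty `gaps` list would mean the candidate system can serve every operation in the mode it needs. Here it contains `update_price`, because this imaginary store offers no CA mode.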