by bdarnell on 6/27/17, 3:09 PM with 44 comments
by wwilson on 6/27/17, 5:53 PM
I'll just come out and say it: the 'A' in CAP is boring. It does not mean what you think it means. Lynch et al. probably chose the definition because it's one for which the 'theorem' is both true and easy to prove. This is not the impossibility result with which designers of distributed systems should be most concerned.
My heuristic these days is that worrying about the CAP theorem is a weak negative signal. (EDIT: This is not a statement about CockroachDB's post, which doubtless is designed to reassure customers who are misinformed on the topic. I'm familiar with that situation, and it makes me feel a deep sympathy for them.)
(Disclosure: I work on a CockroachDB competitor. Also none of this is Google's official position, etc., etc. For that, here's the whitepaper by Eric Brewer that we released along with the Cloud Spanner beta launch https://static.googleusercontent.com/media/research.google.c...).
by thraxil on 6/27/17, 5:06 PM
One of the best papers I've come across in the last few years.
by Dave_Rosenthal on 6/27/17, 4:28 PM
I think the overloaded term "availability" has been a big source of confusion for many trying to understand the implications of the CAP theorem at a simple level.
For example, a simple Paxos implementation is "highly available" (it continues working even when individual machines fail) but sacrifices "availability" in the CAP sense.
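To make the distinction concrete, here is a minimal sketch, assuming a majority-quorum protocol in the style of Paxos. The function name `serving_sides` and the node names are hypothetical, and the model ignores everything except which side of a partition still holds a majority.

```python
from typing import List, Set


def serving_sides(partition: List[Set[str]]) -> List[bool]:
    """For each side of a network partition, report whether that side
    still holds a strict majority of all nodes and can keep serving."""
    total = sum(len(side) for side in partition)
    majority = total // 2 + 1
    return [len(side) >= majority for side in partition]


# One machine cut off from a five-node cluster: the other four keep going,
# so the system is "highly available" in the everyday sense.
print(serving_sides([{"a", "b", "c", "d"}, {"e"}]))

# Node "e" is still up but cannot answer requests, which is exactly what
# CAP availability forbids: every non-failed node must respond.
print(serving_sides([{"a", "b", "c"}, {"d", "e"}]))
```

In both cases the minority side reports `False`: the cluster as a whole stays up, but not every live node can respond.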
by itcmcgrath on 6/27/17, 4:22 PM
* I've reviewed ~400 databases over the last month, and it's surprising (?) how many of them claim to be the best for every use case and to be the [fastest|first|only|best]
by ainar-g on 6/27/17, 4:22 PM
by YZF on 6/27/17, 5:13 PM
The more interesting trade-off is using consensus algorithms for availability and durability. You can keep going as long as you have a quorum of nodes, but you pay at least one extra RTT. Having multiple replicas (in either consistent or eventually consistent systems) makes writes and storage linearly more expensive (typically, unless you use some sort of erasure coding).
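A rough sketch of that trade-off, under simplifying assumptions: a leader-based protocol, a single round trip per commit, and full replication with no erasure coding. All function names here are illustrative, not taken from any particular system.

```python
from typing import List


def quorum_size(n: int) -> int:
    """Smallest strict majority of n replicas."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many replicas can fail while a quorum remains."""
    return n - quorum_size(n)


def commit_latency_ms(follower_rtts_ms: List[float]) -> float:
    """Extra latency a leader pays to commit a write: it waits for the
    fastest followers until, counting itself, a quorum has acknowledged."""
    n = len(follower_rtts_ms) + 1  # followers plus the leader
    acks_needed = quorum_size(n) - 1  # the leader counts toward the quorum
    if acks_needed == 0:
        return 0.0
    return sorted(follower_rtts_ms)[acks_needed - 1]


def stored_bytes(n: int, bytes_written: int) -> int:
    """Storage cost grows linearly with the replica count."""
    return n * bytes_written


print(quorum_size(5), tolerated_failures(5))
print(commit_latency_ms([2.0, 40.0, 80.0, 90.0]))
print(stored_bytes(3, 100))
```

With five replicas the leader needs two follower acknowledgements, so the commit waits on the second-fastest follower rather than the slowest one. Real protocols may need additional round trips (for example during leader election), which is why the cost is "at least" one RTT.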
by falcolas on 6/27/17, 5:03 PM
So, what happens to readers who are partitioned away from the node which holds that data? Can they not read the data for that lease duration? If they can't, then yeah, CP is a good description.
...
So the design doc seems to hold this up - reads must go to the lease holder, until the lease expires. Nice.
EDIT: Design doc link:
https://github.com/cockroachdb/cockroach/blob/master/docs/de...
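A toy model of the behavior described above, not CockroachDB's actual implementation: the `Lease` and `Range` classes, the shared clock, and the `can_reach_holder` flag are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    holder: str        # node currently allowed to serve reads
    expires_at: float  # time on a hypothetical shared clock, in seconds


class Range:
    """A piece of data whose reads must go through the lease holder."""

    def __init__(self, lease: Lease, value: str) -> None:
        self.lease = lease
        self.value = value

    def read(self, node: str, now: float, can_reach_holder: bool) -> str:
        if now >= self.lease.expires_at:
            # Nobody may serve reads until a new lease is acquired.
            raise RuntimeError("lease expired: a new lease is needed")
        if node != self.lease.holder and not can_reach_holder:
            # A reader partitioned from the lease holder gets no answer,
            # which is the "C over A" choice.
            raise ConnectionError("partitioned from lease holder")
        return self.value


r = Range(Lease(holder="n1", expires_at=10.0), value="v")
print(r.read("n1", now=5.0, can_reach_holder=False))  # holder reads locally
print(r.read("n2", now=5.0, can_reach_holder=True))   # routed to the holder
```

A read from `n2` with `can_reach_holder=False` raises `ConnectionError` in this model, matching the observation that partitioned readers cannot read until the lease expires and is reassigned.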
by zimbatm on 6/27/17, 6:17 PM
* CP is a database
* AP is a cache
Anyone else pretending AP is a database is lying (unless it's a content-addressable store) :p
by marknadal on 6/27/17, 7:50 PM
"The only time that a CAP-Available system would be available when a CAP-Consistent one would not is when one of the datacenters can’t talk to the other replicas, but can talk to clients, and the load balancer keeps sending it traffic. By considering the deployment as a whole, high availability can be achieved without the CAP theorem’s requirement of responses from a single partitioned node."
It is true that, if you assume your client app is not important, a CP system is the right choice. And I would also say this /was/ true up till about 2004 when Gmail was released. But it definitely stopped being true in 2007 when the iPhone was released and you started having installed apps.
Since then, users have slowly grown to expect both mobile apps and SPAs to work regardless of whether the servers work, regardless of load balancers, regardless of connectivity.
If you look at the market trends, things are increasingly going in this direction. From self-driving cars, to IoT devices, to drone delivery, to even traditionally server-dependent productivity tools like gDocs and others - people need to get work done even if there is no connection to the server.
Will banking applications still need mostly server-dependent behavior? Yes. Is CP still important? Yes. But it is biased to say that CP systems are better. Choose the right tool for the right job. CockroachDB and RethinkDB are definitely the right choice for a strongly consistent database, but they aren't the right choice for everything. My database is an AP system, but it should not be used for many apps out there. Neither of these is "better"; they are just tradeoffs you have to decide upon.