from Hacker News

The part of Postgres we hate the most: Multi-version concurrency control

by andrenotgiant on 4/26/23, 5:10 PM with 143 comments

  • by mmaunder on 4/26/23, 7:06 PM

    I must admit as a web practitioner since 1994 I have a bit of an issue with this:

    > In the 2000s, the conventional wisdom selected MySQL because rising tech stars like Google and Facebook were using it. Then in the 2010s, it was MongoDB because non-durable writes made it “webscale”. In the last five years, PostgreSQL has become the Internet’s darling DBMS. And for good reasons!

    Different DBs, different strengths, and it's not a zero-sum game as implied. MySQL was popular before Google was born; we used it heavily at eToys in the 90s for massive transaction volume, and replacing it with Oracle was one of the reasons for the catastrophic failure of eToys circa 2001. MongoDB gained traction not as an alternative to MySQL or PostgreSQL. And PostgreSQL's market share today is on a par with Mongo's, and both are dwarfed by MySQL, which IMO is the true darling of web DBs given its global popularity.

  • by fdr on 4/26/23, 9:53 PM

    One of the weird things about Postgres MVCC is that it is "optimized for rollback," as one person memorably quipped to me. This is not to imply a design principle; it's more a description of how things ended up, and the general argument behind the quip is that Postgres lacks "UNDO" segments.

    On the one hand, this does make the model Postgres uses admirably simple: the WAL is all "REDO," and the heap is all you need to accomplish any kind of read. The expense is that data which would normally be copied off to a sequential UNDO log, and then vaporized once the transaction commits and all possible readers have exited, instead remains commingled with everything else in the main database heap, needing to be fished out again by VACUUM for purging and for figuring out how to reclaim numerical space for more transactions.
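
    Concretely, with a toy table (the name accounts is made up, just to illustrate the point):

        -- In Postgres, an UPDATE writes a whole new row version into the heap;
        -- the old version stays where it is, invisible to new snapshots, until
        -- VACUUM eventually fishes it out and reclaims the space.
        UPDATE accounts SET balance = balance - 10 WHERE id = 1;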

    There may be other solutions to this, but it's one unusual quality Postgres has relative to other MVCC databases, many of which sport an UNDO log.

    There are downsides to UNDO, however: if a read needs an old copy of the tuple, it needs to fish around in UNDO, all the indices and synchronization need to account for this, and if there's a rollback or crash recovery event (i.e. mass-rollback of all transactions open at the time), everything has to be shuffled back into the main database storage. Hence the memorable initial comment: "Postgres is optimized for rollback."

  • by cyberax on 4/26/23, 8:57 PM

    Yup. A lot of heavy users of Postgres eventually hit the same barrier. Here's another take from Uber: https://www.uber.com/blog/postgres-to-mysql-migration/

    I had a similar personal experience. In my previous job we used Postgres to implement a task queuing system, and it created a major bottleneck, resulting in tons of concurrency failures and bloat.

    And most dangerously, the system failed catastrophically under load. As the load increased, most transactions ended in concurrency failures, so very little actual work got committed. This increased the number of outstanding tasks, resulting in an even higher rate of concurrency failures.

    And this can happen suddenly: one moment the system behaves well, with tasks being processed at a good rate, and the next moment the queue blows up and nothing works.

    I re-implemented this system using pessimistic locking, and it turned out to work much better. Even under very high load, the system could still make forward progress.

    The downside was having to make sure that no deadlocks can happen.
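
    For what it's worth, a pessimistic queue in Postgres usually looks something like this (table and column names invented for illustration); SELECT ... FOR UPDATE SKIP LOCKED lets each worker claim a row without colliding with the others:

        BEGIN;

        -- Claim one pending task, skipping rows other workers have already locked.
        SELECT id, payload
        FROM tasks
        WHERE status = 'pending'
        ORDER BY created_at
        LIMIT 1
        FOR UPDATE SKIP LOCKED;

        -- ... do the work, then mark the claimed task as done ...
        UPDATE tasks SET status = 'done' WHERE id = $1;  -- id returned by the SELECT above

        COMMIT;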

  • by elijaht on 4/26/23, 9:16 PM

    > Another problem with the autovacuum in PostgreSQL is that it may get blocked by long-running transactions, which can result in the accumulation of more dead tuples and stale statistics. Failing to clean expired versions in a timely manner leads to numerous performance problems, causing more long-running transactions that block the autovacuum process. It becomes a vicious cycle, requiring humans to intervene manually by killing long-running transactions.

    Oh man, a previous company I worked at had an issue with a hot table (frequent reads + writes) interfering with autovacuum. Many fires over a six-month period arose from all of that. I was (luckily) only on an adjacent team, so I don't know the details, other than vacuums taking over 24 hours! I'm sure it could have been prevented, but it seemed horrible to debug.
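
    For anyone hitting the same thing: the usual first step is to look for old open transactions holding back cleanup. Something like this against pg_stat_activity works (the one-hour threshold is arbitrary):

        -- Sessions whose transactions have been open a long time and may be
        -- preventing autovacuum from removing dead tuples.
        SELECT pid, state, xact_start, now() - xact_start AS xact_age, query
        FROM pg_stat_activity
        WHERE xact_start IS NOT NULL
          AND now() - xact_start > interval '1 hour'
        ORDER BY xact_start;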

  • by dale_glass on 4/26/23, 8:25 PM

    That's interesting, MVCC was the thing that drew me to Postgres to begin with!

    Way back I was working on an in-house inventory app written in Visual Basic against SQL Server 2000, I think. That one just put locks on tables. It had the "charming" characteristic that, if you weren't very, very careful with Enterprise Manager, loading a table in the GUI put a lock on it and just kept on holding it until that window was closed.

    Then the running app would eventually snag on that lock, maybe keep holding some other lock that something else would snag on, and 5 minutes later I'd hear one of the operators screaming "Nothing is working! I can't take any orders!" from the room next to me.

  • by garyclarke27 on 4/26/23, 6:41 PM

    Clever clickbait. Of course, at the end of the article they offer a solution to the problem they have overhyped: their product (and of course it's AI-enhanced).
  • by spprashant on 4/26/23, 6:52 PM

    This post has a valid point. But the last line makes it clear why they care so much about it.

    Yeah, table bloat and transaction ID wraparounds are terrible, but easily avoidable if you follow a few simple guidelines. In my experience, the best way to avoid these issues is to set sensible vacuum settings and track long-running queries.

    I do hate that some of the defaults in the Postgres configuration are too conservative for most workloads.
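
    For example, on a hot table you can override the conservative global defaults so autovacuum kicks in much earlier; the table name and exact numbers below are only illustrative:

        -- Vacuum after ~1% of rows are dead instead of the default 20%,
        -- and analyze more often as well.
        ALTER TABLE orders SET (
            autovacuum_vacuum_scale_factor = 0.01,
            autovacuum_analyze_scale_factor = 0.02
        );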

  • by Max-Ganz-II on 4/26/23, 6:55 PM

    MVCC for Amazon Redshift:

    (pdf) https://www.redshiftresearchproject.org/white_papers/downloa...

    (html) https://www.redshiftresearchproject.org/white_papers/downloa...

    I've been told, very kindly, by a couple of people that it's the best explanation they've ever seen. I'd like to get more eyes on it, to pick up any mistakes, and it might be useful in and of itself to readers anyway, as MVCC on Redshift is, I believe, the same as MVCC was on Postgres before snapshot isolation.

  • by audioheavy on 4/26/23, 8:41 PM

    My main takeaway from this article: as popular as Postgres and MySQL are, and understanding the legacy systems built on them, it will always require deep expertise and "black magic" to achieve enough performance and scale for hyperscale use cases. It justifies the (current) trend toward DBs built for distributed tx/writes/reads that you don't have to become a surgeon to scale. There are other DBs and DBaaS that, although not OSS, have solved this problem in a more cost-efficient way than having a team of surgeons.
  • by zinclozenge on 4/26/23, 9:23 PM

    I guess the question is, which MVCC strategy would be the "right" one to pick for a modern relational database? The paper linked focuses on main-memory databases, and being main-memory allows you to do things you can't do when disk-based.
  • by jasonhansel on 4/27/23, 3:23 AM

    IMHO part of the issue is that Postgres was built on the assumption that snapshot isolation would be widely used. I don't think this has proven to be the case.

    Snapshot isolation isn't as robust and straightforward as strict serializability, but it also isn't as performant as READ COMMITTED. It seems like the worst of both worlds.
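
    For reference, you opt into these per transaction in Postgres; READ COMMITTED is the default, and REPEATABLE READ is its snapshot isolation level:

        -- Postgres defaults to READ COMMITTED; REPEATABLE READ is its
        -- snapshot isolation level, and SERIALIZABLE adds SSI on top.
        BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
        -- ... reads and writes that must be strictly serializable ...
        COMMIT;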

  • by runlaszlorun on 4/27/23, 6:08 AM

    As an aside, Andy Pavlo (one of the authors here) has his CMU database course videos up on YouTube and they are tremendous. I’ve spent 2 decades developing web applications, but I am not exaggerating when I say that I’m 10x more knowledgeable on databases having watched his courses during Covid.
  • by avinassh on 4/27/23, 5:05 AM

    This was a fun read. But now I have a couple of questions:

    1. Since MySQL keeps deltas to save storage costs, wouldn't reads and writes be slower, because now it has to build the full version from the delta?

    2. On secondary indexes, they highlight the reads will be slower and also say:

    > Now this may make secondary index reads slower since the DBMS has to resolve a logical identifier, but these DBMS have other advantages in their MVCC implementation to reduce overhead.

    What are the other advantages they have to make reads faster?

    Compared to MySQL, I remember reading that Postgres MVCC lets you alter the table without locking. Now I found out that MySQL also does not require locks. So, how are they doing it?

    Are there any similar posts which explain MySQL MVCC architecture?

  • by h1fra on 4/26/23, 8:20 PM

    Once you've figured out all the points in this article, it's a matter of fine-tuning; it can take some time, but eventually it works. The only thing I still struggle with is table bloat.

    On managed Postgres (e.g., GCP, AWS) you pay for the disk, but when you can't run a VACUUM FULL because it locks the table, you end up with a lot of allocated storage for nothing, and you can't shrink the disk size (at least on GCP). Storage is cheap, but it still feels like a waste.
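
    A quick way to see how bad it is per table (pg_stat_user_tables is built in; dead-tuple counts are only a rough proxy for bloat, not a full size estimate):

        -- Tables with the most dead tuples, plus when autovacuum last touched them.
        SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
        FROM pg_stat_user_tables
        ORDER BY n_dead_tup DESC
        LIMIT 10;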

  • by Ym9oYW4 on 4/26/23, 5:28 PM

    So why did Postgres choose the worst MVCC design compared to MySQL and Oracle? Is this because of legacy reasons or other factors?
  • by no_wizard on 4/26/23, 6:21 PM

    Can the MVCC implementation be swapped via Postgres extensions?
  • by vp8989 on 4/26/23, 7:58 PM

    Am I correct in thinking that PG's MVCC implementation results in a worse story around offloading some mild OLAP workloads to a replica without affecting the primary? Anecdotally, it seems that MySQL handles this better but I don't understand the internals of both enough to explain why that is.

    https://aws.amazon.com/blogs/database/manage-long-running-re...

  • by umairshahid on 5/1/23, 1:07 PM

    Here is a rebuttal of many of the points raised by Uber against PostgreSQL: https://www.2ndquadrant.com/en/blog/thoughts-on-ubers-list-o...
  • by zeroimpl on 4/27/23, 2:03 AM

    > Oracle and MySQL do not have this problem in their MVCC implementation because their secondary indexes do not store the physical addresses of new versions. Instead, they store a logical identifier (e.g., tuple id, primary key) that the DBMS then uses to look up the current version’s physical address.

    This doesn’t have anything to do with MVCC. I’m sure PostgreSQL could implement an index format that piggybacks on another index rather than pointing at the physical page directly, without overhauling MVCC.

  • by Temporary_31337 on 4/27/23, 12:54 PM

    I work for a DB vendor, and ACID compliance (also implemented with MVCC) is a big selling point. Yet most use cases I later see don’t require such rigid controls on updates. This means customers are paying for this, as transactionally consistent updates are more expensive than eventually consistent ones.
  • by kerblang on 4/26/23, 10:47 PM

    Question: Why would I need more than one extra version of the same row? I would think that with transactional locking everybody else is waiting on the first update to commit before getting their own changes in, unless the db is somehow trying to lock columns-per-row instead of entire rows.
  • by jonsnow_dba on 4/27/23, 3:34 PM

    We have MySQL/MariaDB in RDS, and ever since we migrated MariaDB to 10.6.12 we get at least 1 table corruption per day. The only workaround available is to restart the database, just like Windows 95.