by andrenotgiant on 4/26/23, 5:10 PM with 143 comments
by mmaunder on 4/26/23, 7:06 PM
> In the 2000s, the conventional wisdom selected MySQL because rising tech stars like Google and Facebook were using it. Then in the 2010s, it was MongoDB because non-durable writes made it “webscale”. In the last five years, PostgreSQL has become the Internet’s darling DBMS. And for good reasons!
Different DBs, different strengths, and it's not a zero-sum game as implied. MySQL was popular before Google was born; we used it heavily at eToys in the 90s for massive transaction volume, and replacing it with Oracle was one of the reasons for the catastrophic failure of eToys circa 2001. MongoDB didn't gain traction as an alternative to MySQL or PostgreSQL. And PostgreSQL's market share today is on a par with Mongo's, and both are dwarfed by MySQL, which IMO is the true darling of web DBs given its global popularity.
by fdr on 4/26/23, 9:53 PM
On the one hand, this does make the model Postgres uses admirably simple: the WAL is all "REDO," and the heap is all you need to accomplish any kind of read. The expense is that the stuff that would normally be copied off to a sequential UNDO log, and then vaporized once the transaction commits and all possible readers have exited, instead remains commingled with everything else in the main database heap, needing to be fished out again by VACUUM for purging and for figuring out how to reclaim transaction ID space for more transactions.
There may be other solutions to this, but it's one unusual quality Postgres has relative to other MVCC databases, many of which sport an UNDO log.
There are downsides to UNDO, however: if a read needs an old copy of the tuple, it needs to fish around in UNDO, all the indices and synchronization need to account for this, and if there's a rollback or crash recovery event (i.e. mass-rollback of all transactions open at the time), everything has to be shuffled back into the main database storage. Hence the memorable initial comment: "Postgres is optimized for rollback."
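To make the "fished out by VACUUM" cost concrete: Postgres exposes per-table counts of those expired row versions sitting in the heap, so you can watch how much cleanup work has piled up. A minimal sketch of the kind of query one might run (nothing here is from the article itself):

    -- Dead tuples are old row versions still commingled in the heap,
    -- waiting for (auto)vacuum to purge them and reclaim the space.
    SELECT relname,
           n_live_tup,
           n_dead_tup,
           last_autovacuum
    FROM   pg_stat_user_tables
    ORDER  BY n_dead_tup DESC
    LIMIT  10;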
by cyberax on 4/26/23, 8:57 PM
I had a similar personal experience. In my previous job we used Postgres to implement a task queuing system, and it created a major bottleneck, resulting in tons of concurrency failures and bloat.
And most dangerously, the system failed catastrophically under load. As the load increased, most transactions ended in concurrency failures, so very little actual work got committed. This increased the number of outstanding tasks, resulting in an even higher rate of concurrency failures.
And this can happen suddenly: one moment the system behaves well, with tasks being processed at a good rate; the next moment the queue blows up and nothing works.
I re-implemented this system using pessimistic locking, and it turned out to work much better. Even under very high load, the system could still make forward progress.
The downside was having to make sure that no deadlocks could happen.
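For anyone curious what the pessimistic-locking version of such a queue tends to look like, the usual Postgres idiom is FOR UPDATE SKIP LOCKED, so each worker claims a row outright instead of retrying after optimistic failures. A minimal sketch, assuming a hypothetical tasks(id, status, payload) table:

    BEGIN;

    -- Claim one pending task. SKIP LOCKED skips rows another worker has
    -- already locked, so workers neither block nor conflict with each other.
    SELECT id, payload
    FROM   tasks
    WHERE  status = 'pending'
    ORDER  BY id
    LIMIT  1
    FOR UPDATE SKIP LOCKED;

    -- ... process the task in application code ...

    UPDATE tasks SET status = 'done' WHERE id = $1;  -- id returned above

    COMMIT;

Because the row lock is taken up front and held until COMMIT, contention shows up as workers skipping past busy rows rather than as aborted transactions, which is why throughput degrades gracefully under load.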
by elijaht on 4/26/23, 9:16 PM
Oh man, a previous company I worked at had an issue with a hot table (frequent reads + writes) interfering with autovacuum. Many fires over a six-month period arose from all of that. I was (luckily) only on an adjacent team, so I don't know the details, other than vacuums taking over 24 hours! I'm sure it could have been prevented, but it seemed horrible to debug.
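For hot tables like that, one common mitigation (not necessarily what would have saved that team) is to make autovacuum much more aggressive for just the affected table. A sketch with illustrative, not recommended, numbers and a placeholder table name:

    -- Trigger autovacuum after ~1% of rows are dead instead of the
    -- default ~20%, and drop the cost-based sleep so it finishes sooner.
    ALTER TABLE hot_table SET (
        autovacuum_vacuum_scale_factor = 0.01,
        autovacuum_vacuum_cost_delay   = 0
    );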
by dale_glass on 4/26/23, 8:25 PM
Way back I was working on an in-house inventory app written in Visual Basic against SQL Server 2000, I think. That one just put locks on tables. It had the "charming" characteristic that if you weren't very, very careful with Enterprise Manager, loading a table in the GUI put a lock on it and just kept holding it until that window was closed.
Then the running app would eventually snag on that lock, maybe keep holding some other lock that something else would snag on, and 5 minutes later I'd hear one of the operators screaming "Nothing is working! I can't take any orders!" from the room next to me.
by garyclarke27 on 4/26/23, 6:41 PM
by spprashant on 4/26/23, 6:52 PM
Yeah, table bloat and transaction ID wraparound are terrible, but easily avoidable if you follow a few simple guidelines. In my experience, the best way to avoid these issues is to set sensible vacuum settings and track long-running queries.
I do hate that some of the defaults in the Postgres configuration are too conservative for most workloads.
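As an illustration of the "track long-running queries" part: a transaction that stays open holds back the horizon that vacuum can clean up to, so a periodic check along these lines is useful (the 5-minute threshold is arbitrary):

    -- Sessions whose transaction has been open for more than 5 minutes;
    -- these keep VACUUM from removing dead tuples behind their snapshot.
    SELECT pid,
           usename,
           state,
           now() - xact_start AS xact_age,
           query
    FROM   pg_stat_activity
    WHERE  xact_start IS NOT NULL
      AND  now() - xact_start > interval '5 minutes'
    ORDER  BY xact_age DESC;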
by Max-Ganz-II on 4/26/23, 6:55 PM
(pdf) https://www.redshiftresearchproject.org/white_papers/downloa...
(html) https://www.redshiftresearchproject.org/white_papers/downloa...
I've been told, very kindly, by a couple of people that it's the best explanation they've ever seen. I'd like to get more eyes on it, to pick up any mistakes, and it might be useful in and of itself anyway to readers, since MVCC on Redshift is, I believe, the same as MVCC was on Postgres before snapshot isolation.
by audioheavy on 4/26/23, 8:41 PM
by zinclozenge on 4/26/23, 9:23 PM
by jasonhansel on 4/27/23, 3:23 AM
Snapshot isolation isn't as robust and straightforward as strict serializability, but it also isn't as performant as READ COMMITTED. It seems like the worst of both worlds.
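For context on how those levels map onto Postgres specifically (my summary, not the commenter's): REPEATABLE READ is where Postgres provides snapshot isolation, and SERIALIZABLE adds conflict detection (SSI) on top of the same snapshots:

    -- READ COMMITTED  : new snapshot per statement (the default)
    -- REPEATABLE READ : snapshot isolation, one snapshot per transaction
    -- SERIALIZABLE    : snapshot isolation plus SSI conflict detection
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    -- ... all queries here see one consistent snapshot ...
    COMMIT;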
by runlaszlorun on 4/27/23, 6:08 AM
by avinassh on 4/27/23, 5:05 AM
1. Since MySQL keeps deltas to save storage costs, wouldn't reads and writes be slower, because now I have to build the full version from the delta?
2. On secondary indexes, they highlight that reads will be slower and also say:
> Now this may make secondary index reads slower since the DBMS has to resolve a logical identifier, but these DBMS have other advantages in their MVCC implementation to reduce overhead.
What are the other advantages they have to make reads faster?
Compared to MySQL, I remember reading that Postgres MVCC lets you alter the table without locking. Now I found out that MySQL also does not require locks. So, how are they doing it?
Are there any similar posts which explain MySQL MVCC architecture?
by h1fra on 4/26/23, 8:20 PM
On managed Postgres (e.g. GCP, AWS) you pay for the disk, but when you can't run a VACUUM FULL because it locks the table, you end up with a lot of allocated storage for nothing, and you can't shrink the disk size (at least on GCP). Storage is cheap, but it still feels like a waste.
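A rough way to see where that paid-for space is going: plain VACUUM only makes dead space reusable inside a table's files, while the table-locking VACUUM FULL (or a tool like pg_repack) is what actually shrinks them on disk. A sketch for listing the biggest on-disk tables:

    -- Largest tables by total on-disk footprint, including indexes and TOAST.
    SELECT relname,
           pg_size_pretty(pg_total_relation_size(oid)) AS total_size
    FROM   pg_class
    WHERE  relkind = 'r'
    ORDER  BY pg_total_relation_size(oid) DESC
    LIMIT  10;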
by Ym9oYW4 on 4/26/23, 5:28 PM
by no_wizard on 4/26/23, 6:21 PM
by vp8989 on 4/26/23, 7:58 PM
https://aws.amazon.com/blogs/database/manage-long-running-re...
by umairshahid on 5/1/23, 1:07 PM
by zeroimpl on 4/27/23, 2:03 AM
This doesn’t have anything to do with MVCC. I’m sure PostgreSQL could implement an index format that piggybacks on another index rather than pointing at the physical page directly, without overhauling MVCC.
by Temporary_31337 on 4/27/23, 12:54 PM
by kerblang on 4/26/23, 10:47 PM
by jonsnow_dba on 4/27/23, 3:34 PM