from Hacker News

Readyset: A MySQL and Postgres wire-compatible caching layer

by lsferreira42 on 2/21/24, 10:09 AM with 69 comments

  • by BrentOzar on 2/21/24, 2:37 PM

    In the Microsoft SQL Server space, several of these vendors have come and gone. My clients have been burned badly by 'em, so a few quick lessons learned:

    Be aware that there are hundreds of open issues [0] and dozens of pull requests [1], some of which involve clients being unable to connect or the caching layer not supporting all components of the SQL language. Just because your database supports something doesn't mean your caching layer will.

    It gets really ugly when a new version of your database comes out with brand-new features and language enhancements, and the caching layer doesn't support them. It may take months, or in some cases years, before the caching layer reaches feature parity with the underlying database. If you want to use some of those language enhancements, your app may have to maintain two connection strings: one for the caching layer, and one for direct database queries that the caching layer doesn't support.

    Your support teams need to learn how to diagnose problems with the caching layer. For example, this issue [2] talks about the level of work involved with understanding why newly inserted data isn't showing up in selects.

    I hope they succeed and deliver the concept, because it's one of the holy grails of databases.

    [0]: https://github.com/readysettech/readyset/issues
    [1]: https://github.com/readysettech/readyset/pulls
    [2]: https://github.com/readysettech/readyset/issues/39
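
    A rough sketch of the two-connection-string workaround described above, in Python. The DSNs and the `UNSUPPORTED` patterns are hypothetical placeholders, not a list of what Readyset actually lacks; the point is only that the app has to decide, per query, which endpoint to use:

```python
import re

# Hypothetical endpoints: the caching layer and the database behind it.
CACHE_DSN = "postgresql://app@cache-host:5433/appdb"
DIRECT_DSN = "postgresql://app@db-host:5432/appdb"

# Hypothetical SQL features the caching layer is assumed not to support yet.
UNSUPPORTED = [
    re.compile(r"\bOVER\s*\(", re.IGNORECASE),  # window functions
    re.compile(r"\bLATERAL\b", re.IGNORECASE),  # lateral joins
]


def choose_dsn(sql: str) -> str:
    """Pick the connection string a query should be sent to."""
    # Only SELECTs are cache candidates; writes always go to the database.
    if not sql.lstrip().upper().startswith("SELECT"):
        return DIRECT_DSN
    # Anything using a feature the cache is assumed not to handle bypasses it.
    if any(pattern.search(sql) for pattern in UNSUPPORTED):
        return DIRECT_DSN
    return CACHE_DSN
```

    Matching on SQL text like this is brittle; tagging queries explicitly in application code is easier to reason about, but either way every unsupported feature means another code path to maintain.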

  • by timsuchanek on 2/21/24, 12:47 PM

    This is one of the deepest deep tech startups I've seen in a long time. I had the pleasure of meeting some of the folks at RustConf in Portland.

    Readyset is basically reimplementing a full database, at the absolute bleeding edge of db research, enabling global partial replication of any kind of data.

    A solution desperately needed, as databases grow.

    You can think of it as an intelligent LRU cache in front of your database. An important step towards fast globally distributed applications.

    I hope this project will get more publicity and adoption - it's very well deserved.
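
    To make the "intelligent LRU cache" analogy concrete, here is a minimal read-through LRU cache in Python. It only covers the plain-cache half of the idea: Readyset is described here as keeping cached results updated as data changes, whereas this sketch simply drops an entry on write via `invalidate`. `load` is a hypothetical stand-in for whatever runs the real query:

```python
from collections import OrderedDict
from typing import Callable, Hashable


class ReadThroughLRU:
    """A bounded read-through cache: misses are loaded from the backing store."""

    def __init__(self, capacity: int, load: Callable[[Hashable], object]):
        self.capacity = capacity
        self.load = load  # called on a miss, e.g. a function that runs the SQL query
        self.entries: OrderedDict = OrderedDict()

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        value = self.load(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return value

    def invalidate(self, key):
        """Drop a key after a write so the next read reloads fresh data."""
        self.entries.pop(key, None)
```

    The `OrderedDict` keeps entries in recency order, so a hit is a `move_to_end` and eviction pops from the front, both constant-time.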

  • by dpcx on 2/21/24, 2:13 PM

    Pretty sure this is the database that Jon Gjengset [0] was working on as part of his thesis project. He has shared several videos of his talks about the system. It's a really interesting concept.

    edit: Here's a video [1] where he talks about the concept.

    [0]: https://www.youtube.com/@jonhoo
    [1]: https://www.youtube.com/watch?v=GctxvSPIfr8

  • by thom on 2/21/24, 3:38 PM

    Someone knowledgeable might know: is this just incremental view updates? To what extent is the cache intelligent if parameters, where clauses, or aggregates change?

    I really love this space and have been impressed with Materialize, but even if you can make some intermediate state incremental, if your workload is largely dynamic you end up needing to jump the whole way to OLAP platforms. I’m hopeful that we’re closer and closer to having our cake and eating it here, and that the operational data warehouse is only round the corner.
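
    To illustrate what "incremental view updates" means in thom's question, a minimal Python sketch that keeps a hypothetical `SELECT region, COUNT(*), SUM(amount) ... GROUP BY region` result current by applying row deltas instead of rerunning the query. It is not Readyset's or Materialize's engine, just the basic idea:

```python
from collections import defaultdict


class IncrementalGroupSum:
    """Maintains a GROUP BY key with COUNT(*) and SUM(amount) from row deltas."""

    def __init__(self):
        self.count = defaultdict(int)
        self.total = defaultdict(int)

    def apply(self, key, amount, sign):
        # sign is +1 for an inserted row and -1 for a deleted row;
        # an UPDATE is modeled as deleting the old row and inserting the new one.
        self.count[key] += sign
        self.total[key] += sign * amount
        if self.count[key] == 0:
            # The group has no rows left, so it disappears from the result.
            del self.count[key]
            del self.total[key]

    def result(self):
        return {k: (self.count[k], self.total[k]) for k in sorted(self.count)}
```

    The limitation thom points at shows up here: the maintained state is specific to one query shape, so a new grouping key or aggregate needs a new view built from scratch.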

  • by alwaysrusty on 2/21/24, 3:50 PM

    From a tech perspective, this is really cool. From a use case perspective, could someone help me understand why a developer would adopt something like this over a database like ClickHouse, outside of some fintech use cases where milliseconds of latency really matter? I'd be worried about introducing an additional point of failure to the data stack. And, if this is like Materialize, I'd be worried about it not supporting ad hoc queries, only precomputed ones.

  • by larsnystrom on 2/21/24, 1:22 PM

    They have a pretty good write-up on why you'd want to use this here: https://blog.readyset.io/dont-use-kv-stores/

  • by 10000truths on 2/21/24, 1:58 PM

    This sounds like it has heavy overlap with incremental view maintenance (IVM). How does Readyset distinguish itself from existing solutions like pg_ivm or Materialize?

  • by exabrial on 2/21/24, 1:40 PM

    One of the things we love about using JPA (we use EclipseLink) is that it comes with caching, for free, and it's transparent. You can mark any field as a cache index and it automatically tries the cache first. Updates are published and loaded into every node's cache automatically, and you get fallback protection in the form of incremental version numbers on rows.

    The one thing it can’t handle however is range update queries or native queries that perform updates.

    You can just avoid them in your architecture… OR maybe this is the solution we've been looking for! Definitely going to give this a spin!

    Documentation looks very complete and I like there’s a UI to view the query cache.
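
    The "incremental version numbers on rows" are optimistic locking (what JPA's `@Version` field provides). A minimal sketch of the underlying SQL pattern, using Python's built-in `sqlite3` and a hypothetical `accounts` table; this is the general technique, not EclipseLink's actual implementation:

```python
import sqlite3


def update_balance(conn, account_id, new_balance, expected_version):
    """Optimistic update: succeeds only if nobody bumped the version since we read the row."""
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, account_id, expected_version),
    )
    conn.commit()
    # rowcount == 0 means the row changed underneath us (or doesn't exist): a stale read.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100, 1)")
conn.commit()
```

    If two nodes both read version 1, the first `update_balance(conn, 1, 150, 1)` succeeds and bumps the version to 2, so the second call with `expected_version=1` matches no rows and has to re-read before retrying.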

  • by zer00eyz on 2/21/24, 12:48 PM

    I don't understand what the use case is for this.

    If I have a front end, I would hope that the formatted response is what we're caching, be that HTML or JSON.

    If I can't read from that cache, then shouldn't I be reading fresh data altogether?

  • by joeatwork on 2/21/24, 2:12 PM

    This isn't an open source project (which isn't a bad thing! Just a non-obvious thing if you don't scroll down their whole README).

  • by ltbarcly3 on 2/21/24, 7:45 PM

    Trying to add transparent caching to a transactional database is just a bad idea and cannot work. Anyone who says it works for them is just in the period after putting it in place and before when they realize why it cannot work.

    If it was possible to just slap a cache in between you and the db and magically make shit fast, DB vendors would have done that 20 years ago. Billions of dollars a year is put into relational db development. Papers are published every week, from theoretical ways to model and interact with data to practical things like optimizing query execution plans.

    Unless Readyset can point to a patent or a paper that has fundamentally revolutionized how databases will be built from today forward, it is going to be crap and will burn you.

  • by KingOfCoders on 2/21/24, 3:05 PM

    This might be good tech and a good company.

    Once, at a startup, we used a distributed caching system that was open source. Then features we needed were cut from the open source version, so we bought a license. Then the vendor was bought by a large software company, and the license costs went up 10x year over year with one week's notice. Because our migration away from this tech wasn't finished (it was very complicated and tied into our application), we had to pay. Luckily we had also been bought, so the very large costs were not a problem. I would never again use something from a company for anything crucial to our operations.

  • by qaq on 2/21/24, 1:02 PM

    But the read side is already fairly trivial to scale with read replicas.
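
    For comparison, the read-replica setup usually comes down to a router like this minimal Python sketch (hostnames hypothetical): writes go to the primary and SELECTs rotate across replicas round-robin.

```python
import itertools
from typing import Sequence


class ReplicaRouter:
    """Send writes to the primary and spread reads across replicas round-robin."""

    def __init__(self, primary: str, replicas: Sequence[str]):
        self.primary = primary
        # With no replicas configured, every query falls back to the primary.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def target(self, sql: str) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary
```

    One difference from a result cache: each replica still runs every query in full, so replicas add read throughput without making an individual expensive query faster, and replication lag means a read right after a write may not see it.
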
  • by AYBABTME on 2/21/24, 5:14 PM

    Love the work. Looks quite similar to what PlanetScale Boost does[1]. Basically the same but as a front-end to someone's existing database? (disclaimer: I work at PS).

    [1]: https://planetscale.com/blog/how-planetscale-boost-serves-yo...

  • by giovannibonetti on 2/21/24, 1:23 PM

    I imagine a good use case for this at its current stage would be powering a monitoring dashboard that runs ad hoc queries against your operational DB. I've seen this situation at a fintech company I previously worked at, where we had some people staring at dashboards all day long looking for issues in any of the subsystems.

  • by mdaniel on 2/21/24, 5:25 PM

    I just wanted to give a high five for having Jepsen tests for this: https://github.com/readysettech/readyset/tree/stable-240117/...

  • by notapoolshark on 2/21/24, 3:40 PM

    Slight tangent, but this reminds me of discussions I've seen on the Postgres mailing lists about native support for real-time materialized views. Does anyone know if we can expect to see something like this in a future version of Postgres?

  • by redwood on 2/21/24, 12:45 PM

    Is anyone successfully using this? There are a few other services out there like PolyScale. It will be interesting to see if any of these introduce some form of write support over time.

  • by fastest963 on 2/21/24, 2:05 PM

    What are some advantages of ReadySet over read replicas from YugabyteDB or CockroachDB? One downside is that it appears to require a separate cloud subscription.

  • by potamic on 2/21/24, 2:06 PM

    > ReadySet is licensed under the BSL 1.1 license, converting to the open-source Apache 2.0 license after 4 years.

    What does this mean?