by elithrar on 5/11/22, 12:59 PM with 228 comments
by slashdev on 5/11/22, 3:26 PM
I'm guessing this is a single master database with multiple read replicas. That means it's not consistent anymore (the C in ACID). Obviously, reads after a write will see stale data until the write propagates.
I'm a bit curious how that replication works. Ship the whole db? Binary diffs of the master? Ship the SQL statements that did the write and reapply them? Lots of performance and other tradeoffs here.
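To make the "ship the SQL statements and reapply them" option concrete, here is a minimal sketch in Python with the standard sqlite3 module. It is only an illustration of statement-based replication and of the stale read it allows; the `Primary` and `Replica` classes are hypothetical names, and nothing here is claimed to be how D1 actually replicates.

```python
import sqlite3


class Primary:
    """Single writer that records every write statement in an ordered log."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.log = []  # (sql, params) tuples in commit order

    def write(self, sql, params=()):
        with self.db:  # commits on success, rolls back on error
            self.db.execute(sql, params)
        self.log.append((sql, params))


class Replica:
    """Read-only copy that replays the primary's log when asked to sync."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.applied = 0  # number of log entries already replayed

    def sync(self, primary):
        for sql, params in primary.log[self.applied:]:
            self.db.execute(sql, params)
        self.db.commit()
        self.applied = len(primary.log)

    def read(self, sql, params=()):
        return self.db.execute(sql, params).fetchall()


primary, replica = Primary(), Replica()
primary.write("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
primary.write("INSERT INTO kv VALUES (?, ?)", ("greeting", "hello"))
replica.sync(primary)

# A write lands on the primary, but the replica has not synced yet.
primary.write("UPDATE kv SET v = ? WHERE k = ?", ("goodbye", "greeting"))
stale = replica.read("SELECT v FROM kv WHERE k = 'greeting'")

replica.sync(primary)
fresh = replica.read("SELECT v FROM kv WHERE k = 'greeting'")
print(stale, fresh)
```

One tradeoff of this approach: statements that are not deterministic (for example ones calling `random()` or `datetime('now')`) can produce different rows on each replica, which is one reason to ship page or WAL-level changes instead.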
What's the latency like? This likely doesn't run in every edge location. Does the database ship out on the first request and get cached with an expiry? Does the request itself move to the database instead of running at the edge - like maybe this runs on a select subset of locations?
So many questions, but no details yet.
by kurinikku on 5/11/22, 1:19 PM
https://tailscale.com/blog/database-for-2022
by jgrahamc on 5/11/22, 1:49 PM
by losvedir on 5/11/22, 2:18 PM
* How exactly is the read replication implemented? Is it using litestream behind the scenes to stream the WAL somewhere? How do the readers keep up? Last I saw you just had to poll it, but that could be computationally expensive depending on the size of the data (since I thought you had to download the whole DB), and could potentially introduce a bit of latency in propagation. Any idea what the metrics are for latency in propagation?
* How are writes handled? Does it do the Fly thing about sending all requests to one worker?
I don't quite know what a "worker" is but I'm assuming it's kind of like a Lambda? If you have it replicated around the world, is that one worker all running the same code, and Cloudflare somehow manages the SQL replicating and write forwarding? Or would those all be separate workers?
by hn_ei_ser_23 on 5/11/22, 2:09 PM
But it is really hard getting some useful information from this article. I can't even tell if it is not there or just buried in all this marketing hot air.
So, what is it really? Is there one Write-Master that is asynchronously replicated to all other locations? Will writes be forwarded to this master and then replicated back?
I'm very curious about how it performs in real life. Especially considering the locking behavior (SQLite always has the isolation level 'serializable', iirc). The more you put in a transaction or the longer you have to wait for another process to finish its writes, the more likely you are to have to deal with stale data.
But overall I'm very excited. Also by the fly.io announcement, of course. Lots of innovation and competition. Good times for customers.
by infogulch on 5/11/22, 2:28 PM
One thing I've noticed that many commenters miss about read-replicated SQLite is assuming that the only valid model is having one, giant, centralized database with all the data. Let's be honest with ourselves, the vast majority of applications hold personal or B2B data and don't need centralized transactions, and at scale will use multi-tenant primary keys or manual sharding anyways. For private data, a single SQLite database per user / business will easily satisfy the write load of all but the most gigantic corporations. With this model you have unbounded compute scaling for new users because they very likely don't need online transactions across multiple databases at once.
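A rough sketch of the database-per-tenant idea, using plain SQLite files and Python's sqlite3 module. The directory, the `notes` table, and the helper names are all hypothetical; whether D1 can host thousands of databases like this is exactly the open question.

```python
import os
import sqlite3
import tempfile

DATA_DIR = tempfile.mkdtemp()  # hypothetical storage location for the example


def tenant_db(tenant_id: str) -> sqlite3.Connection:
    """Open (and create on first use) the database file owned by one tenant."""
    if not tenant_id.isalnum():
        # Keep tenant ids from escaping DATA_DIR via path tricks.
        raise ValueError("tenant id must be alphanumeric")
    conn = sqlite3.connect(os.path.join(DATA_DIR, f"{tenant_id}.db"))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notes (id INTEGER PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def add_note(tenant_id: str, body: str) -> None:
    conn = tenant_db(tenant_id)
    try:
        with conn:  # one small transaction, touching only this tenant's file
            conn.execute("INSERT INTO notes (body) VALUES (?)", (body,))
    finally:
        conn.close()


def list_notes(tenant_id: str) -> list:
    conn = tenant_db(tenant_id)
    try:
        return [row[0] for row in conn.execute("SELECT body FROM notes ORDER BY id")]
    finally:
        conn.close()


add_note("alice", "buy milk")
add_note("bob", "ship release")
add_note("alice", "call mom")
print(list_notes("alice"), list_notes("bob"))
```

Each tenant gets its own write lock and its own file, so one busy tenant doesn't serialize writes for everyone else. The cost is that queries spanning tenants have to be done in application code.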
Some questions:
Will D1 be able to deliver this design of having many thousands of separate databases for a single application? Will this be problematic from a cost perspective?
> since we're building on the redundant storage of Durable Objects, your database can physically move locations as needed
Will D1 be able to easily migrate the "primary" at will? CockroachDB described this as "follow the sun" primary.
by fzaninotto on 5/11/22, 2:26 PM
The demo is also a bit buggy: orders are duplicated as many times as there are products, but clicking on the various lines of the same order leads to the same record, where the user can only see the first product...
I also think the demo would have more impact if it wasn't read-only (although I understand that this could lead to broken pages if visitors mess with the data).
Anyway, kudos to the CloudFlare team!
by ranguna on 5/11/22, 1:55 PM
I see cloudflare people are on this post, any chance to compare D1 vs postgres in terms of DB features?
Insert ... Returning
Stored procedures and triggers
Etc etc
Would be really helpful to get a comparison like cockroachDB did here https://www.cockroachlabs.com/docs/stable/postgresql-compati...
Or even better, a general sql compatibility matrix like this https://www.cockroachlabs.com/docs/stable/sql-feature-suppor...
Kudos to the cloudflare team!
by the_duke on 5/11/22, 2:14 PM
sqlite is a great embedded database and, thanks to its use by browsers and on mobile, the most used database in the world by orders of magnitude.
But it also comes with lots of limitations.
* there is no type safety, unless you run with the new strict mode, which comes with some significant drawbacks (eg limited to the handful of primitive types)
* very narrow set of column types and overall functionality in general
* the big one for me: limited migration support, requiring quite a lot of ceremony for common tasks (eg rewriting a whole table and swapping it out)
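To illustrate the "rewrite a whole table and swap it out" ceremony from the last point, here is a small sketch using Python's sqlite3 module. The `users` table and its columns are invented for the example; the goal is to change a column's type and add a NOT NULL constraint, neither of which SQLite's ALTER TABLE can do in place.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode so that
# BEGIN/COMMIT/ROLLBACK below are fully under our control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, age TEXT)")
conn.executemany(
    "INSERT INTO users (email, age) VALUES (?, ?)",
    [("a@example.com", "31"), ("b@example.com", "27")],
)

# Migration: build the new shape, copy the rows, drop the old table, rename.
conn.execute("BEGIN")
try:
    conn.execute(
        "CREATE TABLE users_new ("
        " id INTEGER PRIMARY KEY,"
        " email TEXT NOT NULL,"
        " age INTEGER)"
    )
    conn.execute(
        "INSERT INTO users_new (id, email, age) "
        "SELECT id, email, CAST(age AS INTEGER) FROM users"
    )
    conn.execute("DROP TABLE users")
    conn.execute("ALTER TABLE users_new RENAME TO users")
    conn.execute("COMMIT")
except Exception:
    conn.execute("ROLLBACK")
    raise

rows = conn.execute(
    "SELECT id, email, age, typeof(age) FROM users ORDER BY id"
).fetchall()
print(rows)
```

A real migration has more steps than this sketch: indexes, triggers, and views that referenced the old table have to be recreated, and foreign key enforcement usually has to be switched off for the duration.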
These approaches (like fly.io's) with read replication also (apparently?) seem to throw away read after write consistency. Which might be fine for certain use cases and even desirable for resilience, but can impact application design quite a lot.
With sqlite you have to do a lot more in your own code because the database gives you fewer tools. Which is usually fine because most usage is "single writer, single or a few local readers". Moving that to a distributed setting with multiple deployed versions of code is not without difficulty.
This seems to be mitigated/solved here though by the ability to run worker code "next to the database".
I'm somewhat surprised they went this route. It probably makes sense given the constraints of Cloudflare's architecture and the complexity of running a more advanced globally distributed database.
On the upside: hopefully this usage in domains that are somewhat unusual can lead to funding for more upstream sqlite features.
by ngrilly on 5/11/22, 6:10 PM
That's important to understand because that's one of the key advantages of SQLite compared to the usual client-server architecture of databases like PostgreSQL or MySQL: https://www.sqlite.org/np1queryprob.html
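The linked page argues that the "N+1 queries" pattern is fine with SQLite because each query is an in-process function call rather than a network round trip. A small sketch of what that looks like with Python's sqlite3 module (the `posts` and `comments` tables are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT);
    INSERT INTO posts (title) VALUES ('first'), ('second'), ('third');
    INSERT INTO comments (post_id, body) VALUES (1, 'nice'), (1, 'thanks'), (3, 'hm');
    """
)


def comment_counts_n_plus_one(conn):
    """One query for the posts, then one more query per post (the N+1 shape)."""
    result = {}
    posts = conn.execute("SELECT id, title FROM posts ORDER BY id").fetchall()
    for post_id, title in posts:
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM comments WHERE post_id = ?", (post_id,)
        ).fetchone()
        result[title] = count
    return result


def comment_counts_joined(conn):
    """The single-query version usually recommended for client-server databases."""
    rows = conn.execute(
        "SELECT p.title, COUNT(c.id) "
        "FROM posts p LEFT JOIN comments c ON c.post_id = p.id "
        "GROUP BY p.id ORDER BY p.id"
    )
    return dict(rows)


print(comment_counts_n_plus_one(conn))
print(comment_counts_joined(conn))
```

Both functions return the same answer. Against a database on the other side of a network, the first one pays a round trip per post; with SQLite in the same process it doesn't, which is the advantage being described.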
by samwillis on 5/11/22, 1:12 PM
It's perfect for content type sites that want search and querying.
Anyone from CF here, is it using Litestream (https://litestream.io) for its replication or have you built your own replication system?
I assume this first version is somewhat limited on write performance, having a single "main" instance and SQLite lacking concurrent writes? It seems to me that using SQLite sessions[0] would be a good way to build an eventually consistent replication system for SQLite, would be perfect for an edge first sql database, maybe D2?
by endisneigh on 5/11/22, 2:03 PM
Also, any plans to support PATCH x-update-range so SQLite can be used entirely in the browser via SQLite.js?
Can someone enlighten me with the types of use cases this would be better for vs say Postgres?
by lucasyvas on 5/11/22, 2:53 PM
You weren't lying, and this is super cool - the SQLite hype train also seems to be in full force.
by rmbyrro on 5/11/22, 2:17 PM
In 2-3 years from now, these services will be so mature and strong they will be crushing the cloud market.
They're turning dreams into reality, one after another.
by jpcapdevila on 5/11/22, 4:29 PM
CF people around, I would love to chat, if anyone is interested please reach out at: jp@javascriptdb.com
I'll be applying to this beta for sure!
by mwcampbell on 5/11/22, 1:37 PM
Also, I wonder how hard it will be to migrate existing PostgreSQL databases and SQL statements. Of course, I understand if Cloudflare is focused on greenfield applications.
by ryanto on 5/11/22, 2:08 PM
From the blog post it says read-only replicas are created close to users and kept up to date with the latest data.
- How should I think about this in terms of CAP? If there's a write and I query a replica what happens?
- How are writes handled? Do they go to a single location or are they handled by various locations?
I'm excited to try this. It's so cool to see databases being distributed "on CDNs" for lack of a better term.
by tyingq on 5/11/22, 2:25 PM
That's interesting to me. It opens the door for Cloudflare to offer something more like a "normal" serverless offering. One that can run containers, or at least natively run Python/Golang/Java/etc, like AWS Lambda does. And with this ecosystem described above that can conditionally route between the lighter edge Workers and the heavier central serverless functions. To me, that's the tipping point where they start to threaten larger portions of AWS.
by SheinhardtWigCo on 5/11/22, 4:32 PM
Good: Workers, KV, Durable Objects, Cron Triggers
Bad: Spectrum, Zaraz, R2, D1
by lucasyvas on 5/11/22, 3:13 PM
I'm thinking of sqlx in Rust (or any other language binding / ORM for that matter), which has compile time schema safety. This is a nice capability, and because this interface seems non-standard (possibly for good reason), I guess we are being asked to give some of those things up.
I am getting a bit ahead of myself on the Rust part (presumably that will eventually be supported as part of workers-rs), but I think the feelings still stand if you consider the JS ecosystem.
Edit: I may actually be wrong, but presumably the entire surface isn't covered because there's no file opening, etc.
by irq-1 on 5/11/22, 8:49 PM
The key is to let the user decide what really needs ACID and what doesn't. If someone wants to make the next Facebook or Reddit they'll need huge write throughput and if some votes or updates are lost, that may be a good trade-off.
[1] You could add a BEW file (like WAL file) to sqlite for Best Effort Writes.
by didip on 5/11/22, 4:23 PM
* How do you replicate it consistently?
* Who has the master privilege (or masters if sharded)? What's the failover story?
I am guessing a blob store is involved, but I have gaps in my understanding here.
by frogger8 on 5/11/22, 1:18 PM
One thing I hope to see in the future is a better product filtering experience. When I worked on a jquery product filter I realized the DOM bloat was the main problem.
I wonder if D1 can help devs build instant product filtering pages that don’t require the reload like microcenter or Newegg does.
by _kyran on 5/11/22, 2:12 PM
by greenie_beans on 5/11/22, 1:54 PM
edit: maybe one day! this looks cool regardless
by aeyes on 5/11/22, 5:02 PM
Are there any limitations, for example on the number of tables or size of the database?
by xwdv on 5/11/22, 3:02 PM
by pier25 on 5/11/22, 2:54 PM
Is the data replicated to all regions?
by dinkleberg on 5/11/22, 2:57 PM
by jcuenod on 5/11/22, 4:04 PM
by ralusek on 5/11/22, 3:03 PM
by robertlagrant on 5/11/22, 2:19 PM
by estensen on 5/11/22, 2:50 PM
by whitepaint on 5/11/22, 2:34 PM
by philholden on 5/11/22, 2:11 PM
by jzer0cool on 5/11/22, 9:59 PM
by benjiweber on 5/11/22, 6:14 PM
by polskibus on 5/11/22, 2:19 PM
by deanc on 5/11/22, 7:28 PM
by oxff on 5/11/22, 2:55 PM
by onphonenow on 5/11/22, 2:00 PM
by alberth on 5/11/22, 1:43 PM
This enables entirely new classes of applications where everything can now be hosted by Cloudflare.
Questions:
a. To help with concurrent writes, will Cloudflare be using WAL2 and BEGIN CONCURRENT branches of SQLite?
b. How is Cloudflare replicating the data cross region? Will it be Litestream.io behind the scenes?
c. Will our Worker code need to be written differently to ensure only a single-writer is writing to SQLite database?
d. How does data persistency and database file size get factored in? I have to imagine there is a limit to how much storage can be used, whether or not that storage is local to the Worker machine, and if it's persistent.
by rvz on 5/11/22, 1:14 PM
Maybe they will announce a Hashicorp competitor in their next reveal. Who knows.