by CaptainJustin on 5/11/22, 4:42 PM with 38 comments
What are the options we have for handling state at the edge? What do you use in your business or service?
by powersurge360 on 5/11/22, 6:26 PM
You can do basically the same thing with any relational database: have a write leader somewhere and a bunch of read replicas that live close to the edge.
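At the application level, that split can be as simple as two connection pools; a rough node-postgres sketch (connection strings and table names are invented for illustration):

```typescript
import { Pool } from "pg";

// Writes always go to the single write leader.
const primary = new Pool({ connectionString: process.env.PRIMARY_DATABASE_URL });
// Reads go to whichever replica is closest to this edge location.
const replica = new Pool({ connectionString: process.env.NEAREST_REPLICA_URL });

export async function getProfile(userId: string) {
  // Read path: served from the nearby replica (may lag the leader slightly).
  const { rows } = await replica.query(
    "SELECT id, name FROM profiles WHERE id = $1",
    [userId]
  );
  return rows[0];
}

export async function renameProfile(userId: string, name: string) {
  // Write path: always routed back to the leader, wherever it lives.
  await primary.query("UPDATE profiles SET name = $2 WHERE id = $1", [userId, name]);
}
```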
There are also what you would call cloud-native data stores that purport to solve the same issue, but I don't know much about how they work because I much prefer working w/ relational databases and most of those stores are NoSQL. And I haven't had to actually solve the problem for work yet, so I also haven't made any compromises yet in how I explore it.
Another interesting way to go might be CockroachDB. It's wire compatible w/ PostgreSQL and supposedly clusters automatically and shares data within the cluster. I don't know very much about it, but it seems to be becoming more and more popular, and many ORMs seem to have an adapter to support it. It may also be worth looking into because, if it works as advertised, you get an RDBMS that you can deploy to an arbitrary number of places, configure the nodes to talk to one another, and not have to worry about replicating the data or routing correctly to write leaders and all that.
And again, I'm technical, but I haven't solved these problems so consider the above to be a jumping off point and take nothing as gospel.
by don-code on 5/11/22, 4:54 PM
While it's possible to distribute state to many AWS regions and select the closest one, I ended up going a different route: packaging state alongside the application. Most of the application's state was read-only, so I ended up packaging the application state up as JSON alongside the deployment bundle. At startup, it'd then statically read the JSON into memory - this performance penalty only happens at startup, and as long as the Lambda functions are being called often (in our case they are), requests are as fast as a memory read.
When the state does need to get updated, I just redeploy the application with the new state.
That strategy obviously won't work if you need "fast" turnaround on your state being in sync at all points of presence, or if users can update that state as part of your application's workflow.
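A minimal sketch of that pattern, with a made-up data/catalog.json file: the JSON is parsed once at module load, so warm invocations only pay for an in-memory lookup.

```typescript
import { readFileSync } from "fs";
import * as path from "path";

// Loaded once per cold start; the file ships inside the deployment bundle.
const catalog: Record<string, { price: number }> = JSON.parse(
  readFileSync(path.join(__dirname, "data", "catalog.json"), "utf8")
);

export const handler = async (event: { sku: string }) => {
  // Warm invocations hit this in-memory object; no network call involved.
  const item = catalog[event.sku];
  return item ?? { error: "unknown sku" };
};
```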
by lewisl9029 on 5/11/22, 8:53 PM
Most of the approaches mentioned here will give you fast reads everywhere, but writes are only fast if you're close to some arbitrarily chosen primary region.
A few technologies I've experimented with for doing fast, eventually consistently replicated writes: DynamoDB Global Tables, CosmosDB, Macrometa, KeyDB.
None of them are perfect, but in terms of write latency, active-active replicated KeyDB in my fly.io cluster has everything else beat. It's the only solution that offered _reliable_ sub-5ms latency writes (most are close to 1-2ms). Dynamo and Cosmos advertise sub-10ms, but in practice, while _most_ writes fall in that range, I've seen them fluctuate wildly to over 200ms (Cosmos was much worse than Dynamo IME), which is to be expected on the public internet with noisy neighbors.
Unfortunately, I got too wary of the operational complexity of running my own global persistent KeyDB cluster with potentially unbounded memory/storage requirements, and eventually migrated most app state over to use Dynamo as the source of truth, with the KeyDB cluster as an auto-replicating caching layer so I don't have to deal with perf/memory/storage scaling and backup. So far that has been working well, but I'm still pre-launch so it's not anywhere close to battle tested.
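For reference, the read-through/write-through shape of that setup might look roughly like this (ioredis talks to KeyDB since it's Redis-protocol compatible; the table and key names are invented):

```typescript
import Redis from "ioredis";
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, GetCommand, PutCommand } from "@aws-sdk/lib-dynamodb";

const cache = new Redis(process.env.KEYDB_URL!); // nearest KeyDB node; writes replicate active-active
const dynamo = DynamoDBDocumentClient.from(new DynamoDBClient({}));

export async function readItem(id: string) {
  const hit = await cache.get(`item:${id}`);
  if (hit) return JSON.parse(hit);

  // Cache miss: fall back to the source of truth and backfill the cache.
  const { Item } = await dynamo.send(new GetCommand({ TableName: "items", Key: { id } }));
  if (Item) await cache.set(`item:${id}`, JSON.stringify(Item), "EX", 300);
  return Item;
}

export async function writeItem(item: { id: string }) {
  // Dynamo is the durable source of truth; the cache write fans out via KeyDB replication.
  await dynamo.send(new PutCommand({ TableName: "items", Item: item }));
  await cache.set(`item:${item.id}`, JSON.stringify(item), "EX", 300);
}
```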
Would love to hear stories from other folks building systems with similar requirements/ambitions!
by kevsim on 5/11/22, 7:15 PM
by michaellperry71 on 5/11/22, 8:10 PM
If records are allowed to change, then you end up in situations where changes don't converge. But if you instead collect a history of unchanging events, then you can untangle these scenarios.
Event Sourcing is the most popular implementation of a history of immutable events. But I have found that a different model works better for data at the edge. An event store tends to be centrally located within your architecture. That is necessary because the event store determines the one true order of events. But if you relax that constraint and allow events to be partially ordered, then you can have a history at the edge. If you follow a few simple rules, then those histories are guaranteed to converge.
Rule number 1: A record is immutable. It cannot be modified or deleted.
Rule number 2: A record refers to its predecessors. If the order between events matters, then it is made explicit with this predecessor relationship. If there is no predecessor relationship, then the order doesn't matter. No timestamps.
Rule number 3: A record is identified only by its type, contents, and set of predecessors. If two records have the same stuff in them, then they are the same record. No surrogate keys.
Following these rules, analyze your problem domain and build up a model. The immutable records in that model form a directed acyclic graph, with arrows pointing toward the predecessors. Send those records to the edge nodes and let them make those millisecond decisions based only on the records that they have on hand. Record their decisions as new records in this graph, and send those records back.
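One way to satisfy rule 3 is to derive a record's identity from a hash of its type, contents, and predecessor hashes; a small sketch (the canonicalization here is deliberately naive, just sorted keys):

```typescript
import { createHash } from "crypto";

interface Fact {
  type: string;                      // e.g. "Order.Shipped"
  fields: Record<string, unknown>;   // the immutable contents
  predecessors: string[];            // hashes of the facts this one follows
}

// Identity is derived from the record itself, so identical records collapse
// into one node of the DAG and no surrogate keys are needed.
export function factHash(fact: Fact): string {
  const canonical = JSON.stringify({
    type: fact.type,
    fields: Object.fromEntries(Object.entries(fact.fields).sort()),
    predecessors: [...fact.predecessors].sort(),
  });
  return createHash("sha256").update(canonical).digest("hex");
}
```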
Jeff Doolittle and I talk about this system on a recent episode of Software Engineering Radio: https://www.se-radio.net/2021/02/episode-447-michael-perry-o...
No matter how you store it, treat data at the edge as if you could not update or delete records. Instead, accrue new records over time. Make decisions at the edge with autonomy, knowing that they will be honored within the growing partially-ordered history.
by deckard1 on 5/11/22, 6:35 PM
A number of people are talking about Lambda or loading files, SQLite, etc. These aren't likely to work on CF. CF uses isolated JavaScript sandboxes. You're not guaranteed to have two workers accessing the same memory space.
This is, in general, the problem with serverless. The model of computing is proprietary and very much about the fine print details.
edit: CF just announced their SQLite worker service/API today: https://blog.cloudflare.com/introducing-d1/
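For what it's worth, querying D1 from a Worker looks roughly like this (the DB binding name and schema are assumptions):

```typescript
export default {
  async fetch(request: Request, env: { DB: D1Database }): Promise<Response> {
    const id = new URL(request.url).searchParams.get("id");
    // D1 exposes a prepared-statement style API over SQLite.
    const row = await env.DB
      .prepare("SELECT id, name FROM users WHERE id = ?")
      .bind(id)
      .first();
    return Response.json(row ?? { error: "not found" });
  },
};
```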
by fwsgonzo on 5/11/22, 5:04 PM
If you want to go one step further you can build a VMOD for Varnish to run your workloads inside Varnish, even with Rust: https://github.com/gquintard/vmod_rs_template
by F117-DK on 5/11/22, 4:48 PM
by crawdog on 5/11/22, 5:12 PM
Have your process regularly update the CDB file from a blob store like S3. Any deltas can be pulled from S3 or you can use a message bus if the changes are small. Every so often pull the latest CDB down and start aggregating deltas again.
CDB performs great and can scale to multiple GBs.
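A rough sketch of that refresh loop; openCdb here is a hypothetical reader standing in for whatever CDB library you use, and the bucket/key names are placeholders:

```typescript
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { writeFile } from "fs/promises";
import { openCdb, CdbReader } from "./cdb"; // hypothetical CDB reader, not a real package

const s3 = new S3Client({});
let snapshot: CdbReader;
const deltas = new Map<string, string>(); // changes received since the last snapshot

async function refreshSnapshot() {
  const res = await s3.send(new GetObjectCommand({ Bucket: "my-state", Key: "state.cdb" }));
  await writeFile("/tmp/state.cdb", Buffer.from(await res.Body!.transformToByteArray()));
  snapshot = await openCdb("/tmp/state.cdb");
  deltas.clear(); // start aggregating deltas against the fresh snapshot
}

export function lookup(key: string): string | undefined {
  // Deltas win over the (slightly stale) snapshot.
  return deltas.get(key) ?? snapshot.get(key);
}

setInterval(refreshSnapshot, 5 * 60 * 1000); // pull the latest CDB every few minutes
```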
by rektide on 5/11/22, 5:44 PM
Most data formats are thick formats that pack data into a single file. Part of the effort in switching to git would be a shift toward unpacking our data, to really make use of the file system to store fine-grained pieces of data.
It's been around for a while, but Irmin[1] (written in OCaml) is a decent-enough almost-example of these kinds of practices. It lacks the version control aspect, but 9p is certainly another inspiration, as it encouraged state of all things to be held & stored in fine-grained files. Git, I think, is a superpower, but just as much: having data which can be scripted, which speaks the lingua franca of computing, is also a superpower.
[1] https://irmin.org/ https://news.ycombinator.com/item?id=8053687 (147 points, 8 years ago, 25 comments)
by Joel_Mckay on 5/12/22, 1:50 AM
YMMV, I just discovered my favorite game on my phone was intended for cats. ;-)
Cheers, J
by adam_arthur on 5/11/22, 10:50 PM
Cloudflare KV can store most of what you need in JSON form, while DurableObjects let you model updates with transactional guarantees.
My app is particularly read heavy though, and backing data is mostly static (but gets updated daily).
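Roughly what that division of labor looks like in a Worker (the CONFIG and COUNTER binding names are made up):

```typescript
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    if (request.method === "GET") {
      // Reads: KV is replicated to every PoP, eventually consistent.
      const value = await env.CONFIG.get("settings", "json");
      return Response.json(value);
    }
    // Writes: route to a single Durable Object instance for transactional updates.
    const id = env.COUNTER.idFromName("settings");
    return env.COUNTER.get(id).fetch(request);
  },
};

export class Counter {
  constructor(private state: DurableObjectState) {}
  async fetch(request: Request): Promise<Response> {
    // Storage operations within one event are effectively transactional.
    const n = ((await this.state.storage.get<number>("writes")) ?? 0) + 1;
    await this.state.storage.put("writes", n);
    return Response.json({ writes: n });
  }
}

interface Env {
  CONFIG: KVNamespace;
  COUNTER: DurableObjectNamespace;
}
```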
Honestly, after using Cloudflare I feel like they will easily become the go-to cloud for building small/quick apps. Everything is integrated much better than AWS, and it's way more user friendly from a docs and dev-experience perspective. Also, their dev velocity on new features is pretty insane.
Honestly, I didn't think that much of them until I started digging into these things.
Edit: And just today their S3 competitor entered open beta https://blog.cloudflare.com/r2-open-beta/
by efitz on 5/11/22, 7:24 PM
In my application, I had a central worker process that would ingest state updates and would periodically serialize the data to a MySQL database file, adding indexes and so forth and then uploading a versioned file to S3.
My Lambda workers would check for updates to the database, downloading the latest version to the local temp directory if there was not a local copy or if the local copy was out of date.
Then the work of checking state was just a database query.
You can tune timings etc to whatever your app can tolerate.
In my case the problem was fairly easy since state updates only occurred centrally; I could publish and pull updates at my leisure.
If I had needed distributed state updates I would have just made the change locally without bumping version, and then send a message (SNS or SQS) to the central state maintainer for commit and let the publication process handle versioning and distribution.
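The download-and-cache part of that pattern might look something like this; for illustration it assumes the versioned file is SQLite (swap in whatever embedded reader matches your file format), and the bucket/key names are placeholders:

```typescript
import { S3Client, HeadObjectCommand, GetObjectCommand } from "@aws-sdk/client-s3";
import { writeFile } from "fs/promises";
import Database from "better-sqlite3"; // illustrative choice of local reader

const s3 = new S3Client({});
const BUCKET = "my-state-bucket"; // placeholder
const KEY = "state/current.db";   // placeholder
let cachedEtag: string | undefined; // survives across warm invocations

async function ensureLocalCopy(): Promise<string> {
  const head = await s3.send(new HeadObjectCommand({ Bucket: BUCKET, Key: KEY }));
  if (head.ETag !== cachedEtag) {
    // Cold start or stale copy: pull the latest versioned file into /tmp.
    const obj = await s3.send(new GetObjectCommand({ Bucket: BUCKET, Key: KEY }));
    await writeFile("/tmp/state.db", Buffer.from(await obj.Body!.transformToByteArray()));
    cachedEtag = head.ETag;
  }
  return "/tmp/state.db";
}

export const handler = async (event: { id: string }) => {
  const db = new Database(await ensureLocalCopy(), { readonly: true });
  try {
    // Checking state is now just a local query against the downloaded file.
    return db.prepare("SELECT * FROM state WHERE id = ?").get(event.id);
  } finally {
    db.close();
  }
};
```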
by ccouzens on 5/11/22, 9:52 PM
I can share a blog post about this if there is interest.
It gives us very good performance (p95 under 1ms) as the function doesn't need to call an external service.
by tra3 on 5/11/22, 5:53 PM
by jFriedensreich on 5/11/22, 9:45 PM
cloudflare kv store is great if the supported write pattern fits
if you need something with more consistency between pops, durable objects should be on your radar
i also found that cloudant/couchdb is a perfect fit for a lot of use cases with heavy caching in the cf worker. it's also possible to have multi-master replication with each couchdb cluster close to its local users, so you don't have to wait for writes to reach a single master on the other side of the world
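roughly, wiring that up is just a replication document per direction in each cluster's _replicator database, something like this (hosts, database name, and credentials are placeholders):

```typescript
// Tell each CouchDB cluster to continuously pull from the other, so writes
// land locally and converge via multi-master replication.
const authHeader = "Basic " + Buffer.from("admin:password").toString("base64"); // placeholder

async function linkClusters(local: string, remote: string) {
  await fetch(`${local}/_replicator`, {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: authHeader },
    body: JSON.stringify({
      source: `${remote}/app`, // remote copy of the "app" database
      target: `${local}/app`,  // local copy
      continuous: true,        // keep replicating as new writes arrive
    }),
  });
}

// Run once per direction (hosts are placeholders).
await linkClusters("https://couch-eu.example.com", "https://couch-us.example.com");
await linkClusters("https://couch-us.example.com", "https://couch-eu.example.com");
```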
by Elof on 5/11/22, 7:57 PM
by marzoeva on 5/11/22, 11:53 PM
by innerzeal on 5/12/22, 7:45 AM
by asdf1asdf on 5/11/22, 5:21 PM
Now on to developing the actual application that will host/serve your data to said cache layer.
If you learn basic application architecture concepts, you won't be fooled by salespeople's lies again.
by rad_gruchalski on 5/11/22, 10:51 PM
by weatherlight on 5/11/22, 10:44 PM