by hartleybrody on 5/17/22, 4:59 PM with 108 comments
by simonw on 5/19/22, 1:01 PM
There's a very solid solution to this that isn't as widely known as it should be.
Read after write consistency is extremely important. If a user makes an edit to their content and then can't see that edit in the next page they load they will assume things are broken, and that the site has lost their content. This is really bad!
The best fix for this is to make sure that all reads from that user are directed to the lead database for a short period of time after they make an edit.
The Fly replay header is perfect for this. Here's what to do:
Any time a user performs a write (which should involve a POST request), set a cookie with a very short time expiry - 5s perhaps, though monitor your worst case replica lag to pick the right value.
I have trust issues with clocks in users' browsers, so I like to do this by making the cookie's value the server time at which it should expire.
In your application's top-level middleware, look for that cookie. If a user has it and the expiry time has not been reached yet, send a Fly replay header that internally redirects the request to the lead region.
This guarantees that users who have just performed a write won't see stale data from a lagging replica. And the implementation is a dozen or so lines of code.
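A minimal, framework-agnostic sketch of the cookie logic in Python. The `fly-replay` header value of the form `region=<name>` is documented by Fly.io; the region name, cookie name, and 5s window here are placeholder assumptions to tune for your app:

```python
import time

PRIMARY_REGION = "ord"          # assumption: the region of your lead database
COOKIE = "fly_prefer_primary"   # hypothetical cookie name
LAG_WINDOW = 5                  # seconds; tune to your worst-case replica lag

def cookie_value(now=None):
    """Value to set on the response after a write: the server time
    at which the pin-to-primary should expire."""
    now = time.time() if now is None else now
    return str(int(now) + LAG_WINDOW)

def replay_header(cookies, now=None):
    """Return a fly-replay header value if this request should be replayed
    in the lead region, else None."""
    now = time.time() if now is None else now
    expires = cookies.get(COOKIE)
    if expires and now < float(expires):
        return f"region={PRIMARY_REGION}"
    return None
```

Your middleware would set the cookie with `cookie_value()` on every write response, and on each request return an early response carrying `replay_header(...)` when it is not None, letting Fly's proxy re-run the request against the primary.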
Obviously this won't work for every product - if you're building a chat app where every active user writes to the database every few seconds, implementing this will send almost all of your traffic to your leader, leaving your replicas with not much to do.
But if your application fits the common pattern where 95% of traffic is reads and only a small portion of your users are causing writes at any one time, I would expect this to be extremely effective.
Fly replay headers are explained in detail here: https://fly.io/blog/globally-distributed-postgres/
by rkangel on 5/19/22, 11:05 AM
If you are using Phoenix then LiveView is the obvious approach to dynamically updating a page based on server state. It's a similar-ish architecture to HTMX, but integrated into the framework. The page is rendered on the server as normal; then when it loads on the client, a websocket is opened to a process on the server (the page includes the LiveView JS). When something changes on the server, some new HTML is generated and the parts that have changed are sent down the websocket to the client to be inserted into the page. LiveView is part of Phoenix, leverages Elixir's concurrency, is very performant and a joy to use.
HTMX is a way of getting similar functionality but for a conventional server-rendered framework like Django which doesn't have any of this stuff built in. It would be challenging to build it in anyway because the concurrency isn't as powerful. Simplistically, Phoenix exists because Chris McCord was trying to build a LiveView equivalent in Ruby, had issues, went looking for alternatives, and discovered Elixir.
So either use:
Elixir + Phoenix + Phoenix LiveView
Or:
Python + Django + HTMX (Python and Django can be substituted for other frameworks like Rails)
In both cases, Alpine can then be useful to sprinkle in some clientside only UI features.
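As a rough illustration of the HTMX half of this, here's a sketch of a server-rendered view in plain Python (the view and item names are hypothetical; the `HX-Request: true` header is what htmx sends on requests it initiates, so the server can return just the changed fragment instead of the full page):

```python
# Render a fragment of the page on the server, as a normal template would.
def render_todo_list(items):
    rows = "".join(f"<li>{item}</li>" for item in items)
    return f'<ul id="todos">{rows}</ul>'

def todos_view(headers, items):
    """Return only the fragment for htmx requests, the full page otherwise."""
    fragment = render_todo_list(items)
    if headers.get("HX-Request") == "true":  # header set by htmx
        return fragment                      # htmx swaps this into the DOM
    return f"<html><body>{fragment}</body></html>"
```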
by chrismccord on 5/19/22, 1:04 PM
> It seems like this would add a whole new class of bugs, like “I just submitted a form to change a setting and when the page reloaded, it still showed my previous value in the form” – since the write hadn’t propagated to the local read replica yet.
Elixir is distributed out of the box, so nodes can message each other. This allowed us to easily ship a `fly_postgres_elixir` library that guarantees read-your-own-writes: https://github.com/superfly/fly_postgres_elixir
It does this by sending writes to the primary region over RPC (via distributed Elixir). The write is performed on a primary instance adjacent to the DB, then the result, along with the Postgres log sequence number (LSN), is sent back to the remote node. When the library gets the result of the RPC write, it blocks locally until its local read replica has reached an LSN >= the write's LSN, then the result is returned to the caller.
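The blocking step can be sketched like this in Python (an illustration of the idea only, not the `fly_postgres_elixir` API; `get_replica_lsn` stands in for querying the local replica's replay position):

```python
import time

def wait_for_lsn(get_replica_lsn, write_lsn, timeout=5.0, poll=0.05):
    """Block until the local replica's replay LSN reaches write_lsn.

    get_replica_lsn: callable returning the replica's current replay LSN
    write_lsn: the LSN returned by the primary for our write
    Returns True once caught up, False if timeout elapses first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_replica_lsn() >= write_lsn:
            return True   # replica has replayed our write; safe to read locally
        time.sleep(poll)
    return False          # replica still lagging; caller decides what to do
```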
This gives us read-your-own-writes for the end-user, and the calling code remains unchanged for standard code paths. This doesn't solve all classes of race conditions – for example you may broadcast a message over Phoenix.PubSub that causes a read on the remote node for data that isn't yet replicated, but typically you'd avoid an N query problem from pubsub in general by populating the data in the message on the publisher beforehand.
There's no completely avoiding the fact you have a distributed system where the speed of light matters, but it's Fly's (and Phoenix's) goal to push those concerns back as far as possible. For read heavy apps, or apps that use caching layers for reads, developers already face these kinds of problems. If you think of your read-replicas as cache with a convenient SQL interface, you can avoid most foot guns.
I'm happy to answer other questions as it relates to Phoenix, Fly or what Phoenix + Fly enables from my perspective.
by csmpltn on 5/19/22, 7:39 AM
There have been many posts hitting the HN frontpage regarding fly.io recently. Is it healthy to have so much content about a single PAAS platform showing up here so often now?
by tptacek on 5/19/22, 3:32 PM
When we launched, we didn't do persistent storage for instances, so it didn't make as much sense to run ordinary apps here; rather, the idea was that you'd run your full-stack app somewhere like us-east-1, and carve off performance-sensitive bits and run them on Fly.io. That's "edge computing".
But a bit over a year ago, we added persistent volumes, and then we built Fly Postgres on top of it. You can store files on Fly.io or use a bunch of different databases, some of which we support directly. So it makes a lot more sense to run arbitrary applications, like a Rails or Elixir app, which is not something we would have said back in March 2020.
by nicoburns on 5/19/22, 9:23 AM
Worth noting that you don't have to use the distributed aspect. I have my site hosted on a single one of fly.io's smallest instances (you can get 3 of them for free), and even like this the performance is excellent (50ms response times), and it doesn't have the problem of spinning down when not in use like Heroku's free tier.
It's nice to at least get a choice of regions. For example, the company I work for (not hosted on fly.io currently) only has customers in the UK and Ireland, so it would be nice to be able to pop our servers there with a simple config setting.
by davidkuennen on 5/19/22, 8:59 AM
I've worked on big and small projects/companies and that has never been a concern of ours.
I always imagined it to be something only the very very big players care about. And as a big player I would usually bet on a big partner like AWS, GCP, Azure. Or am I missing something?
by maliker on 5/19/22, 11:50 AM
Fly.io also has a clean, highly usable CLI and minimal set of services unlike the hundreds of options on other providers. But that’s just icing on top—the volume support is the big advantage for me.
by pw on 5/19/22, 8:56 AM
No doubts there are plenty of more niche uses (if I were serving users internationally, I’d probably use Fly.io), but the use case just doesn’t seem as broad as the Heroku/PaaS comparisons make it out to be.
by satyrnein on 5/19/22, 12:02 PM
All of this tech sounds cool, but like the author, I'm unsure when it's called for.
by zoomzoom on 5/19/22, 3:59 PM
There are 3 relevant (for this comment) "performance layers" in building software:
- Cycle time of a team or of the project - this is affected the most by language/framework choice, DevOps infrastructure, and team working style - this should be measured in days/weeks
- Feedback loop for an individual dev working on a new ticket - this is based on the team's cycle time but in addition is really about the dev environment, team collaboration, how the team maintains quality, and how well-defined work is before being started - this should be measured in minutes/hours
- Performance of the software deployed in terms of response time to end users - milliseconds
Fly.io helps the most with category #3. But how often is that really the most important issue in choosing where to deploy your app? If an alternative made small sacrifices there (for example went from 99.99% performance to 99%) but gained velocity for individual devs and the team to be able to ship better product more quickly, would the company/project be better off?
At Coherence (www.withcoherence.com) - disclosure that I'm a cofounder - we're laser-focused on a post-Heroku development platform that goes further than Heroku on categories 1 & 2 above (where I'd argue Heroku is still the gold standard) rather than focusing on category 3.
We're super early but in closed beta - if it sounds exciting please check us out and request a demo on the site!
by kif on 5/19/22, 9:52 AM
The distributed features are there for when you need them – I don't think you have to use them. Or am I missing something?
by Dave3of5 on 5/19/22, 8:07 AM
If you are set up to do some kind of round-robin read from the read replicas, you can often get a different read from what you wrote because the value hasn't replicated to your read replicas yet. The solution is to use the write endpoint when reading after a write.
He says that here but just wanted to point out that it can happen inside an api and cause real issues with data.
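A toy sketch of that routing rule (the endpoint names and the 5-second pin are assumptions; the point is that reads within the pin window go to the writer instead of round-robining):

```python
import itertools
import time

class Router:
    """Round-robin reads across replicas, except for a short window
    after a write, when reads are pinned to the writer endpoint."""

    def __init__(self, writer, replicas, pin_seconds=5):
        self.writer = writer
        self.pin_seconds = pin_seconds
        self.replicas = itertools.cycle(replicas)
        self.pinned_until = 0.0

    def note_write(self, now=None):
        now = time.time() if now is None else now
        self.pinned_until = now + self.pin_seconds

    def pick_read(self, now=None):
        now = time.time() if now is None else now
        if now < self.pinned_until:
            return self.writer        # read-after-write goes to the writer
        return next(self.replicas)    # otherwise round-robin the replicas
```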
by Mo3 on 5/19/22, 2:30 PM
Redis instances are single-region single-replica, for example.
On another note, as soon as they offer serverless functions and solid redundant Redis + SQL I'll be thinking about moving some of our production services over there for a test run.
by ewalk153 on 5/19/22, 11:22 AM
When a request coming in to a read server attempts a db write, the request is aborted and replayed on the main write server.
With some clever assumptions such as "GET requests rarely write to the db" and "POST requests usually do", much of the write traffic can skip the read VMs.
They created a Ruby Rack middleware to standardize this pattern for Ruby on Rails.
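A hedged Python sketch of the abort-and-replay idea (the error class and response shape are stand-ins; a real middleware would catch Postgres's "cannot execute ... in a read-only transaction" error, and `PRIMARY_REGION` is a placeholder):

```python
PRIMARY_REGION = "ord"  # assumption: region of the writable primary

class ReadOnlyError(Exception):
    """Stand-in for Postgres's read-only-transaction error, raised when
    a request running against a replica attempts a write."""

def handle(request, app):
    """Run the request against the local (replica-backed) app; if it
    tries to write, abort and ask Fly's proxy to replay it in the
    primary region via the fly-replay response header."""
    try:
        return app(request)
    except ReadOnlyError:
        return {
            "status": 409,
            "headers": {"fly-replay": f"region={PRIMARY_REGION}"},
        }
```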
by Stampo00 on 5/19/22, 3:42 PM
I just dropped DigitalOcean because of their price hike. No hard feelings. I was barely using it, and the product is growing more towards full-featured apps and teams, which is not as good a fit for me, an individual just screwing around. I don't fault them. I'm not their target customer.
Fly.io is very much designed for use primarily via their CLI tool. Their web interface needs some polish. But it does everything it says on the tin, for a price that's more than reasonable.
I only used Heroku briefly so I can't comment on similarities or differences with any authority.
As someone who is already very comfortable with container-based development, I'm happy with fly.io.
by dfee on 5/19/22, 10:07 AM
One big miss, though, is you’ll still need a database and s3, so I’m not sure if I totally understand the value.
by dom__inic on 5/19/22, 10:36 AM
html { -webkit-font-smoothing: antialiased }