by hartleybrody on 5/17/22, 4:59 PM with 108 comments
by simonw on 5/19/22, 1:01 PM
There's a very solid solution to this that isn't as widely known as it should be.
Read after write consistency is extremely important. If a user makes an edit to their content and then can't see that edit in the next page they load they will assume things are broken, and that the site has lost their content. This is really bad!
The best fix for this is to make sure that all reads from that user are directed to the lead database for a short period of time after they make an edit.
The Fly replay header is perfect for this. Here's what to do:
Any time a user performs a write (which should involve a POST request), set a cookie with a very short time expiry - 5s perhaps, though monitor your worst case replica lag to pick the right value.
I have trust issues with clocks in users' browsers, so I like to do this by making the cookie's value the server time at which it should expire.
In your application's top-level middleware, look for that cookie. If a user has it and the expiry time has not been reached yet, send a Fly replay header that internally redirects the request to the lead region.
This guarantees that users who have just performed a write won't see stale data from a lagging replica. And the implementation is a dozen or so lines of code.
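A minimal, framework-agnostic sketch of the cookie logic in Python. The `fly-replay` header value of the form `region=<name>` is documented by Fly.io; the region name, cookie name, and 5s window here are placeholder assumptions to tune for your app:

```python
import time

PRIMARY_REGION = "ord"          # assumption: the region of your lead database
COOKIE = "fly_prefer_primary"   # hypothetical cookie name
LAG_WINDOW = 5                  # seconds; tune to your worst-case replica lag

def cookie_value(now=None):
    """Value to set on the response after a write: the server time
    at which the pin-to-primary should expire."""
    now = time.time() if now is None else now
    return str(int(now) + LAG_WINDOW)

def replay_header(cookies, now=None):
    """Return a fly-replay header value if this request should be replayed
    in the lead region, else None."""
    now = time.time() if now is None else now
    expires = cookies.get(COOKIE)
    if expires and now < float(expires):
        return f"region={PRIMARY_REGION}"
    return None
```

Your middleware would set the cookie with `cookie_value()` on every write response, and on each request return an early response carrying `replay_header(...)` when it is not None, letting Fly's proxy re-run the request against the primary.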
Obviously this won't work for every product - if you're building a chat app where every active user writes to the database every few seconds, implementing this will send almost all of your traffic to your leader, leaving your replicas with not much to do.
But if your application fits the common pattern where 95% of traffic is reads and only a small portion of your users are causing writes at any one time, I would expect this to be extremely effective.
Fly replay headers are explained in detail here: https://fly.io/blog/globally-distributed-postgres/
by rkangel on 5/19/22, 11:05 AM
If you are using Phoenix then LiveView is the obvious approach to dynamically updating a page based on server state. It's a similar-ish architecture to HTMX, but integrated into the framework. The page is rendered on the server as normal; then when it loads on the client, a websocket is opened to a process on the server (the page includes the LiveView JS). When something changes on the server, some new HTML is generated and the parts that have changed are sent down the websocket to the client to be inserted into the page. LiveView is part of Phoenix, leverages Elixir's concurrency, is very performant and a joy to use.
HTMX is a way of getting similar functionality but for a conventional server-rendered framework like Django which doesn't have any of this stuff built in. It would be challenging to build it in anyway because the concurrency isn't as powerful. Simplistically, Phoenix exists because Chris McCord was trying to build a LiveView equivalent in Ruby, had issues, went looking for alternatives, and discovered Elixir.
So either use:
Elixir + Phoenix + Phoenix LiveView
Or:
Python + Django + HTMX (Python and Django can be substituted for other frameworks like Rails)
In both cases, Alpine can then be useful to sprinkle in some clientside only UI features.
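As a rough illustration of the HTMX half of this, here's a sketch of a server-rendered view in plain Python (the view and item names are hypothetical; the `HX-Request: true` header is what htmx sends on requests it initiates, so the server can return just the changed fragment instead of the full page):

```python
# Render a fragment of the page on the server, as a normal template would.
def render_todo_list(items):
    rows = "".join(f"<li>{item}</li>" for item in items)
    return f'<ul id="todos">{rows}</ul>'

def todos_view(headers, items):
    """Return only the fragment for htmx requests, the full page otherwise."""
    fragment = render_todo_list(items)
    if headers.get("HX-Request") == "true":  # header set by htmx
        return fragment                      # htmx swaps this into the DOM
    return f"<html><body>{fragment}</body></html>"
```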
by chrismccord on 5/19/22, 1:04 PM
> It seems like this would add a whole new class of bugs, like “I just submitted a form to change a setting and when the page reloaded, it still showed my previous value in the form” – since the write hadn’t propagated to the local read replica yet.
Elixir is distributed out of the box, so nodes can message each other. This allowed us to easily ship a `fly_postgres_elixir` library that guarantees read-your-own-writes: https://github.com/superfly/fly_postgres_elixir
It does this by sending writes to the primary region over RPC (via distributed Elixir). The write is performed on a primary instance adjacent to the DB, then the result, along with the Postgres log sequence number (LSN), is sent back to the remote node. When the library gets the result of the RPC write, it blocks locally until its local read replica has reached an LSN >= the write's LSN, then the result is returned to the caller.
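The blocking step can be sketched like this in Python (an illustration of the idea only, not the `fly_postgres_elixir` API; `get_replica_lsn` stands in for querying the local replica's replay position):

```python
import time

def wait_for_lsn(get_replica_lsn, write_lsn, timeout=5.0, poll=0.05):
    """Block until the local replica's replay LSN reaches write_lsn.

    get_replica_lsn: callable returning the replica's current replay LSN
    write_lsn: the LSN returned by the primary for our write
    Returns True once caught up, False if timeout elapses first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_replica_lsn() >= write_lsn:
            return True   # replica has replayed our write; safe to read locally
        time.sleep(poll)
    return False          # replica still lagging; caller decides what to do
```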
This gives us read-your-own-writes for the end-user, and the calling code remains unchanged for standard code paths. This doesn't solve all classes of race conditions – for example you may broadcast a message over Phoenix.PubSub that causes a read on the remote node for data that isn't yet replicated, but typically you'd avoid an N query problem from pubsub in general by populating the data in the message on the publisher beforehand.
There's no completely avoiding the fact you have a distributed system where the speed of light matters, but it's Fly's (and Phoenix's) goal to push those concerns back as far as possible. For read heavy apps, or apps that use caching layers for reads, developers already face these kinds of problems. If you think of your read-replicas as cache with a convenient SQL interface, you can avoid most foot guns.
I'm happy to answer other questions as it relates to Phoenix, Fly or what Phoenix + Fly enables from my perspective.
by csmpltn on 5/19/22, 7:39 AM
There have been many posts hitting the HN frontpage regarding fly.io recently. Is it healthy to have so much content about a single PAAS platform showing up here so often now?
by tptacek on 5/19/22, 3:32 PM
When we launched, we didn't do persistent storage for instances, so it didn't make as much sense to run ordinary apps here; rather, the idea was that you'd run your full-stack app somewhere like us-east-1, and carve off performance-sensitive bits and run them on Fly.io. That's "edge computing".
But a bit over a year ago, we added persistent volumes, and then we built Fly Postgres on top of it. You can store files on Fly.io or use a bunch of different databases, some of which we support directly. So it makes a lot more sense to run arbitrary applications, like a Rails or Elixir app, which is not something we would have said back in March 2020.
by nicoburns on 5/19/22, 9:23 AM
Worth noting that you don't have to use the distributed aspect. I have my site hosted on a single one of fly.io's smallest instances (you can get 3 of them for free), and even like this the performance is excellent (50ms response times), and it doesn't have the problem of spinning down when not in use like Heroku's free tier.
It's nice to at least get a choice of regions. For example, the company I work for (not hosted on fly.io currently) only has customers in the UK and Ireland, so it would be nice to be able to pop our servers there with a simple config setting.
by davidkuennen on 5/19/22, 8:59 AM
I've worked on big and small projects/companies and that has never been a concern of ours.
I always imagined it to be something only the very very big players care about. And as a big player I would usually bet on a big partner like AWS, GCP, Azure. Or am I missing something?
by maliker on 5/19/22, 11:50 AM
Fly.io also has a clean, highly usable CLI and minimal set of services unlike the hundreds of options on other providers. But that’s just icing on top—the volume support is the big advantage for me.
by pw on 5/19/22, 8:56 AM
No doubts there are plenty of more niche uses (if I were serving users internationally, I’d probably use Fly.io), but the use case just doesn’t seem as broad as the Heroku/PaaS comparisons make it out to be.
by satyrnein on 5/19/22, 12:02 PM
All of this tech sounds cool, but like the author, I'm unsure when it's called for.
by zoomzoom on 5/19/22, 3:59 PM
There are 3 relevant (for this comment) "performance layers" in building software:
- Cycle time of a team or of the project - this is affected the most by language/framework choice, DevOps infrastructure, and team working style - this should be measured in days/weeks
- Feedback loop for an individual dev working on a new ticket - this is based on the team's cycle time but in addition is really about the dev environment, team collaboration, how the team maintains quality, and how well-defined work is before being started - this should be measured in minutes/hours
- Performance of the software deployed in terms of response time to end users - milliseconds
Fly.io helps the most with category #3. But how often is that really the most important issue in choosing where to deploy your app? If an alternative made small sacrifices there (for example went from 99.99% performance to 99%) but gained velocity for individual devs and the team to be able to ship better product more quickly, would the company/project be better off?
At Coherence (www.withcoherence.com) - disclosure that I'm a cofounder - we're laser-focused on a post-Heroku development platform that goes further than Heroku on categories 1 & 2 above (where I'd argue Heroku is still the gold standard) rather than focusing on category 3.
We're super early but in closed beta - if it sounds exciting please check us out and request a demo on the site!
by kif on 5/19/22, 9:52 AM
The distributed features are there for when you need them – I don't think you have to use them. Or am I missing something?
by Dave3of5 on 5/19/22, 8:07 AM
If you are set up to do some kind of round-robin read from the read replicas, you can often get a different read from what you wrote because the value hasn't replicated to your read replicas yet. The solution is to use the write endpoint when reading after a write.
He says that here but just wanted to point out that it can happen inside an api and cause real issues with data.
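A toy sketch of that routing rule (the endpoint names and the 5-second pin are assumptions; the point is that reads within the pin window go to the writer instead of round-robining):

```python
import itertools
import time

class Router:
    """Round-robin reads across replicas, except for a short window
    after a write, when reads are pinned to the writer endpoint."""

    def __init__(self, writer, replicas, pin_seconds=5):
        self.writer = writer
        self.pin_seconds = pin_seconds
        self.replicas = itertools.cycle(replicas)
        self.pinned_until = 0.0

    def note_write(self, now=None):
        now = time.time() if now is None else now
        self.pinned_until = now + self.pin_seconds

    def pick_read(self, now=None):
        now = time.time() if now is None else now
        if now < self.pinned_until:
            return self.writer        # read-after-write goes to the writer
        return next(self.replicas)    # otherwise round-robin the replicas
```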
by Mo3 on 5/19/22, 2:30 PM
Redis instances are single-region single-replica, for example.
On another note, as soon as they offer serverless functions and solid redundant Redis + SQL I'll be thinking about moving some of our production services over there for a test run.
by ewalk153 on 5/19/22, 11:22 AM
When a request coming in to a read server attempts a db write, the request is aborted and replayed on the main write server.
With some clever assumptions such as "GET requests rarely write to the db" and "POST requests usually do", much of the write traffic can skip the read VMs.
They created a Ruby Rack middleware to standardize this pattern for Ruby on Rails.
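A hedged Python sketch of the abort-and-replay idea (the error class and response shape are stand-ins; a real middleware would catch Postgres's "cannot execute ... in a read-only transaction" error, and `PRIMARY_REGION` is a placeholder):

```python
PRIMARY_REGION = "ord"  # assumption: region of the writable primary

class ReadOnlyError(Exception):
    """Stand-in for Postgres's read-only-transaction error, raised when
    a request running against a replica attempts a write."""

def handle(request, app):
    """Run the request against the local (replica-backed) app; if it
    tries to write, abort and ask Fly's proxy to replay it in the
    primary region via the fly-replay response header."""
    try:
        return app(request)
    except ReadOnlyError:
        return {
            "status": 409,
            "headers": {"fly-replay": f"region={PRIMARY_REGION}"},
        }
```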
by Stampo00 on 5/19/22, 3:42 PM
I just dropped DigitalOcean because of their price hike. No hard feelings. I was barely using it, and the product is growing more towards full-featured apps and teams, which is not as good a fit for me, an individual just screwing around. I don't fault them. I'm not their target customer.
Fly.io is very much designed for use primarily via their CLI tool. Their web interface needs some polish. But it does everything it says on the tin, for a price that's more than reasonable.
I only used Heroku briefly so I can't comment on similarities or differences with any authority.
As someone who is already very comfortable with container-based development, I'm happy with fly.io.
by dfee on 5/19/22, 10:07 AM
One big miss, though, is you’ll still need a database and s3, so I’m not sure if I totally understand the value.
by dom__inic on 5/19/22, 10:36 AM
html { -webkit-font-smoothing: antialiased }