by bishopsmother on 3/6/23, 5:47 PM with 455 comments
by samwillis on 3/6/23, 7:38 PM
Fly (to my understanding) at its core is about edge compute. That is where they started and what the team are most excited about developing. It's a brilliant idea, they have the skills and expertise. They are going to be successful at it.
However, at the same time the market is looking for a successor to Heroku: a zero dev ops PaaS with instant deployment, dirt simple managed Postgres, a generous free level of service, lower cost as you scale, and a few regions around the world. That isn't what Fly set out to do... exactly, but it is sort of the market they found themselves in when Heroku basically told its low value customers to go away.
It's that slight misalignment of strategy and market fit that may result in decisions being made that benefit the original vision, but not necessarily the immediate influx of customers.
I don't envy the stress the Fly team are under, but what an exciting set of problems they are trying to solve, I do envy that!
by yamrzou on 3/6/23, 7:09 PM
by throwawaaarrgh on 3/7/23, 4:27 AM
Reliability is a thing that grows, like a plant. You start out with a new system or piece of software. It's fragile, small, weak. It is threatened by competing things and literal bugs and weather and the soil it's grown in and more. It needs constant care. Over time it grows stronger, and can eventually fend for itself pretty well. Sometimes you get lucky and it just grows fine by itself. And sometimes 50 different things conspire to kill it. But you have to be there monitoring it, finding the problems, learning how to prevent them. Every garden is a little different.
It doesn't matter what a company like Fly does technology wise. It takes time and care and churning. Eventually they will be reliable. But the initial process takes a while. And every new piece of tech they throw in is another plant in the garden.
So the good news is, they can become really reliable. But the bad news is, it doesn't come fast, and the more new plants they put in the ground, the more concerns there are to address before the garden is self sustaining.
by jrochkind1 on 3/6/23, 7:58 PM
Also:
> The Heroku exodus broke our assumptions. Pre-Heroku, most of the apps we were running were spread across regions. And: we were growing about 15% per month. But post-Heroku, we got a huge influx of apps in just a few hot spots — and at 30% per month.
I hadn't before seen anyone with a big picture view confirm a heroku exodus was happening, although a lot of people suspected it or had anecdotes.
But if fly is seeing a pretty enormous number of customers moving from heroku to fly... oh wait, now I'm wondering, is this mainly a result of heroku ending free services, and those are free customers coming to fly for free services?
If so... that's a pretty big burden to take on without revenue to match, it does seem kind of dangerous for fly.
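For a sense of scale, here is a small sketch of how the two monthly growth rates in the quote compound. It assumes the rates stay constant for a year, which the post does not claim; the function names are made up for illustration.

```python
import math


def annual_multiple(monthly_rate: float, months: int = 12) -> float:
    """Total growth factor after compounding a constant monthly rate."""
    return (1 + monthly_rate) ** months


def doubling_time_months(monthly_rate: float) -> float:
    """Months needed to double at a constant monthly growth rate."""
    return math.log(2) / math.log(1 + monthly_rate)


if __name__ == "__main__":
    # 15% and 30% per month are the figures from the quoted post.
    for rate in (0.15, 0.30):
        print(
            f"{rate:.0%}/month: x{annual_multiple(rate):.1f} per year, "
            f"doubles every {doubling_time_months(rate):.1f} months"
        )
```

Under that assumption, 15% a month is a bit over 5x a year, while 30% a month is over 23x, so capacity planned for the first curve gets overrun very quickly on the second.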
by pyentropy on 3/6/23, 9:47 PM
As someone who has started tons of Consul clusters, analyzed tons of Terraform states, developed providers and written an HCL parser, I must say this:
HashiCorp built a brand of consistent design & docs, security, strict configuration, distributed-algos-made-approachable... but at its core, it's a very fragile ecosystem. The only benefit of HashiCorp headaches is that you will quickly learn Golang while reading some obscure github.com/hashicorp/blah/blah/file.go :)
by pier25 on 3/6/23, 8:26 PM
The PG issues hit me two times in the previous weeks but other than that it's been working great for me.
With the move to v2 apps (using their new machines infra) things are actually faster and smoother than ever.
About a year or so ago their CLI was quite buggy but I haven't really hit any bugs in months.
I will remain with Fly for the time being. Hopefully they don't close shop!
by nu11ptr on 3/6/23, 7:36 PM
by lll-o-lll on 3/6/23, 8:42 PM
And then I was “Huh, these technical challenges are actually pretty difficult”
And then I was all “crap, these are a bunch of technologies I was about to add to our stack”
Thanks heaps fly.io people; having the humility to honestly talk about the challenges and failures massively helps people such as myself as we navigate new unfamiliar technologies. If more companies were willing to do this, it’d be a lot easier to avoid common pitfalls.
by outworlder on 3/6/23, 7:46 PM
Eh? Unless you are consuming something as a service and it actually advertises it as a feature, nothing is ready for 'global deployment'.
If you have a 'centralized' secret storage, then you have made it tied to a region. Want to have redundancies and lower latency? You'll have to distribute it. Vault has docs about this: https://developer.hashicorp.com/vault/tutorials/day-one-raft...
by sergiotapia on 3/6/23, 7:25 PM
Maybe they were _too_ ambitious at the start? They have a hard road ahead of them, and competitors like Render.com and Northflank have provided me with solutions to all of my problems. Great dev ux, great prices and predictable solutions. They also keep pushing out very useful features. A third competitor has also sprung up: Railway! There's certainly blood in the water.
Will they catch up to others before the competition solves the "global mesh" unique value proposition Fly.io currently has? That's the $1MM question.
by e1g on 3/6/23, 8:24 PM
by claytonjy on 3/6/23, 7:27 PM
I could see them building something RDS-like on their own, but if they're trying to go further than that I wonder if they'll buy or partner with other companies rather than doing it themselves. Neon strikes me as a Postgres-as-a-service that could pair well with Fly.
by deivid on 3/6/23, 9:49 PM
by nomilk on 3/6/23, 11:40 PM
I moved one app successfully from heroku to fly and attempted to move a few others. These are my experiences (both good and bad):
Great:
- The load time on the pages is insanely faster on fly than heroku. Sometimes I thought I was on the localhost version of the app, it was that snappy.
- Love that it uses a Dockerfile
- Love paying for what I use (compared to Heroku's rigid minimum of $16/month for hobby dyno w/ postgres for baby apps, or $34/month just to get a second web dyno for toddler apps). The same apps are <$5/month each on fly.
Not great:
- I find the fly.toml file hard to understand and use, and the cycle time slow to fix or tinker with it. It's partly (entirely?) a 'me' problem because I haven't spent a huge amount of time reading the documentation.
- I found scheduling a rake task in a rails app time consuming (~days) the first time, but very easy (15 minutes) the second and subsequent times, once I knew a way that worked (cron didn't work; had to use a tool I hadn't used before 'supercronic').
- Deploys sometimes time out with `Error failed to fetch an image or build from source: error rendering push status stream: EOF`. Most layers copied, but randomly, some layers wouldn't. All I could do is keep trying until it worked, which it did, 2 hours later. Not the end of the world, but an annoying complication when you're already trying to solve complex problems.
- I followed a youtube video on how to move a rails app from heroku to fly, and it worked on a modern app, but I couldn't quite get fly happy when moving the older app - something to do with postgres versions, and I didn't want to spend all day figuring it out. I'm not hugely experienced with docker, it could have been an easy fix for someone more experienced.
On reflection, 3 of the 4 negatives above are solvable by me reading the docs more thoroughly and getting more proficient with docker.
I look forward to continuing using and exploring fly, and can't be happier with the directness, transparency and care from fly staff. A platform with huge potential.
by skywhopper on 3/6/23, 9:14 PM
by emschwartz on 3/6/23, 6:53 PM
I’m not sure it is for 100% of early stage startups, but I guess it is once you exceed some minimum usage threshold.
That said, definitely appreciate the detailed explanation.
by ashiban on 3/6/23, 9:29 PM
It gets significantly more challenging when you grow, either in feature complexity or scale complexity - and then very few services can offer what AWS/GCP/Azure offer - albeit at the increased engineering/monetary cost of using them.
We're building a different kind of approach[0] that aims to absorb the mechanical cost of using public cloud capabilities (that are proven to scale) without hiding it altogether.
by djha-skin on 3/6/23, 11:13 PM
I wonder why they didn't try to use Serf[1] for this, since they were so into HashiCorp tools. It also uses the gossip protocol.
by birracerveza on 3/7/23, 7:42 AM
by tebbers on 3/6/23, 7:53 PM
by iamdbtoo on 3/6/23, 7:25 PM
by clement_b on 3/7/23, 8:09 AM
by plasma on 3/6/23, 11:12 PM
Would it help to replace Corrosion with a simpler "Here's my local known state" blob that is POST'd to blob storage (for example) on a major cloud provider, and have another service read that at intervals? Just to make it really simple.
There will be a better way than that, but my thought is if you can make it simpler (known state is always just pushed, so missing updates auto-recovers and avoids corruption) then you can be building on top of a more stable service discovery system.
Centralized secret storage, can you keep the US instance read/write, but replicate read-only copies (a side-car tool that copies the database to other regions at various intervals?) so each region can fetch secrets locally?
Or perhaps both can be solved with a general "Copy local state to other regions" service that is pretty simple but gives each region its own copy of other region's information (secrets, provisioning states, ...).
I've needed to do similar things for some of the apps I've built, where a service needed another (simpler) service in front of it to bear the traffic load but was operationally simple (deferred the smarts to the system it was using as the source of truth) and automatically recovered from failure due to its simplicity.
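To make the "just push my local known state" idea concrete, here is a minimal sketch. It is not how Corrosion or anything at Fly works; `BlobStore`, `publish_snapshot` and `build_directory` are hypothetical names, and the blob store is an in-memory stand-in for whatever a cloud provider would offer.

```python
import json
import time
from typing import Dict, List, Optional


class BlobStore:
    """In-memory stand-in for a cloud blob store (hypothetical interface)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, str] = {}

    def put(self, key: str, body: str) -> None:
        # Overwrite in place: the latest complete snapshot always wins.
        self._blobs[key] = body

    def list_all(self) -> Dict[str, str]:
        return dict(self._blobs)


def publish_snapshot(
    store: BlobStore,
    node_id: str,
    services: Dict[str, str],
    now: Optional[float] = None,
) -> None:
    """Push the node's complete known state, never a delta."""
    snapshot = {
        "node": node_id,
        "ts": time.time() if now is None else now,
        "services": services,
    }
    store.put(f"state/{node_id}.json", json.dumps(snapshot))


def build_directory(
    store: BlobStore, max_age_s: float, now: Optional[float] = None
) -> Dict[str, List[str]]:
    """Merge all fresh snapshots into a service -> addresses map."""
    now = time.time() if now is None else now
    directory: Dict[str, List[str]] = {}
    for body in store.list_all().values():
        snap = json.loads(body)
        if now - snap["ts"] > max_age_s:
            continue  # stale node: treat it as gone until it publishes again
        for name, addr in snap["services"].items():
            directory.setdefault(name, []).append(addr)
    return directory
```

Because every publish is a full snapshot, a missed update is repaired by the next one, and a reader that polls `build_directory` at intervals never has to replay history. The cost is staleness bounded by the publish and poll intervals.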
by rtpg on 3/7/23, 1:13 AM
The thing that worries me about these incidents is they haven't been, like, full service outages. A small subset of users talking about issues in forums. This makes me just feel like Fly has an immense amount of issues.
At least if like 50% of fly goes down then it feels like a config fat finger. When it's a bunch of tiny issues now all my ops debugging has to start with going to the fly forums (and it's _always been issues on fly's side_).
The price is "right" (though like with all PaaS the gaslighting about running multiple processes in one container makes me feel bad about the state of cloud computing). And I really like the CLI stuff mostly! But I extremely don't care about edge computing so for me fly is just heroku and I would love to feel more confident on that end.
(EDIT: the nice thing is I get email support with a bit of cash. This is a thing that will go away when they get bigger but it's here while things are still breaking often)
by hinkley on 3/7/23, 12:17 AM
I fundamentally don't understand why people are in such a big hurry to get 'famous'. I've worked a couple of places where the marketing side was working as hard as they could to make sure that our heads were on fire at all possible moments. At one job I had a (very, very junior) manager come up to me and say great news we landed <big customer> and my immediate reply was, "fuck me". We were already running to stay upright and now we're about to have twice as much scrutiny. Wonderful.
If you push hard enough, eventually everyone looks like an idiot. The number of humans for whom that is not true could fit into a book. Both alive and deceased. They most definitely do not work for the companies I've described, at least not enough of them so you'd notice.
by debarshri on 3/7/23, 1:19 AM
I used to work for a company that built deployment platforms for law firms. All our deployments were on prem and we had the same complexity with Kubernetes. We had a similar setup with Vault and Stolon for HA PG. The more moving parts you have in infra, the more permutations and combinations of failure modes you have.
What these guys are building is something I have seen many orgs try to do internally and fail. PaaS is a hard problem if you want to solve it "reliably".
by theloco on 3/6/23, 7:47 PM
by ec109685 on 3/6/23, 8:20 PM
I take it that it’s far more important that the local region know about changes than a remote region, which makes a mastered store in one location as the source of truth problematic.
I also wonder why these companies don’t backstop themselves on the public cloud? Failing over into something like AWS seems better than running out of capacity, and some of its services could be used in circumstances where an open source technology isn’t ready.
by computomatic on 3/7/23, 12:34 AM
I took a quick look and couldn't find them. Do they have any documented service limits?
A google search turned up [0] which does not inspire optimism.
> ...there isn’t a limit to number of apps from a billing standpoint...
[0] https://community.fly.io/t/free-tier-limits-and-quota-needs-...
by soperj on 3/6/23, 7:02 PM
by revskill on 3/6/23, 9:52 PM
I'm bullish on fly.io.
by pwelch on 3/7/23, 5:05 PM
I've only used Fly.io for a personal app but I think it's a great option so I hope they keep growing.
by Karupan on 3/7/23, 12:10 AM
by ChrisMarshallNY on 3/6/23, 8:55 PM
I've been lucky, in the past, but a lot of that, is because I have "overengineered," and the tools/frameworks have advanced to meet the new demand.
I am in the middle of a complete, bottom-to-top rewrite of the app we've been developing for the last couple of years. It's going great, but making this leap was a fraught decision.
It's mainly, so I wouldn't have to write a post like that, in a year or two.
We spent all the time refining it, until we had what we wanted, and it worked great on our small test team.
Then, I loaded up a test server with 10,000 fake users, and tossed the app at that. To be fair, we don't think we'll have even that many users for quite a while. It's a very specialized demographic.
* SOB *
It no do so well.
At that point, I had to decide whether to fix the issues (they were quite fixable), or revisit the architecture.
The main issue with the architecture, was that it was an "accreted" app, with changes gradually being factored in, as we progressed. The main reason for this, is because no one really knew what they wanted, until we ran it up the flagpole (sound familiar?).
The business logic was distributed throughout the app. That was ... ugly.
I envisioned myself, a year or two down the road, sucking on a magnum, because the app had turned into a Cruftosaurus, and was coming for me in my nightmares.
So I decided to rewrite, as we hadn't done any kind of MVP or public beta, so we actually had the runway to do this.
I refined the entire business logic of the app into a single, platform-agnostic SPM module, which took just over a month, and have started to develop the app around that. It's pretty much a rewrite, but I am recycling a lot of the app code. We also brought in our designer, and he's looking at every screen I make. It's working well for him.
Like I said, it's going great. Better than I expected.
I know that I have a huge luxury, and I'm grateful. I can credit a lot of that, to doing some stress-testing before we got to a point where we had a bunch of users to support. I was able to go in, and go all Victor Frankenstein on the model.
The result, so far, is that this thing screams, and you don't really even notice that there's that many users on it. The model has already been proven (that SPM module), and all we're doing, is chrome (which is a ton of work).
by ecmascript on 3/7/23, 8:37 AM
Seems like they have a good understanding of what the problems are, so they will most likely be solved sooner or later.
Good work, and keep being as open and honest as you have been so far :)
by tonnydourado on 3/7/23, 2:36 PM
by vinay_ys on 3/7/23, 7:59 AM
by andy_ppp on 3/6/23, 11:46 PM
by tiffanyh on 3/6/23, 7:22 PM
by Dave3of5 on 3/7/23, 9:25 AM
Solving hard problems like this seems interesting.
On the other hand it could be a giant shit show of micromanagement and toxicity, who knows really.
At the moment they aren't hiring though so that's that.
by sidcool on 3/7/23, 9:17 AM
by chucky_z on 3/6/23, 7:46 PM
by none_to_remain on 3/6/23, 11:51 PM
by siliconc0w on 3/7/23, 6:09 AM
by gpjanik on 3/7/23, 8:30 AM
by nprateem on 3/7/23, 7:58 AM
by nathants on 3/6/23, 11:11 PM
https://github.com/nathants/libaws
companies like fly are fantastic.
they provide a good service, and they put market pressure on aws.
a free tier isn’t important anymore. with usage based pricing for lambda/dynamo/s3, an app with usage approaching zero has no cost.
by anacrolix on 3/7/23, 3:41 AM
- Machines seem like a waste of time
- Access directly to VMs is being removed (and doesn't support TCP over IPv4, or UDP over IPv6)
- The CDN is nice but should support private networking too.
- Volume management is deficient: It should be possible to access and fix volumes outside the context of its app instance.
- Egress traffic should be free between apps over private networking, at least in the same DC.
by crabbone on 3/7/23, 3:14 PM
by 1023bytes on 3/6/23, 10:52 PM
by davedx on 3/7/23, 8:18 AM
by al_be_back on 3/7/23, 9:21 AM
by lopatin on 3/6/23, 8:02 PM
by victorbjorklund on 3/6/23, 8:35 PM
by swamp40 on 3/6/23, 11:35 PM
They just lost about 40% of their paying users with that blog post.
by benatkin on 3/7/23, 4:33 AM
by KETpXDDzR on 3/7/23, 4:08 AM