by pmig on 3/10/25, 12:48 PM with 115 comments
by paranoidrobot on 3/10/25, 9:44 PM
K8S provides two (well, three now, if you count startup probes) health checks: liveness and readiness.
How these interact with the ALB is quite important.
Liveness should always return 200 OK unless you have hit some fatal condition where your container considers itself dead and wants to be restarted.
Readiness should only return 200 OK if you are ready to serve traffic.
We configure the ALB to only point to the readiness check.
So our application lifecycle looks like this:
* Container starts
* Application loads
* Liveness begins serving 200
* Some internal health checks run and set readiness state to True
* Readiness checks now return 200
* ALB checks begin passing and so pod is added to the target group
* Pod starts getting traffic.
Time passes. Eventually, for some reason, the pod needs to shut down.
* Kube calls the preStop hook
* PreStop sends SIGUSR1 to app and waits for N seconds.
* App handler for SIGUSR1 tells readiness hook to start failing.
* ALB health checks begin failing, and no new requests should be sent.
* ALB takes the pod out of the target group.
* PreStop hook finishes waiting and returns
* Kube sends SIGTERM
* App wraps up any remaining in-flight requests and shuts down.
This allows the app to shut down gracefully, and ensures the ALB doesn't send traffic to a pod that knows it is being shut down.
Oh, and on the Readiness check - your app can use this to (temporarily) signal that it is too busy to serve more traffic. Handy as another signal you can monitor for scaling.
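For the curious, the app side of that sequence is pretty small. A rough Go sketch (the endpoint paths, port, drain timeout and the choice of SIGUSR1 are just placeholders lifted from the description above, not our actual code):

    package main

    import (
        "context"
        "net/http"
        "os"
        "os/signal"
        "sync/atomic"
        "syscall"
        "time"
    )

    func main() {
        // Readiness state: false at boot, true once startup checks pass,
        // false again when the preStop hook sends SIGUSR1.
        var ready atomic.Bool

        mux := http.NewServeMux()
        mux.HandleFunc("/healthz/live", func(w http.ResponseWriter, r *http.Request) {
            w.WriteHeader(http.StatusOK) // liveness: 200 unless the process is truly wedged
        })
        mux.HandleFunc("/healthz/ready", func(w http.ResponseWriter, r *http.Request) {
            if ready.Load() {
                w.WriteHeader(http.StatusOK)
                return
            }
            w.WriteHeader(http.StatusServiceUnavailable) // ALB and kubelet stop sending traffic
        })

        srv := &http.Server{Addr: ":8080", Handler: mux}
        go srv.ListenAndServe() // error handling omitted for brevity

        // ... internal startup checks would run here ...
        ready.Store(true)

        sigs := make(chan os.Signal, 1)
        signal.Notify(sigs, syscall.SIGUSR1, syscall.SIGTERM)
        for sig := range sigs {
            switch sig {
            case syscall.SIGUSR1:
                ready.Store(false) // start failing readiness; keep serving in-flight requests
            case syscall.SIGTERM:
                // preStop wait is over: drain whatever is still in flight, then exit.
                ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
                srv.Shutdown(ctx)
                cancel()
                return
            }
        }
    }

The readiness probe and the ALB target group health check both point at the ready endpoint, the liveness probe at the live endpoint, and the preStop hook sends SIGUSR1 and waits before kube delivers SIGTERM.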
by bradleyy on 3/10/25, 7:32 PM
One of my former co-workers went to a K8S shop, and longs for the simplicity of ECS.
No software is a panacea, but ECS seems to be one of those "it just works" technologies.
by _bare_metal on 3/10/25, 5:43 PM
The number of companies that use K8s when they have no business or technological justification for it is staggering. It is the number one blocker in moving to bare metal/on prem when costs become too much.
Yes, on prem has its gotchas, just like the EKS deployment described in the post, but everything is so much simpler and more straightforward that the on prem side of things is much easier to grasp.
by paol on 3/10/25, 6:02 PM
The AWS Load Balancer Controller uses readiness gates by default, exactly as described in the article. Am I missing something?
Edit: Ah, it's not on by default - it requires a label on the namespace. I'd forgotten about this. To be fair though, the AWS docs tell you to add this label.
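For anyone looking for it, the opt-in is a namespace label, something like the following (worth double-checking against the controller docs for your version):

    kubectl label namespace <your-app-namespace> elbv2.k8s.aws/pod-readiness-gate-inject=enabled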
by cassianoleal on 3/11/25, 12:11 AM
At that point we already had a CRD used by most of our tenant apps, which deployed an opinionated (but generally flexible enough) full app stack (Deployment, Service, PodMonitor, many sane defaults for affinity/anti-affinity, etc., lots of which were configurable, and other things).
Because we didn't have an opinion on what tenant apps would use in their containers, we needed a way to make the pre-stop sleep small but OS-agnostic.
We ended up with a 1 LOC (plus headers) C app that compiled to a tiny static binary. This was put in a ConfigMap, which the controller mounted on the Pod, from where it could be executed natively.
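Roughly this shape, if you're wondering (the exact sleep duration and build flags here are illustrative, not the original source):

    /* prestop-sleep.c - built statically, e.g. `cc -static -o prestop-sleep prestop-sleep.c`,
       and run as the Pod's preStop hook to give the LB time to deregister the pod. */
    #include <unistd.h>

    int main(void) { sleep(20); return 0; }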
Perhaps not the most elegant solution, but a simple enough one that got the job done and was left alone with zero required maintenance for years - it might still be there to this day. It was quite fun to watch the reaction of new platform engineers the first time they'd come across it in the codebase. :D
by happyweasel on 3/11/25, 6:23 AM
Twenty years ago we used simple bash scripts with curl making REST calls to take one host out of our load balancers, then scp'd to the host and shut down the app gracefully, updated the app using scp again, and put it back into the load balancer after testing the host on its own. We had 4 or 5 scripts max - straightforward stuff.
They charge $$$ and you get downtime in this simple scenario?
by glenjamin on 3/10/25, 6:24 PM
We had perfectly good rolling deploys before k8s came on the scene, but k8s's insistence on a single-phase deployment process means we end up with this silly workaround.
I yelled into the void about this once and was told it was inevitable because it's an eventually consistent distributed system. I'm pretty sure it could still have had a two-phase pod shutdown by encoding a timeout on the first stage. Sure, it would have made some internals require more complex state - but isn't that the point of k8s? Instead, everyone has to rediscover the sleep hack over and over again.
by jayd16 on 3/11/25, 2:10 PM
Is there a slick strategy for this? Is it possible to have minutes-long preStop hooks? Is the only option to give client connections an abandon-ship message and kick them out, hopefully fast enough?
by evacchi on 3/10/25, 5:54 PM
> Seamless Migrations with Zero Downtime
(I don't work for them but they are friends ;))