from Hacker News

Upgrading Executable on the Fly

by pantuza on 1/4/22, 5:40 PM with 71 comments

  • by mholt on 1/4/22, 7:21 PM

    We did this for Caddy 1 too [1]. It was really cool. I am not sure how many people used this feature, so I haven't implemented it for Caddy 2 yet, and in the ~two years since Caddy 2 was released, I've only had the request once. It's a bit tricky/tedious to do properly, but I'm willing to bring it over to Caddy 2 with sufficient sponsorship.

    [1]: https://github.com/caddyserver/caddy/blob/v1/upgrade.go

  • by bragr on 1/4/22, 7:41 PM

    I've implemented this a few times in a few languages, based on exactly what nginx does. It works well, and it is pretty straightforward if you are comfortable with POSIX-style signals, sockets, and daemons.

    I'm not sure it is super critical in the age of containerized workloads with rolling deploys, but at the very least the connection draining is a good pattern to implement to prevent deploy/scaling-related error spikes.
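
    As a rough illustration of the connection-draining half of this, here is a minimal sketch in Go (not bragr's code; the :8080 address and 30-second timeout are arbitrary choices): on SIGTERM the server stops accepting new connections and lets in-flight requests finish before the process exits.

        package main

        import (
            "context"
            "net/http"
            "os"
            "os/signal"
            "syscall"
            "time"
        )

        func main() {
            srv := &http.Server{Addr: ":8080"}

            go func() {
                // Returns http.ErrServerClosed once Shutdown has been called.
                _ = srv.ListenAndServe()
            }()

            // Wait for the signal a deploy or scale-down event typically sends.
            stop := make(chan os.Signal, 1)
            signal.Notify(stop, syscall.SIGTERM, os.Interrupt)
            <-stop

            // Drain: stop accepting new connections and give in-flight requests
            // up to 30 seconds to finish before the process exits.
            ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
            defer cancel()
            _ = srv.Shutdown(ctx)
        }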

  • by majke on 1/4/22, 8:20 PM

    Just a shout out: it's super hard to do it for UDP / QUIC / H3. Beware.

    (but I don't think nginx supports h3 out of the box yet)

  • by monroeclinton on 1/4/22, 10:31 PM

    I've been working on something similar in a load balancer I've been writing in Rust. It's still a work in progress.

    Basically the parent executes the new binary after it receives a USR1 signal. Once the child is healthy, it kills the parent via SIGTERM. The listener socket's file descriptor is passed down via an environment variable (see the sketch below).

    https://github.com/monroeclinton/- (this is the proper url, it's called dash)
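
    A hedged sketch of that handoff, written in Go rather than Rust (the LISTENER_FD variable name, the :8080 address, and the fd number 3 are illustrative assumptions, not details from the linked project): on SIGUSR1 the running process re-executes its own binary, handing the listening socket down as an inherited file descriptor, and the child is expected to SIGTERM the parent once it is healthy.

        package main

        import (
            "fmt"
            "net"
            "net/http"
            "os"
            "os/exec"
            "os/signal"
            "strconv"
            "syscall"
        )

        // getListener either inherits the socket from the old process (via the
        // invented LISTENER_FD variable) or binds it fresh on first start.
        func getListener() (net.Listener, error) {
            if fdStr := os.Getenv("LISTENER_FD"); fdStr != "" {
                fd, err := strconv.Atoi(fdStr)
                if err != nil {
                    return nil, err
                }
                return net.FileListener(os.NewFile(uintptr(fd), "inherited-listener"))
            }
            return net.Listen("tcp", ":8080")
        }

        func main() {
            ln, err := getListener()
            if err != nil {
                panic(err)
            }

            upgrade := make(chan os.Signal, 1)
            signal.Notify(upgrade, syscall.SIGUSR1)
            go func() {
                <-upgrade
                // Hand the listener to a fresh copy of our own binary.
                // ExtraFiles makes it fd 3 in the child, hence LISTENER_FD=3.
                f, _ := ln.(*net.TCPListener).File()
                cmd := exec.Command(os.Args[0])
                cmd.Env = append(os.Environ(), "LISTENER_FD=3")
                cmd.ExtraFiles = []*os.File{f}
                cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
                _ = cmd.Start()
                // The child is expected to SIGTERM us once it reports healthy.
            }()

            _ = http.Serve(ln, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
                fmt.Fprintf(w, "served by pid %d\n", os.Getpid())
            }))
        }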

  • by bob1029 on 1/4/22, 10:31 PM

    I've considered building something like this to allow us to update customer software while it's serving users.

    In my proposals, there would be a simple application-aware HTTP proxy process that we'd maintain and install in all environments. It would handle relaying public traffic to the appropriate final process on an alternate port. There would be a special pause command we could invoke on the proxy that would buy us time to swap the processes out from under the TCP requests (a rough sketch of this pause/resume gate follows below), and a second resume command would be issued once the new process is running and stable. Ideally, the whole deal completes in ~5 seconds; rapid test rollbacks would take double that. You can do most of the work ahead of time by toggling between an A and a B install path for the binaries, with a third common data path maintained in the middle (databases, config, etc.).

    With the above proposal, the user experience would be a brief delay at time of interaction, but we already have some UX contexts where delays of up to 30 seconds are anticipated. Absolutely no user request would be expected to drop with this approach, even in a rollback scenario. Our product is broad enough that entire sections of it can be a flaming wasteland while other pockets of users are perfectly happy, so keeping the happy users unbroken is key.
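
    A rough sketch of that pause/resume gate in Go (my assumptions, not bob1029's design: the /__pause and /__resume control endpoints and the 127.0.0.1:9000 backend address are made up): requests that arrive while the proxy is paused simply block until resume, so the backend process can be swapped out without dropping them.

        package main

        import (
            "log"
            "net/http"
            "net/http/httputil"
            "net/url"
            "sync"
        )

        // pausableProxy forwards requests to a backend but can hold them while the
        // backend is being swapped out.
        type pausableProxy struct {
            mu      sync.Mutex
            resumed *sync.Cond
            paused  bool
            proxy   *httputil.ReverseProxy
        }

        func newPausableProxy(backend string) *pausableProxy {
            target, err := url.Parse(backend)
            if err != nil {
                log.Fatal(err)
            }
            p := &pausableProxy{proxy: httputil.NewSingleHostReverseProxy(target)}
            p.resumed = sync.NewCond(&p.mu)
            return p
        }

        func (p *pausableProxy) ServeHTTP(w http.ResponseWriter, r *http.Request) {
            // Hold the request while a swap is in progress.
            p.mu.Lock()
            for p.paused {
                p.resumed.Wait()
            }
            p.mu.Unlock()
            p.proxy.ServeHTTP(w, r)
        }

        func (p *pausableProxy) setPaused(v bool) {
            p.mu.Lock()
            p.paused = v
            p.mu.Unlock()
            if !v {
                p.resumed.Broadcast()
            }
        }

        func main() {
            p := newPausableProxy("http://127.0.0.1:9000") // the "alternate port" backend

            // Hypothetical control endpoints standing in for the pause/resume commands.
            http.HandleFunc("/__pause", func(w http.ResponseWriter, r *http.Request) { p.setPaused(true) })
            http.HandleFunc("/__resume", func(w http.ResponseWriter, r *http.Request) { p.setPaused(false) })
            http.Handle("/", p)

            log.Fatal(http.ListenAndServe(":8080", nil))
        }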

  • by rootlocus on 1/4/22, 8:11 PM

    How do the two processes listen to the same port?

  • by bogomipz on 1/4/22, 10:18 PM

    I am curious: does anyone know why Nginx uses SIGWINCH for this? I know Apache uses WINCH as well, which makes me wonder if there was some historical reason a server process wound up using a signal meant for a TTY.

  • by jabedude on 1/4/22, 9:30 PM

    Seems like a useful feature for a service manager like systemd to have for its managed services. It is already able to perform inetd-style socket activation; I imagine this would be a welcome feature.
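
    For reference, consuming an inetd/systemd-style activated socket is already straightforward on the service side; a minimal Go sketch (the :8080 fallback address is an assumption, while LISTEN_PID, LISTEN_FDS, and passed sockets starting at fd 3 are the documented systemd convention):

        package main

        import (
            "net"
            "net/http"
            "os"
            "strconv"
        )

        // activationListener picks up a socket handed over by systemd (or inetd-style
        // activation) if one is present, and binds its own otherwise.
        func activationListener() (net.Listener, error) {
            if pid, _ := strconv.Atoi(os.Getenv("LISTEN_PID")); pid == os.Getpid() {
                if n, _ := strconv.Atoi(os.Getenv("LISTEN_FDS")); n >= 1 {
                    // Activated descriptors always start at fd 3.
                    return net.FileListener(os.NewFile(3, "systemd-socket"))
                }
            }
            // Not socket-activated: listen normally.
            return net.Listen("tcp", ":8080")
        }

        func main() {
            ln, err := activationListener()
            if err != nil {
                panic(err)
            }
            _ = http.Serve(ln, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
                w.Write([]byte("hello from an activated socket\n"))
            }))
        }
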
  • by dang on 1/4/22, 11:35 PM

    One thread with one comment from a long time ago:

    Upgrading an Nginx executable on the fly - https://news.ycombinator.com/item?id=8677077 - Nov 2014 (1 comment)

  • by politelemon on 1/4/22, 9:31 PM

    So if I understood correctly, it would be something like this:

        cp new/nginx /path/to/nginx
        kill -SIGUSR2 <processid>

    That does sound pretty neat if you're not running nginx in a container. I wonder if they've built a Windows equivalent for that.

  • by nullify88 on 1/4/22, 8:14 PM

    The SysV init script for nginx had an upgrade operation (in addition to start/stop/reload, etc.) which would send the signal. Worked like a charm.

  • by secondcoming on 1/4/22, 8:18 PM

    I've always found the multi-process approach taken by both nginx and Apache to be nothing but a hindrance when you have to write a custom module. It means that you may have to use shared memory, which is a PITA.

    I don't know why they haven't moved on from it; it only really made sense when uni-core processors were the norm.