from Hacker News

Way too many ways to wait on a child process with a timeout

by broken_broken_ on 11/10/24, 11:01 PM with 32 comments

  • by machine_coffee on 11/11/24, 3:38 PM

    Lol, the author's thought process mirrored mine as I read the article. As I was reading I was thinking, 'doesn't kqueue support that?'... and then came a section on kqueue. Then I was thinking to myself, so how does the Linux implementation do it then?... I was just about to start trawling the source code when 'A parenthesis..'

    Great article. Sorry to say though, Windows does manage all this in a more consistent way - but I guess they had the benefit of a clean slate.

  • by adrianmonk on 11/13/24, 10:33 PM

    Tenth Approach: fork() two processes.

    Child 1 exec()s the command.

    Child 2 does this:

        signal(SIGALRM, alarm_handler);
        alarm(timeout_length);
        pause();
        exit(0);
    
    Start both children, then call wait(), which blocks until any child exits and returns the pid of the child that exited. If it's the command child, then your command finished. If it's the other child, then the timeout expired.

    Now that one child has exited, kill() the other child with SIGTERM and reap it by calling wait() again.

    All of this assumes you'll only have these two children going, but if you're writing a small exponential backoff command retry utility, that should be OK.
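
    A minimal C sketch of the two-child approach described above. The helper name run_with_timer_child and its return codes are made up for illustration; it assumes the caller has no other children, that timeout_secs is greater than zero, and that the command dies on SIGTERM:

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Empty handler: it only exists so that SIGALRM interrupts pause()
 * instead of killing the timer child through the default action. */
static void alarm_handler(int sig) { (void)sig; }

/* Hypothetical helper (not from the article).
 * Returns 0 if the command exited first, 1 if the timeout expired,
 * -1 on error. Assumes these two are the caller's only children. */
static int run_with_timer_child(char *const argv[], unsigned timeout_secs)
{
    pid_t cmd = fork();
    if (cmd == -1)
        return -1;
    if (cmd == 0) {
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    pid_t timer = fork();
    if (timer == -1) {
        kill(cmd, SIGTERM);
        waitpid(cmd, NULL, 0);
        return -1;
    }
    if (timer == 0) {
        signal(SIGALRM, alarm_handler);
        alarm(timeout_secs);
        pause();
        _exit(0);
    }

    /* Blocks until whichever child exits first. */
    pid_t first = wait(NULL);
    if (first == -1) {
        kill(cmd, SIGTERM);
        kill(timer, SIGTERM);
        while (wait(NULL) > 0)
            ;
        return -1;
    }

    /* Stop the child that is still running and reap it. */
    pid_t other = (first == cmd) ? timer : cmd;
    kill(other, SIGTERM);
    waitpid(other, NULL, 0);
    return (first == cmd) ? 0 : 1;
}
```

    Calling it with {"sleep", "5", NULL} and a 1 second timeout should report the timeout; a command that ignores SIGTERM would need a SIGKILL fallback, which this sketch leaves out.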

  • by AnotherGoodName on 11/14/24, 2:05 AM

    In the early days of Android I had an app that had to do video transcoding yet often hit OOM on startup (reported via telemetry), even when the phone should have had enough memory. This was before Android had any video transcoding built in (the 2.3 days).

    The solution was to spawn a child process that used up memory in a loop, detect in the parent that the child had been SIGKILLed, yield to the OS while it killed other processes to free memory on the device as a whole, and then, on return from sleep in the parent after the child was killed, start the video transcoding.

    Hopefully this hack is not needed, but if you want Android to proactively run its process-killing job so your app starts with maximum free memory, the above worked!

  • by greggyb on 11/13/24, 8:12 PM

    Not so much about timeouts, but related in that it is based around managing child processes:

    The lineage of tools descending from daemontools for service management is worth exploring:

    daemontools: http://cr.yp.to/daemontools.html

    runit: https://smarden.org/runit/

    s6: https://skarnet.org/software/s6/

    dinit: https://davmac.org/projects/dinit/

  • by nf3 on 11/11/24, 6:10 AM

    FWIW io_uring does have support for waitid.

    https://www.man7.org/linux/man-pages/man3/io_uring_prep_wait...

  • by nasretdinov on 11/11/24, 2:37 PM

    So many ways and no-one mentioned threads..?

    Edit: by threads I mean creating a new thread to wait for the process, and then kill the process after a certain timeout if the process hasn't terminated. I guess I'm spoiled by Go...
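
    For comparison, a rough C sketch of the same idea with pthreads instead of goroutines. The names struct waiter, wait_thread and run_with_wait_thread are hypothetical. The waiter thread uses waitid() with WNOWAIT (my own choice, not something from the article) so the child stays unreaped until the main thread has decided whether to kill it, which avoids signalling a recycled PID:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

struct waiter {
    pid_t pid;
    int done;               /* set once the child has exited */
    pthread_mutex_t mu;
    pthread_cond_t cv;
};

static void *wait_thread(void *arg)
{
    struct waiter *w = arg;
    siginfo_t info;

    /* WNOWAIT: observe the exit but leave the child unreaped. */
    while (waitid(P_PID, (id_t)w->pid, &info, WEXITED | WNOWAIT) == -1 &&
           errno == EINTR)
        ;
    pthread_mutex_lock(&w->mu);
    w->done = 1;
    pthread_cond_signal(&w->cv);
    pthread_mutex_unlock(&w->mu);
    return NULL;
}

/* Returns 0 if the command finished in time, 1 if it was killed
 * after the timeout, -1 on error. */
static int run_with_wait_thread(char *const argv[], unsigned timeout_secs)
{
    struct waiter w = { .done = 0 };
    struct timespec deadline;
    pthread_t tid;
    int rc = 0, finished;

    w.pid = fork();
    if (w.pid == -1)
        return -1;
    if (w.pid == 0) {
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    pthread_mutex_init(&w.mu, NULL);
    pthread_cond_init(&w.cv, NULL);
    if (pthread_create(&tid, NULL, wait_thread, &w) != 0) {
        kill(w.pid, SIGKILL);
        waitpid(w.pid, NULL, 0);
        return -1;
    }

    /* By default the deadline is an absolute CLOCK_REALTIME time. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_secs;

    pthread_mutex_lock(&w.mu);
    while (!w.done && rc != ETIMEDOUT)
        rc = pthread_cond_timedwait(&w.cv, &w.mu, &deadline);
    finished = w.done;
    pthread_mutex_unlock(&w.mu);

    if (!finished)
        kill(w.pid, SIGKILL);   /* child is still unreaped, PID is ours */
    pthread_join(tid, NULL);
    waitpid(w.pid, NULL, 0);    /* now actually reap the child */
    pthread_cond_destroy(&w.cv);
    pthread_mutex_destroy(&w.mu);
    return finished ? 0 : 1;
}
```

    On older glibc versions this needs to be linked with -pthread.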

  • by eduction on 11/13/24, 7:02 PM

    He mentions Bryan Cantrill in there and I can’t resist posting his famous epoll/kqueue rant:

    https://youtu.be/l6XQUciI-Sc?t=3643

    I know this is related, but maybe someone smarter than me can explain how closely it relates (or doesn’t) to this issue, which seems more general (IIRC Cantrill was talking about fs events, not child processes generally).

  • by xchip on 11/11/24, 11:14 AM

    Thanks for this great article, it is going to be very useful for my project. I am currently developing an open source Android native app that invokes rsync when a file gets closed (e.g. when you take a picture).

    https://github.com/aguaviva/Syncy

  • by akira2501 on 11/13/24, 8:11 PM

    > I would prefer extending poll to support things other than file descriptors, instead of converting everything to a file descriptor to be able to use poll.

    Why? The ability to block on these descriptors as a one off rather than wrapping into a poll makes them extremely useful and avoids the race issues that exist with signal handlers and other non-blocking mechanisms.

    signalfd, timerfd, eventfd, userfaultfd, pidfd are all great applications of this strategy.

  • by cdaringe on 11/15/24, 1:19 AM

    I wrote a crate https://crates.io/crates/swaperooni for similar use cases some time ago. I only gave the article a cursory scan, and can clearly see much deeper thought given here. Can't wait to dig in after work and learn a little bit.

    Dunking on my crate is welcomed :)

  • by moron123 on 11/13/24, 1:26 AM

    Parenting 101

  • by JackSlateur on 11/11/24, 8:43 PM

    What is the meaning of this code?

      void on_sigchld(int sig) { (void)sig; }

  • by o11c on 11/13/24, 7:49 PM

    > Because the Linux kernel coalesces SIGCHLD (and other signals), the only way to reliably determine if a monitored process has exited, is to loop through all PIDs registered by any kqueue when we receive a SIGCHLD. This involves many calls to waitid(2) and may have a negative performance impact.

    This is somewhat wrong. To speed things up in the happy case (where we are the only part of the program that is spawning children), you can just do a `WNOHANG` wait for any child first, and check if it's one of the children we care about. Only if it's an unknown child do you have to do the full loop (of course, if you only have a couple of children the loop may be better).
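
    A sketch of that fast path in C, under some assumptions: the registry (watched, watch_child, unwatch_child) and reap_watched_child are hypothetical names, and I peek with WNOWAIT before reaping so that a child belonging to some other part of the program is not consumed by accident (that peek is my variation, a plain WNOHANG wait for any child would reap it):

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_WATCHED 16

/* Hypothetical registry of the PIDs we monitor. */
static pid_t watched[MAX_WATCHED];
static int nwatched;

static int watch_child(pid_t pid)
{
    if (nwatched == MAX_WATCHED)
        return -1;
    watched[nwatched++] = pid;
    return 0;
}

static void unwatch_child(pid_t pid)
{
    for (int i = 0; i < nwatched; i++) {
        if (watched[i] == pid) {
            watched[i] = watched[--nwatched];
            return;
        }
    }
}

static int is_watched(pid_t pid)
{
    for (int i = 0; i < nwatched; i++)
        if (watched[i] == pid)
            return 1;
    return 0;
}

/* Call after a SIGCHLD. Reaps and returns one watched child that has
 * exited, or 0 if none has. Because SIGCHLD is coalesced, callers
 * should keep calling until this returns 0. */
static pid_t reap_watched_child(int *status)
{
    siginfo_t info = {0};

    /* Fast path: peek at any exited child without blocking or reaping. */
    if (waitid(P_ALL, 0, &info, WEXITED | WNOHANG | WNOWAIT) == 0 &&
        info.si_pid != 0 && is_watched(info.si_pid) &&
        waitpid(info.si_pid, status, 0) == info.si_pid) {
        unwatch_child(info.si_pid);
        return info.si_pid;
    }

    /* Slow path: an unknown child came first, so check each watched PID. */
    for (int i = 0; i < nwatched; i++) {
        pid_t pid = waitpid(watched[i], status, WNOHANG);
        if (pid > 0) {
            unwatch_child(pid);
            return pid;
        }
    }
    return 0;
}
```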

  • by tlsalmin on 11/15/24, 8:36 AM

    First nitpick:

       static int pipe_fd[2] = {0};
    
    0 is a valid fd, so I recommend initializing fds to -1.

    signalfd was only mentioned offhand, but for writing anything larger, like let's say a daemon process, it keeps signal handling close to all the other events being reacted to. E.g.

      #include <signal.h>
      #include <unistd.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <sys/timerfd.h>
      #include <sys/signalfd.h>
      #include <sys/epoll.h>
    
      static int signalfd_init(void)
        {
          sigset_t sigs, oldsigs;
          int sfd = -1;
    
          sigemptyset(&sigs);
          sigemptyset(&oldsigs);
          sigaddset(&sigs, SIGCHLD);
          if (!sigprocmask(SIG_BLOCK, &sigs, &oldsigs))
            {
              sfd = signalfd(-1, &sigs, SFD_CLOEXEC | SFD_NONBLOCK);
              if (sfd != -1)
                {
                  // Success
                  return sfd;
                }
              else
                {
                  perror("signalfd");
                }
              sigprocmask(SIG_SETMASK, &oldsigs, NULL);
            }
          else
            {
              perror("sigprocmask");
            }
          return -1;
        }
    
      static int timerfd_init(void)
        {
          int tfd = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK | TFD_CLOEXEC);
    
          if (tfd != -1)
            {
              struct itimerspec tv =
                {
                  .it_value = 
                    {
                      .tv_sec = 5
                    }
                };
              if (!timerfd_settime(tfd, 0, &tv, NULL))
                {
                  return tfd;
                }
              else
                {
                  perror("timerfd_settime");
                }
              close(tfd);
            }
          else
            {
              perror("timerfd_create");
            }
          return -1;
        }
    
      static int epoll_init(int sfd, int tfd)
        {
          int efd;
    
          if (sfd == -1 || tfd == -1)
            {
              return -1;
            }
    
          efd = epoll_create1(EPOLL_CLOEXEC);
          if (efd != -1)
            {
              struct epoll_event ev[2] =
                {
                    {
                      .events = EPOLLIN,
                      .data =
                        {
                          .fd = sfd,
                        }
                    },
                    {
                      .events = EPOLLIN,
                      .data = 
                        {
                          .fd = tfd
                        }
                    }
                };
              if (!epoll_ctl(efd, EPOLL_CTL_ADD, sfd, &ev[0]) &&
                  !epoll_ctl(efd, EPOLL_CTL_ADD, tfd, &ev[1]))
                {
                  return efd;
                }
              else
                {
                  perror("epoll_ctl");
                }
              close(efd);
            }
          else
            {
              perror("epoll_create1");
            }
          return -1;
        }
    
      int main(int argc, char *argv[])
        {
          int exit_value = EXIT_FAILURE;
          int sfd = signalfd_init(),
              tfd = timerfd_init(),
              efd = epoll_init(sfd, tfd);
    
          if (sfd != -1 && tfd != -1 && efd != -1)
            {
              int child_pid = fork();
    
              if (child_pid != -1)
                {
                  if (!child_pid)
                    {
                      argv += 1;
                      if (-1 == execvp(argv[0], argv)) {
                          exit(EXIT_FAILURE);
                      }
                      __builtin_unreachable();
                    }
                  else
                    {
                      int err;
                      struct epoll_event ev;
    
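                      while ((err = epoll_wait(efd, &ev, 1, -1)) > 0)
                        {
                          if (ev.data.fd == sfd)
                            {
                              // Read the signalfd; a SIGCHLD means the child is done.
                              struct signalfd_siginfo si;

                              if (read(sfd, &si, sizeof(si)) == (ssize_t)sizeof(si) &&
                                  si.ssi_signo == SIGCHLD)
                                {
                                  exit_value = EXIT_SUCCESS;
                                  break;
                                }
                            }
                          else if (ev.data.fd == tfd)
                            {
                              // Timer triggered, kill the child process.
                              kill(child_pid, SIGKILL);
                              break;
                            }
                        }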
                      if (err == -1)
                        {
                          perror("epoll_wait");
                        }
                    }
                }
              else
                {
                  perror("fork");
                }
            }
          close(sfd);
          close(tfd);
          close(efd);
          exit(exit_value);
        }