from Hacker News

S5cmd: Parallel S3 and local filesystem execution tool

by polyrand on 6/11/25, 1:38 PM with 23 comments

  • by smpretzer on 6/11/25, 4:30 PM

    I have used s5cmd in a professional setting and it works wonderfully. I have never attempted to test performance to confirm their claims, but as an out-of-the-box client, it is (anecdotally) significantly faster than anything else I have tried.

    My only headache was that I was invoking it from Python, and it does not have bindings, so I had to write a custom wrapper to call out to it. I am not sure how difficult adding native Python support would be, but I assume it's not worth the squeeze, and just calling out to a subprocess will work for most users' needs.
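
    A minimal sketch of that kind of wrapper (the bucket and paths here are made up, not what I actually used):

      # Thin wrapper that shells out to the s5cmd binary and surfaces failures.
      import subprocess

      def s5cmd(*args: str) -> str:
          """Run s5cmd with the given arguments and return its stdout."""
          result = subprocess.run(
              ["s5cmd", *args],
              capture_output=True,
              text=True,
              check=True,  # raise CalledProcessError on a non-zero exit code
          )
          return result.stdout

      # e.g. copy a whole prefix down in parallel
      print(s5cmd("cp", "s3://my-bucket/prefix/*", "./local-dir/"))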

  • by therealmarv on 6/11/25, 2:38 PM

    Very interesting. And this is an amazing graph of the upload/download speed improvement for small files. I have the feeling that all cloud drives are really not optimised for many small files, say smaller than 1 MB on average.

    https://raw.githubusercontent.com/peak/s5cmd/master/doc/benc...

    I once implemented a rudimentary parallel upload of many small files to S3 at work, in Python with boto3 (I was not allowed to use a third-party library or tool at the time), because uploading many small files to S3 is soooo slow. It really takes ages, and even uploading just 8 small files in parallel makes a huge difference.
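
    Roughly what that looks like, for anyone curious (bucket name and directory are placeholders, and this is a simplified sketch rather than the actual code from work):

      # Upload many small files concurrently instead of one at a time.
      from concurrent.futures import ThreadPoolExecutor
      from pathlib import Path
      import boto3

      s3 = boto3.client("s3")
      BUCKET = "my-bucket"             # placeholder
      root = Path("./small-files")     # placeholder

      def upload(path: Path) -> None:
          if path.is_file():
              s3.upload_file(str(path), BUCKET, path.relative_to(root).as_posix())

      # Even ~8 workers already beats a serial loop by a wide margin.
      with ThreadPoolExecutor(max_workers=8) as pool:
          list(pool.map(upload, root.rglob("*")))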

  • by rsync on 6/11/25, 2:59 PM

    If you don't want to install, or maintain, s5cmd yourself:

      ssh user@rsync.net s5cmd
      
      Password:
      NAME:
      s5cmd - Blazing fast S3 and local filesystem execution tool
      
      USAGE:
      s5cmd [global options] command [command options] [arguments...]
    
    If you move data between clouds, you don't need to use any of your own bandwidth ...

  • by Galanwe on 6/11/25, 2:53 PM

    > For downloads, s5cmd can saturate a 40Gbps link (~4.3 GB/s)

    I'm surprised by these claims. I have worked pretty intimately with S3 for almost 10 years now, developed high-performance tools to retrieve data from it, and used dedicated third-party tools tailored for performant S3 downloads.

    My experience is that individual S3 connections are capped across the board at ~80 MB/s, and the throughput for a single file is capped at ~1.6 GB/s (at least per EC2 instance). At least, I have never managed to go beyond that myself, nor seen any tool capable of it.

    My understanding, then, is that this benchmark's claim of 4.3 GB/s is across multiple files, but in that case it would be rather meaningless, as it's basically free concurrency.
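
    To be clear, ranged GETs can parallelise within a single object too, roughly like below (bucket, key, and part size are placeholders), but in my experience even that tops out around the per-file figure above:

      # Fetch one object as parallel byte ranges on separate connections.
      from concurrent.futures import ThreadPoolExecutor
      import boto3

      s3 = boto3.client("s3")
      BUCKET, KEY = "my-bucket", "big/object.bin"   # placeholders
      PART = 64 * 1024 * 1024                       # 64 MiB per range

      size = s3.head_object(Bucket=BUCKET, Key=KEY)["ContentLength"]
      ranges = [(start, min(start + PART, size) - 1) for start in range(0, size, PART)]

      def fetch(byte_range):
          start, end = byte_range
          resp = s3.get_object(Bucket=BUCKET, Key=KEY, Range=f"bytes={start}-{end}")
          return start, resp["Body"].read()

      with ThreadPoolExecutor(max_workers=16) as pool, open("object.bin", "wb") as out:
          for start, chunk in pool.map(fetch, ranges):
              out.seek(start)
              out.write(chunk)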

  • by quodlibetor on 6/11/25, 3:27 PM

    I recently wrote a similar tool focused more on optimizing the case of exploring millions or billions of objects when you know a few aspects of the path: https://github.com/quodlibetor/s3glob

    It supports glob patterns like so, and will do smart filtering at every stage possible: */2025-0[45]-*/user*/*/object.txt

    I haven't done real benchmarks, but it's parallel enough to hit s3 parallel request limits/file system open file limits when downloading.
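
    The stage-by-stage filtering is roughly in this spirit (a simplified boto3 sketch of the idea, not the actual s3glob code):

      # Expand one glob segment at a time with delimiter listings, so
      # non-matching prefixes are never descended into.
      import fnmatch
      import boto3

      s3 = boto3.client("s3")

      def expand(bucket, prefixes, segment):
          """Return child prefixes of `prefixes` whose name matches `segment`."""
          paginator = s3.get_paginator("list_objects_v2")
          matched = []
          for prefix in prefixes:
              for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
                  for cp in page.get("CommonPrefixes", []):
                      name = cp["Prefix"][len(prefix):].rstrip("/")
                      if fnmatch.fnmatch(name, segment):
                          matched.append(cp["Prefix"])
          return matched

      def glob_keys(bucket, pattern):
          *dir_segments, leaf = pattern.split("/")
          prefixes = [""]
          for segment in dir_segments:        # filter at every stage
              prefixes = expand(bucket, prefixes, segment)
          paginator = s3.get_paginator("list_objects_v2")
          keys = []
          for prefix in prefixes:             # last segment matches object names
              for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
                  for obj in page.get("Contents", []):
                      if fnmatch.fnmatch(obj["Key"][len(prefix):], leaf):
                          keys.append(obj["Key"])
          return keys

      # e.g. glob_keys("my-bucket", "*/2025-0[45]-*/user*/*/object.txt")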

  • by BlackLotus89 on 6/11/25, 3:52 PM

    If you want parallel upload/download over S3 and other protocols (GDrive, FTP, WebDAV, ...), you can use rclone (quick example below).

    For s3 mounts I would use geesefs.

    I'll have to take a look at s5cmd later as well...
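
    If it helps, a parallel copy with rclone looks roughly like this (remote name, paths, and transfer count are placeholders to tune):

      rclone copy ./local-dir s3remote:bucket/prefix --transfers 64   # upload
      rclone copy s3remote:bucket/prefix ./local-dir --transfers 64   # download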

  • by nodesocket on 6/11/25, 2:27 PM

    I posted a link to s5cmd in the other post about s3mini. Really want to try it out for migrating a few TBs of S3 data.

  • by StackTopherFlow on 6/11/25, 3:22 PM

    Awesome work. I love seeing a community project step up and build a better solution than a multi-trillion-dollar company's.

  • by remram on 6/11/25, 5:37 PM

    What is meant by "filesystem execution" here? Is it just an S3 file transfer tool?