from Hacker News

Show HN: Hck – a fast and flexible cut-like tool

by totalperspectiv on 7/10/21, 3:46 PM with 34 comments

  • by lillesvin on 7/10/21, 5:23 PM

    I wrote something similar (but never really finished it), called 'gut', in Go a few years back. Funny thing is that I literally never use it. I thought splitting on regexes and that stuff would be super useful, but it turns out that I just use Perl one-liners instead. And Perl is available on something like 99.99% of all *nix machines, which my own 'cut'-substitute isn't.

    Still a good exercise for me to write it, and I assume for OP too.
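
    A minimal sketch of the regex-splitting idea, written in Python rather than Perl or Go. The function name `cut_fields` and the sample input are hypothetical and are not taken from 'gut' or hck; fields are assumed to be 1-based, as with `cut -f`.

```python
import re

def cut_fields(line, delimiter_regex, fields, output_sep="\t"):
    """Split `line` on a regex and join the selected 1-based fields with `output_sep`."""
    parts = re.split(delimiter_regex, line)
    # Silently skip requested fields that do not exist on this line
    selected = [parts[i - 1] for i in fields if 1 <= i <= len(parts)]
    return output_sep.join(selected)

if __name__ == "__main__":
    # Delimiter is any run of whitespace and/or commas
    print(cut_fields("alice   30,  london", r"[\s,]+", [1, 3]))
```

    One regex delimiter here stands in for what would otherwise be several chained `cut` calls.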

  • by rashil2000 on 7/10/21, 5:18 PM

    Love seeing these modern alternatives to coreutils! Ripgrep, fd, hyperfine, bat, exa, bottom, gdu, wc, sd, hexyl...

    Yet to find a GNU 'tr' alternative though

  • by kitd on 7/10/21, 5:56 PM

    Nice work!

    I don't know whether anyone here has used Rexx. The 'parse' instruction in Rexx was incredibly powerful, breaking up text by field/position/delimiter and assigning to variables all in one line.

    I've often wondered if there was a command-line equivalent. Awk is great but you have to 'program' the parsing spec, rather than declare it.

  • by bilalhusain on 7/10/21, 5:51 PM

    It is interesting to note how it compares to "choose" (also in Rust) in the benchmarks.

    single character

        hck           1.494 ± 0.026s
        hck (no-mmap) 1.735 ± 0.004s
        choose        4.597 ± 0.016s
    
    multi character

        hck           2.127 ± 0.004s
        hck (no-mmap) 2.467 ± 0.012s
        choose        3.266 ± 0.011s
    
    The single pass optimization trick[1] seems to be helping a lot in single character case.

    Of course, doing away with a pass is supposed to give 2x, and I am wondering whether the regex constraint led to this "side-effect".

    [1] fast mode - https://github.com/sstadick/hck/blob/master/src/lib/core.rs#... https://github.com/sstadick/hck/blob/master/src/lib/core.rs#...
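
    To put numbers on the comparison, the ratios can be computed directly from the means quoted above. This is only arithmetic on the posted figures (error bars ignored); the helper name `speedup_over` is made up for illustration. It works out to roughly 3x for hck over choose in the single character case and roughly 1.5x in the multi character case.

```python
# Mean wall-clock times in seconds, copied from the benchmark figures above
timings = {
    "single character": {"hck": 1.494, "hck (no-mmap)": 1.735, "choose": 4.597},
    "multi character": {"hck": 2.127, "hck (no-mmap)": 2.467, "choose": 3.266},
}

def speedup_over(baseline, results):
    """Return how many times faster each other tool is than `baseline`."""
    base = results[baseline]
    return {tool: base / secs for tool, secs in results.items() if tool != baseline}

if __name__ == "__main__":
    for case, results in timings.items():
        for tool, ratio in speedup_over("choose", results).items():
            print(f"{case}: {tool} is {ratio:.2f}x faster than choose")
```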

  • by asicsp on 7/11/21, 3:09 AM

    I saw `hck` recently on Twitter and was impressed to see support for compressed files. From the current todo list, I really hope complement gets implemented.

    I see negative index support is currently "unlikely". I'm writing a similar tool [0], but with bash+awk. I solved the negative index support with a `-n` option, which changes the range syntax to use `:` instead of the `-` character.

    My biggest trouble came with literal field separator [1], because FS can only be specified as a string in awk and backslash is a metacharacter for both string and regexp.

    [0] https://github.com/learnbyexample/regexp-cut

    [1] https://learnbyexample.github.io/escaping-madness-awk-litera...
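
    A rough sketch of both ideas in Python, not the bash+awk implementation from regexp-cut. The `start:end` semantics assumed here (1-based, inclusive, negatives counting from the end, either side optional) and the helper names `parse_range` and `select` are assumptions made for illustration. `re.escape` plays the role of the literal-separator escaping discussed above.

```python
import re

def parse_range(spec, num_fields):
    """Turn a 'start:end' spec (1-based, inclusive, negatives count from the end)
    into a list of 0-based field indices. Either side may be empty."""
    start_text, _, end_text = spec.partition(":")
    if ":" not in spec:
        end_text = start_text  # single field such as "-1"

    def resolve(text, default):
        if text == "":
            return default
        value = int(text)
        # -1 is the last field, -2 the one before it, and so on
        return num_fields + value + 1 if value < 0 else value

    # Clamp to the fields that actually exist on the line
    start = max(resolve(start_text, 1), 1)
    end = min(resolve(end_text, num_fields), num_fields)
    return list(range(start - 1, end))

def select(line, literal_sep, spec):
    # re.escape makes metacharacters such as backslash or '.' match literally
    parts = re.split(re.escape(literal_sep), line)
    return [parts[i] for i in parse_range(spec, len(parts))]

if __name__ == "__main__":
    # Backslash as a literal separator, last two fields
    print(select(r"a\b\c\d", "\\", "-2:"))
```

    Using `:` as the range separator keeps `-` free to mean "negative", which is the ambiguity the `-n` option sidesteps.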

  • by visarga on 7/10/21, 8:28 PM

    <offtopic> I have implemented a `_split` command to split a line by a separator and `_stat` command that does basically `sort | uniq -c | sort -nr` counting elements and sorting by frequency. Really useful operations for me.

    When my one-liners become 2-3 lines long I need to switch to a regular script, but I also log all my shell commands going back years and have something a bit better than `history | grep word` to search them.</>
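
    For reference, a small Python sketch of the `sort | uniq -c | sort -nr` pattern. The name `stat` is borrowed loosely from the `_stat` command mentioned above and is not its actual implementation; ties keep first-seen order here, which may differ from what `sort -nr` prints.

```python
from collections import Counter

def stat(lines):
    """Count identical lines and return (count, line) pairs, most frequent first."""
    counts = Counter(line.rstrip("\n") for line in lines)
    return [(count, line) for line, count in counts.most_common()]

if __name__ == "__main__":
    sample = ["GET\n", "POST\n", "GET\n", "GET\n", "PUT\n", "POST\n"]
    for count, line in stat(sample):
        # Right-align counts, similar to the output of `uniq -c`
        print(f"{count:7d} {line}")
```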

  • by rendall on 7/11/21, 3:40 AM

    The README and description should not assume the reader knows what `cut` is or what it's used for. Maybe reference it and then ELI5

  • by technological on 7/11/21, 1:00 AM

    Nice one, OP. It’s mostly due to my lack of knowledge of Rust, but the code is not as easy to read as Go. Does anyone feel the same? (Btw, this has nothing to do with how OP wrote it, but rather the language itself.)

  • by queuebert on 7/10/21, 5:59 PM

    Yay, no more piping multiple cuts when you have multiple delimiters.

  • by toastal on 7/10/21, 5:01 PM

    Heck