from Hacker News

Parallel decompression of gzip-compressed files

by nkrumm on 5/20/19, 4:09 PM with 2 comments

  • by nkrumm on 5/20/19, 4:12 PM

    GitHub: https://github.com/Piezoid/pugz

    From the readme:

    "Unlike the pigz program, which does single-threaded decompression (see https://github.com/madler/pigz/blob/master/pigz.c#L232), pugz found a way to do truly parallel decompression. In a nutshell: the compressed file is split into consecutive sections, processed one after the other. Sections are in turn split into chunks (one chunk per thread), which are decompressed in parallel. A first pass decompresses the chunks and keeps track of back-references (see e.g. our paper for the definition of that term), but is unable to resolve them. Then a quick sequential pass resolves the contexts of all chunks. A final parallel pass translates all unresolved back-references and outputs the file."
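    The three passes described above can be sketched with a toy model. This is not pugz's actual implementation or the real DEFLATE format — tokens, the `first_pass`/`resolve` helpers, and the "hole" placeholder are all invented here for illustration. Each chunk is a list of tokens: a literal byte, or a back-reference ("ref", distance, length) that may reach into the bytes produced before the chunk began.

    ```python
    from concurrent.futures import ThreadPoolExecutor

    def first_pass(chunk):
        """Pass 1: decode one chunk in isolation. Back-references that reach
        before the chunk's start cannot be resolved yet, so emit ("hole", i)
        placeholders, where i is a negative index into the preceding context."""
        out = []
        for tok in chunk:
            if isinstance(tok, str):
                out.append(tok)                  # literal byte
            else:
                _, dist, length = tok
                for _ in range(length):
                    src = len(out) - dist
                    if src >= 0:
                        out.append(out[src])     # resolvable within the chunk
                    else:
                        out.append(("hole", src))  # needs preceding context
        return out

    def resolve(decoded, context):
        """Fill every hole using the chunk's now-known preceding context."""
        return "".join(context[t[1]] if isinstance(t, tuple) else t
                       for t in decoded)

    def decompress(chunks):
        # Pass 1 (parallel): decode every chunk independently.
        with ThreadPoolExecutor() as pool:
            decoded = list(pool.map(first_pass, chunks))
        # Pass 2 (sequential): propagate context chunk by chunk. In pugz this
        # is cheap because only the 32 KiB DEFLATE window matters; this toy
        # resolves whole chunks for simplicity.
        contexts, text = [], ""
        for d in decoded:
            contexts.append(text)
            text += resolve(d, text)
        # Pass 3 (parallel): translate each chunk with its known context.
        with ThreadPoolExecutor() as pool:
            parts = pool.map(resolve, decoded, contexts)
        return "".join(parts)
    ```

    Note how a back-reference that copies an unresolved position simply copies the placeholder: the hole carries the same context index, so it resolves to the same byte later, which is what makes the first pass safe to run before any context is known.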

  • by LinuxBender on 5/20/19, 4:25 PM

    Somewhat related: for bzip2 I use pbzip2, which uses all the cores, or as many as you specify. [1] It is in the EPEL repo for RHEL/CentOS/Fedora.

    [1] - https://linux.die.net/man/1/pbzip2