While compressing a large number of files using gzip, I realised that it is quite slow, especially with the --best flag for maximum compression. While searching the web, I got hold of a tool named pigz, which is quite fast because it compresses in parallel across multiple cores. The website explains it as:
pigz, which stands for parallel implementation of gzip, is a fully functional replacement for gzip that exploits multiple processors and multiple cores to the hilt when compressing data. pigz was written by Mark Adler, and uses the zlib and pthread libraries.
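Since pigz is a drop-in replacement, the gzip flags you already know work unchanged. A minimal sketch (the file name is made up for the demo, and the script falls back to gzip where pigz is not installed, since the two are CLI-compatible):

```shell
# Use pigz if present, otherwise fall back to the CLI-compatible gzip.
GZ=$(command -v pigz || command -v gzip)

# sample.txt is a hypothetical demo file; -k keeps the original next to the .gz.
printf 'hello pigz\n' > sample.txt
"$GZ" --best -k sample.txt

ls -l sample.txt sample.txt.gz   # both files exist because of -k
```

pigz also accepts -p N to cap the number of threads it spawns, which gzip has no equivalent for.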
Let me try to compress files using both pigz and gzip and see how fast pigz can get. The machine used:
CPU (from lscpu):
Vendor ID: GenuineIntel
CPU family: 6
Model: 142
Model name: Intel(R) Core(TM) i3-7020U CPU @ 2.30GHz
Stepping: 9
CPU MHz: 1102.124
CPU max MHz: 2300.0000
CPU min MHz: 400.0000
BogoMIPS: 4599.93
Virtualization: VT-x
L1d cache: 64 KiB
L1i cache: 64 KiB
L2 cache: 512 KiB
L3 cache: 3 MiB
Memory status while running the tools:
free -h
total used free shared buff/cache available
Mem: 11Gi 2.9Gi 2.6Gi 359Mi 5.9Gi 7.9Gi
Swap: 2.0Gi 710Mi 1.3Gi
-
For a single file
Original size (du output, in KiB — roughly 1.6 GiB):
1649192 VID_20200413_193959.mp4
-
Compressing this video file using pigz:
time pigz --best -k VID_20200413_193959.mp4
real 0m31.855s
user 1m56.362s
sys 0m2.726s
Due to the parallel execution nature of pigz, all the cores are used simultaneously.
The output compressed file is:
du VID_20200413_193959.mp4.gz
1648308 VID_20200413_193959.mp4.gz
-
Compressing using gzip:
time gzip --best -k VID_20200413_193959.mp4
real 1m16.566s
user 1m14.458s
sys 0m1.748s
As you can see, due to the single-threaded nature of gzip, only one core is used, and that one at 100%, while the other cores sit mostly idle, which is not good! The compressed output file:
du VID_20200413_193959.mp4.gz
1648068 VID_20200413_193959.mp4.gz
Result:
Even after compressing the same file multiple times, pigz consistently finishes in less than half the time of gzip (about 2.4x faster), while gzip compresses slightly more (1648068 vs 1648308 KiB). So parallel execution does speed things up, but the gain is smaller than the total CPU time pigz burns would suggest, and it does not improve the compression ratio.
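The tiny saving here (1649192 down to only 1648308) is expected: MP4 is already compressed, and already-compressed data is nearly incompressible. A quick demonstration with made-up files, using random bytes as a stand-in for compressed media:

```shell
# Random bytes model already-compressed data; repetitive text compresses well.
head -c 100000 /dev/urandom > random.bin
yes 'the same line again' | head -c 100000 > text.txt

gzip --best -k random.bin text.txt

# random.bin.gz stays close to 100000 bytes; text.txt.gz shrinks dramatically.
wc -c random.bin.gz text.txt.gz
```

No compressor, parallel or not, can do much with data that has already been through a compressor.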
-
For a directory of files.
Original size:
253M Me/
-
Using gzip:
du meGzip.tar.gz
249408 meGzip.tar.gz
-
Using pigz:
Compressed file:
du mePigz.tar.gz
249512 mePigz.tar.gz
Result: while dealing with a directory of many files, parallel execution again helped obtain higher compression speed; the output sizes are nearly identical (249512 vs 249408 KiB).
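The archives above were presumably produced with a tar pipeline like the one used for the larger directory later on. A sketch of the gzip variant (the directory name Me/ comes from the post, recreated here as a tiny demo directory so the commands run standalone; for pigz, just swap the command name in the pipeline):

```shell
# Tiny stand-in for the 253M Me/ directory from the post.
mkdir -p Me
printf 'demo contents\n' > Me/a.txt

# tar writes the archive stream to stdout (-f -); the compressor reads stdin.
tar -cf - Me/ | gzip --best > meGzip.tar.gz

tar -tzf meGzip.tar.gz   # list entries to confirm the archive is valid
```

Piping tar into the compressor like this is what lets pigz see one continuous stream it can split across threads.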
-
Compressing a larger directory:
Original size:
4.0G retire/
-
Using pigz:
time tar -cv retire/ | pigz --best > retirePigz.tar.gz
real 2m5.427s
user 5m25.956s
sys 0m14.496s
Compressed size:
4116660 retirePigz.tar.gz
-
Using gzip:
time tar -cv retire/ | gzip --best > retireGzip.tar.gz
real 3m26.640s
user 3m8.151s
sys 0m10.939s
Compressed size:
4116024 retireGzip.tar.gz
Result: for larger directories, the absolute time saved by compressing in parallel is even more significant (2m5s vs 3m27s of wall-clock time).
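The user/real ratio in these timings quantifies the parallelism: total CPU (user) time divided by wall-clock (real) time approximates the average number of busy cores. The i3-7020U is a 2-core/4-thread part, so pigz comes close to saturating its physical cores, while gzip cannot even keep one fully busy:

```shell
# Average busy cores = user time / real time, from the retire/ runs above.
awk 'BEGIN {
    printf "pigz: %.1f cores busy\n", (5*60 + 25.956) / (2*60 + 5.427)
    printf "gzip: %.1f cores busy\n", (3*60 + 8.151)  / (3*60 + 26.640)
}'
```

This prints about 2.6 busy cores for pigz and 0.9 for gzip, which matches what the per-core usage graphs showed during the single-file test.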
Final Conclusion:
Here we can see that pigz is much faster than gzip, especially when compressing multiple files, but gzip was able to compress slightly more in every case! Hence, when compressing multiple files use pigz; for a single file, where the marginally better ratio may matter more than the speed, gzip is still a reasonable choice.