How to Maximize Compression - pete4abw/lrzip-next GitHub Wiki
# Maximizing Compression
Normally, using level 9 compression yields the best results regardless of compression method. With text files, results can sometimes differ among levels, but that cannot be anticipated. ZPAQ will normally provide the best compression at level 9, but it is slow. With binary files, using one of the binary filters (x86, ARM, etc.), which apply Branch/Call/Jump (BCJ) analysis, will improve compression.
However, one setting above all can really improve results: -p1, which inhibits multi-threading and allows the largest possible block size to be compressed. Using -p1 for decompression is not beneficial and will only slow things down. In the tests below, text file compression was far more influenced by -p1 than binary file compression.
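A maximum-compression run might look like the sketch below. The `-p1` option and `--x86` filter are taken from this page; the `-L9` level flag and `-z` ZPAQ switch are assumed from lrzip's usual options, so verify them against `lrzip-next --help` for your version.

```shell
# Sketch only -- confirm flags with `lrzip-next --help` for your version.
# -L9 : compression level 9 (assumed, as in lrzip)
# -p1 : one thread, so the whole file fits in a single compression block
# -z  : ZPAQ back end (assumed, as in lrzip)
lrzip-next -L9 -p1 -z enwik8

# For a binary target, add the x86 BCJ filter named on this page:
lrzip-next -L9 -p1 --x86 bin.tar
```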
Here are some results.
## Text Files
Target file: enwik8 100,000,000 bytes
Level: 9
File type: Text
Method | Compressed Size (bytes) | Compress Time (s) | -p1 Compressed Size (bytes) | -p1 Compress Time (s) | -p1 Size Gain | -p1 Time Change |
---|---|---|---|---|---|---|
bzip2 | 28,796,694 | 5.230 | 28,738,621 | 10.100 | 0.202% | -93.117% |
bzip3 | 22,417,476 | 7.130 | 21,107,931 | 14.000 | 5.842% | -96.353% |
gzip | 35,723,370 | 4.910 | 35,714,597 | 9.210 | 0.025% | -87.576% |
lzo | 39,797,973 | 5.710 | 39,785,060 | 14.160 | 0.032% | -147.986% |
lzma | 25,118,871 | 53.450 | 25,118,871 | 53.870 | 0.000% | -0.786% |
zpaq | 20,332,905 | 129.080 | 19,563,012 | 190.590 | 3.786% | -47.653% |
zstd | 27,794,089 | 15.730 | 25,663,122 | 68.320 | 7.667% | -334.329% |
zpaq had the best compression overall, followed by bzip3. zstd had the greatest compression benefit from -p1, but it also paid the greatest time penalty: 334%, more than 4x the multi-threaded time. bzip3 had the next best benefit, followed by zpaq. For this file, bzip3 and zstd offered the best combination of speed and compression.
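The two delta columns are percentage changes relative to the default multi-threaded run: a positive size figure means -p1 compressed better, and a negative time figure means -p1 took longer. A minimal sketch of the arithmetic, using the bzip3 row above (the `pct` helper is ours, not part of lrzip-next):

```shell
# Percentage change from the default run (a) to the -p1 run (b);
# positive = -p1 improved, negative = -p1 regressed.
pct() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.3f%%\n", (a - b) / a * 100 }'; }

pct 22417476 21107931   # bzip3 size: prints 5.842% (matches the table)
pct 7.130 14.000        # bzip3 time: prints -96.353% (-p1 nearly doubles the time)
```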
## Binary Files
Target File: bin.tar 100,000,000 bytes
Level: 9 using --x86 filter
File Type: x86 Binary
Method | Compressed Size (bytes) | Compress Time (s) | -p1 Compressed Size (bytes) | -p1 Compress Time (s) | -p1 Size Gain | -p1 Time Change |
---|---|---|---|---|---|---|
bzip2 | 30,357,383 | 4.510 | 30,352,255 | 10.100 | 0.017% | -123.947% |
bzip3 | 27,643,957 | 6.470 | 27,644,451 | 14.000 | -0.002% | -116.383% |
gzip | 31,802,035 | 5.750 | 31,809,266 | 9.210 | -0.023% | -60.174% |
lzo | 34,800,547 | 6.390 | 34,807,911 | 14.160 | -0.023% | -121.596% |
lzma | 23,973,472 | 22.340 | 23,973,472 | 53.870 | 0.000% | -141.137% |
zpaq | 20,639,397 | 110.910 | 20,260,994 | 190.590 | 1.833% | -71.842% |
zstd | 26,436,347 | 9.950 | 25,833,383 | 68.320 | 2.281% | -586.633% |
Here, with the exception of zpaq and zstd, there is little benefit from -p1 with a binary/random file. Even for those two, there is a significant time penalty for a compression gain of roughly 2%.