Reference files

In this test we use common test files to provide additional benchmark data for all compressors including those that did not qualify for the main benchmark.

Program   Version  Arguments      WR
paq       8p       -5             9.786
lpaq      8        7              8.334
zcm       0.11     c5             7.985
zpaq      1.00     cmax.cfg       7.918
ppmonstr  Jr1      -m600 -o16     7.443
ash       0.7b1    /m512          7.387
ccm       1.30c    x 5            7.370
epm       r9                      7.365
cmm4      0.1e     75             7.348
bit       0.7      -p=4           7.230
durilca   0.5      -m384 -o8      7.121
nanozip   0.07a    -m.5g -cc      7.015
bee       0.79     -m3 -d7        6.791
bwmonstr  0.02                    6.599
rzm       0.07h                   6.593
bwtmix    1s       c1150          6.437
enc       0.15                    6.399
m1        x2-0.6   6 enwik7.txt   6.356
blizzard  0.24b    c 100000000    6.294
m03       0.2a1    100000000      6.288
rings     1.5c     5              6.282
grzipii   0.24     -p -b8m -m1    6.240
bcm       0.09     -b118          6.233
ppmd      sh8      /m590 /o16 /r  6.222
bbb       1        cm400q         6.196
bma       1.35b    -mx -m52m      6.194
bsc       2.2.0    -b110t -p -m3  6.190
mnzip     0        5              6.174
dark      0.51     p-b118m        6.130
paf       03a      -m550          6.114
  1. English text: book1
    paq 7, zpaq 1.00, lpaq 8, ash 0.7b1, ppmonstr Jr1, ppmy 2.02s, durilca 0.5, dc 0.98b, bwtmix 1s, cmm4 0.1e, ...
  2. Binary object: obj2
    paq 7, ppmonstr Jr1, zpaq 1.00, lpaq 8, zcm 0.11, durilca 0.5, ccm 1.30c, cmm4 0.1e, epm r9, rzm 0.07h, ...
  3. Geophysical data: geo
    paq 7, lpaq 8, zpaq 1.00, zcm 0.03, ppmonstr Jr1, ccm 1.30c, bwtmix 1s, blizzard 0.24b, bbb 1, durilca 0.5, ...
  4. DNA: E.coli
    paq 8p, blizzard 0.24b, lpaq 8, ash 0.7b1, zpaq 1.00, ppmy 2.02s, bwtmix 1s, bbb 1, zcm 0.03, bwmonstr 0.02, ...
  5. Image: lena.ppm
    winace 2.6, paq 8p, nanozip 0.09a, zpaq 1.00, ccm 1.30c, zcm 0.03, lpaq 8, winrar 3.93, mnzip 0, sbc 0.970r3, ...
  6. Audio: stereo/wav
    nanozip 0.07a, freearc 0.60, paq 8p, sbc 0.970r3, squeez 5.62, ccm 1.30c, zcm 0.11, bma 1.35b, uharc 0.6b, winace 2.6, ...
  7. English text: enwik6
    paq 8p, zpaq 1.00, lpaq 8, ppmonstr Jr1, ash 0.7b1, durilca 0.5, zcm 0.03, cmm4 0.1e, epm r9, bee 0.79, ...
  8. Chinese text: zhwik6
    paq 8p, lpaq 8, zpaq 1.00, ppmonstr Jr1, durilca 0.5, cmm4 0.1e, epm r9, zcm 0.03, ash 0.7b1, bit 0.7, ...
  9. Executable: acrord32.exe
    paq 8p, nanozip 0.07a, lpaq 8, zcm 0.11, ppmonstr Jr1, durilca 0.5, ccm 1.30c, cmm4 0.1e, uharc 0.6b, zpaq 1.00, ...
  10. Windows help: vcfiu.hlp
    paq 8p, zpaq 1.00, lpaq 8, zcm 0.11, nanozip 0.07a, ppmonstr Jr1, epm r9, cmm4 0.1e, bit 0.7, ccm 1.30c, ...
  11. MS Word document: ohs.doc
    paq 8p, nanozip 0.07a, zpaq 1.00, zcm 0.11, lpaq 8, ppmonstr Jr1, cmm4 0.1e, durilca 0.5, bit 0.7, epm r9, ...
  12. Apache log: fp.log (cut to 5 MB)
    paq 8p, lpaq 8, zcm 0.11, zpaq 1.00, ash 0.7b1, epm r9, ccm 1.30c, cmm4 0.1e, ppmonstr Jr1, bit 0.7, ...
  13. PDF: flashmx.pdf (precomp processed and cut to 5 MB)
    paq 8p, zpaq 1.00, lpaq 8, zcm 0.03, ppmonstr Jr1, ccm 1.30c, cmm4 0.1e, bit 0.7, durilca 0.5, nanozip 0.07a, ...

About this test

All compressors that pass the qualification without an error are tested here with well-known test files to provide additional benchmark data. Another reason to run benchmarks with these files is that some experts may prefer to see results for them despite the problems that arise from using them.

The files used here are often part of collections (like the Calgary Corpus) that sample various types of data in an attempt to provide a broad evaluation over a range of data types. These test files are flawed for various reasons, including:

  1. The files are too small
  2. A single file does not establish a general case
  3. Compressors are tuned to these files
  4. The files are poorly chosen [1]

Using these files may work for developing a compressor (evaluating changes to algorithms), but they are poorly suited to comparing two or more compressors. The main Compression Ratings benchmark by and large does not suffer from any of these problems.

To address point 4, we have left out the files that fit that category most clearly. Because of point 1, the measured timings are skewed for programs that slow down as the input grows, and the compression ratio is difficult to measure (also because of point 3) since the decompressor program takes up a large portion of the output size. To address this, we do not count the size of the decompressor program in this test (the size column lists the raw compressed file size), and we introduce a "weighted compression ratio" (WR) [2]: the compression ratio with the decompressor size counted in, except that we discount 90% of the decompressor size, but no more than 85000 bytes (roughly 90% of the median uncompressed decompression program size). The 90% discount is an arbitrary number chosen to limit the random effect that including the decompressor size has on the results for small files. We limit the discount to the median because the purpose is to lessen this effect, not to eliminate the decompressor size (such as static dictionaries) completely. No ratings are provided in these tests because the results require interpretation even to establish compressor performance for these specific files. (We cannot draw conclusions about a compressor's strength on certain types of files because of point 2.)

[1] Files are so special (performance doesn't translate to other data) that small tweaks to a program may improve the compression ratio dramatically but have no effect on other files. This category also covers files that are already compressed or contain errors (like book1).
[2] WR=sizeof(input)/(sizeof(output)+sizeof(decompressor)-0.9*min(95kb,sizeof(decompressor)))
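
As a rough illustration of footnote [2], the WR formula can be sketched in Python. The function name weighted_ratio and the example sizes are made up for illustration, and "95kb" is assumed here to mean 95,000 bytes (which puts the maximum discount at 85,500 bytes, in line with the "85000 bytes" figure in the text). This is a sketch of the formula as written, not the benchmark's actual scoring code.

```python
def weighted_ratio(input_size, output_size, decompressor_size,
                   cap=95_000, discount=0.9):
    """Weighted compression ratio (WR) as given in footnote [2].

    All sizes are in bytes. `cap` is the assumed byte value of "95kb";
    `discount` is the 90% share of the decompressor size left uncounted.
    """
    # Discount 90% of the decompressor size, but only up to the cap.
    discounted = discount * min(cap, decompressor_size)
    return input_size / (output_size + decompressor_size - discounted)


if __name__ == "__main__":
    # Hypothetical sizes: 1,000,000-byte input compressed to 200,000 bytes.
    # Small decompressor (50,000 bytes): 45,000 bytes are discounted.
    print(f"{weighted_ratio(1_000_000, 200_000, 50_000):.3f}")   # 4.878
    # Large decompressor (200,000 bytes): the discount stops at 85,500 bytes.
    print(f"{weighted_ratio(1_000_000, 200_000, 200_000):.3f}")  # 3.180
```

With a decompressor size of zero the function reduces to the plain ratio of input size to output size, and a decompressor larger than the cap is penalized for every byte beyond it.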