Reference files

In this test we use common test files to provide additional benchmark data for all compressors, including those that did not qualify for the main benchmark.

Summary
Program    Ver     Arguments      WR
paq        8p      -5             9.786
lpaq       8       7              8.334
zpaq       1.00    cmax.cfg       7.918
ppmonstr   Jr1     -m600 -o16     7.443
ash        0.7b1   /m512          7.387
ccm        1.30c   x 5            7.370
epm        r9                     7.365
cmm4       0.1e    75             7.348
bit        0.7     -p=4           7.230
durilca    0.5     -m384 -o8      7.121
nanozip    0.07a   -m.5g -cc      7.015
bee        0.79    -m3 -d7        6.791
bwmonstr   0.02                   6.599
rzm        0.07h                  6.593
bwtmix     1s      c1150          6.437
enc        0.15                   6.399
m1x2       0.6     6 enwik7.txt   6.356
blizzard   0.24b   c 100000000    6.294
m03        0.2a    1100000000     6.288
rings      1.5c    5              6.282
grzipii    0.24    -p -b8m -m1    6.240
bcm        0.09    -b118          6.233
ppmd       sh8     /m590 /o16     6.222
bbb        1       cfm100q        6.196
bma        1.35b   -mx -m52m      6.194
bsc        2.2.0   -p -m3         6.190
mnzip      0       5              6.174
dark       0.51    p-b118m        6.130
paf        03a     -r1            6.114
lzpxj      1.2h    e              6.103

  1. English text: book1
    paq 7, zpaq 1.00, lpaq 8, ash 0.7b1, ppmonstr Jr1, ppmy 2.02s, durilca 0.5, dc 0.98b, bwtmix 1s, cmm4 0.1e, ...
  2. Binary object: obj2
    paq 7, ppmonstr Jr1, zpaq 1.00, lpaq 8, durilca 0.5, ccm 1.30c, cmm4 0.1e, epm r9, rzm 0.07h, ash 0.7b1, ...
  3. Geophysical data: geo
    paq 7, lpaq 8, zpaq 1.00, ppmonstr Jr1, ccm 1.30c, bwtmix 1s, blizzard 0.24b, bbb 1, durilca 0.5, cmm4 0.1e, ...
  4. DNA: E.coli
    paq 8p, blizzard 0.24b, lpaq 8, ash 0.7b1, zpaq 1.00, ppmy 2.02s, bwtmix 1s, bbb 1, bwmonstr 0.02, ha 0.999b, ...
  5. Image: lena.ppm
    winace 2.6, paq 8p, zpaq 1.00, nanozip 0.08a, ccm 1.30c, lpaq 8, winrar 3.93, mnzip 0, sbc 0.970r3, yzx 0.03, ...
  6. Audio: stereo/wav
    nanozip 0.07a, freearc 0.60, paq 8p, sbc 0.970r3, squeez 5.62, ccm 1.30c, bma 1.35b, uharc 0.6b, winace 2.6, winrar 3.93, ...
  7. English text: enwik6
    paq 8p, zpaq 1.00, lpaq 8, ppmonstr Jr1, ash 0.7b1, durilca 0.5, cmm4 0.1e, epm r9, bee 0.79, ppmy 2.02s, ...
  8. Chinese text: zhwik6
    paq 8p, lpaq 8, zpaq 1.00, ppmonstr Jr1, durilca 0.5, cmm4 0.1e, epm r9, ash 0.7b1, bit 0.7, ccm 1.30c, ...
  9. Executable: acrord32.exe
    paq 8p, nanozip 0.07a, lpaq 8, ppmonstr Jr1, durilca 0.5, ccm 1.30c, cmm4 0.1e, uharc 0.6b, zpaq 1.00, rzm 0.07h, ...
  10. Windows help: vcfiu.hlp
    paq 8p, zpaq 1.00, lpaq 8, nanozip 0.07a, ppmonstr Jr1, epm r9, cmm4 0.1e, bit 0.7, ccm 1.30c, durilca 0.5, ...
  11. MS Word document: ohs.doc
    paq 8p, nanozip 0.07a, zpaq 1.00, lpaq 8, ppmonstr Jr1, cmm4 0.1e, durilca 0.5, bit 0.7, epm r9, ccm 1.30c, ...
  12. Apache log: fp.log (cut to 5 MB)
    paq 8p, lpaq 8, zpaq 1.00, ash 0.7b1, epm r9, ccm 1.30c, cmm4 0.1e, ppmonstr Jr1, bit 0.7, durilca 0.5, ...
  13. PDF: flashmx.pdf (processed with precomp and cut to 5 MB)
    paq 8p, zpaq 1.00, lpaq 8, ppmonstr Jr1, ccm 1.30c, cmm4 0.1e, bit 0.7, durilca 0.5, nanozip 0.07a, epm r9, ...

About this test

All compressors that pass the qualification without an error are tested here with well-known test files to provide additional benchmark data. Another reason to run benchmarks with these files is that some experts may prefer to see the results for these files despite the problems that arise from using them.

The files used here are often part of collections (like the Calgary Corpus) that sample various types of data in an attempt to provide a broad evaluation over a range of data types. These test files are flawed for various reasons, including:

  1. The files are too small
  2. A single file does not establish a general case
  3. Compressors are tuned to these files
  4. The files are poorly chosen [1]

Using these files may work for developing a compressor (evaluating changes to algorithms), but the files are poorly suited for comparing two or more compressors. The main Compression Ratings benchmark by and large does not suffer from any of these problems.

To address point 4, we have specifically excluded some files that fall into this category. Point 1 has two consequences. First, the measured timings are skewed for programs that slow down as the size of the input grows. Second, the compression ratio is difficult to measure (also because of point 3), since the size of the decoder program takes up a large portion of the output size.

To address this, we do not count the size of the decompressor program in this test (the size column lists the raw compressed file size), and we introduce a "weighted compression ratio" (WR) [2]. WR is the compression ratio with the decompressor size counted in, except that 90% of the decompressor size is discounted, with the discount capped at about 85000 bytes (roughly 90% of the median uncompressed decompressor program size). The 90% discount is an arbitrary number chosen to limit the random effect that including the decompressor size has on the results for small files. We cap the discount at the median because the purpose is to lessen this effect, not to eliminate the decompressor size (which may include, for example, static dictionaries) from the results completely.

No ratings are provided in these tests because the results require interpretation even to establish compressor performance on these specific files. (Because of point 2, we cannot draw conclusions about a compressor's strength on certain types of files.)
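
Footnote [2] defines WR as a single formula. The Python sketch below implements that formula so the effect of the 90% discount and its cap can be checked by hand. It assumes "95kb" means 95,000 bytes (with 1024-byte kilobytes the cap would be 97,280 bytes), and the function name `weighted_ratio` is our own; the sizes in the examples are hypothetical, not benchmark data.

```python
def weighted_ratio(input_size, output_size, decompressor_size, cap=95_000):
    """Weighted compression ratio (WR) following footnote [2].

    The decompressor size is counted, but 90% of it is discounted, and the
    discount applies to at most `cap` bytes. cap=95_000 is an assumption
    about what "95kb" means. All sizes are in bytes.
    """
    discount = 0.9 * min(cap, decompressor_size)
    return input_size / (output_size + decompressor_size - discount)


# Hypothetical sizes, for illustration only.
# Small decompressor: all 50,000 bytes fall under the cap, so only
# 5,000 bytes are added to the output: 1,025,000 / 205,000.
print(weighted_ratio(1_025_000, 200_000, 50_000))   # prints 5.0

# Large decompressor: the discount stops at 0.9 * 95,000 = 85,500 bytes,
# so 414,500 of its 500,000 bytes still count: 1,229,000 / 614,500.
print(weighted_ratio(1_229_000, 200_000, 500_000))  # prints 2.0
```

With a small input, for example 100,000 bytes compressed to 25,000 bytes by a program whose decompressor is 300,000 bytes, the raw ratio is 4 but WR falls below 1, which illustrates why the decompressor size dominates results for small files.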

[1] Files that are so specialized (performance on them doesn't translate to other data) that small tweaks to a program may improve the compression ratio dramatically while having no effect on other files, or files that are already compressed or contain errors (like book1).
[2] WR=sizeof(input)/(sizeof(output)+sizeof(decompressor)-0.9*min(95kb,sizeof(decompressor)))