Reference files

In this test we use common test files to provide additional benchmark data for all compressors, including those that did not qualify for the main benchmark.

Summary
Prog      Ver     Arg              WR
paq       8p      -5               9.770
lpaq      8       7                8.224
ZPAQ      1.00    cmax.cfg         7.912
UDA       0.301                    7.529
EPM       r9                       7.353
CCM       1.30c   x 5              7.306
PPMonstr  Jr1     -m512 -o16       7.279
ASH       0.6     /m512            7.242
CMM4      0.1e    75               7.117
BWMonstr  0.01                     6.616
RZM       0.07h                    6.519
Bee       0.78    -m3 -d7          6.379
ENC       0.15                     6.335
BWTmix    0       c1000            6.316
Blizzard  0.24b   c 100000000      6.238
GRZipII   0.24    -p -b8m -m1      6.235
bcm       0.09    -b112            6.225
bbb       1       cm400q           6.193
mnzip     0       5                6.142
RINGS     1.5c    9                6.123
Dark      0.51    -b112m           6.044
PPMd      Jr1     -m256 -r1 -o8    6.044
ZZIP      0.36c   -mx -a -16m      5.957
Flashzip  0.93c   -m2 -s7 -b5      5.957
Mix3e21a                           5.925
YBS       0.3f    -m16m            5.910
BSSC      0.95a   -b16383          5.847
Hook      1.4     512              5.846
chile     0.5                      5.836
XWRT      3.2     -l12 -b32 -m32   5.807
  1. English text: book1
    paq 8p, ZPAQ 1.00, UDA 0.301, ASH 0.4a, lpaq 8, PPMY 2.02(3csse), BWMonstr 0.00, PPMonstr Jr1, EPM r9, BWTmix 0, ...
  2. Binary object: obj2
    UDA 0.301, paq 8p, ZPAQ 1.00, lpaq 8, PPMonstr Jr1, ASH 0.4a, EPM r9, CCM 1.30c, RZM 0.07h, CMM4 0.1e, ...
  3. Geophysical data: geo
    UDA 0.301, ZPAQ 1.00, paq 9a, lpaq 8, BWMonstr 0.00, CCM 1.30c, bbb 1, PPMonstr Jr1, Blizzard 0.24b, ASH 0.4a, ...
  4. DNA: E.coli
    ASH 0.4a, paq 8p, UDA 0.301, Blizzard 0.24b, lpaq 8, ZPAQ 0.08, BWMonstr 0.00, PPMY 2.02(3csse), bbb 1, BWTmix 0, ...
  5. Image: lena.ppm
    WinACE 2.6, paq 8p, UDA 0.301, Flashzip 0.99b1, ZPAQ 1.00, CCM 1.30c, lpaq 8, mnzip 0, SBC 0.970r3, BWMonstr 0.02, ...
  6. Audio: stereo/wav
    paq 8p, SBC 0.970r3, CCM 1.30c, BMA 1.35b, Flashzip 0.94, UDA 0.301, WinACE 2.6, BSSC 0.95a, DC 0.98b, GRZipII 0.24, ...
  7. English text: enwik6
    paq 8p, UDA 0.301, ZPAQ 1.00, lpaq 8, ASH 0.4a, PPMonstr Jr1, EPM r9, CMM4 0.1e, PPMY 2.02(3csse), BWMonstr 0.00, ...
  8. Chinese text: zhwik6
    paq 8p, UDA 0.301, ZPAQ 1.00, lpaq 8, PPMonstr Jr1, EPM r9, ASH 0.4a, CMM4 0.1e, CCM 1.30c, BWMonstr 0.00, ...
  9. Executable: acrord32.exe
    paq 8p, UDA 0.301, lpaq 8, PPMonstr Jr1, CCM 1.30c, CMM4 0.1e, ZPAQ 1.00, DURILCA 0.5, RZM 0.07h, ASH 0.6, ...
  10. Windows help: vcfiu.hlp
    paq 8p, UDA 0.301, ZPAQ 1.00, lpaq 8, PPMonstr Jr1, EPM r9, CMM4 0.1e, CCM 1.30c, ASH 0.6, XWRT 3.2, ...
  11. MS Word document: ohs.doc
    paq 8p, UDA 0.301, ZPAQ 1.00, lpaq 8, PPMonstr Jr1, CMM4 0.1e, EPM r9, CCM 1.30c, BWMonstr 0.02, RZM 0.07h, ...
  12. Apache log: fp.log (cut to 5 MB)
    paq 8p, lpaq 8, ZPAQ 1.00, ASH 0.6, EPM r9, CCM 1.30c, PPMonstr Jr1, CMM4 0.1e, UDA 0.301, BWMonstr 0.01, ...
  13. PDF: flashmx.pdf (precomp processed and cut to 5 MB)
    paq 8p, UDA 0.301, ZPAQ 1.00, lpaq 8, PPMonstr Jr1, CCM 1.30c, CMM4 0.1e, EPM r9, BWMonstr 0.02, XWRT 3.2, ...

About this test

All compressors that pass the qualification without an error are tested here with well-known test files to provide additional benchmark data. Another reason to run benchmarks with these files is that some experts may prefer to see results for them despite the problems that arise from using them. Because of the problems discussed below, non-experts are advised to keep to the main Compression Ratings benchmark instead.

The files used here are often part of collections (like the Calgary Corpus) that are supposed to sample various types of data to give a broad evaluation over a wide range of data types. These test files are flawed for various reasons, including:

  1. The files are too small
  2. A single file does not establish a general case
  3. Compressors are tuned to these files
  4. The files are poorly chosen [1]

Using these files may work for developing a compressor (evaluating changes to algorithms), but the files are poorly suited to comparing two or more compressors. We can observe that the main Compression Ratings benchmark by and large does not suffer from any of these problems.

To address point 4, we have excluded some files that fall squarely into this category. It follows from point 1 that the measured timings are skewed for programs that slow down as the input grows, and that compression ratio is difficult to measure (also because of point 3), since the decoder program size makes up a large portion of the output size. To address this, we introduce the "weighted compression ratio" (WR) [2], which discounts 90% of the decompressor program size, but no more than 85000 bytes (roughly 90% of the median decompressor program size). The 90% discount is an arbitrary number chosen to limit the random effect of including the decompressor size in the results for small files. We cap the discount at the median because the purpose is to limit this effect, not to eliminate the decompressor size (which may include static dictionaries) completely. No ratings are provided in these tests because the results require interpretation even to establish compressor performance for these specific files. (Because of point 2, we cannot draw conclusions about a compressor's strength on certain types of files.)
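As a rough illustration of the WR formula given in footnote [2], the following Python sketch computes the ratio for made-up file sizes. The function name weighted_ratio, its parameter names, the example sizes, and the reading of the 95 kilobyte cap as 95,000 bytes are all assumptions made for illustration; this is not the benchmark's own tooling.

```python
def weighted_ratio(input_size, output_size, decompressor_size,
                   discount=0.9, cap=95_000):
    """Weighted compression ratio (WR) following footnote [2].

    All sizes are in bytes. cap=95_000 assumes 1 kB = 1000 bytes,
    which is an interpretation, not something the source states.
    """
    # Discount 90% of the decompressor size, but never more than
    # 90% of the cap (0.9 * 95,000 = 85,500 bytes, roughly 85000).
    discounted = discount * min(cap, decompressor_size)
    return input_size / (output_size + decompressor_size - discounted)


# Hypothetical numbers: a 1 MB input compressed to 200 kB.
small = weighted_ratio(1_000_000, 200_000, 50_000)    # small decompressor
large = weighted_ratio(1_000_000, 200_000, 400_000)   # large decompressor

print(f"{small:.3f}")  # 4.878 (undiscounted ratio would be 4.000)
print(f"{large:.3f}")  # 1.944 (undiscounted ratio would be 1.667)
```

With the small decompressor, the discount removes most of the decompressor's weight from the result. With the large one, the cap is reached and most of the decompressor size still counts against the ratio, which matches the stated intent of not eliminating large decompressors (such as those carrying static dictionaries) completely.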

Currently there is an issue with programs that are configured to use SFX. We are unable to display all numbers for these programs (since no decompressor size is known). This will be fixed in the future.

[1] Files are so esoteric (performance doesn't translate to other data) that small tweaks to compressors may improve compression ratio dramatically (but have no effect on other files). Or the files are already compressed or contain errors (like book1).
[2] WR = sizeof(input) / (sizeof(output) + sizeof(decompressor) - 0.9 * min(95 kB, sizeof(decompressor)))