BWT Compression Comparison

In this test we compare Burrows-Wheeler based compressors. We especially try to compare such compressors that do not use preprocessing.

Definition

We define BWT compressor to be a program that applies BWT transformation before compression. When decompressing the program applies inverse BWT transformation after decompressing the data.

We define BWT compressor without preprocessing to be such that it does not apply any kind of transformation to the data before the context sorting phase of Burrows-Wheeler transform.

Programs

This is a special test. We include unrated compressors that do not qualify for the main benchmark and run compressors with only a single configuration for maximum compression. We do not assign ratings and no size for decoder program is considered. For some unrated compressors we do not list times nor other info at all (for technical reasons). We do not even verify correct decompression for some programs (where n/a is marked for d.time). Authors are welcome to contact me if they find their compressors are missing or tested incorrectly.

We have no decisive evidence whether a program uses preprocessing or not without the source code. Hence we must use undisclosed arbitrary qualifying methods to choose the set of programs for this test.

Brief comments about the programs which are considered for the test.

DC 0.98b, BSSC 0.95a, BA 1.01b, BMA 1.35b

Closed source programs that use preprocessing. We try disabling the preprocessing by using the provided switches.

Dark 0.51, GRZipII 0.24, szip 1.12, ecp 0.e.s

The programs are open source and have switches to disable preprocessing.

IMP 1.12

The program's decoder is open source. The program does not use preprocessing but it switches from BWT to LZ if BWT sorting is problematic. This probably happens in this test only with the enwik9.

ZZIP 0.36c, bzip 0.21, bzip2 1.0.5, mnzip 0, GRZip 0.7.3, bred 3

Programs are open source and use preprocessing. ZZIP has a maximum blocksize of 15 MB.

RINGS 1.5c

It appears the program is using ST, not BWT. The program apparently uses preprocessing that cannot be disabled.

Blizzard 0.24b, SBC 0.970r3, YBS 0.3f, GCA 0.9k

Closed source programs that are very likely using preprocessing. No switches are provided to disable it.

M99 2.21, M03 0.2a1, UHBC 1.0, bcm 0.09, mcomp 2.00, M03exp 5.2, BWMonstr 0.02, bwic 0, X1 0.95a, QLFC 6.6

Likely, but unproven that preprocessing is not used in these programs.

bwtzip 0, chile 0.5, ABC 2.4, bwc 0.99d, bbb 1, MAr 0, BWTmix 1

Programs are open source and are not using preprocessing.

The following open source programs are not using preprocessing: bbb, bwtzip, chile, ABC, bwc, Dark, BWTmix. Because we do not wish to discard all other compressors, we include them anyway with hopes that they do not use heavy preprocessing for the test files. We discard RINGS and SBC since they do not fit well to the concept of BWT without preprocessing. We would have reasons to discard other compressors such as YBS, Blizzard, IMP, ZZIP, but we hope their preprocessing is rather inefficient for our test files and we include them anyway. This results arbitrary comparison, but this we knew from the start.

The programs are set up with maximum memory setting that works in the test machine. Dark reports out of memory using 125 MB block, therefore a smaller blocksize was used. For BMA the following switches are used in addition to the ones listed in the tables: '-rv- -pe- -t- -d- -l- -z- -tx-'. We run BMA with -sif0 because otherwise the program attempts to choose the best compression method that may lead of confusing results especially with the transformation test.

Memory Requirements

Program Version Space for a block
BWMonstr0.2~0.6n
bbb11.25n (+4n)
BWTmix15n
bcm0.095n
Blizzard0.24b6n
M030.2a16n
UHBC1.08n
Dark0.515n
M992.216n
QLFC6.66n
M03exp5.212n?
chile0.57n
BMA1.35b12n
BSSC0.95a9n
YBS0.3f9n
bred35n
GRZipII0.2410n
ZZIP0.36c12n
GRZip0.7.310n?
ABC2.410n?
bwc0.99d8n
bwtzip0~50n
ecp0.e.s5n?
szip1.125n
GCA0.9k?
bzip0.218n?
bzip21.0.58n
IMP1.125n
X10.95a5n?

Most programs use 5 to 10n blocksize memory. bbb 0 and BWMonstr 0.02 work with less than 5n memory. bbb uses 1.25n blocksize memory, but requires 4n space for temporary files. BWMonstr uses less than 0.6n blocksize memory. The table beside gives an estimated memory requirements for each compressor compressing enwik9.

Test files

We will use special test files for this test: enwik9 for a large test file. A smaller 5 000 000 byte sample of enwik9 to attempt get as many programs as possible to compress the file using a single block. We also add book1 and obj2 (from the Calgary Corpus). Finally a random file DejaVuSans.ttf from Application1.

enwik9

enwik9
ProgVerArgSizeCTDTCMDM
CompressionRatings.Com
BWMonstr0.02160 468 597331881.06156147.83590585
bbb1cb500000000q169 339 1546286.274415.94619604
BWTmix1c1250178 510 0435920.882351.44616616
bcm0.09-b115179 315 4432170.411771.19586586
M030.2a1100000000180 893 1111357.561006.09577574
Blizzard0.24bc 100000000182 780 2641250.52871.73573572
UHBC1.0-b75m -m3185 166 6871954.281337.86n/an/a
Dark0.51-b112mrf185 516 042800.01645.64562562
M992.21-m 100m186 572 3161001.38813.01612603
QLFC6.6100m190 040 6421903.25603.98574574
M03exp5.232 MB191 259 160n/an/an/an/a
chile0.5-b=90000192 097 53712255.588852.03619529
BMA1.35b-sif0 -mx -m52m195 271 5744090.80606.17593377
BSSC0.95a-b16383aeflz201 678 505896.42263.9813881
YBS0.3f-m16m202 110 5101075.17293.5014581
bred3-m729 -w0 -M125202 675 9421838.81525.67598598
GRZipII0.24-m1 -b8m -p207 894 667562.16334.168343
ZZIP0.36c-mx -96m211 182 1171147.02408.6918178
GRZip0.7.3211 659 211800.88377.288040
ABC2.4213 717 700782.86632.454556
bwc0.99d-m16m217 224 9051113.81279.6912965
bwtzip012000000220 678 8134566.81614.0957670
ecp0.e.s-b127226 202 0002043.00462.846375
szip1.12-b41o0227 789 0631278.91317.452020
GCA0.9k240 007 964870.09319.75229
bzip0.21251 879 486780.05361.6286
bzip21.0.5-9253 977 891577.95191.3874
IMP1.12-2257 928 864413.20127.5666
X10.95am7264 638 028597.28202.03n/an/a
bwic0crash while compressing
MAr0-a bwtverify error
mnzip05verify error
BA1.01b-k -50 -r -n -xverify error
mcomp2.00-M123M -mwverify error
DC0.98b-cprleab15200verify error

enwik5m

enwik5m
ProgVerArgSizeCTDTCMDM
CompressionRatings.Com
BWMonstr0.021 169 1551484.38832.063332
BWTmix1c12501 193 19018.6210.824343
M030.2a11000000001 194 9465.634.283030
bcm0.09-b1151 195 1669.378.023434
M03exp5.232 MB1 199 291n/an/an/an/a
UHBC1.0-b75m -m31 213 4066.665.56n/an/a
Blizzard0.24bc 1000000001 217 4573.813.233130
bbb1cb500000000q1 219 52821.0320.8648613
GRZipII0.24-m1 -b8m -p1 221 2192.391.753726
ABC2.41 224 0063.883.534353
YBS0.3f-m16m1 226 1463.881.454425
mcomp2.00-M123M -mw1 229 8203.082.492626
BSSC0.95a-b16383aeflz1 230 1953.331.225530
mnzip051 230 9945.761.237628
DC0.98b-cprleab152001 235 2652.201.693125
GRZip0.7.31 236 1934.092.055124
Dark0.51-b112mrf1 241 8662.252.362525
M992.21-m 100m1 244 3334.003.454031
bwic01 245 81210.737.14n/an/a
QLFC6.6100m1 259 2863.811.973029
BMA1.35b-sif0 -mx -m52m1 263 2256.642.235431
chile0.5-b=900001 268 94147.6243.3311830
BA1.01b-k -50 -r -n -x1 270 3764.261.89n/an/a
ZZIP0.36c-mx -96m1 275 2933.672.025825
bwtzip0120000001 316 96112.082.9523030
bwc0.99d-m16m1 318 5424.141.333920
szip1.12-b41o01 322 3395.561.562020
bred3-m729 -w0 -M1251 336 8702.951.692525
MAr0-a bwt1 350 35712.251.833430
ecp0.e.s-b1271 352 9854.922.172529
GCA0.9k1 363 5464.161.66159
bzip0.211 442 5583.911.8386
bzip21.0.5-91 454 7792.700.9774
IMP1.12-21 459 3211.690.7766
X10.95am71 514 6052.890.95n/an/a

book1

book1
ProgVerArgSizeCTDTCMDM
CompressionRatings.Com
BWMonstr0.02204 844175.1479.733030
BWTmix1c1250208 2082.431.682323
bcm0.09-b115208 7261.421.241413
M030.2a1100000000210 4830.850.6965
M03exp5.232 MB211 303n/an/an/an/a
Blizzard0.24bc 100000000212 1300.520.4766
bbb1cb500000000q213 1623.272.274858
YBS0.3f-m16m213 1620.500.2584
UHBC1.0-b75m -m3213 6411.220.81n/an/a
ABC2.4213 9520.580.531921
BSSC0.95a-b16383aeflz214 6180.450.22375
bwic0215 2451.451.12n/an/a
DC0.98b-cprleab15200215 3530.310.3065
GRZipII0.24-m1 -b8m -p215 3770.340.33135
Dark0.51-b112mrf215 5120.330.3954
mnzip05216 5720.770.22405
M992.21-m 100m217 2670.610.55147
mcomp2.00-M123M -mw217 4030.490.4266
GCA0.9k217 7570.660.30117
BMA1.35b-sif0 -mx -m52m219 0940.670.3396
QLFC6.6100m219 1080.440.2865
GRZip0.7.3219 3250.580.36164
ZZIP0.36c-mx -96m220 7460.470.33104
BA1.01b-k -50 -r -n -x221 0990.550.33n/an/a
chile0.5-b=90000222 5137.016.66936
szip1.12-b41o0225 6560.620.2574
bwtzip012000000228 3951.720.44365
bzip0.21230 2470.670.3675
bwc0.99d-m16m230 7250.550.2374
bzip21.0.5-9232 5980.480.1964
bred3-m729 -w0 -M125233 2240.360.2754
MAr0-a bwt234 4352.780.3065
IMP1.12-2235 0730.280.1755
ecp0.e.s-b127239 2580.560.3355
X10.95am7250 1210.480.19n/an/a

obj2

obj2
ProgVerArgSizeCTDTCMDM
CompressionRatings.Com
BWMonstr0.0265 70673.7556.893029
M03exp5.232 MB70 668n/an/an/an/a
M030.2a110000000070 8490.300.2242
BWTmix1c125071 7000.780.612020
bcm0.09-b11572 1110.620.591111
GRZipII0.24-m1 -b8m -p72 3500.100.09102
GRZip0.7.372 6160.170.11112
ABC2.472 9640.430.4198
UHBC1.0-b75m -m373 1910.400.31n/an/a
mnzip0573 1960.220.06352
mcomp2.00-M123M -mw73 4960.150.1343
BSSC0.95a-b16383aeflz73 7980.140.06192
Blizzard0.24bc 10000000074 4490.140.1233
GCA0.9k74 5100.140.0843
QLFC6.6100m74 7890.110.0832
BA1.01b-k -50 -r -n -x74 8360.120.09n/an/a
chile0.5-b=9000075 2142.302.12903
YBS0.3f-m16m75 4100.120.0732
DC0.98b-cprleab1520075 6320.080.0833
Dark0.51-b112mrf75 8510.090.0822
bzip0.2176 0170.170.0832
szip1.12-b41o076 0220.140.0652
ZZIP0.36c-mx -96m76 0990.130.0922
M992.21-m 100m76 3710.240.19124
bzip21.0.5-976 4410.120.0532
bbb1cb500000000q76 5271.251.034857
bwc0.99d-m16m76 7230.140.0632
ecp0.e.s-b12777 2950.160.0822
MAr0-a bwt77 5398.110.0632
bred3-m729 -w0 -M12577 6920.100.0522
IMP1.12-277 9460.070.0322
BMA1.35b-sif0 -mx -m52m78 1160.190.1243
bwic078 7740.470.38n/an/a
bwtzip01200000080 1330.510.14132
X10.95am7crash while decompressing

ttf

ttf
ProgVerArgSizeCTDTCMDM
CompressionRatings.Com
BWMonstr0.02275 070188.12159.003030
BWTmix1c1250295 4751.941.512222
bcm0.09-b115296 1601.781.681312
M03exp5.232 MB300 445n/an/an/an/a
M030.2a1100000000301 0980.910.6964
mcomp2.00-M123M -mw301 4610.450.4155
Blizzard0.24bc 100000000302 3350.380.3155
GCA0.9k303 1610.470.2096
UHBC1.0-b75m -m3303 8021.551.45n/an/a
GRZipII0.24-m1 -b8m -p306 1860.360.38124
bbb1cb500000000q306 8473.053.224857
GRZip0.7.3307 1890.560.45143
mnzip05307 1940.560.19384
BSSC0.95a-b16383aeflz308 7210.380.20204
QLFC6.6100m312 5340.300.2244
ABC2.4313 8130.500.661819
YBS0.3f-m16m314 1270.330.2363
M992.21-m 100m314 2360.800.67145
Dark0.51-b112mrf314 2660.280.3144
DC0.98b-cprleab15200314 6770.250.2354
chile0.5-b=90000316 7715.475.22924
szip1.12-b41o0317 5910.340.1773
bwic0319 2861.241.17n/an/a
BA1.01b-k -50 -r -n -x319 3720.420.33n/an/a
bzip0.21320 6460.440.2554
ZZIP0.36c-mx -96m320 9450.380.3044
bzip21.0.5-9322 1650.300.1453
BMA1.35b-sif0 -mx -m52m323 5450.390.3475
bwc0.99d-m16m324 0150.360.1953
bred3-m729 -w0 -M125326 0090.360.2244
IMP1.12-2327 2180.200.1144
MAr0-a bwt327 95920.360.2054
ecp0.e.s-b127329 0240.500.3144
bwtzip012000000337 7621.660.52264
X10.95am7verify error

Transformation Test

Finally we try to get indicators of possible preprocessing and compressor behavior by applying various transformations to the data before compression. The table below lists compressed sizes for book1 with various transformations added. We use red color if a transformation causes the size to grow compared to the book1 result without transformation and green color if the transformation improves compression.

  1. book1
  2. book1 (reverse)
  3. book1 (x[i]=255-x[i])
  4. book1 (x[i]=0x55^x[i])
  5. book1 (x[i]=x[i]*31 mod 256)
  6. book1 (reorder)
  7. book1 (DC 0.98b -n -s -d -e)

The results indicate that no compressor use heavy preprocessing for the original data. GCA 0.9k, bcm 0.09, BWTmix 1 and Blizzard 0.24 do not benefit from DC preprocessing, which is designed to help BWT compression. This may suggest that these programs may have preprocessing. BWTmix does not, but the program has a case insensitive model, which boosts the performance for text data. ZZIP uses preprocessing, but DC preprocessing works much better for it. This may be the case with some other compressors like BA.

The arguments used for each program are the same as in other tests.

Transformation Test
ProgVer1234567
CompressionRatings.Com
BWMonstr0.02204 844204 966204 183204 361208 456203 476202 298
BWTmix1208 208208 714208 076208 119214 138207 179209 081
bcm0.09208 726209 065208 900208 720213 875207 553209 197
M030.2a1210 483210 681210 362210 399212 304209 791207 109
M03exp5.2211 303n/an/an/an/an/a207 358
Blizzard0.24b212 130212 076211 988212 317217 176211 596212 840
bbb1213 162213 134213 123213 121218 558211 629212 937
YBS0.3f213 162214 333213 178213 144217 332211 628211 025
UHBC1.0213 641214 367213 669213 518217 844212 122210 427
ABC2.4213 952214 785213 970213 942217 641212 356211 218
BSSC0.95a214 618215 733214 581214 461218 668212 930212 577
bwic0215 245217 888216 622215 478220 609213 587213 373
DC0.98b215 353216 517215 410215 300219 460213 744212 725
GRZipII0.24215 377216 271215 241215 139219 331213 621212 296
Dark0.51215 512214 427215 567215 638218 491214 687213 876
mnzip0216 572217 187216 585216 597220 542214 973212 074
M992.21217 267218 248217 320217 173221 478215 445213 859
mcomp2.00217 403217 021217 481217 410222 833215 572215 664
GCA0.9k217 757217 634217 806217 879224 099216 278222 022
BMA1.35b219 094218 416219 144219 124223 561217 241216 698
QLFC6.6219 108219 815219 052219 065222 757217 454213 853
GRZip0.7.3219 325220 073219 230219 120223 375217 541214 913
ZZIP0.36c220 746223 401224 048223 576227 697222 359219 045
BA1.01b221 099223 316223 339223 182226 936221 978219 820
chile0.5222 513224 089222 833222 811227 055221 265217 853
szip1.12225 656225 134225 624225 738227 965224 986221 923
bwtzip0228 395229 679228 289228 359232 383226 803225 882
bzip0.21230 247231 512230 267230 180233 749228 843224 172
bwc0.99d230 725229 556230 732230 763233 225230 180225 876
bzip21.0.5232 598234 538232 568232 559235 851231 416225 420
bred3233 224231 982233 311233 299236 066232 645228 038
MAr0234 435235 166234 558234 371238 855232 998228 514
IMP1.12235 073236 947235 096234 929238 448233 786228 094
ecp0.e.s239 258238 252239 203238 974244 434237 030233 351
X10.95a250 121247 398250 150250 186252 355249 857246 029