BWT Compression Comparison

In this test we compare Burrows-Wheeler based compressors. We especially try to compare such compressors that do not use preprocessing.

  1. Definition
  2. Programs
  3. Memory Requirements
  4. Test Files
  5. Results:
    enwik9
    enwik5m
    zhwik8
    book1
    obj2
    ttf
    Transformations (with percentages)

Definition

We define BWT compressor to be a program that applies BWT transformation before compression. When decompressing the program applies inverse BWT transformation after decompressing the data.

We define BWT compressor without preprocessing to be such that it does not apply any kind of transformation to the data before the context sorting phase of Burrows-Wheeler transform.

Programs

For this test we include compressors that do not qualify for the main benchmark and run compressors with only a single configuration for maximum compression. It follows that some programs have faster modes of operation. See the main benchmarks for exhaustive comparison of such modes. We do not assign ratings and no size for decoder program is considered. Authors are welcome to write feedback if they find their programs are missing or tested incorrectly.

We have no decisive evidence whether a program uses preprocessing or not without the source code. Hence we must use undisclosed arbitrary qualifying methods to choose the set of programs for this test.

Brief comments about the programs which are considered for the test.

dc 0.98b, bssc 0.95a, ba 1.01b, bma 1.35b, ybs 0.3f

Closed source programs that use preprocessing. We try disabling the preprocessing by using the provided switches.

dark 0.51, grzipii 0.24, bsc 2.2.0, szip 1.12, ecp 0.e.s

The programs are open source and have switches to disable preprocessing.

imp 1.12

The program's decoder is open source. The program does not use preprocessing but it switches from BWT to LZ if BWT sorting is problematic. This probably happens in this test only with the enwik9.

zzip 0.36c, bzip 0.21, bzip2 1.0.5, mnzip 0, grzip 0.7.3, bred 3

Programs are open source and use preprocessing. ZZIP has a maximum blocksize of 15 MB.

rings 1.5c

It appears the program is using ST, not BWT. The program apparently uses preprocessing that cannot be disabled.

blizzard 0.24b, sbc 0.970r3, gca 0.9k

Closed source programs that are very likely using preprocessing. No switches are provided to disable it.

m99 2.21, m03 0.2a1, uhbc 1.0, bcm 0.11, mcomp 2.00, m03exp 5.2, bwmonstr 0.02, bwic 0, x1 0.95a, qlfc 6.6

It is likely that preprocessing is not used in these programs.

bwtzip 0, chile 0.5, abc 2.4, bwc 0.99d, bbb 1, mar 0, bwtmix 1

Programs are open source and are not using preprocessing.

The following open source programs are not using preprocessing: bbb, bwtzip, chile, abc, bwc, dark, bwmmix. Because we do not wish to discard all other compressors, we include them anyway with hopes that they do not use heavy preprocessing for the test files. We discard rings and sbc since they do not fit well to the concept of BWT without preprocessing. We would have reasons to discard other compressors such as blizzard, imp, zzip, but we hope their preprocessing is rather inefficient for our test files and we include them anyway.

The programs are set up with maximum memory setting that works in the test machine. For bma the following switches are used in addition to the ones listed in the tables: '-rv- -pe- -t- -d- -l- -z- -tx-'. We run bma with -sif0 because otherwise the program attempts to choose the best compression method that may lead of confusing results especially with the transformation test.

Update 2010 spring: some of the very old compressors no longer run on the test platform: ba, x1, uhbc, m03exp.

Memory Requirements

Program Version Space for a block
bwmonstr0.02~0.6n
bbb11.25n (+4n)
bwtmix15n
bcm0.115n
bsc2.2.05n
blizzard0.24b6n
m030.2a16n
uhbc1.08n
dark0.515n
m992.216n
qlfc6.66n
m03exp5.212n?
chile0.57n
bma1.35b12n
bssc0.95a9n
ybs0.3f9n
bred35n
grzipii0.2410n
zzip0.36c12n
grzip0.7.310n?
abc2.410n?
bwc0.99d8n
bwtzip0~50n
ecp0.e.s5n?
szip1.125n
gca0.9k?
bzip0.218n?
bzip21.0.58n
imp1.125n
x10.95a5n?

Most programs use 5 to 10n blocksize memory. bbb 1 and bwmonstr 0.02 work with less than 5n memory. bbb uses 1.25n blocksize memory, but requires 4n space for temporary files and runs out of disk space on the test platform with 500MB block, so we use 334MB block instead. bwmonstr uses less than 0.6n blocksize memory. The table beside gives an estimated memory requirements for each compressor compressing enwik9.

Test files

We will use special test files for this test: enwik9 for a large test file. A smaller 5 000 000 byte sample of enwik9 to attempt get as many programs as possible to compress the file using a single block. We also add book1 and obj2 (from the Calgary Corpus) and a sample of Chinese Wikipedia (zhwik8). Finally a random file DejaVuSans.ttf from Application1.

enwik9

enwik9
ProgramVerArgumentsSize %C.Time %D.Time %C.MD.M
 
bwmonstr0.02160 468 59716.063790.0939053012.55322592587
bbb1cb333333334172 179 82417.22201.361001380.86100446406
bwtmix1c1250178 510 04317.91924.28100824.00100618618
bsc2.4.5-pt -m3 -b120179 910 09018.0149.2217445.91308605605
bcm0.11-b118180 484 78718.0458.06100331.92100594594
m030.2a1100000000180 893 11118.1526.25100394.20100579576
blizzard0.24bc 100000000182 780 26418.3377.67100296.12100575574
mcomp2.00-M112M -mw183 146 63118.3257.80100241.66100566565
dark0.51p-b112mrf185 516 04218.6242.20100275.56100564564
m992.21-m 96m186 635 79518.7365.38100303.12100591581
qlfc6.6100000000190 040 64219.0527.52100201.98100576576
bma1.35b-sif0 -mx -m52m195 271 57419.5711.22100211.06100595379
bssc0.95a-b16383aeflz201 678 50520.2291.48100120.7310013982
ybs0.3f-op -m16m202 110 51020.2314.88100134.6110014682
bred3-w0 -M100 -m729204 611 93420.5366.89100174.53100480480
grzipii0.24-m1 -b8m -p207 894 66720.8204.75100151.221008444
zzip0.36c-mx -15m211 182 11721.1349.67100191.8310018379
grzip0.7.3211 659 21121.2230.25100157.801008141
abc2.4213 717 70021.4279.34100264.501004657
bwtzip012000000220 678 81322.11594.38100271.4510057871
ecp0.e.s-b127226 202 00022.6820.17100183.811006476
gca0.9k240 007 96424.0209.7010095.811002310
bzip0.21251 879 48625.2358.64100135.3610097
bzip21.0.5253 977 89125.4126.3110049.0510085
imp1.12-2257 928 86625.8108.349925.9110077
chile0.5-b=90000disqualified: verify error
mar0-a bwtdisqualified: verify error
mnzip05disqualified: verify error
dc0.98b-cprleab15200disqualified: verify error

enwik5m

enwik5m
ProgramVerArgumentsSize %C.Time %D.Time %C.MD.M
 
bwmonstr0.021 169 15523.4306.09389279.053163333
bwtmix1c12501 193 19023.96.341004.011004444
m030.2a11000000001 194 94623.92.421001.891003130
bcm0.11-b1181 203 33024.11.69991.441002626
bsc2.4.5-pt -m3 -b1201 207 96824.20.482740.234002626
blizzard0.24bc 1000000001 217 45724.31.251001.201003131
bbb1cb3333333341 219 52824.48.141006.2510032813
grzipii0.24-m1 -b8m -p1 221 21924.40.841000.781003827
abc2.41 224 00624.51.391001.481004554
ybs0.3f-op -m16m1 226 14624.51.261000.661004525
mcomp2.00-M112M -mw1 229 82024.61.031001.031002727
bssc0.95a-b16383aeflz1 230 19524.61.261000.551005631
mnzip051 230 99424.61.941000.551007729
dc0.98b-cprleab152001 235 26524.70.80980.80983226
grzip0.7.31 236 19324.71.191000.841005225
dark0.51p-b112mrf1 241 86624.80.801001.061002626
m992.21-m 96m1 244 33324.91.701001.501004032
qlfc6.61000000001 259 28625.21.301000.781003130
bma1.35b-sif0 -mx -m52m1 263 22525.32.031000.911005532
chile0.5-b=900001 268 94125.424.4210023.0310011931
zzip0.36c-mx -15m1 275 29325.51.391000.941005926
bwtzip0120000001 316 96126.35.411001.3610023131
bred3-w0 -M100 -m7291 336 87026.70.911000.611002625
mar0-a bwt1 350 35727.04.731000.661003530
ecp0.e.s-b1271 352 98527.11.881000.911002630
gca0.9k1 363 54627.30.94980.52971610
bzip0.211 442 55828.91.831000.7010097
bzip21.0.51 454 77929.10.581000.2510085
imp1.12-21 459 32329.20.451000.1610077

zhwik8

zhwik8
ProgramVerArgumentsSize %C.Time %D.Time %C.MD.M
 
bwmonstr0.0221 834 05021.86121.003905635.953169494
bwtmix1c125022 458 89222.5171.7010083.50100499499
m030.2a110000000022 521 59822.551.4710041.64100577576
bsc2.4.5-pt -m3 -b12022 567 04822.613.121914.89321481481
bbb1cb33333333422 743 53322.7184.24100139.44100362127
bcm0.11-b11822 751 40822.843.5210032.38100481481
blizzard0.24bc 10000000022 945 12922.935.0910029.17100574572
mcomp2.00-M112M -mw22 957 47223.024.2210022.64100482482
m992.21-m 96m23 463 76823.536.7810032.88100586577
dark0.51p-b112mrf23 487 45923.523.2010026.38100481480
qlfc6.610000000023 604 16323.639.5810019.50100575575
chile0.5-b=9000024 176 10924.2519.12100467.34100619531
bma1.35b-sif0 -mx -m52m24 731 67624.754.8610023.39100583328
bssc0.95a-b16383aeflz25 166 00325.225.8910011.6710013982
ybs0.3f-op -m16m25 502 20625.526.5010013.3310014682
bred3-w0 -M100 -m72925 570 77125.629.1210016.08100480480
dc0.98b-cprleab1520025 787 90525.819.5510015.92999377
grzipii0.24-m1 -b8m -p25 975 62326.019.2510016.661006044
grzip0.7.326 210 90426.223.4410017.621007940
zzip0.36c-mx -15m26 312 98726.328.0310019.171009277
abc2.426 804 04726.826.2310027.891004656
bwtzip01200000027 498 05627.5149.1910027.6610054871
ecp0.e.s-b12727 745 64827.743.9510018.191006476
gca0.9k29 536 14929.518.861009.91992310
bzip0.2130 254 45930.334.0510014.1210097
imp1.12-230 507 73030.58.33993.009777
bzip21.0.530 541 16730.511.201005.0510085
mar0-a bwtdisqualified: verify error
mnzip05disqualified: verify error

book1

book1
ProgramVerArgumentsSize %C.Time %D.Time %C.MD.M
 
bwmonstr0.02204 84426.637.6138427.953453131
bwtmix1c1250208 20827.10.881000.611002424
bcm0.11-b118209 77127.30.201000.199966
m030.2a1100000000210 48327.40.361000.2710076
blizzard0.24bc 100000000212 13027.60.161000.1210076
bsc2.4.5-pt -m3 -b120213 11827.70.064020.034031717
ybs0.3f-op -m16m213 16227.70.111000.0610085
bbb1cb333333334213 16227.71.241000.731003278
abc2.4213 95227.80.201000.171002022
bssc0.95a-b16383aeflz214 61827.90.16990.0598376
dc0.98b-cprleab15200215 35328.00.11850.118576
grzipii0.24-m1 -b8m -p215 37728.00.111000.09100145
dark0.51p-b112mrf215 51228.00.111000.119965
mnzip05216 57228.20.191000.05100416
m992.21-m 96m217 26728.30.251000.20100157
mcomp2.00-M112M -mw217 40328.30.161000.148977
gca0.9k217 75728.30.14890.0983128
bma1.35b-sif0 -mx -m52m219 09428.50.201000.0999107
qlfc6.6100000000219 10828.50.161000.0610066
grzip0.7.3219 32528.50.16990.1199165
zzip0.36c-mx -15m220 74628.70.17990.1199105
chile0.5-b=90000222 51328.93.671003.48100946
bwtzip012000000228 39529.70.781000.17100376
bzip0.21230 24730.00.301000.119886
bzip21.0.5232 59830.30.091000.039775
bred3-w0 -M100 -m729233 22430.30.11990.0510055
mar0-a bwt234 43530.51.051000.059876
imp1.12-2235 07530.60.08990.0110066
ecp0.e.s-b127239 25831.10.221000.0910066

obj2

obj2
ProgramVerArgumentsSize %C.Time %D.Time %C.MD.M
 
bwmonstr0.0265 70626.615.7338517.002993130
m030.2a110000000070 84928.70.12860.0910053
bwtmix1c125071 70029.10.281000.201002121
grzipii0.24-m1 -b8m -p72 35029.30.031000.03100113
grzip0.7.372 61629.40.05980.03100122
bcm0.11-b11872 94729.60.061000.059833
abc2.472 96429.60.161000.14100109
mnzip0573 19629.70.061000.01100363
mcomp2.00-M112M -mw73 49629.80.05980.056654
bssc0.95a-b16383aeflz73 79829.90.05980.01100193
bsc2.4.5-pt -m3 -b12073 92130.00.033520.014131616
blizzard0.24bc 10000000074 44930.20.051000.0510043
gca0.9k74 51030.20.05670.034854
qlfc6.610000000074 78930.30.051000.0110033
chile0.5-b=9000075 21430.51.241001.12100913
ybs0.3f-op -m16m75 41030.60.03970.0110043
dc0.98b-cprleab1520075 63230.60.05660.056744
dark0.51p-b112mrf75 85130.70.031000.0310033
bzip0.2176 01730.80.081000.0310043
zzip0.36c-mx -15m76 09930.80.051000.0310033
m992.21-m 96m76 37130.90.091000.0879134
bzip21.0.576 44131.00.011000.0110033
bbb1cb33333333476 52731.00.451000.331003278
ecp0.e.s-b12777 29531.30.061000.0310033
mar0-a bwt77 53931.44.311000.0110043
bred3-w0 -M100 -m72977 69231.50.031000.0110033
imp1.12-277 94931.60.011000.00033
bma1.35b-sif0 -mx -m52m78 11631.60.06980.0310054
bwtzip01200000080 13332.50.221000.0698133

ttf

ttf
ProgramVerArgumentsSize %C.Time %D.Time %C.MD.M
 
bwmonstr0.02275 07047.338.6238748.502823131
bwtmix1c1250295 47550.80.701000.521002323
bcm0.11-b118299 78451.60.161000.1410055
m030.2a1100000000301 09851.80.381000.3010075
mcomp2.00-M112M -mw301 46151.90.121000.128766
blizzard0.24bc 100000000302 33552.00.12870.1110065
bsc2.4.5-pt -m3 -b120302 98252.10.063500.053961717
gca0.9k303 16152.10.12860.0879107
grzipii0.24-m1 -b8m -p306 18652.70.141000.14100135
bbb1cb333333334306 84752.80.971000.891003278
grzip0.7.3307 18952.80.17990.16100154
mnzip05307 19452.80.171000.0698395
bssc0.95a-b16383aeflz308 72153.10.12990.06100215
qlfc6.6100000000312 53453.80.11990.089955
abc2.4313 81354.00.191000.251001921
ybs0.3f-op -m16m314 12754.00.091000.0810074
m992.21-m 96m314 23654.10.331000.2894156
dark0.51p-b112mrf314 26654.10.111000.119955
dc0.98b-cprleab15200314 67754.10.11850.118565
chile0.5-b=90000316 77154.52.921002.77100935
bzip0.21320 64655.20.221000.119965
zzip0.36c-mx -15m320 94555.20.161000.1210054
bzip21.0.5322 16555.40.08990.0310064
bma1.35b-sif0 -mx -m52m323 54555.70.141000.1110086
bred3-w0 -M100 -m729326 00956.10.121000.0610054
imp1.12-2327 21856.30.061000.0110055
mar0-a bwt327 95956.411.411000.0610065
ecp0.e.s-b127329 02456.60.201000.1210055
bwtzip012000000337 76258.10.671000.20100275

Transformation Test

Finally we try to get indicators of possible preprocessing and compressor behavior by applying various transformations to the data before compression. The table below lists compressed sizes for book1 with various transformations added. We use red color if a transformation causes the size to grow compared to the book1 result without transformation and green color if the transformation improves compression.

  1. book1
  2. book1 (reverse)
  3. book1 (x[i]=255-x[i])
  4. book1 (x[i]=0x55^x[i])
  5. book1 (x[i]=x[i]*31 mod 256)
  6. book1 (reorder)
  7. book1 (dc 0.98b -n -s -d -e)

The arguments used for each program are the same as in other tests.

Transformations
ProgVerbook12 (rev)3 (not)4 (xor)5 (mul)6 (reo)7 (dc)
 
bwmonstr0.02204 844204 966204 183204 361208 456203 476202 298
bwtmix1208 208208 714208 076208 119214 138207 179209 081
bcm0.11209 771210 125209 937209 789215 063208 498210 218
m030.2a1210 483210 681210 362210 399212 304209 791207 109
blizzard0.24b212 130212 076211 988212 317217 176211 596212 840
bsc2.4.5213 118213 934213 131213 113216 969211 633210 124
bbb1213 162213 134213 123213 121218 558211 629212 937
ybs0.3f213 162214 333213 178213 144217 332211 628211 025
abc2.4213 952214 785213 970213 942217 641212 356211 218
bssc0.95a214 618215 733214 581214 461218 668212 930212 577
dc0.98b215 353216 517215 410215 300219 460213 744212 725
grzipii0.24215 377216 271215 241215 139219 331213 621212 296
dark0.51215 512214 427215 567215 638218 491214 687213 876
mnzip0216 572217 187216 585216 597220 542214 973212 074
m992.21217 267218 248217 320217 173221 478215 445213 859
mcomp2.00217 403217 021217 481217 410222 833215 572215 664
gca0.9k217 757217 634217 806217 879224 099216 278222 022
bma1.35b219 094218 416219 144219 124223 561217 241216 698
qlfc6.6219 108219 815219 052219 065222 757217 454213 853
grzip0.7.3219 325220 073219 230219 120223 375217 541214 913
zzip0.36c220 746223 401224 048223 576227 697222 359219 045
chile0.5222 513224 089222 833222 811227 055221 265217 853
bwtzip0228 395229 679228 289228 359232 383226 803225 882
bzip0.21230 247231 512230 267230 180233 749228 843224 172
bzip21.0.5232 598234 538232 568232 559235 851231 416225 420
bred3233 224231 982233 311233 299236 066232 645228 038
mar0234 435235 166234 558234 371238 855232 998228 514
imp1.12235 075236 949235 099234 931238 449233 788228 095
ecp0.e.s239 258238 252239 203238 974244 434237 030233 351

The results indicate that no compressor use heavy preprocessing for the original data. gca 0.9k, bcm 0.11, bwtmix 1 and blizzard 0.24 do not benefit from dc preprocessing, which is designed to help BWT compression. This may suggest that these programs may have preprocessing. bwtmix does not, but the program has a case insensitive model, which boosts the performance for text data. zzip uses preprocessing, but dc preprocessing works much better for it. This may be the case with some other compressors like ba.

Here we have percentage differences instead of the absolute sizes for the same data. A sum values are provided to show the sum of size differences for all transforms. Here a large negative number is desired. We also add a mean of absolute percentages for alphabet permutation transformations (this excludes the dc preprocessing). A low number means the model does not see how different contexts are arranged. This may be negative or positive depending whether such property desired or not. Finally we list average gain for each transform. The row also shows the mean for "sum" and "mean" columns.

Transformations (percentages)
ProgVer2 (rev)3 (not)4 (xor)5 (mul)6 (reo)7 (dc)summean
 
bwmonstr0.020.060 %-0.323 %-0.236 %1.763 %-0.668 %-1.243 %-0.108 %0.610 %
bwtmix10.243 %-0.063 %-0.043 %2.848 %-0.494 %0.419 %0.485 %0.738 %
bcm0.110.169 %0.079 %0.009 %2.523 %-0.607 %0.213 %0.398 %0.677 %
m030.2a10.094 %-0.057 %-0.040 %0.865 %-0.329 %-1.603 %-0.178 %0.277 %
blizzard0.24b-0.025 %-0.067 %0.088 %2.379 %-0.252 %0.335 %0.410 %0.562 %
bsc2.4.50.383 %0.006 %-0.002 %1.807 %-0.697 %-1.405 %0.015 %0.579 %
bbb1-0.013 %-0.018 %-0.019 %2.531 %-0.719 %-0.106 %0.276 %0.660 %
ybs0.3f0.549 %0.008 %-0.008 %1.956 %-0.720 %-1.003 %0.130 %0.648 %
abc2.40.389 %0.008 %-0.005 %1.724 %-0.746 %-1.278 %0.016 %0.575 %
bssc0.95a0.520 %-0.017 %-0.073 %1.887 %-0.787 %-0.951 %0.096 %0.657 %
dc0.98b0.541 %0.026 %-0.025 %1.907 %-0.747 %-1.220 %0.080 %0.649 %
grzipii0.240.415 %-0.063 %-0.111 %1.836 %-0.815 %-1.431 %-0.028 %0.648 %
dark0.51-0.503 %0.026 %0.058 %1.382 %-0.383 %-0.759 %-0.030 %0.471 %
mnzip00.284 %0.006 %0.012 %1.833 %-0.738 %-2.077 %-0.113 %0.575 %
m992.210.452 %0.024 %-0.043 %1.938 %-0.839 %-1.569 %-0.006 %0.659 %
mcomp2.00-0.176 %0.036 %0.003 %2.498 %-0.842 %-0.800 %0.120 %0.711 %
gca0.9k-0.056 %0.023 %0.056 %2.912 %-0.679 %1.959 %0.702 %0.745 %
bma1.35b-0.309 %0.023 %0.014 %2.039 %-0.846 %-1.094 %-0.029 %0.646 %
qlfc6.60.323 %-0.026 %-0.020 %1.665 %-0.755 %-2.398 %-0.202 %0.558 %
grzip0.7.30.341 %-0.043 %-0.093 %1.847 %-0.813 %-2.012 %-0.129 %0.628 %
zzip0.36c1.203 %1.496 %1.282 %3.149 %0.731 %-0.771 %1.182 %1.572 %
chile0.50.708 %0.144 %0.134 %2.041 %-0.561 %-2.094 %0.062 %0.718 %
bwtzip00.562 %-0.046 %-0.016 %1.746 %-0.697 %-1.100 %0.075 %0.613 %
bzip0.210.549 %0.009 %-0.029 %1.521 %-0.610 %-2.638 %-0.200 %0.544 %
bzip21.0.50.834 %-0.013 %-0.017 %1.399 %-0.508 %-3.086 %-0.232 %0.554 %
bred3-0.533 %0.037 %0.032 %1.219 %-0.248 %-2.224 %-0.286 %0.414 %
mar00.312 %0.052 %-0.027 %1.885 %-0.613 %-2.526 %-0.153 %0.578 %
imp1.120.797 %0.010 %-0.061 %1.435 %-0.547 %-2.969 %-0.223 %0.570 %
ecp0.e.s-0.420 %-0.023 %-0.119 %2.163 %-0.931 %-2.469 %-0.300 %0.731 %
average gain0.265 %0.043 %0.024 %1.955 %-0.602 %-1.307 %0.063 %0.640 %