The following is quoted from bred.ps, in which Wheeler described the programs bred (block reduce) and bexp (block expand) in 1995.
The data transforms are as follows: First the data is read and, where there are more than two repeats of a character, the sequence is slightly changed in a recoverable way so that the maximum length of a repeated character is about 31 for 2^32 repeats. This avoids a special case and eliminates the slow-down of the quicksort routine for repeated characters. [RLE] (bred.ps)
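Wheeler does not spell out the exact transform, but a recoverable run-capping in the spirit of bzip2's first RLE stage can be sketched as follows. The threshold of four literals and the single extra-count byte are illustrative assumptions, not Wheeler's actual parameters:

```python
def rle_cap_encode(data: bytes) -> bytes:
    """Cap runs: a run of 4+ equal bytes becomes 4 literals plus a count byte
    (0-255 extra repeats). Illustrative parameters, not Wheeler's."""
    out = bytearray()
    i = 0
    while i < len(data):
        c = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == c and run < 259:
            run += 1
        if run >= 4:
            out += bytes([c]) * 4
            out.append(run - 4)   # extra repeats beyond the first four
        else:
            out += bytes([c]) * run
        i += run
    return bytes(out)

def rle_cap_decode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    run, prev = 0, None
    while i < len(data):
        c = data[i]; i += 1
        out.append(c)
        run = run + 1 if c == prev else 1
        prev = c
        if run == 4:              # four literals seen: next byte is the count
            extra = data[i]; i += 1
            out += bytes([c]) * extra
            run, prev = 0, None   # count bytes do not feed the run counter
    return bytes(out)
```

Because runs in the encoded stream are bounded, the sort that follows no longer degenerates on long repeats, and the transform is exactly invertible.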
Next the character string is grouped by each character and a set of pointers is generated to each character set. The characters are packed four to a word with a separate word for each character. The data is sorted by a quicksort variant dealing with the smallest groups first. After each group is sorted the lower three characters are replaced by an integer giving the position in that group. Thus the remaining sorts still work and the depths of comparisons are reduced. [BWT sorting phase]
The next or predicted characters are scanned and converted by move to front [MTF], and the frequencies established.
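The move-to-front step is the standard transform; a minimal sketch:

```python
def mtf_encode(data: bytes) -> list:
    """Replace each byte by its index in a list that moves used bytes to front."""
    table = list(range(256))
    out = []
    for b in data:
        i = table.index(b)
        out.append(i)
        table.insert(0, table.pop(i))
    return out

def mtf_decode(indices: list) -> bytes:
    table = list(range(256))
    out = bytearray()
    for i in indices:
        b = table.pop(i)
        out.append(b)
        table.insert(0, b)
    return bytes(out)
```

Runs of an identical character in the sorted output become runs of zeroes, which is what makes the later zero-run coding profitable.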
From the frequencies we set up a Huffman coding table. The code used is not a Huffman code but one of equivalent performance. We use the lengths of codes generated by a Huffman process, but the codes are derived from the length information and the known ordering of the alphabet. Thus only the lengths are needed to transmit the coding. [HUF] They are transmitted taking some advantage of the fact that the table is almost monotonically decreasing; the table of lengths is transmitted first, followed by the encoded data.
Huffman and other coding
The basic coding used is not Huffman although it is of equal efficiency. The Huffman coding lengths are derived, and then a code constructed from the lengths with the following extra properties. The maximum length of a code is limited to 23 bits. For the codes of a given length, the code values are in reverse order of the character set. The longer codes have the smaller coding values. The longest code of all zeros is used as the end of file. These properties are used to simplify the decoding. The program also constructs a table to decode directly all codes of less than nine bits.
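Deriving actual code values from the lengths alone is the canonical-code construction. The sketch below uses the common convention (shorter codes get smaller values, symbols in alphabet order); Wheeler's variant reverses both orderings so that the longest code can be all zeros and serve as end-of-file:

```python
def canonical_codes(lengths: dict) -> dict:
    """lengths: symbol -> code length in bits (must satisfy the Kraft
    inequality). Returns symbol -> code as a bit string.
    Common convention; Wheeler's variant reverses the orderings."""
    codes = {}
    code = 0
    prev_len = 0
    for sym, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= (length - prev_len)   # extend the code when length grows
        codes[sym] = format(code, "0%db" % length)
        code += 1
        prev_len = length
    return codes
```

The result is prefix-free by construction, which is why transmitting only the lengths suffices to reconstruct the whole table at the decoder.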
As the predicted character list has a very large number of repeats, the frequency of zeroes may exceed one half, leading to a lack of efficiency of the code, which will need at least one bit for each coding. To avoid this problem, and to use the fact that there is correlation between successive zeroes, we use a one-two grouping of zeroes (i.e. immediate repeats). This works by extending the character set by one value. We increase other codes by 1 and use 0 or 1 to represent groups of zeroes as follows. The first 0/1 gives 1/2 zeroes. The second 0/1 gives 2/4 zeroes. The third 0/1 gives 4/8 zeroes etc. This nearly always gives some correlation gain, but is asymptotically poor. In fact for very large numbers of repeats, a 1,2,3,4 system will give improvements.
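This 1/2 grouping is bijective base-2 numeration: the symbol at position i stands for either 1·2^i or 2·2^i zeroes. A sketch of encoder and decoder, with symbol 0 as the "1" digit and symbol 1 as the "2" digit:

```python
def encode_zero_run(n: int) -> list:
    """Encode a run of n >= 1 zeroes as a list of 0/1 symbols
    (bijective base-2: digit at position i is worth (symbol+1) * 2**i)."""
    syms = []
    while n > 0:
        d = (n - 1) % 2 + 1   # digit 1 or 2
        syms.append(d - 1)    # emitted symbol 0 or 1
        n = (n - d) // 2
    return syms

def decode_zero_run(syms: list) -> int:
    return sum((s + 1) << i for i, s in enumerate(syms))
```

Every run length has exactly one representation, so no escape codes are needed, and a run of n zeroes costs only about log2(n) symbols.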
Usually, the frequency of the 0's is greater than that of the 1's, often leading to codes of different lengths. It is obvious in this case that when the repeat number uses three or more codes, a permutation would save bits. In fact, the coding for 7 to 14 should be 000,001,010,100,011,101,110,111 rather than 000,001,010,011,100,101,110,111. This assumes longer strings are less frequent.
Long strings of repeats are removed in a way similar to the 1/2 method used for long strings of zeroes produced by the move to front algorithm. If a character c is repeated n times, making n+1 in all, then after the first two we encode with c and c+1, giving 1/2, 2/4 etc. If the character following the groups of c's is c+1 or c+2, an extra c+2 is inserted.
This maintains most of the predictability, and the losses if any seem small. It is possible for a file to become larger, but this probably enhances the local predictability so that the overall effect is quite small.
The character string is sorted by prefix backwards, and a list of pointers initially pointing at every prefix is rearranged so that the list of pointers gives the prefixes in alphabetic order. Thus the text "there are twelve" would be sorted as shown below.

position  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
letter    t  h  e  r  e     a  r  e     t  w  e  l  v  e
pointer  10  6  7  3  9  5 16 13  2 14  8  4  1 11 15 12
This assumes that before t there is a start character whose value is lower than the value of space. First, pointers to machine words are set, where each word contains four characters arranged backwards. By counting the frequencies of the letters on input we can set pointers in groups to point at those prefixes starting with a, b, c etc. The groups are arranged in order of size and the smallest group sorted first by the quicksort routine QQS. QQS splits the group being sorted into those that are equal and then uses the next word (four characters down) to continue the sort. This depth-first strategy uses the cache effectively; many apparently better methods are slow because the cache is not used effectively. When a group is sorted, the three least significant bytes are replaced by the sequence number (plus 32) in that group. Thus any further sort that uses data from that group will not need further depth to complete the sort.
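The pointer ordering in the "there are twelve" example can be reproduced directly: each position's sort key is its prefix read backwards, terminated by a start character that sorts below space (NUL here, as an assumption standing in for Wheeler's start character):

```python
def backward_prefix_order(text: str, start: str = "\x00") -> list:
    """Return 1-based positions sorted by their prefix read backwards,
    with a start character (lower than space) terminating every key."""
    def key(i):
        # characters at positions i, i-1, ..., 1, then the start character
        return text[:i][::-1] + start
    return sorted(range(1, len(text) + 1), key=key)
```

Running this on "there are twelve" yields exactly the pointer row of the example table.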
In front of the words we place 32 zero words. As no internal prefix can contain a run of more than 32 zeroes, this causes comparisons, however deep, to end correctly, provided we also scan the first prefixes and avoid sorting initial zeroes. We do not need to sort them, and if we did we would get the wrong sort! A start-of-text character of -1, rather than a long enough string of zeroes, would have avoided this hazard, at the cost of one character value beyond 256.
The most significant bit of the packed words is flipped so that unsigned comparisons can be performed using signed long values.
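The trick works because XOR-ing the top bit maps unsigned order onto signed order. Wheeler's code does this in C on the packed words; the Python sketch below merely demonstrates the property for 32-bit values:

```python
def flip_msb32(x: int) -> int:
    """Flip the most significant bit of a 32-bit value."""
    return x ^ (1 << 31)

def as_signed32(x: int) -> int:
    """Reinterpret a 32-bit pattern as a signed integer (two's complement)."""
    return x - (1 << 32) if x & (1 << 31) else x

# Sorting the flipped values as signed integers gives the unsigned order:
# the smallest unsigned value 0 becomes the most negative signed value.
```

This lets a machine with only signed comparisons (or a language whose `long` compare is signed) sort the packed character words in correct byte order.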
Three years later, for version 3, Wheeler described the modifications in bred3.ps.
The Bred routine set has been changed and improved. The input filter to remove multiples has been removed, and a different sorting algorithm is used which deals well with simple repeats. To ensure workability in all cases, when the sorting algorithm becomes slow the block size is repeatedly halved, so that the block is still compressed, but at a lower efficiency due to the smaller block size. (bred3.ps)
A set of tables can be used (up to eight in the current implementation), which gives adaptation to changing statistics and reduces the inefficiencies of Huffman coding. Currently, the -m option followed by a decimal number specifies the number of groups (1-8), the size of the groups (25, 50, 100 or 200) and the maximum number of iterations allowed. The default is one group, for which no iterations take place.
The routine improves coding efficiency by scanning the data to be coded after the move to front and 1/2 transforms, and choosing the best table of the set for the next group of symbols. The selection is coded and mixed with the rest of the coded data. The extra cost of the tables and selection bits is more than compensated for by the improved coding efficiency, although small files default to a single table.
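Choosing "the best table of the set for the next group of symbols" reduces, for each group, to picking the table whose code lengths give the smallest total bit count. A sketch under illustrative assumptions (the group size and the table contents are examples, not Wheeler's data):

```python
def pick_table(group: list, tables: list) -> int:
    """tables: list of dicts mapping symbol -> code length in bits.
    Return the index of the table coding this group in the fewest bits."""
    costs = [sum(t[s] for s in group) for t in tables]
    return min(range(len(tables)), key=costs.__getitem__)

def select_tables(symbols: list, tables: list, group_size: int = 50) -> list:
    """One table selection per group of symbols, as in the -m scheme."""
    return [pick_table(symbols[i:i + group_size], tables)
            for i in range(0, len(symbols), group_size)]
```

The selections themselves must also be transmitted, which is the "extra cost of the tables and selection bits" the text mentions; it pays off because zero-heavy and literal-heavy groups get tables tuned to their statistics.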
The method could be readily improved by additional searching, without slowing the decoding any further. However, the small additional gain would be bought at a cost of considerable extra time.
The bulk space needed by bred3 is five times the -K parameter + 256,000 bytes. [...] The block size is reduced under certain rare conditions.
The bulk space used by bexp3 is four (or eight, where long is eight bytes) times the -K parameter.
Our modified bred3 (bred version 3) and its expansion routine (bexp) support large block sizes, use 5n instead of 4n for blocks larger than 16 MB, and include an option to disable the block halving, etc.
The compressor is featured in the BWT comparison.
(Have info that should be added here? E-mail.)
Wheeler's original compressor with functional modifications. The default configuration crashes on Image1.