Versions tested: 8
Optimal parameters: 7
Authors: Matt Mahoney, Alexander Ratushnyak
Algorithms: X86, CM
Notable performances: Source1


lpaq1 is a "lite" version of PAQ, about 35 times faster than paq8l at the cost of some compression (but similar to high end BWT and PPM compressors). It is a context-mixing compressor combining 7 contexts: orders 1, 2, 3, 4, 6, a lowercase unigram word context (for ASCII text), and a "match" order, which predicts the next bit in the last matching context. The independent bit predictions of the 7 models are combined using one of 80 neural networks (selected by a small context), then adjusted using 2 SSE stages (order 0 and 1) and arithmetic coded.

Prediction is bitwise. This means that an order-n context consists of the last n whole bytes plus any of the 0 to 7 previously coded bits of the current byte starting with the most significant bit. The unigram word context consists of a hash of the last (at most) 11 consecutive letters (A-Z, a-z) folded to lower case. The context does not include any nonalphabetic characters nor any characters preceding the last nonalphabetic character.
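
The word-context rule can be sketched as follows (the struct name and hash constant are my own, and for simplicity this hashes the whole current word rather than capping at 11 letters). A non-letter resets the hash, so the context never includes anything before the last nonalphabetic character:

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>

// Sketch of the unigram word context: hash consecutive letters folded
// to lower case; any non-letter resets the context.
struct WordContext {
    uint32_t hash = 0;                 // hash of the current word so far
    void update(int c) {
        if (std::isalpha(c))
            hash = hash * 2654435761u + std::tolower(c);  // fold case
        else
            hash = 0;                  // word boundary: reset
    }
};
```

Because of the case folding and the reset, "The" and "the" after a space produce the same context.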

The first 6 contexts (orders 1..4, 6, word) are used to index a hash table to look up a bit-history represented by an 8-bit state. The representable states are the same as in paq8l. A state can either represent all histories up to 4 bits long, or a pair of 0,1 counts plus a flag to indicate the most recent bit. The counts are bounded by (41,0), (40,1), (12,2), (5,3), (4,4) and likewise for 1,0. When a count is exceeded, the opposite count is reduced to approximately preserve the count ratio. The last bit flag is present only for states whose total count is less than 16. There are 253 possible states.

A bit history is mapped to a probability using an adaptive table (StateMap). This differs from paq8l in that each table entry includes a count so that adaptation is rapid at first. Each table entry has a 22-bit probability (initially p = 0.5) and 10-bit count (initially n = 0) packed into 32 bits. After bit y is predicted, n is incremented up to the limit (1023) and the probability is adjusted by p := p + (y - p)/(n + 0.5). This model is stationary: p = (n1 + 0.5)/(n + 1), where n1 is the number of times y = 1 out of n.
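
The update rule can be written directly. This is a floating-point sketch with my own naming; lpaq1 packs the 22-bit probability and 10-bit count into a single 32-bit word and works in integer arithmetic:

```cpp
#include <cassert>

// Sketch of one StateMap table entry and its update rule.
struct StateMapEntry {
    double p = 0.5;   // estimated probability that the next bit is 1
    int n = 0;        // update count, capped at 1023
    void update(int y) {            // y = observed bit, 0 or 1
        if (n < 1023) ++n;          // count up to the limit
        p += (y - p) / (n + 0.5);   // move p toward y, less as n grows
    }
};
```

The shrinking step size 1/(n + 0.5) gives rapid adaptation while n is small and near-stationary averaging once n is large.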

The "match" model (MatchModel) looks up the current context in a hash table, first using a longer context, then a shorter one. If a match is found, then the following bits are predicted until there is a misprediction. The prediction is computed by mapping the predicted bit, the length of the match (1..15 or quantized by 4 in 16..62, max 62), and the last whole byte as context into a StateMap. If no match is found, then the order 0 context (last 0..7 bits of the current byte) is used as context to the StateMap.
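
The core idea can be sketched at the byte level (lpaq1 predicts bit by bit, uses two context lengths, and a much larger index; the hash width, context length of 4 bytes, and all names here are my simplifications): remember where each context last occurred, and when it recurs, predict that the bytes after the old occurrence will repeat.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Byte-level sketch of a match model: an input history plus a hash
// table mapping a context hash to the position that followed it.
struct MatchSketch {
    std::vector<unsigned char> buf;      // history of all bytes seen
    std::vector<int> index;              // context hash -> next position
    int matchPtr = -1;                   // position we are copying from
    uint32_t hist = 0;                   // last 4 bytes, packed
    MatchSketch() : index(1 << 20, -1) {}
    int predict() const {                // predicted next byte, or -1
        return matchPtr >= 0 ? buf[matchPtr] : -1;
    }
    void update(unsigned char c) {
        if (matchPtr >= 0)               // verify the ongoing match
            matchPtr = (buf[matchPtr] == c) ? matchPtr + 1 : -1;
        buf.push_back(c);
        hist = hist << 8 | c;            // order-4 context
        uint32_t h = (hist * 2654435761u) >> 12;   // 20-bit hash
        if (matchPtr < 0 && index[h] >= 0)
            matchPtr = index[h];         // try to start a new match
        index[h] = (int)buf.size();      // position following this context
    }
};
```

As in the text, a match is maintained until a prediction fails, at which point the model falls back to looking up the current context again.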

The 7 predictions are combined using a neural network (Mixer) as in paq8l, except it is a single level network without MMX code. The inputs p_i, i=0..6 are first stretched: t_i = log(p_i/(1 - p_i)), then the output is computed: p = squash(SUM_i t_i * w_i), where squash(x) = 1/(1 + exp(-x)) is the inverse of stretch(). The weights are adjusted to reduce the error: w_i := w_i + L * t_i * (y - p) where (y - p) is the prediction error and L ~ 0.002 is the learning rate. This is a standard single layer backpropagation network modified to minimize coding cost rather than RMS prediction error (thus dropping the factors p * (1 - p) from learning).
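
The mixing and training steps above can be written out directly. This is a floating-point sketch (lpaq1 uses fixed-point integer arithmetic; the names N, w, and L are mine, with L set to the learning rate given in the text):

```cpp
#include <cassert>
#include <cmath>

const int N = 7;                // number of model predictions
double w[N] = {0};              // mixing weights, initially zero
const double L = 0.002;         // learning rate from the text

double stretch(double p) { return std::log(p / (1 - p)); }
double squash(double x) { return 1 / (1 + std::exp(-x)); }

// Mix predictions p[0..N-1]; stores the stretched inputs in t so they
// can be reused for training.
double mix(const double* p, double* t) {
    double dot = 0;
    for (int i = 0; i < N; ++i) dot += (t[i] = stretch(p[i])) * w[i];
    return squash(dot);
}

// After seeing bit y, adjust weights to reduce coding cost:
// w_i += L * t_i * (y - pr), with no p*(1-p) factor.
void train(const double* t, double pr, int y) {
    for (int i = 0; i < N; ++i) w[i] += L * t[i] * (y - pr);
}
```

With all weights zero the mixer outputs 0.5; after one training step on a 1 bit with confident inputs, the mixed prediction moves above 0.5.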

One of 80 neural networks is selected by a context that depends on the 3 high order bits of the last whole byte plus the context order (quantized to 0, 1, 2, 3, 4, 6, 8, 12, 16, 32). The order is determined by the number of nonzero bit histories and the length of the match from MatchModel.
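
One plausible reading of the order quantization is "round down to the nearest listed bucket"; the boundary handling here is my guess, not lpaq1's exact code:

```cpp
#include <cassert>

// Quantize a context order to the buckets listed in the text:
// 0, 1, 2, 3, 4, 6, 8, 12, 16, 32.
int quantizeOrder(int o) {
    static const int buckets[] = {0, 1, 2, 3, 4, 6, 8, 12, 16, 32};
    int q = 0;
    for (int b : buckets)
        if (o >= b) q = b;   // largest bucket not exceeding o
    return q;
}
```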

The Mixer output is adjusted by 2 SSE stages (called APM for adaptive probability map). An APM is a StateMap that accepts both a discrete context and an input probability, pr. pr is stretched and quantized to 24 levels. The output is interpolated between the 2 nearest table entries, and then only the nearest entry is updated. The entries are initialized to p = pr and n = 6 (to slow down initial adaptation) with a limit n <= 255. The APM differs from paq8l in that it uses the new StateMap's rapid initial adaptation, does not update both adjacent table entries, and uses 24 levels instead of 33. The two stages use a discrete order-0 context (last 0..7 bits) and a hashed order-1 context (14 bits). Each output is averaged with its input, weighted by 1/4.
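
The lookup and interpolation step can be sketched in floating point (the update step is omitted for brevity, and the clamping range and all names are my choices, not lpaq1's): stretch the input probability, locate it among 24 levels within the given context, and interpolate between the two nearest entries.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Sketch of the APM (SSE) lookup: 24 table entries per context,
// initialized so the map starts out as the identity.
struct APMSketch {
    std::vector<double> t;                    // 24 entries per context
    static double squash(double x) { return 1 / (1 + std::exp(-x)); }
    APMSketch(int contexts) : t(contexts * 24) {
        for (size_t i = 0; i < t.size(); ++i)
            t[i] = squash(((int)(i % 24) - 11.5) * 8 / 11.5);
    }
    double pp(int cxt, double pr) const {
        double x = std::log(pr / (1 - pr));   // stretch, clamped
        if (x < -8) x = -8;
        if (x > 8) x = 8;
        double pos = x * 11.5 / 8 + 11.5;     // position in 0..23
        int j = (int)pos;
        if (j > 22) j = 22;                   // keep j+1 in range
        double wt = pos - j;                  // interpolation weight
        return t[cxt * 24 + j] * (1 - wt) + t[cxt * 24 + j + 1] * wt;
    }
};
```

Before any training, the table passes probabilities through almost unchanged; adaptation would then bend the map wherever the mixer is systematically miscalibrated for that context.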

The output is arithmetic coded. The code for a string s with probability p(s) is a number between Q and Q + p(s), where Q is the total probability of all strings lexicographically preceding s. The number is coded as a big-endian base-256 fraction. A short header is prepended.
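
The interval narrowing can be illustrated with a toy floating-point coder (lpaq uses a 32-bit integer coder; this sketch only shows how each bit shrinks the interval in proportion to its probability):

```cpp
#include <cassert>

// Toy arithmetic-coding interval: each encoded bit keeps the part of
// [low, high) whose width is proportional to that bit's probability.
// Any number in the final interval identifies the whole bit string.
struct ToyCoder {
    double low = 0, high = 1;
    void encodeBit(int y, double p1) {               // p1 = P(bit == 1)
        double mid = low + (high - low) * (1 - p1);  // split point
        if (y) low = mid; else high = mid;           // keep matching part
    }
};
```

Encoding the bits 1, 0 with p1 = 0.5 each narrows the interval to [0.5, 0.75), whose width 0.25 equals the product of the two probabilities.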

Two thirds of the memory (2 * 2^N MB) is used for a hash table mapping the 6 regular contexts (orders 1-4, 6, word) to bit histories. A lookup occurs every 4 bits. The input is a byte-oriented context plus possibly the first nibble of the next byte. The output is an array of 15 bit histories (1 byte each) for all possible contexts formed by appending 0..3 more bits. The table entries have the format:

{checksum, "", 0, 1, 00, 10, 01, 11, 000, 100, 010, 110, 001, 101, 011, 111}

The second byte is the bit history for the context ending on a nibble boundary. It also serves as a priority for replacement. The states are ordered by increasing total count, where state 0 represents the initial state (no history). When a context is looked up, the 8-bit checksum (part of the hash) is compared with 3 adjacent entries, and if there is no match, the entry with the lowest priority is cleared and the new checksum is stored.

The hash table is aligned on 64-byte cache lines. A table lookup never crosses a 64-byte address boundary. Given a 32-bit hash h of the context, 8 bits are used for the checksum and 17 + N bits are used for the index i. Then the entries i, i XOR 1, and i XOR 2 are tried. The hash h is actually a collision-free permutation, consisting of multiplying the input by a large odd number mod 2^32, a 16-bit rotate, and another multiply.
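
The permutation can be sketched as below (the constants are my choice, not lpaq1's). Each step, odd multiplication mod 2^32 and a rotation, is individually invertible, so the composition is a bijection on 32-bit values:

```cpp
#include <cassert>
#include <cstdint>

// Bijective 32-bit "hash": multiply by a large odd constant mod 2^32,
// rotate by 16 bits, multiply again. No two inputs collide.
uint32_t permute(uint32_t x) {
    x *= 2654435761u;              // odd multiplier: invertible mod 2^32
    x = (x << 16) | (x >> 16);     // 16-bit rotate: invertible
    x *= 2246822519u;              // another odd multiplier
    return x;
}
```

The high 8 bits of the result could then serve as the checksum and the low 17 + N bits as the table index, as described above.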

The order-1 context is mapped to a bit history using a 64K direct lookup table, not a hash table.

One third of memory is used by MatchModel, divided equally between a rotating input buffer of 2^(N+19) bytes and an index (hash table) with 2^(N+17) entries. Two context hashes are maintained, a long one, h1, of length ceil((N+17)/3) bytes and a shorter one, h2, of length ceil((N+17)/5) bytes, where ceil() is the ceiling function. The index does not use collision detection. At each byte boundary, if there is not currently a match, then the bytes before the current byte are compared with the location indexed by h1. If fewer than 2 bytes match, then h2 is tried. If a match of length 1 or more is found, the match is maintained until the next bit mismatches the predicted bit. The table is updated at h1 and h2 after every byte.

the program source code



Qualified configurations
Ver  Rating  CPR  DPR  S.E.  R.E.  Ratio  C. kB/s  D. kB/s
8    86      159  13   19    44    4.531  636      630

Version info

8, tar: yes, mode: 32-bit, crc32: caaab802, decoder: 10 kB, tested: 2010 Apr 13
compression: lpaq8.exe <args> <src> <cfile>
decompression: lpaq8.exe d <cfile> <src>