CompressionRatings.Com runs a series of tests that measure how effectively software (called a compressor or archiver) compresses data losslessly.
The CompressionRatings benchmark site consists of several parts. The main benchmark currently ranks 542 program configurations (as of February 2010) on a multi-gigabyte public corpus consisting of 12 categories:
In addition, the benchmark provides the following categories, which are not counted in the overall benchmark summary: a database, 2 medical image sets, a PGN game notation set and 2 other rare sets of data.
For a program to qualify for the main benchmark, it is first run through 2 qualification rounds consisting of samples from the main corpus. Currently over 350 entries qualify for the main benchmark. Every compressor has a separate information page where tables list its historical performance over past versions using various metrics (such as the percentage difference in compression speed).
For each dataset, the raw benchmark data (size, compression time, decompression time and memory use) is provided in large tables that can be sorted by any column. Graphs and a number of simplified listings are also provided. Instead of a simple ranking system, the site provides various metrics that measure compression and decompression efficiency with different weights for speed and ratio. A calculator tool is provided for custom user ratings and for more complex tasks, such as weighting decompression time more heavily than compression time. An additional tool compares two or more compressors by building one large table from the entire benchmark data for the selected compressors. Summary tables are also provided, including a special table that gives a single-page summary of the top 10 programs by the performance metrics for all data sets.
The site has a separate BWT comparison benchmark in which BWT-based compressors are compared over a set of files. It currently covers some 40 BWT programs. The comparison also tries to probe the programs by repeatedly compressing a file with various transformations applied.
Another section is the reference files benchmark, where programs are tested with 13 mostly well-known test files, such as those used in the Calgary corpus and elsewhere. All programs automatically qualify for this section. An additional WR metric is provided, which gives only limited weight to the size of the decompression program.
The benchmark is open, meaning authors can submit compressors with an unlimited number of option sets. The site is updated promptly when new submissions arrive.
Each compressor is assigned several rating numbers based on how the software performs. These numbers reflect how effectively the software makes use of resources to compress data. The most technically superior compressor has the highest rating.
Several types of ratings are provided to measure different aspects or to emphasize different variables.
The ratings do not reflect the most "practical" compression, since that may be subjective. Instead, the ratings reflect technical merit. Nevertheless, it can be argued that these ratings also serve as a guide to practical file compression performance.
To make the ratings easy to read, a common compressor (Info-ZIP 2.3, 7-Zip 4.20 or PPMd Jr1, whichever performs best in the particular test) is assigned a rating that is a power of 10.
The compression ratio (and the size) is based on the sum of the decoder program size and the program output. The size and ratio columns in the test results reflect this sum. An exception is made on the program pages, in the "qualification ratings" section, where ratings and ratio are calculated without the decoder program size (for programs that do not use SFX).
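As a minimal sketch of the size accounting described above, the following computes a size and ratio that include the decoder. The function names and the example numbers are hypothetical, and the ratio is assumed here to be total size divided by original size; the site may define its ratio differently.

```python
def total_size(output_bytes: int, decoder_bytes: int) -> int:
    """Size as counted in the results: compressed output plus decoder program."""
    return output_bytes + decoder_bytes


def ratio(original_bytes: int, output_bytes: int, decoder_bytes: int = 0) -> float:
    """Assumed definition: total size divided by original size (lower is better)."""
    if original_bytes <= 0:
        raise ValueError("original_bytes must be positive")
    return total_size(output_bytes, decoder_bytes) / original_bytes


# Hypothetical numbers: a 100 MB test, 24 MB of output, a 1 MB decoder.
with_decoder = ratio(100_000_000, 24_000_000, 1_000_000)
# Without the decoder, as in the "qualification ratings" exception.
without_decoder = ratio(100_000_000, 24_000_000)
```

Counting the decoder matters most on small tests, where a large decompression program can outweigh the savings in the output itself.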
The results are presented in full in the 'Detailed' tables. The 'Brief' tables display a filtered version of the results, with only the best configuration(s) for each compressor and with the ratio and memory columns discarded.
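The filtering from 'Detailed' to 'Brief' can be sketched as follows. This is an illustration only: the row fields (`compressor`, `options`, `rating`, `ratio`, `memory`) are hypothetical, and "best" is assumed to mean the highest rating, which the text does not state explicitly.

```python
from typing import Dict, List


def brief_table(detailed: List[dict]) -> List[dict]:
    """Keep the best-rated row(s) per compressor; drop ratio and memory columns."""
    best: Dict[str, float] = {}
    for row in detailed:
        name = row["compressor"]
        best[name] = max(best.get(name, float("-inf")), row["rating"])
    dropped = {"ratio", "memory"}
    return [
        {k: v for k, v in row.items() if k not in dropped}
        for row in detailed
        if row["rating"] == best[row["compressor"]]
    ]


# Hypothetical rows, not real benchmark data.
detailed = [
    {"compressor": "A", "options": "-1", "rating": 80.0, "ratio": 0.40, "memory": 10},
    {"compressor": "A", "options": "-9", "rating": 95.0, "ratio": 0.30, "memory": 90},
    {"compressor": "B", "options": "", "rating": 60.0, "ratio": 0.50, "memory": 5},
]
brief = brief_table(detailed)
```

Ties are kept on purpose, matching the "configuration(s)" wording: two configurations with the same top rating would both appear.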
It is impossible to represent all compressible data in a test of finite size. To produce meaningful results, the data must mainly reflect practical considerations from the point of view of end-user file compression.
Unfortunately, no research data is available on what kinds of files average end-users really compress. Even if there were, we should use rational guidelines for selecting a range of different kinds of data, and more specifically data that can show the various strengths and weaknesses of different algorithms, to minimize possible bias. The emphasis must be on real, existing formats and encodings used in common file types. Therefore the corpus used by CompressionRatings.Com attempts to make use of various data types and formats commonly used on the internet and in the general software domain. Common formats often have compression built in, which is why other types of data that contain more raw data are also used. These types may be older formats.
The main corpus contains (roughly) 5 kinds of data: software, text, audio, image and a mixture of these (game data). Two of these (software and mixture data) can easily be argued to be important, since they make up most of the compressed data downloaded over the internet by end-users (such as the Open Office installation package and large software patches). Large amounts of losslessly compressed audio, image and text data are uncommon and virtually non-existent from the practical end-user point of view (although a case can be made for lossless audio compression). However, these types of data are important areas of lossless compression research and are therefore sampled in the corpus.
The main corpus does not wholly represent practical end-user needs. It is a blend of practical data and "technically interesting" data that helps bring out the technical differences between compression programs. Unlike the main corpus, the "extended" section is purely for testing unusual and special data.
To download the corpus, click here.
A compressor candidate is a compression program together with a set of parameters. Additional global parameters are listed on the program page. Each candidate is tested in an identical environment.
First, a candidate is tested and rated in 2 qualification rounds with limited test data. If the candidate's performance exceeds the current qualification threshold in either round, it qualifies for the full tests and will be rated.
The candidate qualifies if any of the following conditions are met:
The "extended" section of the benchmark is based on unusual data, so performance there has no effect on the overall ratings.
Tests are run on an 1800 MHz CPU (Athlon XP 2200+, family 6, model 8, stepping 1) with 768 MB of memory, using Windows XP with the pagefile disabled. The CPU supports the MMX, SSE1, 3DNow! and 3DNow!+ instruction sets.
The listed timings are the sums of user and kernel times. Disk I/O time is not included in the totals because of the inaccuracy of the real environment (operating system cache, background processes, hardware, etc.).
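To illustrate what "user plus kernel time" means, here is a minimal sketch using Python's `time.process_time()`, which returns the sum of the user and system CPU time of the current process and does not count time spent sleeping or waiting. This is not the benchmark's own tooling: the benchmark times external programs under Windows XP, whereas this sketch times an in-process `zlib` call on stand-in data. The helper name `cpu_seconds` is hypothetical.

```python
import time
import zlib


def cpu_seconds(func, *args):
    """Run func(*args) and return (result, user + kernel CPU seconds used)."""
    start = time.process_time()  # user + system CPU time of this process
    result = func(*args)
    return result, time.process_time() - start


data = b"example data " * 200_000  # stand-in input, not a corpus file
packed, seconds = cpu_seconds(zlib.compress, data, 9)
```

Because CPU time excludes waiting, a run that is delayed by a slow disk or a busy system cache reports roughly the same figure as an undisturbed one.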
Compression is repeated up to 5 times for each compressor and test, until 30 seconds have passed. The arithmetic mean is then listed. The same is done for decompression. Averaging over several runs makes the listed times more accurate.
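The repetition rule can be sketched as a small loop. This is one possible reading of the rule, stated as assumptions: at least one run is always made, the loop stops after 5 runs or once 30 seconds of wall-clock time have elapsed (whichever comes first), and each sample is CPU time. The name `mean_cpu_time` and its parameters are hypothetical.

```python
import time
from statistics import mean


def mean_cpu_time(run, max_runs=5, budget_seconds=30.0):
    """Repeat run() until max_runs is reached or the time budget is used up.

    Returns (mean CPU seconds per run, number of runs). At least one run
    is always made.
    """
    samples = []
    wall_start = time.monotonic()
    while True:
        start = time.process_time()
        run()
        samples.append(time.process_time() - start)
        elapsed = time.monotonic() - wall_start  # assumed: budget is wall-clock
        if len(samples) >= max_runs or elapsed >= budget_seconds:
            break
    return mean(samples), len(samples)


# A fast stand-in workload finishes all 5 runs well inside the budget.
average, runs = mean_cpu_time(lambda: sum(range(100_000)))
```

Under this reading, fast compressors are averaged over 5 runs, while a compressor that needs more than 30 seconds for a single pass is measured once.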
Before each test is run, the system cache is flushed by making a single read pass over the test files in alphabetical order. The decompressed files are also verified to confirm that compression was lossless.
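Both steps can be sketched in a few lines. The helpers below are hypothetical and only illustrate the idea: `read_pass` reads every file once in sorted path order, and `is_lossless` checks a round trip by comparing SHA-256 digests. The comparison method the site actually uses is not stated, and `zlib` stands in for a real compressor.

```python
import hashlib
import zlib
from pathlib import Path


def read_pass(directory):
    """Read every file under directory once, in sorted path order.

    Returns the total number of bytes read.
    """
    total = 0
    for path in sorted(p for p in Path(directory).rglob("*") if p.is_file()):
        with open(path, "rb") as f:
            while chunk := f.read(1 << 20):  # 1 MiB at a time
                total += len(chunk)
    return total


def is_lossless(original: bytes, compress, decompress) -> bool:
    """Check that decompress(compress(x)) reproduces x, compared via SHA-256."""
    restored = decompress(compress(original))
    return hashlib.sha256(restored).digest() == hashlib.sha256(original).digest()


ok = is_lossless(b"abc" * 1000, zlib.compress, zlib.decompress)
```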
A software author can submit a piece of software for testing. The author should provide:
A reasonable number of configurations for a compressor is often 5-10, but for some compressors as many as 20-30 might make sense. Not having a strict limit means that authors should show good faith by requesting only a reasonable number of configurations that make sense for the given compressor.
Tests that contain multiple files are first combined into a single TAR archive for compressors that cannot recurse a directory tree (and restore it) or do not understand the asterisk (*) as a wildcard character for all files.
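Combining a directory into one uncompressed TAR stream can be sketched with Python's standard `tarfile` module. The function name `glue_to_tar` is hypothetical, and the file ordering (sorted paths) is an assumption; the tool the site actually uses to build its TAR files is not stated.

```python
import io
import tarfile
from pathlib import Path


def glue_to_tar(directory) -> bytes:
    """Pack all files under directory into one uncompressed TAR, in memory."""
    root = Path(directory)
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:  # "w" = no compression
        for path in sorted(p for p in root.rglob("*") if p.is_file()):
            # Store paths relative to the root so the tree can be restored.
            tar.add(str(path), arcname=path.relative_to(root).as_posix())
    return buffer.getvalue()
```

The resulting single stream can then be fed to a compressor that only accepts one input file, and unpacking the TAR after decompression restores the directory tree.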
The following conditions must be met for software to be tested:
When submitting software, the author is encouraged to provide additional information, such as the technical changes in the current version, and to do so for each version release. This information will be quoted on the program page.
There are currently no forms for submitting software for testing; simply contact compressionratings.com informally by email. Authors who promptly announce new software releases help keep the benchmark up to date.
Q: Why are the tests run on a single-core CPU instead of a multi-core one?
A: A multi-core platform would make the benchmark less universal by introducing a strong bias that would not translate into end-user expectations on single-core platforms. It is also questionable whether a compressor is really technically superior because it splits the input in two and compresses the parts separately. However, you can donate any kind of additional hardware to CompressionRatings.Com and all the tests will be duplicated for that platform.
Q: Does this test reflect the practical or theoretical performance of a compressor?
A: The practical, since the test data is of the kind that is actually used in file compression. For a "theoretical" test, literally any kind of result can be achieved, because a lossless compressor only assigns shorter codes to some inputs and longer codes to others, so over all possible inputs no compressor is better than another.
Q: Shouldn't we apply transformations to the files until the top compressors perform badly, to disable the optimizations inside the highest-rated compressors?
A: Often the opposite happens: the transformation enables other optimizations, which cause further compression loss. As a result, almost any simple compressor outperforms the complex optimized compressors. We would end up producing results for transformed data that does not represent real data, using a transformation that did not do what it was supposed to do.