CompressionRatings.Com runs a series of tests that measure how effectively a piece of software (called a compressor or archiver) losslessly compresses data.
The Compression Ratings benchmark site consists of several parts. The main benchmark currently ranks 542 program configurations (as of February 2010) using a multi-gigabyte public corpus divided into 12 categories:
In addition, the benchmark provides the following categories that are not counted in the overall benchmark summary: a database, 2 medical image sets, a PGN game notation set and 2 other rare data sets.
For a program to qualify for the main benchmark, it is first run through 2 qualification rounds consisting of samples from the main corpus. Currently over 350 entries qualify for the main benchmark. Each compressor has a separate information page where tables list its historical performance across past versions using various metrics (such as the percentage difference in compression speed).
For each data set, raw benchmark data consisting of size, compression time, decompression time and memory use are provided in large tables, which can be sorted by any column. Graphs and a number of simplified listings are also provided. Instead of a single ranking, the site offers various metrics that measure compression and decompression efficiency with different weights for speed and ratio. A calculator tool supports custom user ratings and more complex tasks, such as weighting decompression time more heavily than compression time. A further tool compares two or more compressors by building a large table from the entire benchmark data for the selected programs. Summary tables are provided, including a special table that gives a single-page summary of the top 10 programs by each performance metric across all data sets.
The site has a separate BWT comparison benchmark where BWT-based compressors are compared over a set of files; it currently covers some 40 BWT programs. The comparison also probes the programs by repeatedly compressing a file with various transformations applied.
Another section is the reference files benchmark, where programs are tested on 13 mostly well-known test files, such as those used in the Calgary corpus and elsewhere. For this section all programs qualify automatically. An additional WR metric is provided that gives only limited weight to the decompression program size.
The benchmark is open, meaning authors can submit compressors with an unlimited number of option sets. The site is promptly updated when new submissions arrive.
The easiest objective way to evaluate a compressor is to check whether another compressor exists that can compress the same data faster while achieving the same (or a better) ratio. If no such program exists, the result is the best possible for that combination of compression time and ratio. The same can be done for decompression time and ratio. The set of such compressors forms the "Pareto frontier".
The Pareto frontier is used in many places on the site to present data. Tables underline times that belong to the Pareto frontier. Some visualizations on the site present multiple Pareto frontiers: the Pareto set is removed and a new frontier is computed for the remaining data. We can also compute the "distance" to the Pareto frontier for any given data point.
For easy evaluation of compressors we simply compute the distance to the Pareto frontier, separately for compression and decompression times. These distances are listed in the full result tables. To simplify further, we assign "stars" to each compressor: the 5 nearest distance levels are awarded in half-star steps, for both compression and decompression, so a compressor on the Pareto frontier for both earns the full 5 stars. The data is quantized before the stars are computed, so that two or more programs close to each other (in ratio and speed) are considered to have the same performance.
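To make the dominance idea concrete, here is a minimal sketch of a Pareto frontier computation over (time, size) pairs. This is illustrative only, not the site's actual code; the function name and data points are hypothetical.

```python
def pareto_frontier(results):
    """Return the entries not dominated by any other entry.

    Each entry is a (time, size) pair; smaller is better on both
    axes.  An entry is dominated if some other entry is at least
    as fast AND at least as small, with a strict improvement in
    at least one of the two.
    """
    frontier = []
    for t, s in results:
        dominated = any(
            (t2 <= t and s2 <= s) and (t2 < t or s2 < s)
            for t2, s2 in results
        )
        if not dominated:
            frontier.append((t, s))
    return frontier

# Hypothetical (time in seconds, compressed size in MB) results:
points = [(10, 300), (12, 290), (11, 310), (30, 250), (40, 260)]
print(sorted(pareto_frontier(points)))
# -> [(10, 300), (12, 290), (30, 250)]
```

Here (11, 310) is dominated by (10, 300) and (40, 260) by (30, 250); the three remaining points form the frontier. The "distance" metric mentioned above could then be measured from any point to this frontier.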
Each compressor is assigned several rating numbers based on how the software performs. These numbers reflect how effectively the software makes use of resources to compress data. The most technically superior compressor has the highest rating.
Several types of ratings are provided to measure different aspects or to emphasize different variables.
The ratings do not reflect the most "practical" compression, since that may be subjective; instead they reflect technical merit. We may nevertheless argue that these ratings can serve as a guide to practical file compression performance as well.
To make the ratings easy to read, a common compressor (Info-ZIP 2.3, 7-Zip 4.20 or PPMd Jr1, whichever performs best on the particular test) is assigned a rating that is a power of 10.
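One simple way to read this anchoring: all raw scores are rescaled so that the reference compressor lands exactly on a power of 10. The sketch below is an assumption about the mechanics (the function name, target value and raw scores are hypothetical), but it preserves the relative ordering the way such a normalization must.

```python
def normalize_ratings(raw, reference_name, target=100):
    """Scale raw scores (higher is better) so that the reference
    compressor's rating equals `target`, a power of 10.

    Relative ordering between compressors is unchanged; only the
    scale moves, which makes the numbers easy to compare at a
    glance.
    """
    scale = target / raw[reference_name]
    return {name: score * scale for name, score in raw.items()}

# Hypothetical raw scores; "7-Zip 4.20" acts as the reference here.
raw = {"7-Zip 4.20": 50.0, "FastPack": 75.0, "SlowPack": 25.0}
print(normalize_ratings(raw, "7-Zip 4.20"))
# -> {'7-Zip 4.20': 100.0, 'FastPack': 150.0, 'SlowPack': 50.0}
```

A rating of 150 then reads directly as "50% better than the reference on this metric".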
The compression ratio (and the size) is based on the sum of the (compressed) decompression program size and the program output. The size column in the test results reflects this sum. An exception is made on the program pages, in the "qualification ratings" section, where ratings and ratio are calculated without the decoder program size.
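In other words, the effective size charges the compressor for shipping its own decoder. A minimal sketch of the arithmetic (function names are illustrative, not the site's code):

```python
def effective_size(output_bytes, decoder_bytes):
    """The benchmark's size figure: compressed output plus the
    (compressed) decompression program size."""
    return output_bytes + decoder_bytes

def ratio(original_bytes, output_bytes, decoder_bytes):
    """Compression ratio as effective size over original size;
    smaller is better."""
    return effective_size(output_bytes, decoder_bytes) / original_bytes

# A 1000-byte input compressed to 400 bytes with a 100-byte decoder:
print(ratio(1000, 400, 100))  # -> 0.5
```

This penalizes large decoders, which matters for tiny inputs but becomes negligible on multi-gigabyte test sets.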
The results are presented in detail in the 'Technical' lists. The simple 'list' pages display a filtered version of the results with only the best configuration(s) for each compressor and omit most of the technical information.
It is impossible to represent all compressible data in a test of finite size. To produce meaningful results, the data must mainly reflect practical considerations from an end-user file compression point of view.
No research data is available on what kinds of files average end-users compress the most. Even if there were, we should use rational guidelines for selecting a range of different kinds of data, and more specifically data which can show various strengths and weaknesses of different algorithms, to minimize possible bias. The emphasis must be on real, existing formats and encodings used in common file types. The corpus used by CompressionRatings.Com therefore attempts to use various data types and formats common on the internet and in the general software domain. Common formats often already include compression of their own, which is why other types of data containing more raw data are also used; these may be older formats.
The main corpus contains (roughly) 5 kinds of data: software, text, audio, image and a mixture of these (game data). Two of these (software and mixed data) can easily be argued to be important, as they make up most of the compressed data downloaded over the internet by end-users (such as an OpenOffice installation package or large software patches). Large amounts of losslessly compressed audio, image and text data are uncommon and virtually non-existent from the practical end-user point of view (though a case for lossless audio compression can be made). However, these types of data make up important areas in lossless compression research and are therefore sampled in the corpus.
The main corpus does not wholly represent practical end-user needs. It is a blend of practical data and "technically interesting" data that helps bring out the technical differences between compression software. Unlike the main corpus, the "extended" section is purely for testing unusual and special data.
A compressor candidate is a compression software with a set of parameters. Additional global parameters are listed on the program page. Each such candidate is tested in an identical environment.
First, a candidate is tested and rated in 2 qualification rounds with limited test data. If the candidate's performance exceeds the current qualification threshold in either of the rounds, it qualifies for the full tests and will be rated.
The candidate qualifies if any of the following conditions are met:
The "extended" section of the benchmark is based on unusual data, so performance there does not affect the overall ratings.
Since spring 2010 the testing platform has been an Intel Core 2 Quad Q6600 "Kentsfield" @ 2.9 GHz with 4 GB of DDR2 800 MHz memory, running Windows XP 64. The CPU has 2x4 MB of L2 cache and 4 cores, and supports the MMX, SSE1, SSE2, SSE3 and SSSE3 instruction sets. (Previously the tests were run on an AMD Athlon XP 2200+ at 1800 MHz with 768 MB of memory, using Windows XP 32.)
In the past we used the sum of user and kernel time to exclude disk IO time. Currently testing is done on a 3 GB RAM drive and the global (wall-clock) time is measured. The kernel and user times are also measured, and their relation to the global time is presented in the detailed full tables as a percentage. This number will be 400% for a program that makes full use of the quad core.
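The percentage is just CPU time over wall-clock time, as in this sketch (the example numbers are hypothetical):

```python
def cpu_utilization(user_s, kernel_s, wall_s):
    """CPU time (user + kernel) as a percentage of the global
    wall-clock time.  Roughly 100% means one fully busy core;
    about 400% means all four cores of the quad-core test
    machine are saturated."""
    return 100.0 * (user_s + kernel_s) / wall_s

# A hypothetical run: 110 s user + 10 s kernel over 30 s wall time.
print(cpu_utilization(user_s=110.0, kernel_s=10.0, wall_s=30.0))
# -> 400.0
```

A value well below 100% on the RAM drive would instead suggest the program is stalling on something other than IO, such as its own synchronization.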
The operating system has the pagefile disabled, so we can guarantee that no program is slowed down by disk swapping. The memory available to a program is 600-650 MB. Before each program is run, we check the maximum contiguous free block of memory and ensure it is at least 600 MB.
Compression and decompression are repeated up to 5 times for each compressor and test, or until 30 seconds have passed. The fastest (global) time is then used. This ensures the timing information is as precise as possible.
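A best-of-N timing loop with a time budget could look like the following sketch (the function name and budget handling are assumptions, not the benchmark's actual harness):

```python
import time

def best_time(run, max_repeats=5, budget_s=30.0):
    """Run `run()` up to `max_repeats` times, stopping early once
    the total time spent exceeds `budget_s`, and return the
    fastest observed wall-clock time.

    Taking the minimum filters out one-off slowdowns (cache
    warm-up, background activity), so it approximates the true
    cost of the operation.
    """
    fastest = float("inf")
    spent = 0.0
    for _ in range(max_repeats):
        start = time.perf_counter()
        run()
        elapsed = time.perf_counter() - start
        fastest = min(fastest, elapsed)
        spent += elapsed
        if spent >= budget_s:
            break
    return fastest
```

For a fast compressor this runs all 5 repeats; a slow one that blows through the 30-second budget is timed fewer times.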
To obtain the decompression program size, we compress each program binary with 7-Zip (LZMA). If the size of the (compressed) source code or of an SFX archive (stub) is smaller than the compressed program size, we manually override the decompressor size for that program. Decompressor sizes are listed for each program version on their respective program pages. Note that to avoid running the benchmark twice, we did not actually use the SFX for decompression; we presume that the program is able to do so.
The files are always verified for lossless decompression.
Since the spring 2010 update, all tests have been run again on the new hardware for every configuration that was not previously disqualified due to an error. Some software has been updated to the latest versions (64-bit versions where available). Since the testing software has been rewritten from scratch, some compressors that were previously disqualified on technical grounds (and have not been retested) may qualify now. If you think this is the case, please contact us and we can test the program again.
A software author can submit a piece of software to be tested. The author should provide:
Often a reasonable number of configurations is 5-10 per compressor, though for some compressors as many as 20-30 might make sense. Since there is no strict limit, authors should show good faith by requesting only a reasonable number of configurations that make sense for the given compressor.
Tests that contain multiple files are first glued together with the TAR format for compressors that are unable to recurse a directory tree (and restore it) or to understand the asterisk (*) as a wildcard character for all files.
The following conditions must be met in order for a software to be tested:
When submitting software, the author is encouraged to provide additional information, such as the technical changes in the current version, and to do so for each version release. This information will be quoted on the program page.
There are currently no forms for submitting software for testing; simply contact compressionratings.com informally by email. Authors who are swift to notify us of a new software release help keep the benchmark up to date.