Download

Here are the download links and instructions to construct the corpus.

Application1

OpenOffice.org. Precomped with its headers removed.

mirror: http://compressionratings.com/files/cr_app1.tar.7z

Application2

MinGW compiler with files copied into a single file in random order.

mirror: http://compressionratings.com/files/cr_app2.7z
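
For anyone rebuilding Application2 from a local MinGW installation instead of using the mirror, the "copied into a single file in random order" step could look roughly like the sketch below. This is not the script that produced cr_app2; the function name, the recursive directory walk, and the optional seed are all assumptions made for illustration, and a different shuffle will yield a different file than the mirror.

```python
import random
from pathlib import Path


def concat_random_order(src_dir, out_path, seed=None):
    """Append every regular file under src_dir to out_path in shuffled order.

    Hypothetical helper: the original corpus was not necessarily built this way.
    Returns the list of source paths in the order they were written.
    """
    out_path = Path(out_path).resolve()
    # Collect files first (sorted for a stable starting point), skipping the output itself.
    files = sorted(
        p for p in Path(src_dir).rglob("*")
        if p.is_file() and p.resolve() != out_path
    )
    # A fixed seed makes the shuffle repeatable; seed=None gives a fresh order each run.
    random.Random(seed).shuffle(files)
    with open(out_path, "wb") as out:
        for p in files:
            out.write(p.read_bytes())
    return files
```

Passing a fixed seed only makes your own runs repeatable; it does not reproduce the order used in the mirrored file.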

Application3

http://sourceforge.net/project/downloading.php?groupname=portableapps&filename=PortableApps.com_Suite_Light_Setup_1.1.exe

mirror: http://compressionratings.com/files/cr_app3.tar.7z

Application4

Firefox 3.6.3, Inkscape 0.4.7.3, Thunderbird 3.0.4 and VLC 1.0.5 binaries. Precomped with its headers removed.

mirror: http://compressionratings.com/files/cr_app4.tar.7z

Audio1

Audio1 is a highly diverse collection of CD-quality music. The corpus covers an extreme range of different kinds of music samples. It is original and was made for CompressionRatings.com.

Currently a mini version of the corpus is being used:

http://compressionratings.com/files/cr_audio1-mini.tar.7z (88,0 MB, contains uncompressed files)
http://compressionratings.com/files/cr_audio1-mini.zip (58,5 MB, contains flac-compressed files)
http://compressionratings.com/files/cr_audio1-mini.exe (55,9 MB, self-extracting nz archive containing wav-files)

The full version is available here:

http://compressionratings.com/files/cr_audio1.zip (118,1 MB, contains flac-compressed files)
http://compressionratings.com/files/cr_audio1.exe (112,7 MB, self-extracting nz archive containing wav-files)

Game1

http://dcemulation.org/files/homebrew/nxdoom/law56ker-nxdoom-collection.rar

mirror: http://compressionratings.com/files/cr_game1.tar.7z

Game2

http://www.gamershell.com/download_21853.shtml

mirror: http://compressionratings.com/files/cr_game2.tar.7z

Image1

http://www.imagecompression.info/test_images/

mirror: http://compressionratings.com/files/cr_img1.tar.7z

Image2

http://www.imagecompression.info/test_images/

mirror: http://compressionratings.com/files/cr_img2.tar.7z

Text1

English-language books from the Project Gutenberg etext00-02 archives, manually selected. 16384 bytes were removed from the beginning and end of each file (to strip the Project Gutenberg headers and other redundant information).

http://compressionratings.com/files/cr_txt1.bz2 (26,9 MB)
http://compressionratings.com/files/cr_txt1.exe (19,8 MB, self-extracting nz archive)
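
The trimming step described for Text1 can be sketched as follows. The sketch assumes that 16384 bytes are cut from each end (so 32768 bytes per file in total) and that a file too short to survive the cut is reduced to nothing; neither detail is spelled out above, and the function name is made up for this example.

```python
TRIM = 16384  # bytes dropped from each end, per the Text1 description


def trim_ends(data: bytes, trim: int = TRIM) -> bytes:
    """Return data without its first and last `trim` bytes.

    Assumption: inputs of 2 * trim bytes or fewer yield an empty result.
    """
    if len(data) <= 2 * trim:
        return b""
    # Slice with an explicit end index so trim=0 leaves the data unchanged.
    return data[trim:len(data) - trim]
```

For a single book this would be used as `trimmed = trim_ends(open(path, "rb").read())`, where `path` is whichever Gutenberg file you selected.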

Text2

http://www.cs.fit.edu/~mmahoney/compression/textdata.html

OS1

http://download.linhost.info/vmware/ubuntu904alpha2.7z

mirror: http://compressionratings.com/files/cr_os1.7z

Source1

ftp://ftp.irisa.fr/pub/mirrors/gcc.gnu.org/gcc/releases/gcc-4.2.0/gcc-core-4.2.0.tar.bz2

mirror: http://compressionratings.com/files/cr_src1.tar.7z

Qualifying1

mirror: http://compressionratings.com/files/cr_q1.7z

Qualifying2

mirror: http://compressionratings.com/files/cr_q2.7z

Database1

http://ftp.freedb.org/pub/freedb/freedb-complete-20080601.tar.bz2

mirror: http://compressionratings.com/files/cr_db1.7z

FP1

http://www.csl.cornell.edu/~burtscher/research/FPC/
http://www.csl.cornell.edu/~burtscher/research/FPC/datasets.html

mirror: http://compressionratings.com/files/cr_fp1.7z

Pgn1

http://www.top-5000.nl/dl/million.rar

mirror: http://compressionratings.com/files/cr_pgn1.7z

Pitches1

http://pizzachili.dcc.uchile.cl/texts/music/

mirror: http://compressionratings.com/files/cr_pit1.7z

Medical1

http://www.data-compression.info/Corpora/lukas_2d_8_tif.zip

Medical2

http://www.data-compression.info/Corpora/lukas_2d_16_tif.zip

Special

Reference files

Files used in the "Reference files" test:

http://compressionratings.com/files/cr_reference_files.zip

zhwik8

A 100 MB sample of Chinese Wikipedia that was used in the "Reference files" test.

http://compressionratings.com/files/zhwik8.bz2

Other data

Gauntlet corpus

http://www.michael-maniscalco.com/testset/gauntlet/

mirror: http://compressionratings.com/files/gauntlet_corpus.zip