Here are the download links and instructions to construct the corpus.
OpenOffice.org, precomped with headers removed.
mirror: http://compressionratings.com/files/cr_app1.tar.7z
MinGW compiler, with its files concatenated into a single file in random order.
mirror: http://compressionratings.com/files/cr_app2.7z
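The random-order concatenation described for the MinGW entry can be sketched in Python as follows. This is a minimal sketch, not the procedure actually used: the function name `concatenate_shuffled`, the source directory and the `seed` parameter are hypothetical. Because the original file order is not documented, the sketch will not reproduce cr_app2 byte for byte; use the mirror above for the exact file.

```python
import os
import random
import shutil


def concatenate_shuffled(src_dir: str, out_path: str, seed: int = 0) -> list[str]:
    """Concatenate every file under src_dir into out_path in a shuffled order.

    Hypothetical helper: the seed and ordering are assumptions, not the
    order used for cr_app2. Returns the relative paths in write order.
    """
    paths = []
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            paths.append(os.path.relpath(os.path.join(root, name), src_dir))
    # Sort first so the shuffled order depends only on the seed,
    # not on the order the filesystem happens to list entries.
    paths.sort()
    random.Random(seed).shuffle(paths)
    # The file list is collected before out_path is created, so an output
    # file inside src_dir is not read back into itself.
    with open(out_path, "wb") as out:
        for rel in paths:
            with open(os.path.join(src_dir, rel), "rb") as f:
                shutil.copyfileobj(f, out)  # stream, no full read into memory
    return paths
```

Returning the order makes it possible to record which file sits at which offset, which the single concatenated file does not preserve on its own.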
http://sourceforge.net/project/downloading.php?groupname=portableapps&filename=PortableApps.com_Suite_Light_Setup_1.1.exe
mirror: http://compressionratings.com/files/cr_app3.tar.7z
Firefox 3.6.3, Inkscape 0.4.7.3, Thunderbird 3.0.4 and VLC 1.0.5 binaries, precomped with their headers removed.
mirror: http://compressionratings.com/files/cr_app4.tar.7z
Audio1 is a highly diverse collection of CD-quality music covering an extreme range of musical styles. The corpus is original and was made for CompressionRatings.Com.
A mini version of the corpus is currently used:
http://compressionratings.com/files/cr_audio1-mini.tar.7z (88.0 MB, contains uncompressed files)
http://compressionratings.com/files/cr_audio1-mini.zip (58.5 MB, contains FLAC-compressed files)
http://compressionratings.com/files/cr_audio1-mini.exe (55.9 MB, self-extracting nz archive containing WAV files)
The full version is available here:
http://compressionratings.com/files/cr_audio1.zip (118.1 MB, contains FLAC-compressed files)
http://compressionratings.com/files/cr_audio1.exe (112.7 MB, self-extracting nz archive containing WAV files)
mirror: http://compressionratings.com/files/cr_game1.tar.7z
http://www.gamershell.com/download_21853.shtml
mirror: http://compressionratings.com/files/cr_game2.tar.7z
http://www.imagecompression.info/test_images/
mirror: http://compressionratings.com/files/cr_img1.tar.7z
http://www.imagecompression.info/test_images/
mirror: http://compressionratings.com/files/cr_img2.tar.7z
Manually selected English-language books from the Project Gutenberg etext00-02 archives. From each file, 16384 bytes were removed from both the beginning and the end to strip the Project Gutenberg headers and other redundant information.
http://compressionratings.com/files/cr_txt1.bz2 (26.9 MB)
http://compressionratings.com/files/cr_txt1.exe (19.8 MB, self-extracting nz archive)
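The per-file trimming step for the Gutenberg books can be sketched as a small Python function. This is a minimal sketch under stated assumptions: the name `trim_book` is hypothetical, and returning an empty result for files of 32768 bytes or less (twice the trim size) is an assumption, since the text does not say how such files were handled. How the trimmed books were then combined into cr_txt1 is not described, so that step is left out.

```python
TRIM_BYTES = 16384  # bytes removed from each end of a book, as stated above


def trim_book(data: bytes, n: int = TRIM_BYTES) -> bytes:
    """Drop the first and last n bytes of a Project Gutenberg etext.

    Assumption: files no longer than 2 * n bytes yield an empty result.
    """
    if len(data) <= 2 * n:
        return b""
    # Slice with an explicit end index; data[n:-n] would be empty for n == 0.
    return data[n:len(data) - n]
```

Applied to a 51200-byte file, the function keeps the middle 18432 bytes.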
http://www.cs.fit.edu/~mmahoney/compression/textdata.html
mirror: http://compressionratings.com/files/cr_os1.7z
mirror: http://compressionratings.com/files/cr_src1.tar.7z
mirror: http://compressionratings.com/files/cr_q1.7z
mirror: http://compressionratings.com/files/cr_q2.7z
mirror: http://compressionratings.com/files/cr_db1.7z
mirror: http://compressionratings.com/files/cr_fp1.7z
mirror: http://compressionratings.com/files/cr_pgn1.7z
mirror: http://compressionratings.com/files/cr_pit1.7z
Files used in the "Reference files" test:
http://compressionratings.com/files/cr_reference_files.zip
100 MB sample of Chinese Wikipedia that was used in the "Reference files" test.
http://compressionratings.com/files/zhwik8.bz2
http://www.michael-maniscalco.com/testset/gauntlet/
mirror: http://compressionratings.com/files/gauntlet_corpus.zip