Image Compression at Wikia

At Wikia, we have a ton of images and we serve them up constantly across our 1 billion+ pageviews per month. Since we weren’t compressing our images, this left a fairly big potential area for improvement.

Stating what is probably obvious: lossless image compression gives smaller file sizes, which means lower bandwidth costs. More importantly, it speeds up page load times, since the user doesn’t have to spend as much time downloading the images.

Compressing user-generated images

Since I’d recently seen smush.it (which is built into Google Page Speed) give me a “lossless” image that looked way worse than the original, it didn’t seem right to just bulldoze the images that users had uploaded. Instead, it seemed best to use a bot to upload better versions. If it turned out that one of our lossless compression algorithms actually hurt the image quality, the community could roll it back quite easily.

This means that we won’t save file-storage space (because we’ll actually keep the old version also), but we still get the other benefits.

Research!

There are a ton of tools for image compression floating around, and it wasn’t clear from secondary research (i.e., googling) which were the best. So I decided to do some primary research. Side note: a few years ago at a previous startup (which was later acquired by LinkedIn) I did some similar research on PNG compression with a much smaller dataset but with more compressors.

This all started as a Wikia Hackathon project in which I wrote a bot that could download an image, compress it, and re-upload it. There was a lot of buy-in for this idea, so my first tests were built from that script. I chose 10 wikis and used the Wikia API (via the Perl MediaWiki API library that I co-author) to find 100 images from each wiki. I compressed PNGs and JPGs and ignored the others.
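The "PNGs and JPGs only" filter is simple enough to sketch. This is not the bot's actual code (that was a Perl script); it is a minimal Python illustration that assumes the file type can be judged from the filename extension alone. The name `image_kind` and the extension table are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional

# Assumed mapping from extension to compressor family; anything else is skipped.
_KINDS = {".png": "png", ".jpg": "jpeg", ".jpeg": "jpeg"}


def image_kind(filename: str) -> Optional[str]:
    """Return "png" or "jpeg" for files worth compressing, else None."""
    suffix = PurePosixPath(filename).suffix.lower()
    return _KINDS.get(suffix)
```

A bot loop would call `image_kind` on each title returned by the API and move on when it gets `None`. Checking the file's magic bytes would be more robust than trusting the extension, but that is beyond this sketch.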

The raw data can be found on ImageBot’s User Page. But here are some takeaways:

  • We could save 11.10% across all images
  • pngcrush compressed more than optipng (when pngcrush was given a good long time to try its best methods), and jpegoptim compressed more than jpegtran.
  • But it really doesn’t matter which tool was better, because using BOTH was better than using either alone. That is, if we compressed each image with both tools and then kept the smaller output, the overall result improved. pngcrush alone saved 20.89% on average, but always choosing the smaller PNG of the two resulted in 23.83% savings. Similarly, jpegoptim alone saved 5.40% on average, but JPGs overall saved 5.94% when using the better method for each image.
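The "run both, keep the smaller" idea from the last takeaway can be sketched as follows. This is an illustration in Python rather than the bot's real code, and it assumes each compressor is wrapped as a function from bytes to bytes (for example, a thin wrapper that shells out to pngcrush or optipng). The names `pick_smallest` and `percent_saved` are hypothetical.

```python
from typing import Callable, Dict, Tuple

# A compressor takes the original file bytes and returns the compressed bytes.
Compressor = Callable[[bytes], bytes]


def pick_smallest(original: bytes,
                  compressors: Dict[str, Compressor]) -> Tuple[str, bytes]:
    """Run every compressor and return (name, data) for the smallest output.

    The original counts as a candidate too, so the result is never larger
    than the input.
    """
    best_name, best_data = "original", original
    for name, compress in compressors.items():
        candidate = compress(original)
        if len(candidate) < len(best_data):
            best_name, best_data = name, candidate
    return best_name, best_data


def percent_saved(before: int, after: int) -> float:
    """Percentage of bytes saved relative to the original size."""
    if before == 0:
        return 0.0
    return 100.0 * (before - after) / before
```

Keeping the original as a fallback means an image that no tool can shrink is simply left alone, so the bot would have no reason to re-upload it.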
