SeanColombo.com

My little corner of the internet.

Image Compression at Wikia

At Wikia, we have a ton of images and we serve them up constantly on our 1 billion+ pageviews per month. Since we weren’t compressing our images, this left a fairly big potential area for improvement.

Stating what is probably obvious: having lossless image compression gives smaller filesizes which means lower bandwidth costs, but more importantly, it gives a speedup in the page load times since the user doesn’t have to spend as much time downloading the images.

Compressing user-generated images

Since I’d recently seen smush.it (which is built into Yahoo’s YSlow) give me a “lossless” image that looked way worse than the original, it didn’t seem right to just bulldoze the images that users uploaded. Instead, it seemed best to use a bot to upload better versions. If it turned out that one of our lossless compression algorithms actually hurt the image quality, the community could roll it back quite easily.

This means that we won’t save file-storage space (because we’ll actually keep the old version also), but we still get the other benefits.

Research!

There are a ton of tools for image compression floating around and it wasn’t clear from secondary research (i.e., googling) which were the best. So I decided to do some primary research. Side-note: A few years ago at a previous startup (which was later acquired by LinkedIn) I did some similar research on PNG compression with a much smaller dataset but with more compressors.

This all started as a Wikia Hackathon project in which I wrote a bot which could download an image, compress it and re-upload it. There was a lot of buy-in for this idea, so my first tests were built from that script. I chose 10 wikis and used the Wikia API (via the Perl MediaWiki API library that I co-author) to find 100 images from each wiki. I compress PNGs and JPGs and ignore the others.

The raw data can be found on ImageBot’s User Page. But here are some takeaways:

  • We could save 11.10% across all images
  • pngcrush compressed more than optipng (when pngcrush was given a good long time to do its best methods) and jpegoptim compressed more than jpegtran.
  • But it really doesn’t matter which was better, because using BOTH was better than either alone: if we compressed each image with both tools and then kept the smaller result, we saved more. pngcrush saved 20.89% on average, but always choosing the smaller of the two PNGs resulted in 23.83% savings. Similarly, jpegoptim saved 5.40% on average, but JPGs overall shrank by 5.94% when using the better tool for each image.
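
The “try both, keep the smaller” idea can be sketched in a few lines of Ruby. This is only an illustration, not the bot’s actual code: the names smallest_version and percent_saved, the .tryN output suffix, and the command-line flags in PNG_COMPRESSORS are all assumptions (the raw data doesn’t record which options were used).

```ruby
# Run every compressor on the original, then keep whichever file is smallest.
# Each compressor is a callable taking (input_path, output_path).
# The original is always a candidate, so the result is never bigger than it.
def smallest_version(original, compressors)
  candidates = [original]
  compressors.each_with_index do |compress, i|
    output = "#{original}.try#{i}" # hypothetical naming scheme
    compress.call(original, output)
    # Ignore tools that failed and left no (or an empty) output file.
    candidates << output if File.exist?(output) && File.size(output) > 0
  end
  candidates.min_by { |path| File.size(path) }
end

# Percentage saved relative to the original size.
def percent_saved(original_size, new_size)
  (100.0 * (original_size - new_size) / original_size).round(2)
end

# Assumed invocations; check each tool's man page before relying on the flags.
PNG_COMPRESSORS = [
  ->(src, dst) { system('pngcrush', '-brute', src, dst) },
  ->(src, dst) { system('optipng', '-o7', '-out', dst, src) }
]
```

Calling smallest_version('logo.png', PNG_COMPRESSORS) would return the path of the smallest file, which may be the original itself if neither tool managed to shrink it.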

My next startup: BlueLine Game Studios!

I have two big changes to announce!

The first is that I’ve recently founded my next company: BlueLine Game Studios. It’s been a long time coming and I was really looking forward to the chance to work with Geoff Brown again. After the marathon in September – with all of the extra time from not training anymore – it only took a few days before my restlessness had me thinking over all of the things I’d been studying about Indie Gaming for the last several years.

It seems almost inevitable that I would start a gaming company someday and it was a no-brainer that Geoff would be the first man to pull on board if I could get him. At this point, I just didn’t have any more excuses to delay it. I jumped in and contacted the developers of the best boardgames I could find (mostly Mensa Select winners) to discuss licensing. Much to my surprise they were all very responsive, seemed eager to work together, and moved fast. Before I knew it, we had a licensing deal signed to bring one of the most awesome boardgames ever to Xbox 360!

Since then, we signed a deal with another boardgame (to be announced soon ;)). They were seriously the #1 and #2 boardgames that I wanted to make for Xbox 360!

Things have been going quite well. …and fast! Which brings me to my second big change:

Starting Dec. 12th, I’ll be cutting back to 3 days per week at Wikia. I’ve been working on LyricWiki since early 2006. A few months after Wikia acquired LyricWiki, they pulled me in to head up LyricWiki work again and help with development in general. It’s been a great couple of years in which I’ve met some amazing people, but it’s time for me to sow some other seeds also. Furthermore, Wikia is in a great spot and can certainly survive without my full-time attention ;). Just this morning I noticed that according to Quantcast, Wikia is the 40th biggest site/group-of-sites on the internet (in terms of monthly US uniques). That’s awesome. When I joined, most people I met didn’t know the name Wikia. Two years later we’re starting to become a household name and we have more traffic than MySpace! It’s certainly been an exciting ride so far.

Better yet, since I’m not even leaving (just cutting back) I can do my part to continue to help Wikia grow to its next big milestones.

I’m looking forward to the adventures in store with this new arrangement and am happy to be back “in the startup arena”*.

If you think this sounds intriguing at all, please follow BlueLine Games on Twitter, Facebook, Google+, or its blog, or sign up to be notified when Hive for Xbox 360 goes live!


* Wikia is so profitable & has so many employees that it’s hard to call it a startup. So much job security… ugh! ;)

API Explorer for MediaWiki

Api Explorer: As part of my work trying to make the Wikia API more accessible to developers, I’ve created a basic version of an API Explorer.

Yo dawg: the entire API Explorer is actually written in JavaScript, using the MediaWiki API to build the documentation about the API. Introspection win!
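
For the curious: MediaWiki exposes documentation about its own modules through the action=paraminfo API module, which is the kind of introspection described above. Below is a rough Ruby sketch of building such a request. The endpoint URL is a placeholder, the helper name paraminfo_url is hypothetical, and whether the API Explorer issues exactly this request is an assumption.

```ruby
require 'uri'

# Builds a URL asking the MediaWiki API to describe some of its own modules.
# api_endpoint is the wiki's api.php URL; modules is an array of module names.
def paraminfo_url(api_endpoint, modules)
  query = URI.encode_www_form(
    action: 'paraminfo',
    modules: modules.join('|'), # the API separates multiple values with '|'
    format: 'json'
  )
  "#{api_endpoint}?#{query}"
end

# Placeholder endpoint; substitute the api.php of a real wiki.
puts paraminfo_url('https://example.org/w/api.php', %w[parse opensearch])
```

The JSON that comes back lists each module’s parameters and descriptions, which is enough raw material to render a documentation page.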

Open source: There is no reason this should be Wikia-specific, so I contributed it upstream to MediaWiki. You can find more info on the ApiExplorer Extension page on MediaWiki.org.

Future plans: when there is time, I’d like to make the MediaWiki API return its example URLs also. When that is working, hopefully we can make the API Explorer let the user issue those example requests and see the results live (and modify the URLs to issue new test requests).

Don’t forget to follow @WikiaAPI on twitter for more updates!

LyricWiki mobile app – Android and iPhone

Download for free from the Apple App Store
Download for free from the Android Market

I’m pleased to announce that the LyricWiki iPhone app has finally hit the App Store!

If you have an Android device, you can grab the LyricWiki Android app which made it out first.

Now that the iOS version is complete, we cover most smartphone users in the US. It’s been a long road (including the month-long submission process through Apple), but this version feels pretty solid.

Homescreen of LyricWiki Android app

Here’s what it’s got so far:

  • Auto-complete
  • Discographies for artists, grouped by album
  • iTunes Top 9 images on main screen
  • Over 1,500,000 lyrics with more added every day
  • Fully licensed: royalties are paid to publishers
  • Powered by a wiki community – constantly updated!
  • These lyrics have had 5 years of review and editing from hundreds of thousands of users! This is the single most accurate lyrics collection anywhere.

I’d also like to give a shout-out to all of the beta-testers and other users who have given feedback about the app… it’s extremely helpful to know what the community wants out of the app. Thank you!

Obviously, it’s never “done” & we’ll keep improving it as we go. If you have any feedback on the app, please let me know in the comments!

Another cool thing about the app is that, thanks to the help of a number of awesome translators, the app is available in these languages: English, German, Spanish, French, Italian, Polish, Finnish, Swedish, Norwegian, Czech, Dutch, Interlingua, Luxembourgish, Telugu, Macedonian, Breton, and Malay!

How to make (and use) a custom SASS function

The SASS (Syntactically Awesome StyleSheets) language is pretty neat. For my first use of SASS, I realized that one of the requirements for our system wasn’t easily supported by the language… but SASS is written in Ruby which is really easy to extend. The docs even mentioned the ability to create custom functions. However, I couldn’t find any docs on how to actually create and use a custom SASS function, so I figured I’d give a quick tutorial here of what I learned. This method is probably obvious to hardcore Ruby users, but I’d never used Ruby before. Turns out it’s pretty quick to learn. If you want to do the Matrix thing and pump the whole language into your brain like Neo in a ghetto dentist-chair, check out this Ruby crash course.

NOTE: This tutorial is targeted primarily at people using SASS via the command line (for PHP, Java, etc.), not as a Rails module.

My example: getting values from the sass command-line

I’m using SASS in a PHP environment (rather than Rails) and due to unique requirements, I need to be able to configure certain values in .scss at ‘compile’ time (referring to when the .scss is being compiled into .css).

One simple trick would be to simply write out a .scss file containing the key-value pairs. Unfortunately, the system I need to use SASS for already has tens of millions of page-requests per day so disk-writes would be a huge bottleneck (because disk i/o – even on solid state disks – is slow compared to many other methods). Custom SASS functions provided the perfect opportunity to completely skip this step. And yes: pre-generating all of the CSS files at code-deployment is out of the question because the number of possible stylesheets we need to support is intractably large.

The custom function

To create your custom SASS function, make a ruby file. We’ll call it sass_function.rb in this example. In the file, you need to define your function and then insert your module into SASS. Behold!


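require 'sass'

module WikiaFunctions
  # Returns the value of a key=value pair passed on the sass command line,
  # or defaultResult if no such pair was given.
  def get_command_line_param(paramName, defaultResult="")
    assert_type paramName, :String
    retVal = defaultResult.to_s

    # Look through the args given to SASS command-line
    ARGV.each do |arg|
      # Check if arg is a key=value pair
      if arg =~ /.=./
        # Split on the first '=' only, so values may themselves contain '='
        pair = arg.split('=', 2)
        if pair[0] == paramName.value
          # Found correct param name
          retVal = pair[1] || defaultResult
        end
      end
    end

    # Return a parsed SassScript value (e.g. a color) when possible,
    # falling back to a plain string otherwise.
    begin
      Sass::Script::Parser.parse(retVal, 0, 0)
    rescue
      Sass::Script::String.new(retVal)
    end
  end
end

# Make the function available to SASS
module Sass::Script::Functions
  include WikiaFunctions
end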

The particular SASS function in this example takes in name/value pairs defined on the sass command line and returns them if they’re there (or an optional default otherwise).
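
If you want to play with the lookup logic without installing the sass gem, here is a standalone sketch of just the argument-scanning part. The name find_param is made up for this illustration, and it returns plain Ruby strings rather than SassScript values.

```ruby
# Standalone version of the lookup logic, with no dependency on the sass gem.
# Returns the value of the last "name=value" argument matching `name`,
# or `default` when no argument matches.
def find_param(args, name, default = '')
  result = default
  args.each do |arg|
    key, value = arg.split('=', 2) # limit of 2 keeps any '=' inside the value
    # Skip anything that isn't a complete key=value pair.
    next if value.nil? || key.empty? || value.empty?
    result = value if key == name
  end
  result
end

argv = ['unicorn.scss', 'unicorn.css', 'logoColor=#6495ED', '-r', 'sass_function.rb']
puts find_param(argv, 'logoColor', 'white') # prints #6495ED
puts find_param(argv, 'linkColor', 'blue')  # prints blue
```

As in the real function, a later matching argument overrides an earlier one, and arguments without an “=” are simply ignored.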

Calling SASS

Since this tutorial assumes that you’re using sass from the command-line, you’ll have to tweak the command a little bit to tell ruby to use your new module. Here is a simple example:

sass unicorn.scss unicorn.css -r sass_function.rb

That doesn’t make use of the awesomeness of our new, command-line parsing function though! So here is an example that WOULD make use of it:

sass unicorn.scss unicorn.css logoColor=#6495ED -r sass_function.rb

So now we have a function capable of reading the command-line and a command-line with some useful information in it. Now all that’s needed is some SASS code (.scss) to make use of all of that.

In this example, we’ll set the “logo” element to have a background-color that we get from the command-line (and default to white if no matching value is passed in on the command-line). Remember: this would go in SASS code such as unicorn.scss


$logoBackgroundColor: get_command_line_param("logoColor", "white");

#logo{
   background-color: $logoBackgroundColor;
}

Now we have all of the pieces:

  1. The custom SASS function (called get_command_line_param()) in sass_function.rb
  2. The code in unicorn.scss to use our function to set the style by command-line info.
  3. The command-line needed to include our custom code and to set the logoColor value.

So if we run

sass unicorn.scss unicorn.css logoColor=#6495ED -r sass_function.rb

we will have a unicorn.css which contains something like:


#logo{
   background-color: #6495ED;
}

Just what we were going for! If you try this out, let me know in the comments if it worked for you or if you have any questions.

Best of luck!

Special thanks to Nathan Weizenbaum for pointing me down the right path with this stuff. Updated on 20100729 to change the return-value of the function based on the helpful comments below. Thanks!

2010 Goals – Q1 Review

To help keep myself on track, I’ve decided to do a quarterly review of my goals for 2010.

Visual Overview

I’ll try to be quick about this, so here is a visual overview of the progress. There are 32 tasks, so approximately 8 should be done. I’m about on track for that, but slightly behind (the green & yellow squares in the image add up to 7). Here is a key for the image:

  • Green: Completely done.
  • Yellow: Ongoing tasks which I’m on track for but which can’t be considered completed unless I keep them up for the whole year (such as “organize desks”).
  • Gray: Long tasks with measurable progress where I’m as far along as should be expected by the end of the first quarter.
  • Red: Tasks which I’m far off where I should expect to be at this point.

Some things of note

As you can see in the todo list widget, my email inbox is still way out of control. I do okay at clearing out the non-actionable junk, but real emails which I need to respond to just end up piling up. Unfortunately, beyond that useful piece of data (which I didn’t need the todo list widget to explain to me), I’ve noticed that the todo list widget isn’t terribly helpful anymore. When I had a few discrete places I could go to track my tasks it made sense, but now there are just too many places & it’s not worth trying to integrate each new one with my widget.

I’m doing really poorly at my blogging goals, but not for lack of interesting things to write about. The blog posts which I think of as being worthwhile to write seem to be time-consuming to do correctly and I haven’t been setting aside sufficient time for that.

Also, I realized early on that it could be dangerous to myopically focus on a list of goals set at the beginning of the year. Therefore, I’m more than willing to cancel or change some goals if I feel that they’re no longer worth pursuing as much as other, more urgent goals.

What is perhaps my favorite goal on the list – doubling LyricWiki‘s traffic – was almost attained early on. I had a couple of weeks to really focus on the site, then right at the end of those weeks we had a spike from an interesting page getting big on StumbleUpon. Unfortunately, I haven’t been able to focus as closely on the site since then, and traffic has stayed at about the same level as right before the spike. So after the spike worked its way out of the monthly calculations, we’re only about halfway to the overall goal for the year. I was really excited about the potential of being able to reach the annual goal in just the first quarter but it didn’t pan out.

That’s it for now. Let me know if you have any email tips!

Quick tip: clone of PHP’s microtime() function as a Perl subroutine.

Refer to the PHP manual for how the function is supposed to work. The short version is that you call “microtime(1)” to get this perl subroutine to return a float-formatted value containing the seconds and microseconds since the unix epoch.


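	# Clone of PHP's microtime. - from http://seancolombo.com
	use Time::HiRes qw(gettimeofday);

	# microtime(1) returns "seconds.microseconds" since the unix epoch.
	# microtime() returns the string "seconds microseconds" (note that PHP
	# itself returns "msec sec", with the fractional part first).
	sub microtime{
		my $asFloat = @_ ? shift : 0;
		my ($epochseconds, $microseconds) = gettimeofday;
		my $microtime;
		if($asFloat){
			# Zero-pad to 6 digits so 42 microseconds becomes ".000042"
			$microtime = sprintf("%d.%06d", $epochseconds, $microseconds);
		} else {
			$microtime = "$epochseconds $microseconds";
		}
		return $microtime;
	}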

This is public domain, use it as you’d like. Please let me know if you find any bugs.

Hope it helps!

Open sourcing a MediaWiki bot framework

In my last post I asked what my readers wanted me to write about, and all of the responses I got on the post or in person had “how to write a MediaWiki bot in 10 minutes or less” at the top of the list.

I have that post mostly written, but in order to make that whole process easier, I’ve finally open-sourced the bot framework that I now use and made it easily accessible online.

Background

I used to use custom scripts for my bot, but this summer when LyricWiki transitioned over to Wikia, they all broke. My scripts pre-dated the MediaWiki API so they had depended on screen-scraping which no longer worked when we switched to Wikia’s skins which had a completely different layout.

When I had to get my bots running again, I looked at a few Perl frameworks for connecting to the MediaWiki API, and the one that seemed to have significantly fewer bugs than the others was a Perl module by CBM.

Over the months, I’ve realized that there was some functionality that wasn’t implemented yet but which I needed – deleting pages, issuing purges, finding all templates included on a page – so I updated the module. I tried to get access to the MediaWiki Tool Server where the project is currently hosted, but they must be really busy because they haven’t replied to the JIRA issue (request for an account) and it’s been months.

Since it has become quite a waiting game, I decided to just fork the project. Hopefully CBM will want access to the repository and we can just keep working on it together. Regardless, I’ve created all of the usual suspects for a project such as this (see next section).

Project links

So, without further delay, here are the beginnings of the Perl MediaWiki API

The links (especially the wiki) need a lot of work before it becomes obvious how to quickly get set up and use the module. The next blog post will take care of that!

However, if you’re curious & are already comfortable with Perl (and to some extent MediaWiki), you can jump right in. Let me know if you have any feedback. Thanks!

What do you want to see?

Note: If you’re seeing this on facebook, it is just pulled in from my blog at http://seancolombo.com

I’m in the mood to do some blogging in the next couple of days but have more ideas than time. What would YOU find most interesting? I’m thinking along the lines of either analyzing LyricWiki statistics or doing quick tutorials (“How to Write a MediaWiki Bot in 10 Minutes or Less”, or something similar).

Here were some ideas of stats I could do. They each take a decent amount of time, so please let me know which ones you are most interested in:

  • Views / #Songs by Genre
  • Views / #Pages by Language
  • Views / #Pages by Publisher
  • Infographic of Labels/Publishers in the Music Industry, how they relate to each other, and their prevalence in the market.
  • Our prevalence in a country vs. its prevalence online
  • Impact on page-views of being SOTD / AOTW / FMOM vs. not.
  • Views from songs that were on the iTunes Top 100 during the month vs. those that weren’t. Views/page for that same group.
  • Views by page-age and views/page by page-age.
  • Views by page freshness (last touched) and views/page by page freshness. Include histogram of freshness across all pages.

Let me know in the comments what you want to see! Those stats, the tutorial mentioned, or anything else are all fair game. Since I don’t have many readers yet, if you comment then I’ll probably do the post you’re asking for.

When to release a software project

This is something I’ve thought about for a long time and I think I’ve finally come up with a succinct answer: as soon as possible with the minimum set of features that still lets the project accomplish its core goals.

Before we get into this full-steam, there are obviously exceptions such as big title video games where the consumer expectation is to be delivered a single piece of completed software that never needs an update. This post is aimed primarily at web-apps but will also apply to most other applications (including indie video games).

Who cares?

This may seem like a very minor issue or a personal preference, but I really don’t think it is. As a developer or as someone who knows developers, you probably realize that the vast majority of software never gets finished. Furthermore, too much of it is poorly written. When you choose to release a product and how you get there are both major factors in addressing these concerns.

My recommendations

Bullets are fast, let’s use them!

  • Figure out what makes your product concept compelling to users. If you can write this in one sentence, that would be good. Here’s an example you may recognize: LyricWiki is a free site which is a source where anyone can go to get reliable lyrics for any song, from any artist, without being hammered by invasive ads. Even if you don’t word it as a sentence (which you really should figure out at some point*) for now, just make sure you have the main features: {free, reliable lyrics (wiki-editable), good coverage, no invasive ads}.
  • Even if you want your product to have a bunch of tangential-features which you think will make it awesome, make a list of only the features that are mandatory to meet your project’s core goals.
  • When you’re writing a feature, write it right. While I think you should cut back on what you implement, my experience has been that you almost never get to go back and really polish features as much as you’d like to after-the-fact. Do it right the first time. Also, if you ever do get the chance to go back you won’t remember the code quite as well as when you’re writing it the first time.
  • Release it! As soon as you hit your minimum goals, don’t hesitate… put it live! If you have friendly users asking you for fixes and new features, that will push you to continue. You probably created the project because you wanted to make something that would be used… so people actually using the product and wanting to use it more is one of the best forcing-functions to get you to keep working. It will be an even better motivator if you actually use your product yourself because you will quickly start to yearn for new features or bugfixes.

*: A full sentence like this should guide your decisions as the project grows to keep it from bloating. Also, when people social-bookmark, blog about, or tweet your site they will quite frequently just paste this sentence. Having it as the first and most prominent sentence helps. Also, people are going to ask you “what is your site?” in loud, crowded rooms dozens of times. If your project is successful, you will literally have to describe your project hundreds of times. My current loud-environment elevator-pitch for LyricWiki is: “it’s like Wikipedia for song lyrics… called LyricWiki.” This evolved primarily because apparently people couldn’t hear my enunciation of “LyricWiki” in a loud room unless I had pre-prepped their brains for both lyrics and wikis.