Do A/B Testing in one line with this open source one-file PHP library

Years ago, I read a post by the always-awesome Eric Ries called “The one line split-test, or how to do A/B all the time” and I thought it was brilliant. He argued that with a low enough setup cost, A/B testing could be used prolifically.

A while later, I was involved in writing an A/B Testing framework at Wikia (Fandom) and we were able to create a fairly simple system which allowed Product Managers to create and view experiments easily from a panel. However, due to the type of site Wikia is, each experiment often wanted to track different metrics.

You may be expecting what comes next: the small amount of friction required from a Product Manager to set up an experiment, plus the extra friction of figuring out how to get custom metrics… led to this rather-advanced* system being used very little in practice.

Recently, due to the success of Burndown for Trello, I’ve been wanting to leverage the size of our user-base to use A/B testing that will let us make more data-driven design decisions. I re-visited Eric Ries’s post and was reminded of my past experiences that even the smallest amount of friction can prevent A/B testing from being used prolifically. I thought it would be great if there was a library to do that one-line testing… and if the library itself was so easy to set up that it would provide very little friction for projects to integrate it and get started.

So, I created LeanAb. It’s free, open source, and takes about 15 minutes to set up the first time and get to running an experiment. To be blunt, even if nobody else uses it, I’ll be happy to have it around so that I can continue to drop it into my various sites… but I think that if you’re considering split-testing on your site and have been putting it off, this could quite likely be a great help for your projects.

Here were my goals for the first version of LeanAb, all of which have been reached:

  • Have it be very simple one-line calls like those mentioned in the Eric Ries article
  • Self-contained to one file
  • Fast/Easy to set up
  • Self-installs the needed database schema
  • Ability to generate basic reports easily (eg: one line of code, or using the very basic report page provided in the repository)
  • Fail gracefully so that regardless of errors, the worst that happens is that nothing experiment-related gets written to database, and user sees control group.

Usage is as simple as this:

    $hypothesis = setup_experiment("FancyNewDesign1.2",
                                   array(array("control", 50),
                                         array("design1", 50)));
    if( $hypothesis == "control" ) {
        // do it the old way
    } elseif( $hypothesis == "design1" ) {
        // do it the fancy new way
    }
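
To make the weights in that call concrete, here is a minimal Ruby sketch of how a weighted, repeatable group assignment could work. This is not LeanAb's actual implementation: `assign_group`, the CRC32 hashing and the user id are all assumptions made for illustration. It also falls back to "control" when the weights are unusable, in the spirit of the fail-gracefully goal.

```ruby
require 'zlib'

# Hypothetical helper, not LeanAb's actual code: picks a treatment group
# for a user, weighted by the number given for each group.
def assign_group(experiment_name, user_id, groups)
  total = groups.sum { |_name, weight| weight }
  return "control" if total <= 0   # fail safe: fall back to control

  # Hash experiment + user so the same user always lands in the same bucket.
  bucket = Zlib.crc32("#{experiment_name}:#{user_id}") % total

  groups.each do |name, weight|
    return name if bucket < weight
    bucket -= weight
  end
  "control"
end

groups = [["control", 50], ["design1", 50]]
puts assign_group("FancyNewDesign1.2", 42, groups)
```

Hashing instead of calling a random number generator means a returning user keeps seeing the same variant without any extra storage.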

The reports that it generates automatically look something like this:

[Image: LeanAb example report]

If you’re interested, head over to the LeanAb GitHub page to try it out, or to follow its ongoing development. Full installation/usage documentation is provided in the README on GitHub.

If you use this in your project, please let me know in the comments below!

Hope that helps,
– Sean

*: the advanced parts were mostly hidden from end-users. It ran extremely fast even though the site served over a billion pages per day. It was also designed to allow either the server code or the front-end code to easily access a user’s treatment-group, possibly for several experiments per-page, all with no perceptible impact on performance.

Quick tip: Perl version of PHP’s in_array()

There are a million and one ways to do this in Perl, but here’s a fairly readable way that you can copy/paste into your Perl scripts:


####
# From http://www.seancolombo.com
# Equivalent to PHP's in_array.  If the first element is in the array
# passed in as the second parameter, then the sub-routine returns non-zero.
# If the element is not in the array, then the sub-routine returns zero.
####
sub in_array{
    my $retVal = 0;
    my $val = shift(@_);
    foreach my $curr (@_){
        if($curr eq $val){
            $retVal = 1;
            last;
        }
    }
    return $retVal;
} # end in_array()

Example usage:


if(in_array($needle, @haystack)){
    print "Found it!\n";
}

Been using this for years & it’s made my life a bit easier. Hope that’s useful to someone!

How to make (and use) a custom SASS function

The SASS (Syntactically Awesome StyleSheets) language is pretty neat. For my first use of SASS, I realized that one of the requirements for our system wasn’t easily supported by the language… but SASS is written in Ruby, which is really easy to extend. The docs even mentioned the ability to create custom functions. However, I couldn’t find any docs on how to actually create and use a custom SASS function, so I figured I’d give a quick tutorial here of what I learned. This method is probably obvious to hardcore Ruby users, but I’d never used Ruby before. Turns out it’s pretty quick to learn. If you want to do the Matrix thing and pump the whole language into your brain like Neo in a ghetto dentist-chair, check out this Ruby crash course.

NOTE: This tutorial is targeted primarily at people using SASS via the command line (for PHP, Java, etc.), not as a Rails module.

My example: getting values from the sass command-line

I’m using SASS in a PHP environment (rather than Rails) and due to unique requirements, I need to be able to configure certain values in .scss at ‘compile’ time (referring to when the .scss is being compiled into .css).

One simple trick would be to simply write out a .scss file containing the key-value pairs. Unfortunately, the system I need to use SASS for already has tens of millions of page-requests per day so disk-writes would be a huge bottleneck (because disk i/o – even on solid state disks – is slow compared to many other methods). Custom SASS functions provided the perfect opportunity to completely skip this step. And yes: pre-generating all of the CSS files at code-deployment is out of the question because the number of possible stylesheets we need to support is intractably large.

The custom function

To create your custom SASS function, make a ruby file. We’ll call it sass_function.rb in this example. In the file, you need to define your function and then insert your module into SASS. Behold!


require 'sass'

module WikiaFunctions
  def get_command_line_param(paramName, defaultResult="")
    assert_type paramName, :String
    retVal = defaultResult.to_s

    # Look through the args given to SASS command-line
    ARGV.each do |arg|
      # Check if arg is a key=value pair
      if arg =~ /.=./
        # Split on the first "=" only, so that values may contain "="
        pair = arg.split("=", 2)
        if(pair[0] == paramName.value)
          # Found correct param name
          retVal = pair[1] || defaultResult.to_s
        end
      end
    end

    # Try to parse the result as a SASS expression (eg: a color);
    # fall back to a plain string if it does not parse.
    begin
      Sass::Script::Parser.parse(retVal, 0, 0)
    rescue
      Sass::Script::String.new(retVal)
    end
  end
end

module Sass::Script::Functions
  include WikiaFunctions
end

The particular SASS function in this example takes in name/value pairs defined on the sass command line and returns them if they’re there (or an optional default otherwise).
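
If you want to poke at just the lookup logic without installing SASS, here is a small standalone sketch of that step. `find_param` is a hypothetical name used only for this illustration, and it takes the argument list as a parameter instead of reading ARGV so that it is easy to try out.

```ruby
# Standalone sketch of the lookup step only (no Sass required).
# find_param is a hypothetical name used for illustration.
def find_param(args, param_name, default_result = "")
  result = default_result.to_s
  args.each do |arg|
    next unless arg =~ /.=./          # only consider key=value pairs
    key, value = arg.split("=", 2)    # split on the first "=" only
    result = value if key == param_name
  end
  result
end

args = ["unicorn.scss", "unicorn.css", "logoColor=#6495ED", "-r", "sass_function.rb"]
puts find_param(args, "logoColor", "white")   # prints "#6495ED"
puts find_param(args, "linkColor", "white")   # prints "white"
```

Arguments that are not key=value pairs (file names, the -r flag) are simply skipped, so they can sit anywhere on the command line.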

Calling SASS

Since this tutorial assumes that you’re using sass from the command-line, you’ll have to tweak the command a little bit to tell ruby to use your new module. Here is a simple example:
sass unicorn.scss unicorn.css -r sass_function.rb
That doesn’t make use of the awesomeness of our new, command-line parsing function though! So here is an example that WOULD make use of it:
sass unicorn.scss unicorn.css logoColor=#6495ED -r sass_function.rb

So now we have a function capable of reading the command-line and a command-line with some useful information in it. Now all that’s needed is some SASS code (.scss) to make use of all of that.

In this example, we’ll set the “logo” element to have a background-color that we get from the command-line (and default to white if no matching value is passed in on the command-line). Remember: this would go in SASS code such as unicorn.scss


$logoBackgroundColor: get_command_line_param("logoColor", "white");

#logo{
    background-color: $logoBackgroundColor;
}

Now we have all of the pieces:

  1. The custom SASS function (called get_command_line_param()) in sass_function.rb
  2. The code in unicorn.scss to use our function to set the style by command-line info.
  3. The command-line needed to include our custom code and to set the logoColor value.

So if we run
sass unicorn.scss unicorn.css logoColor=#6495ED -r sass_function.rb
we will have a unicorn.css which contains something like:


#logo{
    background-color: #6495ED;
}

Just what we were going for! If you try this out, let me know in the comments if it worked for you or if you have any questions.

Best of luck!

Special thanks to Nathan Weizenbaum for pointing me down the right path with this stuff. Updated on 20100729 to change the return-value of the function based on the helpful comments below. Thanks!

Quick tip: clone of PHP’s microtime() function as a Perl subroutine.

Refer to the PHP manual for how the function is supposed to work. The short version is that you call “microtime(1)” to get this perl subroutine to return a float-formatted value containing the seconds and microseconds since the unix epoch.


# Clone of PHP's microtime. - from http://seancolombo.com
use Time::HiRes qw(gettimeofday);
sub microtime{
    my $asFloat = 0;
    if(@_){
        $asFloat = shift;
    }
    my ($epochseconds, $microseconds) = gettimeofday;
    my $microtime;
    if($asFloat){
        # Zero-pad the microseconds to 6 digits so that e.g. 42 becomes .000042
        while(length("$microseconds") < 6){
            $microseconds = "0$microseconds";
        }
        $microtime = "$epochseconds.$microseconds";
    } else {
        # Note: PHP itself returns "msec sec" here; this clone returns "sec usec".
        $microtime = "$epochseconds $microseconds";
    }
    return $microtime;
}

This is public domain, use it as you’d like. Please let me know if you find any bugs.
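
The one subtle part of the subroutine is the zero-padding: without it, 42 microseconds past the second would come out as ".42" instead of ".000042". Here is the same idea as a minimal Ruby sketch; `microtime_string` is a hypothetical name for illustration and is not part of the Perl code above.

```ruby
# Hypothetical helper for illustration: builds "seconds.microseconds"
# from a Time, zero-padding the microseconds to six digits.
def microtime_string(time = Time.now)
  format("%d.%06d", time.to_i, time.usec)
end

# 42 microseconds past the second must render as .000042, not .42
puts microtime_string(Time.at(1_280_000_000, 42))   # prints "1280000000.000042"
```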

Hope it helps!

Quick Tip: Do huge MySQL queries in batches when using PHP

When using PHP to make MySQL queries, it is significantly better for performance to break one extremely large query into smaller queries. In my testing, there was a query which returned 1 million rows and took 19,275 seconds (about 5 hours, 21 minutes) to traverse the results. By breaking that query up into a series of queries that had about 10,000 results each, the total time dropped to 152 seconds… yeah… less than 3 minutes.

While MySQL provides LIMIT and/or OFFSET functionality to do batching, if you have numeric id’s on the table(s) you’re querying, I’d recommend using your own system for offsets (code example below) rather than the default MySQL functionality since the hand-rolled method is much faster. See table in next section for performance comparisons.

Timing table

I’ll provide some example code below to show you how I did the batching (based on potentially sparse, unique, numeric ids). To start, here is a table of query-result-size vs. the total runtime for my particular loop. All timings are for traversing approximately 1,000,000 rows.

Query Batch Size | Hand-rolled method | MySQL “LIMIT” syntax
1,000,000+       | 19,275 seconds     | 19,275 seconds
10,000           | 152 seconds        | 1,759 seconds
5,000            | 102 seconds        | 1,828 seconds
1,000            | 43 seconds         | ?
750              | 40 seconds         | ?

At the end, it was pretty clear that no more data was needed to continue to demonstrate that the LIMIT method was slow. Each one of those runs was taking about half an hour and about halfway through the 1,000 row test for the LIMIT method, it started causing the database to be backed up. Since this was on a live production system, I decided to stop before it caused any lag for users.
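
One plausible reason for the gap is that a query with an OFFSET generally has to read and throw away all of the skipped rows before it can return the batch you asked for. The Ruby sketch below counts rows touched under that assumption. It is a rough model only: the function name is made up for illustration, and the model is not meant to reproduce the timings in the table.

```ruby
# Rows the server touches when paging with LIMIT/OFFSET, assuming it must
# read and discard every skipped row before returning each batch.
def rows_touched_with_offset(total_rows, batch_size)
  batches = (total_rows.to_f / batch_size).ceil
  (1..batches).sum { |k| [k * batch_size, total_rows].min }
end

puts rows_touched_with_offset(1_000_000, 10_000)   # prints 50500000
puts rows_touched_with_offset(1_000_000, 5_000)    # prints 100500000
```

With id ranges, each batch only touches the rows in its own range, so the total stays close to the 1,000,000 rows in the table.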

Example Code

This code is an example of querying for all of the pages in a MediaWiki database. I used similar code to this to make a list of all of the pages (and redirects) in LyricWiki. In the code, you’ll notice that the manual way I do the offsets based on id instead of using the MySQL “LIMIT” syntax doesn’t guarantee that each batch is the same size since ids might be sparse (ie: some may be missing if rows were deleted). That’s completely fine in this case and there is a significant performance boost from using this method. This test code just writes out a list of all of the “real” pages in a wiki (where “real” means that they are not redirects and they are in the main namespace as opposed to Talk pages, Help pages, Categories, etc.).


<?php

$QUERY_BATCH_SIZE = 10000;
$titleFilenamePrefix = "wiki_pageTitles";

// Configure these database settings to use this example code.
$db_host = "localhost";
$db_name = "";
$db_user = "";
$db_pass = "";

// Uses mysqli, since the older mysql_* functions were removed in PHP 7.
$db = mysqli_connect($db_host, $db_user, $db_pass, $db_name);
if(!$db){
    die("Could not connect: ".mysqli_connect_error()."\n");
}

// Batches are id ranges, so stop at the highest id rather than at the
// first empty batch (ids may be sparse).
$maxId = 0;
if($result = mysqli_query($db, "SELECT MAX(page_id) FROM wiki_page")){
    $row = mysqli_fetch_row($result);
    $maxId = (int)$row[0];
    mysqli_free_result($result);
}

$TITLE_FILE = fopen($titleFilenamePrefix."_".date("Ymd").".txt", "w");
$offset = 0;
$startTime = time();
while($offset <= $maxId){
    // Half-open range [offset, offset + batch size) so that ids which fall
    // exactly on a batch boundary are not skipped.
    $queryString = "SELECT page_title, page_is_redirect FROM wiki_page WHERE page_namespace=0 AND page_id >= $offset AND page_id < ".($offset+$QUERY_BATCH_SIZE);
    if($result = mysqli_query($db, $queryString)){
        while($row = mysqli_fetch_assoc($result)){
            $isRedirect = ($row["page_is_redirect"] != "0");
            if(!$isRedirect){
                fwrite($TITLE_FILE, $row["page_title"]."\n");
            }
        }
        mysqli_free_result($result);
    } else {
        print "Query failed: ".mysqli_error($db)."\n";
        break;
    }
    $offset += $QUERY_BATCH_SIZE;
    print "\tDone with ids below $offset.\n";
}
$endTime = time();
print "Total time to cache results: ".($endTime - $startTime)." seconds.\n";
fclose($TITLE_FILE);

?>
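
To see how the id-range loop behaves with sparse ids without needing a database, here is a minimal in-memory sketch in Ruby. The rows, `fetch_range` and `each_batch` are all made up for illustration; `fetch_range` stands in for the SELECT with the two page_id conditions.

```ruby
# In-memory stand-in for a table with sparse numeric ids (hypothetical data).
rows = [1, 2, 5, 9, 10, 11, 25, 26, 40].map { |id| { id: id, title: "Page_#{id}" } }

# Stand-in for: SELECT ... WHERE id >= low AND id < high
def fetch_range(rows, low, high)
  rows.select { |row| row[:id] >= low && row[:id] < high }
end

def each_batch(rows, batch_size)
  max_id = rows.map { |row| row[:id] }.max || 0
  offset = 0
  while offset <= max_id
    batch = fetch_range(rows, offset, offset + batch_size)
    yield batch unless batch.empty?   # a gap in the ids just gives an empty batch
    offset += batch_size
  end
end

titles = []
each_batch(rows, 10) { |batch| titles.concat(batch.map { |row| row[:title] }) }
puts titles.length   # prints 9
```

Note that the range 30 to 39 has no rows at all, yet the loop carries on and still picks up id 40, because it stops at the highest id instead of at the first empty batch.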

Hope that helps!