Do A/B Testing in one line with this open source one-file PHP library

Years ago, I read a post by the always-awesome Eric Ries called “The one line split-test, or how to do A/B all the time” and I thought it was brilliant. He argued that with a low enough setup cost, A/B testing could be used prolifically.

A while later, I was involved in writing an A/B testing framework at Wikia (Fandom), and we created a fairly simple system that allowed Product Managers to create and view experiments easily from a panel. However, because of the type of site Wikia is, each experiment often needed to track different metrics.

You can probably guess what comes next: the small amount of friction required of a Product Manager to set up an experiment, plus the extra friction of figuring out how to get custom metrics, led to this rather advanced* system being used very little in practice.

Recently, due to the success of Burndown for Trello, I’ve been wanting to leverage the size of our user base with A/B testing that will let us make more data-driven design decisions. I revisited Eric Ries’s post and was reminded by my past experience that even the smallest amount of friction can prevent A/B testing from being used prolifically. I thought it would be great if there were a library to do that one-line testing… and if the library itself were so easy to set up that it would add very little friction for projects to integrate it and get started.

So, I created LeanAb. It’s free, open source, and takes about 15 minutes to set up the first time and get an experiment running. To be blunt, even if nobody else uses it, I’ll be happy to have it around so that I can keep dropping it into my various sites… but if you’ve been considering split-testing on your site and have been putting it off, this could quite likely be a great help for your projects.

Here were my goals for the first version of LeanAb, all of which have been reached:

  • Provide very simple one-line calls, like those mentioned in the Eric Ries article
  • Be self-contained in one file
  • Be fast and easy to set up
  • Self-install the needed database schema
  • Be able to generate basic reports easily (e.g., with one line of code, or using the very basic report page provided in the repository)
  • Fail gracefully, so that regardless of errors, the worst that happens is that nothing experiment-related gets written to the database and the user sees the control group

Usage is as simple as this:

    $hypothesis = setup_experiment("FancyNewDesign1.2",
                                   array(array("control", 50),
                                         array("design1", 50)));
    if($hypothesis == "control"){
        // do it the old way
    } elseif($hypothesis == "design1"){
        // do it the fancy new way
    }

The reports that it generates automatically look something like this:
[Image: LeanAb example report]

If you’re interested, head over to the LeanAb GitHub page to try it out, or follow its ongoing development. Full installation/usage documentation is provided in the README on GitHub.

If you use this in your project, please let me know in the comments below!

Hope that helps,
– Sean

*: The advanced parts were mostly hidden from end-users. It ran extremely fast even though the site served over a billion pages per day, and it was designed to let either the server code or the front-end code easily access a user’s treatment group, possibly for several experiments per page, all with no perceptible impact on performance.

Quick Tip: Do huge MySQL queries in batches when using PHP

When using PHP to make MySQL queries, it is significantly better for performance to break one extremely large query into a series of smaller queries. In my testing, a query which returned 1 million rows took 19,275 seconds (about 5 hours and 21 minutes) to traverse the results. By breaking that query up into a series of queries that each returned about 10,000 rows, the total time dropped to 152 seconds… yeah… less than 3 minutes.

While MySQL provides LIMIT and OFFSET functionality for batching, if you have numeric ids on the table(s) you’re querying, I’d recommend rolling your own offset system (code example below) rather than using the built-in MySQL functionality, since the hand-rolled method is much faster. See the table in the next section for performance comparisons.
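To make the difference concrete, here is roughly what the two query shapes look like, using the wiki_page table from the example code below (the offsets are just illustrative):

    -- MySQL's built-in paging: correct, but the server still has to read
    -- and throw away all of the skipped rows, so each successive batch
    -- gets slower as the offset grows.
    SELECT page_title FROM wiki_page ORDER BY page_id LIMIT 10000 OFFSET 990000;

    -- Hand-rolled paging on a numeric id: each batch is a simple indexed
    -- range scan, so the cost stays roughly flat no matter how deep into
    -- the table you are.
    SELECT page_title FROM wiki_page WHERE page_id > 990000 AND page_id <= 1000000;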

Timing table

I’ll provide some example code below to show how I did the batching (based on potentially sparse, unique, numeric ids). To start, here is a table of query batch size vs. the total runtime for my particular loop. All timings are for traversing approximately 1,000,000 rows.

    Query Batch Size   Hand-rolled method   MySQL "LIMIT" syntax
    1,000,000+         19,275 seconds       19,275 seconds
    10,000             152 seconds          1,759 seconds
    5,000              102 seconds          1,828 seconds
    1,000              43 seconds           ?
    750                40 seconds           ?

At that point, it was pretty clear that no more data was needed to demonstrate that the LIMIT method was slow. Each of those LIMIT runs was taking about half an hour, and about halfway through the 1,000-row test for the LIMIT method, queries started backing up on the database. Since this was a live production system, I decided to stop before it caused any lag for users.

Example Code

This code is an example of querying for all of the pages in a MediaWiki database; I used similar code to make a list of all of the pages (and redirects) in LyricWiki. You’ll notice that doing the offsets manually based on id, instead of using the MySQL “LIMIT” syntax, doesn’t guarantee that each batch is the same size, since ids might be sparse (i.e., some may be missing if rows were deleted). That’s completely fine in this case, and there is a significant performance boost from using this method. The test code just writes out a list of all of the “real” pages in a wiki (where “real” means they are not redirects and they are in the main namespace, as opposed to Talk pages, Help pages, Categories, etc.).


<?php

$QUERY_BATCH_SIZE = 10000;
$titleFilenamePrefix = "wiki_pageTitles";

// Configure these database settings to use this example code.
$db_host = "localhost";
$db_name = "";
$db_user = "";
$db_pass = "";

$db = mysql_connect($db_host, $db_user, $db_pass);
mysql_select_db($db_name, $db);

$TITLE_FILE = fopen($titleFilenamePrefix."_".date("Ymd").".txt", "w");

// Find the highest page_id up-front. This lets the loop keep going across
// gaps in the (potentially sparse) ids, even gaps wider than a whole batch.
$maxId = 0;
if($result = mysql_query("SELECT MAX(page_id) AS maxId FROM wiki_page", $db)){
    $maxId = intval(mysql_result($result, 0, "maxId"));
    mysql_free_result($result);
}

$offset = 0;
$startTime = time();
while($offset < $maxId){
    // Batch on an indexed id range rather than LIMIT/OFFSET. Each batch
    // covers the ids in ($offset, $offset + $QUERY_BATCH_SIZE].
    $queryString = "SELECT page_title, page_is_redirect FROM wiki_page"
        ." WHERE page_namespace=0"
        ." AND page_id > $offset AND page_id <= ".($offset + $QUERY_BATCH_SIZE);
    if($result = mysql_query($queryString, $db)){
        $numRows = mysql_num_rows($result);
        for($cnt=0; $cnt < $numRows; $cnt++){
            $title = mysql_result($result, $cnt, "page_title");
            $isRedirect = (mysql_result($result, $cnt, "page_is_redirect") != "0");
            if(!$isRedirect){
                fwrite($TITLE_FILE, "$title\n");
            }
        }
        mysql_free_result($result);
    } else {
        die("Query failed: ".mysql_error($db)."\n");
    }
    $offset += $QUERY_BATCH_SIZE;
    print "\tDone with ids up to $offset.\n";
}
$endTime = time();
print "Total time to cache results: ".($endTime - $startTime)." seconds.\n";
fclose($TITLE_FILE);

?>
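To run it, fill in the connection settings at the top and invoke the script from the command line; it writes one title per line to a dated .txt file (e.g., wiki_pageTitles_20120101.txt) in the current directory.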

Hope that helps!

Quick Tip: Generate Random Numbers in a MySQL Query

I noticed a dearth of Google results about this, so I figured I’d post my solution…

If you need to generate a random number inside of a MySQL query (which is useful for making fake test data), you can do it as follows:

SELECT floor(rand() * 10) as randNum;

That generates a random integer from 0 to 9, inclusive. Just change the 10 to one higher than the largest number you want to generate. The important part is just “floor(rand() * THE_EXCLUSIVE_UPPER_BOUND)”.

The full explanation is that rand() generates a random floating-point number that is greater than or equal to 0 but less than 1. After you multiply that number by your upper bound, floor() gets rid of everything after the decimal point.
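Since the same pattern works for any range, here’s one more quick example: a simulated six-sided die roll, which just scales and shifts the output:

    -- rand() is in [0, 1), so rand() * 6 is in [0, 6);
    -- floor() maps that to an integer from 0 to 5, and the + 1 shifts it to 1 to 6.
    SELECT 1 + floor(rand() * 6) AS dieRoll;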

Hope that helps someone!