Mar/100
Quick tip: clone of PHP’s microtime() function as a Perl subroutine.
Refer to the PHP manual for how the function is supposed to work. The short version is that you call “microtime(1)” to get this perl subroutine to return a float-formatted value containing the seconds and microseconds since the unix epoch.
# Clone of PHP's microtime. - from http://seancolombo.com
use Time::HiRes qw(gettimeofday);
sub microtime{
my $asFloat = 0;
if(@_){
$asFloat = shift;
}
(my $epochseconds, my $microseconds) = gettimeofday;
if($asFloat){
while(length("$microseconds") < 6){
$microseconds = "0$microseconds";
}
$microtime = "$epochseconds.$microseconds";
} else {
$microtime = "$epochseconds $microseconds";
}
return $microtime;
}
This is public domain, use it as you’d like. Please let me know if you find any bugs.
Hope it helps!
Jul/090
Quick Tip: Do huge MySQL queries in batches when using PHP
When using PHP to make MySQL queries, it is significantly better for performance to break one extremely large query into smaller queries. In my testing, there was a query which returned 1 million rows and took 19,275 seconds (5 hours, 20 minutes) to traverse the results. By breaking that query up into a series of queries that had about 10,000 results, the total time dropped to 152 seconds… yeah… less than 3 minutes.
While MySQL provides LIMIT and/or OFFSET functionality to do batching, if you have numeric id’s on the table(s) you’re querying, I’d recommend using your own system for offsets (code example below) rather than the default MySQL functionality since the hand-rolled method is much faster. See table in next section for performance comparisons.
Timing table
I’ll provide some example code below to show you how I did the batching (based on potentially sparse, unique, numeric ids). To start, here is a table of query-result-size vs. the total runtime for my particular loop. All timings are for traversing approximately 1,000,000 rows.
| Query Batch Size | Handrolled method | MySQL “LIMIT” syntax |
|---|---|---|
| 1,000,000+ | 19,275 seconds | 19,275 seconds |
| 10,000 | 152 seconds | 1,759 seconds |
| 5,000 | 102 seconds | 1,828 seconds |
| 1,000 | 43 seconds | ? |
| 750 | 40 seconds | ? |
At the end, it was pretty clear that no more data was needed to continue to demonstrate that the LIMIT method was slow. Each one of those runs was taking about half an hour and about halfway through the 1,000 row test for the LIMIT method, it started causing the database to be backed up. Since this was on a live production system, I decided to stop before it caused any lag for users.
Example Code
This code is an example of querying for all of the pages in a MediaWiki database. I used similar code to this to make a list of all of the pages (and redirects) in LyricWiki. In the code, you’ll notice that the manual way I do the offsets based on id instead of using the MySQL “LIMIT” syntax doesn’t guarantee that each batch is the same size since ids might be sparse (ie: some may be missing if rows were deleted). That’s completely fine in this case and there is a significant performance boost from using this method. This test code just writes out a list of all of the “real” pages in a wiki (where “real” means that they are not redirects and they are in the main namespace as opposed to Talk pages, Help pages, Categories, etc.).
< ?php
$QUERY_BATCH_SIZE = 10000;
$titleFilenamePrefix = "wiki_pageTitles";
// Configure these database settings to use this example code.
$db_host = "localhost";
$db_name = "";
$db_user = "";
$db_pass = "";
$db = mysql_connect($db_host, $db_user, $db_pass);
mysql_select_db($db_name, $db);
$TITLE_FILE = fopen($titleFilenamePrefix."_".date("Ymd").".txt", "w");
$offset = 0;
$done = false;
$startTime = time();
while(!$done){
$queryString = "SELECT page_title, page_is_redirect FROM wiki_page WHERE page_namespace=0 AND page_id > $offset AND page_id < ".($offset+$QUERY_BATCH_SIZE);
if($result = mysql_query($queryString, $db)){
if(($numRows = mysql_num_rows($result)) && $numRows > 0){
for($cnt=0; $cnt < $numRows; $cnt++){
$title = mysql_result($result, $cnt, "page_title");
$isRedirString = mysql_result($result, $cnt, "page_is_redirect");
$isRedirect = ($isRedirString != "0");
if(!$isRedirect){
fwrite($TITLE_FILE, "$title\n");
}
}
$offset += $QUERY_BATCH_SIZE;
print "\tDone with $offset rows. \n";
} else {
$done = true;
}
}
mysql_free_result($result);
}
$endTime = time();
print "Total time to cache results: ".($endTime - $startTime)." seconds.\n";
fclose($TITLE_FILE);
?>
Hope that helps!
Jan/091
Quick Tip: Generate Random Numbers in a MySQL Query
Noticed a dearth of Google articles about this so I figured I’d post my solution…
If you need to generate a random number inside of a MySQL query (which is useful for making fake test-data), you can do it as follows:
SELECT floor(rand() * 10) as randNum;
Which would generate a random integer from 0 to 9 inclusive. Just change the 10 to the number one higher than you want to generate. The important part is just the “floor(rand() * THE_EXCLUSIVE_UPPER_BOUND)”.
The full explanation is that rand() will generate a random floating point number that is greater than or equal to 0 but less than 1. After you multiply that number by your upper-bound, floor() gets rid of everything after the decimal point.
Hope that helps someone!