Jan/101
When to release a software project
This is something I’ve thought about for a long time and I think I’ve finally come up with a succinct answer: as soon as possible with the minimum set of features that still lets the project accomplish its core goals.
Before we get into this full-steam, there are obviously exceptions such as big-title video games where the consumer expectation is to be delivered a single piece of completed software that never needs an update. This post is aimed primarily at web-apps but will also apply to most other applications (including indie video games).
Who cares?
This may seem like a very minor issue or a personal preference, but I really don’t think it is. As a developer or as someone who knows developers, you probably realize that the vast majority of software never gets finished. Furthermore, too much of it is poorly written. When you choose to release a product and how you get there are both major factors in addressing these concerns.
My recommendations
Bullets are fast, let’s use them!
- Figure out what makes your product concept compelling to users. If you can write this in one sentence, that would be good. Here’s an example you may recognize: LyricWiki is a free site which is a source where anyone can go to get reliable lyrics for any song, from any artist, without being hammered by invasive ads. Even if you don’t word it as a sentence (which you really should figure out at some point*) for now, just make sure you have the main features: {free, reliable lyrics (wiki-editable), good coverage, no invasive ads}.
- Even if you want your product to have a bunch of tangential-features which you think will make it awesome, make a list of only the features that are mandatory to meet your project’s core goals.
- When you’re writing a feature, write it right. While I think you should cut back on what you implement, my experience has been that you almost never get to go back and really polish features as much as you’d like to after-the-fact. Do it right the first time. Also, if you ever do get the chance to go back you won’t remember the code quite as well as when you’re writing it the first time.
- Release it! As soon as you hit your minimum goals, don’t hesitate… put it live! If you have friendly users asking you for fixes and new features, that will push you to continue. You probably created the project because you wanted to make something that would be used… so people actually using the product and wanting to use it more is one of the best forcing-functions to get you to keep working. It will be an even better motivator if you actually use your product yourself because you will quickly start to yearn for new features or bugfixes.
*: A full sentence like this should guide your decisions as the project grows to keep it from bloating. Also, when people social-bookmark, blog about, or tweet your site they will quite frequently just paste this sentence. Having it as the first and most prominent sentence helps. Also, people are going to ask you “what is your site?” in loud, crowded rooms dozens of times. If your project is successful, you will literally have to describe your project hundreds of times. My current loud-environment elevator-pitch for LyricWiki is: “it’s like Wikipedia for song lyrics… called LyricWiki.” This evolved primarily because apparently people couldn’t hear my enunciation of “LyricWiki” in a loud room unless I had pre-prepped their brains for both lyrics and wikis.
Jul/090
Quick Tip: Do huge MySQL queries in batches when using PHP
When using PHP to make MySQL queries, it is significantly better for performance to break one extremely large query into smaller queries. In my testing, there was a query which returned 1 million rows and took 19,275 seconds (about 5 hours, 21 minutes) to traverse the results. By breaking that query up into a series of queries that had about 10,000 results, the total time dropped to 152 seconds… yeah… less than 3 minutes.
While MySQL provides LIMIT and/or OFFSET functionality to do batching, if you have numeric ids on the table(s) you’re querying, I’d recommend using your own system for offsets (code example below) rather than the default MySQL functionality since the hand-rolled method is much faster. See the table in the next section for performance comparisons.
Timing table
I’ll provide some example code below to show you how I did the batching (based on potentially sparse, unique, numeric ids). To start, here is a table of query-result-size vs. the total runtime for my particular loop. All timings are for traversing approximately 1,000,000 rows.
| Query Batch Size | Handrolled method | MySQL “LIMIT” syntax |
|---|---|---|
| 1,000,000+ | 19,275 seconds | 19,275 seconds |
| 10,000 | 152 seconds | 1,759 seconds |
| 5,000 | 102 seconds | 1,828 seconds |
| 1,000 | 43 seconds | ? |
| 750 | 40 seconds | ? |
At the end, it was pretty clear that no more data was needed to continue to demonstrate that the LIMIT method was slow. Each one of those runs was taking about half an hour and about halfway through the 1,000 row test for the LIMIT method, it started causing the database to be backed up. Since this was on a live production system, I decided to stop before it caused any lag for users.
Example Code
This code is an example of querying for all of the pages in a MediaWiki database. I used similar code to this to make a list of all of the pages (and redirects) in LyricWiki. In the code, you’ll notice that the manual way I do the offsets based on id instead of using the MySQL “LIMIT” syntax doesn’t guarantee that each batch is the same size since ids might be sparse (i.e., some may be missing if rows were deleted). That’s completely fine in this case and there is a significant performance boost from using this method. This test code just writes out a list of all of the “real” pages in a wiki (where “real” means that they are not redirects and they are in the main namespace as opposed to Talk pages, Help pages, Categories, etc.).
<?php
$QUERY_BATCH_SIZE = 10000;
$titleFilenamePrefix = "wiki_pageTitles";
// Configure these database settings to use this example code.
$db_host = "localhost";
$db_name = "";
$db_user = "";
$db_pass = "";
$db = mysql_connect($db_host, $db_user, $db_pass);
mysql_select_db($db_name, $db);
$TITLE_FILE = fopen($titleFilenamePrefix."_".date("Ymd").".txt", "w");
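$offset = 0;
$startTime = time();
// Find the highest id up front so that a gap in the ids wider than one batch doesn't end the loop early.
$maxResult = mysql_query("SELECT MAX(page_id) FROM wiki_page", $db);
$maxId = ($maxResult ? (int)mysql_result($maxResult, 0, 0) : 0);
while($offset <= $maxId){
    // Half-open range [offset, offset + batch size) so that ids on a batch boundary are not skipped.
    $queryString = "SELECT page_title, page_is_redirect FROM wiki_page WHERE page_namespace=0 AND page_id >= $offset AND page_id < ".($offset+$QUERY_BATCH_SIZE);
    if($result = mysql_query($queryString, $db)){
        $numRows = mysql_num_rows($result);
        for($cnt=0; $cnt < $numRows; $cnt++){
            $title = mysql_result($result, $cnt, "page_title");
            $isRedirString = mysql_result($result, $cnt, "page_is_redirect");
            $isRedirect = ($isRedirString != "0");
            if(!$isRedirect){
                fwrite($TITLE_FILE, "$title\n");
            }
        }
        // Only free the result when the query actually returned one.
        mysql_free_result($result);
    } else {
        // Stop on a failed query rather than looping forever.
        print "Query failed: ".mysql_error($db)."\n";
        break;
    }
    $offset += $QUERY_BATCH_SIZE;
    print "\tDone with page ids below $offset. \n";
}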
$endTime = time();
print "Total time to cache results: ".($endTime - $startTime)." seconds.\n";
fclose($TITLE_FILE);
?>
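For readers who want to try the id-range idea without a MediaWiki database, here is a minimal sketch in Python. It uses an in-memory SQLite table as a stand-in for MySQL, and the function name `iter_rows_in_id_batches`, the tiny batch size, and the sample ids are all made up for illustration. It does not reproduce the timings above; it only shows that walking half-open id ranges visits every row exactly once, even with sparse ids.

```python
import sqlite3


def iter_rows_in_id_batches(conn, batch_size):
    """Yield (page_id, page_title) rows by walking the id space in fixed-width ranges."""
    max_id = conn.execute("SELECT MAX(page_id) FROM wiki_page").fetchone()[0]
    if max_id is None:  # empty table: nothing to walk
        return
    offset = 0
    while offset <= max_id:
        # Half-open range [offset, offset + batch_size): a boundary id lands in exactly one batch.
        rows = conn.execute(
            "SELECT page_id, page_title FROM wiki_page "
            "WHERE page_id >= ? AND page_id < ? ORDER BY page_id",
            (offset, offset + batch_size),
        ).fetchall()
        yield from rows
        offset += batch_size


def demo():
    # Hypothetical data: SQLite stands in for MySQL, and the ids are invented.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE wiki_page (page_id INTEGER PRIMARY KEY, page_title TEXT)")
    # Sparse ids: one sits on a batch boundary (10) and there is a gap wider than a batch (11..39).
    ids = [1, 2, 9, 10, 40, 41]
    conn.executemany("INSERT INTO wiki_page VALUES (?, ?)", [(i, f"Page_{i}") for i in ids])
    return [row[0] for row in iter_rows_in_id_batches(conn, batch_size=10)]


print(demo())
```

Some batches come back empty or short, which is the trade-off mentioned earlier: batch sizes are not guaranteed, but each query is a cheap range scan on the primary key.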
Hope that helps!
Mar/090
Old, quick app – “Runner’s Calculator”
Thanks to Google Alerts, I ran into an old web-application I made. It was a simple Runner’s Calculator and it was one of my first little standalone PHP programs. It’s literally from the summer of 2004 (ancient history!) before the term “AJAX” was even coined, so you actually have to press a submit button to get the results (the horror!). Old programs are funny.
I had thrown it together while prepping for a 5k that my co-op company was running in that summer. Given the recent upturn in the weather, I figured it’d be appropriate to dust it off so it could get some use on the interwebs:
My personal favorite use is the “Generate Time Table” feature… plop in how far you’re going to be racing and approximate upper and lower bounds on how long you think you might take & it will tell you the splits for a bunch of different times in that range.
Enjoy! :)
Dec/080
“What gets measured gets managed”
That quote is a relatively common business adage which is attributed to a ton of different people – so many that I’ll just consider it public domain. The reason for this post is just that I’d like to draw attention to it and maybe as you read this post, you can think of how this can be used to help your own productivity and success.
As with any business advice, this tends to be more digestible with some anecdotes so I’ll give a couple of examples of how this has helped me lately with running LyricWiki.org.
Example Uno: Server Uptime
Without a doubt, the biggest problem with LyricWiki up until recently had been uptime. The site was constantly slow or completely unavailable from its inception. The reason is that we were always short on servers since the company had minimal capital, and setting up new servers took a great deal of time and fell on my shoulders during a time in my life where I always had at least one fulltime responsibility other than LyricWiki.
When I cut back my other job a bit to give me more time to devote to LyricWiki, one of the first things I set out to fix was the reliability of the site. In order to know if it was improving, I would need to know how long the site was down each month so I could track whether the number of minutes was going up or down.
I created a small spreadsheet to track outages, and each time the site went down I logged when the outage was, the duration (in minutes) of the outage, which server(s) had the problems, the apparent cause and whether or not the cause was resolved. Much to my surprise, I didn’t really need to graph this over the months. After the first couple of days, it became very apparent that there was one huge problem still lingering and it would be worth my time to automate a fix to it instead of responding ad-hoc each time the problem cropped up.
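To make the spreadsheet idea concrete, here is a rough sketch in Python of the kind of tally such a log allows. The rows below are entirely hypothetical (they are not LyricWiki's real outages), and the function names are made up; the columns simply mirror the ones listed above.

```python
from collections import defaultdict

# Hypothetical log rows: (date, minutes down, server, apparent cause, resolved?)
OUTAGES = [
    ("2008-11-03", 42, "server-a", "hypothetical cause A", False),
    ("2008-11-04", 35, "server-a", "hypothetical cause A", False),
    ("2008-11-09", 5, "server-b", "hypothetical cause B", True),
    ("2008-12-01", 12, "server-a", "hypothetical cause A", False),
]


def minutes_by_month(outages):
    """Total downtime per calendar month, keyed by 'YYYY-MM'."""
    totals = defaultdict(int)
    for date, minutes, _server, _cause, _resolved in outages:
        totals[date[:7]] += minutes
    return dict(totals)


def minutes_by_cause(outages):
    """Total downtime per apparent cause, largest total first."""
    totals = defaultdict(int)
    for _date, minutes, _server, cause, _resolved in outages:
        totals[cause] += minutes
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


print(minutes_by_month(OUTAGES))
print(minutes_by_cause(OUTAGES))
```

Sorting by cause is what surfaces a single dominant problem quickly, which is the effect described above: a few days of rows can be enough to show where the time is going.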
I whipped up the code to solve the problem while I was on a layover in Philadelphia, uploaded it when I got home – and the site stayed up for almost a month straight! That’s pretty huge. I don’t think that had ever happened in LyricWiki history until just now. Very cool.
Example Dos: Intractable To-do Lists
A major personal time-management problem I had recently was that I couldn’t tell if I was just spinning my wheels or if I was actually making progress on my backlog of LyricWiki tasks. It felt as though I was getting emailed so much work that I barely ever got to break out into the tasks on my talk page let alone the “Mid-term Actions” list I made which was my conscious plan of how to make LyricWiki rock your socks off in the mid-term.
To solve this problem of not having a grasp on my tasks, I whipped up a little widget which you can see in the side-column of the blog, near the bottom. The results were much better than I expected: I actually instantly felt like I was in more control, I felt comfortable removing tasks that were duplicated across lists, and I can finally tell when I’m moving fast enough to make forward progress.
The widget tracks the size of all of my various to-do lists and updates that number hourly. There are two reasons for the caching: 1) checking the lists requires hitting the sites and that takes at least 3 seconds total 2) if it updated in real-time, I’d sit there refreshing the thing all day!
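The hourly caching can be sketched in a few lines of Python. This is not the widget's source, just a minimal illustration of the idea, and the names `HourlyCache`, `fetch` and `clock` are assumptions made for the example.

```python
import time


class HourlyCache:
    """Remember the result of a slow fetch and only redo it once it is an hour old."""

    def __init__(self, fetch, max_age_seconds=3600, clock=time.time):
        self._fetch = fetch              # slow call, e.g. counting tasks on several sites
        self._max_age = max_age_seconds
        self._clock = clock              # injectable so the cache can be tested without waiting
        self._value = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._max_age:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value


# Usage with a stand-in for the slow "count my to-do items" call.
task_count = HourlyCache(fetch=lambda: 17)
print(task_count.get())  # first call fetches
print(task_count.get())  # second call is served from the cache
```

Refreshing the page inside the hour costs nothing and shows the same number, which covers both reasons given above for caching.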
The widget actually has some neat hidden features. Here is the to-do list widget on its own page which also tracks the number of tasks I’ve had at the end of each day and charts my progress (using the Google Charts API). If anyone is interested, maybe I’ll write another post where I give out the source-code to the widget.
Conveniently, the widget is narrow enough to both fit in a blog sidebar and be displayed on a smartphone. It’s currently the homepage on my BlackBerry and I don’t see that changing any time soon!
Conclusion
The quote “what gets measured gets managed” always seems to ring true for me. I think of it again and again – usually after-the-fact when I’ve just saved time by being OCD about something. I strongly recommend that you take a second (right now) and think of an area of your life or your work which you feel isn’t getting sufficient attention and consider tracking meaningful statistics about it. Please share any similar successful experiences in the comments!
Jul/082
How to restore WordPress categories
After upgrading to the latest version of WordPress, the text of all of the categories was deleted. It appears this may be my fault for not disabling all of the plugins before doing the upgrade (oops). They’re back now though. It was somewhat annoying to have to do, but as far as “data loss error”s go, it wasn’t very bad at all.
I noticed a bunch of other people online had similar problems but nobody mentioned a solution so I figured I might as well throw up a quick description of what I did.
I used Google to find a cached version of my page which listed all of the categories in the side-bar. Now I had all of the categories but I didn’t know which id in the database matched which category. I found that out by running this query:
SELECT post_title, term_taxonomy_id FROM wp_posts,wp_term_relationships WHERE wp_term_relationships.object_id=wp_posts.ID ORDER BY term_taxonomy_id;
Which shows what posts were assigned to each category. After reading through the list and figuring it out, I was able to fix the text by typing queries along the lines of:
UPDATE wp_terms SET name='Motive Force', slug='motive-force' WHERE term_id=8;
Since I brushed over it above:
To find a cached version of your site on Google, just search for “site:YOURDOMAINNAMEHERE.com”, then click on “Cached” next to any of the results. They clear those out after a while, so don’t put it off!
Jul/080
Quick Tip: Beep when a long process is complete over SSH
When you’re running a long command on a terminal over SSH, you may end up wasting a great deal of time checking back repeatedly to see if the process is complete. A quick alternative would be to make the shell beep when it’s complete. Assuming you are running a script called “longScript.sh”, then simply typing a line like:
longScript.sh; printf '\a'
Will cause most SSH clients to beep after longScript.sh is finished running.
Hope that helps!
Jun/081
Must-have feature for web-apps: auto-update.
It’s been a pet-peeve of mine for some time now that creators of web-apps (like myself) haven’t been holding ourselves to the same standards as creators of desktop apps. One of the major justifications for calling our products “applications” (as opposed to just “really good web sites”) is that we’re claiming you get the same level of functionality as a comparable downloadable desktop version.
Currently this just isn’t the case. When is the last time you went through the following process to update a program on your desktop:
- Check the creator’s website frequently to see if there are updates
- If there is an update, go through a directory of a bunch of different versions and figure out which one you want since “most-recent” and “stable” are almost always different versions
- Download the files
- Unzip the files into a separate folder
- Make backups of your existing installation if desired
- Copy the files from the new folder into your existing installation
Seriously? This may look absurd to most users (I hope it does), but this is exactly the status quo for updating web-apps. I don’t know of a single web-app with acceptable update abilities. The closest thing is the DreamHost one-click-installs/updates of applications, but that’s something they have to write themselves for each app and isn’t part of the applications themselves.
I recently found out that I’m not alone in desiring this: it is currently the most-requested idea on the WordPress Ideas site (which is like Motive Suggest for WordPress).
It’d certainly be complex to write an auto-updater since there are so many different systems the apps can be installed on. However, desktop installers were never easy to write either and we’ve come to expect them. My suggestion is that web-apps have one setting that allows complete auto-updates (but this option is disabled by default). The more normal use-case would be that the app checks a webservice to figure out if it’s up to date. If not, a very visible “upgrade now” button would be shown to the administrators when they log in. This button could be highlighted yellow for normal updates and red for security-critical updates. For extreme security-crucial updates, the app could actually email its administrators to notify them that there is a critical update, with a link to kick off the upgrade. Since each version of the app itself is sending this alert, the administrators don’t have to join anyone’s mailing list to get these alerts (so they maintain their privacy).
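As a rough sketch of the decision logic only (no real webservice is contacted), here is how the check might look in Python. Everything named here is hypothetical: the function names, the `security_level` values, and the assumption that the webservice reports a dotted version number plus a severity.

```python
def parse_version(text):
    """Turn a dotted version like '2.10.1' into a tuple so that 2.10 sorts after 2.9."""
    return tuple(int(part) for part in text.split("."))


def update_notice(installed, latest, security_level="none"):
    """Decide what to show the administrators.

    `latest` and `security_level` stand in for whatever the (hypothetical)
    update webservice would return. Returns None when the app is up to date.
    """
    if parse_version(latest) <= parse_version(installed):
        return None
    if security_level == "critical":
        # Extreme case: red button plus an email with a link to kick off the upgrade.
        return {"button": "red", "email_admins": True}
    if security_level == "security":
        return {"button": "red", "email_admins": False}
    return {"button": "yellow", "email_admins": False}


print(update_notice("2.5", "2.5"))
print(update_notice("2.5", "2.6"))
print(update_notice("2.5", "2.5.1", security_level="critical"))
```

Comparing versions as tuples of integers rather than as strings avoids the classic mistake of treating "2.10" as older than "2.9".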
Just a thought. I’m anxious to release a downloadable framework now just so I could include updating and hopefully start a trend (OffhandWay, MotiveSuggest, SiloSync?). Also, if you see instances of this type of feature in the wild, please let me know!
May/087
Quick 3-question SiloSync Survey
SiloSync is a sizable undertaking and there are a number of different potential places to start from. I want to make sure I have a decent idea of where the demand is, so I’ve put together a quick 3-question survey. Please take a minute to fill it out for me! (you can be anonymous if you’d like)
To answer, please just leave a comment. I’ll leave my own answers in a comment as an example.
Question 1: What would you be most anxious to use SiloSync for?
A. Syncing up data & friendships between profiles on different sites so that they are all up-to-date.
B. Backing up data (photos, etc.) & friendship connections so that they never get lost and are all in one place.
C. Changing services if one of them does something unacceptable (along the lines of the Facebook Beacon debacle).
D. Quickly joining new services w/o the trouble of re-finding everyone and re-typing everything.
For this, please just type all of the letters you are interested in from highest-to-lowest
Question 2: What services would you most like to be able to pull data into SiloSync from?
(examples: do you want to pull your data from Facebook, Flickr, LiveJournal, WordPress, Twitter, MySpace?)
Again, please type the most-desired first.
Question 3: What services are most important to export data to?
(examples: do you want to send-data-to/sync-data-with Facebook, LinkedIn, Twitter, or a bunch of new and exciting social networks we don’t know about yet?)
Hopefully that was quick! Thanks for taking some time to help me out :)
May/083
Quick Tip: Delete old log-files if you use MySQL replication
There was a bit of a mess over on LyricWiki off and on for a few days. The culprit was a known bug in MySQL which messes up master-slave replication if you run out of hard-disk space (which you will if you’re using master-slave replication).
The preventative solution is to set up a daily cron-job which will find out what log the slave is using and delete all of the binary log-files that are older than that file. The alternative is an immense pile of unneeded files which will eventually cause you to run out of space and completely break your replication. To give you some idea, we filled up 100gigs of log-files from LyricWiki (which has hundreds of times as many reads as writes) in about 2.5 months.
Hope that helps!
UPDATE: I just wrote this script and figured I’d release it publicly to save others some time. You can get the code from deleteOldBinLogs.txt (that’s just a .txt so you can view the code… save it as a “.pl” file). Once you’ve filled out the “configuration” part at the top and have uploaded the file to the “/root” directory on your Master database server, add this line to your crontab file (by typing “crontab -e” into the command line):
0 4 * * * perl /root/deleteOldBinLogs.pl
That will make the script run at 4am each morning.
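For anyone who wants to see the core logic without reading the Perl, here is a small Python sketch of the "which logs are safe to delete" step. It is not a port of deleteOldBinLogs.pl. It assumes a single slave, the usual `basename.NNNNNN` binary log naming, and that you have already read the file the slave is still working from (the `Relay_Master_Log_File` column of `SHOW SLAVE STATUS`); the function names and sample file names are made up.

```python
def log_number(name):
    """Return the numeric suffix of a binary log name, or None for other files (e.g. the .index file)."""
    suffix = name.rsplit(".", 1)[-1]
    return int(suffix) if suffix.isdigit() else None


def binlogs_to_delete(binlog_files, file_in_use):
    """List the binary logs that are strictly older than the one the slave still needs."""
    cutoff = log_number(file_in_use)
    older = [f for f in binlog_files if log_number(f) is not None and log_number(f) < cutoff]
    return sorted(older, key=log_number)


def purge_statement(file_in_use):
    """SQL to run on the master: removes every binary log before the named one."""
    return f"PURGE BINARY LOGS TO '{file_in_use}';"


files = ["mysql-bin.000003", "mysql-bin.000001", "mysql-bin.index",
         "mysql-bin.000002", "mysql-bin.000004"]
print(binlogs_to_delete(files, "mysql-bin.000003"))
print(purge_statement("mysql-bin.000003"))
```

Letting MySQL remove the files through `PURGE BINARY LOGS` keeps its log index in step with what is on disk. With more than one slave, the cutoff would have to be the oldest file any of them still needs.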
May/080
Pitt talk was fun
Pitt was a fun venue for the talk I gave this week on SiloSync. Their Lunch-and-Learn series is a really cool idea and sounds like it’s getting even more interesting. Next month’s talk is going to be done by a VP from Sun Microsystems. Prior to presenting, I jumped back into the SiloSync code and wrote the beginnings of the importer for Facebook.
As a side-note: one of the things that’s fascinating about this project is that I get to see all of the half-implemented security that different sites use. LiveJournal had a secure way of sending passwords, but shockingly stores passwords as plain-text (a big security faux-pas). Similarly, I saw some left-over fields in Facebook’s login form, but it appears that they just punted and used https (a secure web connection using SSL encryption) to just encrypt the whole login.
Back to the crux of this post: I’ve been rather tempted lately to actually finish SiloSync – which I had previously shelved in hopes that Open Social and other big-name initiatives would fix the problem (they didn’t). Google, Facebook, and MySpace have all announced fake data portability initiatives in the last week or so, which shows that if we want our data to be free, we’re going to have to take it (see my previous post on freeing the social graph for why this is important).
I decided it would be best to make a habit of posting my slide-decks when I present (I appreciate it when other people do that), so here are the PowerPoint and Open Document (Open Office) versions. In the process of making the presentation, I ended up creating a visual representation of SiloSync which I think does a great job of summing up the whole idea for someone who hasn’t been exposed to it yet. That’s the picture above and to the right… click it to see the full-size version.
Interestingly, with these effectively useless announcements from the major Social Networks, a lot of non-technical press has been declaring that data is now free. Okay, cool, let’s all go home.
Fortunately, most of the technical press is calling them on it. Everyone from TechCrunch to David Recordon (of OpenID fame) is telling it like it is.
If you are interested in seeing SiloSync pushed to fruition (more than you’re interested in seeing Motive Suggest or doItLater v2.0), let me know so that I can weigh the interest between the several projects competing for my time. Also, feel free to leave comments about your thoughts on the various “fake” data portability initiatives. This seems to be the topic which always gets the most vocal response on my blog.