My next startup: BlueLine Game Studios!

I have two big changes to announce!

The first is that I’ve recently founded my next company: BlueLine Game Studios. It’s been a long time coming and I was really looking forward to the chance to work with Geoff Brown again. After the marathon in September – with all of the extra time from not training anymore – it only took a few days before my restlessness had me thinking over all of the things I’d been studying about Indie Gaming for the last several years.

It seems almost inevitable that I would start a gaming company someday, and it was a no-brainer that Geoff would be the first person to pull on board if I could get him. At this point, I just didn’t have any more excuses to delay it. I jumped in and contacted the developers of the best boardgames I could find (mostly Mensa Select winners) to discuss licensing. Much to my surprise, they were all very responsive, seemed eager to work together, and moved fast. Before I knew it, we had a licensing deal signed to bring one of the most awesome boardgames ever to Xbox 360!

Since then, we’ve signed a deal for another boardgame (to be announced soon ;)). These were seriously the #1 and #2 boardgames that I wanted to bring to Xbox 360!

Things have been going quite well. …and fast! Which brings me to my second big change:

Starting Dec. 12th, I’ll be cutting back to 3 days per week at Wikia. I’ve been working on LyricWiki since early 2006. A few months after Wikia acquired LyricWiki, they pulled me in to head up LyricWiki work again and help with development in general. It’s been a great couple of years in which I’ve met some amazing people, but it’s time for me to sow some other seeds as well. Furthermore, Wikia is in a great spot and can certainly survive without my full-time attention ;). Just this morning I noticed that according to Quantcast, Wikia is the 40th biggest site/group-of-sites on the internet (in terms of monthly US uniques). That’s awesome. When I joined, most people I met didn’t know the name Wikia. Two years later we’re starting to become a household name and we have more traffic than MySpace! It’s certainly been an exciting ride so far.

Better yet, since I’m not even leaving (just cutting back) I can do my part to continue to help Wikia grow to its next big milestones.

I’m looking forward to the adventures in store with this new arrangement and am happy to be back “in the startup arena”*.

If you think this sounds intriguing at all, please follow BlueLine Games on Twitter, Facebook, Google+, its blog, or sign up to be notified when Hive for Xbox 360 goes live!


* Wikia is so profitable & has so many employees that it’s hard to call it a startup. So much job security… ugh! ;)

How Google Steals from Publishers

I recently discovered a very shady practice in AdSense for Search where they make you believe you are earning revenue, then quietly at the end of the month they deduct a fee at payment-time equal to the amount you earned (ie: you net nothing).

It’s pretty hard to believe, to the point that you may think I’m confused. However, I’ve checked. I’ve double checked. I’ve uttered expletives, calmed myself down and checked a few more times. Here’s the money shot:
[Screenshot: AdSense sketchiness]

I checked back and they’ve been doing this for months. I checked the documentation that’s linked to from that “Fees” line, and long-story-short: it’s a hidden fee affecting “less than 1%” of users which they don’t warn you about nor give an explanation of how to avoid it. It’s also barely mentioned at all in their Terms & Conditions (section 11, second sentence) which, of course, protects them legally but is an extremely shady business practice even if they had explained it in detail in the T&C.

This post probably isn’t very eloquent at the moment. It’s hard to write intelligently with steam pouring out of your ears. So without further ado, here is a copy of the letter I sent to their Customer Support which has much more information.

Open Letter to Google AdSense

I’d like to express my extreme dissatisfaction with the hidden AdSense for Search fees. I feel that the way in which you handle these fees is disingenuous on several levels and jeopardizes your integrity from a publisher’s perspective.

There are several areas where I think you guys really dropped the ball:
– When we sign up for AdSense for Search there is no visible indication that there is even a possibility of being charged. Burying it deep in the Terms and Conditions isn’t sufficient when it is completely contrary to the ad-network model of publishers-use-your-network… publishers-get-paid.
– There was no warning at all while this was happening. I very regularly check my stats on AdSense for the day and the month and there wasn’t any indication that each day my revenue was in fact zero, not the number I was being shown. Even if users are aware of this policy, they won’t know that they’re not making money until after the month is over and you decide to take it away.
– There is no easy way to notice this discrepancy after the fact. Old monthly reports will STILL show fake revenue (“fake” because you may take away every cent of it), and they do not show the deduction. Even upon looking at the payment history, there is no indication on THAT page of the deduction unless you drill down to yet ANOTHER page.
– There was never a notification that this was happening even though your FAQ alludes to this being extremely rare (“less than 1% will be affected” in answer 9890), my notification settings are set to the highest level possible, and this program even had the nerve to send me an ad on “New and improved AdSense for search” so I could generate even more money for your program of which I would not see a cent.
– The documentation mentions that some magical equation may take our money at the end of the month but does not provide any transparency into 1) what the metrics were that caused this or 2) how to avoid having this happen again. For what it’s worth: I’ve looked back through my payment history and every cent of revenue from when I first started AdSense for Search back in September has been taken every month, so this isn’t a fluke. Something about my site is angering your algorithm.

I feel robbed. Worse than just tricked; I feel like you stole money (hundreds of dollars) that I had every reason to expect was earned and was going to be direct-deposited.

One of AdSense’s differentiators is that it’s simple and turn-key so that publishers can focus on creating a valuable website instead of dealing with ad sales or applying to ad networks. If publishers have to spend time poring over every available report to watch our backs for hidden fees, then your offering clearly falls short in providing that value.

Based on this lack of disclosure, I think your fees should be re-evaluated. I ask that you let me know what you think is fair.

*Shuts eyes & takes a deep breath*. Hokay, I’m back now. I’ll keep you updated when/if I hear back from them.

Must-have feature for web-apps: auto-update.

It’s been a pet-peeve of mine for some time now that creators of web-apps (like myself) haven’t been holding ourselves to the same standards as creators of desktop apps. One of the major justifications for calling our products “applications” (as opposed to just “really good web sites”) is that we’re claiming you get the same level of functionality as a comparable downloadable desktop version.

Currently this just isn’t the case. When was the last time you went through the following process to update a program on your desktop:

  1. Check the creator’s website frequently to see if there are updates
  2. If there is an update, go through a directory of a bunch of different versions and figure out which one you want, since “most-recent” and “stable” are almost always different versions
  3. Download the files
  4. Unzip the files into a separate folder
  5. Make backups of your existing installation if desired
  6. Copy the files from the new folder into your existing installation

Seriously? This may look absurd to most users (I hope it does), but this is exactly the status quo for updating web-apps. I don’t know of a single web-app with acceptable update abilities. The closest thing is the DreamHost one-click-installs/updates of applications, but that’s something they have to write themselves for each app and isn’t part of the applications themselves.

I recently found out that I’m not alone in desiring this: it is currently the most-requested idea on the WordPress Ideas site (which is like Motive Suggest for WordPress).

It’d certainly be complex to write an auto-updater since there are so many different systems the apps can be installed on. However, desktop installers were never easy to write either, and we’ve come to expect them. My suggestion is that web-apps have one setting that allows complete auto-updates (but this option is disabled by default). The more normal use-case would be that the app checks a webservice to figure out if it’s up to date. If not, a very visible “upgrade now” button would be shown to the administrators when they log in. This button could be highlighted yellow for normal updates and red for security-critical updates. For the most critical security updates, the app could actually email its administrators to notify them that there is a critical update, with a link to kick off the upgrade. Since the app itself sends this alert, the administrators don’t have to join anyone’s mailing list to get these alerts (so they maintain their privacy).
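To make that concrete, here’s a minimal sketch in PHP of what the version check could look like. The endpoint URL, file names, and markup are hypothetical placeholders, not a real service:

    <?php
    // Version of the locally installed app.
    define('APP_VERSION', '1.2.0');

    // Ask a (hypothetical) webservice for the latest released version.
    function checkForUpdate() {
        $latest = trim(@file_get_contents('http://example.com/myapp/latest-version.txt'));
        if ($latest != '' && version_compare($latest, APP_VERSION, '>')) {
            return $latest; // an update is available
        }
        return false; // up to date (or the check failed silently)
    }

    // In the admin dashboard: show a very visible "upgrade now" button.
    if ($newVersion = checkForUpdate()) {
        echo '<a class="upgrade-button" href="admin/upgrade.php">'
           . "Upgrade now to version $newVersion</a>";
    }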

Just a thought. I’m anxious to release a downloadable framework now just so I can include updating and hopefully start a trend (OffhandWay, MotiveSuggest, SiloSync?). Also, if you see instances of this type of feature in the wild, please let me know!

SiloSync Survey Results

The overarching theme of the SiloSync Survey was that people really want to back up their data from Facebook!

Since respondents ranked their answers in order of preference and every question allowed multiple answers, I made a simple scoring scheme: I assigned 5 points for the first answer, 4 for the second, etc.. The 5th answer and beyond were all worth 1 point.
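In code form, that scoring scheme is trivial (a PHP sketch of my assumption about its exact shape):

    <?php
    // Rank 1 => 5 points, rank 2 => 4, ..., rank 5 and beyond => 1 point.
    function pointsForRank($rank) {
        return max(6 - $rank, 1);
    }

    // Example: one respondent who ranked four answers (1st, 2nd, 3rd, 6th).
    foreach (array(1, 2, 3, 6) as $rank) {
        echo "Rank $rank earns " . pointsForRank($rank) . " point(s)\n";
    }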

The results:

What would you use SiloSync for?

  1. 24 points – Backing up data
  2. 13 points – Leaving a social network if it gets shady
  3. 8 points – Keeping profiles on multiple social networks in sync
  4. 4 points – Quickly joining new social networks w/o re-setting up

Those results were amusing since the whole mainstream-blogosphere data-portability argument seems to be focused on the lowest-ranking result. Basically, small startups want to take away Facebook’s edge so they’re pushing that one the hardest (I don’t blame them). It’s good to have some actual input from users so I can make SiloSync the best it can be!

Onto which social networks the respondents were interested in:

Which sites would you like to be able to grab data FROM?

  1. 27 points – Facebook
  2. 11 points – WordPress
  3. 7 points – LinkedIn
  4. 6 points – Blogger
  5. 5 points – GMail & Flickr
  6. 3 points – LiveJournal, Voicemail
  7. 1 point – Twitter

No surprise that Facebook dominated that one. MySpace got served: no one even mentioned it. MySpace also got ignored in the next section on exporting data. However, keep in mind that this section isn’t too important since the people surveyed indicated that they are much more interested in getting their data out than sending it other places.

Where would you like to be able to export your data TO?

  1. 17 points – LinkedIn
  2. 11 points – Facebook
  3. 8 points – Twitter & SiloXML (ie: an open XML format)
  4. 5 points – Gmail
  5. 4 points – Pedlr
  6. 3 points – WordPress
  7. 2 points – Blogger
  8. 1 point – LiveJournal

There you have it! Everyone hearts Facebook a whole-bunch but doesn’t trust them one iota with their personal data. It’s a good thing too because they’re riddled with security flaws (but that’s for another post) and pull stuff like the NewsFeed & Beacon.

Big thanks to everyone who took the survey for me. It helps a ton when approaching a project this large to know which parts are desired the most.

… guess I’d better get to work on SiloSync now, huh?

Pitt talk was fun

The talk I gave this week on SiloSync at Pitt was a lot of fun. Their Lunch-and-Learn series is a really cool idea, and it sounds like it’s getting even more interesting: next month’s talk is going to be given by a VP from Sun Microsystems. Prior to presenting, I jumped back into the SiloSync code and wrote the beginnings of the importer for Facebook.

As a side-note: one of the things that’s fascinating about this project is that I get to see all of the half-implemented security that different sites use. LiveJournal had a secure way of sending passwords, but shockingly stores passwords as plain-text (a big security faux-pas). Similarly, I saw some left-over fields in Facebook’s login form, but it appears that they just punted and used https (a secure web connection using SSL encryption) to encrypt the whole login.

Back to the crux of this post: I’ve been rather tempted lately to actually finish SiloSync – which I had previously shelved in hopes that Open Social and other big-name initiatives would fix the problem (they didn’t). Google, Facebook, and MySpace have all announced fake data portability initiatives in the last week or so, which shows that if we want our data to be free, we’re going to have to take it (see my previous post on freeing the social graph for why this is important).

I decided it would be best to make a habit of posting my slide-decks when I present (I appreciate it when other people do that), so here are the PowerPoint and Open Document (Open Office) versions. In the process of making the presentation, I ended up creating a visual representation of SiloSync which I think does a great job of summing up the whole idea for someone who hasn’t been exposed to it yet. That’s the picture above and to the right… click it to see the full-size version.

Interestingly, with these effectively useless announcements from the major Social Networks, a lot of non-technical press has been declaring that data is now free. Okay, cool, let’s all go home.

Fortunately, most of the technical press is calling them on it. Everyone from TechCrunch to David Recordon (of OpenId fame) is telling it like it is.

If you are interested in seeing SiloSync pushed to fruition (more than you’re interested in seeing Motive Suggest or doItLater v2.0), let me know so that I can weigh the interest between the several projects competing for my time. Also, feel free to leave comments with your thoughts on the various “fake” data portability initiatives. This seems to be the topic that always gets the most vocal response on my blog.

Speaking at University of Pittsburgh, May 14th.

I’ll be speaking at the University Of Pittsburgh’s School of Pharmacy (in 810B) for a “Lunch and Learn” on May 14th. The talk will be on SiloSync (which will need to be updated quite a bit before then) and will probably go into a more general discussion of Social Networking and Freeing the Social Graph during Q&A.

From what I understand, the Lunch and Learn series is mostly attended by faculty and staff, but we’ll see. The last talk was by Jesse Schell of Schell Games, so I guess I’m in good company!

Thanks for inviting me, Pitt!

New kind of spam: Invite-Spam

I’ve noticed over the past week or so that the spammers have a new trick up their sleeves. Within the last week I’ve gotten invites to iLike.com, IMVU, and myYearbook from people I don’t know, sent to an email address that I don’t really use (it’s forwarded to the same place as all the rest, but I don’t give it to anyone).

Bot-nets

I’ve had to fight spammers quite a bit on LyricWiki.org, and I’m beginning to understand a little bit more about why things work the way they work. As far as I can tell, the state-of-the-art in spamming is that tech-criminals build up bot-nets and then sell them as spamming machines. They use the zombies to attack popular technology in ways that use other people’s web-servers to send out spam. This way, they can use the reputation of these servers to ensure higher delivery-rates, and they can count on the people running the servers to try to keep their reputation with spam-filters as high as possible.

For a little more background for the uninformed: a bot-net is a vast array of hacked computers (zombies) that can be controlled remotely. Basically, these are just everyday people’s machines which have been infected while their owners are none-the-wiser. Years ago, when your computer got infected, you generally got viruses that caused a ton of popups, and eventually you sought help to remove them. But with today’s bot-nets, the infected user generally has no knowledge of the problem and therefore doesn’t clean off their computer. When the bot-herder (who runs the bot-net) wants to do something, they use the Trojan horses they’ve installed to silently send each computer instructions on what to do.

For instance, I run MediaWiki on LyricWiki.org, and many bots have been trained to vandalize pages with random letters (I’m assuming it’s random… it might actually be a tracking-code) which they later come back and check for. If the wiki is not well-patrolled, they come back and spam these pages with links. This way, they don’t have to reveal what product they are promoting unless they know it’s a small wiki, potentially with low resources – this prevents them from being tracked down by huge companies and reported to authorities. An added bonus of the bot-net approach is that each computer has a different IP address, so it’s hard to block all of them.
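Purely for illustration, here’s roughly what a crude check for those probe edits could look like (a hypothetical PHP sketch of my own – real anti-vandalism tooling needs to be much smarter than this):

    <?php
    // Flag an edit whose entire new content is one long run of letters with
    // no spaces or punctuation -- the "random letters" probe described above.
    function looksLikeProbeEdit($text) {
        $trimmed = trim($text);
        return strlen($trimmed) >= 20 && preg_match('/^[a-zA-Z]+$/', $trimmed);
    }

    var_dump(looksLikeProbeEdit('xkqjwhdkqjhwdkjqhwkdjqh'));     // bool(true)
    var_dump(looksLikeProbeEdit('Corrected the second verse.')); // bool(false)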

Invite-Spam

In this new flavor of spam, it appears bot-nets are signing up for profiles at social networking sites, and sending out invites to victims. This is a great way to use other sites’ reputable servers to send out spam that is highly likely to get delivered and also to make it through contextual spam filtering (since they look like any other invite).

This creates an interesting conflict for the sites whose servers are being used to send the spam: on the one hand, these bots are out promoting them for free, getting new users to sign up out of curiosity (“Do I know this person? The name sounds vaguely familiar…”). On the other hand, these are ill-gotten users, and the spam that’s being sent out probably moves their servers onto more and more blacklists. Both options are a mixed bag, and in the end I feel that it’s always best in business to do the right thing without immolating yourself. You didn’t earn these new users, so just take a stand and try to solve the spamming issue if you can. Aye, there’s the rub: often, a startup’s scarcest asset is time. How much time should a company devote to trying to fix a problem like this? They could be out promoting their site, adding new features, or fixing bugs. They’re always understaffed, and there is always more work to be done.

This is a hard problem to deal with since you’re either protecting strangers from a bunch of spam that’s coming from your servers (which you really had nothing to do with), or you’re adding features for your users. A tough call to make. Hopefully some of these companies can co-operate to come up with a technical solution that they can share amongst each other to make it practicable for them all to implement it. The three companies whose servers spammed me aren’t even direct competitors – one is chat (IMVU), one is youth social-networking (myYearbook), and one is music-focused (iLike).

I’ve emailed a friend at one of the companies and explained the situation. It will be interesting to see how they respond.

PS: Please don’t comment about just adding a CAPTCHA. Those things are horribly useless against talented programmers and have an inherent “economic” flaw. I’ll probably write more about it later, but to put it simply, every time I see a site using “ReCAPTCHA” in a place where they should have actual decent Turing-test security, I cringe. It doesn’t do that!

Open Letter to LiveJournal – Please protect my password :(

Dear LiveJournal,

It would appear that you are storing an md5 hash of each user’s password in your database. Although I certainly could be wrong, I have reason to believe this is your method (see below), and it worries me greatly. I am concerned that my password and the passwords of all other LiveJournal users are highly vulnerable to attack. In this day and age, this method is almost no different than simply storing my password in plaintext.

To reiterate what many of you probably know, the original purpose of storing an MD5 hash over plaintext is that the passwords would ideally be unrecoverable even in the event that an attacker was able to obtain a copy of your database. This security is needed because such attacks do happen successfully even to companies that take network security seriously.

With advances in hash-table attacks (eg: “Rainbow Crack”), it is conceivable that any attacker capable of obtaining your database would have no trouble whatsoever converting all of these hashes to their original passwords in a short amount of time with even very basic computing equipment.

I appreciate your efforts to not send passwords in cleartext even on non-encrypted connections. This is above and beyond the usual call of duty; however, the storage method is antiquated and no longer safe.

It would be rude for me to bring up the problem and just leave you hanging, so I will humbly make a recommendation: for each user, generate a random salt (with mt_rand(), not rand()) and then hash the salt and password together using a more secure method of hashing such as bcrypt. I will include references that explain the reasons for each of these choices.
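To make that concrete, here is a rough sketch of the recommendation in PHP (the function names are my own placeholders, just for illustration):

    <?php
    // Generate a random salt from bcrypt's alphabet using mt_rand() (not rand()).
    function generateSalt($length = 22) {
        $alphabet = './ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
        $salt = '';
        for ($i = 0; $i < $length; $i++) {
            $salt .= $alphabet[mt_rand(0, strlen($alphabet) - 1)];
        }
        return $salt;
    }

    // '$2a$' selects Blowfish (bcrypt) in crypt(); '10' is the cost factor.
    function hashPassword($password) {
        return crypt($password, '$2a$10$' . generateSalt());
    }

    // The full stored hash doubles as the salt when re-checking a login.
    function verifyPassword($password, $storedHash) {
        return crypt($password, $storedHash) === $storedHash;
    }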

LiveJournal has always been very proactive in adopting or even creating new technology to take care of serious issues like openness, scalability, and even security. I realize that many other large sites may be guilty of this oversight also, but that doesn’t make your users any safer. Please address this issue as soon as is healthily possible. I – and I’m sure, others – would be more than willing to provide more info if that would help you make the conversion even faster.

If I was wrong about how you are storing passwords, please correct me so that I can clear the air (and apologize profusely).

Thank you for your time,
– Sean Colombo

PS: Thanks for memcached, it makes running my own sites much more cost-effective.

EXTRA INFO:

  • Everyone is doing it… but they’re doing it wrong.
  • Since LJ uses PHP, please generate the random salts with mt_rand() instead of rand(). I don’t mean to patronize you if you already know about mt_rand, I’m just trying to cover all bases here.
  • More info than you’d ever want to know about securely storing passwords
  • A really solid implementation of using Rainbow Tables to crack md5 hashes: Ophcrack.
  • For the curious: my indication that the passwords are stored as a simple md5-hash comes from the code used to encrypt the password before sending it to the LiveJournal login code. It’s extremely nice that they do this, but its aim is to protect against someone sniffing packets to find your password. At the same time, a site like LiveJournal has a nice juicy database full of millions of tasty passwords… enough to entice an attacker to steal the whole thing and steal millions of identities instead of victimizing individual users, thus creating a much bigger problem.

    LJ sends out a ‘challenge’ with the login form. This challenge (chal) is combined with your password (pass) as follows before being sent (in ‘res’) to LiveJournal:
    var res = MD5(chal + MD5(pass));
    What this implies is that LJ has an md5 hash of each user’s password, which they then combine with the challenge they sent you and compare against your response. This is a good zero-knowledge proof that you know your password (or at least its md5 hash). This “extra security”, while well-intentioned, actually means that an attacker could log into your LiveJournal account using your hash even before cracking it… but this is a very small problem, since the main reason we care about the way a password is stored is that you probably also use your password for other (possibly sensitive) accounts (such as your online-banking/PayPal/etc.).
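    For clarity, here’s a sketch (in PHP just for illustration; the function name is my own placeholder, not LJ’s actual code) of the server-side check that snippet implies:

    // Because the server only needs md5($pass) to validate a login response,
    // the stored hash is itself a password-equivalent for this protocol.
    function verifyChallengeResponse($challenge, $response, $storedMd5OfPassword) {
        return md5($challenge . $storedMd5OfPassword) === $response;
    }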

UPDATE: I thought I’d wait to get some verification that I was right that they store the passwords like this before bugging LJ, but I’d want someone to report things like this to me ASAP if one of my sites had a problem… so I sent this along to them now (as a support ticket on LiveJournal). I’ll update if they get back to me.

UPDATE: Remembered LiveJournal is open source… started browsing the code. Found out it’s perl, not PHP (oops).

UPDATE: This keeps getting worse. It turns out they store the passwords as plaintext! See the comments below for more info.

Freedom! – Opening the Social Graph and all of its data

[Image: Braveheart battle-cry]

There has been tons of buzz lately over the “Social Graph”: an atrocious misnomer (I won’t get into why) which is used by Mark Zuckerberg to mean “the data which represents all of a user’s social connections”. Facebook is getting a $10 billion to $15 billion valuation because they “own” this graph, and the entire world of developers is supposed to be forced to bow and write all future social web-applications as Facebook apps.

While I would still consider Facebook a decent investment at this point because they have this data locked down, I cannot support this tyranny. It is not only intuitive, but now also the general internet consensus, that users own their own data.

So what on earth are we to do? Free the data! Brad Fitzpatrick of LiveJournal/OpenID fame and David Recordon, who worked with Brad on OpenID, stirred up this whole movement in Brad’s widely cited braindump. They laid the groundwork for a decentralized set of tools to use microformats and clever spidering to figure out a user’s ownership of several accounts from a single place and calculate all of their friendships to find missing links. A missing link would be, for example, if you have someone in your Gmail contact-list and as a Facebook friend, but you don’t follow their Twitter account.
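As a toy sketch of that missing-links computation (all names here are made-up placeholders; PHP just for illustration):

    <?php
    // The same person's connections on several networks.
    $connections = array(
        'gmail'    => array('alice', 'bob', 'carol'),
        'facebook' => array('alice', 'bob'),
        'twitter'  => array('alice'),
    );

    // Anyone you're connected to somewhere is a candidate everywhere else.
    $everyone = array_unique(call_user_func_array('array_merge', array_values($connections)));
    foreach ($connections as $network => $friends) {
        $missing = array_diff($everyone, $friends);
        if (count($missing) > 0) {
            echo "Missing links on $network: " . implode(', ', $missing) . "\n";
        }
    }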

Subsequently, both of these hackers have built code and convinced their companies to open their data and have made announcements to that effect – Brad at Google and David at Six Apart.

I’ve been involved in the conversation a bit, and as I’ve mentioned before, I think that not just friendships, but other data is an equally important part of a user’s data, and they need to own that too.

Right now, the users’ data is spread throughout many silos: their photos in Flickr, their blog posts on WordPress, etc.. This is a major limitation and is starting to get on people’s nerves. As of right now, there is no übersilo where a user can sync up their info and social connections.

The solution? A commercial, but completely open, site which lets a user aggregate all of their friendship data AND all of their other data (photos, videos, blog posts, tweets, bookmarks, etc.). This data can then be pushed to other networks on demand by the user. Furthermore, the user can export all of this data in a standard format and just up and leave the site if they don’t like how they’re being treated. Beyond that, new social applications will be able to implement an API that can pull the user’s data down for them (only with their permission, of course).

Side note: I bounced this idea off of Brad Fitzpatrick who said I should “go for it”… there really is no conflict of interest in being a commercial site in an open endeavor.

This solution would have to exhibit several traits:

  • No compliance required – to be useful, this tool has to work with the most popular networks, even before they explicitly open their data through APIs. Since users are accessing their own data, this doesn’t violate ethics or terms of service… it just takes more code to accomplish this.
  • Extensibility – it has to be easy to add an arbitrary amount of new networks even if the site doesn’t have any idea what these networks are. Likewise, it has to be equally easy to add new types of data. For instance, tweets were a new concept… the system has to be able to sync up with entirely new types of data seamlessly.
  • Portability – it’s the problem we’re here to solve, so obviously this tool can’t lock down the data. It has to go to absurd lengths to make sure the data can be moved around easily.
  • Clarity – everyday users don’t know what all this “social graph”, “XFN”, “FOAF”, “microformat” talk is. The tool has to be extremely easy to comprehend for all users, not just über-geeks and technocrats.
  • Privacy & Control – the user has to be the one in control of the data. Not the tool… not the social networks accessing this übersilo… the user. They have to control what goes where, and they need to be able to easily control how this data will be accessed on other sites.

Sounds pretty sweet, huh? Well I’m not one to sit back and watch an important fight from the sidelines… I’m going to have to do something about this.

Google… the new Microsoft? (response)

In response to a fellow coder/entrepreneur’s article arguing that Google is the New Microsoft, I wrote a comment that I later realized should be a post… here is my reply:

I feel your pain… but keep your chin up!

I agree that Google is flattening competition in a very Microsoft-esque way. Their honeymoon should have been over after they started ignoring customers as a policy, or maybe when they refused to tell publishers what percentage share they would get on AdSense. If everyday people were exposed to Google Labs, they may also have noticed that the number and scope of projects Google is undertaking leaves very little space for anyone else.

On the brighter side… in a keynote that David Koretz (CEO of BlueTie) gave at RIT, he pointed out that in all areas other than search, Google is number 2. It doesn’t matter whether it’s calendars, video, picture-sharing, etc… they crush 90-some percent of the businesses, but someone still beats them. This shows that if you can get to a market before Google does, there is still some (although not much) hope for you.

Even with this encouraging news, that’s not going to help you get funding. I’m also trying to raise VC for a web-application, and although my site is in an area that is miles from anything Google has ever touched (because “Sergey doesn’t like music”), I always hear the same question: “How are you going to stop Google from coming in and beating you?” I don’t know what they expect to hear in this scenario, because they already have Google built up in their minds as the brightest and fastest coders on the planet (far from true).

Fortunately, Google gives us one parting gift… hype. Their IPO, and then their large buyout of YouTube, paint quite a picture in the average investor’s mind (talking about laypeople here, not VCs) that The Bubble is Back.

Good luck to you!

For more information about Google, I recommend reading The Search by John Battelle.