Quick 3-question SiloSync Survey

SiloSync is a sizable undertaking, and there are a number of different places I could start. I want to make sure I have a decent idea of where the demand is online, so I’ve put together a quick 3-question survey. Please take a minute to fill it out for me! (You can be anonymous if you’d like.)

To answer, please just leave a comment. I’ll leave my own answers in a comment as an example.

Question 1: What would you be most eager to use SiloSync for?
A. Syncing up data & friendships between profiles on different sites so that they are all up-to-date.
B. Backing up data (photos, etc.) & friendship connections so that they never get lost and are all in one place.
C. Changing services if one of them does something unacceptable (along the lines of the Facebook Beacon debacle).
D. Quickly joining new services w/o the trouble of re-finding everyone and re-typing everything.

For this, please just type all of the letters you are interested in, ordered from highest to lowest priority.

Question 2: What services would you most like to be able to pull data into SiloSync from?
(examples: do you want to pull your data from Facebook, Flickr, LiveJournal, WordPress, Twitter, MySpace?)
Again, please type the most-desired first.

Question 3: What services are most important to export data to?
(examples: do you want to send-data-to/sync-data-with Facebook, LinkedIn, Twitter, or a bunch of new and exciting social networks we don’t know about yet?)

Hopefully that was quick! Thanks for taking some time to help me out :)

Quick Tip: Delete old log-files if you use MySQL replication

There was a bit of a mess over on LyricWiki, off and on, for a few days. The culprit was a known bug in MySQL which breaks master-slave replication if you run out of hard-disk space (which you eventually will if you’re using master-slave replication, since the master’s binary logs just keep piling up).

The preventative solution is to set up a daily cron-job which finds out which log the slave is using and deletes all of the binary log-files that are older than that file. Otherwise, you end up with an immense pile of unneeded files which will eventually cause you to run out of space and completely break your replication. To give you some idea, we filled up 100 GB of log-files from LyricWiki in about 2.5 months, and that’s a site with hundreds of times as many reads as writes (only writes end up in the binary log).
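The decision at the heart of that cron-job is small. Here’s a minimal Python sketch of just that step, assuming you’ve already read the `Relay_Master_Log_File` value from `SHOW SLAVE STATUS` on each slave (I’d pick that field over `Master_Log_File` to be conservative, since the slave’s SQL thread can lag behind its I/O thread). The helper names are hypothetical, and this is not the Perl script from the update below. It picks the oldest log any slave still needs and builds a `PURGE BINARY LOGS TO` statement, which tells MySQL to delete every binary log older than the named one while keeping the named log itself.

```python
import re

# Binary log names look like "mysql-bin.000123"; the numeric suffix gives their order.
# The strict pattern also guarantees no quote characters can reach the SQL string.
LOG_NAME = re.compile(r"^[A-Za-z0-9_.-]+\.(\d+)$")


def log_number(name: str) -> int:
    """Return the sequence number of a binary log file name."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected binary log name: {name!r}")
    return int(match.group(1))


def oldest_log_in_use(slave_log_files: list[str]) -> str:
    """Return the earliest binary log that any slave still needs."""
    if not slave_log_files:
        # With no slave positions known, purging anything would be a guess.
        raise ValueError("no slave positions given; refusing to purge")
    # Compare by number, not by string, so "mysql-bin.1000" sorts after "mysql-bin.999".
    return min(slave_log_files, key=log_number)


def purge_statement(slave_log_files: list[str]) -> str:
    """Build the SQL that deletes every binary log older than the oldest one in use."""
    oldest = oldest_log_in_use(slave_log_files)
    # PURGE BINARY LOGS TO removes logs listed before `oldest`, never `oldest` itself.
    return f"PURGE BINARY LOGS TO '{oldest}';"
```

For example, `purge_statement(["mysql-bin.000042", "mysql-bin.000040"])` returns `"PURGE BINARY LOGS TO 'mysql-bin.000040';"`. Taking the minimum across all slaves means a lagging slave never has a log deleted out from under it; actually running the statement on the master is left out here.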

Hope that helps!

UPDATE: I just wrote this script and figured I’d release it publicly to save others some time. You can get the code from deleteOldBinLogs.txt (that’s just a .txt so you can view the code… save it as a “.pl” file). Once you’ve filled out the “configuration” part at the top and have uploaded the file to the “/root” directory on your master database server, add this line to your crontab file (by typing “crontab -e” at the command line):
0 4 * * * perl /root/deleteOldBinLogs.pl
That will make the script run at 4am each morning.

Pitt talk was fun

Giving my talk on SiloSync at Pitt this week was a lot of fun. Their Lunch-and-Learn series is a really cool idea and sounds like it’s getting even more interesting: next month’s talk will be given by a VP from Sun Microsystems. Before presenting, I jumped back into the SiloSync code and wrote the beginnings of the importer for Facebook.

As a side-note: one of the fascinating things about this project is that I get to see all of the half-implemented security that different sites use. LiveJournal has a secure way of sending passwords, but shockingly stores them as plain text (a big security faux pas). Similarly, I saw some leftover fields in Facebook’s login form (apparently from a half-finished scheme), but it appears that they just punted and used HTTPS (a secure web connection using SSL encryption) to encrypt the whole login.
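For contrast, the usual alternative to storing plain-text passwords is to store only a salted, deliberately slow hash. Here’s a minimal sketch using Python’s standard `hashlib.pbkdf2_hmac`; the function names and the iteration count are illustrative assumptions, not anything LiveJournal or Facebook actually uses.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor; tune it for your own hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); store these instead of the password itself."""
    salt = os.urandom(16)  # a fresh random salt per user defeats precomputed tables
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Re-derive the hash from the login attempt and compare in constant time."""
    attempt = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(attempt, digest)
```

If a database laid out this way leaks, an attacker gets salts and digests, not passwords, and has to guess each user’s password separately at the cost of the slow hash.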

Back to the crux of this post: I’ve been rather tempted lately to actually finish SiloSync – which I had previously shelved in hopes that OpenSocial and other big-name initiatives would fix the problem (they didn’t). Google, Facebook, and MySpace have all announced fake data portability initiatives in the last week or so, which shows that if we want our data to be free, we’re going to have to take it (see my previous post on freeing the social graph for why this is important).

I’ve decided to make a habit of posting my slide-decks when I present (I appreciate it when other people do that), so here are the PowerPoint and Open Document (OpenOffice) versions. While making the presentation, I ended up creating a visual representation of SiloSync which I think does a great job of summing up the whole idea for someone who hasn’t been exposed to it yet. That’s the picture above and to the right.

Interestingly, after these effectively useless announcements from the major social networks, a lot of the non-technical press has been declaring that data is now free. Okay, cool, let’s all go home.

Fortunately, most of the technical press is calling them on it. Everyone from TechCrunch to David Recordon (of OpenID fame) is telling it like it is.

If you are interested in seeing SiloSync pushed to fruition (more than you’re interested in seeing Motive Suggest or doItLater v2.0), let me know so that I can weigh the interest across the several projects competing for my time. Also, feel free to leave comments with your thoughts on the various “fake” data portability initiatives. This seems to be the topic that always gets the most vocal response on my blog.

Speaking at University of Pittsburgh, May 14th.

I’ll be speaking at the University of Pittsburgh’s School of Pharmacy (in 810B) for a “Lunch and Learn” on May 14th. The talk will be on SiloSync (which will need to be updated quite a bit before then) and will probably go into a more general discussion of Social Networking and Freeing the Social Graph during Q&A.

From what I understand, the Lunch and Learn series is mostly attended by faculty and staff, but we’ll see. The last talk was by Jesse Schell of Schell Games, so I guess I’m in good company!

Thanks for inviting me, Pitt!

Running for my life… always fun.

Today I’m visiting my brother in his office on the 42nd floor of a skyscraper in downtown Pittsburgh. He has some stuff to read for work and I have some code to write. We saw a couple of helicopters hovering over the Strip District, so I checked a bunch of local news stations’ websites to see what was going on. It’s Saturday, so there isn’t really anyone else around to ask about it.

The local news is pretty miserable at giving up-to-date info; they didn’t even have anything about the fire that traffic was being diverted around on McKnight Road last night at about 1:30am. My guess is that it was a natural gas fire. On my way into the city today, there were still fire-crews and policemen all over the block.

Since I have little confidence in the timeliness of the local old-fashioned media outlets (they try… but they’re years behind), I had several sites open. The closest thing I could find was that a man was shot to death last night after firing at policemen in the Strip District.

My brother had stepped out for a second when I suddenly heard and felt a deep rumbling. It reminded me of explosions. The first natural response is denial: “Hmm, I wonder what completely normal event that could be… a dump truck, construction in this building, fireworks for some celebration?” After a few seconds, the denial was shot… something was up. I went to the window to see if I could find any visual explanation, because the rumbling was still going on and was very loud.

I couldn’t see any dust clouds, fireworks, crashes or anything else that would explain what was going on. It was somewhere in the range of 5 to 10 seconds total and I could definitely feel that the building was shaking. Right after it stopped, my brother came back into the room and said “did you feel that?”.

Thinking back to 9/11 and basically any other your-skyscraper-is-pwned event, the secret of survival is NOT to be standing on the 42nd floor wondering what is going on. You GTFO and figure it out later.

So we just bolted to the stairs and started heading down. At about floor 38 it sank in that this was going to take a while. After quite some time of running for our lives down the stairs and out of the building (that’s a lot of stairs), we got outside and realized that the building looked fine, there was no panic in the streets, and the security staff didn’t seem to have noticed anything.

Shweet, everyone gets to live.

My brother said there was construction on the floor below him, so maybe something really weird was happening there. It still didn’t totally make sense though. We approached a security guard and had a somewhat amusing conversation:
“Hi, um, did you feel the building shake a couple of minutes ago?”
“Yeah.”
(pause) “… do you know what it was?”
“That was the implosion.”
(pause) “… what implosion?”
“The old St. Francis Hospital they’re getting rid of for the new [Penguins] stadium.”

Thanks for the forewarning, local news websites! :-P

We knew at the time that it was probably no big problem, but the evidence we had pointed to street level being a much better location than a skyscraper while we figured out what happened. I’m glad that we made the choice to bolt so quickly instead of worrying about whether we’d be embarrassed by running for no reason. Even if only one out of every 1,000 situations that abnormal turned out to be an actual catastrophe, I’d be more than happy to sacrifice the 5 to 10 minutes it takes to get to a safe place each time before figuring out what happened.

That’ll get your blood flowing!

UPDATES: Two quick observations…
1: Out the window, way out to the right (between two other buildings), is the wreckage of the implosion. If I’d looked in that direction carefully enough, I might have been able to see the cloud. That would have been sweet (and saved me a workout).
2: Geek-reference: We’re on floor 42… Hitchhiker’s Guide… “Don’t Panic”. LOL. I should have brought a towel.

New kind of spam: Invite-Spam

I’ve noticed over the past week or so that spammers have a new trick up their sleeves. Within the last week I’ve gotten invites to iLike.com, IMVU, and myYearbook from people I don’t know, sent to an email address that I don’t really use (it’s forwarded to the same place as all the rest, but I don’t give it out to anyone).

Bot-nets

I’ve had to fight spammers quite a bit on LyricWiki.org, and I’m beginning to understand a little better why things work the way they do. As far as I can tell, the state of the art in spamming is that tech-criminals build up bot-nets and then sell them as spamming machines. They use the zombies to attack popular software in ways that let them send spam from other people’s web-servers. This way, they can borrow the reputation of those servers to get higher delivery-rates, and they can count on the people running the servers to work hard at keeping their reputation with spam-filters as high as possible.

For a little more background for the uninitiated: a bot-net is a vast array of hacked computers (zombies) that can be controlled remotely. Basically, these belong to everyday people who have been infected and are none the wiser. Years ago, when your computer got infected, you generally got viruses that caused a ton of popups, and eventually you sought help to remove them. But with today’s bot-nets, the infected user generally has no idea there’s a problem and therefore doesn’t clean off their computer. When the bot-herder (who runs the bot-net) wants to do something, they send instructions through the Trojan horses they’ve installed, telling each computer what to silently do.

For instance, I run MediaWiki on LyricWiki.org, and many bots have been programmed to vandalize pages with random letters (I’m assuming they’re random… they might actually be a tracking-code), which they later come back and check for. If the wiki is not well-patrolled, they come back and spam those pages with links. This way, they don’t have to reveal what product they are promoting unless they know it’s some small wiki, potentially with few resources, which keeps them from being tracked down by huge companies and reported to authorities. An added bonus of the bot-net approach is that each computer has a different IP address, so it’s hard to block all of them.
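On the patrolling side, one crude way to catch those probe edits before the bots return is to flag any edit whose added text is a single long run of letters that doesn’t look like a word. Here’s a minimal Python sketch of that heuristic; the function name, the thresholds, and the vowel-ratio test are all my own assumptions (this is not a MediaWiki feature), and since the strings might be tracking-codes rather than truly random, expect both misses and false positives.

```python
import re

VOWELS = set("aeiou")


def looks_like_probe(added_text: str, min_len: int = 8, max_vowel_ratio: float = 0.25) -> bool:
    """Hypothetical check: is this added text a lone run of random-looking letters?"""
    token = added_text.strip()
    # Probe edits are a single run of letters: no spaces, digits, or wiki markup.
    if len(token) < min_len or not re.fullmatch(r"[A-Za-z]+", token):
        return False
    vowel_ratio = sum(ch in VOWELS for ch in token.lower()) / len(token)
    # English text is roughly 38% vowels; uniformly random letters average about 19% (5/26).
    return vowel_ratio <= max_vowel_ratio
```

For example, `"xqzkvbtrpl"` gets flagged while `"encyclopedia"` and `"hello world"` do not; a consonant-heavy real word like `"strengths"` would also be flagged, so this is only good for putting edits in front of a human patroller, not for reverting automatically.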

Invite-Spam

In this new flavor of spam, it appears bot-nets are signing up for profiles at social networking sites, and sending out invites to victims. This is a great way to use other sites’ reputable servers to send out spam that is highly likely to get delivered and also to make it through contextual spam filtering (since they look like any other invite).

This creates an interesting conflict for the sites being used to send the spam: on the one hand, these bots are promoting them for free, getting new users to sign up out of curiosity (“Do I know this person? The name sounds vaguely familiar…”). On the other hand, these are ill-gotten users, and the spam that’s being sent out probably lands their servers on more and more blacklists. Both options are a mixed bag, and in the end I feel that it’s always best in business to do the right thing without immolating yourself. You didn’t earn these new users, so just take a stand and try to solve the spamming issue if you can. Aye, there’s the rub: often, a startup’s scarcest asset is time. How much time should a company devote to trying to fix a problem like this? They could be out promoting their site, adding new features, or fixing bugs. They’re always understaffed, and there is always more work to be done.

This is a hard problem to deal with, since the choice is between protecting strangers from a bunch of spam that’s coming from your servers (which you really had nothing to do with) and adding features for your users. A tough call to make. Hopefully some of these companies can cooperate to come up with a technical solution that they can share with each other, making it practical for all of them to implement. The three companies whose servers spammed me aren’t even direct competitors – one is chat (IMVU), one is youth social-networking (myYearbook), and one is music-focused (iLike).

I’ve emailed a friend at one of the companies and explained the situation. It will be interesting to see how they respond.

PS: Please don’t comment about just adding a CAPTCHA. Those things are pretty much useless against talented programmers and have an inherent “economic” flaw. I’ll probably write more about it later, but to put it simply: every time I see a site using reCAPTCHA in a place where it should have actually decent Turing-test security, I cringe. reCAPTCHA doesn’t provide that!