A few days ago I blogged about my plans to make RSS Bandit a desktop client for Google Reader. As part of that process I needed to verify that it is possible to programmatically interact with Google Reader from a desktop client in a way that provides a reasonable user experience. To this end, I wrote a command line client in IronPython based on the documentation I found at the pyrfeed Website.

The command line client isn't terribly useful on its own as a way to read your feeds, but it might be useful to other developers trying to interact with Google Reader programmatically who learn better from code samples than from reverse engineered API documentation.


PS: Note the complete lack of error handling. I never got the hang of error handling in Python, let alone going back and forth between handling errors in Python vs. handling underlying .NET/CLR errors.

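import sys
import clr
# System.Xml and System.Web are not always loaded by IronPython, so reference them explicitly
clr.AddReference("System.Xml")
clr.AddReference("System.Web")
from System import *
from System.IO import *
from System.Net import *
from System.Text import *
from System.Globalization import DateTimeStyles
from System.Xml import *
from System.Web import *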

# USAGE: ipy greader.py <Gmail username> <password> <path-to-directory-for-storing-feeds>
# username & password are required
# feed directory location is optional, defaults to C:\Windows\Temp\

auth_url          = r"https://www.google.com/accounts/ClientLogin?continue=http://www.google.com&service=reader&source=Carnage4Life&Email=%s&Passwd=%s"
feed_url_prefix   = r"http://www.google.com/reader/atom/"
api_url_prefix    = r"http://www.google.com/reader/api/0/"
feed_cache_prefix = "C:\\Windows\\Temp\\"
add_url           = r"http://www.google.com/reader/quickadd"

(add_label, remove_label) = range(1,3)

class TagList:
    """Represents a list of the labels/tags used in Google Reader"""
    def __init__(self, userid, labels):
        self.userid = userid
        self.labels = labels

class SubscriptionList:
    """Represents a list of RSS feeds subscriptions"""
    def __init__(self, modified, feeds):
        self.modified = modified
        self.feeds    = feeds

class Subscription:
    """Represents an RSS feed subscription"""
    def __init__(self, feedid, title, categories, firstitemmsec):
        self.feedid        = feedid
        self.title         = title
        self.categories    = categories
        self.firstitemmsec = firstitemmsec

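def MakeHttpPostRequest(url, params, sid):
    """Performs an HTTP POST request to a Google service and returns the results in an HttpWebResponse object"""
    req = HttpWebRequest.Create(url)
    req.Method = "POST"
    req.ContentType = "application/x-www-form-urlencoded"
    SetGoogleCookie(req, sid)

    encoding = ASCIIEncoding()
    data     = encoding.GetBytes(params)

    # write the form-encoded parameters to the request body
    req.ContentLength = data.Length
    stream = req.GetRequestStream()
    stream.Write(data, 0, data.Length)
    stream.Close()

    resp = req.GetResponse()
    return resp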

def MakeHttpGetRequest(url, sid):
    """Performs an HTTP GET request to a Google service and returns the results in an XmlDocument"""
    req    = HttpWebRequest.Create(url)
    SetGoogleCookie(req, sid)
    reader = StreamReader(req.GetResponse().GetResponseStream())
    doc    = XmlDocument()
    doc.Load(reader)
    reader.Close()
    return doc

def GetToken(sid):
    """Gets an edit token which is needed for any edit operations using the Google Reader API"""
    token_url = api_url_prefix + "token"
    req       = HttpWebRequest.Create(token_url)
    SetGoogleCookie(req, sid)
    reader    = StreamReader(req.GetResponse().GetResponseStream())
    return reader.ReadToEnd()

def MakeSubscription(xmlNode):
    """Creates a Subscription class out of an XmlNode that was obtained from the feed list"""
    id_node     = xmlNode.SelectSingleNode("string[@name='id']")
    feedid      = id_node and id_node.InnerText or ''
    title_node  = xmlNode.SelectSingleNode("string[@name='title']")
    title       = title_node and title_node.InnerText or ''
    fim_node    = xmlNode.SelectSingleNode("string[@name='firstitemmsec']")
    firstitemmsec = fim_node and fim_node.InnerText or ''
    categories  = [MakeCategory(catNode) for catNode in xmlNode.SelectNodes("list[@name='categories']/object")]
    return Subscription(feedid, title, categories, firstitemmsec)

def MakeCategory(catNode):
    """Returns a tuple of (label, category id) from an XmlNode representing a feed's labels that was obtained from the feed list"""
    id_node     = catNode.SelectSingleNode("string[@name='id']")
    catid       = id_node and id_node.InnerText or ''
    label_node  = catNode.SelectSingleNode("string[@name='label']")
    label       = label_node and label_node.InnerText or ''
    return (label, catid)

def AuthenticateUser(username, password):
    """Authenticates the user and returns the SID token used to authenticate subsequent requests"""
    req = HttpWebRequest.Create(auth_url % (HttpUtility.UrlEncode(username), HttpUtility.UrlEncode(password)))
    reader = StreamReader(req.GetResponse().GetResponseStream())
    response = reader.ReadToEnd().split('\n')
    for s in response:
        if s.startswith("SID="):
            return s[4:]

def SetGoogleCookie(webRequest, sid):
    """Sets the Google authentication cookie on the HttpWebRequest instance"""
    cookie = Cookie("SID", sid, "/", ".google.com")
    cookie.Expires = DateTime.Now + TimeSpan(7,0,0,0)
    container      = CookieContainer()
    container.Add(cookie)
    webRequest.CookieContainer = container

def GetSubscriptionList(feedlist, sid):
    """Gets the user's list of subscriptions"""
    feedlist_url = api_url_prefix + "subscription/list"
    #download the JSON-esque XML feed list
    doc = MakeHttpGetRequest(feedlist_url, sid)

    #create subscription nodes
    feedlist.feeds    = [MakeSubscription(node) for node in doc.SelectNodes("/object/list[@name='subscriptions']/object")]
    feedlist.modified = False

def GetTagList(sid):
    """Gets a list of the user's tags"""
    taglist_url = api_url_prefix + "tag/list"
    doc = MakeHttpGetRequest(taglist_url, sid)

    #get the user id needed for creating new labels from Google system tags
    userid = doc.SelectSingleNode("/object/list/object/string[contains(string(.), 'state/com.google/starred')]").InnerText
    userid = userid.replace("/state/com.google/starred", "")
    userid = userid[5:]   #strip the leading "user/"
    #get the user-defined labels
    tags = [node.InnerText.Replace("user/" + userid + "/label/", "")
            for node in doc.SelectNodes("/object/list[@name='tags']/object/string[@name='id']")
            if node.InnerText.IndexOf("/com.google/") == -1]
    return TagList(userid, tags)

def DownloadFeeds(feedlist, sid):
    """Downloads each feed from the subscription list to a local directory"""
    # the ot parameter takes a Unix timestamp (seconds); here it is one week ago
    unixepoch       = DateTime(1970, 1, 1, 0, 0, 0, 0, DateTimeKind.Utc)
    oneweek_ago     = DateTime.UtcNow - TimeSpan(7, 0, 0, 0)
    ifmodifiedsince = oneweek_ago - unixepoch

    for feedinfo in feedlist.feeds:
        feed_url = feed_url_prefix + feedinfo.feedid + "?n=25&r=o&ot=" + str(int(ifmodifiedsince.TotalSeconds))
        continuation_token = ''
        feedDoc            = None

        # each request returns at most n items, so keep following the
        # continuation token until there are no more pages
        while True:
            print "Downloading feed at %s" % (feed_url + continuation_token)
            doc = MakeHttpGetRequest(feed_url + continuation_token, sid)
            continuation_node  = doc.SelectSingleNode("//*[local-name()='continuation']")
            continuation_token = continuation_node and ("&c=" + continuation_node.InnerText) or ''

            if feedDoc is None:
                feedDoc = doc
            else:
                # append this page's entries to the first page's document
                for node in doc.SelectNodes("//*[local-name()='entry']"):
                    node = feedDoc.ImportNode(node, True)
                    feedDoc.DocumentElement.AppendChild(node)

            if continuation_token == '':
                break

        print "Saving %s" % (feed_cache_prefix + feedinfo.title + ".xml")
        feedDoc.Save(feed_cache_prefix + feedinfo.title + ".xml")

def ShowSubscriptionList(feedlist, sid):
    """Displays the user's list of subscriptions including the labels applied to each item"""
    if feedlist.modified:
        GetSubscriptionList(feedlist, sid)
    count = 1
    for feedinfo in feedlist.feeds:
        print "%s. %s (%s)" % (count, feedinfo.title, [category[0] for category in feedinfo.categories])
        count = count + 1

def Subscribe(url, sid):
    """Subscribes to the specified feed URL in Google Reader"""
    params = "quickadd=" + HttpUtility.UrlEncode(url) + "&T=" + GetToken(sid)
    resp   = MakeHttpPostRequest(add_url, params, sid)

    if resp.StatusCode == HttpStatusCode.OK:
        print "%s successfully added to subscription list" % url
        return True
    else:
        print resp.StatusDescription
        return False

def Unsubscribe(index, feedlist, sid):
    """Unsubscribes from the feed at the specified index in the feed list"""
    unsubscribe_url = api_url_prefix + "subscription/edit"
    feed   = feedlist.feeds[index]
    params = "ac=unsubscribe&i=null&T=" + GetToken(sid) + "&t=" + HttpUtility.UrlEncode(feed.title) + "&s=" + HttpUtility.UrlEncode(feed.feedid)
    resp   = MakeHttpPostRequest(unsubscribe_url, params, sid)

    if resp.StatusCode == HttpStatusCode.OK:
        print "'%s' successfully removed from subscription list" % feed.title
        return True
    else:
        print resp.StatusDescription
        return False

def Rename(new_title, index, feedlist, sid):
    """Renames the feed at the specified index in the feed list"""
    api_url = api_url_prefix + "subscription/edit"
    feed    = feedlist.feeds[index]
    params  = "ac=edit&i=null&T=" + GetToken(sid) + "&t=" + HttpUtility.UrlEncode(new_title) + "&s=" + HttpUtility.UrlEncode(feed.feedid)
    resp    = MakeHttpPostRequest(api_url, params, sid)

    if resp.StatusCode == HttpStatusCode.OK:
        print "'%s' successfully renamed to '%s'" % (feed.title, new_title)
        return True
    else:
        print resp.StatusDescription
        return False

def EditLabel(label, editmode, userid, feedlist, index, sid):
    """Adds the specified label to, or removes it from, the feed at the specified index depending on the edit mode"""
    full_label = "user/" + userid + "/label/" + label
    label_url  = api_url_prefix + "subscription/edit"
    feed       = feedlist.feeds[index]
    params     = "ac=edit&i=null&T=" + GetToken(sid) + "&t=" + HttpUtility.UrlEncode(feed.title) + "&s=" + HttpUtility.UrlEncode(feed.feedid)

    if editmode == add_label:
        params = params + "&a=" + HttpUtility.UrlEncode(full_label)
    elif editmode == remove_label:
        params = params + "&r=" + HttpUtility.UrlEncode(full_label)

    resp = MakeHttpPostRequest(label_url, params, sid)
    if resp.StatusCode == HttpStatusCode.OK:
        print "Successfully edited label '%s' of feed '%s'" % (label, feed.title)
        return True
    else:
        print resp.StatusDescription
        return False

def MarkAllItemsAsRead(index, feedlist, sid):
    """Marks all items from the selected feed as read"""
    unixepoch    = DateTime(1970, 1, 1, 0, 0, 0, 0, DateTimeKind.Utc)
    markread_url = api_url_prefix + "mark-all-as-read"
    feed         = feedlist.feeds[index]
    # ts is the current time as a Unix timestamp (seconds since the epoch, UTC)
    params = "s=" + HttpUtility.UrlEncode(feed.feedid) + "&T=" + GetToken(sid) + "&ts=" + str(int((DateTime.UtcNow - unixepoch).TotalSeconds))
    MakeHttpPostRequest(markread_url, params, sid)
    print "All items in '%s' have been marked as read" % feed.title

def GetFeedIndexFromUser(feedlist):
    """Prompts the user for the position of the feed they are interested in and returns it (1-based), or -1 if it is invalid"""
    print "Enter the numeric position of the feed from 1 - %s" % (len(feedlist.feeds))
    index = int(sys.stdin.readline().strip())
    if (index < 1) or (index > len(feedlist.feeds)):
        print "Invalid index specified: %s" % index
        return -1
    return index

if __name__ == "__main__":
    if len(sys.argv) < 3:
        print "ERROR: Please specify a Gmail username and password"
        sys.exit(1)
    if len(sys.argv) > 3:
        feed_cache_prefix = sys.argv[3]

    SID      = AuthenticateUser(sys.argv[1], sys.argv[2])
    feedlist = SubscriptionList(True, [])
    GetSubscriptionList(feedlist, SID)
    taglist  = GetTagList(SID)

    options = "***Your options are (f)etch your feeds, (l)ist your subscriptions, (s)ubscribe to a new feed, (u)nsubscribe, (m)ark read, (r)ename, (a)dd a label to a feed, (d)elete a label from a feed or (e)xit***"
    print "\n"

    while True:
        print options
        cmd = sys.stdin.readline()
        if cmd == "e\n": #exit
            break
        elif cmd == "l\n": #list subscriptions
            ShowSubscriptionList(feedlist, SID)
        elif cmd == "s\n": #subscribe to a new feed
            print "Enter url: "
            new_feed_url = sys.stdin.readline().strip()
            success = Subscribe(new_feed_url, SID)

            if feedlist.modified == False:
                feedlist.modified = success
        elif cmd == "u\n": #unsubscribe from a feed
            feed2remove_indx = GetFeedIndexFromUser(feedlist)
            if feed2remove_indx != -1:
                success = Unsubscribe(feed2remove_indx-1, feedlist, SID)

                if feedlist.modified == False:
                    feedlist.modified = success
        elif cmd == "r\n": #rename a feed
            feed2rename_indx = GetFeedIndexFromUser(feedlist)
            if feed2rename_indx != -1:
                print "'%s' selected" % feedlist.feeds[feed2rename_indx-1].title
                print "Enter the new title for the subscription:"
                success = Rename(sys.stdin.readline().strip(), feed2rename_indx-1, feedlist, SID)

                if feedlist.modified == False:
                    feedlist.modified = success
        elif cmd == "f\n": #fetch feeds
            DownloadFeeds(feedlist, SID)
        elif cmd == "m\n": #mark all items as read
            feed2markread_indx = GetFeedIndexFromUser(feedlist)
            if feed2markread_indx != -1:
                MarkAllItemsAsRead(feed2markread_indx-1, feedlist, SID)
        elif (cmd == "a\n") or (cmd == "d\n"): #add/remove a label on a feed
            editmode = (cmd == "a\n") and add_label or remove_label
            feed2label_indx = GetFeedIndexFromUser(feedlist)
            if feed2label_indx != -1:
                feed = feedlist.feeds[feed2label_indx-1]
                print "'%s' selected" % feed.title
                print "%s" % ((cmd == "a\n") and "Enter the new label:" or "Enter the label to delete:")
                label_name = sys.stdin.readline().strip()
                success = EditLabel(label_name, editmode, taglist.userid, feedlist, feed2label_indx-1, SID)

                if feedlist.modified == False:
                    feedlist.modified = success
        else:
            print "Unknown command"
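For readers without IronPython, the string handling buried in AuthenticateUser, GetTagList and the subscription/edit calls above can be sketched in plain Python 3. This is a hypothetical sketch, not part of the original client: the function names are made up, the ClientLogin response is assumed to be newline-separated key=value pairs (which is what AuthenticateUser relies on), and the form field names (ac, i, T, t, s, a, r) are copied from the script rather than from any official documentation. Using urllib.parse.urlencode means titles and feed URLs containing '&', '/' or spaces are escaped instead of corrupting the request body.

```python
from urllib.parse import urlencode


def parse_client_login(body):
    """Split a ClientLogin response (one key=value pair per line) into a dict."""
    tokens = {}
    for line in body.splitlines():
        key, sep, value = line.partition("=")
        if sep:                      # skip blank or malformed lines
            tokens[key] = value
    return tokens


def extract_user_id(starred_tag_id):
    """Turn 'user/<id>/state/com.google/starred' into '<id>', as GetTagList does."""
    prefix, suffix = "user/", "/state/com.google/starred"
    if not (starred_tag_id.startswith(prefix) and starred_tag_id.endswith(suffix)):
        raise ValueError("unexpected tag id: %r" % starred_tag_id)
    return starred_tag_id[len(prefix):-len(suffix)]


def build_edit_body(token, feed_id, title, user_id=None, add_label=None, remove_label=None):
    """Form-encode a subscription/edit request body using the field names from the script."""
    params = [("ac", "edit"), ("i", "null"), ("T", token), ("t", title), ("s", feed_id)]
    if add_label is not None:
        params.append(("a", "user/%s/label/%s" % (user_id, add_label)))
    if remove_label is not None:
        params.append(("r", "user/%s/label/%s" % (user_id, remove_label)))
    return urlencode(params)   # escapes '&', '/', ':' and turns spaces into '+'
```

Calling parse_client_login on the ClientLogin response and sending tokens["SID"] as the SID cookie mirrors what AuthenticateUser and SetGoogleCookie do in the IronPython version.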

Now Playing: DJ Drama - Cannon (Remix) (Feat. Lil Wayne, Willie The Kid, Freeway And T.I.)


Categories: Programming

December 30, 2007
@ 11:19 PM


POST /reader/api/0/subscription/edit HTTP/1.1
Content-Type: application/x-www-form-urlencoded
Host: www.google.com
Cookie: SID=DQAAAHoAAD4SjpLSFdgpOrhM8Ju-JL2V1q0aZxm0vIUYa-p3QcnA0wXMoT7dDr7c5FMrfHSZtxvDGcDPTQHFxGmRyPlvSvrgNe5xxQJwPlK_ApHWhzcgfOWJoIPu6YuLAFuGaHwgvFsMnJnlkKYtTAuDA1u7aY6ZbL1g65hCNWySxwwu__eQ
Content-Length: 182
Expect: 100-continue



HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Set-Cookie: GRLD=UNSET;Path=/reader/
Transfer-Encoding: chunked
Cache-control: private
Date: Sun, 30 Dec 2007 23:08:51 GMT
Server: GFE/1.3

<html><head><title>500 Server Error</title>
<style type="text/css">
      body {font-family: arial,sans-serif}
      div.nav {margin-top: 1ex}
      div.nav A {font-size: 10pt; font-family: arial,sans-serif}
      span.nav {font-size: 10pt; font-family: arial,sans-serif; font-weight: bold}
      div.nav A,span.big {font-size: 12pt; color: #0000cc}
      div.nav A {font-size: 10pt; color: black}
      A.l:link {color: #6f6f6f}
<body text="#000000" bgcolor="#ffffff"><table border="0" cellpadding="2" cellspacing="0" width="100%"></table>
<table><tr><td rowspan="3" width="1%"><b><font face="times" color="#0039b6" size="10">G</font><font face="times" color="#c41200" size="10">o</font><font face="times" color="#f3c518" size="10">o</font><font face="times" color="#0039b6" size="10">g</font><font face="times" color="#30a72f" size="10">l</font><font face="times" color="#c41200" size="10">e</font>&nbsp;&nbsp;</b></td>
<tr><td bgcolor="#3366cc"><font face="arial,sans-serif" color="#ffffff"><b>Error</b></font></td></tr>
<blockquote><h1>Server Error</h1>
The server encountered a temporary error and could not complete your request.<p></p> Please try again in 30 seconds.

<table width="100%" cellpadding="0" cellspacing="0"><tr><td bgcolor="#3366cc"><img alt="" width="1" height="4"></td></tr></table></body></html>
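The trace above is a response whose status line says 200 OK while its body is Google's "500 Server Error" page, so a client that checks only the status code would report success. A minimal defensive check, sketched in Python 3 under the assumption that the error page's <title> starting with a status code is a reliable marker; the function name and this heuristic are hypothetical, not documented Google Reader behavior.

```python
import re

# Assumption: Google's HTML error pages carry a title such as "500 Server Error".
_ERROR_TITLE = re.compile(r"<title>\s*(\d{3})\s+[^<]*</title>", re.IGNORECASE)


def effective_status(status_code, body):
    """Return the status a client should act on.

    If the transport reports 200 but the HTML body is an error page whose
    <title> starts with a 4xx or 5xx code, trust the body instead.
    """
    if status_code == 200:
        match = _ERROR_TITLE.search(body)
        if match and match.group(1)[0] in "45":
            return int(match.group(1))
    return status_code
```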


Categories: Platforms | Programming | XML Web Services

With v1.6.0.0 out the door, I've shipped what I think is our most interesting feature in years and resolved an issue that was making RSS Bandit a nuisance to lots of sites on the Internet.

The feature I'm currently working on is an idea I'm calling supporting multiple feed sources. For a few years, we've had support for roaming your feed list and read/unread state between two computers using an FTP site, a shared folder or NewsGator Online. Although useful, this functionality has always seemed bolted on. You have to manually upload and download feeds from these locations instead of things happening automatically and transparently as they do with the typical mail reader + mail server scenario (e.g. Outlook + Exchange) which is the most comparable model.

My original idea for the feature was simply to make the existing NewsGator and RSS Bandit integration work automatically instead of via a manual download so it could be more like Outlook + Exchange. Then I realized that there could never be full integration because there are feeds that RSS Bandit can read that a Web-based feed reader like NewsGator Online cannot (e.g. feeds within your company's intranet if you read feeds at work). This meant that we would need an explicit demarcation between feeds that roamed in NewsGator Online and those that were local to that machine.

In addition, I got a lot of feedback from our users that many more of them were using Google Reader than NewsGator Online. Since I was already planning to do a bunch of work to streamline synchronizing with NewsGator Online, adding another Web-based feed reader didn't seem like a stretch. I'm currently working on a command line only prototype in IronPython which uses the information from the reverse engineered Google Reader API documentation to retrieve and update my feed subscriptions. I'm part way through, and it seems that the Google Reader API is as full featured as the NewsGator API, so we should be good to go. I should be able to integrate this functionality into RSS Bandit within the next few weeks.

The tricky part will be how the UI integration should work. For example, Google Reader doesn't support hierarchical folders of feeds like we do. Instead there is a flat namespace of tag names, and each feed can have one or more tags applied to it. On the flip side, NewsGator Online uses the hierarchical folder model like RSS Bandit does. I'm considering moving to a more Google Reader friendly model in the next release, where we flatten hierarchies and go with a flat tag-based approach to organizing feeds. For feeds synchronized from NewsGator Online, we will prevent users from putting feeds in multiple categories since that won't be supported by the service.
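To make the folder-versus-tag mismatch concrete, here is a hypothetical Python sketch of one possible mapping: each folder path becomes a single flat tag, and feeds that sync with NewsGator Online are refused a second category. The Feed class, the source names, and the choice to join path segments with '/' are assumptions for illustration, not RSS Bandit's actual design.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Feed:
    url: str
    source: str                                          # hypothetical: "local", "google-reader" or "newsgator"
    folder_path: List[str] = field(default_factory=list)  # e.g. ["Tech", "Blogs"]
    tags: List[str] = field(default_factory=list)


def flatten_folder(feed):
    """Replace a hierarchical folder path with one flat tag such as 'Tech/Blogs'."""
    if feed.folder_path:
        tag = "/".join(feed.folder_path)
        if tag not in feed.tags:
            feed.tags.append(tag)
    feed.folder_path = []
    return feed


def add_tag(feed, tag):
    """Apply a tag, enforcing the one-category limit for NewsGator-synchronized feeds."""
    if tag in feed.tags:
        return feed
    if feed.source == "newsgator" and feed.tags:
        raise ValueError("NewsGator Online feeds can only be in one category")
    feed.tags.append(tag)
    return feed
```

Joining the path keeps each flattened tag unique; splitting it into one tag per segment would be the other obvious choice, at the cost of merging folders that share a name.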

Now Playing: Eminem - Evil Deeds


Categories: RSS Bandit

December 26, 2007
@ 05:22 PM

The new version of RSS Bandit is now available. This release fixes a bug that causes the application to repeatedly request favicons from a feed's website in a manner that eventually resembles a denial of service attack. The new feature in this release is the [Top Stories] button.

The rationale for the new feature is given in Omar Shahine's blog post entitled Google Reader needs Mute. Omar wrote:

Here is a feature that Google Reader needs: Mute.

Why, Cause subscribing to a lot of tech bloggers, a-list folks, and news outlets is extremely annoying when they write about the same thing. You get tired of seeing dozens or hundreds of posts about Kindle, Facebook, ThinkSecret and on and on.

These days I feel like my blogging info is like the local news (which I stopped watching some time back in high school).

So, please google, let me mute or mark read all feed items on a certain topic as read and save me the hassle of suffering through the repetition and pain.

The Top Stories feature is meant to target exactly this scenario. When you click on it, you get a list of the most recently popular items among your subscriptions. From there you can hit "Mark Items as Read" and mark all of the linking posts as read once you've gotten the gist of the story.
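The Top Stories idea boils down to counting how many recent posts in your subscriptions link to the same URL, surfacing the most-linked ones, and then marking the linking posts as read. A minimal Python sketch under those assumptions; the Item shape, the three-day window and the function names are hypothetical, and this is not RSS Bandit's implementation. The limit of ten matches the release notes below.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Set


@dataclass
class Item:
    """Hypothetical shape of a downloaded post."""
    feed: str
    published: datetime
    links: Set[str]      # outbound links found in the post body
    read: bool = False


def top_stories(items, now, window=timedelta(days=3), limit=10):
    """Return up to `limit` (url, count) pairs for the most-linked URLs within `window`."""
    counts = Counter()
    for item in items:
        if now - item.published <= window:
            counts.update(item.links)   # each post counts at most once per link
    return counts.most_common(limit)


def mark_story_read(items, url):
    """Mark every item that links to `url` as read and return how many changed."""
    changed = 0
    for item in items:
        if url in item.links and not item.read:
            item.read = True
            changed += 1
    return changed
```

A Mute option would amount to calling something like mark_story_read automatically after each download instead of waiting for the user to click.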

We don't have a Mute option where all posts that link to a story are automatically marked as read or deleted after being downloaded. This seems like overkill to me, but I would love to get feedback from our users on whether this would be a desirable feature.

This release is available in the following languages: English, German, Polish, French, Simplified Chinese, Russian, Brazilian Portuguese, Turkish, Dutch, Italian, Serbian and Bulgarian.

Download the installer from RssBandit1.6.0.0a_Installer.zip. A snapshot of the source code will be available later in the week as a source code release.

New Features

  • Top Stories button shows the ten most recently popular links in your subscriptions.
  • Twitter plugin enables posting tweets about news stories or responding to tweets in an RSS feed.

Major Bug Fixes

  • Del.icio.us plugin silently fails when posting items with tags containing special characters like '#' or '+'
  • Downloading feed list from NewsGator Online deletes local machine and intranet feeds
  • KeyNotFoundException if "Mark All Items as Read" clicked shortly after changing the URL for a subscribed feed.
  • 100% CPU used when an RSS feed with no <channel> element is encountered.
  • Downloading favicons happens several times while the application is running instead of just once.
  • The "Check for updates" feature would sometimes result in the application crashing.
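The favicon fix comes down to remembering which sites have already been asked for an icon. A minimal Python sketch of the idea, under stated assumptions: icons live at the conventional /favicon.ico location at each site's root, `fetch` is a hypothetical callable that returns the icon bytes or raises OSError, and failures are cached too so a broken site is not retried on every refresh. This illustrates the technique, not RSS Bandit's actual code.

```python
from urllib.parse import urlsplit


class FaviconCache:
    """Fetch each site's favicon at most once per run, remembering failures too."""

    def __init__(self, fetch):
        self._fetch = fetch      # hypothetical callable: favicon URL -> bytes (may raise OSError)
        self._cache = {}         # host -> icon bytes, or None if the fetch failed

    def get(self, feed_url):
        parts = urlsplit(feed_url)
        host = parts.netloc.lower()
        if host not in self._cache:
            try:
                self._cache[host] = self._fetch("%s://%s/favicon.ico" % (parts.scheme, host))
            except OSError:
                self._cache[host] = None   # negative cache: don't hammer a failing site
        return self._cache[host]
```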


Categories: RSS Bandit

There is a post in Slashdot user Felipe Hoffa's journal entitled Google Reader shares private data, ruins Christmas (alternate link), which contains a very damning indictment of the Google Reader team. It all starts with the release of the Sharing with Friends feature, which is described below:

We've just launched a new feature that makes it easier to follow your
friends' shared items in Google Reader. Check out the announcement on
our blog:

The short description of it is this: If any of your friends from
Google Talk are using Reader and sharing items, you'll see them listed
in your sidebar under "Friends' shared items." Similarly, they'll be
able to see any items you're sharing. You can hide items from any
friend you don't want to see, and you can also opt out of sharing by
removing all your shared items. For full details, check out the
following help articles:

This is still a very experimental feature, so we'd love to hear what
you think of it.

Unsurprisingly, there has been a massive negative outcry about this feature. The main reason for the flood of complaints (many of which are excerpted in Felipe Hoffa's journal) is the fact that the Google Reader team has decided to define "friends" as anyone in your Gmail contact list.

On the surface this seems a lot like the initial backlash over the Facebook news feed. Google Reader users are complaining about their Gmail contacts having an easy way of viewing a list of feeds the user had already made public. I imagine that the Google folks have begun to make arguments like "If Facebook can get away with it, we should be able to as well" to justify some of their recent social networking moves such as this one and Google Profiles.

However, the Google Reader team failed to grasp two key aspects of social software here:

  1. Internet Users Don't Fully Grasp that Everything on the Web is Public Unless Behind Access Controls: To most users of the Internet, if I create a Web page and don't tell anyone about it, then the page is private and known only to me. Similarly, if I create a blog or shared bookmarks on a social bookmarking site then no one should know about it unless I send them links to the page. 

    As someone who's worked on the Access Control technology behind Windows Live sharing initiatives from SkyDrive to Windows Live Spaces, I know this isn't the case. The only way to make something private on the Web is to place it behind access controls that require users to be authenticated and authorized before they can view the content you've created.

    The Google Reader developers assumed that their average users were like me and would assume that their content was public even if it had an obfuscated URL. The problem is that even if Shared Items in Google Reader were "technically" public, albeit at an obfuscated URL, the very use of URL obfuscation implies that the team realized users didn't want their Shared Items to be PUBLIC. Arguing that the items were "technically" public and using that to justify broadcasting them to the user's Gmail contacts seems dubious at best.

  2. Friends in One Context are not Necessarily Friends in Another: The bigger problem is that the folks at Google are trying to build a unified social graph across all their applications as a way to compete with the powerful social network that Facebook has built. I've previously talked about the problems faced by a unified social graph based on what I've seen working on the social graph contacts platform for Windows Live. The fact that I send someone email does not mean that I want to make them an IM buddy, nor does it mean that I want them to have access to all the items I find interesting in my RSS feeds, since some of these items may reveal political, religious or even sexual leanings that I did not mean to share with someone I just happen to exchange email with frequently.

    Deciding that users should just have Google Friends instead of separate GTalk IM buddies, Gmail contacts, and Google Reader friends may simplify things for some program managers at Google, but it causes problems for users who now have to deal with their different social contexts bleeding into each other. Even though Facebook is a single application, it already has this problem: users have to manage contacts from multiple social contexts (family, friends, co-workers, etc.) within that one application, let alone across applications with extremely different uses.
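Both points can be illustrated with a small hypothetical model in Python; this is not how Google Reader or Windows Live implement sharing. The unguessable URL token only hides the page, while can_view does the actual authorization, and audiences are kept per social context, so an email contact is not automatically a Reader friend. The class, field names and context labels are assumptions for illustration.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class SharedItems:
    """Hypothetical model of a user's shared items with per-context audiences."""
    owner: str
    # Who may see the items, per social context, e.g. {"reader": {...}, "email": {...}}.
    audiences: Dict[str, Set[str]] = field(default_factory=dict)
    # Unguessable URL token: obscurity only, it does not authorize anyone.
    url_token: str = field(default_factory=lambda: secrets.token_urlsafe(16))

    def can_view(self, viewer, context):
        """Authorize by identity within a context, never by possession of the URL."""
        if viewer == self.owner:
            return True
        return viewer in self.audiences.get(context, set())
```

For example, with audiences {"email": {"bob", "carol"}, "reader": {"carol"}}, bob can see the items in the email context but not in Reader, even if he learns the URL.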

My assumption is that the folks at Google Reader will put in some time over the weekend and add granular privacy controls as recommended by Robert Scoble. I also predict that in 2008 we will see more ham-fisted attempts to grow social graphs at the expense of user privacy from various large [and small] Web properties, including Facebook.

In the words of Scott McNealy, "Privacy is Dead. Get Over It"


Categories: Social Software

December 26, 2007
@ 05:21 PM

Justin Rudd writes in his blog post entitled Your Attention Please:

After 3 years and 3 months, I am leaving my position at Amazon.com on December 31st.
My next “gig” is one that I am extraordinarily excited about.  I’m going to Microsoft to be part of the Live Labs team.  This group really excites me because it gives me a chance to find new areas for Microsoft Live to get into, to expand on what Microsoft Live already has, work closely with Microsoft Research, etc.  This is a job that really excites the tinkerer side of my brain.  I can’t wait to get started.

Many thanks to Dare Obasanjo for being my employee referral

Justin is my second official referral of someone I've "known" via reading their blog. I hope he ends up working at Microsoft a little longer than the last blog friend I referred. :)


Categories: Personal

Sometime last week, Amazon soft launched Amazon SimpleDB, a hosted service for storing and querying structured data. This release plugged a hole in their hosted Web services offerings, which include the Amazon Simple Storage Service (S3) and the Amazon Elastic Compute Cloud (EC2). Amazon’s goal of becoming the “Web OS” upon which the next generation of Web startups builds came off as hollow when all they gave you was BLOB storage and hosted computation but not structured storage. With SimpleDB, they’re almost at the point where all the tools you need for building the next del.icio.us or Flickr can be provided by Amazon’s Web Services. The last bit they need to provide is actual Web hosting, so that developers don’t need to resort to absurd dynamic DNS hacks when interacting with their Amazon applications from the Web.

The Good: Commoditizing hosted services and getting people to think outside the relational database box

The data model of SimpleDB is remarkably similar to Google’s BigTable in that instead of having multiple tables and relations between them, you get a single giant table which is accessed via the tuple of {row key, column key}. Although both SimpleDB and BigTable allow applications to store multiple values for a particular tuple, they do so in different ways. In BigTable, multiple values are additionally keyed by timestamp, so I can access data using tuples such as {“http://www.example.com”, “incoming_links”, “12-12-2007”}. In Amazon’s SimpleDB I’d simply be able to store multiple values for a particular key pair, so I could access {“Dare Obasanjo”, “weblogs”} and it would return (“http://www.25hoursaday.com/weblog”, “http://blogs.msdn.com/dareobasanjo”, “http://carnage4life.spaces.live.com”).

Another similarity that both systems share is that there is no requirement that all “rows” in a table share the same schema, nor is there an explicit notion of declaring a schema. In SimpleDB, tables are called domains, rows are called items and columns are called attributes.
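The difference between the two data models can be approximated with plain Python dictionaries. This is a toy in-memory sketch, not either service's API: BigTable-style cells are keyed by (row, column, timestamp), while SimpleDB-style items map each attribute to a set of values with no timestamp and no declared schema. The keys echo the examples in the paragraphs above; the numeric values are made up.

```python
from collections import defaultdict

# BigTable-style: one sparse map keyed by (row key, column key, timestamp).
# Timestamps are ISO dates so they compare correctly; the counts are made up.
bigtable = {
    ("http://www.example.com", "incoming_links", "2007-12-12"): 42,
    ("http://www.example.com", "incoming_links", "2007-12-13"): 45,
}


def latest(table, row, column):
    """Return the newest value stored for (row, column), or None if there is none."""
    versions = [(ts, value) for (r, c, ts), value in table.items() if (r, c) == (row, column)]
    return max(versions)[1] if versions else None


# SimpleDB-style: domain -> item -> attribute -> set of values (multivalued, no schema).
domain = defaultdict(lambda: defaultdict(set))
domain["Dare Obasanjo"]["weblogs"].update({
    "http://www.25hoursaday.com/weblog",
    "http://blogs.msdn.com/dareobasanjo",
    "http://carnage4life.spaces.live.com",
})
# Another item in the same domain can carry entirely different attributes.
domain["example item"]["color"].add("red")
```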

It is interesting to imagine how this system evolved. Anyone who has had to build a massive relational database knows from experience that joins kill performance. The longer you’ve dealt with massive data sets, the more you begin to fall in love with denormalizing your data so you can scale. Taken to its logical extreme, there’s nothing more denormalized than a single table. Amazon goes a step further by introducing multivalued columns, which means that SimpleDB isn’t even in First Normal Form, whereas we all learned in school that the minimum we should aspire to is Third Normal Form.

I think it is great to see more mainstream examples that challenge the traditional thinking of how to store, manage and manipulate large amounts of data.

I also think the pricing is very reasonable. If I were a startup founder, I’d strongly consider taking Amazon Web Services for a spin before going with a traditional LAMP or WISC approach.

The Bad: Eventual Consistency and Data Values are Weakly Typed

The documentation for the PutAttributes method has the following note:

Because Amazon SimpleDB makes multiple copies of your data and uses an eventual consistency update model, an immediate GetAttributes or Query request (read) immediately after a DeleteAttributes or PutAttributes request (write) might not return the updated data.

This may or may not be a problem depending on your application. It may be OK for a del.icio.us style application if it took a few minutes before your tag updates were applied to a bookmark but the same can’t be said for an application like Twitter. What would be useful for developers would be if Amazon gave some more information around the delayed propagation such as average latency during peak and off-peak hours.
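One common client-side mitigation for eventual consistency is to poll after a write until the new value becomes visible or a deadline passes. The sketch below simulates a replicated store with a fixed propagation delay; the store class, the delay and the retry policy are hypothetical, and none of this is the SimpleDB API.

```python
import time


class EventuallyConsistentStore:
    """Toy store: a write becomes visible to readers only `delay` seconds after it is made."""

    def __init__(self, delay):
        self.delay = delay
        self._pending = {}   # key -> (time it becomes visible, value)
        self._visible = {}

    def put(self, key, value):
        self._pending[key] = (time.monotonic() + self.delay, value)

    def get(self, key):
        now = time.monotonic()
        for k, (visible_at, value) in list(self._pending.items()):
            if visible_at <= now:          # "replication" has caught up for this key
                self._visible[k] = value
                del self._pending[k]
        return self._visible.get(key)


def wait_for_value(store, key, expected, timeout=5.0, interval=0.05):
    """Poll get() until it returns `expected`; give up and return False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if store.get(key) == expected:
            return True
        time.sleep(interval)
    return store.get(key) == expected
```

Polling like this only hides the delay from the client that did the write; other readers can still see stale values until propagation finishes, which is why the latency numbers matter for something like Twitter.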

There is another interesting note in the documentation of the Query method, which states:

 Lexicographical Comparison of Different Data Types

Amazon SimpleDB treats all entities as UTF-8 strings. Keep this in mind when storing and querying different data types, such as numbers or dates. Design clients to convert their data into an appropriate string format, so that query expression return expected results.

The following are suggested methods for converting different data types into strings for proper lexicographical order enforcement:

  • Positive integers should be zero-padded to match the largest number of digits in your data set. For example, if the largest number you are planning to use in a range is 1,000,000, every number that you store in Amazon SimpleDB should be zero-padded to at least 7 digits. You would store 25 as 0000025, 4597 as 0004597, and so on.

  • Negative integers should be offset and turned into positive numbers and zero-padded. For example, if the smallest negative integer in your data set is -500, your application should add at least 500 to every number that you store. This ensures that every number is now positive and enables you to use the zero-padding technique.

  • To ensure proper lexicographical order, convert dates to the ISO 8601 format.

Note

Amazon SimpleDB provides utility functions within our sample libraries that help you perform these conversions in your application.
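Here is roughly what those rules amount to in code. This is a minimal Python sketch, not Amazon's sample-library helpers. The OFFSET and WIDTH constants are assumptions chosen to match the documentation's examples (a smallest value of -500 and a largest of 1,000,000); note that when both rules are combined, the padding width has to cover the largest value after the offset is added.

```python
from datetime import datetime, timezone

# Assumed bounds for this sketch; you must know them before storing any data.
OFFSET = 500   # smallest value we expect to store is -500
WIDTH = 7      # digits needed for the largest value plus OFFSET (1,000,000 + 500)


def encode_int(n, offset=OFFSET, width=WIDTH):
    """Shift n into the non-negative range, then zero-pad it to a fixed width
    so that string order matches numeric order."""
    shifted = n + offset
    if shifted < 0 or len(str(shifted)) > width:
        raise ValueError(f"{n} is outside the range this encoding can represent")
    return str(shifted).zfill(width)


def decode_int(s, offset=OFFSET):
    """Undo encode_int when reading the attribute back."""
    return int(s) - offset


def encode_date(dt):
    """Render an aware datetime as ISO 8601 in UTC; one fixed format and one
    time zone is what makes string order match chronological order."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    values = [25, 4597, -500, -3, 1000000]
    print(sorted(str(v) for v in values))         # naive strings: wrong order
    print(sorted(encode_int(v) for v in values))  # encoded strings: numeric order
    print(encode_date(datetime(2007, 12, 19, 15, 14, tzinfo=timezone.utc)))
```

Every read and write has to go through encode_int/decode_int, and changing OFFSET or WIDTH later means rewriting every stored value.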

This is ghetto beyond belief. I should know ahead of time what the lowest number will be in my data set and add/subtract offsets from data values when inserting and retrieving them from SimpleDB? I need to know the largest number in my data set and zero pad to that length? Seriously, WTF?

It’s crazy just thinking about the kinds of bugs that could be introduced into applications because of these wacky semantics and the recommended hacks to get around them. Even if this is the underlying behavior of SimpleDB, Amazon should have fixed this up in an API layer above SimpleDB and exposed that, instead of providing ghetto helper functions in a handful of popular programming languages and then crossing their fingers hoping that no one hits this problem.

The Ugly: Web Interfaces that Claim to be RESTful but Aren’t

I’ve talked about APIs that claim to be RESTful but aren’t in the past, but Amazon’s takes the cake when it comes to egregious behavior. Again, from the documentation for the PutAttributes method we learn

Sample Request

The following example uses PutAttributes on Item123 which has attributes (Color=Blue), (Size=Med) and (Price=14.99) in MyDomain. If Item123 already had the Price attribute, this operation would replace the values for that attribute.

&AWSAccessKeyId=[valid access key id]

Sample Response

<PutAttributesResponse xmlns="http://sdb.amazonaws.com/doc/2007-11-07">

Wow. A GET request with a parameter called Action which modifies data? What is this, 2005? I thought we already went through the realization that GET requests that modify data are bad after the Google Web Accelerator scare of 2005?

Of course, I'm not the only one that thinks this is ridonkulous. See similar comments from Stefan Tilkov, Joe Gregorio, and Steve Loughran. Methinks, someone at Amazon needs to go read some guidelines on building RESTful Web services.

Bonus points to Subbu Allamaraju for refactoring the SimpleDB API into a true RESTful Web service.
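The difference is easy to see by building both kinds of request without sending them. In this minimal Python sketch the host sdb.example.com, the resource path layout, and every parameter other than Action=PutAttributes are hypothetical placeholders, not Amazon's actual API or Subbu's design. The point is only that the first request is a GET, which HTTP defines as safe (so prefetchers like Google Web Accelerator assume they can issue it freely), even though it changes data, while the second puts the verb where HTTP expects it.

```python
from urllib.parse import urlencode
from urllib.request import Request

SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}  # methods HTTP defines as safe


def action_style_request(domain, item, attributes):
    """The style criticized above: the operation name rides in the query string of a GET."""
    params = {"Action": "PutAttributes", "DomainName": domain, "ItemName": item}
    params.update(attributes)  # hypothetical flattening of the attributes
    return Request("https://sdb.example.com/?" + urlencode(params))  # no body, so GET


def resource_style_request(domain, item, attributes):
    """A hypothetical RESTful shape: the item is a resource and PUT replaces its attributes."""
    body = urlencode(attributes).encode("utf-8")
    return Request(f"https://sdb.example.com/{domain}/{item}", data=body, method="PUT")


if __name__ == "__main__":
    attrs = {"Color": "Blue", "Size": "Med", "Price": "14.99"}
    for req in (action_style_request("MyDomain", "Item123", attrs),
                resource_style_request("MyDomain", "Item123", attrs)):
        safe = req.get_method() in SAFE_METHODS
        print(req.get_method(), req.full_url, "safe to prefetch" if safe else "modifies state")
```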

Speaking of ridonkulous API trends, it seems the SimpleDB Query method follows the lead of the Google Base GData API in stuffing a SQL-like query language into the query string parameters of HTTP GET requests. I guess it is RESTful, but damn, is it ugly.
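To see why it is ugly, here is what a SQL-like expression looks like once it is packed into a GET query string. This Python sketch uses a placeholder host, and the parameter names and bracketed expression syntax are illustrative assumptions modeled loosely on the SimpleDB Query method rather than a verified request.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(domain, expression, host="https://sdb.example.com/"):
    """Pack a query expression into GET parameters (hypothetical parameter names)."""
    params = {"Action": "Query", "DomainName": domain, "QueryExpression": expression}
    return host + "?" + urlencode(params)


if __name__ == "__main__":
    expr = "['Color' = 'Blue'] intersection ['Size' = 'Med']"
    url = build_query_url("MyDomain", expr)
    print(url)  # brackets, quotes, spaces and '=' all end up percent-encoded
    # The server has to decode everything before it can even start parsing the query.
    print(parse_qs(urlsplit(url).query)["QueryExpression"][0])
```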

Now playing: J. Holiday - Suffocate


Categories: Platforms | XML Web Services

Two days ago a bug was filed in the RSS Bandit bug tracker that claimed that Slashdot is banning RSS Bandit users because the application is acting like a Denial of Service client. The root cause of the problem is a bug in the logic for downloading favicons where the application repeatedly attempts to download favicons from each site in your subscription list if there is an error accessing or processing one of the favicons in your list of subscriptions.
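The fix boils down to remembering failures as well as successes. Here is a minimal Python sketch of that idea; FaviconCache and the fetch callable are hypothetical names for illustration, not RSS Bandit's actual classes (RSS Bandit itself is written in C#). A site whose favicon fails to download or decode is recorded as having no icon, so later refreshes skip it instead of hitting the server again.

```python
class FaviconCache:
    """Hypothetical sketch: remember every favicon attempt, success or failure,
    so each site is contacted at most once per session."""

    def __init__(self, fetch):
        self._fetch = fetch    # callable(site) -> icon bytes; may raise OSError/ValueError
        self._results = {}     # site -> icon bytes, or None if the attempt failed

    def get(self, site):
        if site not in self._results:
            try:
                self._results[site] = self._fetch(site)
            except (OSError, ValueError):
                # Network error or undecodable image: record "no icon" instead of
                # retrying on every refresh, which is what looked like a DoS attack.
                self._results[site] = None
        return self._results[site]

    def refresh_all(self, sites):
        """Called on each refresh; only sites never tried before hit the network."""
        return {site: self.get(site) for site in sites}


if __name__ == "__main__":
    def flaky_fetch(site):
        if site == "slashdot.org":
            raise ValueError("not a valid .ico")
        return b"\x00\x00\x01\x00"

    cache = FaviconCache(flaky_fetch)
    for _ in range(3):
        print(cache.refresh_all(["example.com", "slashdot.org"]))
```

A real client would probably also let failed entries expire after a day or so, so that a site that fixes its favicon eventually gets retried.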

We will be releasing a new version of RSS Bandit this weekend which remedies this problem. I plan to spend all of Saturday fixing as many bugs as I can and polishing up the Top Stories feature. We will likely ship the release sometime on Sunday. It should be noted that this release will be the first version of RSS Bandit that requires version 2.0 of the Microsoft .NET Framework or later.

I’d like to apologize to everyone who has been inconvenienced by this issue. Thanks for your support and the great feedback you’ve been sending us.

Now playing: Cool Breeze - Watch for the Hook (Dungeon Family Remix) (feat. Outkast & Goodie Mob)


Categories: RSS Bandit

In the past three months, I’ve seen three moves by Google that highlight that not only is their strategic vision becoming more questionable but their engineering talent has also begun to show signs of being seriously deficient

This year is going to be a watershed year for the company. They are eroding a lot of the geek cred they’ve built up over the past decade. That will be hard to regain once it is lost.

In the meantime, I’ve noticed an uptick in the number of quiet, smart folks you don’t see heralded in blogs turning down offers from Google and picking another horse when the job offers come knocking. Watershed year, indeed.

Now playing: John Legend - I Can Change (feat. Snoop Doggy Dogg)


December 19, 2007
@ 03:14 PM

I’m now at the point where I really, really, really want to blog but have too much going on at work and at home to take the time out to do it. To deal with this I’ve created a Twitter account. You can follow me at http://twitter.com/Carnage4Life.

Things I’ll eventually write about in my blog

  • Amazon Simple DB
  • A new release of RSS Bandit shipping this weekend
  • Thoughts on integrating RSS Bandit and Google Reader based on information found from the pyrfeed documentation 
  • Expressiveness of Python vs. C# 3.0

In the meantime, you can get my thoughts on various topics in 140 characters or less from Twitter.

PS: I’m amazed at how obnoxious Twitter is about collecting the password to your GMail/Yahoo/Hotmail/etc account so it can spam your friends. At first glance, it looked as if it wouldn’t even let me use the service until I gave up those passwords. This crap has gotten out of hand.

PPS: Anyone got decent recommendations for a Twitter client that works on Vista and XP?

Now playing: N.W.A. - Real N*ggaz Don't Die


Categories: Personal

December 17, 2007
@ 05:52 PM

Recently my Cingular 3125 crapped out and I picked up an AT&T Tilt (aka HTC Kaiser) which I've already developed a love<->hate relationship with. I'd originally considered getting an iPhone but quickly gave up that dream when I realized it didn't integrate well with Microsoft Exchange. When you attend three to five meetings a day, having a phone that can tell you the room number, topic and attendees of your next meeting as you hop, skip and jump from meeting room to meeting room is extremely useful.

There's a lot I like about the phone. The QWERTY keyboard and wide screen make writing emails and browsing the Web a much better experience than on my previous phone. In addition, being able to flip out the keyboard & tilt the screen is a spiffy enough effect that it gets ooohs and aaahs from onlookers the first time you do it. Another nice touch is that there are shortcut keys for Internet Explorer, your message center and the Start menu. When I was trying out the phone, the AT&T salesperson said I'd become hooked on using those buttons and he was right; without them, using the phone would be a lot more cumbersome.

There are some features specific to Windows Mobile 6 that I love. The first is that I can use any MP3, WAV or AAC file as a ringtone. After spending $2.50 for a 20-second snippet of a song I already owned and not being able to re-download the song after switching phones, I decided I wanted no part of this hustle from the cell phone companies. All I needed to do was download MP3 Cutter, and now I have as many ringtones as I have songs on my iPod. They've also fixed the bug from Windows Mobile 5 where, if your next appointment shown on the home screen is for a day other than the current day, clicking on it takes you to today's calendar instead of the calendar for that day. My phone also came with Office Mobile, which means I can actually read all those Word, Excel and PowerPoint docs I get in email all the time.

So what do I dislike about this phone? The battery life is ridiculously short. My phone is consistently out of battery by the end of the work day. I've never had this problem with the half dozen phones I've had over the past decade. What's even more annoying is that, unlike every other phone I've ever seen, there is no battery life indicator on the main screen. Instead I have to navigate to Start menu->Settings->System->Power if I want to check my battery life. On the other hand, there are redundant indicators showing whether I am on the EDGE or 3G networks where the battery indicator used to be in Windows Mobile 5. Another problem is that voice dialing is cumbersome and often downright frustrating. There is a great rant about this in the post What's Wrong With Windows Mobile and How WM7 and WM8 Are Going to Fix It on Gizmodo, which is excerpted below

the day-to-day usage of Windows Mobile isn't what you'd call "friendly," either. In fact, it'd probably punch you in the face if you even made eye contact. Take dialing, for instance. How can the main purpose of a phone—calling someone—be so hard to do?


If you're using a Windows Mobile Professional device, you have a few options, none of which are good:

• You can pull out the stylus to tap in the digits. This requires two hands.

• You can try and use your fingertip to call, which doesn't normally work, so you'll use your fingernail, which does work but, as it results in many misdialed numbers, takes forever.

• You can slide out the keyboard and find the dialpad buried among the QWERTY keys and dial, which requires two hands and intense concentration.

• You can try and bring up the contact list, which takes a long-ass time to scroll through, or you can slide out the keyboard again and search by name. Again, two hands.

• Voice Command has been an option for years, but then again, it kinda works, but it doesn't work well.

• Probably the best way to go is to program your most important numbers into speed dial, as you'll be able to actually talk to the correct person within, say, three button presses.

Compare that to the iPhone, which has just a touchscreen, but gets you to the keypad, your favorites, recent calls or your contact list, all within two key presses of the home screen.

It's amazing to me that there are five or six different options if you want to dial and call a number yet they all are a usability nightmare. One pet peeve that is missing from the Gizmodo rant is that when a call is connected, the keypad is hidden. This means that if you are calling any sort of customer service system (e.g. AT&T wireless, Microsoft's front desk, your cable company, etc) you need to first tap "Keypad" with your fingernail and then deal with the cumbersome dialpad.

So far, I like the phone more than I dislike it.  **** out of *****.

I'd love to see the next version of the iPhone ship with the ability to talk to Exchange and support for 3G, and then see how the next generation of Windows Mobile devices stacks up.

Rihanna - Hate That I Love You (feat. Ne-Yo)


Categories: Technology

DISCLAIMER: This post does not reflect the opinions, thoughts, strategies or future intentions of my employer. These are solely my personal opinions. If you are seeking official position statements from Microsoft, please go here.

Last week, Microsoft announced Office Live Workspace, which is billed as an online extension to Microsoft Office. Unsurprisingly, the word from the pundits has been uniformly negative, especially when comparing it to Google Docs.

An example of the typical pundit reaction to this announcement is Ken Fisher's post on Ars Technica entitled Office Live Workspace revealed: a free 250MB "SharePoint Lite" for everyone where he writes

Office Live Workspace is not an online office suite. The aim of OLW is simple: give web-connected users a no-cost place to store, share, and collaborate on Office documents. To that end, the company will give registered users 250 MB of storage space, which can be used to store documents "in the cloud" or even "host" them for comments by other users equipped with just a web browser (you will be able to manage the access rights of other users). However, and this is important: you cannot create new Office documents with this feature nor can you edit documents beyond adding comments without having a copy of Microsoft Office installed locally.

As you can see, this is not a "Google Docs killer" or even an "answer" to Google Docs. This is not an online office suite, it's "software plus service." Microsoft's move here protects the company's traditional Office business, in that it's really positioned as a value-add to Office, rather than an Office alternative. Microsoft has seen success with its business-oriented SharePoint offering, and Microsoft is taking a kind of "SharePoint Lite" approach with OLW.

The focus of pundits on "an online office suite" and a "Google Docs Killer" completely misses the point when it comes to satisfying the needs of the end user. As a person who is a fan of the Google Docs approach, there are three things I like that it brings to the table:

  • it is free for consumers and people with basic needs
  • it enables "anywhere access" to your documents
  • it requires zero install to utilize

The fact that it is Web-based and uses AJAX instead of Win32 code running on my desktop is actually a negative when it comes to responsiveness and feature set. However, the functionality of Google Docs hits a sweet spot for a certain class of users authoring certain classes of documents. By the way, this is a textbook example of low-end disruption from Clay Christensen's book "The Innovator's Dilemma". Taking a lesson from another much-hyped business book, Moneyball, disruption often happens when the metrics used to judge successful products don't actually reflect the current realities of the market.

The reality of today's market is that a lot of computer users access their data from multiple computers and perhaps their mobile device during the course of a normal day.  The paradigm of disconnected desktop software is an outdated relic that is dying away. Another reality of today's market is that end users have gotten used to being able to access and utilize world class software for free and without having to install anything thanks to the Googles, Facebooks and Flickrs of the world. When you put both realities together, you get the list of three bullet points above which are the key benefits that Google Docs brings to the table.

The question is whether there is anything Microsoft can do to stem what seems like inevitable disruption by Google Docs and if so, does Office Live Workspace improve the company's chances in any way? I believe the answer to both questions is Yes. If you are already a user of Microsoft's office productivity software then Office Live Workspace gives you a free way to get "anywhere access" to your documents without having to install anything even if the computer does not have Microsoft Office installed.

As I mentioned earlier, a number of pundits have been fairly dismissive of this and declared a no-contest victory for the Google Docs approach. Steven Burke has an article entitled Five Reasons Google Docs Beats Office Live Workspace where he lists a number of ways Google Docs compares favorably to Office Live Workspace. Of his five reasons, only one seems like a genuine roadblock that will hurt adoption by end users. Below are his reasons in bold with my comments underneath.

Steven Burke: Office Live Workspace Does Not Allow You To Create And Edit Documents Within A Web Page. Google Docs Does

This is a lame restriction. I assume this is to ensure that the primary beneficiaries of this offering have purchased Microsoft Office (thus it is a software + services play instead of a software as a service play). I can understand the business reasons why this exists, but it is often a good business strategy to cannibalize yourself before competitors do it, especially when it is clear that such cannibalization is inevitable. The fact that I am tethered to Office in creating new documents is lame. I hope competitive pressure makes this "feature" go away.

Steven Burke: Microsoft Office Live Workspace Has A 250 Mbyte 1,000 Average Office Documents Limitation. Google Docs Does Not.

I don't worry too much about space limitations, especially since this is in beta. If Microsoft can figure out how to give people 5GB of space for email in Hotmail and 1GB of file storage space in SkyDrive all for FREE, I'm sure we can figure out how to give more than 250MB of storage to people who've likely spent hundreds of dollars buying our desktop software.

Steven Burke: Microsoft's Office Live WorkSpace Is VaporWare. Google Docs is Real.

The vaporware allegation only makes sense if you think (a) it is never going to ship or (b) you need a solution today. If not, it is a product announcement like any other in the software industry meant to give people a heads up on what's coming down the line. If industry darlings like Apple and Google can get away with it, why single out Microsoft?

Steven Burke: You're Better Off Trusting Google Than Microsoft When It Comes To Web 2.0 Security Issues.

I don't know about you, but over the past year I've heard about several security flaws in Google's AJAX applications, including Cross Site Request Forgery issues in Gmail, leaking people's email addresses via the chat feature of Google Presentations, cross site scripting issues that allowed people to modify your documents in Google Docs & Spreadsheets, and lots more. On the flip side, I haven't heard about even half as many security issues in Microsoft's family of Web applications, whether they are Office Live, MSN or Windows Live branded.

In fact, one could argue that trusting Google to keep your data secure in their AJAX applications is like trusting a degenerate gambler with your life savings. So far the company has proven to be inept at securing their online services which is problematic if they are pitching to store people's vital business documents.

Steven Burke: Office Live Workspace Is Optimized For Microsoft Office Word, Excel and PowerPoint Data. Google Is Optimized For Web 2.0.

I guess this means Google's service is more buzzword compliant than Microsoft's. So what? At the end of the day, the most important thing is providing value to your customers, not repping every buzzword that spews forth from the likes of Mike Arrington and Tim O'Reilly.

Tomoyasu Hotei - Battle without Honor or Humanity


Categories: Technology

Jeff Atwood has a blog post entitled Sorting for Humans : Natural Sort Order where he writes

The default sort functions in almost every programming language are poorly suited for human consumption. What do I mean by that? Well, consider the difference between sorting filenames in Windows explorer, and sorting those very same filenames via Array.Sort() code:

Implementing a natural sort is more complex than it seems, and not just for the gnarly i20n issues I've hinted at, above. But the Python implementations are impressively succinct

I tried to come up with a clever, similarly succinct C# 3.0 natural sort implementation, but I failed. I'm not interested in a one-liner contest, necessarily, but it does seem to me that a basic natural sort shouldn't require the 40+ lines of code it takes in most languages.

Since I’m still in my “learning Python by mapping it to C#” phase, I thought this should be a straightforward task. Below is the equivalent IronPython code for natural sort, which is slightly modified from the code in Jeff’s post, along with what I hoped would be a succinct version in C# 2.0. It would definitely be shorter in C# 3.0 [which I don’t plan to start using for another year or so]. The Python snippet below takes advantage of some interesting rules around comparing lists of objects which don’t exist in C#. I’m sure I could reduce the size of the C# code while maintaining readability, but my procrastination time is over and I need to get to work.

Natural Sort in IronPython

import re

def sort_nicely( l ):
  """ Sort the given list in the way that humans expect. """
  convert = lambda x: int(x) if x.isdigit() else x  # digit runs compare as numbers, not text
  alphanum = lambda key: [ convert(c) for c in re.split('([0-9]+)', key) ]
  l.sort( key=alphanum ) # sort on the list of text and number chunks
  return l

print(sort_nicely(["z22.txt", "z5.txt" , "z.txt", "z10.txt", "z300.txt", "z2.txt", "z11.txt", "y.txt", "z", "z4.txt", "za.txt" ]))
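The "serious magic" is that Python compares lists element by element, and a list that is a prefix of another sorts first. The sketch below isolates the key function so you can inspect the lists it produces. It is a variant of the snippet above that converts chunks by position rather than with isdigit(): re.split with one capturing group always alternates text, digits, text, so the odd positions are exactly the digit runs. That also guarantees ints are only ever compared with ints and strings with strings, which matters because Python 3 raises TypeError when comparing an int with a str.

```python
import re


def natural_key(s):
    """Split s into alternating text and digit runs, turning the digit runs into ints."""
    parts = re.split(r"([0-9]+)", s)
    # With one capturing group, odd indices are always the captured digit runs.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


if __name__ == "__main__":
    print(natural_key("z22.txt"))                          # ['z', 22, '.txt']
    print(natural_key("z5.txt") < natural_key("z22.txt"))  # True, because 5 < 22
    print("z5.txt" < "z22.txt")                            # False, because '5' > '2'
    print(sorted(["z22.txt", "z5.txt", "z10.txt", "z2.txt"], key=natural_key))
```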

Natural Sort in C# 2.0

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class Test {

  ///<summary>Compare two lists of strings using Python rules and natural order semantics</summary>
  public static int NaturalCompare(IList<string> a, IList<string> b) {
    int y, z, len = (a.Count < b.Count ? a.Count : b.Count);

    for (int i = 0; i < len; i++) {
      if (a[i].Equals(b[i])) continue;

      bool w = Int32.TryParse(a[i], out y), x = Int32.TryParse(b[i], out z);
      bool bothNumbers = w && x, bothNotNumbers = !w && !x;

      if (bothNumbers) return y.CompareTo(z);
      else if (bothNotNumbers) return a[i].CompareTo(b[i]);
      else if (w) return -1;
      else return 1; //numbers always less than words or letters
    }
    return (a.Count.CompareTo(b.Count)); //subset list is considered smaller
  }

  ///<summary>Sort the list in place by comparing the text and digit runs of each string</summary>
  public static List<string> SortNicely(List<string> list) {
    Regex re = new Regex("([0-9]+)");
    list.Sort(delegate(string x, string y) { return NaturalCompare(re.Split(x), re.Split(y)); });
    return list;
  }

  public static void Main(string[] args) {
    List<string> l = new List<string>(new string[] { "z.txt", "y.txt", "z22.txt", "z5.txt", "z10.txt", "z3.txt", "z2.txt", "za.txt", "z11.txt", "z400.txt" });
    foreach (string s in SortNicely(l)) Console.WriteLine(s);
  }
}

Now playing: Notorious B.I.G. - Real N*ggas Do Real Things


Categories: Programming

Yesterday I read about the Opening up Facebook Platform Architecture. My initial thoughts are that Facebook has done what Google claimed to have done but didn't with Open Social. Facebook seems to have provided detailed specs on how to build an interoperable widget platform, unlike Google, which unleashed a bunch of half-baked REST API specs with no details about the "widget" aspect of the platform unless you are building an Orkut application.

As I've thought about this over the past few weeks, it has become clear that building a widget platform that is competitive with Facebook's is hard work. Remember all those stories about OpenSocial apps being hacked in 45 minutes or less? The problem was that sites like Plaxo Pulse and Ning simply didn't think through all the ramifications of building a widget platform and bumped up against the kind of "security 101" issues that widget platforms like Netvibes, iGoogle and Live.com gadgets solved years ago. I started to wonder exactly how many of these social networking sites will be able to keep up with the capabilities and features of platforms like Facebook's and Orkut's when such development is outside their core competency.

In fact, let's take a quote from the TechCrunch story First OpenSocial Application Hacked Within 45 Minutes

theharmonyguy says he’s successfully hacked Facebook applications too, including the Superpoke app, but that it is more difficult:

Facebook apps are not quite this easy. The main issue I’ve found with Facebook apps is being able to access people’s app-related history; for instance, until recently, I could access the SuperPoke action feed for any user. (I could also SuperPoke any user; not sure if they’ve fixed that one. Finally, I can access all the SuperPoke actions - they haven’t fixed that one, but it’s more just for fun.) There are other apps where, last I checked, that was still an issue ( e.g. viewing anyone’s Graffiti posts).

But the way Facebook setup their platform, it’s tons harder to actually imitate a user and change profile info like this. I’m sure this kind of issue could be easily solved by some verification code on RockYou’s part, but it’s not inherent in the platform - unlike Facebook. I could do a lot more like this on FB if Facebook hadn’t set things up the way they did.
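The kind of "verification code" mentioned above usually means that a widget server should not trust a user id just because it arrived in a request. One common approach is for the container to sign the parameters it passes to the widget with a secret shared with the widget's developer. The Python sketch below uses an HMAC for that; the secret, the parameter names, and the canonical encoding are hypothetical, and this is not the actual signing scheme used by Facebook, OpenSocial containers or RockYou.

```python
import hashlib
import hmac
from urllib.parse import urlencode

# Hypothetical secret shared by the container (the social network) and the widget's server.
APP_SECRET = b"replace-with-a-real-secret"


def sign(params, secret=APP_SECRET):
    """Container side: HMAC-SHA256 over the parameters in a canonical (sorted, URL-encoded) form."""
    canonical = urlencode(sorted(params.items())).encode("utf-8")
    return hmac.new(secret, canonical, hashlib.sha256).hexdigest()


def verify(params, signature, secret=APP_SECRET):
    """Widget side: accept the request only if the signature matches, compared in constant time."""
    return hmac.compare_digest(sign(params, secret), signature)


if __name__ == "__main__":
    params = {"viewer_id": "1001", "owner_id": "2002", "action": "superpoke"}
    sig = sign(params)
    print(verify(params, sig))                          # untouched request
    print(verify(dict(params, viewer_id="9999"), sig))  # forged viewer id
```

Without something like this, anyone who can craft a request can claim to be any user, which is the class of hole described in the quote.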

At that point I ask myself, how useful is it to have the specs for the platform if you aren't l337 enough to implement it yourself? [Update: It looks like Google is well aware of this problem and has launched an Apache project called Shindig, which is meant to be an Open Source widget platform that implements the Open Social APIs. This obviously indicates that Google realizes the specs are worthless and that shipping a reusable widget platform instead is the way to go. It’s interesting to note that with this move Google is attempting to be a software vendor, advertising partner and competitor to the Web’s social networking sites. That must lead to some confusing internal meetings.]

For now, Facebook has definitely outplayed Google here. The most interesting part of the Facebook announcement to me is

Now we also want to share the benefits of our work by enabling other social sites to use our platform architecture as a model. In fact, we’ll even license the Facebook Platform methods and tags to other platforms. Of course, Facebook Platform will continue to evolve, but by enabling other social sites to use what we’ve learned, everyone wins -- users get a better experience around the web, developers get access to new audiences, and social sites get more applications.

It looks like Facebook plans to assert their Intellectual Property rights on anyone who clones their platform. This is one of the reasons I've found Open Social to be a worrisome abuse of the term "open". Like Facebook, Google shipped specs for a proprietary platform whose copyrights, patents, etc. belong to them. Any company that implements Open Social, or even GData which it is built upon, is using Google's intellectual property.

What's to stop Google from asserting these intellectual property rights the way Facebook is doing today? What exactly is "open" about it that makes it any less proprietary than what Facebook just announced?


danah boyd writes eloquently about the slippery slope we are now headed down thanks to the way Facebook is influencing the design of social software applications when it comes to privacy. She writes in her post entitled Facebook's "opt-out" precedent

I've been watching the public outcry over Facebook's Beacon (social ads) program with great interest…For all of the repentance by Facebook, what really bugs me is that this is the third time that Facebook has violated people's sense of privacy in a problematic way.

In each incident, Facebook pushed the boundaries of privacy a bit further and, when public outcry took place, retreated just a wee bit to make people feel more comfortable. In other words, this is "slippery slope" software development.

I kinda suspect that Facebook loses very little when there is public outrage. They gain a lot of free press and by taking a step back after taking 10 steps forward, they end up looking like the good guy, even when nine steps forward is still a dreadful end result. This is how "slippery slopes" work and why they are so effective in political circles. Most people will never realize how much of their data has been exposed to so many different companies and people. They will still believe that Facebook is far more private than other social network sites (even though this is patently untrue). And, unless there is a large lawsuit or new legislation introduced, I suspect that Facebook will continue to push the edges when it comes to user privacy.

Lots of companies are looking at Facebook's success and trying to figure out how to duplicate it. Bigger companies are watching to see what they can get away with so that they too can take that path.

I’ve stated before that one of my concerns about Beacon is that it legitimizes what is truly worrying behavior when it comes to companies respecting people’s privacy on the Web. As it stands now we have companies thinking it is OK to send out information about money you are loaning to your friend and that it is OK to violate federal legislation and share information about movies you have rented to watch in the privacy of your home without user consent.

This is an unprecedented degree of violation of the sanctity of the customer’s private Web experience. What I find sad is not only that the technologically unsavvy are giving up their privacy on the Web in a way that they would never accept in meat space, but that even the technologically savvy who know what is going on just assume it is par for the course. For example, see comments by John Dowdell of Adobe, who implies that we were already led down this slippery slope by DoubleClick in the 90s and this is just the natural progression.

I actually worry less about Facebook and more about what happens when the Googles, DoubleClicks, Microsofts, and Yahoos of the world decide that “If Facebook can get away with it, we should do it too especially if we want to stay competitive”. In that world, your privacy and mine becomes collateral damage in the chase after the almighty dollar euro.

Now playing: Ashanti - Unfoolish (feat. Notorious B.I.G.)


Categories: Social Software

Things have been progressing well with RSS Bandit development recently. Some of our recent changes seem so valuable to me that I’ve been flirting with throwing out all our plans for Phoenix and shipping a new release right away. That’s how much I like the personalized meme tracking feature. I also fixed a bug where we continually fetch favicons for all your feeds if one of the sites in your subscriptions gives us an invalid image when we fetch its favicon. This bug affects me personally since a lot of RSS Bandit users are subscribed to my site and are polling my favicons several times an hour instead of once every startup or less. 

Despite those thoughts, we will continue with our plan to add the top 5 features for the next version of RSS Bandit, which I blogged about last month. I need to do some fit & finish work this weekend on the meme tracking feature and then it is on to the next set of tasks. Torsten will be looking at ways to add UI for managing your pending and downloaded podcasts. I will be working on adding support for treating the Windows RSS platform and Newsgator Online as “Feed Sources”. This will mean that you can use RSS Bandit either in standalone mode or as a desktop client for feeds that you are sharing with Newsgator applications (FeedDemon, NewsGator Online, NetNewsWire, etc) or Windows RSS platform applications (Internet Explorer 7, Outlook 2007, etc).

For a long time, people have been asking me to treat services like Newsgator Online in the same way an email client like Outlook treats mail servers like Exchange, instead of the arm’s-length degree of integration I’ve done in the past. It’s taken a while but I’m now going to go ahead and do just that.
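In code, treating these services the way Outlook treats Exchange suggests an abstraction along the following lines. This is a hypothetical Python sketch, not RSS Bandit's actual design (RSS Bandit is a C# application); FeedSource, LocalFeedSource, SyncedFeedSource and the list_feeds/mark_read methods of the service client are invented names for illustration. The UI talks only to FeedSource, so standalone mode and a synchronized source such as NewsGator Online or the Windows RSS platform differ only in where the state lives.

```python
from abc import ABC, abstractmethod


class FeedSource(ABC):
    """Where subscriptions and read/unread state live (hypothetical interface)."""

    @abstractmethod
    def subscriptions(self):
        """Return the list of subscribed feed URLs."""

    @abstractmethod
    def mark_read(self, feed_url, item_id):
        """Record that an item has been read."""


class LocalFeedSource(FeedSource):
    """Standalone mode: all state stays on this machine."""

    def __init__(self, feeds):
        self._feeds = list(feeds)
        self.read_items = set()

    def subscriptions(self):
        return list(self._feeds)

    def mark_read(self, feed_url, item_id):
        self.read_items.add((feed_url, item_id))


class SyncedFeedSource(FeedSource):
    """Client mode: every change goes to a service, the way Outlook defers to Exchange.
    `client` is any object with hypothetical list_feeds() and mark_read() methods."""

    def __init__(self, client):
        self._client = client

    def subscriptions(self):
        return self._client.list_feeds()

    def mark_read(self, feed_url, item_id):
        self._client.mark_read(feed_url, item_id)


def all_subscriptions(sources):
    """What the UI would show: the feeds from every configured source, in order."""
    return [url for source in sources for url in source.subscriptions()]
```

The design choice is that synchronization becomes the source's job rather than a separate import/export step bolted onto a local store.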

With that done, we’d probably have enough new features to ship an alpha and start getting initial feedback. I estimate that this will happen sometime in the first quarter of 2008. I also plan to go over our backlog of bugs during the holiday season and will knock out as many as I can before the alpha.

If you have any questions or comments, fire away. I’m all ears.

Now playing: Scarface - Diary of a Madman


Categories: RSS Bandit

The top story in my favorite aggregator today is the announcement on Scott Guthrie’s blog of the ASP.NET 3.5 Extensions CTP Preview. Normally, announcements related to ASP.NET would not interest me, but this time there is an interesting item in the list of technologies being released

ADO.NET Data Services: In parallel with the ASP.NET Extensions release we will also be releasing the ADO.NET Entity Framework.  This provides a modeling framework that enables developers to define a conceptual model of a database schema that closely aligns to a real world view of the information.  We will also be shipping a new set of data services (codename "Astoria") that make it easy to expose REST based API endpoints from within your ASP.NET applications.

Wow. It looks like Astoria has quickly moved from being an experimental project, exploring what it would look like to place RESTful interfaces on top of a SQL Server database, to being very close to shipping a production version. I dug around for more posts about Astoria/ADO.NET Data Services so I could find out what was in the CTP and came across two posts, from Mike Flasko and Andy Conrad respectively.

In his post entitled ADO.NET Data Services ("Project Astoria") CTP is Released on the ADO.NET Data Services team blog Mike Flasko writes

The following features are in this CTP:

  • Support to create ADO.NET Data Services backed by:
    • A relational database by leveraging the Entity Framework
    • Any data source (file, web service, custom store, application logic layer, etc)
  • Serialization Formats:
    • Industry standard AtomPub serialization
    • JSON serialization
  • Simple HTTP interface
    • Any platform with an HTTP stack can easily consume a data service
    • Designed to leverage HTTP semantics and infrastructure already deployed at large
  • Client libraries:
    • .NET Framework
    • Silverlight (coming soon)

This is sick. With Astoria I can expose my relational database, or even just a local XML file, using a RESTful interface that utilizes the Atom Publishing Protocol or JSON. I am somewhat amused that one of the options is placing a RESTful interface over a SOAP Web Service. My, how times have changed…

It is pretty cool that Microsoft is the first major database vendor to bring the dream of the Atom store to fruition. I also like that one of the side effects of this is that there is now an AtomPub client library for the .NET Framework.
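Because the responses are ordinary Atom, any client that can read an Atom feed can consume such a service, not just the .NET library. A minimal Python sketch using only the standard library; the sample feed and its entry IDs are made up and leave out whatever data-service-specific markup the CTP actually emits.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def read_entries(feed_xml: str) -> List[Tuple[str, str]]:
    """Return (id, title) pairs for every <entry> in an Atom feed document."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        entry_id = entry.findtext("atom:id", default="", namespaces=ATOM_NS)
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        entries.append((entry_id, title))
    return entries


# Made-up sample document; a real data service would return richer entries.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Products</title>
  <entry><id>urn:example:product:1</id><title>Chai</title></entry>
  <entry><id>urn:example:product:2</id><title>Chang</title></entry>
</feed>"""

for entry_id, title in read_entries(SAMPLE_FEED):
    print(entry_id, title)
```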

Andy Conrad has a blog post entitled Linq to REST which gives an idea of what happens when you combine the Astoria client library with the Language Integrated Query (LINQ) features of C# 3.0

    // Needs System, System.Collections.Generic and System.Linq, plus a reference
    // to the Astoria CTP client library, which provides WebDataContext.
    using System;
    using System.Collections.Generic;
    using System.Linq;

    public class Product{
        private Dictionary<string, object> propBag = new Dictionary<string, object>();

        public int ProductID { get; set; }
        public string ProductName { get; set; }
        public int UnitsInStock { get; set; }
        public IDictionary<string, object> PropBag { get { return propBag; } }

        static void Main(string[] args){
            WebDataContext context = new WebDataContext("http://localhost:18752/Northwind.svc");
            // The where clause is translated into the request URI, not evaluated locally
            var query = from p in context.CreateQuery<Product>("Products")
                        where p.UnitsInStock > 100
                        select p;

            foreach (Product p in query){
                Console.WriteLine(p.ProductName + " , UnitsInStock= " + p.UnitsInStock);
            }
        }
    }

If you hover over the query variable, you will actually see the Astoria URI which the Linq query is translated into by the Astoria client library:


So, there you go. Linq to Astoria's RESTful API. In other words, Linq to REST.
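The translation step is easier to picture with a toy version. The Python sketch below turns a single `where` predicate into a query URI. The `$filter=UnitsInStock gt 100` style is an assumption modeled on the query options ADO.NET Data Services uses, and the CTP's exact URI syntax may differ; `to_query_uri` and `OPERATORS` are hypothetical names.

```python
from urllib.parse import quote

# Hypothetical mapping from comparison operators to URI filter keywords
# (assumed "$filter=Prop gt 100" style; the CTP's real syntax may differ).
OPERATORS = {">": "gt", ">=": "ge", "<": "lt", "<=": "le", "==": "eq", "!=": "ne"}


def to_query_uri(service_root: str, entity_set: str, prop: str, op: str, value) -> str:
    """Translate a one-predicate query such as `where p.UnitsInStock > 100`
    into a URI, so the filtering happens on the server."""
    if isinstance(value, str):
        literal = "'" + value.replace("'", "''") + "'"  # quote string literals
    else:
        literal = str(value)
    expression = f"{prop} {OPERATORS[op]} {literal}"
    return f"{service_root.rstrip('/')}/{entity_set}?$filter={quote(expression)}"


print(to_query_uri("http://localhost:18752/Northwind.svc", "Products", "UnitsInStock", ">", 100))
```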

Like I said earlier, this is sick. I need to holla at Andy and see if there is a dependency on the Atom feed containing Microsoft-specific extensions or whether this Linq to REST capability can be utilized over any arbitrary Atom feed.

Now playing: Jay-Z - Success (feat. Nas)


December 9, 2007
@ 10:30 PM

A few months ago, Jenna and I found out about the Trash the Dress blog, which features wedding photos shot in non-traditional locations. The term "trash the dress" refers to the fact that the wedding dress is usually trashed by the end of the shoot.

Yesterday we met up with Cheryl Jones from In A Frame Photography and proceeded to destroy Jenna's wedding dress while getting some good pictures out of the process. Below are a couple of pics from the shoot. Click on them to see more pics from Cheryl's blog post.

Now playing: Wyclef Jean - Sweetest Girl (feat. Akon, Lil Wayne & Niia)


Categories: Personal

This time last year, Erik Meijer sent me a paper about a new programming language project he was working on. I was high on the social graph at that time and didn't get around to responding to Erik's paper until this fall. The premise seemed fundamentally interesting: create an MSIL-to-Javascript compiler, conceptually similar to Google's GWT and Nikhil Kothari's Script#, then flip the traditional Web development script by allowing developers to choose whether code runs on the server or on the client simply by decorating methods with attributes. The last bit is the interesting innovation in Erik's project, although it is obscured by the C#/VB/MSIL-to-Javascript compiler aspects.

As an example, let's say you have a function like ValidateAddress(). Whether this logic lives on the client (i.e. Javascript in the browser) or runs on the server is really a function of how complicated that function actually ends up being. Now imagine that when the time comes to refactor the function and move the validation logic from the Web client to the server or vice versa, instead of rewriting Javascript code in C#/IronPython/VB.NET/IronRuby/etc. (or vice versa), you just add or remove a [RunAtOrigin] attribute on the function.
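Volta does this with a .NET attribute and MSIL rewriting; the Python sketch below only mimics the idea with a decorator and a simulated network hop, to show that the function body stays the same and only where it runs changes. `run_at_origin`, `remote_call` and `validate_address` are hypothetical names, and the address rule is a toy.

```python
import functools
from typing import Any, Callable, Dict, List

# Hypothetical stand-ins for the server tier and the wire between tiers.
SERVER_FUNCTIONS: Dict[str, Callable[..., Any]] = {}
CALL_LOG: List[str] = []


def remote_call(name: str, *args: Any) -> Any:
    """Simulated round trip to the origin server (a real system would use HTTP)."""
    CALL_LOG.append(name)
    return SERVER_FUNCTIONS[name](*args)


def run_at_origin(func: Callable[..., Any]) -> Callable[..., Any]:
    """Register func on the 'server' and hand callers a proxy that calls home."""
    SERVER_FUNCTIONS[func.__name__] = func

    @functools.wraps(func)
    def proxy(*args: Any) -> Any:
        return remote_call(func.__name__, *args)

    return proxy


def validate_address(address: str) -> bool:
    """Toy rule: a non-empty street part followed by a 5-digit ZIP code."""
    street, _, zip_code = address.rpartition(" ")
    return bool(street.strip()) and zip_code.isdigit() and len(zip_code) == 5


# Applied inline here so both versions can be compared; normally you would
# just put @run_at_origin above the def. The function body doesn't change.
validate_address_at_origin = run_at_origin(validate_address)
```

Calling `validate_address` runs locally, while calling `validate_address_at_origin` goes through `remote_call` and shows up in `CALL_LOG`.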

This project shipped last week as Microsoft Volta. You can learn a little more about it in Erik Meijer's post on Lambda the Ultimate entitled Democratizing the Cloud using Microsoft Live Labs Volta. Try it out; it's an interesting project that has legs.

Now playing: Jay-Z - Pray


Categories: Programming

Om Malik has a blog post entitled Zuckerberg’s Mea Culpa, Not Enough where he writes

Frankly, I am myself getting sick and tired of repeating myself about the all-important “information transmission from partner sites” aspect of Beacon. That question remains unanswered in Zuckerberg’s blog post, which upon second read is rather scant on actual privacy information. Here is what he writes:

“If you select that you don’t want to share some Beacon actions or if you turn off Beacon, then Facebook won’t store those actions even when partners send them to Facebook.”

So essentially he’s saying the information transmitted won’t be stored but will perhaps be interpreted. Will this happen in real time? If that is the case, then the advertising “optimization” that results from “transmissions” is going to continue. Right!

If they were making massive changes, one would have seen options like “Don’t allow any web sites to send stories to Facebook” or “Don’t track my actions outside of Facebook” in this image below.

This is the part of Facebook's Beacon service that I consider unfixable, which probably needs to be stated more explicitly given comments like those by Sam Ruby in his post Little Details.

The fundamental design of Facebook Beacon is that a Web site publishes information about my transactions to Facebook without my permission and then Facebook tells me what happened after the fact. This is fundamentally Broken As Designed (B.A.D.).

I read Mark Zuckerberg's Thoughts on Beacon last week and looked at the new privacy controls. Nowhere is the fundamental problem addressed.

Nothing Mark Zuckerberg wrote changes the fact that when I rent a movie from Blockbuster Online, information about the transaction is published to Facebook regardless of whether I am a Facebook user or not. The only change Zuckerberg has announced is that I can opt out of getting nagged to have the information spammed to my friends via the News Feed. One could argue that this isn't Facebook's problem. After all, when SixApart implemented support for Facebook Beacon they didn't decide that they'd blindly publish all activities from users of TypePad to Facebook. Instead they have an opt-in model on their site which preserves their users' privacy by not revealing information to Mark Zuckerberg's company without their permission. On the flip side, Blockbuster decided to publish information about all of their customers' video rental transaction history to Mark Zuckerberg and company without their explicit permission, even though this violates federal law. As a Blockbuster customer, the only way around this is to stop using Blockbuster's service.

So who is to blame here? Facebook, for designing a system that treats 3rd parties publishing private user data to them without the user's consent as an acceptable default, or Facebook affiliates who care so little about their customers' privacy that they give it away to Facebook in return for "viral" references to their services (aka spam)?

Now playing: Akon - Ghetto (Green Lantern remix) (feat. Notorious B.I.G. & 2Pac)


I often tell people at work that turning an application into a platform is a balancing act; not only do you have to please the developers on your platform, BUT you also have to please the users of your application.

I recently joined the This has got to stop group on Facebook. If you don't use Facebook, the front page of the group is shown in the screenshot below.


I've seen a bunch of tech folks blog about being overwhelmed by Facebook app spam like Tim Bray in his post Facebook Rules and Doc Searls in Too much face(book) time. However I assumed that the average college or high school student who used the site didn't feel that way. Looks like I was wrong.

The folks at Facebook could fix this problem easily but it would eliminate a lot of the "viralness" that has been hyped about the platform. Personally, I think applications on the site have gotten to the point where the costs have begun to outweigh the benefits. The only way to tip the balance back is to rein them in; otherwise it won't be long before Facebook's clean and minimal aesthetic stops winning in comparisons with MySpace's cluttered and messy one. When that happens there will be an opportunity for someone else to do the same thing to them.

On an unrelated note, the MoveOn.org-sponsored group about Facebook Beacon has 74,000 members, which is less than half the size of the This has got to stop group. This is despite the fact that MoveOn.org has had national media attention focused on that topic. I guess it goes to show that just because a story gets a lot of hype in blogs and the press doesn't mean that it is the most important problem facing the people it actually affects.

Now playing: Jay-Z - Ignorant Shit


One of the things that has always frustrated me about programming in C# is that it is such a hassle to return multiple values from a method. You either have to create a wrapper class whose entire purpose is to hold two or three variables or, even worse, use ref or out parameters. I used to get around this problem in C++ by using the std::pair utility class since I often wanted to deal with an object plus some value associated with it. However, this approach quickly breaks down when you have more than two objects you want to associate temporarily for some processing.

For example, in the Top Stories feature of RSS Bandit I have some code that operates on a URL, its weighted score and a list of all the posts that reference it. In C#, there’s no good way to deal with those three objects as a single entity without wrapping them in a class definition. In Python, it’s quite easy to do that using tuples. Compare the following two blocks of code and notice how I don’t need the RelationHRefEntry and RankedNewsItem types in the Python version of the code:

C#:

    /* Tally the votes, only 1 vote counts per feed */
    //RelationHRefEntry is (Href, Score, References), RankedNewsItem is (NewsItem, Score)
    List<RelationHRefEntry> weightedLinks = new List<RelationHRefEntry>();
    foreach (KeyValuePair<RelationHRefEntry, List<RankedNewsItem>> linkNvotes in allLinks) {
        Dictionary<string, float> votesPerFeed = new Dictionary<string, float>();
        //pick the lower vote if multiple links from a particular feed
        foreach (RankedNewsItem voteItem in linkNvotes.Value) {
            string feedLink = voteItem.Item.FeedLink;
            if (votesPerFeed.ContainsKey(feedLink)) {
                votesPerFeed[feedLink] = Math.Min(votesPerFeed[feedLink], voteItem.Score);
            } else {
                votesPerFeed.Add(feedLink, voteItem.Score);
            }
        }
        float totalScore = 0.0f;
        foreach (float value in votesPerFeed.Values) {
            totalScore += value;
        }
        linkNvotes.Key.Score = totalScore;
        weightedLinks.Add(linkNvotes.Key);
    }
    weightedLinks.Sort(delegate(RelationHRefEntry x, RelationHRefEntry y) { return y.Score.CompareTo(x.Score); });
    weightedLinks = weightedLinks.GetRange(0, Math.Min(numStories, weightedLinks.Count));


Python:

    # tally the votes, only 1 vote counts per feed
    weighted_links = []
    for link, votes in all_links.items():
        site = {}
        for weight, item, feedTitle in votes:   #tuple magic happens here
            site[feedTitle] = min(site.get(feedTitle,1), weight)   #Python dictionaries are smarter than .NET’s
        weighted_links.append((sum(site.values()), link))   #more tuple magic, no need for manual summing of values
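For anyone who wants to run the Python side, here it is wrapped in a self-contained function with made-up sample data, plus the sort-and-truncate step the C# version does at the end. The `top_stories` name, the exact tuple layout and the sample URLs are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Each vote is a (weight, item, feed_title) tuple, as in the snippet above;
# weights are assumed to lie in (0, 1], matching its default of 1.
Vote = Tuple[float, str, str]


def top_stories(all_links: Dict[str, List[Vote]], num_stories: int) -> List[Tuple[float, str]]:
    weighted_links = []
    for link, votes in all_links.items():
        per_feed: Dict[str, float] = {}
        for weight, _item, feed_title in votes:
            # only the lowest-weighted vote from each feed counts
            per_feed[feed_title] = min(per_feed.get(feed_title, 1), weight)
        weighted_links.append((sum(per_feed.values()), link))
    # tuples compare element by element: score first, then link as a tie-breaker
    weighted_links.sort(reverse=True)
    return weighted_links[:num_stories]
```

No `RelationHRefEntry` or `RankedNewsItem` equivalents are needed; `(score, link)` tuples carry the data, and the built-in tuple ordering does the sorting.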

Now playing: UGK - One Day


Categories: Programming

December 5, 2007
@ 04:00 AM

[Scene: A married couple listening to Prince’s Little Red Corvette while driving back from the grocery store]

DARE: You know what? I think this song is a metaphor for sex.

JENNA: All songs about cars are metaphors for sex.

DARE: True.

JENNA: Well, except for Throw Some D’s…that song is actually about putting rims on your car.

DARE: Perhaps it is the exception that proves the rule?


Now playing: Prince - Little Red Corvette


Categories: Personal

Danny Sullivan has an article in Advertising Age entitled Forget Facebook. Search Ads Are the Revolution, in which he writes

Facebook unleashed SocialAds this month, calling it the beginning of a revolutionary, hundred-year era in advertising that will see the end of untargeted messages in mass media. If the revolution is upon us, allow me to submit the lowly search ad as the true revolutionary. For unlike social ads and most other types of advertising, search is something people want rather than something that gets in the way.  

The trusted referral is indeed a holy grail, and Facebook will offer a new way to build word-of-mouth. But how did that friend find the sweetener in the first place? What comes first -- word-of-mouth or the egg? At some point, a new product has to hatch, and those old-school brand-building channels probably will always play a crucial role. Search offers a key way for new products to emerge and be spread around. People turn to search for solutions -- ways to enjoy coffee without the calories or local coffeehouses to try. If you're not visible in search, perhaps you won't generate word-of-mouth as easily, if at all.

Search isn't revolutionary for aiding word-of-mouth, however. It's revolutionary for not "getting into" or in the way of anything. People turn to search when they have particular desires and need particular solutions.

I agree with Danny that search advertising like AdWords is revolutionary while word-of-mouth advertising platforms like Facebook’s SocialAds are evolutionary. With search ads, for the first time in the history of advertising people can find advertising when they are looking for it, and otherwise it stays out of their way. When I search for digital camera or zune 80 it is quite likely that I’m making a purchasing decision, so showing me ads related to buying these devices makes sense. On the other hand, when I search for foreach C# or XmlTextReader.NodeType I don’t get irrelevant ads shoved in my face. That level of matchmaking between consumers and advertisers is unprecedented.

However, this doesn’t mean that there isn’t something to be said for brand advertising and word of mouth. For search advertising to work, I actually have to have been looking for something in the first place. A lot of advertising today is intended to create the desire for a product in the first place, not to help you make an informed choice. For example, I saw the movie Enchanted last weekend. I found out about the movie from TV ads and thought what I saw looked funny. My wife came to the same conclusion from watching similar ads, so we decided to see the movie. After seeing it, I thought it was great and rated it in the Flixster Facebook application, which sent out the following notification to my “friends”:

Enchanted: **** out of *****

A few days later, one of my co-workers said she saw the movie on the strength of my recommendation and other factors.

This story is a small case study in the effectiveness of traditional “media-based” advertising coupled with the power of word-of-mouth marketing using social networking sites. For now, search ads simply cannot provide a similar level of return value for such advertisers. Although search engines like Google have tried to encourage this behavior, people don’t typically perform searches like movies 98052 then decide what movies to see that weekend based on the search results page. This means that for certain classes of products, traditional advertising techniques in combination with word-of-mouth techniques like Facebook’s social ads are extremely valuable.

However, at the end of the day, it is extremely unlikely that improved word-of-mouth techniques will be as impactful to the long tail of advertisers as search ads have been or ever will be.

Now playing: Tear Da Club Up Thugs - Triple Six Clubhouse


December 1, 2007
@ 05:24 PM

Earlier this week I wrote a blog post which pointed out that the two major privacy and user experience problems with Facebook Beacon were that it (i) linked a user's Facebook account with an account on another site without the user's permission and (ii) there was no way for a user to completely opt out of being tracked by the system. Since then Facebook has announced some changes which TechCrunch named Facebook Beacon 2.0. The changes are excerpted below:


First Warning

Facebook users will see a notification in the lower right corner of the screen after transacting with a Beacon Affiliate. Options include “No Thanks” that will immediately stop the transaction from being published. Alternatively closing or ignoring the warning won’t immediately publish the story, but it will be put in a queue.

Second Warning

Presuming you’ve ignored or closed the first notification, Facebook warns users again the next time they visit their home page. A new box reminds you that an activity has been sent to Facebook. Like the first notification you can choose to not publish the activity by hitting remove, or you can choose to publish it by hitting ok.


Opt Out
Found via the “External Websites” section of the Facebook Privacy page, this allows users to permanently opt in or out of Beacon notifications, or if you’re not sure be notified. The downside is that there is no global option to opt out of every Beacon affiliated program; it has to be set per program. Better this than nothing I suppose.

The interesting thing to note is that neither of the significant problems with Beacon has been fixed. After the changes were announced there was a post on the CA Security Advisory blog titled Facebook's Misrepresentation of Beacon's Threat to Privacy: Tracking users who opt out or are not logged in, which pointed out that complaining about purchase history getting into your friends' news feeds is a red herring; the real problem is that once a site signs up as a Facebook affiliate, it begins to share every significant action you take on the site with Facebook without your permission.

Which is worse, your friends knowing that you rented Prison Girls or Facebook finding that out without your permission and sharing it with their business partners, without your permission? Aren't there laws against this kind of invasion of privacy? I guess there are (see 18 U.S.C. § 2710).

I wonder who'll be first to sue Facebook and Blockbuster? 

Anyway, back to the title of this blog post. The problem with Facebook Beacon is that it is designed in a way that makes it easy for Facebook Beacon affiliates to integrate it into their sites at the cost of users' privacy. From Jay Goldman's excellent post where he Deconstructed the Facebook Beacon Javascript we learn:

Beacon from 10,000 Feet

That basically wraps up our tour of how Beacon does what it does. It's a fairly long explanation, so here's a quick summary:

  1. The partner site page includes the beacon.js file, sets a <meta> tag with a name, and then calls Facebook.publish_action.            
  2. Facebook.publish_action builds a query_params object and then passes it to Facebook._send_request.            
  3. Facebook._send_request dynamically generates an <iframe> which loads the URL http://www.facebook.com/beacon/auth_iframe.php and passes the query_params. At this point, Facebook now knows about the news feed item whether you choose to publish it or not.

When you read this you realize just how insidious the problem actually is. Facebook isn't simply learning about every action taken by Facebook users on affiliate sites, it is learning about every action taken by every user of these affiliate sites regardless of whether they are Facebook users or not.
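A toy Python model of the three quoted steps makes the point concrete: the data leaves the partner page as soon as the iframe loads, and nothing in the request depends on whether the visitor has a Facebook account. Only the auth_iframe.php URL comes from Jay Goldman's write-up; this Python `publish_action` is a hypothetical stand-in for the JavaScript function, and the `site`/`action` parameter names are invented for illustration.

```python
from typing import Dict, List
from urllib.parse import urlencode

# Endpoint named in Jay Goldman's write-up; everything else is a hypothetical model.
BEACON_ENDPOINT = "http://www.facebook.com/beacon/auth_iframe.php"

# Stand-in for what Facebook's server receives from partner pages.
received_by_facebook: List[Dict[str, str]] = []


def publish_action(site: str, action: str, visitor_has_facebook_account: bool) -> str:
    """Model of steps 1-3: build query_params, then 'load' the iframe URL."""
    # Step 2: the partner page packages the action (parameter names invented).
    query_params = {"site": site, "action": action}
    # Step 3: loading the iframe is what sends the data. Note that
    # visitor_has_facebook_account is never consulted before sending.
    url = BEACON_ENDPOINT + "?" + urlencode(query_params)
    received_by_facebook.append(query_params)
    return url


# A Facebook member and a non-member rent the same movie from a partner site.
member_url = publish_action("video-rentals.example", "rented a movie", True)
non_member_url = publish_action("video-rentals.example", "rented a movie", False)
print(len(received_by_facebook))  # prints 2: both rentals reached Facebook
```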

At first I assumed that the affiliate sites would call some sort of IsFacebookUser() API and then decide whether to send the action or not. Of course, even this would be broken, since the affiliate site would have told Facebook that you are a user of the site, and depending on the return value of the hypothetical function the affiliate in turn would learn that you are a Facebook user.

But no, it is actually worse than that. The affiliate sites are pretty much dumping their entire customer database into Facebook's lap, FOR FREE and without their customers' permission. What. The. Fuck.

The icing on the cake is the following excerpt from the Facebook Beacon page

Stories of a user's engagement with your site may be displayed in his or her profile and in News Feed. These stories will act as a word-of-mouth promotion for your business and may be seen by friends who are also likely to be interested in your product. You can increase the number of friends who see these stories with Facebook Social Ads.

So after giving Facebook millions of dollars in customer intelligence for free in exchange for spamming their users, Facebook's affiliates don't even get a guarantee that the spam will be sent. Instead these sites have to pay Facebook to "increase the chances" that they get some return for the free customer intelligence they just gave Facebook.

This reminds me of the story of Tom Sawyer tricking people into paying him to paint a fence he was supposed to paint as part of his chores.

At the end of the day, Facebook can't fix the privacy problems I mentioned in my previous post in a way that completely preserves their users' privacy without completely changing the design and implementation of Facebook Beacon. Until then, we'll likely see more misdirection, more red herrings and more violations of user privacy to make a quick buck.