A few days ago I blogged about my plans to make RSS Bandit a desktop client for Google Reader. As part of that process I needed to verify that it is possible to programmatically interact with Google Reader from a desktop client in a way that provides a reasonable user experience. To this end, I wrote a command line client in IronPython based on the documentation I found at the pyrfeed Website.

The command line client isn't terribly useful on its own as a way to read your feeds, but it might be useful for other developers who are trying to interact with Google Reader programmatically and who learn better from code samples than from reverse-engineered API documentation.

Enjoy...

PS: Note the complete lack of error handling. I never got the hang of error handling in Python, let alone going back and forth between handling errors in Python and handling the underlying .NET/CLR errors.


import sys
from System import *
from System.IO import *
from System.Net import *
from System.Text import *
from System.Globalization import DateTimeStyles
import clr
clr.AddReference("System.Xml")
from System.Xml import *
clr.AddReference("System.Web")
from System.Web import *

#################################################################
#
# USAGE: ipy greader.py <Gmail username> <password> <path-to-directory-for-storing-feeds>
# 
# username & password are required
# feed directory location is optional, defaults to C:\Windows\Temp\
#################################################################

#API URLs
auth_url          = r"https://www.google.com/accounts/ClientLogin?continue=http://www.google.com&service=reader&source=Carnage4Life&Email=%s&Passwd=%s"
feed_url_prefix   = r"http://www.google.com/reader/atom/"
api_url_prefix    = r"http://www.google.com/reader/api/0/"
feed_cache_prefix = "C:\\Windows\\Temp\\"
add_url           = r"http://www.google.com/reader/quickadd"

#enumerations
(add_label, remove_label) = range(1,3)

class TagList:
    """Represents a list of the labels/tags used in Google Reader"""
    def __init__(self, userid, labels):
        self.userid = userid
        self.labels = labels

class SubscriptionList:
    """Represents a list of RSS feeds subscriptions"""
    def __init__(self, modified, feeds):
        self.modified = modified
        self.feeds    = feeds

class Subscription:
    """Represents an RSS feed subscription"""
    def __init__(self, feedid, title, categories, firstitemmsec):
        self.feedid        = feedid
        self.title         = title
        self.categories    = categories
        self.firstitemmsec = firstitemmsec

def MakeHttpPostRequest(url, params, sid):
    """Performs an HTTP POST request to a Google service and returns the results in a HttpWebResponse object"""
    req = HttpWebRequest.Create(url)
    req.Method = "POST"
    SetGoogleCookie(req, sid)

    encoding = ASCIIEncoding();
    data     = encoding.GetBytes(params)

    req.ContentType="application/x-www-form-urlencoded"
    req.ContentLength = data.Length
    newStream=req.GetRequestStream()
    newStream.Write(data,0,data.Length)
    newStream.Close()
    resp = req.GetResponse()
    return resp

def MakeHttpGetRequest(url, sid):
    """Performs an HTTP GET request to a Google service and returns the results in an XmlDocument"""
    req          = HttpWebRequest.Create(url)
    SetGoogleCookie(req, sid)
    reader = StreamReader(req.GetResponse().GetResponseStream())
    doc          = XmlDocument()
    doc.LoadXml(reader.ReadToEnd())
    return doc

def GetToken(sid):
    """Gets an edit token which is needed for any edit operations using the Google Reader API"""
    token_url = api_url_prefix + "token"
    req          = HttpWebRequest.Create(token_url)
    SetGoogleCookie(req, sid)
    reader = StreamReader(req.GetResponse().GetResponseStream())
    return reader.ReadToEnd()

def MakeSubscription(xmlNode):
    """Creates a Subscription class out of an XmlNode that was obtained from the feed list"""
    id_node     = xmlNode.SelectSingleNode("string[@name='id']")
    feedid      = id_node and id_node.InnerText or ''
    title_node  = xmlNode.SelectSingleNode("string[@name='title']")
    title       = title_node and title_node.InnerText or ''
    fim_node    =  xmlNode.SelectSingleNode("string[@name='firstitemmsec']")
    firstitemmsec = fim_node and fim_node.InnerText or ''
    categories  = [MakeCategory(catNode) for catNode in xmlNode.SelectNodes("list[@name='categories']/object")]
    return Subscription(feedid, title, categories, firstitemmsec)

def MakeCategory(catNode):
    """Returns a tuple of (label, category id) from an XmlNode representing a feed's labels that was obtained from the feed list"""
    id_node     = catNode.SelectSingleNode("string[@name='id']")
    catid       = id_node and id_node.InnerText or ''
    label_node  = catNode.SelectSingleNode("string[@name='label']")
    label       = label_node and label_node.InnerText or ''
    return (label, catid)

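def AuthenticateUser(username, password):
    """Authenticates the user and returns the SID token used as the Google authentication cookie"""
    # URL-encode the credentials so characters such as '&' or '+' don't corrupt the query string
    req = HttpWebRequest.Create(auth_url % (HttpUtility.UrlEncode(username), HttpUtility.UrlEncode(password)))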
    reader = StreamReader(req.GetResponse().GetResponseStream())
    response = reader.ReadToEnd().split('\n')
    for s in response:
        if s.startswith("SID="):
            return s[4:]

def SetGoogleCookie(webRequest, sid):
    """Sets the Google authentication cookie on the HttpWebRequest instance"""
    cookie = Cookie("SID", sid, "/", ".google.com")
    cookie.Expires = DateTime.Now + TimeSpan(7,0,0,0)
    container      = CookieContainer()
    container.Add(cookie)
    webRequest.CookieContainer = container

def GetSubscriptionList(feedlist, sid):
    """Gets the users list of subscriptions"""
    feedlist_url = api_url_prefix + "subscription/list"
    #download the JSON-esque XML feed list
    doc = MakeHttpGetRequest(feedlist_url, sid)

    #create subscription nodes
    feedlist.feeds  = [MakeSubscription(node) for node in doc.SelectNodes("/object/list[@name='subscriptions']/object")]
    feedlist.modified = False

def GetTagList(sid):
  """Gets a list of the user's tags"""
  taglist_url = api_url_prefix + "tag/list"
  doc = MakeHttpGetRequest(taglist_url, sid)
  #get the user id needed for creating new labels from Google system tags

  userid = doc.SelectSingleNode("/object/list/object/string[contains(string(.), 'state/com.google/starred')]").InnerText
  userid = userid.replace("/state/com.google/starred", "");
  userid = userid[5:]
  #get the user-defined labels
  tags = [node.InnerText.Replace("user/" + userid + "/label/" ,"") for node in doc.SelectNodes("/object/list[@name='tags']/object/string[@name='id']") if node.InnerText.IndexOf( "/com.google/") == -1 ]
  return TagList(userid, tags)

def DownloadFeeds(feedlist, sid):
    """Downloads each feed from the subscription list to a local directory"""
    for feedinfo in feedlist.feeds:
        unixepoch  = DateTime(1970, 1,1, 0,0,0,0, DateTimeKind.Utc)
        oneweek_ago   = DateTime.UtcNow - TimeSpan(7,0,0,0)  # use UTC because the epoch above is UTC
        ifmodifiedsince = oneweek_ago - unixepoch
        feed_url = feed_url_prefix + feedinfo.feedid +  "?n=25&r=o&ot=" + str(int(ifmodifiedsince.TotalSeconds))
        continuation_token = ''
        feedDoc      = None

        while True:
            print "Downloading feed at %s" % (feed_url  + continuation_token)
            doc = MakeHttpGetRequest(feed_url + continuation_token, sid)
            continuation_node     = doc.SelectSingleNode("//*[local-name()='continuation']")
            continuation_token    = continuation_node and ("&c=" + continuation_node.InnerText) or ''

            if feedDoc is None:
                feedDoc = doc
            else:
                for node in doc.SelectNodes("//*[local-name()='entry']"):
                    node = feedDoc.ImportNode(node, True)
                    feedDoc.DocumentElement.AppendChild(node)

            if continuation_token == '':
                break

        print "Saving %s" % (feed_cache_prefix + feedinfo.title + ".xml")
        feedDoc.Save(feed_cache_prefix + feedinfo.title + ".xml")

def ShowSubscriptionList(feedlist, sid):
    """Displays the users list of subscriptions including the labels applied to each item"""
    if feedlist.modified:
        GetSubscriptionList(feedlist, sid)
    count = 1
    for feedinfo in feedlist.feeds:
        print "%s. %s (%s)" % (count, feedinfo.title, [category[0] for category in feedinfo.categories])
        count = count + 1

def Subscribe(url, sid):
    """Subscribes to the specified feed URL in Google Reader"""
    params        = "quickadd=" + HttpUtility.UrlEncode(url) + "&T=" + GetToken(sid)
    resp = MakeHttpPostRequest(add_url, params, sid)

    if resp.StatusCode == HttpStatusCode.OK:
        print "%s successfully added to subscription list" % url
        return True
    else:
        print resp.StatusDescription
        return False

def Unsubscribe(index, feedlist, sid):
    """Unsubscribes from the feed at the specified index in the feed list"""
    unsubscribe_url = api_url_prefix + "subscription/edit"
    feed = feedlist.feeds[index]
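    # URL-encode the title and feed id so '&', '=' or spaces in them don't break the form body
    params = "ac=unsubscribe&i=null&T=" + GetToken(sid) + "&t=" + HttpUtility.UrlEncode(feed.title)  + "&s=" + HttpUtility.UrlEncode(feed.feedid)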
    resp = MakeHttpPostRequest(unsubscribe_url, params, sid)

    if resp.StatusCode == HttpStatusCode.OK:
        print "'%s' successfully removed from subscription list" % feed.title
        return True
    else:
        print resp.StatusDescription
        return False

def Rename(new_title, index, feedlist, sid):
    """Renames the feed at the specified index in the feed list"""
    api_url = api_url_prefix + "subscription/edit"
    feed = feedlist.feeds[index]
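    # URL-encode the title and feed id so '&', '=' or spaces in them don't break the form body
    params = "ac=edit&i=null&T=" + GetToken(sid) + "&t=" + HttpUtility.UrlEncode(new_title)  + "&s=" + HttpUtility.UrlEncode(feed.feedid)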
    resp = MakeHttpPostRequest(api_url, params, sid)

    if resp.StatusCode == HttpStatusCode.OK:
        print "'%s' successfully renamed to '%s'" % (feed.title, new_title)
        return True
    else:
        print resp.StatusDescription
        return False

def EditLabel(label, editmode, userid, feedlist, index, sid):
    """Adds or removes the specified label to the feed at the specified index depending on the edit mode"""
    full_label = "user/" + userid + "/label/" + label
    label_url = api_url_prefix + "subscription/edit"
    feed = feedlist.feeds[index]
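    # URL-encode the title, feed id and label so '&', '=' or spaces in them don't break the form body
    params = "ac=edit&i=null&T=" + GetToken(sid) + "&t=" + HttpUtility.UrlEncode(feed.title)  + "&s=" + HttpUtility.UrlEncode(feed.feedid)

    if editmode == add_label:
        params = params + "&a=" + HttpUtility.UrlEncode(full_label)
    elif editmode == remove_label:
        params = params + "&r=" + HttpUtility.UrlEncode(full_label)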
    else:
        return

    resp = MakeHttpPostRequest(label_url, params, sid)
    if resp.StatusCode == HttpStatusCode.OK:
        print "Successfully edited label '%s' of feed '%s'" % (label, feed.title)
        return True
    else:
        print resp.StatusDescription
        return False

def MarkAllItemsAsRead(index, feedlist, sid):
    """Marks all items from the selected feed as read"""
    unixepoch  = DateTime(1970, 1,1, 0,0,0,0, DateTimeKind.Utc)

    markread_url = api_url_prefix + "mark-all-as-read"
    feed = feedlist.feeds[index]
    # use UTC because the epoch above is UTC, and URL-encode the feed id
    params = "s=" + HttpUtility.UrlEncode(feed.feedid) + "&T=" + GetToken(sid) + "&ts=" + str(int((DateTime.UtcNow - unixepoch).TotalSeconds))
    MakeHttpPostRequest(markread_url, params, sid)
    print "All items in '%s' have been marked as read" % feed.title

def GetFeedIndexFromUser(feedlist):
    """prompts the user for the index of the feed they are interested in and returns the index as the result of this function"""
    print "Enter the numeric position of the feed from 1 - %s" % (len(feedlist.feeds))
    index = int(sys.stdin.readline().strip())
    if (index < 1) or (index > len(feedlist.feeds)):
        print "Invalid index specified: %s" % index
        return -1
    else:
        return index

if __name__ == "__main__":
       if len(sys.argv) < 3:
           print "ERROR: Please specify a Gmail username and password"
       else:
           if len(sys.argv) > 3:
               feed_cache_prefix = sys.argv[3]

           SID = AuthenticateUser(sys.argv[1], sys.argv[2])
           feedlist = SubscriptionList(True, [])
           GetSubscriptionList(feedlist, SID)
           taglist = GetTagList(SID)

           options = "***Your options are (f)etch your feeds, (l)ist your subscriptions, (s)ubscribe to a new feed, (u)nsubscribe, (m)ark read , (r)ename, (a)dd a label to a feed, (d)elete a label from a feed or (e)xit***"
           print "\n"

           while True:
               print options
               cmd = sys.stdin.readline()
               if cmd == "e\n":
                   break
               elif cmd == "l\n": #list subscriptions
                   ShowSubscriptionList(feedlist, SID)
               elif cmd == "s\n": #subscribe to a new feed
                   print "Enter url: "
                   new_feed_url = sys.stdin.readline().strip()
                   success = Subscribe(new_feed_url, SID)

                   if feedlist.modified == False:
                       feedlist.modified = success
               elif cmd == "u\n": #unsubscribe from a feed
                   feed2remove_indx = GetFeedIndexFromUser(feedlist)
                   if feed2remove_indx != -1:
                       success = Unsubscribe(feed2remove_indx-1, feedlist, SID)

                       if feedlist.modified == False:
                           feedlist.modified = success
               elif cmd == "r\n": #rename a feed
                   feed2rename_indx = GetFeedIndexFromUser(feedlist)
                   if feed2rename_indx != -1:
                       print "'%s' selected" % feedlist.feeds[feed2rename_indx -1].title
                       print "Enter the new title for the subscription:"
                       success = Rename(sys.stdin.readline().strip(), feed2rename_indx-1, feedlist, SID)

                       if feedlist.modified == False:
                           feedlist.modified = success
               elif cmd == "f\n": #fetch feeds
                   DownloadFeeds(feedlist, SID)  # returns nothing, so don't overwrite feedlist with the result
               elif cmd == "m\n": #mark all items as read
                   feed2markread_indx = GetFeedIndexFromUser(feedlist)
                   if feed2markread_indx != -1:
                       MarkAllItemsAsRead(feed2markread_indx-1, feedlist, SID)
               elif (cmd == "a\n") or (cmd == "d\n"): #add/remove a label on a feed
                   editmode = (cmd == "a\n") and add_label or remove_label
                   feed2label_indx = GetFeedIndexFromUser(feedlist)
                   if feed2label_indx != -1:
                       feed = feedlist.feeds[feed2label_indx-1]
                       print "'%s' selected" % feed.title
                       print "%s" % ((cmd == "a\n") and "Enter the new label:" or "Enter the label to delete:")
                       label_name = sys.stdin.readline().strip()
                       success = EditLabel(label_name, editmode, taglist.userid, feedlist, feed2label_indx-1, SID)

                       if feedlist.modified == False:
                           feedlist.modified = success
               else:
                   print "Unknown command"
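
The edit calls in the script build their POST bodies by string concatenation, which only works if every value is URL-encoded first. As a rough illustration of that one step, here is a minimal sketch in plain Python 3 (standard library only, not IronPython/.NET). The helper name build_edit_params and the sample feed id, token and title are made up for the example; the parameter names (ac, s, T, t, a, r) are the ones the script already uses.

```python
from urllib.parse import urlencode


def build_edit_params(action, feed_id, token, title=None, add_label=None, remove_label=None):
    """Return a form-encoded body for subscription/edit (hypothetical helper)."""
    fields = [("ac", action), ("s", feed_id), ("T", token)]
    if title is not None:
        fields.append(("t", title))
    if add_label is not None:
        fields.append(("a", add_label))
    if remove_label is not None:
        fields.append(("r", remove_label))
    # urlencode percent-escapes '&', '=', '/' and turns spaces into '+',
    # so a value can never break the key=value&key=value structure.
    return urlencode(fields)


# Sample values are invented for illustration only.
body = build_edit_params(
    "edit",
    "feed/http://example.com/rss?a=1&b=2",
    "TOKEN",
    title="Tom & Jerry",
)
print(body)
```

The point of going through a list of pairs is that a feed title such as "Tom & Jerry" ends up as a single t= value instead of being split into two parameters.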

Now Playing: DJ Drama - Cannon (Remix) (Feat. Lil Wayne, Willie The Kid, Freeway And T.I.)


 

Categories: Programming

December 30, 2007
@ 11:19 PM

REQUEST:

POST /reader/api/0/subscription/edit HTTP/1.1
Content-Type: application/x-www-form-urlencoded
Host: www.google.com
Cookie: SID=DQAAAHoAAD4SjpLSFdgpOrhM8Ju-JL2V1q0aZxm0vIUYa-p3QcnA0wXMoT7dDr7c5FMrfHSZtxvDGcDPTQHFxGmRyPlvSvrgNe5xxQJwPlK_ApHWhzcgfOWJoIPu6YuLAFuGaHwgvFsMnJnlkKYtTAuDA1u7aY6ZbL1g65hCNWySxwwu__eQ
Content-Length: 182
Expect: 100-continue

s=http%3a%2f%2fwww.icerocket.com%2fsearch%3ftab%3dblog%26q%3dlink%253A25hoursaday.com%252Fweblog%2b%26rss%3d1&ac=subscribe&T=wAxsLRcBAAA.ucVzEgL9y7YfSo5CU5omw.w1BCzXzXHsyicU9R3qWgQ

RESPONSE:

HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
Set-Cookie: GRLD=UNSET;Path=/reader/
Transfer-Encoding: chunked
Cache-control: private
Date: Sun, 30 Dec 2007 23:08:51 GMT
Server: GFE/1.3

<html><head><title>500 Server Error</title>
<style type="text/css">
      body {font-family: arial,sans-serif}
      div.nav {margin-top: 1ex}
      div.nav A {font-size: 10pt; font-family: arial,sans-serif}
      span.nav {font-size: 10pt; font-family: arial,sans-serif; font-weight: bold}
      div.nav A,span.big {font-size: 12pt; color: #0000cc}
      div.nav A {font-size: 10pt; color: black}
      A.l:link {color: #6f6f6f}
      </style></head>
<body text="#000000" bgcolor="#ffffff"><table border="0" cellpadding="2" cellspacing="0" width="100%"></table>
<table><tr><td rowspan="3" width="1%"><b><font face="times" color="#0039b6" size="10">G</font><font face="times" color="#c41200" size="10">o</font><font face="times" color="#f3c518" size="10">o</font><font face="times" color="#0039b6" size="10">g</font><font face="times" color="#30a72f" size="10">l</font><font face="times" color="#c41200" size="10">e</font>&nbsp;&nbsp;</b></td>
<td>&nbsp;</td></tr>
<tr><td bgcolor="#3366cc"><font face="arial,sans-serif" color="#ffffff"><b>Error</b></font></td></tr>
<tr><td>&nbsp;</td></tr></table>
<blockquote><h1>Server Error</h1>
The server encountered a temporary error and could not complete your request.<p></p> Please try again in 30 seconds.

<p></p></blockquote>
<table width="100%" cellpadding="0" cellspacing="0"><tr><td bgcolor="#3366cc"><img alt="" width="1" height="4"></td></tr></table></body></html>


 

Categories: Platforms | Programming | XML Web Services

With v1.6.0.0 out the door, I've shipped what I think is our most interesting feature in years and resolved an issue that was making RSS Bandit a nuisance to lots of sites on the Internet.

The feature I'm currently working on is an idea I'm calling supporting multiple feed sources. For a few years, we've had support for roaming your feed list and read/unread state between two computers using an FTP site, a shared folder or NewsGator Online. Although useful, this functionality has always seemed bolted on. You have to manually upload and download feeds from these locations instead of things happening automatically and transparently as they do with the typical mail reader + mail server scenario (e.g. Outlook + Exchange) which is the most comparable model.

My original idea for the feature was simply to make the existing NewsGator and RSS Bandit integration work automatically instead of via a manual download so it could be more like Outlook + Exchange. Then I realized that there could never be full integration because there are feeds that RSS Bandit can read that a Web-based feed reader like NewsGator Online can not (e.g. feeds within your company's intranet if you read feeds at work). This meant that we would need an explicit demarcation of feeds that roamed in NewsGator Online and those that were local to that machine.

In addition, I got a bunch of feedback from our users that there were a lot more of them using Google Reader than using NewsGator Online. Since I was already planning to do a bunch of work to streamline synchronizing with NewsGator Online, adding another Web-based feed reader didn't seem like a stretch. I'm currently working on a command-line-only prototype in IronPython which uses the information from the reverse-engineered Google Reader API documentation to retrieve and update my feed subscriptions. I'm part of the way through and it seems that the Google Reader API is as full featured as the NewsGator API, so we should be good to go. I should be able to integrate this functionality into RSS Bandit within the next few weeks.

The tricky part will be how the UI integration should work. For example, Google Reader doesn't support hierarchical folders of feeds like we do. Instead there is a flat namespace of tag names, but each feed can have one or more tags applied to it. On the flip side, NewsGator Online uses the hierarchical folder model like RSS Bandit does. I'm considering moving to a more Google Reader friendly model in the next release where we flatten hierarchies and instead go with a flat tag-based approach to organizing feeds. For feeds synchronized from NewsGator Online, we will prevent users from putting feeds in multiple categories since that won't be supported by the service.
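
To make the folder-versus-tag difference concrete, here is a minimal Python sketch of one possible flattening, where every level of a folder path becomes its own tag. This is only an assumption about how such a mapping could work, not a description of what RSS Bandit does; the function name folder_to_tags, the backslash separator and the sample feed URLs are all hypothetical.

```python
def folder_to_tags(folder_path, separator="\\"):
    """Split a hierarchical category path into a flat list of tags.

    Assumption for illustration: each folder level becomes one tag.
    """
    return [part.strip() for part in folder_path.split(separator) if part.strip()]


# Hypothetical feed list: feed URL -> folder path ("" means no folder).
feeds = {
    "http://example.com/a.xml": "Tech\\Blogs",
    "http://example.com/b.xml": "News",
    "http://example.com/c.xml": "",
}

tags_by_feed = dict((url, folder_to_tags(path)) for url, path in feeds.items())
for url in sorted(tags_by_feed):
    print(url, tags_by_feed[url])
```

Going the other way is the lossy direction: a feed carrying two unrelated tags has no single folder to live in, which is why a folder-based service has to restrict feeds to one category.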

Now Playing: Eminem - Evil Deeds


 

Categories: RSS Bandit

December 26, 2007
@ 05:22 PM

The new version of RSS Bandit is now available. This release fixes a bug that causes the application to repeatedly request favicons from a feed's website in a manner that eventually resembles a denial of service attack. The new feature in this release is the [Top Stories] button.

The rationale for the new feature is given in Omar Shahine's blog post entitled Google Reader needs Mute. Omar wrote

Here is a feature that Google Reader needs: Mute.

Why, Cause subscribing to a lot of tech bloggers, a-list folks, and news outlets is extremely annoying when they write about the same thing. You get tired of seeing dozens or hundreds of posts about Kindle, Facebook, ThinkSecret and on and on.

These days I feel like my blogging info is like the local news (which I stopped watching some time back in high school).

So, please google, let me mute or mark read all feed items on a certain topic as read and save me the hassle of suffering through the repetition and pain.

The Top Stories feature is meant to target exactly this scenario. When you click on it, you get a list of the most recently popular items among your subscriptions. From there you can hit "Mark Items as Read" and mark all of the linking posts as read once you've gotten the gist of the story.
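
As a rough sketch of the idea (not the actual RSS Bandit implementation), the Python below ranks URLs by how many posts link to them and then finds the posts that would be marked as read for the top story. The function names, the (post id, links) input shape and the sample data are assumptions made for the example, and it ignores the "recently" part by treating all posts as equally fresh.

```python
from collections import Counter


def top_stories(posts, limit=10):
    """Return up to `limit` (url, count) pairs, most-linked URL first.

    `posts` is an iterable of (post_id, [outgoing link URLs]).
    """
    counts = Counter()
    for _post_id, links in posts:
        # Count each URL once per post so a post that repeats a link
        # doesn't inflate its popularity.
        counts.update(set(links))
    return counts.most_common(limit)


def posts_linking_to(posts, url):
    """Return the ids of the posts that link to `url`."""
    return [post_id for post_id, links in posts if url in links]


# Invented sample data.
posts = [
    ("post-1", ["http://example.com/kindle", "http://example.com/other"]),
    ("post-2", ["http://example.com/kindle"]),
    ("post-3", ["http://example.com/kindle", "http://example.com/kindle"]),
    ("post-4", ["http://example.com/facebook"]),
]

top_url, link_count = top_stories(posts)[0]
to_mark_read = posts_linking_to(posts, top_url)
print(top_url, link_count, to_mark_read)
```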

We don't have a Mute option where all posts that link to a story are automatically marked as read or deleted after being downloaded. This seems like overkill to me, but I would love to get some feedback from our users on whether this would be a desirable feature.

Translations
This release is available in the following languages: English, German, Polish, French, Simplified Chinese, Russian, Brazilian Portuguese, Turkish, Dutch, Italian, Serbian and Bulgarian.

Installer
Download the installer from RssBandit1.6.0.0a_Installer.zip . A snapshot of the source code will be available later in the week as a source code release.

New Features

  • Top Stories button shows the ten most recently popular links in your subscriptions.
  • Twitter plugin enables posting tweets about news stories or responding to tweets in an RSS feed.

Major Bug Fixes

  • Del.icio.us plugin silently fails when posting items with tags containing special characters like '#' or '+'
  • Downloading feed list from NewsGator Online deletes local machine and intranet feeds
  • KeyNotFoundException if "Mark All Items as Read" clicked shortly after changing the URL for a subscribed feed.
  • 100% CPU used when an RSS feed with no <channel> element is encountered.
  • Downloading favicons happens several times while the application is running instead of just once.
  • The "Check for updates" feature would sometimes result in the application crashing.

 

Categories: RSS Bandit

There is a post in Slashdot user Felipe Hoffa's journal entitled Google Reader shares private data, ruins Christmas (alternate link) which contains a very damning indictment of the Google Reader team. It all starts with the release of the Sharing with Friends feature, which is described below

We've just launched a new feature that makes it easier to follow your
friends' shared items in Google Reader. Check out the announcement on
our blog:
http://googlereader.blogspot.com/2007/12/reader-and-talk-are-friends....

The short description of it is this: If any of your friends from
Google Talk are using Reader and sharing items, you'll see them listed
in your sidebar under "Friends' shared items." Similarly, they'll be
able to see any items you're sharing. You can hide items from any
friend you don't want to see, and you can also opt out of sharing by
removing all your shared items. For full details, check out the
following help articles:
http://www.google.com/support/reader/bin/answer.py?answer=83000
http://www.google.com/support/reader/bin/answer.py?answer=83041

This is still a very experimental feature, so we'd love to hear what
you think of it.

Unsurprisingly, there has been a massive negative outcry about this feature. The main reason for the flood of complaints (many of which are excerpted in Felipe Hoffa's journal) is the fact that the Google Reader team has decided to define "friends" as anyone in your Gmail contact list.

On the surface this seems a lot like the initial backlash over the Facebook news feed. Google Reader users are complaining about their Gmail contacts having an easy way of viewing a list of feeds the user had already made public. I imagine that the Google folks have begun to make arguments like "If Facebook can get away with it, we should be able to as well" to justify some of their recent social networking moves such as this one and Google Profiles.

However, the Google Reader team failed to grasp two key aspects of social software here:

  1. Internet Users Don't Fully Grasp that Everything on the Web is Public Unless Behind Access Controls: To most users of the Internet, if I create a Web page and don't tell anyone about it, then the page is private and known only to me. Similarly, if I create a blog or shared bookmarks on a social bookmarking site then no one should know about it unless I send them links to the page. 

    As someone who's worked on the Access Control technology behind Windows Live sharing initiatives from SkyDrive to Windows Live Spaces I know this isn't the case. The only way to make something private on the Web is to place it behind access controls that require users to be authenticated and authorized before they can view the content you've created.

    The Google Reader developers assumed that their average users were like me and would assume that their content was public even if it had an obfuscated URL. The problem here is that even if it was "technically" true that Shared Items in Google Reader were public although with an obfuscated URL, the fact that there was URL obfuscation involved implies that they realized that users didn't want their Shared Items to be PUBLIC. Arguing that the items were "technically" public and thus justifying broadcasting the items to the user's Gmail contacts seems dubious at best.

  2. Friends in One Context are not Necessarily Friends in Another: The bigger problem is that the folks at Google are trying to build a unified social graph across all their applications as a way to compete with the powerful social network that Facebook has built. I've previously talked about the problems faced by a unified social graph based on what I've seen working on the social graph contacts platform for Windows Live. The fact that I send someone email does not mean that I want to make them an IM buddy, nor does it mean that I want them to have access to all the items I find interesting in my RSS feeds, since some of these items may reveal political, religious or even sexual leanings that I did not mean to share with someone I just happen to exchange email with frequently.

    Deciding that instead of having GTalk IM buddies, Gmail contacts, and Google Reader friends that users should just have Google Friends may simplify things for some program managers at Google but it causes problems for users who now have to deal with the consequence of their different social contexts beginning to bleed into each other. Even though Facebook is a single application, they have this problem with users having to manage contacts from multiple social contexts (family, friends, co-workers, etc) within a single application let alone applications with extremely different uses.

My assumption is that the folks at Google Reader will put in some time over the weekend and add granular privacy controls as recommended by Robert Scoble. I also predict that we will see more ham-fisted attempts to grow their social graph at the expense of user privacy from various large [and small] Web properties, including Facebook, in 2008.

In the words of Scott McNealy, "Privacy is Dead. Get Over It"


 

Categories: Social Software

December 26, 2007
@ 05:21 PM

Justin Rudd writes in his blog post entitled Your Attention Please

After 3 years and 3 months, I am leaving my position at Amazon.com on December 31st.
...
My next “gig” is one that I am extraordinarily excited about.  I’m going to Microsoft to be part of the Live Labs team.  This group really excites me because it gives me a chance to find new areas for Microsoft Live to get into, to expand on what Microsoft Live already has, work closely with Microsoft Research, etc.  This is a job that really excites the tinkerer side of my brain.  I can’t wait to get started.

Many thanks to Dare Obasanjo for being my employee referral

Justin is my second official referral of someone I've "known" via reading their blog. I hope he ends up working at Microsoft a little longer than the last blog friend I referred. :)


 

Categories: Personal

Sometime last week, Amazon soft launched Amazon SimpleDB, a hosted service for storing and querying structured data. This release plugged a hole in their hosted Web services offerings, which include the Amazon Simple Storage Service (S3) and the Amazon Elastic Compute Cloud (EC2). Amazon’s goal of becoming the “Web OS” upon which the next generation of Web startups builds came off as hollow when all they gave you was BLOB storage and hosted computation but not structured storage. With SimpleDB, they’re almost at the point where all the tools you need for building the next del.icio.us or Flickr can be provided by Amazon’s Web Services. The last bit they need to provide is actual Web hosting so that developers don’t need to resort to absurd dynamic DNS hacks when interacting with their Amazon applications from the Web.

The Good: Commoditizing hosted services and getting people to think outside the relational database box

The data model of SimpleDB is remarkably similar to Google’s BigTable in that instead of having multiple tables and relations between them, you get a single giant table which is accessed via the tuple of {row key, column key}. Although both SimpleDB and BigTable allow applications to store multiple values for a particular tuple, they do so in different ways. In BigTable, multiple values are additionally keyed by timestamp, so I can access data using tuples such as {“http://www.example.com”, “incoming_links”, “12–12–2007”}. In Amazon’s SimpleDB I’d simply be able to store multiple values for a particular key pair, so I could access {“Dare Obasanjo”, “weblogs”} and it would return (“http://www.25hoursaday.com/weblog”, “http://blogs.msdn.com/dareobasanjo”, “http://carnage4life.spaces.live.com”).

Another similarity that both systems share is that there is no requirement that all “rows” in a table share the same schema, nor is there an explicit notion of declaring a schema. In SimpleDB, tables are called domains, rows are called items and the columns are called attributes.
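
To make the domain/item/attribute vocabulary concrete, here is a toy in-memory model in Python. It is only a sketch of the data model as described above, not the SimpleDB API: the Domain class and its put and get methods are hypothetical names, and nothing here talks to Amazon.

```python
from collections import defaultdict


class Domain:
    """Toy stand-in for a SimpleDB domain: item name -> attribute -> set of values."""

    def __init__(self):
        self.items = defaultdict(lambda: defaultdict(set))

    def put(self, item_name, attribute, value):
        # No schema is declared anywhere; any item may carry any attribute,
        # and an attribute may hold several values at once.
        self.items[item_name][attribute].add(value)

    def get(self, item_name, attribute):
        # Look up without creating empty entries for unknown items or attributes.
        return sorted(self.items.get(item_name, {}).get(attribute, ()))


people = Domain()
people.put("Dare Obasanjo", "weblogs", "http://www.25hoursaday.com/weblog")
people.put("Dare Obasanjo", "weblogs", "http://blogs.msdn.com/dareobasanjo")
people.put("Dare Obasanjo", "weblogs", "http://carnage4life.spaces.live.com")
people.put("Some Other Item", "color", "blue")  # a different "schema" in the same domain

weblogs = people.get("Dare Obasanjo", "weblogs")
print(weblogs)
```

The {item, attribute} pair plays the role of the {row key, column key} tuple, and the set of values is what takes the model out of First Normal Form.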

It is interesting to imagine how this system evolved. From experience, it is clear to everyone who has had to build a massive relational database that database joins kill performance. The longer you’ve dealt with massive data sets, the more you begin to fall in love with denormalizing your data so you can scale. Taken to its logical extreme, there’s nothing more denormalized than a single table. Even better, Amazon goes a step further by introducing multivalued columns, which means that SimpleDB isn’t even in First Normal Form, whereas we all learned in school that the minimum we should aspire to is Third Normal Form.

I think it is great to see more mainstream examples that challenge the traditional thinking of how to store, manage and manipulate large amounts of data.

I also think the pricing is very reasonable. If I were a startup founder, I’d strongly consider taking Amazon Web Services for a spin before going with a traditional LAMP or WISC approach.

The Bad: Eventual Consistency and Data Values are Weakly Typed

The documentation for the PutAttributes method has the following note:

Because Amazon SimpleDB makes multiple copies of your data and uses an eventual consistency update model, an immediate GetAttributes or Query request (read) immediately after a DeleteAttributes or PutAttributes request (write) might not return the updated data.

This may or may not be a problem depending on your application. It may be OK for a del.icio.us style application if it took a few minutes before your tag updates were applied to a bookmark but the same can’t be said for an application like Twitter. What would be useful for developers would be if Amazon gave some more information around the delayed propagation such as average latency during peak and off-peak hours.
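A toy simulation shows why this matters to application code. This is only a sketch under my own assumptions: the EventuallyConsistentStore class, its explicit propagate step and the replica parameter are all made up for illustration and are not how Amazon replicates data.

```python
class EventuallyConsistentStore:
    """Toy model: a write lands on one replica and reaches the others later."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # (replica index, key, value) writes not yet applied

    def put(self, key, value):
        # The write is acknowledged once the first replica has it.
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def get(self, key, replica):
        # A read may be served by any replica, including a stale one.
        return self.replicas[replica].get(key)

    def propagate(self):
        # Stand-in for whatever background process copies data around.
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending = []


store = EventuallyConsistentStore()
store.put("bookmark-42/tags", "python")

stale = store.get("bookmark-42/tags", replica=2)  # read right after the write
store.propagate()
fresh = store.get("bookmark-42/tags", replica=2)  # read after propagation
```

Here stale is None while fresh is "python". How long the gap between the two lasts in the real service is precisely the number Amazon hasn't published.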

There is another interesting note in the documentation of the Query method which states:

 Lexicographical Comparison of Different Data Types

Amazon SimpleDB treats all entities as UTF-8 strings. Keep this in mind when storing and querying different data types, such as numbers or dates. Design clients to convert their data into an appropriate string format, so that query expression return expected results.

The following are suggested methods for converting different data types into strings for proper lexicographical order enforcement:

  • Positive integers should be zero-padded to match the largest number of digits in your data set. For example, if the largest number you are planning to use in a range is 1,000,000, every number that you store in Amazon SimpleDB should be zero-padded to at least 7 digits. You would store 25 as 0000025, 4597 as 0004597, and so on.

  • Negative integers should be offset and turned into positive numbers and zero-padded. For example, if the smallest negative integer in your data set is -500, your application should add at least 500 to every number that you store. This ensures that every number is now positive and enables you to use the zero-padding technique.

  • To ensure proper lexicographical order, convert dates to the ISO 8601 format.

Note

Amazon SimpleDB provides utility functions within our sample libraries that help you perform these conversions in your application.

This is ghetto beyond belief. I should know ahead of time what the lowest number will be in my data set and add/subtract offsets from data values when inserting and retrieving them from SimpleDB? I need to know the largest number in my data set and zero pad to that length? Seriously, WTF?

It’s crazy just thinking about the kinds of bugs that could be introduced into applications because of these wacky semantics and the recommended hacks to get around them. Even if this is the underlying behavior of SimpleDB, Amazon should have fixed this up in an API layer above SimpleDB and then exposed that, instead of providing ghetto helper functions in a handful of popular programming languages and crossing their fingers hoping that no one hits this problem.
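Here is roughly what the recommended conversions look like in practice, along with the bug you get if you forget them. This is my own sketch of the techniques described in the quoted documentation, not Amazon's utility functions; the names encode_int, decode_int and encode_date, the default width of 7 and the choice to normalize dates to UTC are all assumptions.

```python
from datetime import datetime, timezone


def encode_int(value, offset=0, width=7):
    """Offset and zero-pad an integer so string order matches numeric order."""
    shifted = value + offset
    if shifted < 0:
        raise ValueError("offset is too small for %d" % value)
    encoded = str(shifted).zfill(width)
    if len(encoded) > width:
        raise ValueError("%d needs more than %d digits" % (value, width))
    return encoded


def decode_int(text, offset=0):
    """Reverse encode_int; the caller must remember the offset that was used."""
    return int(text) - offset


def encode_date(moment):
    """Render a timezone-aware datetime as an ISO 8601 UTC string."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Stored as plain strings, numbers sort in the wrong order.
naive = sorted(["25", "4597", "1000000"])

# Zero-padded to 7 digits, string order and numeric order agree.
padded = sorted(encode_int(n) for n in [25, 4597, 1000000])

# Negative numbers need an offset that every reader and writer agrees on.
shifted = sorted(encode_int(n, offset=500) for n in [-500, 25, -3])
restored = [decode_int(s, offset=500) for s in shifted]

stamp = encode_date(datetime(2007, 12, 19, 15, 14, tzinfo=timezone.utc))
```

The naive list sorts as ['1000000', '25', '4597'] while the padded one sorts as ['0000025', '0004597', '1000000']. Note that the width and offset become implicit schema that every client has to agree on, and that changing either one later means rewriting every stored value.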

The Ugly: Web Interfaces that Claim to be RESTful but Aren’t

I’ve talked in the past about APIs that claim to be RESTful but aren’t, but Amazon’s takes the cake when it comes to egregious behavior. Again, from the documentation for the PutAttributes method we learn:

Sample Request

The following example uses PutAttributes on Item123 which has attributes (Color=Blue), (Size=Med) and (Price=14.99) in MyDomain. If Item123 already had the Price attribute, this operation would replace the values for that attribute.

https://sdb.amazonaws.com/
?Action=PutAttributes
&Attribute.0.Name=Color&Attribute.0.Value=Blue
&Attribute.1.Name=Size&Attribute.1.Value=Med
&Attribute.2.Name=Price&Attribute.2.Value=14.99
&Attribute.2.Replace=true
&AWSAccessKeyId=[valid access key id]
&DomainName=MyDomain
&ItemName=Item123
&SignatureVersion=1
&Timestamp=2007-06-25T15%3A03%3A05-07%3A00
&Version=2007-11-07
&Signature=gabYTEXUgY%2Fdg817JBmj7HnuAA0%3D

Sample Response

<PutAttributesResponse xmlns="http://sdb.amazonaws.com/doc/2007-11-07">
  <ResponseMetadata>
    <RequestId>490206ce-8292-456c-a00f-61b335eb202b</RequestId>
    <BoxUsage>0.0000219907</BoxUsage>
  </ResponseMetadata>
</PutAttributesResponse>

Wow. A GET request with a parameter called Action which modifies data? What is this, 2005? I thought we already went through the realization that GET requests that modify data are bad after the Google Web Accelerator scare of 2005?
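To see how little it takes to trigger this write, here is a sketch that rebuilds the sample request as a plain URL using Python's standard library. The access key is a placeholder and I have deliberately left out the Signature parameter since computing it is outside the scope of this post, so the URL below would be rejected by the real service. Nothing here touches the network.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Parameters copied from the sample request in the documentation.
params = [
    ("Action", "PutAttributes"),
    ("Attribute.0.Name", "Color"), ("Attribute.0.Value", "Blue"),
    ("Attribute.1.Name", "Size"), ("Attribute.1.Value", "Med"),
    ("Attribute.2.Name", "Price"), ("Attribute.2.Value", "14.99"),
    ("Attribute.2.Replace", "true"),
    ("AWSAccessKeyId", "EXAMPLE-ACCESS-KEY"),  # placeholder, not a real key
    ("DomainName", "MyDomain"),
    ("ItemName", "Item123"),
    ("SignatureVersion", "1"),
    ("Timestamp", "2007-06-25T15:03:05-07:00"),
    ("Version", "2007-11-07"),
    # Signature intentionally omitted in this sketch
]

# urlencode percent-escapes the colons in the timestamp for us.
url = "https://sdb.amazonaws.com/?" + urlencode(params)

# The entire write, verb included, lives in the query string.
action = parse_qs(urlsplit(url).query)["Action"][0]
```

Once signed, that string is all a client needs. Anything that dereferences it with a GET, whether a browser, a prefetcher or a crawler that finds it in a log, performs the write.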

Of course, I'm not the only one who thinks this is ridonkulous. See similar comments from Stefan Tilkov, Joe Gregorio, and Steve Loughran. Methinks someone at Amazon needs to go read some guidelines on building RESTful Web services.

Bonus points to Subbu Allamaraju for refactoring the SimpleDB API into a true RESTful Web service.

Speaking of ridonkulous API trends, it seems the SimpleDB Query method follows the lead of the Google Base GData API in stuffing a SQL-like query language into the query string parameters of HTTP GET requests. I guess it is RESTful, but damn is it ugly.

Now playing: J. Holiday - Suffocate


 

Categories: Platforms | XML Web Services

Two days ago a bug was filed in the RSS Bandit bug tracker that claimed that Slashdot is banning RSS Bandit users because the application is acting like a Denial of Service client. The root cause of the problem is a bug in the logic for downloading favicons where the application repeatedly attempts to download favicons from each site in your subscription list if there is an error accessing or processing one of the favicons in your list of subscriptions.

We will be releasing version 1.6.0.0 of RSS Bandit this weekend which remedies this problem. I plan to spend all of Saturday fixing as many bugs as I can and polishing up the Top Stories feature. We will likely ship the release sometime on Sunday. It should be noted that this release will be the first version of RSS Bandit that requires version 2.0 of the Microsoft .NET Framework or later.

I’d like to apologize to everyone who has been inconvenienced by this issue. Thanks for your support and the great feedback you’ve been sending us.

Now playing: Cool Breeze - Watch for the Hook (Dungeon Family Remix) (feat. Outkast & Goodie Mob)


 

Categories: RSS Bandit

In the past three months, I’ve seen three moves by Google that highlight that not only is their strategic vision becoming more questionable but their engineering talent has also begun to show signs of being seriously deficient.

This year is going to be a watershed year for the company. They are eroding a lot of the geek cred they’ve built up over the past decade. That will be hard to regain once it is lost.

In the meantime, I’ve noticed an uptick in the quiet smart folks you don’t see heralded in blogs turning down offers from Google and picking another horse when the job offers come knocking. Watershed year, indeed.

Now playing: John Legend - I Can Change (feat. Snoop Doggy Dogg)


 

December 19, 2007
@ 03:14 PM

I’m now at the point where I really, really, really want to blog but have too much going on at work and at home to take the time out to do it. To deal with this I’ve created a Twitter account. You can follow me at http://twitter.com/Carnage4Life.

Things I’ll eventually write about in my blog

  • Amazon Simple DB
  • A new release of RSS Bandit shipping this weekend
  • Thoughts on integrating RSS Bandit and Google Reader based on information found in the pyrfeed documentation
  • Expressiveness of Python vs. C# 3.0

In the meantime, you can get my thoughts on various topics in 140 characters or less from Twitter.

PS: I’m amazed at how obnoxious Twitter is about collecting the password to your GMail/Yahoo/Hotmail/etc account so it can spam your friends. At first glance, it looked as if it wouldn’t even let me use the service until I gave up those passwords. This crap has gotten out of hand.

PPS: Anyone got decent recommendations for a Twitter client that works on Vista and XP?

Now playing: N.W.A. - Real N*ggaz Don't Die


 

Categories: Personal