Friday, 30 November 2007 - Dare Obasanjo's weblog

November 30, 2007

@ 02:50 PM

Recently I logged into Facebook and saw a notification on my news feed that someone who'd I'd met at Microsoft under professional circumstances had uploaded some pictures. I clicked on the link and saw what looked like a college prank. There was a picture of a person being led down a hall way while entirely nude by a member of the opposite sex while others watched nonchalantly. Another picture had multiple naked people of the same sex in the aforementioned hall way, one of them in a suggestive position. After those two pictures I'd seen enough and clicked away.

The problem here is the one I've blogged about previously in posts like Facebook's Achilles Heel: Handling Multiple Social Contexts and that Cory Doctorow recently wrote about in his article How Your Creepy Ex-Co-Workers Will Kill Facebook. The person who uploaded the photos [who happens to be a college student] didn't consider that although the pictures may have been fine to share with college buddies, they probably aren't OK to share with people who you've only met in a professional context. Facebook's poor handling of multiple social contexts (professional acquaintances vs. college buddies) caused an embarrassing situation for both of us. Cory Doctorow's article tells of a similar story which is excerpted below

Here's one of boyd's examples, a true story: a young woman, an elementary school teacher, joins Friendster after some of her Burning Man buddies send her an invite. All is well until her students sign up and notice that all the friends in her profile are sunburnt, drug-addled techno-pagans whose own profiles are adorned with digital photos of their painted genitals flapping over the Playa. The teacher inveigles her friends to clean up their profiles, and all is well again until her boss, the school principal, signs up to the service and demands to be added to her friends list. The fact that she doesn't like her boss doesn't really matter: in the social world of Friendster and its progeny, it's perfectly valid to demand to be "friended" in an explicit fashion that most of us left behind in the fourth grade. Now that her boss is on her friends list, our teacher-friend's buddies naturally assume that she is one of the tribe and begin to send her lascivious Friendster-grams, inviting her to all sorts of dirty funtimes.

Thus I was quite pleased to look at the list of upcoming Facebook features and notice the following item which indicates that similar occurrences will be mitigated in the future.

Sort out your friends.

We’ll let you organize that long list of friends into groups so you can decide more specifically who sees what.

Of course, the proof is in the pudding. Given that I have been disappointed by Facebook's recent attempts to fix flaws in their user experience such as the thumbs up, thumbs down in the News feed and Facebook Beacon 2.0, it isn't a given that the ability to Sort out your friends described above will reduce the number of awkward social situations that occur when one's different social contexts are forced to blend together due to the existence of a single, unified "friends" list.

Cory Doctorow seems to think it is inevitable that no one will get this right when he writes

It's not just Facebook and it's not just me. Every "social networking service" has had this problem and every user I've spoken to has been frustrated by it. I think that's why these services are so volatile: why we're so willing to flee from Friendster and into MySpace's loving arms; from MySpace to Facebook. It's socially awkward to refuse to add someone to your friends list -- but removing someone from your friend-list is practically a declaration of war. The least-awkward way to get back to a friends list with nothing but friends on it is to reboot: create a new identity on a new system and send out some invites (of course, chances are at least one of those invites will go to someone who'll groan and wonder why we're dumb enough to think that we're pals).
That's why I don't worry about Facebook taking over the net. As more users flock to it, the chances that the person who precipitates your exodus will find you increases. Once that happens, poof, away you go -- and Facebook joins SixDegrees, Friendster and their pals on the scrapheap of net.history.

I agree with the sentiment but disagree with the conclusion that seems to imply that no one is going to figure out how to get this right. I suspect Facebook will give it a good shot because the future of the company will eventually depend on it as the site's users grows older and have a longer history with the service. In the real world, we often reboot our social networks by switching jobs, graduating from school, getting married, etc. Unfortunately, social networking sites don't really account for that which leads to the jumping around that Cory describes.

If Facebook doesn't want its users to start "graduating" to the next hot social networking site as some users are doing by "graduating" from MySpace to Facebook, then they will have to figure out how to deal with multiple social contexts and people's need to reboot their social network as they experience life changes.

Categories: Social Software

November 30, 2007

@ 01:54 PM

Categories:

November 30, 2007

@ 04:00 AM

Comments [0]

Memetracking in RSS Bandit

I had some free time last night, so I ported my meme tracker in IronPython over to C# and integrated it into the RSS Bandit source tree. Below is a screenshot of the results of clicking the [Top Stories] button on my feeds today.

Screenshot of RSS Bandit 'Top Stories' Feature

Now the fun part of the code is over and the fit & finish begins. I probably need some CSS help to render that list in a style that is a little more pleasant and less utilitarian. I was going to start of by copying the TechMeme look and feel but any other suggestions are welcome.

After that I need to figure out what configuration options we should have, if any. Right now it shows the top 10 stories from the past week using a weighted scoring mechanism that rewards newly popular items and penalizes older ones. So a page with 3 links from your subscriptions today will rank higher than an item with 5 links from your subscriptions from three days ago. I’m not sure if people will want to change any of those factors (number of posts shown, the date range or the weighted scoring system) but then again I hate cluttering the UI up with configuration options that only 1% of our user base will ever use.

I should also decide if we show multiple posts from the same feed which link to the same item or just pick the newest one. Right now, I’m only showing a single post from the feed because I subscribe to planet feeds like MSDN Blogs and ASP.NET Weblogs which means that when there are announcements like the rebranding of Silverlight 1.1 to Silverlight 2.0 there are like 20 posts from those two feeds alone about it.

Finally, I was wondering about a “mark all items that reference this link as read” feature. For example, I really don’t need to read every post about Google’s clean energy aspirations or a geek dinner starring Robert Scoble and Dave Winer to get the gist of it. So being able to mark every post linking to these items as read once I read the main link would be nice. On the other hand, I am interested in all of the various opinions on Facebook Beacon from my subscriptions. So that rules out an automatic ‘mark as read’ feature which was my initial thought.

Maybe I was wrong, there is still some fun coding left. Namaste.

Now playing: Dream - Shawty Is A 10 (remix) (feat. R. Kelly)

Categories: RSS Bandit

November 27, 2007

@ 04:00 AM

Comments [6]

Some Thoughts on the Facebook Beacon

Recently I’ve read a number of negative posts about the Facebook Beacon which highlight how easy it is for a company to completely misjudge the privacy implications and ramifications of certain features in social software applications.

Charlene Li, a Principal Analyst at Forrester Research who specializing in social software trends and marketing, writes in her blog post Close encounter with Facebook Beacon

I put a lot of trust in sites like Facebook to do the right thing when it comes to privacy. After all, the only stuff that gets out into the public is the stuff that I actually put in. Until now.

Earlier this week, I bought a coffee table on Overstock.com. When I next logged into Facebook and saw this at the top of my newsfeed:

I was pretty surprised to see this, because I received no notification while I was on Overstock.com that they had the Facebook Beacon installed on the site. If they had, I would have turned it off.

I used my personal email address to buy the coffee table, so I was puzzled why and how this "personal" activity was being associated with my "public" Facebook profile.

David Treadwell, a corporate vice president of Windows Live, writes in his blog post entitled Blockbuster, you're fired

Yesterday evening, I decided to add a few movies to my Blockbuster queue. Upon adding movies, I was surprised to see toasts from Facebook showing up on the Blockbuster site indicating that something was being added to my Facebook news feed. When I finished adding movies, I went to Facebook to see what was going on. I was then quite surprised to learn that Blockbuster and Facebook were conspiring to broadcast my movie selections to my Facebook friends.

I am not normally uptight about privacy issues, but you guys really crossed the line on this one:

I had never told either Blockbuster or Facebook that you should share my movie selections with friends.

Neither of you asked me if you could take this action. You just went ahead and did it, assuming that I would not mind.

This sharing of information about me without my informed consent about the mechanism of sharing is absolutely unacceptable to me.

You can find similar complaints all over the Web from similarly Web savvy folks who you typically don’t see griping about privacy issues. In all of the complaints raised, the underlying theme is that Facebook violated the principle of putting the user in control of their user experience.

As someone who works on a competing service I have to give the folks on Facebook credit for shipping the Facebook Beacon so quickly. I assumed something like that was still about six months away from being on their radar. I do give them poor marks when it comes to how this feature has been rolled out. There are several problems with how this feature has been rolled out when it comes to how it affects their users.

Linking identities and data sharing without user permission: One of the thinks people have found creepy about this feature is that they are automatically discovered to be Facebook users on sites that they have not told they use Facebook. In Charlene’s case, she actually uses different email addresses to log in on both sites which must have seemed even doubly weird to her at first. As Ethan Zuckerman points out in his post Facebook changes the norms for web purchasing and privacy this completely upturns user expectations of how privacy on the Web works especially when it comes to cookies.

It's a genuine concern that Facebook has opened a Pandora's box when you consider what could happen if it is deemed socially acceptable for Web sites to use cookies to actively identify users across sites as opposed to the passive way it is done today. I’m sure the folks at Google would be excited about this since thanks to AdSense and DoubleClick, they probably have cookies on every computer on the Web that has cookies turned enabled in the Web browser. Today it’s Facebook, tomorrow Amazon and eBay are posting your purchase history to every OpenSocial enabled web site courtesy of the cookies from these sites or from Google ads on your machine.
No global opt-out: There is no way to turn off this feature. The best you get is that when a site tries to publish an update to your news feed and mini-feed, you get an entry for the site added to your Privacy Settings for External Websites page on Facebook. I guess it never occured to Mark Zuckerburg and Justin Rosenstein that not sharing my purchase history with Facebook is a valid privacy option. Why do I have to police this list and refer back to it every couple of days to figure out if some new Web site is now publishing my private data to Facebook without my permission?
I expect that kind of myopia and hubris from the Googles and Microsofts of the world not Facebook. Wow, the honeymoon lasted shorter than I expected.

I suspect that Facebook will loathe fixing both issues. The first issue can’t really be solved by having partner sites provide an opt-in mechanism because there is the valid concern that (i) people won’t opt-in to the feature and (ii) the experience and messaging will vary too much from site to site for users to have a consistent set of expectations. This then points to Facebook having an opt-in page for partner sites that is part of the Facebook settings page for this feature but that may start getting away from the add 3 lines of code to reach millions of users sales pitch which they have going. Adding a global opt-out button is also similarly fraught with down side for Facebook.

At this point, they’ll have to do something. I’ll be impressed if they address both issues. Anything less is simply not good enough.

PS: The technically inclined folks in the audience should take a look at Jay Goldman’s excellent Deconstruction of the Facebook Beacon Javascript. Found via Sam Ruby.

Now playing: Eightball & MJG - Relax & Take Notes (feat. Project Pat & Notorious B.I.G.)

Categories: Competitors/Web Companies | Social Software | Windows Live

November 26, 2007

@ 01:56 PM

Comments [3]

A Meme Tracker in IronPython

My weekend project was to read Dive Into Python and learn enough Python to be able to port Sam Ruby's meme tracker (source code) from CPython to Iron Python. Sam's meme tracker, shows the most popular links from the past week from the blogs in his RSS subscriptions. A nice wrinkle Sam added to his code is that more recent posts are weighted higher than older posts. So a newly hot item with 3 or 4 links posted yesterday ends up ranking higher than an item with 6 to 10 posts about it from five days ago. Using that sort of weighting probably wouldn't have occurred to me if I just hacked the feature on my own, so I'm glad I spent the time learning Sam's code.

There are a few differences between Sam's code and mine, the most significant being that I support two modes; showing the most popular items from all unread posts and showing the most popular items from the past week. The other differences mainly have to do with the input types (Atom entries vs. RSS feeds) and using .NET Libraries like System.Xml and System.IO instead of CPython libraries like libxml2 and blob. You can see the difference between both approaches for determining top stories on my feed subscriptions below

Top Stories in all Unread Posts

OpenSocial (score: 5.0)
The Future of Reading (score: 3.0)
MySpace (score: 3.0) [Ed Note: Mostly related to OpenSocial]

As you can probably tell, the weighted scoring isn't used when determining top stories in all unread posts. I did this to ensure that the results didn't end up being to similar for both approaches. This functionality is definitely going to make its way into RSS Bandit now that I've figured out the basic details on how it should work. As much as I'd like to keep this code in Iron Python, I'll probably port it to C# when integrating it for a number of practical reasons including maintainability (Torsten shouldn't have to learn Python as well), performance and better integration into our application.

Working with Python was a joy. I especially loved programming with a REPL. If I had a question about what some code does, it's pretty easy to write a few one or two liners to figure it out. Contrast this with using Web searches, trawling through MSDN documentation or creating a full blown program just to test the out some ideas when using C# and Visual Studio. I felt a lot more productive even though all I was using was Emacs and a DOS prompt.

I expected the hardest part of my project to be getting my wife to tolerate me spending most of the weekend hacking code. That turned out not to be a problem because it didn't take as long as I expected and for the most part we did spend the time together (me on the laptop, her reading The Other Boleyn Girl, both of us on the sofa).

There are at least two things that need some fine tuning. The first is that I get the title of the link from the text of the links used to describe it and that doesn't lead to very useful link text in over half of the cases. After generating the page, there may need to be a step that goes out to the HTML pages and extracts their title elements for use as link text. The second problem is that popular sites like Facebook and Twitter tend to show up every once in a while in the list just because people talk about them so much. This seems to happen less than I expected however, so this may not be a problem in reality.

Now I just have to worry about whether to call the button [Show Popular Stories] or [Show Most Linked Stories]. Thoughts?

import time, sys, re, System, System.IO, System.Globalization
from System import *
from System.IO import *
from System.Globalization import DateTimeStyles
import clr
clr.AddReference("System.Xml")
from System.Xml import *

#################################################################
#
# USAGE: ipy memetracker.py <directory-of-rss-feeds> <mode>
# mode = 0 show most popular links in unread items
# mode = 1 show most popular links from items from the past week
#################################################################

all_links = {}
one_week = TimeSpan(7,0,0,0)

cache_location = r"C:\Documents and Settings\dareo\Local Settings\Application Data\RssBandit\Cache"
href_regex = r"<a[\s]+[^>]*?href[\s]?=[\s\"\']+(.*?)[\"\']+.*?>([^<]+|.*?)?<\/a>"
regex = re.compile(href_regex)

(popular_in_unread, popular_in_past_week) = range(2)
mode = popular_in_past_week

class RssItem:
    """Represents an RSS item"""
    def __init__(self, permalink, title, date, read, outgoing_links):
        self.outgoing_links = outgoing_links
        self.permalink      = permalink
        self.title          = title
        self.date           = date
        self.read           = read

def MakeRssItem(itemnode):
    link_node = itemnode.SelectSingleNode("link")
    permalink = link_node and link_node.InnerText or ''
    title_node = itemnode.SelectSingleNode("title")
    title      = link_node and title_node.InnerText or ''
    date_node = itemnode.SelectSingleNode("pubDate")
    date       = date_node and DateTime.Parse(date_node.InnerText, None, DateTimeStyles.AdjustToUniversal) or DateTime.Now
    read_node = itemnode.SelectSingleNode("//@*[local-name() = 'read']")
    read       = read_node and int(read_node.Value) or 0
    desc_node = itemnode.SelectSingleNode("description")
    # obtain href value and link text pairs
    outgoing   = desc_node and regex.findall(desc_node.InnerText) or []
    outgoing_links = {}
    #ensure we only collect unique href values from entry by replacing list returned by regex with dictionary
    if len(outgoing) > 0:
        for url, linktext in outgoing:
            outgoing_links[url] = linktext
    return RssItem(permalink, title, date, read, outgoing_links)

if __name__ == "__main__":
    if len(sys.argv) > 1: #get directory of RSS feeds
        cache_location = sys.argv[1]
    if len(sys.argv) > 2: # mode = 0 means use only unread items, mode = 1 means use all items from past week
        mode           = int(argv[2]) and popular_in_past_week or popular_in_unread

    print "Processing items from %s seeking items that are %s" % (cache_location,
                                                                  mode and "popular in items from the past week"
                                                                  or "popular in unread items" )
    #decide what filter function to use depending on mode
    filterFunc = mode and (lambda x : (DateTime.Now - x.date) < one_week) or (lambda x : x.read == 0)
    #in mode = 0 each entry linking to an item counts as a vote, in mode = 1 value of vote depends on item age
    voteFunc   = mode and (lambda x: 1.0 - (DateTime.Now.Ticks - x.date.Ticks) * 1.0 / one_week.Ticks) or (lambda x: 1.0)

    di = DirectoryInfo(cache_location)
    for fi in di.GetFiles("*.xml"):
        doc = XmlDocument()
        doc.Load(Path.Combine(cache_location, fi.Name))
        # for each item in feed
        # 1. Get permalink, title, read status and date
        # 2. Get list of outgoing links + link title pairs
        # 3. Convert above to RssItem object
        items = [ MakeRssItem(node) for node in doc.SelectNodes("//item")]
        feedTitle = doc.SelectSingleNode("/rss/channel/title").InnerText
        # apply filter to pick candidate items, then calculate vote for each outgoing url
        for item in filter(filterFunc, items):
            vote = (voteFunc(item), item, feedTitle)
            #add a vote for each of the URLs
            for url in item.outgoing_links.Keys:
                if all_links.get(url) == None:
                    all_links[url] = []
                all_links.get(url).append(vote)

       # tally the votes, only 1 vote counts per feed
    weighted_links = []
    for link, votes in all_links.items():
        site = {}
        for weight, item, feedTitle in votes:
            site[feedTitle] = min(site.get(feedTitle,1), weight)
        weighted_links.append((sum(site.values()), link))
    weighted_links.sort()
    weighted_links.reverse()

    # output the results, choose link text from first item we saw story linked from
    print "<ol>"
    for weight, link in weighted_links[:10]:
        link_text = (all_links.get(link)[0])[1].outgoing_links.get(link)
        print "<li><a href='%s'>%s</a> (%s)" % (link, link_text, weight)
        print "<p>Seen on:"
        print "<ul>"
        for weight, item, feedTitle in all_links.get(link):
            print "<li>%s: <a href='%s'>%s</a></li>" % (feedTitle, item.permalink, item.title)
        print "</ul></p></li>"
    print "</ol>"

Categories: Programming

November 24, 2007

@ 02:28 PM

Comments [15]

Interested in "Python for C# Developers?"

I've been reading Dive Into Python for a couple of days have gotten far enough to tell what the following block of code does if the object variable is an arbitrary Python object

methodList = [method for method in dir(object) if callable(getattr(object, method))]
    processFunc = collapse and (lambda s: " ".join(s.split())) or (lambda s: s)
    print "\n".join(["%s %s" %
                      (method.ljust(spacing),
                       processFunc(str(getattr(object, method).__doc__)))
                     for method in methodList])

However I've been having a tougher time than I expected with learning Python using Dive Into Python for a couple of reasons, some to do with how the book is laid out and some to do with my choice of IronPython as my Python implementation of choice. The main problems I've been having are

IronPython doesn't have a number of standard libraries that are used in the book such as os, xml.dom, UserDict, sgmllib and so on. This means I can't construct or run a number of the more advanced samples in the book.
The book seems to dive into meaty topics before covering the basics. For example, introspection (aka reflection) and lambda functions (aka anonymous methods) are covered in Chapter 4 while defining classes and importing modules is covered in Chapter 5. Similarly, a discussion on for loops seems to have been removed from the most recent version of the book and replaced with a discussion on list comprehensions which are definitely superior but more alien to the novice.
I tend to learn comparatively. Thus I would love to get a more direct mapping of constructs in C# to those in Python with discussions on Python features that don't exist in C#. This is the tack I took with C# from a Java Developers perspective and it seems a lot of people found that useful.

For these reasons, I thought it would be useful to create a set of tutorials for myself that would address some of these issues I've been having. Then I wondered if other C# developers wouldn't find such an article or tutorial useful as well. Are there any C# developers out there that would actually be interested in this or am I an edge case?

PS: I wouldn't expect to get time to actually writing, editing and publishing such tutorials until the summer of next year at the earliest (which I expect should coincide with the availability of IronPython 2.0).

Categories: Programming

November 21, 2007

@ 04:00 AM

Comments [5]

The Joy of Programming: Learning Python and Simplifying Life

I’ve been working on simplifying my life and improving on my mental state over the last year or so. I’m now at the stage where I think I’ve gotten into a decent routine with diet and excercise. My next step [now that the rigors of buying the house and planning the wedding are over] is to broaden my programming horizons by learning a radically different programming language from my comfort zone, Python and C# respectively, while not harming my personal life or work habits.

It turns out that I can add an hour or two to my day by (i) leaving home earlier and thus avoiding traffic (ii) reading blogs less (iii) unsubscribing from most of the Microsoft internal mailing lists I was on and (iv) scheduling meetings so they are clumped together instead of having three meetings with 30 minutes in between each one thus burning up an hour of my time mentally twiddling my thumbs and checking email.

So far I’ve installed IronPython and python-mode. I’ve also started reading Dive into Python and have gotten as far as Chapter 3. I’d just like to thank folks like Mark Pilgrim, Jim Hugunin and Barry Warsaw who are gifting programmers with such wonderful resources. Right now I’m still trying to wrap my mind around Everything is An Object

Everything in Python is an object, and almost everything has attributes and methods.

This is so important that I'm going to repeat it in case you missed it the first few times: everything in Python is an object. Strings are objects. Lists are objects. Functions are objects. Even modules are objects.

All functions have a built-in attribute __doc__, which returns the doc string defined in the function's source code. The sys module is an object which has (among other things) an attribute called path. And so forth.

So far this is already an enjoyable experience for someone who has mostly been programming in Javascript (not object oriented, dynamic but weakly typed) and C# (statically typed, no REPL) for the past few years.

Once I’m done reading Dive into Python, my plan is to integrate Sam Ruby’s MeMeme 2.0 into RSS Bandit. That way even though I’ve stopped reading blogs regularly, I don’t end up finding out days later that Visual Studio 2008 and .NET Framework 3.5 were released because it wasn’t on TechMeme or programming.reddit.

Optimizing your life by writing code is fun. I guess this is what they call life hacking. Wink

Now playing: The Clash - Rock The Casbah

Categories: Programming

November 19, 2007

@ 03:21 PM

Comments [1]

What's Right and Wrong with Code Generation in Web Services

Via Steve Vinoski's Answers for Sergey I stumbled upon Sergey Beryozkin's Questions for Steve which started of with the following question

1. Do you think client code generation is evil ? If yes, do you expect people to do manual programming on a large scale ?

The problem with the software industry [or should I say humanity?] is that we like to take absolutist positions because they are easier to defend or argue against than admitting that the world is full of shades of gray.

Starting with the first question, code generation isn't necessarily bad let alone evil. However there are lots of problems with how code generation is implemented by the major vendors that support the SOAP/WSDL/XSD/WS-* family of technologies. Steve does a good job of laying out the problems with these approaches.

The first problem Steve points out is that a lot of these SOAP toolkits have implement some form of location transparency which tries to hide as much as possible the differences between invoking a remote system and calling a method on a local object. This behavior even flies in the face of SOA since one of the four tenets of service orientation is that boundaries are explicit. Another problem is that the inflexible and rigid requirements of static typing systems runs counter to the distributed and flexible nature of the Web. I posted a practical example a few years ago in my post entitled Why You Should Avoid Using Enumerated Types in XML Web Services. In that example, I pointed out that if you have a SOAP Web Service that returns an enumeration with the possible value {CDF, RSS10, RSS20} and in a future release modify that enumeration by adding a new syndication format {CDF, RSS10, RSS20, Atom} then even if you never return that syndication format to old clients written in .NET, these clients will still have to be recompiled because of the introduction of a new enumeration value. I find it pretty ridiculous that till today I have list of "people we need to call and tell to recompile their code whenever we change an enum value in any of our SOAP Web Services". Of course, some of this foolishness can be mitigated in a statically typed system using technologies like Java's dynamic class loading but that is really just an insecure workaround that tries to hide the fact that what you really need here is a dynamically typed system.

The second question is really asking whether we want developers writing XML processing code by hand instead of having a code generator do this work. Even though I used to work on the XML team at Microsoft, I do agree that it is a valid concern that you shouldn't want to spend a lot of effort writing code for parsing and processing XML if that is not the core function of your application. Again, Steve Vinoski hits the nail on the head with the suggestion to use standard data formats and MIME types. For example, if I decide to use application/atom+xml MIME type for the data that is returned by my RESTful Web service then clients can choose from a vast array of libraries for processing Atom feeds [such as the Windows RSS platform, ROME, Mark Pilgrim's Universal Feed Parser, etc] without having to write a lot of XML parsing code. If you must provide your own custom formats then it is imperative to make sure that it is easy to consume these formats from any platform by using a consistent and simple data model for the data format. A number of popular Web service APIs like the Flickr API, and the Facebook platform have provided client libraries for their APIs, this should be considered the exception and not the rule. Even in their case, it is interesting to note that a large proportion of the client libraries for these services are not actually maintained or developed by the creators of the service. This highlights the value of utilizing simple data formats and straightforward protocols. That way it isn't actually a massive undertaking for client developers to build and share libraries that abstract away the XML processing code. Of course, all of this can be avoided by just using standard MIME types and data formats that are already supported on a wide array of platforms instead of reinventing the wheel.

Categories: XML Web Services

November 19, 2007

@ 01:46 PM

Comments [5]

Guidelines for Building RESTful Web Services

In a comment on my previous post, pwb states

"REST" still suffers mightily from no real guidance on how exactly to construct something REST-ful. A dissertation doesn't cut it.

I guess it depends on what is meant by "real guidance". I can argue that there is no real guidance on how to construct object oriented systems, distributed applications or high performance data storage systems. Whether you agree or disagree with any of those statements depends on how "official" or "canonical" one expects such guidance to be.

If you are interested in building RESTful Web services. Here are the top 3 ways I suggest one learn about building and designing a RESTful system.

Learn the Atom Publishing Protocol: The best way to figure out how to build a RESTFul Web service is to actually use a well designed one and none is a better example of what a RESTful Web service should be than the Atom Publishing Protocol. For extra credit, you should also read up on Google's GData protocol and come up with a list of pros and cons of the approaches they've taken to extending AtomPub.
Read Joe Gregorio's "Restful Web" column on XML.com: Joe Gregorio is one of the editors of RFC 5023 (the Atom Publishing protocol) and is currently employed at Google working to make GData even more Web friendly. He started a series of articles on XML.com on building and designing RESTFul Web services complete with code samples. The entire list of articles can be found here but if you don't have the time to read them all, I suggest starting with How to Create a REST Protocol which covers the four decisions you must make as part of the design process for your service
- What are the URIs? [Ed note - This is actually "What are the resources?" ]
- What's the format?
- What methods are supported at each URI?
- What status codes could be returned?
If you have more time I suggest following that article up with Constructing or Traversing URIs? which contrasts client access models based on URI construction [where clients have baked into them knowledge of the URI structure of your service] or by URI traversal [where clients discover resources by following links either within the primary resources your service returns or via a service document that describes your service's end points]. Just because WSDL is a disaster doesn't mean that interface definition languages aren't useful or that they aren't needed in RESTful applications. Afterall even Atom has service documents.

And finally, because I don't believe you get to design a system without being familiar with what the code looks like and should do, I'd suggest reading Dispatching in a REST Protocol Application which walks through what the server side code for a particular RESTful service looks like. He even throws in some performance optimizations at the end.

Of course, I know you're actually going to read all the articles in the series because you're a dutiful lil' Web head and not shortcut seeker. Right? :)
Buy "RESTful Web Services" by Leonard Richardson, Sam Ruby, and David Heinemeier Hansson: If you are the kind of person who prefers to have a book than learning from "a bunch of free articles on the Internet" then RESTful Web Services is for you. I haven't read the book but have seen it on the desks of more co-workers than I'd care to admit and each of them favorably recommended it. Sam Ruby, Leonard Richardson and DHH know their stuff so you'll be learning at the feet of gurus.

To me, this seems like an abundance of resources for learning about building RESTful Web services. Now I understand that there are some for whom until it shows up on MSDN, IBM Developer Works or in a Gartner Report it might as well not exist. To these people all I can say is that "It must be great to have all that free time now that you have outsourced your thinking and business critical analysis to someone else". :)

Categories: XML Web Services

November 15, 2007

@ 12:43 PM

Comments [13]

WS-* is to REST as Theory is to Practice

Recently someone asked in the comments to one of my posts why I seem to be down on the WS-* family of technologies (XSD, WSDL, SOAP, etc) when just a few years ago I worked on Microsoft’s implementations of some of these technologies and used to blog about them extensively.

I was composing a response when I stumbled on James Snell’s notes on the recent QCon conference that captures the spirit of my “conversion” if you want to call it that. He wrote

Those who are familiar with my history with IBM should know that I was once a *major* proponent of the WS-* approach. I was one of the original members of the IBM Emerging Technologies Toolkit team, I wrote so many articles on the subject during my first year with IBM that I was able to pay a down payment on my house without touching a dime of savings or regular paycheck, and I was involved in most of the internal efforts to design and prototype nearly all of the WS-* specifications. However, over the last two years I haven’t written a single line of code that has anything to do with WS-*. The reason for this change is simple: when I was working on WS-*, I never once worked on an application that solved a real business need. Everything I wrote back then were demos. Now that I’m working for IBM’s WebAhead group, building and supporting applications that are being used by tens of thousands of my fellow IBMers, I haven’t come across a single use case where WS-* would be a suitable fit. In contrast, during that same period of time, I’ve implemented no fewer than 10 Atom Publishing Protocol implementations, have helped a number of IBM products implement Atom and Atompub support, published thousands of Atom feeds within the firewall, etc. In every application we’re working on, there is an obvious need to apply the fundamental principles of the REST architectural style. The applications I build today are fundamentally based on HTTP, XML, Atom, JSON and XHTML.

My movement towards embracing building RESTful Web services from being a WS-* advocate is based on my experiences as someone who worked on the fundamental building blocks of these technologies and then as someone who became a user of these technologies when I moved to ~~MSN~~ Windows Live. The seeds were probably sown when I found myself writing code to convert Microsoft’s GetTopDownloads Web service to an RSS feed because the SOAP Web service was more complicated to deal with and less useful than an RSS feed. Later on I realized that RSS was the quintessential RESTful Web service and just asking people “How many RSS feeds does Microsoft produce?” versus how many SOAP endpoints does Microsoft expose is illuminating in itself.

Since then we’ve reached a world where thousands of applications being utilized by millions of end users are built on RESTful Web services on the public internet. My favorite example of the moment is the Facebook developer platform and before that it was Flickr and Amazon S3. Compare that with the number of SOAP and WS-* interfaces that are being used to build real developer platforms that benefit end users on the Web today.

Earlier today, I was contemplating Don Box’s recent post where he complained about the diversity of authentication schemes of various RESTful Web services from the “J. Random Facebook/Flickr/GData” services on the Web today. Don seems to hint that WS-Security/WS-Trust would somehow solve this problem which is rightfully debunked by Sam Ruby who points out that all those technologies do is give you a more complicated version of the extensible authentication story that is available in HTTP. So the only real issue here is that there are actually enough RESTful Web services on the Internet for Don Box to complain about the diversity that comes from having a flexible authentication model for Web services. On the other hand, there are so few useful public WS-* Web services on the Web (read: zero) that Don Box hadn’t encountered the same problem with WS-Security/WS-Trust since no one is actually using them.

At this point I realize I’m flogging a dead horse. The folks I know from across the industry who have to build large scale Web services on the Web today at Google, Yahoo!, Facebook, Windows Live, Amazon, etc are using RESTful Web services. The only times I encounter someone with good things to say about WS-* is if it is their job to pimp these technologies or they have already “invested” in WS-* and want to defend that investment.

At the end of the day, my job is to enable successful developer platforms that enrich our users’ lives not pimp a particular technology. So if you are one of my readers and were wondering what was behind my shift from thinking that WS-* related technologies were the cat’s pajamas and my current RESTful Web services/APP/GData/Web3S bent, now you know.

Enjoy.

Now playing: Brooke Valentine - Girlfight (feat. Big Boi & Lil Jon)

Categories: XML Web Services

Dare Obasanjo's weblog

"You can buy cars but you can't buy respect in the hood" - Curtis Jackson

Navigation for Friday, 30 November 2007 - Dare Obasanjo's weblog

Sort out your friends.