December 1, 2007
@ 05:24 PM

Earlier this week I wrote a blog post which pointed out that the two major privacy and user experience problems with Facebook Beacon were that (i) it linked a user's Facebook account with an account on another site without the user's permission and (ii) there was no way for a user to completely opt out of being tracked by the system. Since then Facebook has announced some changes which TechCrunch has dubbed Facebook Beacon 2.0. The changes are excerpted below

Notification

Facebook users will see a notification in the lower right corner of the screen after transacting with a Beacon Affiliate. Options include “No Thanks” that will immediately stop the transaction from being published. Alternatively closing or ignoring the warning won’t immediately publish the story, but it will be put in a queue

Second Warning

Presuming you’ve ignored or closed the first notification, Facebook warns users again the next time they visit their home page. A new box reminds you that an activity has been sent to Facebook. Like the first notification you can choose to not publish the activity by hitting remove, or you can choose to publish it by hitting ok.

...

Opt Out
Found via the “External Websites” section of the Facebook Privacy page, this allows users to permanently opt in or out of Beacon notifications, or if you’re not sure be notified. The downside is that there is no global option to opt out of every Beacon affiliated program; it has to be set per program. Better this than nothing I suppose.

The interesting thing to note is that neither of the significant problems with Beacon has been fixed. After the changes were announced, there was a post on the CA Security Advisory blog titled Facebook's Misrepresentation of Beacon's Threat to Privacy: Tracking users who opt out or are not logged in which pointed out that the complaints about purchase history showing up in your friends' news feeds are a red herring; the real problem is that once a site signs up as a Facebook affiliate, it begins to share every significant action you take on the site with Facebook without your permission.

Which is worse, your friends knowing that you rented Prison Girls or Facebook finding that out and sharing it with their business partners, all without your permission? Aren't there laws against this kind of invasion of privacy? I guess there are (see 18 U.S.C. § 2710).

I wonder who'll be first to sue Facebook and Blockbuster? 

Anyway, back to the title of this blog post. The problem with Facebook Beacon is that it is designed in a way that makes it easy for Facebook Beacon affiliates to integrate it into their sites at the cost of users' privacy. From Jay Goldman's excellent post where he Deconstructed the Facebook Beacon Javascript we learn

Beacon from 10,000 Feet

That basically wraps up our tour of how Beacon does what it does. It's a fairly long explanation, so here's a quick summary:

  1. The partner site page includes the beacon.js file, sets a <meta> tag with a name, and then calls Facebook.publish_action.            
  2. Facebook.publish_action builds a query_params object and then passes it to Facebook._send_request.            
  3. Facebook._send_request dynamically generates an <iframe> which loads the URL http://www.facebook.com/beacon/auth_iframe.php and passes the query_params. At this point, Facebook now knows about the news feed item whether you choose to publish it or not.
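
To make the excerpt above concrete, the net effect is roughly the following (a Python sketch purely for illustration; only the auth_iframe.php endpoint comes from the deconstruction, and the parameter names here are guesses):

# What the dynamically generated <iframe> amounts to: an HTTP request to
# facebook.com carrying the partner site's identity and the user's action,
# fired before the user has approved anything. Facebook's own cookies ride
# along with that request, tying the action to a logged-in account.
import urllib

query_params = {
    "source": "affiliate-site.example",                  # hypothetical names/values
    "action": "added 'Prison Girls' to their queue",
}
beacon_url = ("http://www.facebook.com/beacon/auth_iframe.php?" +
              urllib.urlencode(query_params))
print beacon_url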

When you read Jay's deconstruction, you realize just how insidious the problem actually is. Facebook isn't simply learning about every action taken by Facebook users on affiliate sites; it is learning about every action taken by every user of these affiliate sites, regardless of whether they are Facebook users or not.

At first I assumed that the affiliate sites would call some sort of IsFacebookUser() API and then decide whether to send the action or not. Of course, this would still be broken since the affiliate site has told Facebook that you are a user of its site, and depending on the return value of the hypothetical function, the affiliate in turn learns that you are a Facebook user.

But no, it is actually worse than that. The affiliate sites are pretty much dumping their entire customer database into Facebook's lap, FOR FREE and without their customers' permission. What. The. Fuck.

The icing on the cake is the following excerpt from the Facebook Beacon page

Stories of a user's engagement with your site may be displayed in his or her profile and in News Feed. These stories will act as a word-of-mouth promotion for your business and may be seen by friends who are also likely to be interested in your product. You can increase the number of friends who see these stories with Facebook Social Ads.

So after giving Facebook millions of dollars in customer intelligence for free in exchange for spamming their users, Facebook doesn't even guarantee their affiliates that the spam will get sent. Instead these sites have to pay Facebook to "increase the chances" that they get some return for the free customer intelligence they just gave Facebook.

This reminds me of the story of Tom Sawyer tricking people into paying him to paint a fence he was supposed to paint as part of his chores.

At the end of the day, Facebook can't fix the privacy problems I mentioned in my previous post in a way that completely preserves their users' privacy without completely changing the design and implementation of Facebook Beacon. Until then, we'll likely see more misdirection, more red herrings and more violations of user privacy to make a quick buck.


 

Recently I logged into Facebook and saw a notification on my news feed that someone I'd met at Microsoft under professional circumstances had uploaded some pictures. I clicked on the link and saw what looked like a college prank. There was a picture of an entirely nude person being led down a hallway by a member of the opposite sex while others watched nonchalantly. Another picture had multiple naked people of the same sex in the aforementioned hallway, one of them in a suggestive position. After those two pictures I'd seen enough and clicked away.

The problem here is the one I've blogged about previously in posts like Facebook's Achilles Heel: Handling Multiple Social Contexts and that Cory Doctorow recently wrote about in his article How Your Creepy Ex-Co-Workers Will Kill Facebook. The person who uploaded the photos [who happens to be a college student] didn't consider that although the pictures may have been fine to share with college buddies, they probably aren't OK to share with people you've only met in a professional context. Facebook's poor handling of multiple social contexts (professional acquaintances vs. college buddies) caused an embarrassing situation for both of us. Cory Doctorow's article tells a similar story, which is excerpted below

Here's one of boyd's examples, a true story: a young woman, an elementary school teacher, joins Friendster after some of her Burning Man buddies send her an invite. All is well until her students sign up and notice that all the friends in her profile are sunburnt, drug-addled techno-pagans whose own profiles are adorned with digital photos of their painted genitals flapping over the Playa. The teacher inveigles her friends to clean up their profiles, and all is well again until her boss, the school principal, signs up to the service and demands to be added to her friends list. The fact that she doesn't like her boss doesn't really matter: in the social world of Friendster and its progeny, it's perfectly valid to demand to be "friended" in an explicit fashion that most of us left behind in the fourth grade. Now that her boss is on her friends list, our teacher-friend's buddies naturally assume that she is one of the tribe and begin to send her lascivious Friendster-grams, inviting her to all sorts of dirty funtimes.

Thus I was quite pleased to look at the list of upcoming Facebook features and notice the following item which indicates that similar occurrences will be mitigated in the future.

Sort out your friends.

We’ll let you organize that long list of friends into groups so you can decide more specifically who sees what.

Of course, the proof is in the pudding. Given that I have been disappointed by Facebook's recent attempts to fix flaws in their user experience, such as the thumbs up, thumbs down in the News Feed and Facebook Beacon 2.0, there's no guarantee that the ability to Sort out your friends described above will reduce the number of awkward social situations that occur when one's different social contexts are forced to blend together due to the existence of a single, unified "friends" list.

Cory Doctorow seems to think it is inevitable that no one will get this right when he writes

It's not just Facebook and it's not just me. Every "social networking service" has had this problem and every user I've spoken to has been frustrated by it. I think that's why these services are so volatile: why we're so willing to flee from Friendster and into MySpace's loving arms; from MySpace to Facebook. It's socially awkward to refuse to add someone to your friends list -- but removing someone from your friend-list is practically a declaration of war. The least-awkward way to get back to a friends list with nothing but friends on it is to reboot: create a new identity on a new system and send out some invites (of course, chances are at least one of those invites will go to someone who'll groan and wonder why we're dumb enough to think that we're pals).

That's why I don't worry about Facebook taking over the net. As more users flock to it, the chances that the person who precipitates your exodus will find you increases. Once that happens, poof, away you go -- and Facebook joins SixDegrees, Friendster and their pals on the scrapheap of net.history.

I agree with the sentiment but disagree with the conclusion, which seems to imply that no one is going to figure out how to get this right. I suspect Facebook will give it a good shot because the future of the company will eventually depend on it as the site's users grow older and have a longer history with the service. In the real world, we often reboot our social networks by switching jobs, graduating from school, getting married, etc. Unfortunately, social networking sites don't really account for that, which leads to the jumping around that Cory describes.

If Facebook doesn't want its users to start "graduating" to the next hot social networking site, as some users are doing by "graduating" from MySpace to Facebook, then it will have to figure out how to deal with multiple social contexts and people's need to reboot their social networks as they experience life changes.


 

Categories: Social Software


November 30, 2007
@ 04:00 AM

I had some free time last night, so I ported my meme tracker in IronPython over to C# and integrated it into the RSS Bandit source tree. Below is a screenshot of the results of clicking the [Top Stories] button on my feeds today.

Screenshot of RSS Bandit 'Top Stories' Feature

Now the fun part of the code is over and the fit & finish begins. I probably need some CSS help to render that list in a style that is a little more pleasant and less utilitarian. I was going to start off by copying the TechMeme look and feel but any other suggestions are welcome.

After that I need to figure out what configuration options we should have, if any. Right now it shows the top 10 stories from the past week using a weighted scoring mechanism that rewards newly popular items and penalizes older ones. So a page with 3 links from your subscriptions today will rank higher than an item with 5 links from your subscriptions from three days ago. I’m not sure if people will want to change any of those factors (number of posts shown, the date range or the weighted scoring system) but then again I hate cluttering the UI up with configuration options that only 1% of our user base will ever use.
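
For the curious, the weighting boils down to something like the following (a simplified sketch in Python rather than the shipping C#, with hypothetical names; the real code computes the age from the item timestamps):

# Each link from a subscribed feed is a "vote" whose weight decays linearly
# from 1.0 (just posted) to 0.0 (a week old), so newer chatter wins.
ONE_WEEK_HOURS = 7 * 24.0

def vote_weight(age_in_hours):
    return max(0.0, 1.0 - age_in_hours / ONE_WEEK_HOURS)

print 3 * vote_weight(0)        # 3 links from today      -> 3.0
print 5 * vote_weight(3 * 24)   # 5 links from 3 days ago -> ~2.86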

I should also decide if we show multiple posts from the same feed which link to the same item or just pick the newest one. Right now, I’m only showing a single post from the feed because I subscribe to planet feeds like MSDN Blogs and ASP.NET Weblogs which means that when there are announcements like the rebranding of Silverlight 1.1 to Silverlight 2.0 there are like 20 posts from those two feeds alone about it.

Finally, I was wondering about a “mark all items that reference this link as read” feature. For example, I really don’t need to read every post about Google’s clean energy aspirations or a geek dinner starring Robert Scoble and Dave Winer to get the gist of it. So being able to mark every post linking to these items as read once I read the main link would be nice. On the other hand, I am interested in all of the various opinions on Facebook Beacon from my subscriptions. So that rules out an automatic ‘mark as read’ feature which was my initial thought.  

Maybe I was wrong, there is still some fun coding left. Namaste.

Now playing: Dream - Shawty Is A 10 (remix) (feat. R. Kelly)


 

Categories: RSS Bandit

November 27, 2007
@ 04:00 AM

Recently I’ve read a number of negative posts about the Facebook Beacon which highlight how easy it is for a company to completely misjudge the privacy implications and ramifications of certain features in social software applications.

Charlene Li, a Principal Analyst at Forrester Research who specializes in social software trends and marketing, writes in her blog post Close encounter with Facebook Beacon

I put a lot of trust in sites like Facebook to do the right thing when it comes to privacy. After all, the only stuff that gets out into the public is the stuff that I actually put in. Until now.

Earlier this week, I bought a coffee table on Overstock.com. When I next logged into Facebook and saw this at the top of my newsfeed:

I was pretty surprised to see this, because I received no notification while I was on Overstock.com that they had the Facebook Beacon installed on the site. If they had, I would have turned it off.

I used my personal email address to buy the coffee table, so I was puzzled why and how this "personal" activity was being associated with my "public" Facebook profile.

David Treadwell, a corporate vice president of Windows Live, writes in his blog post entitled Blockbuster, you're fired

Yesterday evening, I decided to add a few movies to my Blockbuster queue. Upon adding movies, I was surprised to see toasts from Facebook showing up on the Blockbuster site indicating that something was being added to my Facebook news feed. When I finished adding movies, I went to Facebook to see what was going on. I was then quite surprised to learn that Blockbuster and Facebook were conspiring to broadcast my movie selections to my Facebook friends.

I am not normally uptight about privacy issues, but you guys really crossed the line on this one:

  • I had never told either Blockbuster or Facebook that you should share my movie selections with friends.
  • Neither of you asked me if you could take this action. You just went ahead and did it, assuming that I would not mind.
  • This sharing of information about me without my informed consent about the mechanism of sharing is absolutely unacceptable to me.

You can find similar complaints all over the Web from similarly Web savvy folks who you typically don’t see griping about privacy issues. In all of the complaints raised, the underlying theme is that Facebook violated the principle of putting the user in control of their user experience.

As someone who works on a competing service, I have to give the folks at Facebook credit for shipping Facebook Beacon so quickly. I assumed something like that was still about six months away from being on their radar. However, I give them poor marks for how the feature has been rolled out. There are several problems with the rollout when it comes to how it affects their users.

  1. Linking identities and data sharing without user permission: One of the things people have found creepy about this feature is that they are automatically discovered to be Facebook users on sites that they have never told they use Facebook. In Charlene's case, she actually uses different email addresses to log in on both sites, which must have seemed doubly weird to her at first. As Ethan Zuckerman points out in his post Facebook changes the norms for web purchasing and privacy, this completely upturns user expectations of how privacy on the Web works, especially when it comes to cookies.

    It's a genuine concern that Facebook has opened a Pandora's box when you consider what could happen if it is deemed socially acceptable for Web sites to use cookies to actively identify users across sites, as opposed to the passive way it is done today. I'm sure the folks at Google would be excited about this since, thanks to AdSense and DoubleClick, they probably have cookies on every computer on the Web that has cookies enabled in the Web browser. Today it's Facebook; tomorrow Amazon and eBay are posting your purchase history to every OpenSocial-enabled web site courtesy of the cookies from these sites or from Google ads on your machine.

  2. No global opt-out: There is no way to turn off this feature. The best you get is that when a site tries to publish an update to your news feed and mini-feed, you get an entry for the site added to your Privacy Settings for External Websites page on Facebook. I guess it never occurred to Mark Zuckerberg and Justin Rosenstein that not sharing my purchase history with Facebook is a valid privacy option. Why do I have to police this list and refer back to it every couple of days to figure out if some new Web site is now publishing my private data to Facebook without my permission?

    I expect that kind of myopia and hubris from the Googles and Microsofts of the world, not Facebook. Wow, the honeymoon was shorter than I expected.

I suspect that Facebook will be loath to fix both issues. The first issue can't really be solved by having partner sites provide an opt-in mechanism because there is the valid concern that (i) people won't opt in to the feature and (ii) the experience and messaging will vary too much from site to site for users to have a consistent set of expectations. This then points to Facebook having an opt-in page for partner sites as part of the Facebook settings page for this feature, but that may start getting away from the "add 3 lines of code to reach millions of users" sales pitch they have going. Adding a global opt-out button is similarly fraught with downside for Facebook.

At this point, they’ll have to do something. I’ll be impressed if they address both issues. Anything less is simply not good enough.

PS: The technically inclined folks in the audience should take a look at Jay Goldman’s excellent Deconstruction of the Facebook Beacon Javascript. Found via Sam Ruby.

Now playing: Eightball & MJG - Relax & Take Notes (feat. Project Pat & Notorious B.I.G.)


 

November 26, 2007
@ 01:56 PM

My weekend project was to read Dive Into Python and learn enough Python to be able to port Sam Ruby's meme tracker (source code) from CPython to IronPython. Sam's meme tracker shows the most popular links from the past week from the blogs in his RSS subscriptions. A nice wrinkle Sam added to his code is that more recent posts are weighted higher than older posts. So a newly hot item with 3 or 4 links posted yesterday ends up ranking higher than an item with 6 to 10 posts about it from five days ago. Using that sort of weighting probably wouldn't have occurred to me if I had just hacked the feature on my own, so I'm glad I spent the time learning Sam's code.

There are a few differences between Sam's code and mine, the most significant being that I support two modes: showing the most popular items from all unread posts and showing the most popular items from the past week. The other differences mainly have to do with the input types (Atom entries vs. RSS feeds) and using .NET libraries like System.Xml and System.IO instead of CPython libraries like libxml2 and glob. You can see the difference between both approaches for determining top stories on my feed subscriptions below

Top Stories in the Past Week

  1. Mobile Web: So Close Yet So Far (score: 1.71595943656)
  2. The Secret Strategies Behind Many "Viral" Videos (score: 1.52423410473)
  3. Live Documents is Powerful Stuff (score: 1.35218082421)

Top Stories in all Unread Posts

  1. OpenSocial (score: 5.0)
  2. The Future of Reading (score: 3.0)
  3. MySpace (score: 3.0) [Ed Note: Mostly related to OpenSocial]

As you can probably tell, the weighted scoring isn't used when determining top stories in all unread posts. I did this to ensure that the results didn't end up being too similar for both approaches. This functionality is definitely going to make its way into RSS Bandit now that I've figured out the basic details of how it should work. As much as I'd like to keep this code in IronPython, I'll probably port it to C# when integrating it, for a number of practical reasons including maintainability (Torsten shouldn't have to learn Python as well), performance and better integration into our application.

Working with Python was a joy. I especially loved programming with a REPL. If I had a question about what some code does, it's pretty easy to write a few one- or two-liners to figure it out. Contrast this with using Web searches, trawling through MSDN documentation or creating a full-blown program just to test out some ideas when using C# and Visual Studio. I felt a lot more productive even though all I was using was Emacs and a DOS prompt.

I expected the hardest part of my project to be getting my wife to tolerate me spending most of the weekend hacking code. That turned out not to be a problem because it didn't take as long as I expected and for the most part we did spend the time together (me on the laptop, her reading The Other Boleyn Girl, both of us on the sofa). 

There are at least two things that need some fine tuning. The first is that I get the title of a link from the text of the links pointing to it, and that doesn't lead to very useful link text in over half of the cases. After generating the page, there may need to be a step that goes out to the HTML pages and extracts their title elements for use as link text. The second problem is that popular sites like Facebook and Twitter tend to show up every once in a while in the list just because people talk about them so much. This seems to happen less than I expected however, so this may not be a problem in reality.
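
That first fix will probably look something like the following (a rough sketch that leans on the .NET WebClient from IronPython, since the rest of the script already uses .NET libraries; the names are illustrative and there's no error handling):

import re, clr
clr.AddReference("System")
from System.Net import WebClient

title_regex = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)

def fetch_page_title(url, fallback):
    # Download the page and use its <title> element as the link text,
    # falling back to whatever text the linking post used.
    html = WebClient().DownloadString(url)
    match = title_regex.search(html)
    return match and match.group(1).strip() or fallback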

Now I just have to worry about whether to call the button [Show Popular Stories] or [Show Most Linked Stories]. Thoughts?


import time, sys, re, System, System.IO, System.Globalization
from System import *
from System.IO import *
from System.Globalization import DateTimeStyles
import clr
clr.AddReference("System.Xml")
from System.Xml import *

#################################################################
#
# USAGE: ipy memetracker.py <directory-of-rss-feeds> <mode>
# mode = 0 show most popular links in unread items
# mode = 1 show most popular links from items from the past week
#################################################################

all_links = {}
one_week =  TimeSpan(7,0,0,0)

cache_location = r"C:\Documents and Settings\dareo\Local Settings\Application Data\RssBandit\Cache"
href_regex     = r"<a[\s]+[^>]*?href[\s]?=[\s\"\']+(.*?)[\"\']+.*?>([^<]+|.*?)?<\/a>"
regex          = re.compile(href_regex)

(popular_in_unread, popular_in_past_week) = range(2)
mode = popular_in_past_week

class RssItem:
    """Represents an RSS item"""
    def __init__(self, permalink, title, date, read, outgoing_links):
        self.outgoing_links = outgoing_links
        self.permalink      = permalink
        self.title          = title
        self.date           = date
        self.read           = read

def MakeRssItem(itemnode):
    link_node  = itemnode.SelectSingleNode("link")
    permalink  = link_node and link_node.InnerText or ''
    title_node = itemnode.SelectSingleNode("title")
    title      = title_node and title_node.InnerText or ''
    date_node  = itemnode.SelectSingleNode("pubDate")
    date       = date_node and DateTime.Parse(date_node.InnerText, None, DateTimeStyles.AdjustToUniversal) or DateTime.Now 
    read_node  = itemnode.SelectSingleNode(".//@*[local-name() = 'read']")
    read       = read_node and int(read_node.Value) or 0
    desc_node  = itemnode.SelectSingleNode("description")
    # obtain href value and link text pairs
    outgoing   = desc_node and regex.findall(desc_node.InnerText) or []
    outgoing_links = {}
    #ensure we only collect unique href values from entry by replacing list returned by regex with dictionary
    if len(outgoing) > 0:
        for url, linktext in outgoing:
            outgoing_links[url] = linktext
    return RssItem(permalink, title, date, read, outgoing_links)   

if __name__ == "__main__":
    if len(sys.argv) > 1: #get directory of RSS feeds
        cache_location = sys.argv[1]
    if len(sys.argv) > 2: # mode = 0 means use only unread items, mode = 1 means use all items from past week
        mode           = int(sys.argv[2]) and popular_in_past_week or popular_in_unread

    print "Processing items from %s seeking items that are %s" % (cache_location,
                                                                  mode and "popular in items from the past week"
                                                                  or "popular in unread items" )
    #decide what filter function to use depending on mode
    filterFunc = mode and (lambda x : (DateTime.Now - x.date) < one_week) or (lambda x : x.read == 0)
    #in mode = 0 each entry linking to an item counts as a vote, in mode = 1 value of vote depends on item age
    voteFunc   = mode and (lambda x: 1.0 - (DateTime.Now.Ticks - x.date.Ticks) * 1.0 / one_week.Ticks) or (lambda x: 1.0)

    di = DirectoryInfo(cache_location)
    for fi in di.GetFiles("*.xml"):     
        doc = XmlDocument()
        doc.Load(Path.Combine(cache_location, fi.Name))
        # for each item in feed       
        #  1. Get permalink, title, read status and date
        #  2. Get list of outgoing links + link title pairs
        #  3. Convert above to RssItem object
        items = [ MakeRssItem(node) for node in doc.SelectNodes("//item")]
        feedTitle = doc.SelectSingleNode("/rss/channel/title").InnerText
        # apply filter to pick candidate items, then calculate vote for each outgoing url
        for item in filter(filterFunc, items):
            vote = (voteFunc(item), item, feedTitle)
            #add a vote for each of the URLs
            for url in item.outgoing_links.Keys:
                if all_links.get(url) == None:
                    all_links[url] = []
                all_links.get(url).append(vote)

    # tally the votes, only 1 vote counts per feed
    weighted_links = []
    for link, votes in all_links.items():
        site = {}
        for weight, item, feedTitle in votes:               
            site[feedTitle] = min(site.get(feedTitle,1), weight)
        weighted_links.append((sum(site.values()), link))
    weighted_links.sort()
    weighted_links.reverse()

    # output the results, choose link text from first item we saw story linked from
    print "<ol>"   
    for weight, link in weighted_links[:10]:
        link_text = (all_links.get(link)[0])[1].outgoing_links.get(link)
        print "<li><a href='%s'>%s</a> (%s)" % (link, link_text, weight)
        print "<p>Seen on:"
        print "<ul>"
        for weight, item, feedTitle in all_links.get(link):
            print "<li>%s: <a href='%s'>%s</a></li>" % (feedTitle, item.permalink, item.title)
        print "</ul></p></li>" 
    print "</ol>"


 

Categories: Programming

I've been reading Dive Into Python for a couple of days and have gotten far enough to tell what the following block of code does if the object variable is an arbitrary Python object

methodList = [method for method in dir(object) if callable(getattr(object, method))]
processFunc = collapse and (lambda s: " ".join(s.split())) or (lambda s: s)
print "\n".join(["%s %s" %
                  (method.ljust(spacing),
                   processFunc(str(getattr(object, method).__doc__)))
                 for method in methodList])

However I've been having a tougher time than I expected learning Python from Dive Into Python, for a couple of reasons, some to do with how the book is laid out and some to do with my choice of IronPython as my Python implementation. The main problems I've been having are

  • IronPython doesn't have a number of standard libraries that are used in the book such as os, xml.dom, UserDict, sgmllib and so on. This means I can't construct or run a number of the more advanced samples in the book.
  • The book seems to dive into meaty topics before covering the basics. For example, introspection (aka reflection) and lambda functions (aka anonymous methods) are covered in Chapter 4 while defining classes and importing modules are covered in Chapter 5. Similarly, a discussion of for loops seems to have been removed from the most recent version of the book and replaced with a discussion of list comprehensions, which are definitely superior but more alien to the novice (see the short example after this list).
  • I tend to learn comparatively. Thus I would love to get a more direct mapping of constructs in C# to those in Python, with discussions of Python features that don't exist in C#. This is the tack I took with C# from a Java Developer's Perspective and it seems a lot of people found that useful.
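
To give a flavor of the comparative approach I have in mind, here's a trivial made-up example contrasting the explicit loop most C# developers would write first with the list comprehension Dive Into Python reaches for:

# The explicit loop a C# developer's muscle memory produces...
squares_of_evens = []
for n in range(10):
    if n % 2 == 0:
        squares_of_evens.append(n * n)

# ...and the list comprehension that says the same thing in one line.
squares_of_evens = [n * n for n in range(10) if n % 2 == 0]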

For these reasons, I thought it would be useful to create a set of tutorials for myself that would address some of these issues I've been having. Then I wondered if other C# developers wouldn't find such an article or tutorial useful as well. Are there any C# developers out there that would actually be interested in this or am I an edge case?

PS: I wouldn't expect to get time to actually write, edit and publish such tutorials until the summer of next year at the earliest (which I expect should coincide with the availability of IronPython 2.0).


 

Categories: Programming

I’ve been working on simplifying my life and improving my mental state over the last year or so. I’m now at the stage where I think I’ve gotten into a decent routine with diet and exercise. My next step [now that the rigors of buying the house and planning the wedding are over] is to broaden my programming horizons by learning a programming language radically different from my comfort zone, Python and C# respectively, while not harming my personal life or work habits.

It turns out that I can add an hour or two to my day by (i) leaving home earlier and thus avoiding traffic, (ii) reading blogs less, (iii) unsubscribing from most of the Microsoft internal mailing lists I was on and (iv) scheduling meetings so they are clumped together instead of having three meetings with 30 minutes in between each one, which burns up an hour of my time mentally twiddling my thumbs and checking email.

So far I’ve installed IronPython and python-mode. I’ve also started reading Dive into Python and have gotten as far as Chapter 3. I’d just like to thank folks like Mark Pilgrim, Jim Hugunin and Barry Warsaw who are gifting programmers with such wonderful resources. Right now I’m still trying to wrap my mind around Everything is An Object

Everything in Python is an object, and almost everything has attributes and methods.

This is so important that I'm going to repeat it in case you missed it the first few times: everything in Python is an object. Strings are objects. Lists are objects. Functions are objects. Even modules are objects.

All functions have a built-in attribute __doc__, which returns the doc string defined in the function's source code. The sys module is an object which has (among other things) an attribute called path. And so forth.
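
A minute at the REPL drives the point home (my own quick session, not from the book; the exact output varies by Python implementation):

>>> import sys
>>> type(sys), type(sys.path)      # modules and their attributes are objects
(<type 'module'>, <type 'list'>)
>>> def greet(name):
...     """Return a friendly greeting."""
...     return "Hello, " + name
...
>>> greet.__doc__                  # functions carry their doc string with them
'Return a friendly greeting.'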

So far this is already an enjoyable experience for someone who has mostly been programming in Javascript (not object oriented, dynamic but weakly typed) and C# (statically typed, no REPL) for the past few years.

Once I’m done reading Dive into Python, my plan is to integrate Sam Ruby’s MeMeme 2.0 into RSS Bandit. That way even though I’ve stopped reading blogs regularly, I don’t end up finding out days later that Visual Studio 2008 and .NET Framework 3.5 were released because it wasn’t on TechMeme or programming.reddit.  

Optimizing your life by writing code is fun. I guess this is what they call life hacking. Wink

Now playing: The Clash - Rock The Casbah


 

Categories: Programming

Via Steve Vinoski's Answers for Sergey I stumbled upon Sergey Beryozkin's Questions for Steve which started off with the following question

1. Do you think client code generation is evil ? If yes, do you expect people to do manual programming on a large scale ?

The problem with the software industry [or should I say humanity?] is that we like to take absolutist positions because they are easier to defend or argue against than admitting that the world is full of shades of gray.

Starting with the first question, code generation isn't necessarily bad, let alone evil. However there are lots of problems with how code generation is implemented by the major vendors that support the SOAP/WSDL/XSD/WS-* family of technologies. Steve does a good job of laying out the problems with these approaches.

The first problem Steve points out is that a lot of these SOAP toolkits implement some form of location transparency which tries to hide as much as possible the differences between invoking a remote system and calling a method on a local object. This behavior even flies in the face of SOA, since one of the four tenets of service orientation is that boundaries are explicit. Another problem is that the inflexible and rigid requirements of static typing systems run counter to the distributed and flexible nature of the Web. I posted a practical example a few years ago in my post entitled Why You Should Avoid Using Enumerated Types in XML Web Services. In that example, I pointed out that if you have a SOAP Web Service that returns an enumeration with the possible values {CDF, RSS10, RSS20} and in a future release modify that enumeration by adding a new syndication format {CDF, RSS10, RSS20, Atom}, then even if you never return that syndication format to old clients written in .NET, these clients will still have to be recompiled because of the introduction of a new enumeration value. I find it pretty ridiculous that till today I have a list of "people we need to call and tell to recompile their code whenever we change an enum value in any of our SOAP Web Services". Of course, some of this foolishness can be mitigated in a statically typed system using technologies like Java's dynamic class loading but that is really just an insecure workaround that tries to hide the fact that what you really need here is a dynamically typed system.
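
For contrast, here's the kind of thing a loosely coupled, dynamically typed consumer does (a sketch of my own, not from that post): treat the value as data, handle the cases it knows about, and shrug at a new value like Atom instead of failing at deserialization time.

from xml.dom.minidom import parseString

# Hypothetical response fragment after the service added the new enumeration value.
response = "<subscription><format>Atom</format></subscription>"

fmt = parseString(response).getElementsByTagName("format")[0].firstChild.data
if fmt in ("CDF", "RSS10", "RSS20"):
    print "known syndication format:", fmt
else:
    # An unexpected value is just an unfamiliar string, not a broken client.
    print "unrecognized syndication format, ignoring:", fmt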

The second question is really asking whether we want developers writing XML processing code by hand instead of having a code generator do this work. Even though I used to work on the XML team at Microsoft, I agree that it is a valid concern: you shouldn't have to spend a lot of effort writing code for parsing and processing XML if that is not the core function of your application. Again, Steve Vinoski hits the nail on the head with the suggestion to use standard data formats and MIME types. For example, if I decide to use the application/atom+xml MIME type for the data that is returned by my RESTful Web service, then clients can choose from a vast array of libraries for processing Atom feeds [such as the Windows RSS platform, ROME, Mark Pilgrim's Universal Feed Parser, etc.] without having to write a lot of XML parsing code. If you must provide your own custom formats then it is imperative to make sure that it is easy to consume these formats from any platform by using a consistent and simple data model for the data format. A number of popular Web service APIs, like the Flickr API and the Facebook platform, have provided client libraries for their APIs, but this should be considered the exception and not the rule. Even in their case, it is interesting to note that a large proportion of the client libraries for these services are not actually maintained or developed by the creators of the service. This highlights the value of utilizing simple data formats and straightforward protocols; that way it isn't actually a massive undertaking for client developers to build and share libraries that abstract away the XML processing code. Of course, all of this can be avoided by just using standard MIME types and data formats that are already supported on a wide array of platforms instead of reinventing the wheel.
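
To make that concrete, here's roughly what consuming a standard format looks like with Mark Pilgrim's Universal Feed Parser (a quick sketch; the feed URL is just a placeholder):

import feedparser

# No hand-written XML parsing code: the format and its data model are standard,
# so a general-purpose library can do all the work.
d = feedparser.parse("http://example.org/index.atom")
print d.feed.title
for entry in d.entries[:5]:
    print entry.title, "->", entry.link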


 

Categories: XML Web Services

In a comment on my previous post, pwb states

"REST" still suffers mightily from no real guidance on how exactly to construct something REST-ful. A dissertation doesn't cut it.

I guess it depends on what is meant by "real guidance". I can argue that there is no real guidance on how to construct object oriented systems, distributed applications or high performance data storage systems. Whether you agree or disagree with any of those statements depends on how "official" or "canonical" one expects such guidance to be.

If you are interested in building RESTful Web services, here are the top 3 ways I suggest you learn about building and designing a RESTful system.

  1. Learn the Atom Publishing Protocol: The best way to figure out how to build a RESTful Web service is to actually use a well-designed one, and none is a better example of what a RESTful Web service should be than the Atom Publishing Protocol. For extra credit, you should also read up on Google's GData protocol and come up with a list of pros and cons of the approaches they've taken to extending AtomPub.

  2. Read Joe Gregorio's "Restful Web" column on XML.com: Joe Gregorio is one of the editors of RFC 5023 (the Atom Publishing Protocol) and is currently employed at Google working to make GData even more Web friendly. He started a series of articles on XML.com on building and designing RESTful Web services, complete with code samples. The entire list of articles can be found here but if you don't have the time to read them all, I suggest starting with How to Create a REST Protocol which covers the four decisions you must make as part of the design process for your service

    • What are the URIs? [Ed note - This is actually "What are the resources?"]
    • What's the format?
    • What methods are supported at each URI?
    • What status codes could be returned?

    If you have more time I suggest following that article up with Constructing or Traversing URIs?, which contrasts client access models based on URI construction [where clients have knowledge of the URI structure of your service baked into them] with those based on URI traversal [where clients discover resources by following links either within the primary resources your service returns or via a service document that describes your service's end points]. Just because WSDL is a disaster doesn't mean that interface definition languages aren't useful or that they aren't needed in RESTful applications. After all, even Atom has service documents

    And finally, because I don't believe you get to design a system without being familiar with what the code looks like and should do, I'd suggest reading Dispatching in a REST Protocol Application, which walks through what the server side code for a particular RESTful service looks like (a toy sketch of that style of dispatching follows this list). He even throws in some performance optimizations at the end.

    Of course, I know you're actually going to read all the articles in the series because you're a dutiful lil' Web head and not a shortcut seeker. Right? :)

  3. Buy "RESTful Web Services" by Leonard Richardson, Sam Ruby, and David Heinemeier Hansson:  If you are the kind of person who prefers to have a book than learning from "a bunch of free articles on the Internet" then RESTful Web Services is for you. I haven't read the book but have seen it on the desks of more co-workers than I'd care to admit and each of them favorably recommended it.  Sam Ruby, Leonard Richardson and DHH know their stuff so you'll be learning at the feet of gurus. 
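
Since a paragraph is no substitute for Joe's articles, here is only a toy sketch (hypothetical resource names, using Python's wsgiref purely for illustration) of what those four design decisions look like once they turn into dispatching code:

import re
from wsgiref.simple_server import make_server

ENTRIES = {"1": "<entry><title>Hello</title></entry>"}   # fake in-memory resource store

def list_entries(environ, start_response, match):
    # GET /entries -> the collection resource as an Atom-ish representation
    start_response("200 OK", [("Content-Type", "application/atom+xml")])
    return ["<feed>" + "".join(ENTRIES.values()) + "</feed>"]

def get_entry(environ, start_response, match):
    # GET /entries/{id} -> a member resource, or a 404 status code
    entry = ENTRIES.get(match.group(1))
    if entry is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return ["no such entry"]
    start_response("200 OK", [("Content-Type", "application/atom+xml")])
    return [entry]

# The design decisions made explicit: which URIs exist and which methods they support.
ROUTES = [
    (re.compile(r"^/entries/?$"),     "GET", list_entries),
    (re.compile(r"^/entries/(\w+)$"), "GET", get_entry),
]

def application(environ, start_response):
    path, method = environ["PATH_INFO"], environ["REQUEST_METHOD"]
    for pattern, verb, handler in ROUTES:
        match = pattern.match(path)
        if match and method == verb:
            return handler(environ, start_response, match)
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return ["unknown resource or unsupported method"]

if __name__ == "__main__":
    make_server("localhost", 8080, application).serve_forever()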

To me, this seems like an abundance of resources for learning about building RESTful Web services. Now I understand that there are some for whom, until it shows up on MSDN, IBM Developer Works or a Gartner report, it might as well not exist. To these people all I can say is "It must be great to have all that free time now that you have outsourced your thinking and business critical analysis to someone else". :)


 

Categories: XML Web Services