Two seemingly unrelated posts flew by my aggregator this morning. The first was Robert Scoble’s post The shy Mark Zuckerberg, founder of Facebook where he talks about meeting the Facebook founder. During their conversation, Zuckerberg admits they made mistakes with their implementation of Facebook Beacon and will be coming out with an improved version soon.

The second post is from the Facebook developer blog and it is the announcement of the JavaScript Client Library for Facebook API which states

This JavaScript client library allows you to make Facebook API calls from any web site and makes it easy to create Ajax Facebook applications. Since the library does not require any server-side code on your server, you can now create a Facebook application that can be hosted on any web site that serves static HTML.

Although the pundits have been going ape shit over this on Techmeme, this is an unsurprising announcement given the precedent of Facebook Beacon. With that announcement, Facebook provided a mechanism for a limited set of partners to integrate with its feed.publishTemplatizedAction API using a JavaScript client library. Exposing the rest of their API using similar techniques was just a matter of time.

What was surprising to me when reading the developer documentation for the Facebook JavaScript client library is the following notice to developers

Before calling any other Facebook API method, you must first create an FB.ApiClient object and invoke its requireLogin() method. Once requireLogin() completes, it invokes the callback function. At that time, you can start calling other API methods. There are two ways to call them: normal mode and batched mode.

So unlike the original implementation of Beacon, the Facebook developers aren’t automatically associating your Facebook account with the third party site and then letting it party on your data. Instead, it looks like the user will be prompted to log in before the website can start interacting with their data on Facebook or giving Facebook data about the user.

This is a much better approach than Beacon and remedies the primary complaint from my Facebook Beacon is Unfixable post from last month.

Of course, I haven’t tested it yet to validate whether this works as advertised. If you get around to testing it before I do, let me know in the comments whether it works the way the documentation implies.


 

I’m not a regular user of Digg so I tend to ignore the usual controversy-of-the-month style storms that end up on Techmeme whenever they tweak their algorithm for promoting items from the upcoming queue to the front page. Today I decided to take a look at the current controversy and clicked on the story So Called Top Digg Users Cry About Digg Changes, which led me to the following comment by ethicalh

the algorithm uses the "social networking" part of digg now. if you are a mutual friend of the submitter your vote is cancelled and wont go towards promotion to the frontpage, and if you're a mutual friend of someone who has already dugg the story, your vote is cancalled and won't go towards promotion to the frontpage. so if you don't have a friends list and don't use the social networking part of digg then you can still be a top digger. you just need to create a new account and don't be tempted to add anyone as a friend, because the new algorithm is linked upto everyones friends list now. thats the real reason digg added social networking, is so they could eventually hook it upto the algorithm, thats the secret reason digg introduced social networking and friends lists onto the digg site.

A number of people confirmed the behavior in the comments. It seems that all votes from mutual friends (i.e. people on each other’s friends lists) are treated as a single vote. So if three of us are friends who all use Digg and have each other on our friends lists, and all three of us vote for a story, it is counted as a single vote. This is intended to subvert cliques of “friends” who vote up articles and to encourage diversity, or so the Digg folks claim in a quote from a New York Times article on the topic.
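
Digg hasn’t published the actual algorithm, but here’s a minimal C# sketch of how collapsing mutual friends’ votes could work, assuming the site already knows who is mutual friends with whom. All of the names and structure here are invented for illustration:

using System;
using System.Collections.Generic;
using System.Linq;

class DiggVoteSketch
{
    // Union-find over user ids so that each clique of mutual friends shares one root.
    static Dictionary<string, string> parent = new Dictionary<string, string>();

    static string Find(string user)
    {
        if (!parent.ContainsKey(user)) parent[user] = user;
        if (parent[user] != user) parent[user] = Find(parent[user]); // path compression
        return parent[user];
    }

    static void Union(string a, string b) { parent[Find(a)] = Find(b); }

    static void Main()
    {
        Union("alice", "bob"); // alice and bob are mutual friends

        string[] voters = { "alice", "bob", "carol" };

        // Votes from the same friend group only count once.
        int effectiveVotes = voters.Select(Find).Distinct().Count();
        Console.WriteLine(effectiveVotes); // prints 2, not 3
    }
}

Under a scheme like this, the only way to have your vote counted separately is to not befriend your fellow voters, which is exactly the perverse incentive discussed below.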

My first impression was that this change seems pretty schizophrenic on the part of the Digg developers. What is the point of adding all sorts of features that enable me to keep tabs on what other Digg users are voting up if you don’t want me to vote up the ones I like as well? Are people really supposed to trawl through http://www.digg.com/tech_news/upcoming to find stories to vote up instead of just voting up the ones their friends found that they thought were cool as well? Yeah…right.

But thinking deeper, I realize that this move is even more screwed up. The Digg developers are pretty much penalizing you for using a feature of the site. You actually get less value out of the site by using the friends feature: once you start using it, stories you like are less likely to be voted to the front page because your vote doesn’t count if someone on your friends list has already voted for the story.

So why would anyone want to use the friends feature once they realize this penalty exists?

Now playing: Dogg Pound - Ridin', Slipin' & Slidin'


 

Categories: Social Software

According to the blog post entitled Microsoft Joins DataPortability.org on dev.live.com, we learn

Today Microsoft is announcing that it has joined DataPortability.org, a group committed to advancing the conversation about the portability, security and privacy of individuals’ information online.  There are important security and privacy issues to solve as the internet evolves, and we are committed to being an integral part of the industry conversation on behalf of our users.

The decision to join DataPortability.org is an outgrowth of a deeper theme that technology and the internet should be deployed to help people be at the center of their online worlds, a theme that has begun to permeate our products and services over the past few years. We believe the logical evolution of the internet is to enable the removal of barriers to provide integrated, seamless experiences, but to do so in a manner that ensures that users retain full control over the security and privacy of their information.

Windows Live is focused on providing tools and a platform to enable these types of seamless experiences.  Windows Live has more than 420 million active Live IDs that work across our services and across partner sites. 

I’m sure some folks are wondering exactly what this means. Even though I was close to the decision making around this, I believe it is still too early to tell. Personally, I share Marc Canter’s skepticism about DataPortability.org given that so far there’s been a lot of hype but no real meat.

However we have real problems to solve as an industry. The lack of interoperability between various social software applications is troubling given that the Internet (especially the Web) got to be the success it is today by embracing interoperability instead of being about walled gardens fighting over who can build the prettiest gilded cage for their prisoners (er, customers). The fact that when interoperability happens, it is in back room deals (e.g. Google OpenSocial, Microsoft’s conversations with startups, etc.) instead of being open to all using standard and unencumbered protocols is similarly troubling. Even worse, insecure practices that expose social software users to privacy violations have become commonplace due to the lack of a common framework for interoperability.

As far as I can tell, DataPortability.org seems like a good forum for various social software vendors to start talking about how we can get to a world where there is actual interoperability between social software applications. I’d like to see real meat fall out of this effort, not fluff. One of the representatives Microsoft has chosen is the dev lead from the product team I am on (Inder Sethi), which implies we want technical discussion of protocols and technologies, not just feel-good jive. We’ll also be sending a product planning/marketing type (John Richards) to make sure the end user perspective is covered as well. You can assume that even though I am not on the working group in person, I will be there in spirit since I communicate with both John and Inder on a regular basis. Smile

I’ll also be at the O’Reilly offices during Super Bowl weekend attending the O’Reilly Social Graph FOO Camp which I hope will be another avenue to sit together with technical decision makers from the various major social software vendors and talk about how we can move this issue forward as an industry.

Now playing: Bone Thugs 'N Harmony - If I Could Teach The World


 

Categories: Windows Live

The top story in my favorite RSS reader is the article MapReduce: A major step backwards by David J. DeWitt and Michael Stonebraker. This is one of those articles that is so bad you feel dumber after having read it. The primary thesis of the article is

 MapReduce may be a good idea for writing certain types of general-purpose computations, but to the database community, it is:

  1. A giant step backward in the programming paradigm for large-scale data intensive applications
  2. A sub-optimal implementation, in that it uses brute force instead of indexing
  3. Not novel at all -- it represents a specific implementation of well known techniques developed nearly 25 years ago
  4. Missing most of the features that are routinely included in current DBMS
  5. Incompatible with all of the tools DBMS users have come to depend on

One of the worst things about an article like this is that it gets usually reasonable and intelligent-sounding people spouting off bogus responses as knee-jerk reactions to the article’s stupidity. A typical bogus reaction was the one from Rich Skrenta in his post Database gods bitch about mapreduce, which talks of "disruption" as if Google MapReduce were actually comparable to a relational database management system.

On the other hand, the good thing about articles like this is that you often get great responses from smart folks that further your understanding of the subject matter even though the original article was crap. For example, take the post from Google employee Mark Chu-Carroll entitled Databases are Hammers; MapReduce is a ScrewDriver where he writes eloquently that

The beauty of MapReduce is that it's easy to write. M/R programs are really as easy as parallel programming ever gets. So, getting back to the article. They criticize MapReduce for, basically, not being based on the idea of a relational database.

That's exactly what's going on here. They've got their relational databases. RDBs are absolutely brilliant things. They're amazing tools, which can be used to build amazing software. I've done a lot of work using RDBs, and without them, I wouldn't have been able to do some of the work that I'm proudest of. I don't want to cut down RDBs at all: they're truly great. But not everything is a relational database, and not everything is naturally suited towards being treated as if it were relational. The criticisms of MapReduce all come down to: "But it's not the way relational databases would do it!" - without ever realizing that that's the point. RDBs don't parallelize very well: how many RDBs do you know that can efficiently split a task among 1,000 cheap computers? RDBs don't handle non-tabular data well: RDBs are notorious for doing a poor job on recursive data structures. MapReduce isn't intended to replace relational databases: it's intended to provide a lightweight way of programming things so that they can run fast by running in parallel on a lot of machines. That's all it was intended to do.

Mark’s entire post is a great read.
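
For readers who haven’t run into MapReduce, here’s a toy single-machine word count in C# that illustrates the map/shuffle/reduce model Mark describes; the real system shards both phases across thousands of machines. The corpus and names are made up:

using System;
using System.Collections.Generic;
using System.Linq;

class MapReduceToy
{
    // Map phase: emit a (word, 1) pair for every word in an input record.
    static IEnumerable<KeyValuePair<string, int>> Map(string line)
    {
        foreach (string word in line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
            yield return new KeyValuePair<string, int>(word.ToLower(), 1);
    }

    static void Main()
    {
        string[] corpus = { "the quick brown fox", "the lazy dog", "the fox" };

        // The "shuffle" groups intermediate pairs by key; the reduce phase sums each group.
        var counts = corpus.SelectMany(Map)
                           .GroupBy(pair => pair.Key)
                           .Select(g => new { Word = g.Key, Count = g.Sum(p => p.Value) });

        foreach (var c in counts)
            Console.WriteLine(c.Word + ": " + c.Count);
    }
}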

Greg Jorgensen also has a good rebuttal in his post Relational Database Experts Jump The MapReduce Shark which points out that if the original article had been a critique of Web-based structured data storage systems such as Amazon’s SimpleDB or Google Base then the comparison may have been almost logical as opposed to being completely ridiculous. Wink

Now playing: Marvin Gaye - I Heard It Through the Grapevine


 

Categories: Web Development

Ari Steinberg who works on the Facebook developer platform has a blog post entitled New Rules for News Feed which states

As part of the user experience improvements we announced yesterday, we're changing the rules for how Feed stories can be published with the feed.publishTemplatizedAction API method. The new policy moving forward will be that this function should only be used to publish actions actively taken by the "actor" mentioned in the story. As an example, feed stories that say things like "John has received a present" are no longer acceptable. The product motivation behind this change is that Feed is a place to publish highly relevant stories about user activity, rather than passive stories promoting an application.

To foster this intended behavior, we are changing the way the function works: the "actor_id" parameter will be ignored. Instead the session_key used to generate the feed story will be used as the actor.


In order to ensure a high quality experience for users, starting 9am Pacific time Tuesday 22 January we may contact you, or in severe cases initiate an enforcement action, if your stories are not complying with the new policy, especially if the volume of non-complying stories is high.

If you are not a developer using the Facebook platform, it may be unclear what exactly this announcement means to end users or applications that utilize Facebook’s APIs.

To understand the impact of the Facebook announcement, it would be useful to first talk about the malicious behavior that Facebook is trying to curb. Today, an application can call feed.publishTemplatizedAction and publish a story to the user’s Mini-Feed (the list of all the user’s actions) which will also show up in the News Feeds of the user’s friends. Unfortunately some Facebook applications have been publishing stories that don’t really correspond to a user taking an action. For example, when a user installs the Flixster application, Flixster not only publishes a story to all of the user’s friends saying the user has installed the application but also publishes a story to the friends of each of the user’s friends that also have Flixster installed. This means updates get sent to my friends when I wasn’t actually doing anything with the Flixster application. I don’t know about you, but this seems like a rather insidious way for an application to spread “virally”.

Facebook’s attempt to curb such application spam is to require that an application have a session key that identifies the logged-in user when publishing the story, which implies that the user is actually using the application from within Facebook when the story is published. The problem with this remedy is that it totally breaks applications that publish stories to the Facebook News Feed when the user isn’t on the site. For example, since I have the Twitter application installed on Facebook, my Facebook friends get an update sent to their News Feeds whenever I post something new on Twitter.
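
To make the mechanics concrete, here is a rough server-side sketch of calling feed.publishTemplatizedAction against Facebook’s REST endpoint. The method name, session_key and actor_id parameters come from the announcement above; the endpoint URL, the title_template parameter and the omitted request signature are from memory and should be treated as illustrative rather than authoritative:

using System;
using System.Collections.Specialized;
using System.Net;
using System.Text;

class FeedPublishSketch
{
    static void Main()
    {
        var form = new NameValueCollection();
        form.Add("method", "facebook.feed.publishTemplatizedAction");
        form.Add("api_key", "YOUR_API_KEY");                   // placeholder
        form.Add("session_key", "USER_SESSION_KEY");           // now determines the actor
        form.Add("title_template", "{actor} rated a movie.");  // illustrative template
        // form.Add("actor_id", "12345");  // the old way to name the actor; now ignored
        // A real call also requires a computed 'sig' parameter, omitted here.

        using (var client = new WebClient())
        {
            byte[] response = client.UploadValues("http://api.facebook.com/restserver.php", form);
            Console.WriteLine(Encoding.UTF8.GetString(response));
        }
    }
}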

The problem for Facebook is that by limiting a valid usage of the API, they may have closed off a spam vector but have also closed off a valuable integration point for third party developers and for their users.

PS: There might be an infinite session key loophole to the above restriction which I’m sure Facebook will close off if apps start abusing it as well.

Now playing: Silkk the Shocker - It Ain't My Fault


 

I’ve finally broken down and used the Extension methods feature in C# 3.0. The feature is similar to the concept of “open classes” in Ruby described in Neal Ford’s post Are Open Classes Evil? which contains the following excerpt

Open classes in dynamic languages allow you to crack open a class and add your own methods to it. In Groovy, it's done with either a Category or the Expando Meta-class (which I think is a great name). JRuby allows you to do this to Java classes as well. For example, you can add methods to the ArrayList class thusly:

require "java"
include_class "java.util.ArrayList"
list = ArrayList.new
%w(Red Green Blue).each { |color| list.add(color) }

# Add "first" method to proxy of Java ArrayList class.
class ArrayList
def first
size == 0 ? nil : get(0)
end
end
puts "first item is #{list.first}"


Here, I just crack open the ArrayList class and add a first method (which probably should have been there anyway, no?).

In my case, I added a CanonicalizedUri() method to the System.Uri class because I was tired of all the special case code we needed for canonicalizing URIs; the Uri.AbsoluteUri property would not represent http://www.example.com and http://www.example.com/ as the same URI, among a couple of other special cases. My extension method is shown below

/// <summary>
/// Helper class used to add an extension method to the System.Uri class.
/// </summary>
public static class UriHelper
{
    /// <summary>
    /// Returns the URI canonicalized in the following way: (1) if the URI is a UNC or file URI then only the local part is returned,
    /// (2) for Web URIs the trailing slash and preceding "www." are removed.
    /// </summary>
    /// <param name="uri">The URI to canonicalize</param>
    /// <returns>The canonicalized URI as a string</returns>
    public static string CanonicalizedUri(this Uri uri)
    {
        if (uri.IsFile || uri.IsUnc)
            return uri.LocalPath;

        UriBuilder builder = new UriBuilder(uri);
        // strip a leading "www." from the host and a trailing slash from the path
        builder.Host = (builder.Host.ToLower().StartsWith("www.") ? builder.Host.Substring(4) : builder.Host);
        builder.Path = (builder.Path.EndsWith("/") ? builder.Path.Substring(0, builder.Path.Length - 1) : builder.Path);
        // drop the explicit port that UriBuilder inserts (e.g. ":80")
        return builder.ToString().Replace(":" + builder.Port + "/", "/");
    }
}

Now everywhere we used to have special case code for canonicalizing a URI, we just replace that with a call to uri.CanonicalizedUri(). It’s intoxicating how liberating it feels to be able to “fix” classes in this way.  
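
For illustration, here’s the sort of call site this enables; a minimal sketch assuming the UriHelper class above is in scope:

using System;

class Example
{
    static void Main()
    {
        Uri a = new Uri("http://www.example.com");
        Uri b = new Uri("http://example.com/");

        // Both canonicalize to "http://example.com/" so comparisons become trivial.
        Console.WriteLine(a.CanonicalizedUri() == b.CanonicalizedUri()); // True
    }
}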

I have seen some complain that coupling this feature with Intellisense (i.e. method name autocomplete) leads to an overwhelming experience. Compare the following screenshots from Jessica Fosler’s post using System.Linq, sometimes more isn't better, which show the difference between hitting ‘.’ on an Array with the System.Linq namespace included versus not. Note that the System.Linq namespace defines a number of extension methods for the Array class and its base classes.

WITHOUT SYSTEM.LINQ: Intellisense shows 20 methods on Array.

USING SYSTEM.LINQ: Intellisense shows 68 methods on Array.

I did find this disconcerting at first but I got used to it.

It should be noted that “open classes” in Ruby come with a bunch more features than extension methods in C# 3.0. For example, Ruby has remove_method and undef_method, which allow developers to remove methods from a class. This is useful if there is a particularly buggy or insecure method you’d rather wasn’t used by developers in your project. Much better than simply relying on the Obsolete attribute. Smile
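
For contrast, here is all the C# escape hatch amounts to; a minimal sketch with invented method names:

using System;

class LegacyApi
{
    // C# can't remove an existing method; the closest it gets is warning on use,
    // or failing the build outright when the attribute's second argument is true.
    [Obsolete("Insecure; call SafeParse instead.", true)]
    public static void UnsafeParse(string input) { }

    public static void SafeParse(string input) { }
}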

One problem I had with C# is that I can’t create extension properties, only extension methods (so my CanonicalizedUri() had to be a method instead of a property like Uri.AbsoluteUri). I assume this is due to the difficulty of coming up with a backwards compatible extension to the syntax for properties. Either way, it sucks. You can count me as another developer who is voting for extension properties in C# 4.0 (or whatever comes next).

Now playing: Oomp Camp - Intoxicated


 

Categories: Programming

I was reading the blog post entitled The hard side of Mister Softie from Josh Quittner of Fortune magazine which ends with the following excerpt

Hall said that Microsoft’s main concern, and the reason it sent out Big Foot letters in the first place, was security. “If you look at what a number of sites are doing, they’re asking for your Hotmail login info, They’re storing your identity, which is not a best practices [approach] for anyone’s data from a security standpoint. We want to make sure our data is kept between our users and our servers.”

The thrust of the term sheets, he said, was to create a process whereby Hotmail and other Windows Live data could be shared securely with third parties. Added Hall: “There are models for federation where you can trust other services—and that’s what we’re trying to do with our partners.”

That’s what doesn’t make sense to me. If this is such a security problem, why do Google and Yahoo let their users take their contacts with them?

Besides the obvious observation that folks at Google & Yahoo! probably don’t think it’s a good idea for random fly-by-night social networking services to be collecting usernames and passwords from users of their services (see posts like Spock sign-up flow demonstrates how to scare users away... from Jeremy Zawodny of Yahoo!), I am amused by the “if the geniuses at Google and Yahoo! think it’s OK, who are the Microsoft morons to think different” sentiment exposed by that statement.

Maybe I’m getting snarky in my old age. Wink

Now playing: Red Hot Chili Peppers - Torture Me


 

Categories: Social Software | Windows Live

From danah boyd: confused by Facebook

I'm also befuddled by the slippery slope of Facebook. Today, they announced public search listings on Facebook. I'm utterly fascinated by how people talk about Facebook as being more private, more secure than MySpace. By default, people's FB profiles are only available to their network. Join a City network and your profile is far more open than you realize. Accept the default search listings and you're findable on Google. The default is far beyond friends-only and locking a FB profile down to friends-only takes dozens of clicks in numerous different locations. Plus, you never can really tell because if you join a new network, everything is by-default open to that network (including your IM and phone number).

From Caroline McCarthy: Report: Facebook threatens to ban Gawker's Denton

Facebook isn't too happy with Gawker Media founder Nick Denton over some screenshots of a member's profile that he posted on Gawker.com on Tuesday, Portfolio.com reports. The social-networking site reportedly plans to send a warning letter to the New York-based digital-media entrepreneur citing several terms-of-service violations--one more, and he's out.

Facebook representatives were not immediately available for comment.

On Tuesday, Denton--who took over as managing editor of Gawker.com this month after several staff departures--posted a bit of an expose on 25-year-old Emily Brill, daughter of New York publishing figure Steve Brill. Screenshots of the younger Brill's Facebook profile, featuring glamorous photos of a yachting trip to the British Virgin Islands, as well as excited "status" messages about an impending trip to the Caribbean luxury getaway of St. Barth's, were juxtaposed with an older photograph of the Brown graduate when she was significantly heavier.

It's not clear whether Denton and Brill are "friends" on the site, or if it was even Denton (rather than a source or another Gawker Media employee) who pulled the screenshots from Facebook. But both Denton and Brill are members of the New York regional network, so there is a chance that Denton would have been able to view Brill's profile even without being connected as friends.

It boggles my mind that someone sat down and coded “Anyone who lives in the same city as me” as a privacy control and didn’t immediately smack themselves on the head for writing something so ridiculously useless and so guaranteed to cause privacy issues.

It would have been easier to have a notion of public profiles and appropriate scary warnings or defaults that protected people’s privacy than the farce that is “regional networks”.
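
To spell out the objection, the access check being criticized amounts to something like the following. This is a hypothetical sketch, not Facebook’s actual code:

using System;
using System.Collections.Generic;

class PrivacySketch
{
    class User
    {
        public HashSet<User> Friends = new HashSet<User>();
        public HashSet<string> Networks = new HashSet<string>();
    }

    // The model being criticized: merely sharing a regional network grants access.
    static bool CanViewProfile(User owner, User viewer)
    {
        return owner.Friends.Contains(viewer)
            || owner.Networks.Overlaps(viewer.Networks); // "lives in the same city as me"
    }

    static void Main()
    {
        var owner = new User { Networks = { "New York, NY" } };
        var stranger = new User { Networks = { "New York, NY" } };

        // A total stranger in the same region can see the profile by default.
        Console.WriteLine(CanViewProfile(owner, stranger)); // True
    }
}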

Now playing: Chris Brown - Say Goodbye


 

Categories: Social Software

I’ve noticed a meme that seems to have been going around in various blogs of folks who continue to indulge in the long since dead REST vs. SOAP discussion. This meme is that (i) you don’t want or need an interface definition language if you are building a RESTful Web Service and (ii) an interface definition language for RESTful Web Services has to look something like CORBA IDL or WSDL.

You can find examples of this kind of thinking in blog posts like Steve Vinoski’s IDLs vs. Human Documentation post excerpted below

Note that Patrick mostly talks about data schemas, whereas my posting talks only of interface definition languages. These are two very different things, which I’ve noted in comments on his blog. In a reply comment he said they’re both metadata, which is true, but still, they’re very separable. REST depends heavily on data definitions, but it doesn’t require specialized interface definitions because it promotes a uniform interface. For data definition REST relies on and promotes media/MIME types, and the standardization of such data definitions is critical to allowing independently-developed consumers and providers to interact correctly. I doubt Patrick and I really disagree on this last point.

and Ryan Tomayko’s Speaking of, "lying through their teeth..." also excerpted below

The WS-* folks have historically been obsessed with making things easy, usually for an imaginary business analyst who is nowhere near as technically adept as they. The REST folks, on the other hand, seem much more interested in keeping the entire stack simple, and for everyone involved.

This difference in priorities (easy vs. simple) often manifests itself in arguments about technological issues on the surface. Take the never ending debate about whether REST needs a description language like WSDL; which, incidentally, Sanjiva is largely responsible for. If building systems in your world can be made easier with the addition of a description language, then WSDL probably makes a lot of sense. If, however, building distributed systems in your world is a tediously hard pain in the ass whether you have these cockamamie description files or not, well, then you fight to keep the system as simple as possible by reducing the number of actors, dependencies, and concepts to an absolute minimum.

Let’s start with Steve Vinoski’s post. Steve is right to point out that there is a difference between data schemas and interface definition languages. When building services with WS-*, you have a WSDL to describe your methods & expected inputs/outputs and XSD schema(s) to describe the schemas for said inputs/outputs. When building a RESTful Web Service, the need for both of these documents does not go away regardless of how often you repeat the phrase “uniform interface”.

Steve argues that instead of using an XML schema to describe your document formats, you should rely on registered MIME types. The benefit of this is that you’ve broadened your horizon from thinking that the only payload for your Web service can be XML documents. The WS-* folks had to jump through lots of mental hoops to try and get non-XML data to fit in their model with wacky schemes like SOAP with Attachments (SwA), Direct Internet Message Encapsulation (DIME), WS-Attachments, Message Transmission Optimization Mechanism (MTOM) and XML-binary Optimized Packaging (XOP). All of this complexity existed because the fundamental design of WS-* is that all data going in and out of a SOAP Web service must either be an XML document or be modelled as an XML document.

However this doesn’t mean everything is plain sailing if you stick to only using registered IANA MIME types as the payloads of your Web services. What happens when you have a document format that doesn’t have a registered MIME type? You have two choices: you can either co-opt an existing MIME type and use it as an envelope format as Google has done with GData, or you can use your own custom XML format as Facebook has done with the Facebook REST API. In both cases, it would be useful for developers if your data schema were documented either in prose or via some XML schema language. This doesn’t require an interface definition language like WSDL nor does it require a schema definition language like XSD.

On the other hand, how does a client application discover your application’s service end points? Today, when you point your browser to my blog at http://www.25hoursaday.com/weblog, your browser automatically detects that I have an RSS feed. When you  point Windows Live Writer to a weblog, it automatically detects how to post and edit blog posts programmatically if your weblog software supports the Atom Publishing Protocol.

In the case of the RSS feed, your browser knows I have one by looking at the link element pointing to my RSS feed. The browser knows what to do with the file via the MIME type, and there is no interface to be defined because the only contract of an RSS feed is that it should support HTTP GET. On the flip side, an Atom service document, which is what Windows Live Writer reads to learn about your blog, describes the various service end points (i.e. collections) as well as the accepted inputs/outputs (either as MIME types or the hardcoded string ‘entry’ for Atom entries since they don’t have a MIME type).
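
To make that concrete, here is roughly what the two look like side by side; the URLs are placeholders:

<!-- RSS autodiscovery via a link element in the HTML head -->
<link rel="alternate" type="application/rss+xml"
      title="My Weblog" href="http://example.com/blog/rss.xml" />

<!-- a skeletal Atom Publishing Protocol service document -->
<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>My Weblog</atom:title>
    <collection href="http://example.com/blog/entries">
      <atom:title>Blog Entries</atom:title>
      <accept>application/atom+xml;type=entry</accept>
    </collection>
  </workspace>
</service>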

The examples of Atom service documents and link elements in HTML, highlight that there is real world value in describing the interfaces to your RESTful Web Service. In addition, Atom service documents show that you can define an interface definition language for Web services without resorting to reinventing CORBA IDL (i.e. WSDL). So I respectfully disagree with Ryan Tomayko…just because my life is made easier with a service description language doesn’t make WSDL a good idea.

Now playing: Dead Prez - Hell Yeah (Pimp the System)


 

Categories: XML Web Services

In his blog post entitled Joining Microsoft Live Labs Greg Linden writes

I am starting at Microsoft Live Labs next week.

Live Labs is an applied research group affiliated with Microsoft Research and MSN. The group has the enjoyable goal of not only trying to solve hard problems with broad impact, but also getting useful research work out the door and into products so it can help as many people as possible as quickly as possible.

Live Labs is led by Gary Flake, the former head of Yahoo Research. It is a fairly new group, formed only two years ago. Gary wrote a manifesto that has more information about Live Labs.

When I found out Greg was shutting down Findory, I thought to myself that he’d be a great hire for Microsoft, especially since he already lived in the area. It seems someone else thought the same thing and now Greg has been assimilated. Congratulations, Greg.

I seem to be bumping into more and more people who are either working for or with Live Labs. Besides Justin Rudd who I just referred to the team, there’s Mike Deem and Erik Meijer, two people I know from my days on the XML team. I wonder what Gary Flake is cooking up in those swanky offices in Bellevue that has so many smart folks gravitating to his group?

Now playing: Kool & The Gang - Celebration


 

It seems I missed the news when this happened last week but according to Greg Reinacker’s post NewsGator’s RSS clients are now free! 

We’ve got a lot of big news today at NewsGator.

First, we’ve got new releases of our most popular applications: FeedDemon 2.6, NetNewsWire 3.1, Inbox 3.0 (beta), and NewsGator Go! for Windows Mobile 2.0. Each of these is a pretty major release on its own - tons of new features in all of them.

But second, every one of those applications is now free! Free as in beer, that is. And add to the free list NewsGator Go! for BlackBerry as well. And not only are they free, but our online services (including synchronization) are now free as well! Not to mention our iPhone reader, HTML mobile reader, and all of the other applications that are part of our online platform.

According to Greg and Nick Bradbury, the reason they are doing this is that the bulk of their profits/revenues comes from selling enterprise licenses, and the desktop readers are now being used as advertising to get enterprise customers.

They also mention that the other reason they are giving away their desktop applications is that they see a lot of financial value in collecting information about what feeds their users are reading. My assumption was that this is because the demographic data is being resold to marketers, although both Greg and Nick make it seem like the collection of this data is benign and only used for end-user-facing features.

Anyway, this is pretty great news for fans of desktop RSS readers. If I didn’t already have RSS Bandit, FeedDemon would be my first choice when it comes to a desktop RSS reader for Windows. I also like the fact that we get a shout out as part of the setup experience for FeedDemon.

Thanks for the shout out Nick. Smile

Now that these apps are free, it does encourage us to step our game up with RSS Bandit. Right now, my thinking is that the official version number for the release currently codenamed Phoenix will be RSS Bandit 2.0. This release will deserve the moniker “2.0” for two reasons

  1. The user interface will be completely rewritten from scratch by Oren and Torsten using WPF.
  2. The application will become capable of being a full-blown desktop client to both Google Reader and NewsGator Online.

If I seem to have been blogging less, it is because I’ve been spending more time reading about code, thinking about code and writing code in my free time. I can’t wait to get the first beta out to you guys in a few months.

Now playing: Kid Rock - Welcome To The Party (Ode 2 the Old School)


 

Last year was the year of big changes in my personal life. I bought a house, got married and brought a very cute and lovable Shih Tzu into our household. Some time during 2007, I realized I'd been at Microsoft for over 5 years and decided that I'd look for change in my professional life as well.

I learned a couple of lessons from the experience. The first was that looking around for a job while trying to buy a house, moving into a new home and working towards getting married is pretty stressful. The second thing I learned was that I hadn't really thought about what I want from my career in several years. Back in my college days, I had a clear idea where I wanted to be within my first year of graduation and everything I did back then moved me closer to that goal, from the classes I took to ensuring that I interned every summer break. Since then, I haven't really had a "five year plan" to get me to the next stage in my career. I now have a much clearer idea where I want to be by 2010 than I have had in the past two or three years. Finally, I realized that I actually really like working at Microsoft, especially within my current job.

Ever since I came to that final realization I've wanted to blog about why this is the case but it seemed like such a corny thing to write about that I didn't want people reading this to think I was shilling for Microsoft. However this morning I was reading a blog post entitled Gone Indie by Jens Alfke which explained why he was leaving Apple Inc. after ten years and a lot of the reasons he is leaving are the same reasons I'm still at Microsoft. 

Social Software

Jens wrote

But I’m fascinated with social software. Apple isn’t. Despite some promising starts, the most I’ve been able to get accomplished in that vein at Apple was iChat [the IM part; I’m really not interested in videoconferencing], Safari RSS, and the “PubSub” [which turned out to be “RSS and Atom”] framework. There were some very promising prototypes of sexier things, but I really can’t talk about those, other than to say that they were canceled.

I looked around after Leopard was finished, and didn’t see any place in the company where I could pursue my ideas. It would have meant evangelizing reluctant executives into sharing my vision … and that’s something that I know I have little talent at.

I am similarly fascinated by social software and have been since I wrote down my epiphany Social Software is the Platform of the Future after a couple of conversations with my friend Mike Vernal. This epiphany is the reason I decided to start working on Microsoft's social graph contacts platform team, which is where I continue to work to this day. Three and a half years later, this epiphany has been borne out by the rise of MySpace and Facebook as well as the realization by the technocrat masses that without data portability, social software is the new vendor lock-in. This is all stuff Mike and I used to chat about back in 2004, and now Mike is off at Facebook and I'm here at Microsoft trying to make our visions a reality in our own little ways.

Unlike Jens, I don't have to evangelize reluctant execs into sharing my vision. A lot of our execs understand the importance of social software and have clear ideas of how Microsoft can add value to our users lives with our contributions to this space. When I talk to folks like Ray Ozzie, Chris Jones or David Treadwell about some of the problems I see in the social software space today, not only do they get it, I always leave the conversation with a strong sense that Microsoft will do the right thing.

Some people may criticize Microsoft for not being quick to jump onto every fad. However, as Phil Haack mentioned in his blog post about his first few days as a new Microsoft hire, Microsoft invests for the long run and expects its employees to think deeply about issues before acting. At the end of the day, the software we build in Windows Live impacts how hundreds of millions of people interact, share and communicate with their friends, family and loved ones. We endeavor to be good stewards of the trust they've placed in us.

Sharing Your Ideas

Jens wrote

I tend to have a lot of ideas. I’m not bragging, and that’s not always a good trait; it can be hard for me to focus on something long enough to finish it. A structured job has helped me stay on-task. On the other hand, though, the development cycle in a big company is such that every significant idea takes a year or more to finish, and during that time, more ideas pile up in my brain.

That wouldn’t be bad if there were some other channels to express those ideas. And if they took the form of songs, or novels, or scrimshaw carvings of Biblical scenes on walrus tusks, I could do whatever I wanted with them. But on software, Apple’s position (not unusually for the industry) is “All Your Idea Are Belong To Us”, and I signed onto that when I accepted the job offer. In other words, anything I do that relates in any way to Apple’s areas of business, no matter when or where I do it, belongs to Apple. [Edit: Ha! Note I’m still using present tense.]

(Again, this isn’t something particular about Apple. Most tech companies are like this, and if you work for one, you probably signed a very similar “Proprietary Rights Agreement” that they hid in the stack of paperwork beneath your offer letter. And yes, companies will enforce that if they see profit in it.)

I believe all Microsoft employees sign similar agreements with the company when hired. However, Microsoft is very good about letting employees explore their ideas in software on their own time without getting in the way. Projects like Script#, Reflector, RSS Bandit, DasBlog, Tweak UI and WiX are examples of software projects either developed or maintained by Microsoft employees in their free time that are now benefiting thousands to hundreds of thousands of end users.

However I think that even more important than being able to share our ideas in code is being able to share our ideas in words, which is one of the coolest things about working at Microsoft. Thousands of Microsoft employees share their ideas with their coworkers, competitors and customers via blogs on a daily basis. Lots of companies would clamp down on that sort of behavior and ensure that only sanctioned company positions go out in employee communications, but not Microsoft.

Even more surprisingly, Microsoft tolerates employees whose ideas differ from the company's ideas of how things should be done. You may wonder why that is surprising until you remember that even supposedly enlightened "Web 2.0" companies like Friendster and Google can fire you for disagreeing with the company's technology choices, hinting about future products or complaining about the company's benefits.

A lot of people [including Microsoft employees] wonder how I still have a job at Microsoft even though I've been critical of some of the company's strategies and products in my almost six years as an employee. Although I've had conversations with peers, middle managers and senior execs about my blog, I've never felt that my job was in danger. If anything, I've had it confirmed that Microsoft's culture is about being open and respectful. The one thing I have tried to change about my blog [and in fact all my communications] is being more respectful of others' perspectives and personal feelings, especially when I disagree with them, since you catch more flies with honey than with vinegar...or so I heard.

Individuality

Jens wrote

Finally — and this may seem petty — Apple’s lack of individuality bugs me. I don’t mean internally: within the company, communication is reasonably open (modulo confidentiality issues) and there’s lots of room for self-expression. But ever since the return of Steve Jobs, the company has been pretty maniacal about micro-managing its visible face, to make it as smooth and featureless as an iPod’s backside. (In my darker moments I’ve compared it to the brutal whiteness of “THX-1138”.)

It’s deeply ironic: For a company that famously celebrates individuality and Thinking Different, Apple has in the past decade kept its image remarkably impersonal. Other than the trinity who go onstage at press events — Steve Jobs, Jonathan Ive, Phil Schiller — how many people can you name who work for Apple? How many engineers?
...
And then there are blogs. Apple doesn’t like them, not when they talk about it. (Big surprise.) I’ve heard it said that there are hardly any bloggers working at Apple; there are actually a lot more than you’d think, but they mostly keep it a secret. (I could out a few people, including at least one director…) I think Apple’s policy on blogging is one of the least enlightened of major tech companies; Microsoft in particular is surprisingly open.

There really isn't much more I can add to that. The fact that you are reading my blog and know who I am is a testament to how much Microsoft encourages its employees to express their individuality in their products and in our communications with our customers.

This may not be a big deal in 2008 when everyone is blogging but it was back in 2003 when the early community of Microsoft bloggers could all fit at a table in a single restaurant. Especially when you consider that Microsoft bloggers are probably a large part of the reason corporate blogging is mainstream today. That alone is a worthy legacy in my book.

I'd like to leave you with this image from Scott Hanselman's post about joining Microsoft. Everyone's goal should be trying to get to the center of the picture.

Now Playing: Wu-Tang Clan - Can It Be All So Simple


 

Categories:

Yesterday:

After publishing an invitation to Facebook to join the DataPortability Working Group January 4, we never thought that Facebook would accept it. Today changes everything you’ve ever thought about social-networking data and lock-in before, because today Facebook, Google and Plaxo have joined the DataPortability Workgroup.

Today:

Joining the cavalcade of companies jumping on the open data bandwagon, LinkedIn has now joined Facebook, Google, Plaxo (announcement here) in joining the DataPortability Work Group.

LinkedIn has worked hard to become open since announcing their own open platform in June 2007 in response to Facebook, then becoming an initial OpenSocial launch partner in October 2007.

Update: Web developers from Flickr, SixApart, and Twitter have also joined.

Some folks at work asked me what I thought of DataPortability.org. The short version is that it reminds me of AttentionTrust, long on intention and short on implementation.

Principles are nice but working code is even better. 

Now playing: Aerosmith - Eat the Rich


 

Categories: Social Software

Brad Fitzpatrick, founder of Livejournal, has a blog post entitled Naming twins in Python & Perl where he writes

Last night at Beau's party, one of Beau's guests mentioned he's expecting twins shortly, which is why his wife wasn't at the party.
I drunkenly suggested he name his kids two names that were anagrams of each other. I then wandered off downstairs to find such suitable names.
Because I'm supposed to be working in Python these days, not Perl, I gathered what little Python knowledge I had and wrote out:

#!/usr/bin/python

by_anagram = {}

names_file = open("dist.male.first")
for line in names_file:
    # lines in file look like:
    # WILLIAM        2.451 14.812      5
    # we want just the first field.
    name = (line.split())[0]
    letters = [letter for letter in name]
    letters.sort()
    sorted_name = "".join(letters)
    if not sorted_name in by_anagram:
        by_anagram[sorted_name] = []
    by_anagram[sorted_name].append(name)

for sorted_name in by_anagram:
    if len(by_anagram[sorted_name]) < 2:
        continue

    print by_anagram[sorted_name]
Not so happy with it, but it worked, and I printed out the results and brought them up to the guy

You can guess where this is going. Below is my solution in C# 3.0

using System;
using System.Linq;
using System.IO;

namespace Name_agram{
    class Program{
        static void Main(string[] args){
            // group names by their sorted letters; each group is a set of anagrams
            var names    = from line in File.ReadAllLines("dist.male.first.txt")
                           let name = line.Split(' ')[0]
                           group name by new string(name.ToCharArray().OrderBy(x=>x).ToArray()) into anagrams
                           select anagrams;

            // print only groups containing more than one name
            foreach (var list_of_names in names) {
                if (list_of_names.Count() > 1)
                    Console.WriteLine(String.Join(" ",list_of_names.ToArray()));
            }
            Console.ReadLine();
        }
    }
}

There are a ton of solutions in various languages in the comments to Brad's post. Again, it is startling to see how different an idiomatic C# 3.0 solution using LINQ is from the traditional imperative/procedural style of programming.

Now Playing: Rick James - Give It To Me Baby


 

Categories: Programming

This is likely my last post in the Robert Scoble vs. Facebook saga but I think there are some subtle points being lost because of the typical blog feeding frenzy where people either choose to flame Facebook, Scoble or both. Robert Scoble has a post entitled Plaxo: the social monster? where he writes

Judi Sohn rips into the trustworthiness of both me and Plaxo for attempting to import email addresses, names, and birthdays.
...
What if I wrote down Judi’s email and then manually put it into my Outlook’s contact database. Wouldn’t that have been exactly the same thing that I tried to do with Plaxo’s script?

There are a couple of things wrong with Robert's analogy.

When I enter my personally identifiable information (PII) into Facebook, I am entering into a social contract with two entities. I am trusting Facebook to protect my data so it is safe from malicious hackers and not to sell it to malicious third parties like spammers or telemarketers; in return I provide Facebook with accurate data which improves their service and the user experience of the people in my social network. In addition, I am implicitly trusting the people in my social network not to abuse the privilege of having my personal information (e.g. by prank calling my cell phone or giving my personal details to third parties I don't trust).

There is a key difference between Robert taking personal information I shared with him on Facebook and importing it into Outlook versus importing it into Plaxo Pulse. In the former case, Robert is taking data I shared with him and viewing it in a different application. In the latter case, Robert is additionally sharing my personal details with a corporate entity, Plaxo, Inc. This is an entity that is synonymous with spam; at the time of writing this post there are 209,000 hits returned for a search for "Plaxo Spam" on the Google search engine. That is the key difference between Robert importing my personal details into Outlook and importing them into Plaxo Pulse.

Lots of geeks have focused on the fact that since it was possible for Robert to manually extract this data, the people sharing data with him shouldn't complain since they gave him access to it. This ignores the fact that just because something is technically possible doesn't make it right, even if it is legal. Just because it is technically possible for you to read the RSS feed for my blog and republish it on a splog so you can make money from AdSense ads doesn't make it right. Just because it is technically possible for you to view my photo albums on Windows Live Spaces doesn't mean I'd think it was OK to use Omar's Send to Smugmug script to republish these photos on Smugmug. Just because you have my phone number doesn't mean I think it is OK for you to share it with all your drinking buddies that want to work at Microsoft and need a recommendation. And so on...

In all of these cases, the social contract between us would have been broken. This is independent of whether it's technically possible for you to do these things by hand without needing a script or whatever.

Taking my data and sharing it with a third party without my permission isn't cool. Just because I shared information with you doesn't give you the right to share it with others.

 Now Playing: Eminem - Mockingbird


 

Categories: Social Software

One of the things to keep in mind when learning a new programming language is that it isn’t enough to learn the syntax and semantics of various language features you are unfamiliar with. Just as important is learning the idioms and way of thinking that goes with these language features. On reddit, ubernostrum pointed out that my usage of

if all_links.get(url) == None:

was jarring to read as a Python programmer when compared to the more idiomatic

if url not in all_links:

Of course this is just a stylistic issue but his point is valid. A similar thing happened with regards to other aspects of my recent post entitled Does C# 3.0 Beat Dynamic Languages at their Own Game? 

I argued that type inferencing and anonymous types in C# 3.0 did not offer the same degree of functionality that tuples and dynamic typing did when it came to processing intermediate values in a computation without requiring nominal types (i.e. named classes) to hold these values.

Specifically I had the following IronPython code,

IronPython Code

for item in filteredItems:
    vote = (voteFunc(item), item, feedTitle)

    # add a vote for each of the URLs
    for url in item.outgoing_links.Keys:
        if all_links.get(url) is None:
            all_links[url] = []
        all_links.get(url).append(vote)

# tally the votes, only 1 vote counts per feed
weighted_links = []
for link, votes in all_links.items():
    site = {}
    for weight, item, feedTitle in votes:
        site[feedTitle] = min(site.get(feedTitle, 1), weight)
    weighted_links.append((sum(site.values()), link))
weighted_links.sort()
weighted_links.reverse()

The key things to note about the above code block are (i) the variable named vote is a tuple of three values: the numeric weight given to a link received from a particular RSS item, the RSS item itself and the title of the feed, and (ii) the items in the tuple can be unpacked into individual variables when looping over the contents of the tuple in a for loop.

When I tried to write the same code in C# 3.0 with a vote variable that was an anonymous type, I hit a roadblock. When I placed instances of the anonymous type in the list, I had no way of knowing what the data type of the objects I’d be pulling out of the list would be when I wanted to extract them later to tally the votes. Since C# is statically typed, knowing the type’s name is a requirement for retrieving the objects from the list later unless I planned to interact with them as instances of System.Object and access their fields through reflection (or something just as weird).
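
Here is a minimal sketch of that roadblock, with made-up values:

using System;
using System.Linq;

class AnonymousTypeRoadblock
{
    static void Main()
    {
        // An anonymous-typed 'vote' standing in for (weight, item, feedTitle).
        var vote = new { Weight = 0.75, Item = "some RSS item", FeedTitle = "Some Feed" };

        // Within the method, type inference keeps the members accessible...
        var votes = new[] { vote }.ToList();
        Console.WriteLine(votes[0].Weight);

        // ...but returning 'votes' from this method would require writing down the
        // element type, and an anonymous type has no name you can utter. Falling
        // back to System.Object loses the members entirely:
        object o = vote;
        // Console.WriteLine(o.Weight); // compile error; reflection is the only way in
    }
}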

So in my C# 3.0 solution I ended up creating RankedLink and Vote types to simulate the functionality I was getting from tuples in Python.

However it turns out I was using anonymous types incorrectly. I tried to take a feature that was meant to be coupled with C# 3.0’s declarative Language Integrated Query (LINQ) and use it in the traditional imperative loop constructs I’ve been familiar with since my days programming in C.

Ian Griffiths set me straight with his blog post entitled Dare Obasanjo on C# Anonymous Types where he showed how to use anonymous types to get the solution I wanted without having to create unnecessary named types to hold intermediate values. Ian’s code is shown below

C# 3.0 Code

// calculate vote for each outgoing url
var all_links = from item in items
                from url in item.OutgoingLinks.Keys 
                group item by url into itemUrlGroup
                select new
                {
                  Url=itemUrlGroup.Key,
                  Votes=from item in itemUrlGroup
                        select new
                        {
                          Weight=voteFunc(item),
                          Item=item,
                          FeedTitle=feedTitle
                        }
                };

// tally the votes
var weighted_links = from link_n_votes in all_links
                     select new
                     {
                       Url=link_n_votes.Url,
                       Score=(from vote in link_n_votes.Votes
                              group vote by vote.FeedTitle into feed
                              select feed.Min(vote => vote.Weight)
                             ).Sum()
                     } into weighted_link
                     orderby weighted_link.Score descending
                     select weighted_link;

As you can see, Ian’s code performs the same task as the Python code does but with a completely different approach. The anonymous types are performing the same function as the Python tuples did in my previous code sample and there is no need to create RankedLink and Vote types to hold these intermediate values.

What I find interesting about this is that even though I’ve been using C# for the past five or six years, I feel like I have to relearn the language from scratch to fully understand or be able to take advantage of the LINQ features. Perhaps a few stints as a SQL developer may be necessary as well?

 


 

Categories: Programming

Paul Buchheit, creator of Gmail now the founder of FriendFeed, has a blog post entitled Should Gmail, Yahoo, and Hotmail block Facebook? where he writes

Apparently Facebook will ban you (or at least Robert Scoble) if you attempt to extract your friend's email addresses from the service. Automated access is a difficult issue for any web service, so I won't argue with their decision -- it's their service and they own you. However, when I signed up for Facebook I gave them my Gmail address and password, using their find friends feature:
...
So the question is, should Gmail, Yahoo, and Hotmail block Facebook (or close the accounts of anyone who uses Facebook's "friend finder") for violating their Terms of Use?

I don't want to single out Facebook here since pretty much every "Web 2.0" website with social features is very in-your-face about asking for your credentials from your email provider and then screen scraping your contacts' email addresses. I just signed up for Twitter and the user interface makes it cumbersome to even start using the service after creating an account without giving up your email username and password.

I think there are two questions here. The first is whether users should be able to extract their data [including social graph data] from one service and import it into another. I personally believe the answer is Yes and this philosophy underlies what we've been working on at Windows Live and specifically the team I'm on which is responsible for the social graph contacts platform.

The next question is whether screen scraping is the way to get this data. I think the answer is definitely not. The first problem with this approach is that when I give some random "Web 2.0" social network my email username and password, I’m not only giving them access to my address book but also access to all the email in my inbox. This seems like a lot of valuable data to trust to some fly-by-night "Web 2.0" service that can't seem to hire a full time sys admin or afford a full rack in a data center, let alone know how to properly safeguard my personal information.

Another problem with this approach is that it encourages users to give up their usernames and passwords when prompted by any random Web site which increases incidences of phishing. Some have gone as far as calling this approach an anti-pattern that is kryptonite to the Open Web.

Finally, there is no way to identify the application that is accessing data on the user's behalf if it turns out to be a malicious application. For example, if you read articles like Are you getting Quechup spammed you'll note that there's been more than one incident where a "Web 2.0" company turned out to either be spamming users via the email addresses they had harvested in this manner or straight up resold the email addresses to spammers. Have you ever wondered how much spam you get because someone who has your email address in their address book blithely gave up their email credentials to some social network site which in turn used a Web service run by spammers to retrieve their contact details?

So if I think that users should be able to get their data out yet screen scraping isn't the way, what should we do? At Windows Live, we believe the right approach is to provide user-centric APIs which allow users to grant and revoke permission to third party applications to access their personal data. For the specific case of social graph data, we've provided an ALPHA Windows Live Contacts API which is intended to meet exactly this scenario. The approach taken by this API and similar patterns (e.g. using OAuth) solves all three concerns I've raised above.

Now given what I've written above, do you think Hotmail should actively block or hinder screen scraping applications used to obtain the email addresses of a user's contacts?

Categories: Platforms | Windows Live

I’ve read a number of stories this week that highlight that interoperability between social networking sites will be a “top ask” in 2008 (as we say at Microsoft). Earlier this week I read the Wired article Should Web Giants Let Startups Use the Information They Have About You? which does a good job of telling both sides of the story when it comes to startups importing user data such as social graphs (i.e. friend and contact lists) from more successful sites as a way to bootstrap their own social networks.

Yesterday, I saw Social Network Aggregation, Killer App in 2008? which points out the problem that users often belong to multiple social networks at once and that bridging between them is key. However I disagree with the premise that this points to the need for a “Social Network Aggregator” category of applications. I personally believe that the list of 20 or so Social Network Aggregators on Mashable are all companies that would cease to exist if the industry got off its behind and worked towards actual interoperability between social networking sites.

Today, I saw that Facebook disabled Robert Scoble’s account. After reading Robert’s account of the incident, I completely agree with Facebook.

Why Robert Scoble is Wrong and Facebook is Right

Here’s what Robert Scoble wrote about the incident

My account has been “disabled” for breaking Facebook’s Terms of Use. I was running a script that got them to keep me from accessing my account

I am working with a company to move my social graph to other places and that isn’t allowable under Facebook’s terms of service. Here’s the email I received:

+++++

Hello,

Our systems indicate that you’ve been highly active on Facebook lately and viewing pages at a quick enough rate that we suspect you may be running an automated script. This kind of Activity would be a violation of our Terms of Use and potentially of federal and state laws.

As a result, your account has been disabled. Please reply to this email with a description of your recent activity on Facebook. In addition, please confirm with us that in the future you will not scrape or otherwise attempt to obtain in any manner information from our website except as permitted by our Terms of Use, and that you will immediately delete and not use in any manner any such information you may have previously obtained.

The first thing to note is that Facebook allows you to extract your social graph data from their site using the Facebook platform. In fact, right now whenever I get an email from someone on my Facebook friend list in Outlook or I get a phone call from them, I see the picture from their Facebook profile. I did this using OutSync which is an application that utilizes the Facebook platform to merge data from my contacts in Outlook/Exchange with my Facebook contacts.

So if Facebook allows you to extract information about your Facebook friends via their APIs, why would Robert Scoble need to run a screen scraping script? The fact is that the information returned by the Facebook API about a user contains no contact information (no email address, no IM screen names, no telephone numbers, no street address). Thus if you are trying to “grow virally” by spamming the Facebook friend list of one of your new users about the benefits of your brand new Web 2.0 site then you have to screen scrape Facebook. However there is the additional wrinkle that, unlike address books in Web email applications, Robert Scoble did not enter any of this contact information about his friends. With this in mind, it is hard for Robert Scoble to argue that the data is “his” to extract from Facebook. In addition, as a Facebook user I consider it a feature that Facebook makes it hard for my personal data to be harvested in this way. Secondly, since Robert’s script was screen scraping, it had to hit the site five thousand times (once for each of his contacts) to fetch all of his friends’ personally identifiable information (PII). Given that eBay won a court injunction against Bidder’s Edge for running 100,000 queries a day, it isn’t hard to imagine that the kind of screen scraping script Robert is using would be considered malicious even by a court of law.

I should note that Facebook is being a bit hypocritical here since they do screen scrape other sites to get the email addresses of the contacts of new users. This is why I’ve called them the Social Graph Roach Motel in the recent past. 

O’Reilly Social Graph FOO Camp

This past weekend I got an email from Tim O'Reilly, David Recordon, and Scott Kveton inviting me to a Friends of O’Reilly Camp (aka FOO Camp) dedicated to “social graph” problems. I’m still trying to figure out if I can make it based on my schedule and whether I’m really the best person to be representing Microsoft at such an event given that I’m a technical person and “social graph problems” for the most part are not technical issues.

Regardless of whether I am able to attend or not, there were some topics I wanted to recommend be added to a list of “red herring” topics that shouldn’t be discussed until the important issues have been hashed out.

  • Google OpenSocial: This was an example of unfortunate branding. Google should really have called this “Google OpenWidgets” or “Google Gadgets for your Domain” since the goal was competing with Facebook’s widget platform, not actually opening up social networks. Since widget platforms aren’t a “social graph problem” it doesn’t seem fruitful to spend time discussing this when there are bigger fish to fry.

  • Social Network Portability: When startups talk about “social network portability” it’s usually a euphemism for collecting a person’s username and password for another site, retrieving their contact/friend list and spamming those people about their hot new Web 2.0 startup. As a user of the Web, making it easier to receive spam from startups isn’t something I think should be done, let alone a “problem” that needs solving. I understand that lots of people will disagree with this [even at Microsoft] but I’m convinced that this is not the real problem facing the majority of users of social networking sites on the Web today.

What I Want When It Comes to Social Network Interoperability

Having said what I don’t think is important to discuss when it comes to “social graph problems”, it would be rude not to provide an example of what I think would be a fruitful discussion. I wrote about the problem I think we should be solving as an industry a while back in a post entitled A Proposal for Social Network Interoperability via OpenID which is excerpted below

I have a Facebook profile while my wife has a MySpace profile. Since I’m now an active user of Facebook, I’d like her to be able to be part of my activities on the site such as being able to view my photos, read my wall posts and leave wall posts of her own. I could ask her to create a Facebook account, but I already asked her to create a profile on Windows Live Spaces so we could be friends on that service and quite frankly I don’t think she’ll find it reasonable if I keep asking her to jump from social network to social network because I happen to try out a lot of these services as part of my day job. So how can this problem be solved in the general case?

This is a genuine user problem which the established players have little incentive to fix. The data portability folks want to make it easy for you to jump from service to service. I want to make it easy for users of one service to talk to people on another service. Can you imagine if email interoperability had been achieved by making it easy for Gmail users to export their contacts to Yahoo! Mail instead of Gmail users simply being able to send email to Yahoo! Mail users and vice versa?

Think about that.

Now playing: DJ Drama - The Art Of Storytellin' Part 4 (Feat. Outkast And Marsha Ambrosius)


For the past few years I've heard a lot of hype about dynamic programming languages like Python and Ruby. The word on the street has been that their dynamic nature makes developers more productive than those of us shackled to statically typed languages like C# and Java. A couple of weeks ago I decided to take the plunge and start learning Python after spending the past few years doing the majority of my software development in C#. I learned that it was indeed true that you could get the same stuff done in far fewer lines of Python than you could in C#. Since it is a general truism in the software industry that the number of bugs per thousand lines of code is constant irrespective of programming language, the more you can get done in fewer lines of code, the fewer defects you will have in your software.

Shortly after I started using Python regularly as part of the prototyping process for developing new features for RSS Bandit, I started trying out C# 3.0. I quickly learned that a lot of the features I'd considered language bloat a couple of months ago actually made a lot of sense if you're familiar with the advantages of dynamic and functional programming approaches to software development. In addition, C# 3.0 actually fixed one of the problems I'd encountered in my previous experience with a dynamic programming language while in college.

Why I Disliked Dynamism: Squeak Smalltalk

Back in my college days I took one of Mark Guzdial's classes which involved a group programming project using Squeak Smalltalk. At the time, I got the impression that Squeak was composed of a bunch of poorly documented libraries cobbled together from the top graded submissions to assignments from Mark's class. What was particularly frustrating about the lack of documentation was that even looking at method prototypes gave no insight into how to call a particular library. Here's a SalariedEmployee class taken from an IBM Smalltalk tutorial

"A subclass of Employee that adds protocol needed for 
          employees with salaries"

       Employee subclass: #SalariedEmployee
          instanceVariableNames:  'position salary'
          classVariableNames: ' '
          poolDictionaries: ' ' !

       ! SalariedEmployee publicMethods !

          position: aString 
             position := aString !
          position 
             ^position !
          salary: n 
             salary := n !
          salary 
             ^salary ! !

In the example above, there is a method called salary() that takes a parameter n whose type we don't know. n could be a string, an integer or a floating point number. If you are using the SalariedEmployee class in your code and want to set the employee's salary, the only way to find out what to pass in is to grep through the code and find out how the method is being used by others. You can imagine how frustrating it gets when every time you want to perform a basic task like making a Web request you have to grep around trying to figure out if the url parameter you pass to the Http classes is a string, a Uri class or some other random thing you haven't encountered yet.

For a long time, this was my only experience with a dynamic programming language and I thought it sucked...a lot.

Why I Grew to Love Dynamism: XML and C#

The first half of my career at Microsoft was spent working on the XML team which was responsible for the core XML processing APIs that are utilized by the majority of Microsoft's product line. One of the things that was so cool about XML was that it enabled data formats to be either strongly structured or semi-structured depending on the needs of the application. This flexibility is what gives us data formats like the Atom syndication format which, although rigidly structured in parts (e.g. atom:entry elements MUST contain exactly one atom:id element, etc.), also supports semi-structured data (e.g. atom:content can contain blocks of XHTML) and enables distributed extensibility where anyone on the Web is free to extend the data format as long as they place their extensions in the right namespace.

However one problem we repeatedly bumped against is that data formats in which unknown data types can show up at runtime run counter to the notion of static typing that is a key aspect of languages like C#. I've written about this in the past in posts such as What's Right and Wrong with Code Generation in Web Services which is excerpted below

Another problem is that the inflexible and rigid requirements of static typing systems run counter to the distributed and flexible nature of the Web. I posted a practical example a few years ago in my post entitled Why You Should Avoid Using Enumerated Types in XML Web Services. In that example, I pointed out that if you have a SOAP Web Service that returns an enumeration with the possible values {CDF, RSS10, RSS20} and in a future release modify that enumeration by adding a new syndication format {CDF, RSS10, RSS20, Atom} then even if you never return that syndication format to old clients written in .NET, these clients will still have to be recompiled because of the introduction of a new enumeration value. I find it pretty ridiculous that till today I have a list of "people we need to call and tell to recompile their code whenever we change an enum value in any of our SOAP Web Services".

I came to the realization that some degree of dynamism is desirable especially when dealing with the loosely coupled world of distributed programming on the Web. I eventually decided to ignore my earlier misgivings and start exploring dynamic programming languages. I chose IronPython because I could focus on learning the language while relying on the familiar .NET Framework class library when I wanted to deal with necessary tasks like file I/O or Web requests.

After getting up to speed with Python and then comparing it to C# 2.0, it was clear that the dynamic features of Python made my life as a programmer a lot easier. However something interesting happened along the way. Microsoft shipped C# 3.0 around the same time I started delving into Python. As I started investigating C# 3.0, I discovered that almost all the features I'd fallen in love with in Python which made my life as a developer easier had been integrated into C#. In addition, a feature considered to be a killer feature of the Ruby programming language also made it into C# 3.0.

Python vs. C# 3.0: Lambda Expressions

According to the Wikipedia entry on Dynamic Programming Languages

There are several mechanisms closely associated with the concept of dynamic programming. None are essential to the classification of a language as dynamic, but most can be found in a wide variety of such languages.
...
Higher-order functions
However, Erik Meijer and Peter Drayton caution that any language capable of loading executable code at runtime is capable of eval in some respect, even when that code is in the form of dynamically linked shared libraries of machine code. They suggest that higher-order functions are the true measure of dynamic programming, and some languages "use eval as a poor man's substitute for higher-order functions."[1]

The capability of a programming language to treat functions as first class objects that can be the input(s) or the output(s) of a function call is a key feature of many of today's popular "dynamic" programming languages. Additionally, a shorthand syntax in which anonymous blocks of code can be treated as function objects is now commonly known as "lambda expressions". Although C# has had functions as first class objects since version 1.0 with delegates and introduced anonymous delegates in C# 2.0, it is in C# 3.0 where the shorthand syntax of lambda expressions has found its way into the language. Below are source code excerpts showing the difference between the lambda expression functionality in C# and IronPython

C# Code

//decide what filter function to use depending on mode 
Func<RssItem, bool> filterFunc = null;
if(mode == MemeMode.PopularInPastWeek) 
   filterFunc = x => (DateTime.Now - x.Date < one_week) ;
else 
   filterFunc = x => x.Read == false;

IronPython Code

#decide what filter function to use depending on mode
filterFunc = mode and (lambda x : (DateTime.Now - x.date) < one_week) or (lambda x : x.read == 0)

Although the functionality is the same, it takes a few more lines of code to express the same idea in C# 3.0 than in Python. The main reason for this is due to the strong and static typing requirements in C#. Ideally developers should be able to write code like 

Func<RssItem, bool> filterFunc = (mode == MemeMode.PopularInPastWeek ? x => (DateTime.Now - x.Date < one_week) : x => x.Read == false);

However this doesn’t work because the compiler cannot determine whether the lambda expressions returned by the two branches of the conditional expression are of the same type. Despite the limitations due to the static and strong typing requirements of C#, the lambda expression feature in C# 3.0 is extremely powerful.

You don’t have to take my word for it. Read Joel Spolsky’s Can Your Programming Language Do This? and Peter Norvig’s Design Patterns in Dynamic Programming.  Peter Norvig’s presentation makes a persuasive argument that a number of the Gang of Four’s Design Patterns either require a lot less code or are simply unneeded in a dynamic programming language that supports higher order functions. For example, he argues that the Strategy pattern does not need separate classes for each algorithm in a dynamic language and that closures eliminate the need for Iterator classes. Read the entire presentation, it is interesting and quite illuminating.

Python vs. C# 3.0: List Comprehensions vs. Language Integrated Query

A common programming task is to iterate over a list of objects and either filter or transform the objects in the list, thus creating a new list. Python has list comprehensions as a way of simplifying this common programming task. Below is an excerpt from An Introduction to Python by Guido van Rossum on list comprehensions

List comprehensions provide a concise way to create lists without resorting to use of map(), filter() and/or lambda. The resulting list definition tends often to be clearer than lists built using those constructs. Each list comprehension consists of an expression followed by a for clause, then zero or more for or if clauses. The result will be a list resulting from evaluating the expression in the context of the for and if clauses which follow it.

Below is a code sample showing how list comprehensions can be used to first transform a list of objects (i.e. XML nodes) into another (i.e. RSS items) and then how the resulting list can be further filtered down to candidate items.

IronPython Code

# for each item in feed
# convert each <item> to an RssItem object then apply filter to pick candidate items
items = [MakeRssItem(node) for node in doc.SelectNodes("//item")]
filteredItems = [item for item in items if filterFunc(item)]

My friend Erik Meijer once observed that certain recurring programming patterns become more obvious as a programming language evolves: the patterns are first encapsulated by APIs and eventually become part of the programming language’s syntax. This is what happened in the case of Python’s map() and filter() functions which eventually gave way to list comprehensions.

C# 3.0 does something similar but goes a step further. In C# 3.0, the language designers made the observation that SQL-like projection and selection is really the common operation, not just the filtering/mapping of lists. This led to Language Integrated Query (LINQ). Below is the same filtering operation on a list of XML nodes performed using C# 3.0

C# 3.0 Code

//for each item in feed        
// convert each <item> to an RssItem object then apply filter to pick candidate items
var items = from rssitem in 
              (from itemnode in doc.Descendants("item") select MakeRssItem(itemnode))
            where filterFunc(rssitem)
            select rssitem;

These are two fundamentally different approaches to tackling the same problem. Where LINQ really shines is when it is combined with custom data sources that have their own query languages such as with LINQ to SQL and LINQ to XML which map the query operations to SQL and XPath queries respectively.

Python vs. C# 3.0: Tuples and Dynamic Typing vs. Anonymous Types and Type Inferencing

As I’ve said before, tuples are my favorite Python feature. I’ve found tuples useful in situations where I have to temporarily associate two or three objects and don’t want to go through the hassle of creating a new class just to represent the temporary association between these types. I’d heard that a new feature in C# 3.0 called anonymous types seemed like it would be just what I needed to fix this pet peeve once and for all. The description of the feature is as follows

Anonymous types are a convenient language feature of C# and VB that enable developers to concisely define inline CLR types within code, without having to explicitly define a formal class declaration of the type.

I assumed this feature in combination with the var keyword would mean I would no longer miss Python tuples when I worked with C#. However I was wrong. Let’s compare two equivalent blocks of code in C# and IronPython. Pay particular attention to the highlighted lines

IronPython Code

for item in filteredItems:
    vote = (voteFunc(item), item, feedTitle)

    # add a vote for each of the URLs
    for url in item.outgoing_links.Keys:
        if all_links.get(url) == None:
            all_links[url] = []
        all_links.get(url).append(vote)

# tally the votes, only 1 vote counts per feed
weighted_links = []
for link, votes in all_links.items():
    site = {}
    for weight, item, feedTitle in votes:
        site[feedTitle] = min(site.get(feedTitle, 1), weight)
    weighted_links.append((sum(site.values()), link))
weighted_links.sort()
weighted_links.reverse()

The key things to note about the above code block are (i) the variable named vote is a tuple of three values: the numeric weight given to a link received from a particular RSS item, the RSS item itself and the title of the feed, and (ii) the items in the tuple can be unpacked into individual variables when looping over the contents of the votes list in a for loop.

Here’s the closest I could come in C# 3.0

C# 3.0 Code

// calculate vote for each outgoing url
 foreach (RssItem item in items) { 
       var vote = new Vote(){ Weight=voteFunc(item), Item=item, FeedTitle=feedTitle };
       //add a vote for each of the URLs
       foreach (var url in item.OutgoingLinks.Keys) {
           List<Vote> value = null;
           if (!all_links.TryGetValue(url, out value))
                value = all_links[url] = new List<Vote>(); 
                            
           value.Add(vote);                                                    
         }
   }// foreach (RssItem item in items)
//tally the votes
  List<RankedLink> weighted_links = new List<RankedLink>();
  foreach (var link_n_votes in all_links) {
       Dictionary<string, double> site = new Dictionary<string, double>();
       foreach (var vote in link_n_votes.Value) {
           double oldweight;
           site[vote.FeedTitle] = site.TryGetValue(vote.FeedTitle, out oldweight) ? 
                                  Math.Min(oldweight, vote.Weight): vote.Weight; 
        }
        weighted_links.Add(new RankedLink(){Score=site.Values.Sum(), Url=link_n_votes.Key});
    }
    weighted_links.Sort((x, y) => y.Score.CompareTo(x.Score));

The relevant line above is

var vote = new Vote() { Weight=voteFunc(item), Item=item, FeedTitle=feedTitle };

which I had INCORRECTLY assumed I would be able to write as

var vote = new { Weight=voteFunc(item), Item=item, FeedTitle=feedTitle };

In Python, dynamic typing is all about the developer knowing what types they are working with while the compiler is ignorant about the data types. However type inferencing in C# supports the opposite scenario, when the compiler knows the data types but the developer does not.  

The specific problem here is that if I place an anonymous type in a list, I have no way of knowing what the data type of the objects I’m pulling out of the list will be. So I will either have to interact with them as instances of System.Object when popped from the list, which makes them pretty useless, or access their fields via reflection. Python doesn’t have this problem because I don’t need to know the type of an object to interact with it, I just need to know what properties/fields and methods it supports.

At the end of the day, I realized that the var keyword is really only useful when constructing anonymous types as a result of LINQ expressions. In every other instance where it is acceptable to use var, you have to know the type of the object anyway so all you are doing is saving keystrokes by using it. Hmmmm.

Ruby vs. C# 3.0: Extension Methods

Extension methods are a fairly disconcerting feature that has been popularized by the Ruby programming language. The description of the feature is excerpted below

Extension methods allow developers to add new methods to the public contract of an existing CLR type, without having to sub-class it or recompile the original type.  Extension Methods help blend the flexibility of "duck typing" support popular within dynamic languages today with the performance and compile-time validation of strongly-typed languages.

I consider this feature to be the new incarnation of operator overloading. Operator overloading became widely reviled because it made code harder to read: you couldn’t just look at a code block and know what it does if you didn’t know how the operators had been implemented. Similarly, looking at an excerpt of C# code you may not realize that everything isn’t what it seems.

I spent several minutes being confused today because I couldn’t get the line

XAttribute read_node = itemnode.XPathEvaluate("//@*[local-name() = 'read']") as XAttribute;

to compile. It turns out that XPathEvaluate is an extension method and you need to import the System.Xml.XPath namespace into your project before the XPathEvaluate() method shows up as a method on the XElement class.

I’ve heard the arguments that Ruby makes it easier to express programmer intent and although I can see how XElement.XPathEvaluate(string) is a more readable choice than XPathQueryEngine.Evaluate(XElement, string) if you want to perform an XPath query on an XElement object, for now I don’t think the readability issues it causes by hiding dependencies are worth it. I wonder if any Ruby developers out there with a background in other dynamic languages that don’t have this feature (e.g. Python) care to offer a counter opinion based on their experience?

Final Thoughts

C# has added features that make it close to being on par with the expressiveness of functional and dynamic programming languages. The only thing missing is dynamic typing (not duck typing), which I’ve come to realize has a lot more going for it than lots of folks in the strongly and statically typed world would care to admit. At first, I had expected that after getting up to speed with C# 3.0, I’d lose interest in Python but that is clearly not the case.

I love the REPL, I love the flexibility that comes from having natural support for tuples in the language and I love the more compact syntax. I guess I’ll be doing a lot more coding in Python in 2008.

Now Playing: Da Back Wudz - U Gonna Love Me


Categories: Programming

January 2, 2008
@ 03:05 AM

A few weeks ago, I wrote a prototype for the meme tracking feature of RSS Bandit in IronPython. The code was included in my blog post A Meme Tracker In IronPython. The script was a port of Sam Ruby's original MeMeme script which shows the most recently popular links from a set of RSS feeds.

I was impressed with how succinct the code was in IronPython when compared to what the code eventually looked like when I ported it to C# 2.0 and integrated it into RSS Bandit. Looking over the list of new features in C# 3.0, it occurred to me that a C# 3.0 version of the script would be as concise or even more concise than the IronPython version. So I ported the script to C# 3.0 and learned a few things along the way.

I'll post something shortly that goes into some details on my perspectives on the pros and cons of the various C# 3.0 features when compared to various Python features. For now, here's the meme tracker script in C# 3.0. Comparing it to the IronPython version should provide some food for thought.


using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;
using System.Globalization;

namespace Memetracker {

    enum MemeMode { PopularInUnread, PopularInPastWeek }

    class RankedLink{
       public string Url { get; set;}
       public double Score { get; set; }   
    }

    class Vote {
        public double Weight { get; set; }
        public RssItem Item { get; set; }
        public string FeedTitle { get; set; }
    }


    class RssItem {
        public string Title { get; set; }
        public DateTime Date { get; set; }
        public bool Read { get; set; }
        public string Permalink { get; set; }
        public Dictionary<string, string> OutgoingLinks { get; set; }
    }

    class Program {

        static Dictionary<string, List<Vote>> all_links = new Dictionary<string, List<Vote>>();
        static TimeSpan one_week = new TimeSpan(7, 0, 0, 0);
        static MemeMode mode = MemeMode.PopularInPastWeek;

        static string cache_location = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData), "Temp");
        static string href_regex = @"<a[\s]+[^>]*?href[\s]?=[\s""']+(.*?)[\""']+.*?>([^<]+|.*?)?<\/a>";
        static Regex regex       = new Regex(href_regex);

        static RssItem MakeRssItem(XElement itemnode) {

            XElement link_node = itemnode.Element("link");
            var permalink = (link_node == null ? "" : link_node.Value);
            XElement title_node = itemnode.Element("title");
            var title = (title_node == null ? "" : title_node.Value);
            XElement date_node = itemnode.Element("pubDate");
            var date = (date_node == null ? DateTime.Now : DateTime.Parse(date_node.Value, null, DateTimeStyles.AdjustToUniversal));
            // XPathEvaluate returns node-sets as an IEnumerable, so the result must be cast;
            // the leading '.' scopes the query to this <item> instead of the whole document
            XAttribute read_node = ((IEnumerable)itemnode.XPathEvaluate(".//@*[local-name() = 'read']")).Cast<XAttribute>().FirstOrDefault();
            var read = (read_node == null ? false : Boolean.Parse(read_node.Value));
            XElement desc_node = itemnode.Element("description");
            // obtain href value and link text pairs
            var outgoing = (desc_node == null ? regex.Matches(String.Empty) : regex.Matches(desc_node.Value));
            var outgoing_links = new Dictionary<string, string>();
            //ensure we only collect unique href values from entry by replacing list returned by regex with dictionary
            if (outgoing.Count > 0) {
                foreach (Match m in outgoing)
                    outgoing_links[m.Groups[1].Value] = m.Groups[2].Value;
            }
            return new RssItem() { Permalink = permalink, Title = title, Date = date, Read = read, OutgoingLinks = outgoing_links };
        }

        static void Main(string[] args) {

            if (args.Length > 0) //get directory of RSS feeds
                cache_location = args[0];
            if (args.Length > 1) //mode = 0 means use only unread items, mode != 0 means use all items from past week
                mode = (Int32.Parse(args[1]) != 0 ? MemeMode.PopularInPastWeek : MemeMode.PopularInUnread);

            Console.WriteLine("Processing items from {0} seeking items that are {1}", cache_location,
                (mode == MemeMode.PopularInPastWeek ? "popular in items from the past week" : "popular in unread items"));
            
            //decide what filter function to use depending on mode 
            Func<RssItem, bool> filterFunc = null;
            if(mode == MemeMode.PopularInPastWeek) 
                filterFunc = x => (DateTime.Now - x.Date < one_week) ;
            else 
                filterFunc = x => x.Read == false;

            //in mode = 0 each entry linking to an item counts as a vote, in mode != 0 value of vote depends on item age
            Func<RssItem, double> voteFunc   = null; 
            if(mode == MemeMode.PopularInPastWeek) 
                voteFunc = x => 1.0 - (DateTime.Now.Ticks - x.Date.Ticks) * 1.0 / one_week.Ticks; 
            else 
                voteFunc = x => 1.0;

            
            var di = new DirectoryInfo(cache_location); 
            foreach(var fi in di.GetFiles("*.xml")){
                var doc = XElement.Load(Path.Combine(cache_location, fi.Name));
                // for each item in feed
                //  1. Get permalink, title, read status and date
                //  2. Get list of outgoing links + link title pairs
                //  3. Convert above to RssItem object
                //  4. apply filter to pick candidate items
                var items = from rssitem in 
                            (from itemnode in doc.Descendants("item")                            
                            select MakeRssItem(itemnode))
                            where filterFunc(rssitem)
                            select rssitem;
                var feedTitle = doc.XPathSelectElement("channel/title").Value;
                // calculate vote for each outgoing url
                foreach (RssItem item in items) { 
                    var vote = new Vote(){ Weight=voteFunc(item), Item=item, FeedTitle=feedTitle };
                    //add a vote for each of the URLs
                    foreach (var url in item.OutgoingLinks.Keys) {
                        List<Vote> value = null;
                        if (!all_links.TryGetValue(url, out value))
                            value = all_links[url] = new List<Vote>(); 
                            
                        value.Add(vote);                                                    
                    }
                }// foreach (RssItem item in items)
            }// foreach(var fi in di.GetFiles("*.xml"))
           
            //tally the votes
            List<RankedLink> weighted_links = new List<RankedLink>();
            foreach (var link_n_votes in all_links) {
                Dictionary<string, double> site = new Dictionary<string, double>();
                foreach (var vote in link_n_votes.Value) {
                    double oldweight;
                    site[vote.FeedTitle] = site.TryGetValue(vote.FeedTitle, out oldweight) ? 
                                            Math.Min(oldweight, vote.Weight): vote.Weight; 
                }
                weighted_links.Add(new RankedLink(){Score=site.Values.Sum(), Url=link_n_votes.Key});
            }
            weighted_links.Sort((x, y) => y.Score.CompareTo(x.Score));

            //output the results, choose link text from first item we saw story linked from
            Console.WriteLine("<html><body><ol>");
            foreach(var rankedlink in weighted_links.GetRange(0, Math.Min(10, weighted_links.Count))){ //guard against fewer than 10 links
                var link_text = (all_links[rankedlink.Url][0]).Item.OutgoingLinks[rankedlink.Url];
                Console.WriteLine("<li><a href='{0}'>{1}</a> {2}", rankedlink.Url, link_text, rankedlink.Score);
                Console.WriteLine("<p>Seen on:");
                Console.WriteLine("<ul>");
                foreach (var vote in all_links[rankedlink.Url]) {
                    Console.WriteLine("<li>{0}: <a href='{1}'>{2}</a></li>", vote.FeedTitle, vote.Item.Permalink, vote.Item.Title);
                }
                Console.WriteLine("</ul></p></li>");
            }
            Console.WriteLine("</ol></body></html>");
            Console.ReadLine();
        }
    }
}
Now Playing: Lloyd Banks - Boywonder

Categories: Programming