Last year was the year of big changes in my personal life. I bought a house, got married and brought a very cute and lovable Shih Tzu into our household. Some time during 2007, I realized I'd been at Microsoft for over 5 years and decided that I'd also look for change in my professional life as well.

I learned a couple of lessons from the experience. The first was that looking around for a job while trying to buy a house, moving into a new home and working towards getting married is pretty stressful. The second thing I learned was that I hadn't really thought about what I want from my career in several years. Back in my college days, I had a clear idea where I wanted to be within my first year of graduation and every thing I did back then moved me closer to that goal, from the classes I took to ensuring that I interned every summer break. Since then, I haven't really had a "five year plan" to get me to the next stage in my career. I now have a much clearer idea where I want to be by 2010 than I have in the past two or three years. Finally, I realized that I actually really like working at Microsoft especially within my current job.

Ever since I came to that final realization I've wanted to blog about why this is the case but it seemed like such a corny thing to write about that I didn't want people reading this to think I was shilling for Microsoft. However this morning I was reading a blog post entitled Gone Indie by Jens Alfke which explained why he was leaving Apple Inc. after ten years and a lot of the reasons he is leaving are the same reasons I'm still at Microsoft. 

Social Software

Jens wrote

But I’m fascinated with social software. Apple isn’t. Despite some promising starts, the most I’ve been able to get accomplished in that vein at Apple was iChat [the IM part; I’m really not interested in videoconferencing], Safari RSS, and the “PubSub” [which turned out to be “RSS and Atom”] framework. There were some very promising prototypes of sexier things, but I really can’t talk about those, other than to say that they were canceled.

I looked around after Leopard was finished, and didn’t see any place in the company where I could pursue my ideas. It would have meant evangelizing reluctant executives into sharing my vision … and that’s something that I know I have little talent at.

I am similarly fascinated by Social Software and have been since I wrote down my epiphany Social Software is the Platform of the Future after a couple of conversations with my friend Mike Vernal. This epiphany is the reason I decided to start working in Microsoft's social graph contacts platform team which is where I continue to work till this day. Three and a half years later, this epiphany has been borne out by the rise of MySpace and Facebook as well as the realization by the technocrat masses that without data portability social software is the new vendor lock-in. This is all stuff Mike and I used to chat about back in 2004, and now Mike is off at Facebook and I'm here at Microsoft trying to make our visions a reality in our own little ways.

Unlike Jens, I don't have to evangelize reluctant execs into sharing my vision. A lot of our execs understand the importance of social software and have clear ideas of how Microsoft can add value to our users lives with our contributions to this space. When I talk to folks like Ray Ozzie, Chris Jones or David Treadwell about some of the problems I see in the social software space today, not only do they get it, I always leave the conversation with a strong sense that Microsoft will do the right thing.

Some people may criticize Microsoft for not being quick to jump onto every fad. However as Phil Haack mentioned in his blog post about his first few days as a new Microsoft hire, Microsoft invests for the long run and expects it's employees to think deeply about issues before acting. At the end of the day, the software we build in Windows Live impacts how hundreds of millions of people interact, share and communicate with their friends, family and loved ones. We endeavor to be good stewards of the trust they've placed in us.

Sharing Your Ideas

Jens wrote

I tend to have a lot of ideas. I’m not bragging, and that’s not always a good trait; it can be hard for me to focus on something long enough to finish it. A structured job has helped me stay on-task. On the other hand, though, the development cycle in a big company is such that every significant idea takes a year or more to finish, and during that time, more ideas pile up in my brain.

That wouldn’t be bad if there were some other channels to express those ideas. And if they took the form of songs, or novels, or scrimshaw carvings of Biblical scenes on walrus tusks, I could do whatever I wanted with them. But on software, Apple’s position (not unusually for the industry) is “All Your Idea Are Belong To Us”, and I signed onto that when I accepted the job offer. In other words, anything I do that relates in any way to Apple’s areas of business, no matter when or where I do it, belongs to Apple. [Edit: Ha! Note I’m still using present tense.]

(Again, this isn’t something particular about Apple. Most tech companies are like this, and if you work for one, you probably signed a very similar “Proprietary Rights Agreement” that they hid in the stack of paperwork beneath your offer letter. And yes, companies will enforce that if they see profit in it.)

I believe all Microsoft employees sign similar agreements with the company when hired. However, Microsoft is very good about letting employees explore their ideas in software on their own time without getting in the way. Projects like Script#, Reflector, RSS Bandit, DasBlog, Tweak UI and WiX are examples of software projects either developed or maintained by Microsoft employees in their free time that are now benefiting thousands to hundreds of thousands of end users.

However I think that more important than being able to share our ideas in code, being able to share our ideas in words is one of the coolest things about working at Microsoft. Thousands of Microsoft employees share their ideas with their coworkers, competitors and customers via blogs on a daily basis. Lots of companies would clamp down on that sort of behavior and ensure that only sanctioned company positions go out in employee communications but not Microsoft.

Even more surprisingly, Microsoft tolerates employees that may have ideas that differ from the company's ideas of how things should be done. You may wonder why that is surprisingly until you remember that even supposedly enlightened "Web 2.0" companies like Friendster and Google can fire you for disagreeing with the company's technology choices or hinting about future products or complaining about the company's benefits.

A lot of people [including Microsoft employees] wonder how I still have a job at Microsoft even though I've been critical of some of the company's strategies and products in my almost six years as an employee. Although I've had conversations with peers, middle managers and senior execs about my blog, I've never felt that my job was in danger. If anything, I've had it confirmed that Microsoft's culture is about being open and respectful. The one thing I have tried to change about my blog [and in fact all my communications] is being more respectful of other's perspectives and personal feelings especially when I disagree with them since you catch more flies with honey than with vinegar...or so I heard.

Individuality

Jens wrote

Finally — and this may seem petty — Apple’s lack of individuality bugs me. I don’t mean internally: within the company, communication is reasonably open (modulo confidentiality issues) and there’s lots of room for self-expression. But ever since the return of Steve Jobs, the company has been pretty maniacal about micro-managing its visible face, to make it as smooth and featureless as an iPod’s backside. (In my darker moments I’ve compared it to the brutal whiteness of “THX-1138”.)

It’s deeply ironic: For a company that famously celebrates individuality and Thinking Different, Apple has in the past decade kept its image remarkably impersonal. Other than the trinity who go onstage at press events — Steve Jobs, Jonathan Ive, Phil Schiller — how many people can you name who work for Apple? How many engineers?
...
And then there are blogs. Apple doesn’t like them, not when they talk about it. (Big surprise.) I’ve heard it said that there are hardly any bloggers working at Apple; there are actually a lot more than you’d think, but they mostly keep it a secret. (I could out a few people, including at least one director…) I think Apple’s policy on blogging is one of the least enlightened of major tech companies; Microsoft in particular is surprisingly open.

There really isn't much more I can add to that. The fact that you are reading my blog and know who I am is a testament to how much Microsoft encourages it's employees to express their individuality in their products and in our communications with our customers.

This may not be a big deal in 2008 when everyone is blogging but it was back in 2003 when the early community of Microsoft bloggers could all fit at a table in a single restaurant. Especially since when you consider it, Microsoft bloggers are probably a large part of the reason corporate blogging is mainstream today. That alone is a worthy legacy in my book.

I'd like to leave you with this image from Scott Hanselman's post about joining Microsoft. Everyone's goal should be trying to get to center of the picture.

Now Playing: Wu-Tang Clan - Can It Be All So Simple


 

Categories:

Yesterday:

After publishing an invitation to Facebook to join the DataPortability Working Group January 4, we never thought that Facebook would accept it. Today changes everything you’ve ever thought about social-networking data and lock-in before, because today Facebook, Google and Plaxo have joined the DataPortability Workgroup.

Today:

Joining the cavalcade of companies jumping on the open data bandwagon, LinkedIn has now joined Facebook, Google, Plaxo (announcement here) in joining the DataPortability Work Group.

LinkedIn has worked hard to become open since announcing their own open platform in June 2007 in response to Facebook, then becoming an initial OpenSocial launch partner in October 2007.

Update: Web developers from Flickr, SixApart, and Twitter have also joined.

Some folks at work asked me what I thought of DataPortability.org. The short version is that it reminds me of AttentionTrust, long on intention and short on implementation.

Principles are nice but working code is even better. 

Now playing: Aerosmith - Eat the Rich


 

Categories: Social Software

Brad Fitzpatrick, founder of Livejournal, has a blog post entitled Naming twins in Python & Perl where he writes

Last night at Beau's party, one of Beau's guests mentioned he's expecting twins shortly, which is why is wife wasn't at the party.
I drunkenly suggested he name his kids two names that were anagrams of each other. I then wandered off downstairs to find such suitable names.
Because I'm supposed to be be working in Python these days, not Perl, I gathered what little Python knowledge I had and wrote out:

#!/usr/bin/python

by_anagram = {}

names_file = open("dist.male.first")
for line in names_file:
    # lines in file look like:
    # WILLIAM        2.451 14.812      5
    # we want just the first field.
    name = (line.split())[0]
    letters = [letter for letter in name]
    letters.sort()
    sorted_name = "".join(letters)
    if not sorted_name in by_anagram:
        by_anagram[sorted_name] = []
    by_anagram[sorted_name].append(name)

for sorted_name in by_anagram:
    if len(by_anagram[sorted_name]) < 2:
        continue

    print by_anagram[sorted_name]
Not so happy with it, but it worked, and I printed out the results and brought them up to the guy

You can guess where this is going. Below is my solution in C# 3.0

using System;
using System.Linq;
using System.IO;

namespace Name_agram{
    class Program{
        static void Main(string[] args){
            var names    = from line in File.ReadAllLines("dist.male.first.txt")
                           let name = line.Split(' ')[0]
                           group name by new string(name.ToCharArray().OrderBy(x=>x).ToArray()) into anagrams
                           select anagrams;
                                        
            foreach (var list_of_names in names) {
                if (list_of_names.Count() > 1)
                    Console.WriteLine(String.Join(" ",list_of_names.ToArray()));
            }
            Console.ReadLine();
        }
    }
}

There are a ton of solutions in various languages in the comments to Brad's post. Again, it is startling to see how different an idiomatic C# 3.0 solution using LINQ is from the traditional imperative/procedural style of programming.

Now Playing: Rick James - Give It To Me Baby


 

Categories: Programming

This is likely my last post in Robert Scoble vs. Facebook saga but I think there are some subtle points being lost because of the typical blog feeding frenzy where people either choose to flame Facebook, Scoble or both. Robert Scoble has a post entitled Plaxo: the social monster? where he writes

Judi Sohn rips into the trustworthiness of both me and Plaxo for attempting to import email addresses, names, and birthdays.
...
What if I wrote down Judi’s email and then manually put it into my Outlook’s contact database. Wouldn’t that have been exactly the same thing that I tried to do with Plaxo’s script?

There are a couple of things wrong with Robert's analogy.

When I entire my personally identifiable information (PII) into Facebook, I am entering into a social contract with two entities. I am trusting Facebook to protect my data so it is safe from malicious hackers and not sell it to malicious third parties like spammers or telemarketers, in return I provide Facebook with accurate data which improves their service and the user experience of the people in my social network.  In addition, I am implicitly trusting the people in my social network not to abuse the privilege of having my personal information (e.g. by prank calling my cell phone, giving my personal details to third parties I don't trust).

There is a key difference between Robert taking my personal information I shared with him on Facebook and importing into Outlook versus importing it into Plaxo Pulse. In the former case, Robert is taking data I shared with him and viewing it in a different application. In the latter case, Robert is additionally sharing my personal details with a corporate entity; Plaxo, Inc. This is an entity that is synonymous with spam and at the time of writing this post there 209,000 hits returned for a search for "Plaxo Spam" on the Google search engine. This is the key difference between Robert importing my personal details into Outlook and importing it into Plaxo Pulse.

Lots of geeks have focused on the fact that since it was possible for Robert to manually extract this data, then then people sharing data with him shouldn't complain since they gave him access to the data. This ignores the fact that just because something is technically possible doesn't make it right even if it is legal. Just because it is technically possible for you to read the RSS feed for my blog and republish it on a splog so you can make money from AdSense ads doesn't make it right. Just because it is technically possible for you to view my photo albums on Windows Live Spaces doesn't mean I'd think it was OK to use Omar's Send to Smugmug script to republish these photos on Smugmug. Just because you have my phone number doesn't mean I think it is OK for you to share it with all your drinking buddies that want to work at Microsoft and need a recommendation. And so on...

In all of these cases, there the social contract between us would have been broken. This is independent of whether it's technically possible for you to do these things by hand without needing a script or whatever.

Taking my data and sharing it with a third party without my permission isn't cool. Just because I shared information with you doesn't give you the right to share it with others.

 Now Playing: Eminem - Mockingbird


 

Categories: Social Software

One of the things to keep in mind when learning a new programming language is that it isn’t enough to learn the syntax and semantics of various language features you are unfamiliar with. Just as important is learning the idioms and way of thinking that goes with these language features. On reddit, ubernostrum pointed out that my usage of

if all_links.get(url) == None:

was jarring to read as a Python programmer when compared to the more idiomatic

if url not in all_links:

Of course this is just a stylistic issue but his point is valid. A similar thing happened with regards to other aspects of my recent post entitled Does C# 3.0 Beat Dynamic Languages at their Own Game? 

I argued that type inferencing and anonymous types in C# 3.0 did not offer the same degree of functionality that tuples and dynamic typing did when it came to processing intermediate values in a computation without requiring nominal types (i.e. named classes) to hold these values.

Specifically I had the following IronPython code,

IronPython Code

      for item in filteredItems:
            vote = (voteFunc(item), item, feedTitle)

            #add a vote for each of the URLs
            for url in item.outgoing_links.Keys:
                if all_links.get(url) is None:
                    all_links[url] = []
                all_links.get(url).append(vote)

    # tally the votes, only 1 vote counts per feed
    weighted_links = []
    for link, votes in all_links.items():
        site = {}
        for weight, item, feedTitle in votes:
            site[feedTitle] = min(site.get(feedTitle,1), weight)
        weighted_links.append((sum(site.values()), link))
    weighted_links.sort()
    weighted_links.reverse()

The key things to note about the above code block are (i) the variable named vote is a tuple of three values; the numeric weight given to a link received from a particular RSS item, an RSS item and the title of the feed Python and (ii) the items in the tuple can be unpacked into individual variables when looping over the contents of the tuple in a for loop.

When I tried to write the same code in C# 3.0 with a vote variable that was an anonymous type, I hit a road block. When I placed instances of the anonymous type in the list, I had no way of knowing what the data type of the object I’d be pulling out of the list would be when I wanted to extract it later to tally the votes. Since C# is statically typed, knowing the type’s name is a requirement for retrieving the objects from the list later unless I planned to interact with them as instances of System.Object and access their fields through reflection (or something just as weird).

So in my C# 3.0 solution I ended up creating RankedLink and Vote types to simulate the functionality I was getting from tuples in Python.

However it turns out I was using anonymous types incorrectly. I tried to take a feature that was meant to be coupled with C# 3.0’s declarative Language Integrated Query (LINQ) and use it in the traditional imperative loop constructs I’ve been familiar with since my days programming in C.

Ian Griffith’s set me straight with his blog post entitled Dare Obasanjo on C# Anonymous Types where he showed how to use anonymous types to get the solution I wanted without having to create unnecessary named types to hold intermediate values. Ian’s code is shown below

C# 3.0 Code

// calculate vote for each outgoing url
var all_links = from item in items
                from url in item.OutgoingLinks.Keys 
                group item by url into itemUrlGroup
                select new
                {
                  Url=itemUrlGroup.Key,
                  Votes=from item in itemUrlGroup
                        select new
                        {
                          Weight=voteFunc(item),
                          Item=item,
                          FeedTitle=feedTitle
                        }
                };

// tally the votes
var weighted_links = from link_n_votes in all_links
                     select new
                     {
                       Url=link_n_votes.Url,
                       Score=(from vote in link_n_votes.Votes
                              group vote by vote.FeedTitle into feed
                              select feed.Min(vote => vote.Weight)
                             ).Sum()
                     } into weighted_link
                     orderby weighted_link.Score descending
                     select weighted_link;

As you can see, Ian’s code performs the same task as the Python code does but with a completely different approach. The anonymous types are performing the same function as the Python tuples did in my previous code sample and there is no need to create RankedLink and Vote types to hold these intermediate values.

What I find interesting about this is that even though I’ve been using C# for the past five or six years, I feel like I have to relearn the language from scratch to fully understand or be able to take advantage the LINQ features. Perhaps a few stints as a SQL developer may be necessary as well?  

 


 

Categories: Programming

Paul Buchheit, creator of Gmail now the founder of FriendFeed, has a blog post entitled Should Gmail, Yahoo, and Hotmail block Facebook? where he writes

Apparently Facebook will ban you (or at least Robert Scoble) if you attempt to extract your friend's email addresses from the service. Automated access is a difficult issue for any web service, so I won't argue with their decision -- it's their service and they own you. However, when I signed up for Facebook I gave them my Gmail address and password, using their find friends feature:
...
So the question is, should Gmail, Yahoo, and Hotmail block Facebook (or close the accounts of anyone who uses Facebook's "friend finder") for violating their Terms of Use?

I don't want to single out Facebook here since pretty much every "Web 2.0" website with social features is very in-your-face about asking for your credentials from your email provider and then screen scraping your contact's email addresses. I just signed up for Twitter and the user interface makes it cumbersome to even start using the service after creating an account without giving up your email username and password.

I think there are two questions here. The first is whether users should be able to extract their data [including social graph data] from one service and import it into another. I personally believe the answer is Yes and this philosophy underlies what we've been working on at Windows Live and specifically the team I'm on which is responsible for the social graph contacts platform.

The next question is whether screen scraping is the way to get this data? I think the answer is definitely not. The first problem with this approach is that when I give some random "Web 2.0" social network my email username and password, I’m not only giving them access to my address book but also access to This seems like a lot of valuable data to trust  to some fly by night "Web 2.0"  service that can't seem to hire a full time sys admin or a full rack in a data center let alone know how to properly safeguard my personal information.

Another problem with this approach is that it encourages users to give up their usernames and passwords when prompted by any random Web site which increases incidences of phishing. Some have gone as far as calling this approach an anti-pattern that is kryptonite to the Open Web.

Finally, there is no way to identify the application that is accessing data on the user's behalf if it turns out to be a malicious application. For example, if you read articles like Are you getting Quechup spammed you'll note that there's been more than one incident where a "Web 2.0" company turned out to either be spamming users via the email addresses they had harvested in this manner or straight up just resold the email addresses to spammers. Have you ever wondered how much spam you get because someone who has your email address blithely gave up your email credentials to some social network site who in turn used a Web service that is run by spammers to retrieve your contact details?

So if I think that user's should be able to get out their data yet screen scraping isn't the way, what should we do? At Windows Live, we believe the right approach is to provide user-centric APIs which allow users to grant and revoke permission to third party applications to access their personal data. For the specific case of social graph data, we've provided an ALPHA Windows Live Contacts API which is intended to meet exactly this scenario. The approach taken by this API and similar patterns (e.g. using OAuth) solves all three concerns I've raised above.

Now given what I've written above, do you think Hotmail should actively block or hinder screen scraping applications used to obtain the email addresses of a user's contacts?


 

Categories: Platforms | Windows Live

I’ve read a number of stories this week that highlight that interoperability between social networking sites will be a “top ask” in 2008 (as we say at Microsoft). Earlier this week I read the Wired article Should Web Giants Let Startups Use the Information They Have About You? which does a good job of telling both sides of the story when it comes to startups screen scraping importing user data such as social graphs (i.e. friend and contact lists) from more successful sites as a way to bootstrap their social networks. The Wired article is a good read if you want to hear all sides of the story when it comes to the issue of sharing user social data between sites.

Yesterday, I saw Social Network Aggregation, Killer App in 2008? which points out the problem that users often belong to multiple social networks at once and that bridging between them is key. However I disagree with the premise that this points to need for a “Social Network Aggregator” category of applications. I personally believe that the list of 20 or so Social Network Aggregators on Mashable are all companies that would cease to exist if the industry got off it’s behind and worked towards actual interoperability between social networking sites.

Today, I saw saw Facebook disabled Robert Scoble’s account. After reading Robert’s account of the incident, I completely agree with Facebook.

Why Robert Scoble is Wrong and Facebook is Right

Here’s what Robert Scoble wrote about the incident

My account has been “disabled” for breaking Facebook’s Terms of Use. I was running a script that got them to keep me from accessing my account

I am working with a company to move my social graph to other places and that isn’t allowable under Facebook’s terms of service. Here’s the email I received:

+++++

Hello,

Our systems indicate that you’ve been highly active on Facebook lately and viewing pages at a quick enough rate that we suspect you may be running an automated script. This kind of Activity would be a violation of our Terms of Use and potentially of federal and state laws.

As a result, your account has been disabled. Please reply to this email with a description of your recent activity on Facebook. In addition, please confirm with us that in the future you will not scrape or otherwise attempt to obtain in any manner information from our website except as permitted by our Terms of Use, and that you will immediately delete and not use in any manner any such information you may have previously obtained.

The first thing to note is that Facebook allows you to extract your social graph data from their site using the Facebook platform. In fact, right now whenever I get an email from someone on my Facebook friend list in Outlook or I get a phone call from them, I see the picture from their Facebook profile. I did this using OutSync which is an application that utilizes the Facebook platform to merge data from my contacts in Outlook/Exchange with my Facebook contacts.

So if Facebook allows you to extract information about your Facebook friends via their APIs, why would Robert Scoble need to run a screen scraping script? The fact is that the information returned by the Facebook API about a user contains no contact information (no email address, no IM screen names, no telephone numbers, no street address). Thus if you are trying to “grow virally” by spamming the Facebook friend list of one of your new users about the benefits of your brand new Web 2.0 site then you have to screen scrape Facebook.  However there is the additional wrinkle that unlike address books in Web email applications Robert Scoble did not enter any of this contact information about his friends. With this in mind, it is hard for Robert Scoble to argue that the data is “his” to extract from Facebook. In addition, as a Facebook user I consider it a feature that Facebook makes it hard for my personal data to be harvested in this way. Secondly, since Robert’s script was screen scraping it means that it had to hit the site five thousand times (once for each of his contacts) to fetch all of Robert’s friends personally idenitifiable information (PII).  Given that eBay won a court injunction against Bidder’s Edge for running 100,000 queries a day, it isn’t hard to imagine that the kind of screen scraping script that Robert is using would be considered malicious even by a court of law.

I should note that Facebook is being a bit hypocritical here since they do screen scrape other sites to get the email addresses of the contacts of new users. This is why I’ve called them the Social Graph Roach Motel in the recent past. 

O’Reilly Social Graph FOO Camp

This past weekend I got an email from Tim O'Reilly, David Recordon, and Scott Kveton inviting me to a Friends of O’Reilly Camp (aka FOO Camp) dedicated to “social graph” problems. I’m still trying to figure out if I can make it based on my schedule and whether I’m really the best person to be representing Microsoft at such an event given that I’m a technical person and “social graph problems” for the most part are not technical issues.

Regardless of whether I am able to attend or not, there were some topics I wanted to recommend should be added to a list of “red herring” topics that shouldn’t be discussed until the important issues have been hashed out.

  • Google OpenSocial: This was an example of unfortunate branding. Google should really have called this “Google OpenWidgets” or “Google Gadgets for your Domain” since the goal was competing with Facebook’s widget platform not actually opening up social networks. Since widget platforms aren’t a “social graph problem” it doesn’t seem fruitful the spend time discussing this when there are bigger fish to fry.

  • Social Network Portability: When startups talk about “social network portability” it’s usually a euphemism for collecting a person’s username and password for another site, retrieving their contact/friend list and spamming those people about their hot new Web 2.0 startup. As a user of the Web, making it easier to receive spam from startups isn’t something I think should be done let alone a “problem” that needs solving. I understand that lots of people will disagree with this [even at Microsoft] but I’m convinced that this is not the real problem facing the majority of users of social networking sites on the the Web today.  

What I Want When It Comes to Social Network Interoperability

Having I’ve said what I don’t think is important to discuss when it comes to “social graph problems”, it would be rude not to provide an example fof what I think would be fruitful discussion. I wrote the problem I think we should be solving as an industry a while back in a post entitled A Proposal for Social Network Interoperability via OpenID which is excerpted below

I have a Facebook profile while my fiancée wife has a MySpace profile. Since I’m now an active user of Facebook, I’d like her to be able to be part of my activities on the site such as being able to view my photos, read my wall posts and leave wall posts of her own. I could ask her to create a Facebook account, but I already asked her to create a profile on Windows Live Spaces so we could be friends on that service and quite frankly I don’t think she’ll find it reasonable if I keep asking her to jump from social network to social network because I happen to try out a lot of these services as part of my day job. So how can this problem be solved in the general case? 

This is a genuine user problem which the established players have little incentive to fix. The data portability folks want to make it easy for you to jump from service to service. I want to make it easy for users of one service to talk to people on another service. Can you imagine if email interoperability was achieved by making it easy for Gmail users to export their contacts to Yahoo! mail instead of it being that Gmail users can send email to Yahoo! Mail users and vice versa?

Think about that.

Now playing: DJ Drama - The Art Of Storytellin' Part 4 (Feat. Outkast And Marsha Ambrosius)


 

For the past few years I've heard a lot of hype about dynamic programming languages like Python and Ruby. The word on the street has been that their dynamic nature makes developers more productive that those of us shackled to statically typed languages like C# and Java. A couple of weeks ago I decided to take the plunge and start learning Python after spending the past few years doing the majority of my software development in C#. I learned that it was indeed true that you could get things the same stuff done in far less lines of Python than you could in C#. Since it is a general truism in the software industry that the number of bugs per thousand lines of code is constant irrespective of programming language, the more you can get done in fewer lines of code, the less defects you will have in your software.

Shortly after I started using Python regularly as part of the prototyping process for developing new features for RSS Bandit, I started trying out C# 3.0.  I quickly learned that a lot of the features I'd considered as language bloat a couple of months ago actually made a lot of sense if you're familiar with the advantages of dynamic and functional programming approaches to the tasks of software development. In addition, C# 3.0 actually fixed one of the problems I'd encountered in my previous experience with a dynamic programming language while in college.

Why I Disliked Dynamism: Squeak Smalltalk

Back in my college days I took one of Mark Guzdial's classes which involved a group programming project using Squeak Smalltalk. At the time, I got the impression that Squeak was composed of a bunch of poorly documented libraries cobbled together from the top graded submissions to assignments from Mark's class. What was particularly frustrating about the lack of documentation was that even looking at method prototypes gave no insight into how to call a particular library. For example, here's an example of a SalariedEmployee class taken from an IBM SmallTalk tutorial

"A subclass of Employee that adds protocol needed for 
          employees with salaries"

       Employee subclass: #SalariedEmployee
          instanceVariableNames:  'position salary'
          classVariableNames: ' '
          poolDictionaries: ' ' !

       ! SalariedEmployee publicMethods !

          position: aString 
             position := aString !
          position 
             ^position !
          salary: n 
             salary := n !
          salary 
             ^salary ! !

In the example above, there is a method called salary() that takes a parameter n whose type we don't know. n could be a string, an integer or a floating point number. If you are using the SalariedEmployee class in your code and want to set the employee's salary, the only way to find out what to pass in is to grep through the code and find out how the method is being used by others. You can imagine how frustrating it gets when every time you want to perform a basic task like perform a Web request you have to grep around trying to figure out if the url parameter you pass to the Http classes is a string, a Uri class or some oter random thing you haven't encountered yet.

For a long time, this was my only experience with a dynamic programming language and I thought it sucked...a lot.

Why I Grew to Love Dynamism: XML and C#

The first half of my career at Microsoft was spent working on the XML team which was responsible for the core XML processing APIs that are utilized by the majority of Microsoft's product line. One of the things that was so cool about XML was that it enabled data formats to be as strongly structured or semi-structured depending on the needs of the application. This flexibility is what gives us data formats like the Atom syndication format which although rigidly structured in parts (e.g. atom:entry elements MUST contain exactly one atom:id element, etc)  also supports semi-structured data (e.g. atom:content can contain blocks of XHTML) and enables distributed extensibility where anyone on the Web is free to extend the data format as long as they place their extensions in the right namespace.

However one problem we repeatedly bumped against is that data formats that can have unknown data types show up in them at runtime bump up against the notion of static typing that is a key aspect of languages in C#. I've written about this in the past in posts such as What's Right and Wrong with Code Generation in Web Services which is excerpted below

Another problem is that the inflexible and rigid requirements of static typing systems runs counter to the distributed and flexible nature of the Web. I posted a practical example a few years ago in my post entitled Why You Should Avoid Using Enumerated Types in XML Web Services. In that example, I pointed out that if you have a SOAP Web Service  that returns an enumeration with the possible value {CDF, RSS10, RSS20} and in a future release modify that enumeration by adding a new syndication format {CDF, RSS10, RSS20, Atom} then even if you never return that syndication format to old clients written in .NET, these clients will still have to be recompiled because of the introduction of a new enumeration value. I find it pretty ridiculous that till today I have list of "people we need to call and tell to recompile their code whenever we change an enum value in any of our SOAP Web Services".

I came to the realization that some degree of dynamism is desirable especially when dealing with the loosely coupled world of distributed programming on the Web. I eventually decided to ignore my earlier misgivings and start exploring dynamic programming languages. I chose IronPython because I could focus on learning the language while relying on the familiar .NET Framework class library when I wanted to deal with necessary tasks like file I/O or Web requests.

After getting up to speed with Python and then comparing it to C# 2.0, it was clear that the dynamic features of Python made my life as a programmer a lot easier. However something interesting happened along the way. Microsoft shipped C# 3.0 around the same time I started delving into Python. As I started investigating C# 3.0, I discovered that almost all the features I'd fallen in love with in Python which made my life as a developer easier had been integrated into C#. In addition, there was also a feature which is considered to be a killer feature of the Ruby programming language which also made it into C# 3.0.

Python vs. C# 3.0: Lambda Expressions

According to the Wikipedia entry on Dynamic Programming Languages

There are several mechanisms closely associated with the concept of dynamic programming. None are essential to the classification of a language as dynamic, but most can be found in a wide variety of such languages.
...
Higher-order functions
However, Erik Meijer and Peter Drayton caution that any language capable of loading executable code at runtime is capable of eval in some respect, even when that code is in the form of dynamically linked shared libraries of machine code. They suggest that higher-order functions are the true measure of dynamic programming, and some languages "use eval as a poor man's substitute for higher-order functions."[1]

The capability of a programming language to treat functions as first class objects that can be the input(s) or the output(s) of a function call is a key feature of many of today's popular "dynamic" programming languages. Additionally, creating a short hand syntax where anonymous blocks of code can be treated as function objects is now commonly known as "lambda expressions". Although C# has had functions as first class objects since version 1.0 with delegates and introduced anonymous delegates in C# 2.0, it is in C# 3.0 where the short hand syntax of lambda expressions has found its way into the language. Below are source code excerpts showing the difference between the the lambda expression functionality in C# and IronPython 

C# Code

//decide what filter function to use depending on mode 
Func<RssItem, bool> filterFunc = null;
if(mode == MemeMode.PopularInPastWeek) 
   filterFunc = x => (DateTime.Now - x.Date < one_week) ;
else 
   filterFunc = x => x.Read == false;

IronPython Code

#decide what filter function to use depending on mode
filterFunc = mode and (lambda x : (DateTime.Now - x.date) < one_week) or (lambda x : x.read == 0)

Although the functionality is the same, it takes a few more lines of code to express the same idea in C# 3.0 than in Python. The main reason for this is due to the strong and static typing requirements in C#. Ideally developers should be able to write code like 

Func<RssItem, bool> filterFunc = (mode == MemeMode.PopularInPastWeek ? x => (DateTime.Now - x.Date < one_week) : x => x.read == false);

However this doesn’t work because the compiler cannot determine whether each of the lambda expressions that can be returned by the conditional expression are of the same type. Despite the limitations due to the static and strong typing requirements of C#, the lambda expression feature in C# 3.0 is extremely powerful.

You don’t have to take my word for it. Read Joel Spolsky’s Can Your Programming Language Do This? and Peter Norvig’s Design Patterns in Dynamic Programming.  Peter Norvig’s presentation makes a persuasive argument that a number of the Gang of Four’s Design Patterns either require a lot less code or are simply unneeded in a dynamic programming language that supports higher order functions. For example, he argues that the Strategy pattern does not need separate classes for each algorithm in a dynamic language and that closures eliminate the need for Iterator classes. Read the entire presentation, it is interesting and quite illuminating.

Python vs. C# 3.0: List Comprehensions vs. Language Integrated Query

A common programming task is to iterate over a list of objects and either filter or transform the objects in the list thus creating a new list. Python has list comprehensions as a way of simplifying this common programming task. Below is an excerpt from An Introduction to Python by Guido van Rossum on list expressions   

List comprehensions provide a concise way to create lists without resorting to use of map(), filter() and/or lambda. The resulting list definition tends often to be clearer than lists built using those constructs. Each list comprehension consists of an expression followed by a for clause, then zero or more for or if clauses. The result will be a list resulting from evaluating the expression in the context of the for and if clauses which follow it.

Below is a code sample showing how list comprehensions can be used to first transform a list of objects (i.e. XML nodes) to another (i.e. RSS items) and then how the resulting list can be further filtered to those from a particular date. 

IronPython Code

 # for each item in feed        
 # convert each <item> to an RssItem object then apply filter to pick candidate items
  items = [ MakeRssItem(node) for node in doc.SelectNodes("//item")]
  filteredItems = [item for item in items if filterFunc(item)]

My friend Erik Meijer once observed that certain recurring programming patterns become more obvious as a programming language evolves, these patterns first become encapsulated by APIs and eventually become part of the programming language’s syntax. This is what happened in the case of the Python’s map() and filter() functions which eventually gave way to list comprehensions.

C# 3.0 does something similar but goes a step further. In C# 3.0, the language designers made the observation that performing SQL-like projection and selection is really the common operation and not just filtering/mapping of lists. This lead to Language Integrated Query (LINQ). Below is the same filtering operation on a list of XML nodes performed using C# 3.0

C# 3.0 Code

//for each item in feed        
// convert each <item> to an RssItem object then apply filter to pick candidate items
var items = from rssitem in 
              (from itemnode in doc.Descendants("item") select MakeRssItem(itemnode))
            where filterFunc(rssitem)
            select rssitem;

These are two fundamentally different approaches to tackling the same problem. Where LINQ really shines is when it is combined with custom data sources that have their own query languages such as with LINQ to SQL and LINQ to XML which map the query operations to SQL and XPath queries respectively.

Python vs. C# 3.0: Tuples and Dynamic Typing vs. Anonymous Types and Type Inferencing

As I’ve said before, tuples are my favorite Python feature. I’ve found tuples useful in situations where I have to temporarily associate two or three objects and don’t want to go through the hassle of creating a new class just to represent the temporary association between these types. I’d heard that a new feature in C# 3.0 called anonymous types which seemed like it would be just what I need to fix this pet peeve once and for all. The description of the feature is as follows

Anonymous types are a convenient language feature of C# and VB that enable developers to concisely define inline CLR types within code, without having to explicitly define a formal class declaration of the type.

I assumed this feature in combination with the var keyword would make it so I would no longer miss Python tuples when I worked with C#. However I was wrong. Let’s compare two equivalent blocks of code in C# and IronPython. Pay particular attention to the highlighed lines

IronPython Code

      for item in filteredItems:
            vote = (voteFunc(item), item, feedTitle)

            #add a vote for each of the URLs
            for url in item.outgoing_links.Keys:
                if all_links.get(url) == None:
                    all_links[url] = []
                all_links.get(url).append(vote)

    # tally the votes, only 1 vote counts per feed
    weighted_links = []
    for link, votes in all_links.items():
        site = {}
        for weight, item, feedTitle in votes:
            site[feedTitle] = min(site.get(feedTitle,1), weight)
        weighted_links.append((sum(site.values()), link))
    weighted_links.sort()
    weighted_links.reverse()

The key things to note about the above code block are (i) the variable named vote is a tuple of three values; the numeric weight given to a link received from a particular RSS item, an RSS item and the title of the feed Python and (ii) the items in the tuple can be unpacked into individual variables when looping over the contents of the tuple in a for loop.

Here’s the closest I could come in C# 3.0

C# 3.0 Code

// calculate vote for each outgoing url
 foreach (RssItem item in items) { 
       var vote = new Vote(){ Weight=voteFunc(item), Item=item, FeedTitle=feedTitle };
       //add a vote for each of the URLs
       foreach (var url in item.OutgoingLinks.Keys) {
           List<Vote> value = null;
           if (!all_links.TryGetValue(url, out value))
                value = all_links[url] = new List<Vote>(); 
                            
           value.Add(vote);                                                    
         }
   }// foreach (RssItem item in items)
//tally the votes
  List<RankedLink> weighted_links = new List<RankedLink>();
  foreach (var link_n_votes in all_links) {
       Dictionary<string, double> site = new Dictionary<string, double>();
       foreach (var vote in link_n_votes.Value) {
           double oldweight;
           site[vote.FeedTitle] = site.TryGetValue(vote.FeedTitle, out oldweight) ? 
                                  Math.Min(oldweight, vote.Weight): vote.Weight; 
        }
        weighted_links.Add(new RankedLink(){Score=site.Values.Sum(), Url=link_n_votes.Key});
    }
    weighted_links.Sort((x, y) => y.Score.CompareTo(x.Score));

The relevant line above is

var vote = new Vote() { Weight=voteFunc(item), Item=item, FeedTitle=feedTitle };

which I had INCORRECTLY assumed I would be able to write as

var vote = new { Weight=voteFunc(item), Item=item, FeedTitle=feedTitle };

In Python, dynamic typing is all about the developer knowing what types they are working with while the compiler is ignorant about the data types. However type inferencing in C# supports the opposite scenario, when the compiler knows the data types but the developer does not.  

The specific problem here is that if I place an anonymous type in a list, I have no way of knowing what the data type of the object I’m pulling out of the list will be. So I will either have to interact with them as instances of System.Object when popped from the list which makes them pretty useless or access their fields via reflection. Python doesn’t have this problem because I don’t need to know the type of an object to interact with it, I just need to know how what properties/fields and methods it supports.

At the end of the day, I realized that the var keyword is really only useful when constructing anonymous types as a result of LINQ expressions. In every other instance where it is acceptable to use var, you have to know the type of the object anyway so all you are doing is saving keystrokes by using it. Hmmmm.

Ruby vs. C# 3.0: Extension Methods

Extension methods is a fairly disconcerting feature that has been made popular by the Ruby programming language. The description of the feature is excerpted below

Extension methods allow developers to add new methods to the public contract of an existing CLR type, without having to sub-class it or recompile the original type.  Extension Methods help blend the flexibility of "duck typing" support popular within dynamic languages today with the performance and compile-time validation of strongly-typed languages.

I consider this feature to be the new incarnation of operator overloading. Operator overloading became widely reviled because it made code harder to read because you couldn’t just look at a code block and know what it does if you didn’t know how the operators had been implemented. Similarly, looking at an excerpt of C# code you may not realize that everything isn’t what it seems.

I spent several minutes being confused today because I couldn’t get the line

XAttribute read_node = itemnode.XPathEvaluate("//@*[local-name() = 'read']") as XAttribute;

to compile. It turns out that XPathEvaluate is an extension method and you need to import the System.Xml.XPath namespace into your project before the XPathEvaluate() method shows up as a method in the XElement class.

I’ve heard the arguments that Ruby makes it easier to express programmer intent and although I can see how XElement.XPathEvaluate(string) is a more readable choice than XPathQueryEngine.Evaluate(XElement, string) if you want to perform an XPath query on an XElement object, for now I think the readability issues it causes by hiding dependencies isn’t worth it. I wonder if any Ruby developers out there with a background in other dynamic languages that don’t have that feature (e.g. Python) care to offer a counter opinion based on their experience?

FINAL THOUGHTS

C# has added features that make it close to being on par with the expressiveness of functional and dynamic programming languages. The only thing missing is dynamic typing (not duck typing), which I’ve come to realize is has a lot more going for it than lots of folks in the strongly and statically typed world would care to admit. At first, I had expected that after getting up to speed with C# 3.0, I’d lose interest in Python but that is clearly not the case.

I love the REPL, I love the flexibility that comes from having natural support tuples in the language and I love the more compact syntax.  I guess I’ll be doing a lot more coding in Python in 2008.

Now Playing: Da Back Wudz - U Gonna Love Me


 

Categories: Programming

January 2, 2008
@ 03:05 AM

A few weeks ago, I wrote a prototype for the meme tracking feature of RSS Bandit in IronPython. The code was included in my blog post A Meme Tracker In IronPython.   The script was a port of Sam Ruby's original MeMeme script which shows the most recently popular links from from a set of RSS feeds.

I was impressed with how succinct the code was in IronPython when compared to what the code eventually looked like when I ported it to C# 2.0 and integrated it into RSS Bandit. Looking over the list of new features in C# 3.0, it occurred to me that a C# 3.0 version of the script would be as concise or even more concise than the IronPython version. So I ported the script to C# 3.0 and learned a few things along the way.

I'll post something shortly that goes into some details on my perspectives on the pros and cons of the various C# 3.0 features when compared to various Python features. For now, here's the meme tracker script in C# 3.0. Comparing it to the IronPython version should provide some food for thought.


using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;
using System.Globalization;

namespace Memetracker {

    enum MemeMode { PopularInUnread, PopularInPastWeek }

    class RankedLink{
       public string Url { get; set;}
       public double Score { get; set; }   
    }

    class Vote {
        public double Weight { get; set; }
        public RssItem Item { get; set; }
        public string FeedTitle { get; set; }
    }


    class RssItem {
        public string Title { get; set; }
        public DateTime Date { get; set; }
        public bool Read { get; set; }
        public string Permalink { get; set; }
        public Dictionary<string, string> OutgoingLinks { get; set; }
    }

    class Program {

        static Dictionary<string, List<Vote>> all_links = new Dictionary<string, List<Vote>>();
        static TimeSpan one_week = new TimeSpan(7, 0, 0, 0);
        static MemeMode mode = MemeMode.PopularInPastWeek;

        static string cache_location = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData), "Temp");
        static string href_regex = @"<a[\s]+[^>]*?href[\s]?=[\s""']+(.*?)[\""']+.*?>([^<]+|.*?)?<\/a>";
        static Regex regex       = new Regex(href_regex);

        static RssItem MakeRssItem(XElement itemnode) {

            XElement link_node = itemnode.Element("link");
            var permalink = (link_node == null ? "" : link_node.Value);
            XElement title_node = itemnode.Element("title");
            var title = (title_node == null ? "" : title_node.Value);
            XElement date_node = itemnode.Element("pubDate");
            var date = (date_node == null ? DateTime.Now : DateTime.Parse(date_node.Value, null, DateTimeStyles.AdjustToUniversal));
            XAttribute read_node = itemnode.XPathEvaluate("//@*[local-name() = 'read']") as XAttribute;
            var read = (read_node == null ? false : Boolean.Parse(read_node.Value));
            XElement desc_node = itemnode.Element("description");
            // obtain href value and link text pairs
            var outgoing = (desc_node == null ? regex.Matches(String.Empty) : regex.Matches(desc_node.Value));
            var outgoing_links = new Dictionary<string, string>();
            //ensure we only collect unique href values from entry by replacing list returned by regex with dictionary
            if (outgoing.Count > 0) {
                foreach (Match m in outgoing)
                    outgoing_links[m.Groups[1].Value] = m.Groups[2].Value;
            }
            return new RssItem() { Permalink = permalink, Title = title, Date = date, Read = read, OutgoingLinks = outgoing_links };
        }

        static void Main(string[] args) {

            if (args.Length > 0) //get directory of RSS feeds
                cache_location = args[0];
            if (args.Length > 1) //mode = 0 means use only unread items, mode != 0 means use all items from past week
                mode = (Int32.Parse(args[1]) != 0 ? MemeMode.PopularInPastWeek : MemeMode.PopularInUnread);

            Console.WriteLine("Processing items from {0} seeking items that are {1}", cache_location,
                (mode == MemeMode.PopularInPastWeek ? "popular in items from the past week" : "popular in unread items"));
            
            //decide what filter function to use depending on mode 
            Func<RssItem, bool> filterFunc = null;
            if(mode == MemeMode.PopularInPastWeek) 
                filterFunc = x => (DateTime.Now - x.Date < one_week) ;
            else 
                filterFunc = x => x.Read == false;

            //in mode = 0 each entry linking to an item counts as a vote, in mode != 0 value of vote depends on item age
            Func<RssItem, double> voteFunc   = null; 
            if(mode == MemeMode.PopularInPastWeek) 
                voteFunc = x => 1.0 - (DateTime.Now.Ticks - x.Date.Ticks) * 1.0 / one_week.Ticks; 
            else 
                voteFunc = x => 1.0;

            
            var di = new DirectoryInfo(cache_location); 
            foreach(var fi in di.GetFiles("*.xml")){
                var doc = XElement.Load(Path.Combine(cache_location, fi.Name));
                // for each item in feed
                //  1. Get permalink, title, read status and date
                //  2. Get list of outgoing links + link title pairs
                //  3. Convert above to RssItem object
                //  4. apply filter to pick candidate items
                var items = from rssitem in 
                            (from itemnode in doc.Descendants("item")                            
                            select MakeRssItem(itemnode))
                            where filterFunc(rssitem)
                            select rssitem;
                var feedTitle = doc.XPathSelectElement("channel/title").Value;
                // calculate vote for each outgoing url
                foreach (RssItem item in items) { 
                    var vote = new Vote(){ Weight=voteFunc(item), Item=item, FeedTitle=feedTitle };
                    //add a vote for each of the URLs
                    foreach (var url in item.OutgoingLinks.Keys) {
                        List<Vote> value = null;
                        if (!all_links.TryGetValue(url, out value))
                            value = all_links[url] = new List<Vote>(); 
                            
                        value.Add(vote);                                                    
                    }
                }// foreach (RssItem item in items)
            }// foreach(var fi in di.GetFiles("*.xml"))
           
            //tally the votes
            List<RankedLink> weighted_links = new List<RankedLink>();
            foreach (var link_n_votes in all_links) {
                Dictionary<string, double> site = new Dictionary<string, double>();
                foreach (var vote in link_n_votes.Value) {
                    double oldweight;
                    site[vote.FeedTitle] = site.TryGetValue(vote.FeedTitle, out oldweight) ? 
                                            Math.Min(oldweight, vote.Weight): vote.Weight; 
                }
                weighted_links.Add(new RankedLink(){Score=site.Values.Sum(), Url=link_n_votes.Key});
            }
            weighted_links.Sort((x, y) => y.Score.CompareTo(x.Score));

            //output the results, choose link text from first item we saw story linked from
            Console.WriteLine("<html><body><ol>");
            foreach(var rankedlink in weighted_links.GetRange(0, 10)){
                var link_text = (all_links[rankedlink.Url][0]).Item.OutgoingLinks[rankedlink.Url];
                Console.WriteLine("<li><a href='{0}'>{1}</a> {2}", rankedlink.Url, link_text, rankedlink.Score);
                Console.WriteLine("<p>Seen on:");
                Console.WriteLine("<ul>");
                foreach (var vote in all_links[rankedlink.Url]) {
                    Console.WriteLine("<li>{0}: <a href='{1}'>{2}</a></li>", vote.FeedTitle, vote.Item.Permalink, vote.Item.Title);
                }
                Console.WriteLine("</ul></p></li>");
            }
            Console.WriteLine("</ol></body></html>");
            Console.ReadLine();
        }
    }
}
Now Playing: Lloyd Banks - Boywonder
 

Categories: Programming

A few days ago I blogged about my plans to make RSS Bandit a desktop client for Google Reader. As part of that process I needed to verify that it is possible to programmatically interact with Google Reader from a desktop client in a way that provides a reasonable user experience. To this end, I wrote a command line client in IronPython based on the documentation I found at the pyrfeed Website.

The command line client isn't terribly useful on its own as a way to read your feeds but it might be useful for other developers who are trying to interact with Google Reader programmatically who would learn better from  code samples than reverse engineered API documentation.

Enjoy...

PS: Note the complete lack of error handling. I never got a hang of error handling in Python let alone going back and forth between handling errors in Python vs. handling underlying .NET/CLR errors.


import sys
from System import *
from System.IO import *
from System.Net import *
from System.Text import *
from System.Globalization import DateTimeStyles
import clr
clr.AddReference("System.Xml")
from System.Xml import *
clr.AddReference("System.Web")
from System.Web import *

#################################################################
#
# USAGE: ipy greader.py <Gmail username> <password> <path-to-directory-for-storing-feeds>
# 
# username & password are required
# feed directory location is optional, defaults to C:\Windows\Temp\
#################################################################

#API URLs
auth_url          = rhttps://www.google.com/accounts/ClientLogin?continue=http://www.google.com&service=reader&source=Carnage4Life&Email=%s&Passwd=%s
feed_url_prefix   = rhttp://www.google.com/reader/atom/
api_url_prefix    = rhttp://www.google.com/reader/api/0/
feed_cache_prefix = r"C:\\Windows\Temp\\"
add_url           = r"http://www.google.com/reader/quickadd"

#enumerations
(add_label, remove_label) = range(1,3)

class TagList:
    """Represents a list of the labels/tags used in Google Reader"""
    def __init__(self, userid, labels):
        self.userid = userid
        self.labels = labels

class SubscriptionList:
    """Represents a list of RSS feeds subscriptions"""
    def __init__(self, modified, feeds):
        self.modified = modified
        self.feeds    = feeds

class Subscription:
    """Represents an RSS feed subscription"""
    def __init__(self, feedid, title, categories, firstitemmsec):
        self.feedid        = feedid
        self.title         = title
        self.categories    = categories
        self.firstitemmsec = firstitemmsec

def MakeHttpPostRequest(url, params, sid):
    """Performs an HTTP POST request to a Google service and returns the results in a HttpWebResponse object"""
    req = HttpWebRequest.Create(url)
    req.Method = "POST"
    SetGoogleCookie(req, sid)

    encoding = ASCIIEncoding();
    data     = encoding.GetBytes(params)

    req.ContentType="application/x-www-form-urlencoded"
    req.ContentLength = data.Length
    newStream=req.GetRequestStream()
    newStream.Write(data,0,data.Length)
    newStream.Close()
    resp = req.GetResponse()
    return resp

def MakeHttpGetRequest(url, sid):
    """Performs an HTTP GET request to a Google service and returns the results in an XmlDocument"""
    req          = HttpWebRequest.Create(url)
    SetGoogleCookie(req, sid)
    reader = StreamReader(req.GetResponse().GetResponseStream())
    doc          = XmlDocument()
    doc.LoadXml(reader.ReadToEnd())
    return doc

def GetToken(sid):
    """Gets an edit token which is needed for any edit operations using the Google Reader API"""
    token_url = api_url_prefix + "token"
    req          = HttpWebRequest.Create(token_url)
    SetGoogleCookie(req, sid)
    reader = StreamReader(req.GetResponse().GetResponseStream())
    return reader.ReadToEnd()

def MakeSubscription(xmlNode):
    """Creates a Subscription class out of an XmlNode that was obtained from the feed list"""
    id_node     = xmlNode.SelectSingleNode("string[@name='id']")
    feedid      = id_node and id_node.InnerText or ''
    title_node  = xmlNode.SelectSingleNode("string[@name='title']")
    title       = title_node and title_node.InnerText or ''
    fim_node    =  xmlNode.SelectSingleNode("string[@name='firstitemmsec']")
    firstitemmsec = fim_node and fim_node.InnerText or ''
    categories  = [MakeCategory(catNode) for catNode in xmlNode.SelectNodes("list[@name='categories']/object")]
    return Subscription(feedid, title, categories, firstitemmsec)

def MakeCategory(catNode):
    """Returns a tuple of (label, category id) from an XmlNode representing a feed's labels that was obtained from the feed list"""
    id_node     = catNode.SelectSingleNode("string[@name='id']")
    catid       = id_node and id_node.InnerText or ''
    label_node  = catNode.SelectSingleNode("string[@name='label']")
    label       = label_node and label_node.InnerText or ''
    return (label, catid)

def AuthenticateUser(username, password):
    """Authenticates the user and returns a username/password combination"""
    req = HttpWebRequest.Create(auth_url % (username, password))
    reader = StreamReader(req.GetResponse().GetResponseStream())
    response = reader.ReadToEnd().split('\n')
    for s in response:
        if s.startswith("SID="):
            return s[4:]

def SetGoogleCookie(webRequest, sid):
    """Sets the Google authentication cookie on the HttpWebRequest instance"""
    cookie = Cookie("SID", sid, "/", ".google.com")
    cookie.Expires = DateTime.Now + TimeSpan(7,0,0,0)
    container      = CookieContainer()
    container.Add(cookie)
    webRequest.CookieContainer = container

def GetSubscriptionList(feedlist, sid):
    """Gets the users list of subscriptions"""
    feedlist_url = api_url_prefix + "subscription/list"
    #download the JSON-esque XML feed list
    doc = MakeHttpGetRequest(feedlist_url, sid)

    #create subscription nodes
    feedlist.feeds  = [MakeSubscription(node) for node in doc.SelectNodes("/object/list[@name='subscriptions']/object")]
    feedlist.modified = False

def GetTagList(sid):
  """Gets a list of the user's tags"""
  taglist_url = api_url_prefix + "tag/list"
  doc = MakeHttpGetRequest(taglist_url, sid)
  #get the user id needed for creating new labels from Google system tags

  userid = doc.SelectSingleNode("/object/list/object/string[contains(string(.), 'state/com.google/starred')]").InnerText
  userid = userid.replace("/state/com.google/starred", "");
  userid = userid[5:]
  #get the user-defined labels
  tags = [node.InnerText.Replace("user/" + userid + "/label/" ,"") for node in doc.SelectNodes("/object/list[@name='tags']/object/string[@name='id']") if node.InnerText.IndexOf( "/com.google/") == -1 ]
  return TagList(userid, tags)

def DownloadFeeds(feedlist, sid):
    """Downloads each feed from the subscription list to a local directory"""
    for feedinfo in feedlist.feeds:
        unixepoch  = DateTime(1970, 1,1, 0,0,0,0, DateTimeKind.Utc)
        oneweek_ago   = DateTime.Now - TimeSpan(7,0,0,0)
        ifmodifiedsince = oneweek_ago - unixepoch
        feed_url = feed_url_prefix + feedinfo.feedid +  "?n=25&r=o&ot=" + str(int(ifmodifiedsince.TotalSeconds))
        continuation = True
        continuation_token = ''
        feedDoc      = None

        while True:
            print "Downloading feed at %s" % (feed_url  + continuation_token)
            doc = MakeHttpGetRequest(feed_url + continuation_token, sid)
            continuation_node     = doc.SelectSingleNode("//*[local-name()='continuation']")
            continuation_token    = continuation_node and ("&c=" + continuation_node.InnerText) or ''

            if feedDoc is None:
                feedDoc = doc
            else:
                for node in doc.SelectNodes("//*[local-name()='entry']"):
                    node = feedDoc.ImportNode(node, True)
                    feedDoc.DocumentElement.AppendChild(node)

            if continuation_token == '':
                break

        print "Saving %s" % (feed_cache_prefix + feedinfo.title + ".xml")
        feedDoc.Save(feed_cache_prefix + feedinfo.title + ".xml")

def ShowSubscriptionList(feedlist, sid):
    """Displays the users list of subscriptions including the labels applied to each item"""
    if feedlist.modified:
        GetSubscriptionList(feedlist, sid)
    count = 1
    for feedinfo in feedlist.feeds:
        print "%s. %s (%s)" % (count, feedinfo.title, [category[0] for category in feedinfo.categories])
        count = count + 1

def Subscribe(url, sid):
    """Subscribes to the specified feed URL in Google Reader"""
    params        = "quickadd=" + HttpUtility.UrlEncode(url) + "&T=" + GetToken(sid)
    resp = MakeHttpPostRequest(add_url, params, sid)

    if resp.StatusCode == HttpStatusCode.OK:
        print "%s successfully added to subscription list" % url
        return True
    else:
        print resp.StatusDescription
        return False

def Unsubscribe(index, feedlist, sid):
    """Unsubscribes from the feed at the specified index in the feed list"""
    unsubscribe_url = api_url_prefix + "subscription/edit"
    feed = feedlist.feeds[index]
    params = "ac=unsubscribe&i=null&T=" + GetToken(sid) + "&t=" + feed.title  + "&s=" + feed.feedid
    resp = MakeHttpPostRequest(unsubscribe_url, params, sid)

    if resp.StatusCode == HttpStatusCode.OK:
        print "'%s' successfully removed from subscription list" % feed.title
        return True
    else:
        print resp.StatusDescription
        return False

def Rename(new_title, index, feedlist, sid):
    """Renames the feed at the specified index in the feed list"""
    api_url = api_url_prefix + "subscription/edit"
    feed = feedlist.feeds[index]
    params = "ac=edit&i=null&T=" + GetToken(sid) + "&t=" + new_title  + "&s=" + feed.feedid
    resp = MakeHttpPostRequest(api_url, params, sid)

    if resp.StatusCode == HttpStatusCode.OK:
        print "'%s' successfully renamed to '%s'" % (feed.title, new_title)
        return True
    else:
        print resp.StatusDescription
        return False

def EditLabel(label, editmode, userid, feedlist, index, sid):
    """Adds or removes the specified label to the feed at the specified index depending on the edit mode"""
    full_label = "user/" + userid + "/label/" + label
    label_url = api_url_prefix + "subscription/edit"
    feed = feedlist.feeds[index]
    params = "ac=edit&i=null&T=" + GetToken(sid) + "&t=" + feed.title  + "&s=" + feed.feedid

    if editmode == add_label:
        params = params + "&a=" + full_label
    elif editmode == remove_label:
        params = params + "&r=" + full_label
    else:
        return

    resp = MakeHttpPostRequest(label_url, params, sid)
    if resp.StatusCode == HttpStatusCode.OK:
        print "Successfully edited label '%s' of feed '%s'" % (label, feed.title)
        return True
    else:
        print resp.StatusDescription
        return False

def MarkAllItemsAsRead(index, feedlist, sid):
    """Marks all items from the selected feed as read"""
    unixepoch  = DateTime(1970, 1,1, 0,0,0,0, DateTimeKind.Utc)

    markread_url = api_url_prefix + "mark-all-as-read"
    feed = feedlist.feeds[index]
    params = "s=" + feed.feedid + "&T=" + GetToken(sid) + "&ts=" + str(int((DateTime.Now - unixepoch).TotalSeconds))
    MakeHttpPostRequest(markread_url, params, sid)
    print "All items in '%s' have been marked as read" % feed.title

def GetFeedIndexFromUser(feedlist):
    """prompts the user for the index of the feed they are interested in and returns the index as the result of this function"""
    print "Enter the numeric position of the feed from 1 - %s" % (len(feedlist.feeds))
    index = int(sys.stdin.readline().strip())
    if (index < 1) or (index > len(feedlist.feeds)):
        print "Invalid index specified: %s" % feed2label_indx
        return -1
    else:
        return index

if __name__ == "__main__":
       if len(sys.argv) < 3:
           print "ERROR: Please specify a Gmail username and password"
       else:
           if len(sys.argv) > 3:
               feed_cache_prefix = sys.argv[3]

           SID = AuthenticateUser(sys.argv[1], sys.argv[2])
           feedlist = SubscriptionList(True, [])
           GetSubscriptionList(feedlist, SID)
           taglist = GetTagList(SID)

           options = "***Your options are (f)etch your feeds, (l)ist your subscriptions, (s)ubscribe to a new feed, (u)nsubscribe, (m)ark read , (r)ename, (a)dd a label to a feed, (d)elete a label from a feed or (e)xit***"
           print "\n"

           while True:
               print options
               cmd = sys.stdin.readline()
               if cmd == "e\n":
                   break
               elif cmd == "l\n": #list subscriptions
                   ShowSubscriptionList(feedlist, SID)
               elif cmd == "s\n": #subscribe to a new feed
                   print "Enter url: "
                   new_feed_url = sys.stdin.readline().strip()
                   success = Subscribe(new_feed_url, SID)

                   if feedlist.modified == False:
                       feedlist.modified = success
               elif cmd == "u\n": #unsubscribe from a feed
                   feed2remove_indx = GetFeedIndexFromUser(feedlist)
                   if feed2remove_indx != -1:
                       success = Unsubscribe(feed2remove_indx-1, feedlist, SID)

                       if feedlist.modified == False:
                           feedlist.modified = success
               elif cmd == "r\n": #rename a feed
                   feed2rename_indx = GetFeedIndexFromUser(feedlist)
                   if feed2rename_indx != -1:
                       print "'%s' selected" % feedlist.feeds[feed2rename_indx -1].title
                       print "Enter the new title for the subscription:"
                       success = Rename(sys.stdin.readline().strip(), feed2rename_indx-1, feedlist, SID)

                       if feedlist.modified == False:
                           feedlist.modified = success
               elif cmd == "f\n": #fetch feeds
                   feedlist = DownloadFeeds(feedlist, SID)
               elif cmd == "m\n": #mark all items as read
                   feed2markread_indx = GetFeedIndexFromUser(feedlist)
                   if feed2markread_indx != -1:
                       MarkAllItemsAsRead(feed2markread_indx-1, feedlist, SID)
               elif (cmd == "a\n") or (cmd == "d\n"): #add/remove a label on a feed
                   editmode = (cmd == "a\n") and add_label or remove_label
                   feed2label_indx = GetFeedIndexFromUser(feedlist)
                   if feed2label_indx != -1:
                       feed = feedlist.feeds[feed2label_indx-1]
                       print "'%s' selected" % feed.title
                       print "%s" % ((cmd == "a\n") and "Enter the new label:" or "Enter the label to delete:")
                       label_name = sys.stdin.readline().strip()
                       success = EditLabel(label_name, editmode, taglist.userid, feedlist, feed2label_indx-1, SID)

                       if feedlist.modified == False:
                           feedlist.modified = success
               else:
                   print "Unknown command"

Now Playing: DJ Drama - Cannon (Remix) (Feat. Lil Wayne, Willie The Kid, Freeway And T.I.)


 

Categories: Programming