Friday, 18 January 2008 - Dare Obasanjo's weblog

January 17, 2008

@ 04:00 AM

Regional Networks on Facebook are a Design Flaw

I'm also befuddled by the slippery slope of Facebook. Today, they announced public search listings on Facebook. I'm utterly fascinated by how people talk about Facebook as being more private, more secure than MySpace. By default, people's FB profiles are only available to their network. Join a City network and your profile is far more open than you realize. Accept the default search listings and you're findable on Google. The default is far beyond friends-only and locking a FB profile down to friends-only takes dozens of clicks in numerous different locations. Plus, you never can really tell because if you join a new network, everything is by-default open to that network (including your IM and phone number).

From Caroline McCarthy: Report: Facebook threatens to ban Gawker's Denton

Facebook isn't too happy with Gawker Media founder Nick Denton over some screenshots of a member's profile that he posted on Gawker.com on Tuesday, Portfolio.com reports. The social-networking site reportedly plans to send a warning letter to the New York-based digital-media entrepreneur citing several terms-of-service violations--one more, and he's out.

Facebook representatives were not immediately available for comment.

On Tuesday, Denton--who took over as managing editor of Gawker.com this month after several staff departures--posted a bit of an expose on 25-year-old Emily Brill, daughter of New York publishing figure Steve Brill. Screenshots of the younger Brill's Facebook profile, featuring glamorous photos of a yachting trip to the British Virgin Islands, as well as excited "status" messages about an impending trip to the Caribbean luxury getaway of St. Barth's, were juxtaposed with an older photograph of the Brown graduate when she was significantly heavier.
…
It's not clear whether Denton and Brill are "friends" on the site, or if it was even Denton (rather than a source or another Gawker Media employee) who pulled the screenshots from Facebook. But both Denton and Brill are members of the New York regional network, so there is a chance that Denton would have been able to view Brill's profile even without being connected as friends.

It boggles my mind that someone sat down and coded “Anyone who lives in the same city as me” as a privacy control and didn’t immediately smack themselves on the head for writing something so ridiculously useless and so certain to cause privacy issues.

It would have been easier to have a notion of public profiles and appropriate scary warnings or defaults that protected people’s privacy than the farce that is “regional networks”.

Now playing: Chris Brown - Say Goodbye

Categories: Social Software

January 17, 2008

@ 04:00 AM

Comments [2]

Myth: RESTful Web Services Don't Need an Interface Definition Language

I’ve noticed a meme that seems to have been going around in various blogs of folks who continue to indulge in the long since dead REST vs. SOAP discussion. This meme is that (i) you don’t want or need an interface definition language if you are building a RESTful Web Service and (ii) an interface definition language for RESTful Web Services has to look something like CORBA IDL or WSDL.

You can find examples of this kind of thinking in blog posts like Steve Vinoski’s IDLs vs. Human Documentation post excerpted below

Note that Patrick mostly talks about data schemas, whereas my posting talks only of interface definition languages. These are two very different things, which I’ve noted in comments on his blog. In a reply comment he said they’re both metadata, which is true, but still, they’re very separable. REST depends heavily on data definitions, but it doesn’t require specialized interface definitions because it promotes a uniform interface. For data definition REST relies on and promotes media/MIME types, and the standardization of such data definitions is critical to allowing independently-developed consumers and providers to interact correctly. I doubt Patrick and I really disagree on this last point.

and Ryan Tomayko’s Speaking of, "lying through their teeth..." also excerpted below

The WS-* folks have historically been obsessed with making things easy, usually for an imaginary business analyst who is nowhere near as technically adept as they. The REST folks, on the other hand, seem much more interested in keeping the entire stack simple, and for everyone involved.

This difference in priorities (easy vs. simple) often manifests itself in arguments about technological issues on the surface. Take the never ending debate about whether REST needs a description language like WSDL; which, incidentally, Sanjiva is largely responsible for. If building systems in your world can be made easier with the addition of a description language, then WSDL probably makes a lot of sense. If, however, building distributed systems in your world is a tediously hard pain in the ass whether you have these cockamamie description files or not, well, then you fight to keep the system as simple as possible by reducing the number of actors, dependencies, and concepts to an absolute minimum.

Let’s start with Steve Vinoski’s post. Steve is right to point out that there is a difference between data schemas and interface definition languages. When building services with WS-*, you have a WSDL to describe your methods & expected inputs/outputs and XSD schema(s) to describe the schemas for said inputs/outputs. When building a RESTful Web Service, the need for both of these documents does not go away regardless of how often you repeat the phrase “uniform interface”.

Steve argues that instead of using an XML schema to describe your document formats, you should rely on registered MIME types. The benefit of this is that you’ve broadened your horizon from thinking that the only payload for your Web service can be XML documents. The WS-* folks had to jump through lots of mental hoops to try and get non-XML data to fit in their model with wacky schemes like SOAP with Attachments (SwA), Direct Internet Message Encapsulation (DIME), WS-Attachments, Message Transmission Optimization Mechanism (MTOM) and XML-binary Optimized Packaging (XOP). All of this complexity existed because the fundamental design of WS-* is that all data going in and out of a SOAP Web service must either be an XML document or modelled as an XML document.

However, this doesn’t mean everything is plain sailing if you stick to only using registered IANA MIME types as the payloads of your Web services. What happens when you have a document format that doesn’t have a registered MIME type? You have two choices: you can either co-opt an existing MIME type and use it as an envelope format as Google has done with GData, or you can use your own custom XML format as Facebook has done with the Facebook REST API. In both cases, it would be useful for developers if your data schema is documented either in prose or via some XML schema language. This doesn’t require an interface definition language like WSDL nor does it require a schema definition language like XSD.

On the other hand, how does a client application discover your application’s service end points? Today, when you point your browser to my blog at http://www.25hoursaday.com/weblog, your browser automatically detects that I have an RSS feed. When you point Windows Live Writer to a weblog, it automatically detects how to post and edit blog posts programmatically if your weblog software supports the Atom Publishing Protocol.

In the case of the RSS feed, your browser knows I have one by looking at the link element pointing to my RSS feed. The browser knows what to do with the file via the MIME type and there is no interface to be defined because the only contract of an RSS feed is that it should support HTTP GET. On the flip side, an Atom service document, which is what Windows Live Writer reads to learn about your blog, describes the various service end points (i.e. collections) as well as the accepted inputs/outputs (either as MIME types or the hardcoded string ‘entry’ for Atom entries since they don’t have a MIME type).
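To make the RSS case concrete, here is a minimal Python sketch of feed autodiscovery: scan the page for link elements marked rel="alternate" whose type is a feed MIME type. The sample markup, the example.org URL and the FeedLinkFinder name are assumptions invented for illustration; a real browser is more forgiving and does a good deal more.

```python
from html.parser import HTMLParser

# MIME types that identify a syndication feed in a <link> element.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects (MIME type, href) for every <link rel="alternate"> pointing at a feed."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        mime = (attributes.get("type") or "").lower()
        if "alternate" in rel and mime in FEED_TYPES and attributes.get("href"):
            self.feeds.append((mime, attributes["href"]))


def find_feeds(html):
    """Return the feeds advertised by a page's <link> elements."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feeds


# Hypothetical page markup; the feed URL is made up for this example.
SAMPLE_PAGE = """
<html><head>
  <title>Example weblog</title>
  <link rel="stylesheet" type="text/css" href="/style.css">
  <link rel="alternate" type="application/rss+xml" title="RSS" href="http://example.org/feed.rss">
</head><body></body></html>
"""

if __name__ == "__main__":
    for mime, href in find_feeds(SAMPLE_PAGE):
        print(mime, href)
```

Note that the client needs nothing beyond the link element and the MIME type; the "interface" is simply HTTP GET on the discovered URL.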

The examples of Atom service documents and link elements in HTML highlight that there is real world value in describing the interfaces to your RESTful Web Service. In addition, Atom service documents show that you can define an interface definition language for Web services without resorting to reinventing CORBA IDL (i.e. WSDL). So I respectfully disagree with Ryan Tomayko…just because my life is made easier with a service description language doesn’t make WSDL a good idea.
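For a sense of how little machinery such a description needs, here is a rough Python sketch that reads an Atom service document and lists each collection with what it accepts. The sample document, its titles and its URLs are invented for illustration, and the accept values follow the description above (MIME types, or the string 'entry'). The namespace URIs are the ones used by the published Atom Publishing Protocol specification (RFC 5023); earlier drafts may differ, so treat them as an assumption if you are targeting a particular server.

```python
import xml.etree.ElementTree as ET

# Namespaces for Atom Publishing Protocol service documents (per RFC 5023).
NS = {
    "app": "http://www.w3.org/2007/app",
    "atom": "http://www.w3.org/2005/Atom",
}

# Hypothetical service document; titles and URLs are made up for this example.
SERVICE_DOCUMENT = """<service xmlns="http://www.w3.org/2007/app"
         xmlns:atom="http://www.w3.org/2005/Atom">
  <workspace>
    <atom:title>Example weblog</atom:title>
    <collection href="http://example.org/blog/entries">
      <atom:title>Blog Entries</atom:title>
      <accept>entry</accept>
    </collection>
    <collection href="http://example.org/blog/pictures">
      <atom:title>Pictures</atom:title>
      <accept>image/png</accept>
      <accept>image/jpeg</accept>
    </collection>
  </workspace>
</service>
"""


def list_collections(service_xml):
    """Return (title, href, accepted inputs) for each collection in the document."""
    root = ET.fromstring(service_xml)
    collections = []
    for collection in root.iterfind("app:workspace/app:collection", NS):
        title = collection.findtext("atom:title", default="", namespaces=NS)
        accepts = [a.text.strip() for a in collection.findall("app:accept", NS) if a.text]
        collections.append((title, collection.get("href"), accepts))
    return collections


if __name__ == "__main__":
    for title, href, accepts in list_collections(SERVICE_DOCUMENT):
        print(title, href, accepts)
```

A client that has this list knows where to POST and what it may POST there, which is all the interface description an editing tool needs.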

Now playing: Dead Prez - Hell Yeah (Pimp the System)

Categories: XML Web Services

January 17, 2008

@ 04:00 AM

Comments [1]

Greg Linden Joins Microsoft Live Labs

In his blog post entitled Joining Microsoft Live Labs Greg Linden writes

I am starting at Microsoft Live Labs next week.

Live Labs is an applied research group affiliated with Microsoft Research and MSN. The group has the enjoyable goal of not only trying to solve hard problems with broad impact, but also getting useful research work out the door and into products so it can help as many people as possible as quickly as possible.

Live Labs is lead by Gary Flake, the former head of Yahoo Research. It is a fairly new group, formed only two years ago. Gary wrote a manifesto that has more information about Live Labs.

When I found out Greg was shutting down Findory I thought to myself that he’d be a great hire for Microsoft, especially since he already lived in the area. It seems someone else thought the same thing and now Greg has been assimilated. Congratulations, Greg.

I seem to be bumping into more and more people who are either working for or with Live Labs. Besides Justin Rudd who I just referred to the team, there’s Mike Deem and Erik Meijer, two people I know from my days on the XML team. I wonder what Gary Flake is cooking up in those swanky offices in Bellevue that has so many smart folks gravitating to his group?

Now playing: Kool & The Gang - Celebration

Categories: Mindless Link Propagation | Windows Live

January 17, 2008

@ 04:00 AM

Comments [6]

FeedDemon and NetNewsWire are now Free

It seems I missed the news when this happened last week but according to Greg Reinacker’s post NewsGator’s RSS clients are now free!

We’ve got a lot of big news today at NewsGator.

First, we’ve got new releases of our most popular applications: FeedDemon 2.6, NetNewsWire 3.1, Inbox 3.0 (beta), and NewsGator Go! for Windows Mobile 2.0. Each of these is a pretty major release on its own - tons of new features in all of them.

But second, every one of those applications is now free! Free as in beer, that is. And add to the free list NewsGator Go! for BlackBerry as well. And not only are they free, but our online services (including synchronization) are now free as well! Not to mention our iPhone reader, HTML mobile reader, and all of the other applications that are part of our online platform.

According to Greg and Nick Bradbury, the reason they are doing this is because the bulk of their profits/revenues come from selling Enterprise licenses and the desktop readers are now being used as advertising to get enterprise customers.

They also mention that the other reason they are giving away their desktop application is that they see a lot of financial value from collecting information about what feeds their users are reading. My assumption was that this is because the demographic data is being resold to marketers although both Greg and Nick make it seem like the collection of this data is benign and only used for end user facing features.

Anyway, this is pretty great news for fans of desktop RSS readers. If I didn’t already have RSS Bandit, FeedDemon would be my first choice when it comes to a desktop RSS reader for Windows. I also like the fact that we get a shout out as part of the setup experience for FeedDemon which is shown below

Thanks for the shout out, Nick. :)

Now that these apps are free, it does encourage us to step our game up with RSS Bandit. Right now, my thinking is that the official version number for the release currently codenamed Phoenix will be RSS Bandit 2.0. This release will deserve the moniker “2.0” for two reasons:

The user interface will be completely rewritten from scratch by Oren and Torsten using WPF.
The application will become capable of being a full-blown desktop client to both Google Reader and NewsGator Online.

If I seem to have been blogging less, it is because I’ve been spending more time reading about code, thinking about code and writing code in my free time. I can’t wait to get the first beta out to you guys in a few months.

Now playing: Kid Rock - Welcome To The Party (Ode 2 the Old School)

Categories: RSS Bandit | Syndication Technology

January 13, 2008

@ 03:57 PM

Comments [7]

Change the World or Go Home: Why I Love Working at Microsoft

Last year was the year of big changes in my personal life. I bought a house, got married and brought a very cute and lovable Shih Tzu into our household. Some time during 2007, I realized I'd been at Microsoft for over 5 years and decided that I'd also look for change in my professional life as well.

I learned a couple of lessons from the experience. The first was that looking around for a job while trying to buy a house, moving into a new home and working towards getting married is pretty stressful. The second thing I learned was that I hadn't really thought about what I want from my career in several years. Back in my college days, I had a clear idea where I wanted to be within my first year of graduation and everything I did back then moved me closer to that goal, from the classes I took to ensuring that I interned every summer break. Since then, I haven't really had a "five year plan" to get me to the next stage in my career. I now have a much clearer idea where I want to be by 2010 than I have in the past two or three years. Finally, I realized that I actually really like working at Microsoft, especially within my current job.

Ever since I came to that final realization I've wanted to blog about why this is the case but it seemed like such a corny thing to write about that I didn't want people reading this to think I was shilling for Microsoft. However this morning I was reading a blog post entitled Gone Indie by Jens Alfke which explained why he was leaving Apple Inc. after ten years and a lot of the reasons he is leaving are the same reasons I'm still at Microsoft.

Social Software

Jens wrote

But I’m fascinated with social software. Apple isn’t. Despite some promising starts, the most I’ve been able to get accomplished in that vein at Apple was iChat [the IM part; I’m really not interested in videoconferencing], Safari RSS, and the “PubSub” [which turned out to be “RSS and Atom”] framework. There were some very promising prototypes of sexier things, but I really can’t talk about those, other than to say that they were canceled.

I looked around after Leopard was finished, and didn’t see any place in the company where I could pursue my ideas. It would have meant evangelizing reluctant executives into sharing my vision … and that’s something that I know I have little talent at.

I am similarly fascinated by Social Software and have been since I wrote down my epiphany Social Software is the Platform of the Future after a couple of conversations with my friend Mike Vernal. This epiphany is the reason I decided to start working in Microsoft's ~~social graph~~ contacts platform team which is where I continue to work till this day. Three and a half years later, this epiphany has been borne out by the rise of MySpace and Facebook as well as the realization by the technocrat masses that without data portability social software is the new vendor lock-in. This is all stuff Mike and I used to chat about back in 2004, and now Mike is off at Facebook and I'm here at Microsoft trying to make our visions a reality in our own little ways.

Unlike Jens, I don't have to evangelize reluctant execs into sharing my vision. A lot of our execs understand the importance of social software and have clear ideas of how Microsoft can add value to our users' lives with our contributions to this space. When I talk to folks like Ray Ozzie, Chris Jones or David Treadwell about some of the problems I see in the social software space today, not only do they get it, I always leave the conversation with a strong sense that Microsoft will do the right thing.

Some people may criticize Microsoft for not being quick to jump onto every fad. However as Phil Haack mentioned in his blog post about his first few days as a new Microsoft hire, Microsoft invests for the long run and expects its employees to think deeply about issues before acting. At the end of the day, the software we build in Windows Live impacts how hundreds of millions of people interact, share and communicate with their friends, family and loved ones. We endeavor to be good stewards of the trust they've placed in us.

Sharing Your Ideas

Jens wrote

I tend to have a lot of ideas. I’m not bragging, and that’s not always a good trait; it can be hard for me to focus on something long enough to finish it. A structured job has helped me stay on-task. On the other hand, though, the development cycle in a big company is such that every significant idea takes a year or more to finish, and during that time, more ideas pile up in my brain.

That wouldn’t be bad if there were some other channels to express those ideas. And if they took the form of songs, or novels, or scrimshaw carvings of Biblical scenes on walrus tusks, I could do whatever I wanted with them. But on software, Apple’s position (not unusually for the industry) is “All Your Idea Are Belong To Us”, and I signed onto that when I accepted the job offer. In other words, anything I do that relates in any way to Apple’s areas of business, no matter when or where I do it, belongs to Apple. [Edit: Ha! Note I’m still using present tense.]

(Again, this isn’t something particular about Apple. Most tech companies are like this, and if you work for one, you probably signed a very similar “Proprietary Rights Agreement” that they hid in the stack of paperwork beneath your offer letter. And yes, companies will enforce that if they see profit in it.)

I believe all Microsoft employees sign similar agreements with the company when hired. However, Microsoft is very good about letting employees explore their ideas in software on their own time without getting in the way. Projects like Script#, Reflector, RSS Bandit, DasBlog, Tweak UI and WiX are examples of software projects either developed or maintained by Microsoft employees in their free time that are now benefiting thousands to hundreds of thousands of end users.

However I think that more important than being able to share our ideas in code, being able to share our ideas in words is one of the coolest things about working at Microsoft. Thousands of Microsoft employees share their ideas with their coworkers, competitors and customers via blogs on a daily basis. Lots of companies would clamp down on that sort of behavior and ensure that only sanctioned company positions go out in employee communications but not Microsoft.

Even more surprisingly, Microsoft tolerates employees that may have ideas that differ from the company's ideas of how things should be done. You may wonder why that is surprising until you remember that even supposedly enlightened "Web 2.0" companies like Friendster and Google can fire you for disagreeing with the company's technology choices or hinting about future products or complaining about the company's benefits.

A lot of people [including Microsoft employees] wonder how I still have a job at Microsoft even though I've been critical of some of the company's strategies and products in my almost six years as an employee. Although I've had conversations with peers, middle managers and senior execs about my blog, I've never felt that my job was in danger. If anything, I've had it confirmed that Microsoft's culture is about being open and respectful. The one thing I have tried to change about my blog [and in fact all my communications] is being more respectful of others' perspectives and personal feelings, especially when I disagree with them, since you catch more flies with honey than with vinegar...or so I heard.

Individuality

Jens wrote

Finally — and this may seem petty — Apple’s lack of individuality bugs me. I don’t mean internally: within the company, communication is reasonably open (modulo confidentiality issues) and there’s lots of room for self-expression. But ever since the return of Steve Jobs, the company has been pretty maniacal about micro-managing its visible face, to make it as smooth and featureless as an iPod’s backside. (In my darker moments I’ve compared it to the brutal whiteness of “THX-1138”.)

It’s deeply ironic: For a company that famously celebrates individuality and Thinking Different, Apple has in the past decade kept its image remarkably impersonal. Other than the trinity who go onstage at press events — Steve Jobs, Jonathan Ive, Phil Schiller — how many people can you name who work for Apple? How many engineers?
...
And then there are blogs. Apple doesn’t like them, not when they talk about it. (Big surprise.) I’ve heard it said that there are hardly any bloggers working at Apple; there are actually a lot more than you’d think, but they mostly keep it a secret. (I could out a few people, including at least one director…) I think Apple’s policy on blogging is one of the least enlightened of major tech companies; Microsoft in particular is surprisingly open.

There really isn't much more I can add to that. The fact that you are reading my blog and know who I am is a testament to how much Microsoft encourages its employees to express their individuality in their products and in our communications with our customers.

This may not be a big deal in 2008 when everyone is blogging, but it was back in 2003 when the early community of Microsoft bloggers could all fit at a table in a single restaurant. When you consider it, Microsoft bloggers are probably a large part of the reason corporate blogging is mainstream today. That alone is a worthy legacy in my book.

I'd like to leave you with this image from Scott Hanselman's post about joining Microsoft. Everyone's goal should be trying to get to the center of the picture.

Now Playing: Wu-Tang Clan - Can It Be All So Simple

Categories:

January 11, 2008

@ 04:00 AM

Comments [1]

Dataportability.org is the new Black

Yesterday:

After publishing an invitation to Facebook to join the DataPortability Working Group January 4, we never thought that Facebook would accept it. Today changes everything you’ve ever thought about social-networking data and lock-in before, because today Facebook, Google and Plaxo have joined the DataPortability Workgroup.

Today:

Joining the cavalcade of companies jumping on the open data bandwagon, LinkedIn has now joined Facebook, Google, Plaxo (announcement here) in joining the DataPortability Work Group.

LinkedIn has worked hard to become open since announcing their own open platform in June 2007 in response to Facebook, then becoming an initial OpenSocial launch partner in October 2007.
…
Update: Web developers from Flickr, SixApart, and Twitter have also joined.

Some folks at work asked me what I thought of DataPortability.org. The short version is that it reminds me of AttentionTrust, long on intention and short on implementation.

Principles are nice but working code is even better.

Now playing: Aerosmith - Eat the Rich

Categories: Social Software

January 8, 2008

@ 03:29 PM

Comments [5]

Finding Names that are Anagrams of Each Other in C# 3.0

Brad Fitzpatrick, founder of Livejournal, has a blog post entitled Naming twins in Python & Perl where he writes

Last night at Beau's party, one of Beau's guests mentioned he's expecting twins shortly, which is why is wife wasn't at the party.
I drunkenly suggested he name his kids two names that were anagrams of each other. I then wandered off downstairs to find such suitable names.
Because I'm supposed to be be working in Python these days, not Perl, I gathered what little Python knowledge I had and wrote out:
#!/usr/bin/python

by_anagram = {}

names_file = open("dist.male.first")
for line in names_file:
    # lines in file look like:
    # WILLIAM        2.451 14.812      5
    # we want just the first field.
    name = (line.split())[0]
    letters = [letter for letter in name]
    letters.sort()
    sorted_name = "".join(letters)
    if not sorted_name in by_anagram:
        by_anagram[sorted_name] = []
    by_anagram[sorted_name].append(name)

for sorted_name in by_anagram:
    if len(by_anagram[sorted_name]) < 2:
        continue

    print by_anagram[sorted_name]
Not so happy with it, but it worked, and I printed out the results and brought them up to the guy

You can guess where this is going. Below is my solution in C# 3.0

using System;
using System.Linq;
using System.IO;

namespace Name_agram{
    class Program{
        static void Main(string[] args){
            var names    = from line in File.ReadAllLines("dist.male.first.txt")
                           let name = line.Split(' ')[0]
                           group name by new string(name.ToCharArray().OrderBy(x=>x).ToArray()) into anagrams
                           select anagrams;
                                        
            foreach (var list_of_names in names) {
                if (list_of_names.Count() > 1)
                    Console.WriteLine(String.Join(" ",list_of_names.ToArray()));
            }
            Console.ReadLine();
        }
    }
}

There are a ton of solutions in various languages in the comments to Brad's post. Again, it is startling to see how different an idiomatic C# 3.0 solution using LINQ is from the traditional imperative/procedural style of programming.
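For comparison, the same declarative shape is available in Python too. The sketch below is not Brad's code or anything from the comments on his post; it is just one way to express "group names by their sorted letters" with itertools.groupby instead of a hand-rolled dictionary loop. The SAMPLE_NAMES list is a hypothetical stand-in for the first column of the names file.

```python
from itertools import groupby


def anagram_groups(names):
    """Group names that are anagrams of each other, keeping groups of two or more."""
    def letters(name):
        # Two names are anagrams exactly when their sorted letters match.
        return "".join(sorted(name))

    # groupby only merges adjacent items, so sort by the same key first.
    grouped = groupby(sorted(names, key=letters), key=letters)
    groups = (list(members) for _, members in grouped)
    return [group for group in groups if len(group) > 1]


# Hypothetical sample standing in for the first field of each line in the file.
SAMPLE_NAMES = ["ARNOLD", "ROLAND", "RONALD", "WILLIAM", "BRADY", "DARBY", "JAMES"]

if __name__ == "__main__":
    for group in anagram_groups(SAMPLE_NAMES):
        print(" ".join(group))
```

The sort-then-group step plays the role of the group ... by ... into clause in the LINQ query, and the final list comprehension is the Count() > 1 filter.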

Now Playing: Rick James - Give It To Me Baby

Categories: Programming

January 7, 2008

@ 03:39 AM

Comments [13]

Breaking the Social Contract: My Data is not Your Data

This is likely my last post in the Robert Scoble vs. Facebook saga but I think there are some subtle points being lost because of the typical blog feeding frenzy where people either choose to flame Facebook, Scoble or both. Robert Scoble has a post entitled Plaxo: the social monster? where he writes

Judi Sohn rips into the trustworthiness of both me and Plaxo for attempting to import email addresses, names, and birthdays.
...
What if I wrote down Judi’s email and then manually put it into my Outlook’s contact database. Wouldn’t that have been exactly the same thing that I tried to do with Plaxo’s script?

There are a couple of things wrong with Robert's analogy.

When I enter my personally identifiable information (PII) into Facebook, I am entering into a social contract with two entities. I am trusting Facebook to protect my data so it is safe from malicious hackers and not sell it to malicious third parties like spammers or telemarketers; in return I provide Facebook with accurate data which improves their service and the user experience of the people in my social network. In addition, I am implicitly trusting the people in my social network not to abuse the privilege of having my personal information (e.g. by prank calling my cell phone, giving my personal details to third parties I don't trust).

There is a key difference between Robert taking my personal information I shared with him on Facebook and importing it into Outlook versus importing it into Plaxo Pulse. In the former case, Robert is taking data I shared with him and viewing it in a different application. In the latter case, Robert is additionally sharing my personal details with a corporate entity: Plaxo, Inc. This is an entity that is synonymous with spam and at the time of writing this post there are 209,000 hits returned for a search for "Plaxo Spam" on the Google search engine. This is the key difference between Robert importing my personal details into Outlook and importing it into Plaxo Pulse.

Lots of geeks have focused on the fact that since it was possible for Robert to manually extract this data, the people sharing data with him shouldn't complain since they gave him access to the data. This ignores the fact that just because something is technically possible doesn't make it right even if it is legal. Just because it is technically possible for you to read the RSS feed for my blog and republish it on a splog so you can make money from AdSense ads doesn't make it right. Just because it is technically possible for you to view my photo albums on Windows Live Spaces doesn't mean I'd think it was OK to use Omar's Send to Smugmug script to republish these photos on Smugmug. Just because you have my phone number doesn't mean I think it is OK for you to share it with all your drinking buddies that want to work at Microsoft and need a recommendation. And so on...

In all of these cases, the social contract between us would have been broken. This is independent of whether it's technically possible for you to do these things by hand without needing a script or whatever.

Taking my data and sharing it with a third party without my permission isn't cool. Just because I shared information with you doesn't give you the right to share it with others.

Now Playing: Eminem - Mockingbird

Categories: Social Software

January 5, 2008

@ 03:55 PM

Comments [9]

Python vs C# 3.0: Tuples vs. Anonymous Types (Redux)

One of the things to keep in mind when learning a new programming language is that it isn’t enough to learn the syntax and semantics of various language features you are unfamiliar with. Just as important is learning the idioms and way of thinking that goes with these language features. On reddit, ubernostrum pointed out that my usage of

if all_links.get(url) == None:

was jarring to read as a Python programmer when compared to the more idiomatic

if url not in all_links:

Of course this is just a stylistic issue but his point is valid. A similar thing happened with regards to other aspects of my recent post entitled Does C# 3.0 Beat Dynamic Languages at their Own Game?

I argued that type inferencing and anonymous types in C# 3.0 did not offer the same degree of functionality that tuples and dynamic typing did when it came to processing intermediate values in a computation without requiring nominal types (i.e. named classes) to hold these values.

Specifically I had the following IronPython code,

IronPython Code

    for item in filteredItems:
        vote = (voteFunc(item), item, feedTitle)

        #add a vote for each of the URLs
        for url in item.outgoing_links.Keys:
            if all_links.get(url) is None:
                all_links[url] = []
            all_links.get(url).append(vote)

    # tally the votes, only 1 vote counts per feed
    weighted_links = []
    for link, votes in all_links.items():
        site = {}
        for weight, item, feedTitle in votes:
            site[feedTitle] = min(site.get(feedTitle,1), weight)
        weighted_links.append((sum(site.values()), link))
    weighted_links.sort()
    weighted_links.reverse()

The key things to note about the above code block are (i) the variable named vote is a tuple of three values: the numeric weight given to a link received from a particular RSS item, the RSS item itself and the title of the feed, and (ii) the items in each tuple can be unpacked into individual variables when looping over a list of such tuples in a for loop.
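Stripped of the surrounding feed-processing code, the two points look roughly like this in plain Python. The votes list and the tally function name are hypothetical; the weights and titles are made-up values, and strings stand in for the RSS item objects.

```python
# Hypothetical votes: each is a (weight, item, feed title) tuple with made-up values.
votes = [
    (1.0, "post about REST", "Feed A"),
    (0.5, "older post about REST", "Feed A"),
    (0.8, "link roundup", "Feed B"),
]


def tally(votes):
    """Count one vote per feed, keeping the lowest weight seen for that feed."""
    site = {}
    # Each tuple is unpacked into three names right in the for statement,
    # so no named Vote class is needed to carry the values around.
    for weight, item, feed_title in votes:
        site[feed_title] = min(site.get(feed_title, 1), weight)
    return sum(site.values())


print(tally(votes))
```

The list holding the tuples never needs to know their "type"; the shape is agreed on by convention at the point where the tuple is built and the point where it is unpacked.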

When I tried to write the same code in C# 3.0 with a vote variable that was an anonymous type, I hit a road block. When I placed instances of the anonymous type in the list, I had no way of knowing what the data type of the object I’d be pulling out of the list would be when I wanted to extract it later to tally the votes. Since C# is statically typed, knowing the type’s name is a requirement for retrieving the objects from the list later unless I planned to interact with them as instances of System.Object and access their fields through reflection (or something just as weird).

So in my C# 3.0 solution I ended up creating RankedLink and Vote types to simulate the functionality I was getting from tuples in Python.

However it turns out I was using anonymous types incorrectly. I tried to take a feature that was meant to be coupled with C# 3.0’s declarative Language Integrated Query (LINQ) and use it in the traditional imperative loop constructs I’ve been familiar with since my days programming in C.

Ian Griffiths set me straight with his blog post entitled Dare Obasanjo on C# Anonymous Types, where he showed how to use anonymous types to get the solution I wanted without having to create unnecessary named types to hold intermediate values. Ian’s code is shown below.

C# 3.0 Code

// calculate vote for each outgoing url
var all_links = from item in items
                from url in item.OutgoingLinks.Keys 
                group item by url into itemUrlGroup
                select new
                {
                  Url=itemUrlGroup.Key,
                  Votes=from item in itemUrlGroup
                        select new
                        {
                          Weight=voteFunc(item),
                          Item=item,
                          FeedTitle=feedTitle
                        }
                };

// tally the votes
var weighted_links = from link_n_votes in all_links
                     select new
                     {
                       Url=link_n_votes.Url,
                       Score=(from vote in link_n_votes.Votes
                              group vote by vote.FeedTitle into feed
                              select feed.Min(vote => vote.Weight)
                             ).Sum()
                     } into weighted_link
                     orderby weighted_link.Score descending
                     select weighted_link;

As you can see, Ian’s code performs the same task as the Python code does but with a completely different approach. The anonymous types are performing the same function as the Python tuples did in my previous code sample and there is no need to create RankedLink and Vote types to hold these intermediate values.
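For comparison, the same group-then-aggregate shape can be sketched in Python with comprehensions and itertools.groupby. This is only an illustration of the declarative style, not a port of Ian's query: the item dictionaries, their field names and the weights are assumptions invented for the example.

```python
from itertools import groupby

# Hypothetical data: each item carries its feed title, a weight and its outgoing links.
items = [
    {"feed": "Feed One", "weight": 1.0, "links": ["http://a.example", "http://b.example"]},
    {"feed": "Feed One", "weight": 0.5, "links": ["http://a.example"]},
    {"feed": "Feed Two", "weight": 0.8, "links": ["http://a.example"]},
]

# Flatten to one (url, feed, weight) tuple per outgoing link.
votes = [(url, item["feed"], item["weight"]) for item in items for url in item["links"]]

def tally(url_votes):
    """Group by feed, take the minimum weight per feed, then sum the minimums."""
    feeds = {feed for _, feed, _ in url_votes}
    return sum(min(w for _, f, w in url_votes if f == feed) for feed in feeds)

# groupby only groups adjacent entries, so sort by URL first.
votes.sort(key=lambda v: v[0])
weighted_links = sorted(
    ((tally(list(group)), url) for url, group in groupby(votes, key=lambda v: v[0])),
    reverse=True,  # highest score first
)
```

As in the LINQ version, the intermediate values live in unnamed tuples and the ranking is expressed as a chain of transformations instead of nested loops that mutate a dictionary.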

What I find interesting about this is that even though I’ve been using C# for the past five or six years, I feel like I have to relearn the language from scratch to fully understand or be able to take advantage of the LINQ features. Perhaps a few stints as a SQL developer may be necessary as well?

Categories: Programming

January 4, 2008

@ 04:48 PM

Comments [3]

Should Hotmail Block Screen Scrapers?

Paul Buchheit, creator of Gmail and now a founder of FriendFeed, has a blog post entitled Should Gmail, Yahoo, and Hotmail block Facebook? where he writes

Apparently Facebook will ban you (or at least Robert Scoble) if you attempt to extract your friend's email addresses from the service. Automated access is a difficult issue for any web service, so I won't argue with their decision -- it's their service and they own you. However, when I signed up for Facebook I gave them my Gmail address and password, using their find friends feature:
...
So the question is, should Gmail, Yahoo, and Hotmail block Facebook (or close the accounts of anyone who uses Facebook's "friend finder") for violating their Terms of Use?

I don't want to single out Facebook here since pretty much every "Web 2.0" website with social features is very in-your-face about asking for your credentials from your email provider and then screen scraping your contacts' email addresses. I just signed up for Twitter and the user interface makes it cumbersome to even start using the service after creating an account without giving up your email username and password.

I think there are two questions here. The first is whether users should be able to extract their data [including social graph data] from one service and import it into another. I personally believe the answer is Yes and this philosophy underlies what we've been working on at Windows Live and specifically the team I'm on which is responsible for the ~~social graph~~ contacts platform.

The next question is whether screen scraping is the way to get this data. I think the answer is definitely not. The first problem with this approach is that when I give some random "Web 2.0" social network my email username and password, I’m not only giving them access to my address book but also access to

my blog posts and all my photos (http://spaces.live.com)
my travel history (http://www.expedia.com)
my search history (http://www.google.com/psearch)
my personal email (http://www.hotmail.com)
my medical information (http://www.healthvault.com)
my business documents (http://www.officelive.com)
my personal documents (http://docs.google.com)
my purchase history (https://checkout.google.com)
and so on…

This seems like a lot of valuable data to trust to some fly-by-night "Web 2.0" service that can't seem to hire a full-time sysadmin or rent a full rack in a data center, let alone know how to properly safeguard my personal information.

Another problem with this approach is that it encourages users to give up their usernames and passwords when prompted by any random Web site which increases incidences of phishing. Some have gone as far as calling this approach an anti-pattern that is kryptonite to the Open Web.

Finally, there is no way to identify the application that is accessing data on the user's behalf if it turns out to be a malicious application. For example, if you read articles like Are you getting Quechup spammed you'll note that there's been more than one incident where a "Web 2.0" company turned out to either be spamming users via the email addresses they had harvested in this manner or straight up just resold the email addresses to spammers. Have you ever wondered how much spam you get because someone who has your email address blithely gave up their email credentials to some social network site, which in turn used a Web service run by spammers to retrieve your contact details?

So if I think that users should be able to get their data out yet screen scraping isn't the way, what should we do? At Windows Live, we believe the right approach is to provide user-centric APIs which allow users to grant and revoke permission to third party applications to access their personal data. For the specific case of social graph data, we've provided an ALPHA Windows Live Contacts API which is intended to meet exactly this scenario. The approach taken by this API and similar patterns (e.g. using OAuth) solves all three concerns I've raised above.
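To make the grant-and-revoke idea concrete, here is a toy sketch of the pattern. It is not the Windows Live Contacts API and it is not OAuth; the class, method names and the "contacts.read" scope string are all hypothetical, chosen only to show how a scoped, revocable token differs from handing over a password.

```python
import secrets

class ContactsService:
    """Toy provider that hands out scoped, revocable tokens instead of passwords."""

    def __init__(self):
        self._contacts = {}  # user -> list of contact email addresses
        self._grants = {}    # token -> (user, app_id, scope)

    def add_contacts(self, user, emails):
        self._contacts.setdefault(user, []).extend(emails)

    def grant(self, user, app_id, scope):
        """Issue a token after the user consents on the provider's own site."""
        token = secrets.token_urlsafe(16)
        self._grants[token] = (user, app_id, scope)
        return token

    def revoke(self, token):
        self._grants.pop(token, None)

    def apps_with_access(self, user):
        """Every grant names the application, so access can be audited."""
        return sorted(app for u, app, _ in self._grants.values() if u == user)

    def read_contacts(self, token):
        grant = self._grants.get(token)
        if grant is None:
            raise PermissionError("unknown or revoked token")
        user, app_id, scope = grant
        if scope != "contacts.read":
            raise PermissionError(f"{app_id} lacks the contacts.read scope")
        return list(self._contacts.get(user, []))


service = ContactsService()
service.add_contacts("alice", ["bob@example.com", "carol@example.com"])
token = service.grant("alice", app_id="some-social-network", scope="contacts.read")
contacts = service.read_contacts(token)   # the app sees contacts and nothing else
apps = service.apps_with_access("alice")  # the user can see who holds a grant
service.revoke(token)                     # and can withdraw it without changing a password
```

In this sketch the third-party application never sees the user's password, the token only unlocks the address book, and each grant is tied to a named application, which maps onto the three concerns raised above.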

Now given what I've written above, do you think Hotmail should actively block or hinder screen scraping applications used to obtain the email addresses of a user's contacts?

Categories: Platforms | Windows Live
