Should Hotmail Block Screen Scrapers? - Dare Obasanjo's weblog

January 4, 2008

@ 04:48 PM

Paul Buchheit, creator of Gmail and now the founder of FriendFeed, has a blog post entitled Should Gmail, Yahoo, and Hotmail block Facebook? in which he writes:

Apparently Facebook will ban you (or at least Robert Scoble) if you attempt to extract your friend's email addresses from the service. Automated access is a difficult issue for any web service, so I won't argue with their decision -- it's their service and they own you. However, when I signed up for Facebook I gave them my Gmail address and password, using their find friends feature:
...
So the question is, should Gmail, Yahoo, and Hotmail block Facebook (or close the accounts of anyone who uses Facebook's "friend finder") for violating their Terms of Use?

I don't want to single out Facebook here, since pretty much every "Web 2.0" website with social features is very in-your-face about asking for your credentials from your email provider and then screen scraping your contacts' email addresses. I just signed up for Twitter, and the user interface makes it cumbersome to even start using the service after creating an account without giving up your email username and password.

I think there are two questions here. The first is whether users should be able to extract their data [including social graph data] from one service and import it into another. I personally believe the answer is yes, and this philosophy underlies what we've been working on at Windows Live, specifically on the team I'm on, which is responsible for the ~~social graph~~ contacts platform.

The next question is whether screen scraping is the way to get this data. I think the answer is definitely not. The first problem with this approach is that when I give some random "Web 2.0" social network my email username and password, I’m not only giving them access to my address book but also access to

my blog posts and all my photos (http://spaces.live.com)
my travel history (http://www.expedia.com)
my search history (http://www.google.com/psearch)
my personal email (http://www.hotmail.com)
my medical information (http://www.healthvault.com)
my business documents (http://www.officelive.com)
my personal documents (http://docs.google.com)
my purchase history (https://checkout.google.com)
and so on…

This seems like a lot of valuable data to trust to some fly-by-night "Web 2.0" service that can't seem to hire a full-time sysadmin or afford a full rack in a data center, let alone know how to properly safeguard my personal information.

Another problem with this approach is that it encourages users to give up their usernames and passwords when prompted by any random Web site, which increases the incidence of phishing. Some have gone as far as calling this approach an anti-pattern that is kryptonite to the Open Web.

Finally, there is no way to identify the application that is accessing data on the user's behalf if it turns out to be a malicious application. For example, if you read articles like Are you getting Quechup spammed, you'll note that there's been more than one incident where a "Web 2.0" company turned out either to be spamming users via the email addresses it had harvested in this manner or to have simply resold the email addresses to spammers. Have you ever wondered how much spam you get because someone who has your email address blithely gave up your email credentials to some social network site that in turn used a Web service run by spammers to retrieve your contact details?

So if I think that users should be able to get their data out, yet screen scraping isn't the way, what should we do? At Windows Live, we believe the right approach is to provide user-centric APIs which allow users to grant and revoke permission to third-party applications to access their personal data. For the specific case of social graph data, we've provided an ALPHA Windows Live Contacts API which is intended to meet exactly this scenario. The approach taken by this API and similar patterns (e.g. using OAuth) solves all three concerns I've raised above.
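To make the grant-and-revoke idea concrete, here is a minimal sketch in Python. It is not the Windows Live Contacts API or OAuth itself; `GrantStore`, the scope names such as `contacts.read`, and the application IDs are all hypothetical names chosen for illustration. It only models the three properties argued for above: the application never sees the user's password, access is limited to the scopes the user granted, and every request can be traced to an identified application that can be cut off.

```python
import secrets
from dataclasses import dataclass


@dataclass
class Grant:
    user: str
    app_id: str        # identifies the third-party application (hypothetical ID)
    scopes: frozenset  # e.g. {"contacts.read"}; never the whole account
    revoked: bool = False


class GrantStore:
    """Hypothetical in-memory model of user-granted, revocable permissions."""

    def __init__(self):
        self._grants = {}  # token -> Grant

    def grant(self, user, app_id, scopes):
        # The application receives an opaque token, not the user's password.
        token = secrets.token_urlsafe(16)
        self._grants[token] = Grant(user, app_id, frozenset(scopes))
        return token

    def revoke(self, token):
        # The user withdraws permission for a single grant.
        self._grants[token].revoked = True

    def authorize(self, token, scope):
        """Return the grant if the token is valid for `scope`, else None."""
        g = self._grants.get(token)
        if g is None or g.revoked or scope not in g.scopes:
            return None
        return g

    def revoke_app(self, app_id):
        """Cut off one application for every user; return how many grants were revoked."""
        count = 0
        for g in self._grants.values():
            if g.app_id == app_id and not g.revoked:
                g.revoked = True
                count += 1
        return count
```

In this sketch, a token granted for `contacts.read` fails an `authorize` check for any other scope (say, a hypothetical `mail.read`), and `revoke_app` is the provider-side lever for shutting off an application that turns out to be harvesting addresses. Revocation only stops future access; it cannot recall data the application has already retrieved.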

Now given what I've written above, do you think Hotmail should actively block or hinder screen scraping applications used to obtain the email addresses of a user's contacts?

Categories: Platforms | Windows Live


Friday, 04 January 2008 17:05:08 (GMT Standard Time, UTC+00:00)

Well, you make a good point, but until there is an API for every piece of data that developers want to get at, screen scraping is the only thing we can do. For instance, I want to create an iPhone-friendly Xbox Live friends page, but I cannot get at my friends list without screen scraping. Isn't this something that just makes sense as an API? Same with getting anything else Microsoft or any other company has. As you even said, getting Google Reader info out meant using an undocumented API, which is almost like screen scraping. It would be much easier if everyone had an API; it should almost be a requirement before we start using a service.

Steve

Sunday, 06 January 2008 02:30:09 (GMT Standard Time, UTC+00:00)

APIs, granting permissions, etc. -- this is all cool. What I don't get is the difference between an application screen-scraping the emails and getting them through the API. Sure, you can revoke the permission to access your social graph, but it does not matter after the application has seen it once -- if the authors are bad, it's too late, they already have all the data... The only positive thing is if the API security model is granular enough to allow granting permission to "contact list only", not the email service/travel history/etc.

max

Tuesday, 08 January 2008 14:29:33 (GMT Standard Time, UTC+00:00)

Very interesting; it feels good to know that there are implementers like you who get the point.

A minor nit though: in the phrase "...someone who has your email address blithely gave up your email credentials to some social network ..." you meant to write "gave up *their* email credentials", right?

Cheers,
--Jonathan

Jonathan Perret
