Learning from our Mistakes: The Failure of OpenID, AtomPub and XML on the Web

January 30, 2011

@ 05:44 PM

I’ve now been working and blogging about web technology long enough to see technologies that we once thought were the best thing since sliced bread turn out to be rather poor solutions to the problem or even worse that they create more problems than they solve. Since I’ve written favorably about all of the technologies mentioned below this is also a mea culpa where I try to see what I can learn about judging the suitability of technologies to solving problems on the web without being blinded by the hype from the “cool kids” on the web.

The Failure of OpenID

According to Wikipedia, “OpenID is an open standard that describes how users can be authenticated in a decentralized manner, obviating the need for services to provide their own ad hoc systems and allowing users to consolidate their digital identities”. So the problem that OpenID solves is having to create multiple accounts on different websites but instead being able to re-use from the identity provider (i.e. website) of your choice. OpenID was originally invented in 2005 by Brad Fitzpatrick to solve the problem of having bloggers having to create an account on a person’s weblog or blogging service before being able to leave a comment.

OpenID soon grew beyond it’s blog-centric origins and has had a number of the big name web companies either implement it in some way or be active in it's community. Large companies and small companies alike have been lobbied to implement OpenID and accused of not being “open” when they haven’t immediately jumped on the band wagon. However now that we’ve had five years of OpenID, there are a number of valid problems that have begun to indicate the emperor may either have no close or at least is just in his underwear.

The most recent set of hand wringing about the state of OpenID has been inspired by 37 Signals announcing they'll be retiring OpenID support but the arguments against OpenID have been gathering steam for months if not years.

First of all, there have been the arguments that OpenID is too complex and yet doesn't have enough features from people who’ve been supporting the technology for years like David Recordon. Here is an excerpt from David Recordon’s writings on the need for an OpenID Connect

In 2005 I don't think that Brad Fitzpatrick or I could have imagined how successful OpenID would become. Today there are over 50,000 websites supporting it and that number grows into the millions if you include Google FriendConnect. There are over a billion OpenID enabled URLs and production implementations from the largest companies on the Internet.
…
We've heard loud and clear that sites looking to adopt OpenID want more than just a unique URL; social sites need basic things like your name, photo, and email address. When Joseph Smarr and I built the OpenID/OAuth hybrid we were looking for a way to provide that functionality, but it proved complex to implement. So now there's a simple JSON User Info API similar to those already offered by major social providers.

We have also heard that people want OpenID to be simple. I've heard story after story from developers implementing OpenID 2.0 who don't understand why it is so complex and inevitably forgot to do something. With OpenID Connect, discovery no longer takes over 3,000 lines of PHP to implement correctly. Because it's built on top of OAuth 2.0, the whole spec is fairly short and technology easy to understand. Building on OAuth provides amazing side benefits such as potentially being the first version of OpenID to work natively with desktop applications and even on mobile phones.

50,000 websites sounds like a lot until you think about the fact that Facebook Connect which solves a similar problem had been adopted by 250,000 websites during the same time frame and had been around less than half as long as OpenID. It’s also telling to ask yourself how often you as an end user actually have used OpenID or even seen that it is available on a site.

The reason for why you can count the instances you’ve had this occur on one or two hands is eloquently articulated in Yishan Wong’s answer to the question What's wrong with OpenID on Quora which is excerpted below

The short answer is that OpenID is the worst possible "solution" I have ever seen in my entire life to a problem that most people don't really have. That's what's "wrong" with it.

To answer the most immediate question of "isn't having to register and log into many sites a big problem that everyone has?," I will say this: No, it's not. Regular normal people have a number of solutions to this problem. Here are some of them:

use the same username/password for multiple sites

use their browser's ability to remember their password (enabled by default)

don't register for the new site

don't ever log in to the site

log in once, click "remember me"

click the back button on their browser and never come back to the site

maintain a list of user IDs and passwords in an offline document

These are all perfectly valid solutions that a regular user finds acceptable. A nerd will wrinkle up his nose at these solutions and grumble about the "security vulnerabilities" (and they'll be right, technically) but the truth is that these solutions get people into the site and doing what they want and no one really cares about security anyways. On the security angle, no one is going to adopt a product to solve a problem they don't care about (or in many cases, even understand).
…
The fact that anyone even expects that OpenID could possibly see any amount of adoption is mind-boggling to me. Proponents are literally expecting people to sign up for yet another third-party service, in some cases log in by typing in a URL, and at best flip away to another branded service's page to log in and, in many cases, answer an obscurely-worded prompt about allowing third-party credentials, all in order to log in to a site. This is the height of irony - in order to ease my too-many-registrations woes, you are asking me to register yet again somewhere else?? Or in order to ease my inconvenience of having to type in my username and password, you are having me log in to another site instead??

Not only that, but in the cases where OpenID has been implemented without the third-party proxy login, the technical complexity behind what is going on in terms of credential exchange and delegation is so opaque that even extremely sophisticated users cannot easily understand it (I have literally had some of Silicon Valley's best engineers tell me this). At best, a re-directed third-party proxy login is used, which is the worst possible branding experience known on the web - discombobulating even for savvy internet users and utterly confusing for regular users. Even Facebook Connect suffers from this problem - people think "Wait, I want to log into X, not Facebook..." and needs to overcome it by making the brand and purpose of what that "Connect with Facebook" button ubiquitous in order to overcome the confusion.

I completely agree with Yishan’s analysis here. Not only does OpenID complicate the sign-in/sign-up experience for sites that adopt it but also it is hard to confidently make the argument that end users actually consider the problem OpenID is trying to solve be worth the extra complication.

The Failure of XML on the Web

At the turn of the last decade, XML could do no wrong. There was no problem that couldn’t be solved by applying XML to it and every technology was going to be replaced by it. XML was going to kill HTML. XML was going to kill CORBA, EJB and DCOM as we moved to web services. XML was a floor wax and a dessert topping. Unfortunately, after over a decade it is clear that XML has not and is unlikely to ever be the dominant way we create markup for consumption by browsers or how applications on the Web communicate.

James Clark has XML vs the Web where he talks about this grim realization

Twitter and Foursquare recently removed XML support from their Web APIs, and now support only JSON. This prompted Norman Walsh to write an interesting post, in which he summarised his reaction as "Meh". I won't try to summarise his post; it's short and well-worth reading.

From one perspective, it's hard to disagree. If you're an XML wizard with a decade or two of experience with XML and SGML before that, if you're an expert user of the entire XML stack (eg XQuery, XSLT2, schemas), if most of your data involves mixed content, then JSON isn't going to be supplanting XML any time soon in your toolbox.
…
There's a bigger point that I want to make here, and it's about the relationship between XML and the Web. When we started out doing XML, a big part of the vision was about bridging the gap from the SGML world (complex, sophisticated, partly academic, partly big enterprise) to the Web, about making the value that we saw in SGML accessible to a broader audience by cutting out all the cruft. In the beginning XML did succeed in this respect. But this vision seems to have been lost sight of over time to the point where there's a gulf between the XML community and the broader Web developer community; all the stuff that's been piled on top of XML, together with the huge advances in the Web world in HTML5, JSON and JavaScript, have combined to make XML be perceived as an overly complex, enterprisey technology, which doesn't bring any value to the average Web developer.

This is not a good thing for either community (and it's why part of my reaction to JSON is "Sigh"). XML misses out by not having the innovation, enthusiasm and traction that the Web developer community brings with it, and the Web developer community misses out by not being able to take advantage of the powerful and convenient technologies that have been built on top of XML over the last decade.

So what's the way forward? I think the Web community has spoken, and it's clear that what it wants is HTML5, JavaScript and JSON. XML isn't going away but I see it being less and less a Web technology; it won't be something that you send over the wire on the public Web, but just one of many technologies that are used on the server to manage and generate what you do send over the wire.

The fact that XML based technologies are no longer required tools in the repertoire of the Web developer isn’t news to anyone who follows web development trends. However it is interesting to look back and consider that there was once a time when the W3C and the broader web development community assumed this was going to be the case. The reasons for its failure on the Web are self evident in retrospect.

There have been many articles published about the failure of XML as a markup language over the past few years. My favorites being Sending XHTML as text/html Considered Harmful and HTML5, XHTML2, and the Future of the Web which do a good job of capturing all of the problems with using XML with its rules about draconian error handling on the web where ill-formed, hand authored markup and non-XML savvy tools rule the roost.

As for XML as the protocol for intercommunication between Web apps, the simplicity of JSON over the triumvirate of SOAP, WSDL and XML Schema is so obvious it is almost ridiculous to have to point it out.

The Specific Failure of the Atom Publishing Protocol

Besides the general case of the failure of XML as a data interchange format for web applications, I think it is still worthwhile to call out the failure of the Atom Publishing Protocol (AtomPub) which was eventually declared a failure by the editor of the spec, Joe Gregorio. AtomPub arose from the efforts of a number of geeks to build a better API for creating blog posts. The eventual purpose of AtomPub was to create a generic application programming interface for manipulating content on the Web. In his post titled AtomPub is a Failure, Joe Gregorio discussed why the technology failed to take off as follows

So AtomPub isn't a failure, but it hasn't seen the level of adoption I had hoped to see at this point in its life. There are still plenty of new protocols being developed on a seemingly daily basis, many of which could have used AtomPub, but don't. Also, there is a large amount of AtomPub being adopted in other areas, but that doesn't seem to be getting that much press, ala, I don't see any Atom-Powered Logo on my phones like Tim Bray suggested.

So why hasn't AtomPub stormed the world to become the one true protocol? Well, there are three answers:

Browsers

Browsers

Browsers

…
Thick clients, RIAs, were supposed to be a much larger component of your online life. The cliche at the time was, "you can't run Word in a browser". Well, we know how well that's held up. I expect a similar lifetime for today's equivalent cliche, "you can't do photo editing in a browser". The reality is that more and more functionality is moving into the browser and that takes away one of the driving forces for an editing protocol.
Another motivation was the "Editing on the airplane" scenario. The idea was that you wouldn't always be online and when you were offline you couldn't use your browser. The part of this cliche that wasn't put down by Virgin Atlantic and Edge cards was finished off by Gears and DVCS's.
…
The last motivation was for a common interchange format. The idea was that with a common format you could build up libraries and make it easy to move information around. The 'problem' in this case is that a better format came along in the interim: JSON. JSON, born of Javascript, born of the browser, is the perfect 'data' interchange format, and here I am distinguishing between 'data' interchange and 'document' interchange. If all you want to do is get data from point A to B then JSON is a much easier format to generate and consume as it maps directly into data structures, as opposed to a document oriented format like Atom, which has to be mapped manually into data structures and that mapping will be different from library to library.

As someone who has tried to both use and design APIs based on the Atom format, I have to agree that it is painful to have to map your data model to what is effectively a data format for blog entries instead of keeping your existing object model intact and using a better suited format like JSON.

The Common Pattern in these Failures

When I look at all three of these failures I see a common pattern which I’ll now be on the look out for when analyzing the suitability of technologies for my purposes. In each of these cases, the technology was designed for a specific niche with the assumption that the conditions that applied within that niche were general enough that the same technology could be used to solve a number of similar looking but very different problems.

The argument for OpenID is a lot stronger when limiting the audience to bloggers who all have a personal URL for their blog AND where it actually be a burden to sign up for an account on the millions of self hosted blogs out there. However it isn’t true that same set of conditions applies universally when trying to log-in or sign-up for the handful of websites I use regularly enough to decide I want to create an account.
XML arose from the world of SGML where experts created custom vocabularies for domain-specific purposes such as DocBook and EDGAR. The world of novices creating markup documents in a massively decoupled environment such as the Web needed a different set of underlying principles.
AtomPub assumed that the practice of people creating blog posts via custom blog editing tools (like the one I’m using the write this post) would be a practice that would spread to other sorts of web content and that these forms of web content wouldn’t be much distinguishable from blog posts. It turns out that most of our content editing still takes place in the browser and in the places where we do actually utilize custom tools (e.g. Facebook & Twitter clients), an object-centric domain specific data format is better than an XML-centric blog based data format.

So next time you’re evaluating a technology that is being much hyped by the web development blogosphere, take a look to see whether the fundamental assumptions that led to the creation of the technology actually generalize to your use case. An example that comes to mind that developers should consider doing with this sort of evaluation given the blogosphere hype is NoSQL.

Note Now Playing: Keri Hilson - Knock You Down (featuring Kanye West & Ne-Yo) Note