January, 2011 - Dare Obasanjo's weblog

January 30, 2011

@ 05:44 PM

Learning from our Mistakes: The Failure of OpenID, AtomPub and XML on the Web

I’ve now been working on and blogging about web technology long enough to see technologies that we once thought were the best thing since sliced bread turn out to be rather poor solutions to the problems they targeted or, even worse, to create more problems than they solve. Since I’ve written favorably about all of the technologies mentioned below, this is also a mea culpa in which I try to work out what I can learn about judging the suitability of technologies for solving problems on the web without being blinded by the hype from the “cool kids” on the web.

The Failure of OpenID

According to Wikipedia, “OpenID is an open standard that describes how users can be authenticated in a decentralized manner, obviating the need for services to provide their own ad hoc systems and allowing users to consolidate their digital identities”. So the problem that OpenID solves is having to create a separate account on every website; instead, you can re-use an identity from the identity provider (i.e. website) of your choice. OpenID was originally invented in 2005 by Brad Fitzpatrick to solve the problem of bloggers having to create an account on a person’s weblog or blogging service before being able to leave a comment.

OpenID soon grew beyond its blog-centric origins and has had a number of the big name web companies either implement it in some way or be active in its community. Large companies and small companies alike have been lobbied to implement OpenID and accused of not being “open” when they haven’t immediately jumped on the bandwagon. However, now that we’ve had five years of OpenID, a number of valid problems have begun to indicate that the emperor may either have no clothes or at least is just in his underwear.

The most recent set of hand wringing about the state of OpenID has been inspired by 37 Signals announcing they'll be retiring OpenID support but the arguments against OpenID have been gathering steam for months if not years.

First of all, people who’ve been supporting the technology for years, like David Recordon, have argued that OpenID is too complex and yet doesn't have enough features. Here is an excerpt from David Recordon’s writings on the need for an OpenID Connect:

In 2005 I don't think that Brad Fitzpatrick or I could have imagined how successful OpenID would become. Today there are over 50,000 websites supporting it and that number grows into the millions if you include Google FriendConnect. There are over a billion OpenID enabled URLs and production implementations from the largest companies on the Internet.
…
We've heard loud and clear that sites looking to adopt OpenID want more than just a unique URL; social sites need basic things like your name, photo, and email address. When Joseph Smarr and I built the OpenID/OAuth hybrid we were looking for a way to provide that functionality, but it proved complex to implement. So now there's a simple JSON User Info API similar to those already offered by major social providers.

We have also heard that people want OpenID to be simple. I've heard story after story from developers implementing OpenID 2.0 who don't understand why it is so complex and inevitably forgot to do something. With OpenID Connect, discovery no longer takes over 3,000 lines of PHP to implement correctly. Because it's built on top of OAuth 2.0, the whole spec is fairly short and technology easy to understand. Building on OAuth provides amazing side benefits such as potentially being the first version of OpenID to work natively with desktop applications and even on mobile phones.

50,000 websites sounds like a lot until you consider that Facebook Connect, which solves a similar problem, had been adopted by 250,000 websites in the same time frame despite having been around less than half as long as OpenID. It’s also telling to ask yourself how often you as an end user have actually used OpenID or even seen that it is available on a site.

The reason you can count those occasions on one or two hands is eloquently articulated in Yishan Wong’s answer to the question What's wrong with OpenID on Quora, which is excerpted below

The short answer is that OpenID is the worst possible "solution" I have ever seen in my entire life to a problem that most people don't really have. That's what's "wrong" with it.

To answer the most immediate question of "isn't having to register and log into many sites a big problem that everyone has?," I will say this: No, it's not. Regular normal people have a number of solutions to this problem. Here are some of them:

use the same username/password for multiple sites

use their browser's ability to remember their password (enabled by default)

don't register for the new site

don't ever log in to the site

log in once, click "remember me"

click the back button on their browser and never come back to the site

maintain a list of user IDs and passwords in an offline document

These are all perfectly valid solutions that a regular user finds acceptable. A nerd will wrinkle up his nose at these solutions and grumble about the "security vulnerabilities" (and they'll be right, technically) but the truth is that these solutions get people into the site and doing what they want and no one really cares about security anyways. On the security angle, no one is going to adopt a product to solve a problem they don't care about (or in many cases, even understand).
…
The fact that anyone even expects that OpenID could possibly see any amount of adoption is mind-boggling to me. Proponents are literally expecting people to sign up for yet another third-party service, in some cases log in by typing in a URL, and at best flip away to another branded service's page to log in and, in many cases, answer an obscurely-worded prompt about allowing third-party credentials, all in order to log in to a site. This is the height of irony - in order to ease my too-many-registrations woes, you are asking me to register yet again somewhere else?? Or in order to ease my inconvenience of having to type in my username and password, you are having me log in to another site instead??

Not only that, but in the cases where OpenID has been implemented without the third-party proxy login, the technical complexity behind what is going on in terms of credential exchange and delegation is so opaque that even extremely sophisticated users cannot easily understand it (I have literally had some of Silicon Valley's best engineers tell me this). At best, a re-directed third-party proxy login is used, which is the worst possible branding experience known on the web - discombobulating even for savvy internet users and utterly confusing for regular users. Even Facebook Connect suffers from this problem - people think "Wait, I want to log into X, not Facebook..." and needs to overcome it by making the brand and purpose of what that "Connect with Facebook" button ubiquitous in order to overcome the confusion.

I completely agree with Yishan’s analysis here. Not only does OpenID complicate the sign-in/sign-up experience for sites that adopt it, but it is also hard to argue confidently that end users actually consider the problem OpenID is trying to solve to be worth the extra complication.

The Failure of XML on the Web

At the turn of the last decade, XML could do no wrong. There was no problem that couldn’t be solved by applying XML to it and every technology was going to be replaced by it. XML was going to kill HTML. XML was going to kill CORBA, EJB and DCOM as we moved to web services. XML was a floor wax and a dessert topping. Unfortunately, after over a decade it is clear that XML has not become, and is unlikely ever to become, the dominant way we create markup for consumption by browsers or the dominant way applications on the Web communicate.

James Clark has XML vs the Web where he talks about this grim realization

Twitter and Foursquare recently removed XML support from their Web APIs, and now support only JSON. This prompted Norman Walsh to write an interesting post, in which he summarised his reaction as "Meh". I won't try to summarise his post; it's short and well-worth reading.

From one perspective, it's hard to disagree. If you're an XML wizard with a decade or two of experience with XML and SGML before that, if you're an expert user of the entire XML stack (eg XQuery, XSLT2, schemas), if most of your data involves mixed content, then JSON isn't going to be supplanting XML any time soon in your toolbox.
…
There's a bigger point that I want to make here, and it's about the relationship between XML and the Web. When we started out doing XML, a big part of the vision was about bridging the gap from the SGML world (complex, sophisticated, partly academic, partly big enterprise) to the Web, about making the value that we saw in SGML accessible to a broader audience by cutting out all the cruft. In the beginning XML did succeed in this respect. But this vision seems to have been lost sight of over time to the point where there's a gulf between the XML community and the broader Web developer community; all the stuff that's been piled on top of XML, together with the huge advances in the Web world in HTML5, JSON and JavaScript, have combined to make XML be perceived as an overly complex, enterprisey technology, which doesn't bring any value to the average Web developer.

This is not a good thing for either community (and it's why part of my reaction to JSON is "Sigh"). XML misses out by not having the innovation, enthusiasm and traction that the Web developer community brings with it, and the Web developer community misses out by not being able to take advantage of the powerful and convenient technologies that have been built on top of XML over the last decade.

So what's the way forward? I think the Web community has spoken, and it's clear that what it wants is HTML5, JavaScript and JSON. XML isn't going away but I see it being less and less a Web technology; it won't be something that you send over the wire on the public Web, but just one of many technologies that are used on the server to manage and generate what you do send over the wire.

The fact that XML-based technologies are no longer required tools in the repertoire of the Web developer isn’t news to anyone who follows web development trends. However, it is interesting to look back and consider that there was once a time when the W3C and the broader web development community assumed they would be. The reasons for XML’s failure on the Web are self-evident in retrospect.

There have been many articles published about the failure of XML as a markup language over the past few years. My favorites are Sending XHTML as text/html Considered Harmful and HTML5, XHTML2, and the Future of the Web, which do a good job of capturing the problems with imposing XML’s draconian error handling rules on the web, where ill-formed, hand-authored markup and non-XML-savvy tools rule the roost.
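
To make the "draconian error handling" point concrete, here is a minimal Python sketch using only the standard library. The sample markup and the `TagCollector`, `parse_as_xml` and `parse_as_html` names are made up for illustration; the only point is that an XML parser has to reject ill-formed input outright, while an HTML parser keeps going and hands you whatever it can recover.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hand-authored markup with a typical mistake:
# <b> and <i> are closed in the wrong order.
SLOPPY = "<p>Hello <b><i>world</b></i></p>"


class TagCollector(HTMLParser):
    """Hypothetical helper that records every start tag the HTML parser sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_as_xml(markup):
    """Return True if the markup is well-formed XML, False if the parser rejects it."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def parse_as_html(markup):
    """Return the start tags a lenient HTML parser recovers from the markup."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags
```

Calling `parse_as_xml(SLOPPY)` reports a failure for the whole document, whereas `parse_as_html(SLOPPY)` still returns the `p`, `b` and `i` tags. On a Web full of hand-written pages, the second behavior is the one that lets readers see anything at all.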

As for XML as the protocol for intercommunication between Web apps, the simplicity of JSON over the triumvirate of SOAP, WSDL and XML Schema is so obvious it is almost ridiculous to have to point it out.

The Specific Failure of the Atom Publishing Protocol

Besides the general case of the failure of XML as a data interchange format for web applications, I think it is still worthwhile to call out the failure of the Atom Publishing Protocol (AtomPub) which was eventually declared a failure by the editor of the spec, Joe Gregorio. AtomPub arose from the efforts of a number of geeks to build a better API for creating blog posts. The eventual purpose of AtomPub was to create a generic application programming interface for manipulating content on the Web. In his post titled AtomPub is a Failure, Joe Gregorio discussed why the technology failed to take off as follows

So AtomPub isn't a failure, but it hasn't seen the level of adoption I had hoped to see at this point in its life. There are still plenty of new protocols being developed on a seemingly daily basis, many of which could have used AtomPub, but don't. Also, there is a large amount of AtomPub being adopted in other areas, but that doesn't seem to be getting that much press, ala, I don't see any Atom-Powered Logo on my phones like Tim Bray suggested.

So why hasn't AtomPub stormed the world to become the one true protocol? Well, there are three answers:

Browsers

Browsers

Browsers

…
Thick clients, RIAs, were supposed to be a much larger component of your online life. The cliche at the time was, "you can't run Word in a browser". Well, we know how well that's held up. I expect a similar lifetime for today's equivalent cliche, "you can't do photo editing in a browser". The reality is that more and more functionality is moving into the browser and that takes away one of the driving forces for an editing protocol.
Another motivation was the "Editing on the airplane" scenario. The idea was that you wouldn't always be online and when you were offline you couldn't use your browser. The part of this cliche that wasn't put down by Virgin Atlantic and Edge cards was finished off by Gears and DVCS's.
…
The last motivation was for a common interchange format. The idea was that with a common format you could build up libraries and make it easy to move information around. The 'problem' in this case is that a better format came along in the interim: JSON. JSON, born of Javascript, born of the browser, is the perfect 'data' interchange format, and here I am distinguishing between 'data' interchange and 'document' interchange. If all you want to do is get data from point A to B then JSON is a much easier format to generate and consume as it maps directly into data structures, as opposed to a document oriented format like Atom, which has to be mapped manually into data structures and that mapping will be different from library to library.

As someone who has tried to both use and design APIs based on the Atom format, I have to agree that it is painful to have to map your data model to what is effectively a data format for blog entries instead of keeping your existing object model intact and using a better suited format like JSON.
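
Here is a rough Python sketch of what that mapping pain looks like, using only the standard library. The "check-in" record, its field names and the choice of which Atom element holds which field are all hypothetical; they are exactly the kind of arbitrary decision each API designer ends up making differently. The Atom entry is also trimmed for brevity (a valid entry needs `id` and `updated` elements too).

```python
import json
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# A hypothetical "check-in" record, first as JSON...
JSON_DOC = '{"user": "dare", "venue": "Coffee Shop", "likes": 3}'

# ...and then squeezed into an Atom entry, where each field has to be
# fitted into an element that was designed for blog posts.
ATOM_DOC = """<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Coffee Shop</title>
  <author><name>dare</name></author>
  <content type="text">3</content>
</entry>"""


def from_json(text):
    # One call: the JSON object maps straight onto a dict.
    return json.loads(text)


def from_atom(text):
    # Every field needs its own hand-written lookup and type conversion.
    entry = ET.fromstring(text)
    return {
        "user": entry.findtext("atom:author/atom:name", namespaces=ATOM_NS),
        "venue": entry.findtext("atom:title", namespaces=ATOM_NS),
        "likes": int(entry.findtext("atom:content", namespaces=ATOM_NS)),
    }
```

Both functions produce the same dictionary, but only `from_atom` needs to know the namespace, which element was pressed into service for which field, and that the like count arrives as text.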

The Common Pattern in these Failures

When I look at all three of these failures I see a common pattern which I’ll now be on the look out for when analyzing the suitability of technologies for my purposes. In each of these cases, the technology was designed for a specific niche with the assumption that the conditions that applied within that niche were general enough that the same technology could be used to solve a number of similar looking but very different problems.

The argument for OpenID is a lot stronger when limiting the audience to bloggers who all have a personal URL for their blog AND where it would actually be a burden to sign up for an account on the millions of self hosted blogs out there. However it isn’t true that the same set of conditions applies universally when trying to log in or sign up for the handful of websites I use regularly enough to decide I want to create an account.
XML arose from the world of SGML where experts created custom vocabularies for domain-specific purposes such as DocBook and EDGAR. The world of novices creating markup documents in a massively decoupled environment such as the Web needed a different set of underlying principles.
AtomPub assumed that the practice of people creating blog posts via custom blog editing tools (like the one I’m using to write this post) would spread to other sorts of web content and that these forms of web content wouldn’t be much distinguishable from blog posts. It turns out that most of our content editing still takes place in the browser and in the places where we do actually utilize custom tools (e.g. Facebook & Twitter clients), an object-centric domain specific data format is better than an XML-centric blog based data format.

So next time you’re evaluating a technology that is being much hyped by the web development blogosphere, take a look to see whether the fundamental assumptions that led to the creation of the technology actually generalize to your use case. One example that comes to mind, given the current blogosphere hype, where developers should consider doing this sort of evaluation is NoSQL.

Now Playing: Keri Hilson - Knock You Down (featuring Kanye West & Ne-Yo)

Categories: Web Development

January 23, 2011

@ 07:27 PM


Some Thoughts on Quora Crossing the Chasm

I’ve slowly become a big fan of Quora. I’ve learned quite a few things which I’ve actually applied in my day job or excerpted for blog posts at work from various questions being answered on Quora. I can see why Robert Scoble asks Is Quora the biggest blogging innovation in 10 years? because this is the same way I felt when I first discovered knowledgeable technical people sharing insights about building software or just historical context on blogs several years ago.

Quora has smart people with significant pedigrees freely sharing information about how and why things work in various parts of the software industry. It is a thing of beauty to log in and get gems like Steve Case answering questions about the history of AOL, Ian McAllister sharing product management tips from the bowels of Amazon or Andrew Bosworth [and others at Facebook] giving explanations for why and how they built key features like Messages, Chat and the News Feed at Facebook.

I’m not the only one who has been impressed by their experience on Quora, and specifically there has been a lot of hype about Quora on TechCrunch. Today TechCrunch published a contrasting opinion by Vivek Wadhwa titled Why I Don’t Buy the Quora Hype where he calls interest in Quora a fad and pooh-poohs the site’s chances of becoming mainstream.

Although I like Quora, I do agree that the site faces key challenges if it is ever to break out of its niche. The primary challenge is that, since the site is more of a community like Reddit or MetaFilter than a networked communication tool like Facebook and Twitter, the user experience is likely to get worse, not better, as it grows more popular.

A few weeks ago I found a description of one of their attempts to solve the problem in a post by Charlie Cheever titled Commitment to Keeping Quora High Quality where he wrote

One thing we're trying to do a better job ASAP on is educating the new users that join the site and getting them up to speed on the policies, guidelines, and conventions as quickly as possible. Yesterday, we added a quick tutorial quiz before a user posts his/her first question.

So far, we've found that the quiz has helped make more of the questions that new users post conform to the site guidelines and require less editing from experienced users. We also made changes to the way the homepage feed works and when notifications are sent yesterday.
Over the next few months, we're going to be heavily investing engineering effort in:

Educating new users about site policies and guidelines

Improving the feed and voting ranking mechanisms

Changing the core product to accomodate a Quora with many more users and many more questions and answers and topics

Building special tools to support the efforts of reviewers and admins to improve the site and maintain civility and generally make it more fun to make Quora better

What I found odd about all of the above efforts is that none of them seems to try to keep the magic of what makes Quora more interesting than Yahoo! Answers, Facebook Questions or Stack Overflow. Quora is interesting because the quality of the answers is amazing due to the fact that questions are often answered by some of the most knowledgeable people on the topic. So the key problem to solve in order to preserve the Quora experience is really “how do you encourage subject matter experts to flock to the site and answer questions?”

The folks at Quora have already posted a follow up to the aforementioned post titled Scaling Up where some of the approaches above are already being called into question and there is a nod towards highlighting the high quality users. That post is excerpted below

Up until a few days ago, new questions and answers from new Quora users were all being reviewed by users (reviewers and admins) who had demonstrated over a period of time an understanding of the spirit, policies, and guidelines of Quora. There is now too much new content being posted on Quora to handle this in the same way.
…
Concretely, some of the projects we are working on in this area are:

(1) Getting many more people to participate in the evaluation of new content on the site. For people who want to see the newest content on Quora that might be good or might be bad, we want to let you opt in to evaluating the new stuff in mostly the same way that you browse the site. Most of the people who use Quora have pretty good judgement, and we believe there is some wisdom in crowds. Preliminarily, this approach is very promising

(2) We're developing an algorithm to determine user quality. The algorithm is somewhat similar to PageRank but since people are different from pages on the web and the signals that are available on Quora are different from those on the web, it's not exactly the same problem. We'll use this to help decide what to show in feeds, when to send notifications, and how to rank answers.

(3) Explaining Quora better to new users before they add content to the site. We added a very short tutorial quiz before new users add new questions and it made a big difference in reducing the number of questions that don't meet guidelines or policies.
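
For readers who haven't seen it, the PageRank idea mentioned in (2) can be sketched in a few lines of Python. To be clear, this is not Quora's algorithm, which isn't described beyond the excerpt above; it is a textbook power iteration over a hypothetical "who upvoted whom" graph, and the `rank_users` name, the input shape and the damping value of 0.85 are all assumptions made for illustration.

```python
def rank_users(upvotes, damping=0.85, iterations=50):
    """Score users with a PageRank-style power iteration.

    `upvotes` maps each voter to the list of users whose answers they upvoted.
    A vote from a highly scored user is worth more than one from a low scorer.
    """
    users = set(upvotes)
    for targets in upvotes.values():
        users.update(targets)
    n = len(users)
    scores = {u: 1.0 / n for u in users}

    for _ in range(iterations):
        # Everyone starts each round with a small baseline score.
        new_scores = {u: (1.0 - damping) / n for u in users}
        for voter in users:
            targets = upvotes.get(voter, [])
            if targets:
                # A voter splits their current score among the people they upvoted.
                share = damping * scores[voter] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A user who never votes spreads their score evenly over everyone.
                share = damping * scores[voter] / n
                for u in users:
                    new_scores[u] += share
        scores = new_scores
    return scores
```

With input such as `{"alice": ["carol"], "bob": ["carol"], "carol": ["alice"]}`, carol ends up ranked first because two people vouch for her, and alice outranks bob because her single upvote comes from the top-ranked user. As the excerpt notes, the signals available on a Q&A site differ from links between web pages, so a real system would presumably look quite different.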

The thing I still don’t see clearly here is a focus on catering to the high quality answerers that have made Quora more buzzworthy than innovative competitors like the StackExchange family of sites or mainstream Q&A sites like Yahoo! Answers. The question the folks at Quora should be asking themselves is what are they doing to not only have Steve Case continue to answer questions on the site but perhaps get similar quality answerers from other industries (e.g. Jack Welch or Russell Simmons).

Here I believe there is something Quora can learn from Q&A sites like StackOverflow and from sites that have attracted celebrity users like Twitter. Some things that I think would be useful to see implemented on the site to retain and attract quality answerers would include

Democratize voting on quality of questions. Quora has started quizzing users before they ask a question as a way to keep quality high, but it would be even better if users could vote on the quality of questions so that the more interesting ones got a wider audience. Similarly, it would be valuable to be able to mark questions as duplicates so answerers don’t keep seeing the same questions all the time, which is a particular pain point with various answer forums.
Better recognition of valuable users. The ability for people to provide topic-specific descriptions of their qualifications is a great idea. Of course, it does encourage appeals to authority when judging answers, as in the case of someone who has "screen writer" in their qualifications posting a super long answer that doesn't answer the question and still being voted highly. Despite that, it is still useful to be able to look at a set of answers on movies and tell which of the answerers are more qualified than others. Democratizing this by visibly showing which users have been judged by the community as being more valuable than others would be a useful addition. Whether it is copying StackOverflow badges or karma on reddit, there is value both to readers of answers in determining which answerers are more trustworthy and to answerers in being able to get intangible value for the service they are providing to the Quora community. It is amazing how digital points systems like reputation scores, badges and achievements can motivate people, and Quora can do more to harness this.
A better connection between people and their followers. People like Jack Welch and Russell Simmons are on Twitter communicating with hundreds of thousands of followers who’d like to learn from them and be inspired by their words. Twitter isn’t really great for conversations or lengthy answers to insightful questions. I believe Quora can fill the gap for such celebrities in the same way Twitter filled the original need of giving celebrities a direct channel to their fans without the media acting as intermediaries. Right now I have followers and people I follow on Quora but they are treated as equivalent to “topics” in my feed and there aren’t good facilities for us to communicate with each other on the site. Can you imagine if Twitter treated hashtags you’ve expressed an interest in and people you followed the same way in your stream? What Quora does is similar.

Now Playing: Chris Brown - Deuces (Remix) (featuring Drake, T.I., Kanye West, Fabolous, Rick Ross & André 3000)

Categories: Social Software

