April 17, 2006
@ 03:05 PM

I'm still continuing my exploration of the philosophy behind building distributed applications following the principles of the REpresentational State Transfer (REST) architectural style and Web-style software. Recent comments on my blog have introduced a perspective that I hadn't considered much before.

Robert Sayre wrote

Reading over your last few posts, I think it's important to keep in mind there are really two kinds of HTTP. One is HTTP-For-Browsers, and one is HTTP-For-APIs.

API end-points encounter a much wider variety of clients that actually have a user expecting something coherent--as opposed to bots. Many of those clients will have less-than robust HTTP stacks. So, it turns out your API end-points have to be much more compliant than whatever is serving your web pages.

Sam Ruby wrote

While the accept header is how you segued into this discussion, Ian's and Joe's posts were explicitly about the Content-Type header.

Relevant to both discussions, my weblog varies the Content-Type header it returns based on the Accept header it receives, as there is at least one popular browser that does not support application/xhtml+xml.

So... Content-Type AND charset are very relevant to IE7. But are completely ignored by RSSBandit. If you want to talk about “how the Web r-e-a-l-l-y works”, you need to first recognize that you are talking about two very different webs with different set of rules. When you talk about how you would invest Don's $100, which web are you talking about?
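
To make the negotiation Sam describes concrete, here is a minimal sketch of choosing a Content-Type from the Accept header. It is an illustration under my own assumptions, not Sam's actual implementation: the function names `parse_accept` and `choose_content_type` are made up for this example, and the rule "serve application/xhtml+xml only when the client lists it explicitly with a non-zero q-value, otherwise fall back to text/html" is just one plausible policy.

```python
def parse_accept(header):
    """Map each media range in an Accept header to its q-value (default 1.0)."""
    ranges = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value.strip())
                except ValueError:
                    # Treat an unparseable q-value as "not acceptable".
                    q = 0.0
        ranges[media_range] = q
    return ranges


def choose_content_type(accept_header):
    """Pick XHTML only for clients that explicitly say they accept it."""
    ranges = parse_accept(accept_header or "")
    # Assumption: wildcards such as */* do not count as XHTML support,
    # because clients that cannot render XHTML often send only */*.
    if ranges.get("application/xhtml+xml", 0.0) > 0.0:
        return "application/xhtml+xml"
    return "text/html"


if __name__ == "__main__":
    print(choose_content_type("text/html,application/xhtml+xml,*/*;q=0.8"))
    print(choose_content_type("*/*"))
```

A server that varies its response this way should probably also send a `Vary: Accept` header so that caches do not hand the XHTML version to a client that asked for HTML.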

This is an interesting distinction and one that makes me re-evaluate my reasons for being interested in RESTful web services. I see two main arguments for using RESTful approaches to building distributed applications on the Web.  The first is that it is simpler than other approaches to building distributed applications that the software industry has cooked up. The second is that it has been proven to scale on the Web.

The second reason is where it gets interesting. Once you start reading articles on building RESTful web services, such as Joe Gregorio's How to Create a REST Protocol and Dispatching in a REST Protocol Application, you realize that the way REST advocates say one should build RESTful applications is actually different from how the Web works. Few web applications support HTTP methods other than GET and POST, few send the correct MIME types when sending data to clients, many use cookies for storing application state instead of allowing hypermedia to be the engine of application state (i.e. keeping the state in the URL), and in a surprisingly large number of cases the markup in the documents being transmitted is invalid or malformed in some way. However, the Web still works.

REST is an attempt to formalize the workings of the Web ex post facto. However, it describes an ideal of how the Web works, and in many cases the reality of the Web deviates significantly from what advocates of RESTful approaches preach. The question is whether this disconnect invalidates the teachings of REST. I think the answer is no.

In almost every case I've described above, the behavior of client applications and the user experience would be improved if HTTP [and XML] were used correctly. This isn't supposition. As the developer of an RSS reader, I know that my life and that of my users would be better if servers emitted the correct MIME types for their feeds, feeds were always at least well-formed, and feeds always pointed to related metadata/content such as comment feeds (i.e. hypermedia is the engine of application state).
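
As a rough illustration of the first two wishes, here is a small sketch of the checks a feed client could run on a response. Everything in it is an assumption for the sake of the example: the `check_feed` name, the set of media types treated as acceptable, and the choice to report problems rather than reject the feed outright. It is not how any particular RSS reader actually works.

```python
import xml.etree.ElementTree as ET

# Assumed set of media types a feed could reasonably be served with.
FEED_TYPES = {
    "application/atom+xml",
    "application/rss+xml",
    "application/xml",
    "text/xml",
}


def check_feed(content_type, body):
    """Return a list of problems found in a feed response (empty if none)."""
    problems = []

    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type not in FEED_TYPES:
        problems.append("unexpected media type: %r" % media_type)

    # Well-formedness check only; this does not validate against any schema.
    try:
        ET.fromstring(body)
    except ET.ParseError as err:
        problems.append("not well-formed XML: %s" % err)

    return problems


if __name__ == "__main__":
    print(check_feed("text/html", b"<rss><channel></rss>"))
```

A feed served as text/html with mismatched tags trips both checks, which is exactly the kind of response that forces client authors to write workarounds.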

Let's get back to the notion of the Two Webs. Right now, there is the primarily HTML-powered Web, whose primary clients are Web browsers and search engine bots. For better or worse, over time Web browsers have had to deal with the fact that Web servers and Web masters ignore several rules of the Web, from using incorrect MIME types for files to serving malformed/invalid documents. This has cemented hacks and bad practices as the status quo on the HTML web. It is unlikely this is going to change anytime soon, if ever.

Where things get interesting is that we are now using the Web for more than serving Web documents to Web browsers. The primary clients for these documents aren't Web browsers written by Microsoft and Netscape/AOL/Mozilla and bots from a handful of search engines. For example, with RSS/Atom we have hundreds of clients, with more to come as the technology becomes more mainstream. Also, Web APIs are becoming more popular; more and more Web sites are exposing services to the world on the Web using RESTful approaches. In all of these examples, there is justification for being more rigorous in the way one uses HTTP than one would be when serving HTML documents for one's web site.

In conclusion, I completely agree with Robert Sayre's statement that there are really two kinds of HTTP. One is HTTP-For-Browsers, and one is HTTP-For-APIs.

When talking about REST and HTTP-For-APIs, we should be careful not to learn the wrong lessons from how HTTP-For-Browsers is used today.