The Two Webs - Dare Obasanjo's weblog

April 17, 2006

@ 03:05 PM

I'm still continuing my exploration of the philosophy behind building distributed applications following the principles behind the REpresentational State Transfer (REST) architectural style and Web-style software. Recent comments in my blog have introduced a perspective that I hadn't considered much before.

Robert Sayre wrote

Reading over your last few posts, I think it's important to keep in mind there are really two kinds of HTTP. One is HTTP-For-Browsers, and one is HTTP-For-APIs.

API end-points encounter a much wider variety of clients that actually have a user expecting something coherent--as opposed to bots. Many of those clients will have less-than robust HTTP stacks. So, it turns out your API end-points have to be much more compliant than whatever is serving your web pages.

Sam Ruby wrote

While the accept header is how you segued into this discussion, Ian's and Joe's posts were explicitly about the Content-Type header.

Relevant to both discussions, my weblog varies the Content-Type header it returns based on the Accept header it receives, as there is at least one popular browser that does not support application/xhtml+xml.

So... Content-Type AND charset are very relevant to IE7. But are completely ignored by RSSBandit. If you want to talk about “how the Web r-e-a-l-l-y works”, you need to first recognize that you are talking about two very different webs with different set of rules. When you talk about how you would invest Don's $100, which web are you talking about?
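
The Accept-driven behavior Sam describes can be sketched roughly as follows. This is a minimal illustration rather than the code behind his weblog: the function names `parse_accept` and `negotiate_headers` are hypothetical, and the policy (serve XHTML only to clients that explicitly list it with a non-zero q-value, otherwise fall back to text/html) is an assumption about what such a server might do.

```python
def parse_accept(header):
    """Parse an Accept header into a {media_type: q} mapping."""
    prefs = {}
    for part in (header or "").split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when the client gives no q parameter
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q-value as "not acceptable"
        prefs[media_type] = q
    return prefs


def negotiate_headers(accept_header):
    """Pick a Content-Type for an XHTML page based on the Accept header.

    Assumed policy: only clients that explicitly list application/xhtml+xml
    get it; a bare */* is not treated as XHTML support.
    """
    prefs = parse_accept(accept_header)
    if prefs.get("application/xhtml+xml", 0.0) > 0:
        content_type = "application/xhtml+xml"
    else:
        content_type = "text/html"
    # Vary tells caches that the response depends on the request's Accept header.
    return {"Content-Type": content_type, "Vary": "Accept"}
```

A client that sends `Accept: text/html,application/xhtml+xml` would get the XHTML MIME type, while one that sends only `*/*` or image types would get text/html.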

This is an interesting distinction and one that makes me re-evaluate my reasons for being interested in RESTful web services. I see two main arguments for using RESTful approaches to building distributed applications on the Web. The first is that it is simpler than other approaches to building distributed applications that the software industry has cooked up. The second is that it has been proven to scale on the Web.

The second reason is where it gets interesting. Once you start reading articles on building RESTful web services, such as Joe Gregorio's How to Create a REST Protocol and Dispatching in a REST Protocol Application, you realize that how REST advocates talk about how one should build RESTful applications is actually different from how the Web works. Few web applications support HTTP methods other than GET and POST, few web applications send out the correct MIME types when sending data to clients, many Web applications use cookies for storing application state instead of allowing hypermedia to be the engine of application state (i.e. keeping the state in the URL), and in a surprisingly large number of cases the markup in documents being transmitted is invalid or malformed in some way. However, the Web still works.

REST is an attempt to formalize the workings of the Web ex post facto. However it describes an ideal of how the Web works and in many cases the reality of the Web deviates significantly from what advocates of RESTful approaches preach. The question is whether this disconnect invalidates the teachings of REST. I think the answer is no.

In almost every case I've described above, the behavior of client applications and the user experience would be improved if HTTP [and XML] were used correctly. This isn't supposition. As the developer of an RSS reader, I know my life and that of my users would be better if servers emitted the correct MIME types for their feeds, the feeds were always at least well-formed, and feeds always pointed to related metadata/content such as comment feeds (i.e. hypermedia is the engine of application state).
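
To make those three wishes concrete, here is a rough sketch of the checks a feed client could run on a response. It is not how RSS Bandit works: the function name `check_feed`, the set of MIME types treated as acceptable, and the use of an Atom `link` with `rel="replies"` to stand in for a comment feed are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed set of MIME types a feed reader might consider correct for a feed.
FEED_MIME_TYPES = {
    "application/atom+xml",
    "application/rss+xml",
    "application/xml",
    "text/xml",
}
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def check_feed(content_type, body):
    """Return (problems, links) for a fetched feed.

    problems: human-readable strings describing what the server got wrong.
    links: (rel, href) pairs for every Atom <link>, i.e. the hypermedia a
           client could follow to related resources such as comment feeds.
    """
    problems = []

    # 1. Did the server send a MIME type appropriate for a feed?
    mime = (content_type or "").split(";")[0].strip().lower()
    if mime not in FEED_MIME_TYPES:
        problems.append("unexpected MIME type: %r" % mime)

    # 2. Is the document at least well-formed XML?
    try:
        root = ET.fromstring(body)
    except ET.ParseError as err:
        problems.append("not well-formed XML: %s" % err)
        return problems, []

    # 3. Which related resources does the feed point to?
    links = [
        (link.get("rel", "alternate"), link.get("href"))
        for link in root.iter(ATOM_NS + "link")
    ]
    return problems, links
```

A well-behaved server produces an empty problem list and a set of links the client can follow without guessing at URLs. A feed served as text/html with unbalanced tags produces two problems and nothing to follow.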

Let's get back to the notion of the Two Webs. Right now, there is the primarily HTML-powered Web whose primary clients are Web browsers and search engine bots. For better or worse, over time Web browsers have had to deal with the fact that Web servers and Web masters ignore several rules of the Web, from using incorrect MIME types for files to having malformed/invalid documents. This has cemented hacks and bad practices as the status quo on the HTML web. It is unlikely this is going to change anytime soon, if ever.

Where things get interesting is that we are now using the Web for more than serving Web documents for Web browsers. The primary clients for these documents aren't Web browsers written by Microsoft and ~~Netscape~~ ~~AOL~~ Mozilla and bots from a handful of search engines. For example, with RSS/Atom we have hundreds of clients, with more to come as the technology becomes more mainstream. Also, Web APIs are becoming more popular: more and more Web sites are exposing services to the world on the Web using RESTful approaches. In all of these examples, there is justification in being more rigorous in the way one uses HTTP than one would be when serving HTML documents for one's web site.

In conclusion, I completely agree with Robert Sayre's statement that there are really two kinds of HTTP. One is HTTP-For-Browsers, and one is HTTP-For-APIs.

When talking about REST and HTTP-For-APIs, we should be careful not to learn the wrong lessons from how HTTP-For-Browsers is used today.

Categories: Web Development | XML Web Services


Monday, 17 April 2006 22:29:16 (GMT Daylight Time, UTC+01:00)

"...you realize that how REST advocates talk about how one should build RESTful applications is actually different from how the Web works. Few web applications support HTTP methods other than GET and POST, few web applications send out the correct MIME types when sending data to clients, many Web applications use cookies for storing application state...However the Web still works."

I disagree. I like to say, "The Web only works because we allow it to be broken." But this doesn't imply that REST describes the Web. I believe it describes constraints upon what is possible with the Web, "When we let the Web work as intended our sites will really hum." What Dr. Fielding describes in his dissertation is a "layer" or wrapper approach for integrating legacy software.

REST describes an approach not just for developing applications, but for solving everyday web-development problems. I've applied REST's constraint-based problem-solving approach to modern CMS integration, with the goal of making any wrapped application conform to a fixed set of constraints as per the REST model, but even more restrictive.

We initially had WordPress, vBulletin and PmWiki on one domain sharing the same XSLT templates and minimal CSS styles, while delivering XHTML 1.1 as 'application/xhtml+xml' to some browsers and HTML 4.01 as 'text/html' to others. Use the CMS to edit a resource (by using a PUT on its URL even though those apps only accept POSTs from our wrapper), and it's changed in all representations, which are cached as output streams not written as files.

But we had to have the wiki in /wiki/, the forum in /forum/ and the blog app in /blog/ to make it work. No more. By following REST's constraint-based approach and innovating a little bit, the new version can mix and match applications at will. We can even have a shared weblog, under one directory with the same naming convention in the URLs, where one author uses WordPress to post and the other author uses Movable Type, with WP running locally and MT on a remote machine, without anyone being able to tell the difference with 'view source'.

http://www.iwdn.net/showthread.php?t=3799

That link may interest you, as it illustrates what I mean although it starts off quoting this blog... and mocking your URL, sorry 'bout that... See also:

http://lists.w3.org/Archives/Public/uri/2006Apr/0002.html

Anyway, I mostly agree with this post but I think you've missed the point of REST a bit with your ex-post-facto characterization, and I think REST applies to how HTTP-For-Browsers is used today, too. REST has provided me with a framework for the development of a series of simple, efficient, here-and-now solutions (to the point of devising a use for cookies other than storing client state) to a host of complex (and longstanding) problems which arise from a typical marketplace desire for multi-CMS integration on a website using small, best-of-breed apps rather than a single monolithic solution, including SSO and integrated search.

Eric J. Bowman

Tuesday, 18 April 2006 06:28:38 (GMT Daylight Time, UTC+01:00)

Hi Dare, could you rewrite this without using the acronym REST or its four-word text label? I'd like to see what it is you'd like to do, thanks.
