In a recent mail on the ietf-types mailing list, Larry Masinter (one of the authors of the HTTP 1.1 specification) had the following to say about content negotiation in HTTP:

> > GET /models/SomeModel.xml HTTP/1.1
> > Host: www.example.org
> > Accept: application/cellml-1.0+xml; q=0.5, application/cellml-1.1+xml; q=1
>
> HTTP content negotiation was one of those "nice in theory" protocol additions that, in practice, didn't work out. The original theory of content negotiation was worked out when the idea of the web was that browsers would support a handful of media types (text, html, a couple of image types), and so it might be reasonable to send an 'accept:' header listing all of the types supported. But in practice as the web evolved, browsers would support hundreds of types of all varieties, and even automatically locate readers for content-types, so it wasn't practical to send an 'accept:' header for all of the types.
>
> So content negotiation in practice doesn't use accept: headers except in limited circumstances; for the most part, the sites send some kind of 'active content' or content that autoselects for itself what else to download; e.g., a HTML page which contains Javascript code to detect the client's capabilities and figure out which other URLs to load. The most common kind of content negotiation uses the 'user agent' identification header, or some other 'x-...' extension headers to detect browser versions, among other things, to identify buggy implementations or proprietary extensions.
>
> I think we should deprecate HTTP content negotiation, if only to make it clear to people reading the spec that it doesn't really work that way in practice.
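To make the mechanism under discussion concrete, here is a minimal sketch of the kind of server-side selection the Accept header is supposed to drive: parse the header, honor the q-values, and return the representation the client prefers. The media types are the ones from the request quoted above; the parsing is deliberately simplified (no wildcards, no tie-breaking rules) and isn't how any particular server actually implements it.

```python
# Minimal sketch of Accept-header-driven selection on the server side.
# Simplified on purpose: ignores wildcards like */*, media type parameters
# other than q, and the tie-breaking rules in RFC 2616 section 14.1.

def parse_accept(header):
    """Parse 'type/subtype; q=0.5, ...' into a dict of {media_type: q}."""
    prefs = {}
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type, q = pieces[0], 1.0
        for param in pieces[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        prefs[media_type] = q
    return prefs

def choose_representation(accept_header, available):
    """Return the available media type the client prefers most, or None."""
    prefs = parse_accept(accept_header)
    candidates = [(prefs.get(t, 0.0), t) for t in available if prefs.get(t, 0.0) > 0]
    return max(candidates)[1] if candidates else None

# The Accept header from the request quoted above.
accept = "application/cellml-1.0+xml; q=0.5, application/cellml-1.1+xml; q=1"
print(choose_representation(accept, ["application/cellml-1.0+xml",
                                     "application/cellml-1.1+xml"]))
# Prints application/cellml-1.1+xml -- the q=1 variant beats the q=0.5 one.
```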

HTTP content negotiation has always struck me as something that sounds like a good idea in theory but didn't really work out in practice. It's good to see one of the founding fathers of HTTP admit that it is an example of theory not matching reality. It's always worth remembering that just because something is written in a specification from some standards body doesn't make it holy writ. I've seen people debating online who throw out quotes from Roy Fielding's dissertation and IETF RFCs as if they were evangelical preachers quoting chapter and verse from the Bible.

Some of the things you find in specifications from the W3C and IETF are good ideas. However, they are just that: ideas. Sometimes technological advances make these ideas outdated, and sometimes the spec authors simply failed to consider other approaches to the problem at hand. Expecting a modern browser to send, on every single GET request, an itemized list of every file type that can be read by the applications on your operating system, plus the priority in which those file types are preferred, is simply not feasible or particularly useful in practice. It may have been a long time ago, but not now.

Similar outdated and infeasible ideas litter practically every W3C and IETF specification out there. Remember that the next time you quote chapter and verse from some Ph.D. dissertation or IETF/W3C specification to justify a technology decision. Supporting standards is important, but applying critical thinking to the problem at hand is more important.

Thanks to Mark Baker for the link to Larry Masinter's post.



Wednesday, April 12, 2006 5:03:24 PM (GMT Daylight Time, UTC+01:00)
I don't disagree with most of your points re. specifications, but I don't think this is a particularly good example. Conneg for the regular web browser may be effectively outdated, but that doesn't make it any less of a good idea for other client tools.

The aggregator isn't such a bad example, where it may only understand, say, two different media types but strongly prefer application/rss+xml over text/xml. For the server it may make a lot more sense to advertise a single URI for the feed resource. More generally, just because a large proportion of the web is sloppy about MIME types when it comes to XML doesn't necessarily mean there's nothing to be gained from following the specs. Anything that helps reduce ambiguity makes programming easier.
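(To make that concrete, here is a minimal sketch of the kind of request such an aggregator might send; the feed URL and the exact q-values are purely illustrative, not taken from any particular aggregator.)

```python
# Illustrative only: an aggregator fetching a single feed URI while telling the
# server it strongly prefers application/rss+xml but will accept text/xml.
import urllib.request

req = urllib.request.Request(
    "http://www.example.org/feed",  # hypothetical feed URI
    headers={"Accept": "application/rss+xml; q=1.0, text/xml; q=0.5"},
)
with urllib.request.urlopen(req) as resp:
    # The server can use the Accept header to choose which representation to
    # return; the aggregator then looks at Content-Type to decide how to parse.
    content_type = resp.headers.get_content_type()
    body = resp.read()
```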

Actually, because of this I'm not sure how far you can stretch the analogy between referring to specs and quoting the Bible. For things like HTTP they're the nearest thing to The Manual we have. Critical thinking is to be encouraged, but it's hard to be omniscient in an environment like the web. As with other technologies, if you ignore the manual you shouldn't be too surprised when things break.
Wednesday, April 12, 2006 7:14:10 PM (GMT Daylight Time, UTC+01:00)
Danny,
Even the aggregator example is fishy. Most aggregators probably support three main media types (the RSS 1.0 family, the RSS 2.0 family and the Atom family). However, once you throw podcasts/enclosures into the mix, the problem becomes just as messy. One can decide whether to serve QuickTime vs. RealAudio vs. Windows Media files based on the capabilities of the podcast client.

Anyway, my point was that spec text isn't The Word Of God even though lots of folks tend to invoke it that way. YMMV.
Friday, April 14, 2006 10:03:21 PM (GMT Daylight Time, UTC+01:00)
Well, 'Accept:' wasn't broken as designed *when it was designed*; it was that the context changed. When the context changes, protocols need to adapt.

You must remember that HTTP added 'Accept:' before MarkA added IMG tags to HTML -- there were no image types at all.

(And, for that matter, there wasn't a problem that each HTTP request required a new TCP connection, because each web page required exactly one HTTP request.)

'Deprecate' was a bit of hyperbole. We should warn people that the applicability of 'accept:' is limited. I'd still leave it in the standard.
