To follow up on my post asking Is HTTP Content Negotiation Broken as Designed?, I found a post by Ian Hickson on a related topic. In his post entitled Content-Type is dead he writes:

Browsers and other user agents largely ignore the HTTP Content-Type header, relying on undefined sniffing heuristics to determine what the content of a page really is.

  • RSS feeds are always sniffed, regardless of their MIME type, because, to quote a Safari engineer, "none of them have the right mime type".
  • The target of img elements is almost always assumed to be an image, regardless of the declared type.
  • IE in particular is well known for ignoring the Content-Type header, despite this having been the source of security bugs in the past.
  • Browsers have been forced to implement heuristics to handle text/plain files as binary because video files are widely served with the wrong MIME types.

Unfortunately, we're now at a stage where browsers are continuously having to reverse-engineer each other to determine why they are handling content differently. A browser can't afford to render any less content than a browser with more market share, because otherwise users won't switch, and the new browser will not be adopted.

I think it may be time to retire the Content-Type header, putting to sleep the myth that it is in any way authoritative, and instead have well-defined content-sniffing rules for Web content.
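To make the idea of "well-defined content-sniffing rules" concrete, here is a minimal Python sketch. The signature table and the name `sniff_type` are assumptions made up for illustration. They are not taken from any browser or specification, and real sniffing rules would need to be far more thorough.

```python
# Hypothetical signature table: (leading bytes, media type to report).
# These few entries are illustrative only, not a complete rule set.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
]


def sniff_type(body: bytes, declared: str = "") -> str:
    """Guess a media type from the leading bytes of the body.

    The declared Content-Type is only used as a fallback, which is the
    behaviour the quoted post describes browsers drifting toward.
    """
    for magic, media_type in SIGNATURES:
        if body.startswith(magic):
            return media_type

    # Feeds are sniffed by looking for a root element near the start.
    head = body[:512].lstrip().lower()
    if head.startswith(b"<?xml") or head.startswith(b"<"):
        if b"<rss" in head:
            return "application/rss+xml"
        if b"<feed" in head:
            return "application/atom+xml"

    # Nothing recognised: trust the server, or give up.
    return declared or "application/octet-stream"
```

With this sketch, a PNG served as text/plain is still reported as image/png, while unrecognised content keeps whatever type the server declared.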

Ian is someone who's definitely been around the block when it comes to HTTP, given that he's been involved in Web standards groups for several years and used to work on the Opera Web browser. On the other side of the argument is Joe Gregorio, whose post Content-Type is dead, for a short period of time, for new media-types, film at 11 is an excellent example of the kind of dogmatic arguing based on theory that I criticized in my previous post. In this case, Joe uses the W3C Technical Architecture Group's (TAG) findings on Authoritative Metadata.

MIME types and HTTP content negotiation are good ideas in practice that have failed to take hold on the Web. Arguing that this fact contravenes stuff written in specs from the last decade or findings by some ivory tower group of folks from the W3C seems like religious dogmatism and not fodder for decent technical debate.

That said, I don't think MIME types should be retired. However, I do think some Web/REST advocates need to look around and realize what's happening on the Web instead of arguing from an "ideal" or "theoretical" perspective.


Categories: Web Development

Thursday, April 13, 2006 2:12:21 AM (GMT Daylight Time, UTC+01:00)
As usual, Dare, you can always be counted on to bring the level of discourse down to ad hominems after making it perfectly clear you didn't read a word of what I said, on top of which you manage to technically miss the point and claim that Ian and I are talking about content negotiation. Good to see you haven't lost your touch.
Thursday, April 13, 2006 2:22:48 AM (GMT Daylight Time, UTC+01:00)
I don't see where I claimed Ian is talking about content negotiation. This is about stuff that works in theory and what actually happens in practice. In theory, MIME types for resources on the Web are a good idea. In practice, they are often false and clients have to sniff content. The same goes for content negotiation.

I read your blog post and it amounted to "MIME types are a good idea in theory". Ian's was "MIME types aren't accurate in practice". If drawing those conclusions from your posts counts as ad hominems, then you must have a weird definition of the phrase.

Have a nice day.

Thursday, April 13, 2006 3:42:07 AM (GMT Daylight Time, UTC+01:00)
We argue from an "ideal" perspective only insofar as we can learn what it costs us when the architecture strays from that ideal. In this case, REST tells us that using sniffing over extrinsic metadata costs us some security (and a little performance).

While Ian's analysis of the situation is accurate AFAICT, it in no way supports his claim that Content-Type should be retired so long as security is considered important. What his analysis means is that we've got a lot of educating to do over the long haul, which the TAG document helps with.
Thursday, April 13, 2006 7:03:25 AM (GMT Daylight Time, UTC+01:00)
Reading over your last few posts, I think it's important to keep in mind there are really two kinds of HTTP. One is HTTP-For-Browsers, and one is HTTP-For-APIs.

API end-points encounter a much wider variety of clients that actually have a user expecting something coherent, as opposed to bots. Many of those clients will have less-than-robust HTTP stacks. So, it turns out your API end-points have to be much more compliant than whatever is serving your web pages.

I happened to be fishing around the Jakarta HTTPClient CookiePolicy code today, and I noticed there are three modes: NetscapeDraft, RFC2109, and BrowserCompatibility. I've also hit bugs in Apache Axis when a server breaks with RFC2616 and returns a relative reference in a Location header (browsers cope just fine).

With respect to Content-Type, I certainly wouldn't want to ditch it for requests. I expect to be able to write a form handler and have ASP.NET or whatever automatically reject incoming requests that have Content-Types other than "application/x-www-form-urlencoded" etc.
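A framework-neutral sketch of that kind of check, in Python, might look like the following. The function names and the allowed set are assumptions for illustration (including multipart/form-data, which the comment above does not name), and this is not how ASP.NET or any particular framework implements it.

```python
# Hypothetical allow-list for a form handler. The second entry is an
# assumption; the comment above only names the urlencoded type.
ALLOWED_FORM_TYPES = {
    "application/x-www-form-urlencoded",
    "multipart/form-data",
}


def media_type_of(content_type_header: str) -> str:
    """Strip parameters such as charset or boundary and normalise case."""
    return content_type_header.split(";", 1)[0].strip().lower()


def check_form_request(content_type_header) -> int:
    """Return the HTTP status a form handler could answer with.

    200 means the body can be parsed as form data; 415 (Unsupported
    Media Type) means the request is rejected before parsing.
    """
    if content_type_header is None:
        return 415
    if media_type_of(content_type_header) not in ALLOWED_FORM_TYPES:
        return 415
    return 200
```

Here the server trusts the declared type on the request and refuses anything else, which is the opposite of sniffing.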
Thursday, April 13, 2006 2:06:12 PM (GMT Daylight Time, UTC+01:00)
Parody: XML well-formedness and validity are good ideas in practice that have failed to take hold on the Web. Arguing that this fact contravenes stuff written in specs from the last decade or findings by some ivory tower group of folks from the W3C seems like religious dogmatism and not fodder for decent technical debate.

The truth is that smearing people and ideas with terms like "ivory tower" and "religious dogmatism" is more conducive to creating an atmosphere of carnage than to a decent technical debate.
Thursday, April 13, 2006 2:46:09 PM (GMT Daylight Time, UTC+01:00)
That's a fair comment, and for what it's worth I did say that I don't agree that MIME types should be retired. I'm approaching the discussion around HTTP and REST from the perspective of someone who's been pitched, and has pitched to others, that building RESTful web services is a good idea because that's how the Web works.

A couple of people have picked at this idea by pointing out that there is a difference between what REST advocates claim is how to build RESTful web services and how the Web r-e-a-l-l-y works. At the end of the day I'm trying to get Don's folks to spend their $100 building a platform that I can use to build distributed applications. This means trying to separate the wheat from the chaff when it comes to ideas around what is important for building REST/POX web services.
Thursday, April 13, 2006 4:53:26 PM (GMT Daylight Time, UTC+01:00)
In terms of “how the Web r-e-a-l-l-y works”, is the charset parameter on the Content-Type header “wheat” or “chaff”?
Thursday, April 13, 2006 5:28:56 PM (GMT Daylight Time, UTC+01:00)
I think reporting accurate MIME types is a good idea because it is better for consumers, as you point out with your charset example. On the other hand, I'm not so sure about supporting the Accept header and all the baggage of HTTP content negotiation on input.

That I have to think about more.
Friday, April 14, 2006 1:47:11 PM (GMT Daylight Time, UTC+01:00)
While the Accept header is how you segued into this discussion, Ian's and Joe's posts were explicitly about the Content-Type header.

Relevant to both discussions, my weblog varies the Content-Type header it returns based on the Accept header it receives, as there is at least one popular browser that does not support application/xhtml+xml.
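A minimal Python sketch of that kind of negotiation follows. The function names are hypothetical, the charset is an assumption, and this illustrates the general technique rather than the code actually running on the weblog.

```python
def accepts_xhtml(accept_header: str) -> bool:
    """True only if the client explicitly lists application/xhtml+xml
    with a non-zero q value. Wildcards such as */* are deliberately
    ignored, since a browser sending only */* may not render XHTML."""
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if parts[0].lower() != "application/xhtml+xml":
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        return q > 0
    return False


def choose_content_type(accept_header) -> str:
    """Pick the Content-Type for the same markup based on Accept."""
    if accepts_xhtml(accept_header or ""):
        return "application/xhtml+xml; charset=utf-8"
    return "text/html; charset=utf-8"
```

A server doing this should also send `Vary: Accept` so that caches do not hand the XHTML response to a client that asked for HTML.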

So... Content-Type AND charset are very relevant to IE7, but are completely ignored by RSSBandit. If you want to talk about “how the Web r-e-a-l-l-y works”, you need to first recognize that you are talking about two very different webs with different sets of rules. When you talk about how you would invest Don's $100, which web are you talking about?

This is not a matter of "ivory tower" or "religious dogmatism". It is a matter of simple observations of how existing products work, and of how people can cope with and interoperate with a variety of divergent implementations, some of which are kept up to date and others of which aren't.
Friday, April 14, 2006 4:51:35 PM (GMT Daylight Time, UTC+01:00)
Ian's conclusions in his post don't match the results of actually doing the test. For example, he says IE gets everything all wrong, when it only fails 3 of the test cases he includes.

It's a shame when the conclusions are so at variance with the available data. It suggests that the author was looking for some particular result.
Tuesday, January 2, 2007 9:45:27 AM (GMT Standard Time, UTC+00:00)
Why will this post not print correctly in IE6? (I'm on my machine that I keep for IE6 testing.)
Dare, can you look at your CSS style sheets and make it so your blog prints cleanly? I print 4-up (2-up, front and back, using FinePrint) so I can read, make notes, and file for reference, and your blog prints poorly on IE7 and Firefox too. Just search for "CSS Printing" to see examples of how (assuming you don't already know...)