February 23, 2008
@ 04:00 AM

About five years ago, I was pretty active on the XML-DEV mailing list. One of the discussions that cropped up every couple of weeks (aka permathreads) was whether markup languages could be successful if they were not simple enough that a relatively inexperienced developer could “View Source” and figure out how to author documents in that format. HTML (and to a lesser extent RSS) are examples of the success of the “View Source” principle. Danny Ayers had a classic post on the subject titled The Legend of View ‘Source’ which is excerpted below

Q: How do people learn markup?
A: 'View Source'.

This notion is one of the big guns that gets wheeled out in many permathreads - 'binary XML', 'RDF, bad' perhaps even 'XML Schema, too
complicated'. To a lot of people it's the show stopper, the argument that can never be defeated. Not being able to view source is the reason format X died; being able to view source is the reason for format Y's success.

But I'm beginning to wonder if this argument really holds water any more. Don't get me wrong, I'm sure it certainly used to be the case, that many people here got their initial momentum into XML by looking at that there text. I'm also sure that being able to view existing source can be a great aid in learning a markup language. What I'm questioning is whether the actual practice of 'View Source' really is so widespread these days, and more importantly whether it offers such benefits for it to be a major factor in language decisions. I'd be happy with the answer to : are people really using 'View Source' that much? I hear it a lot, yet see little evidence.

One last point, I think we should be clear about what is and what isn't 'View Source'. If I need an XSLT stylesheet the first thing I'll do is open an existing stylesheet and copy and paste half of it. Then I'll get Michael's reference off the shelf.  I bet a fair few folks here have the
bare-bones HTML 3.2 document etched into their lower cortex. But I'd argue that nothing is actually gained from 'View Source' in this, all it is is templating, the fact that it's a text format isn't of immediate relevance.

The mistake Danny made in his post was taking the arguments in favor of “View Source” literally. In hindsight, I think the key point of the “View Source” clan was that it is clear that there is a lot of cargo cult programming that goes on in the world of Web development. Whether it is directly via using the View Source feature of popular Web browsers or simply cutting and pasting code they find at places like quirks mode, A List Apart and W3C Schools, the fact is that lots of people building Web pages and syndication feeds are using technology and techniques they barely understand on a daily basis.

Back in the days when this debate came up, the existence of these markup cargo cults was celebrated because it meant that the ability to author content on the Web was available to the masses which is still the case today (Yaaay, MySpace Wink ). However there has been a number of down sides to the wide adoption of [X]HTML, CSS and other Web authoring technologies by large numbers of semi-knowledgeable developers and technologically challenged content authors.

One of these negative side effects has been discussed to death in a number of places including the article Beyond DOCTYPE: Web Standards, Forward Compatibility, and IE8 by Aaron Gustafson which is excerpted below

The DOCTYPE switch is broken

Back in 1998, Todd Fahrner came up with a toggle that would allow a browser to offer two rendering modes: one for developers wishing to follow standards, and another for everyone else. The concept was brilliantly simple. When the user agent encountered a document with a well-formed DOCTYPE declaration of a current HTML standard (i.e. HTML 2.0 wouldn’t cut it), it would assume that the author knew what she was doing and render the page in “standards” mode (laying out elements using the W3C’s box model). But when no DOCTYPE or a malformed DOCTYPE was encountered, the document would be rendered in “quirks” mode, i.e., laying out elements using the non-standard box model of IE5.x/Windows.

Unfortunately, two key factors, working in concert, have made the DOCTYPE unsustainable as a switch for standards mode:

  1. egged on by A List Apart and The Web Standards Project, well-intentioned developers of authoring tools began inserting valid, complete DOCTYPEs into the markup their tools generated; and
  2. IE6’s rendering behavior was not updated for five years, leading many developers to assume its rendering was both accurate and unlikely to change.

Together, these two circumstances have undermined the DOCTYPE switch because it had one fatal flaw: it assumed that the use of a valid DOCTYPE meant that you knew what you were doing when it came to web standards, and that you wanted the most accurate rendering possible. How do we know that it failed? When IE 7 hit the streets, sites broke.

Sure, as Roger pointed out, some of those sites were using IE-6-specific CSS hacks (often begrudgingly, and with no choice). But most suffered because their developers only checked their pages in IE6 —or only needed to concern themselves with how the site looked in IE6, because they were deploying sites within a homogeneous browserscape (e.g. a company intranet). Now sure, you could just shrug it off and say that since IE6’s inaccuracies were well-documented, these developers should have known better, but you would be ignoring the fact that many developers never explicitly opted into “standards mode,” or even knew that such a mode existed.

This seems like an intractible problem to me. If you ship a version of your software that is more standards compliant than previous versions you run the risk of breaking applications or content that worked in previous versions. This reminds me of Windows Vista getting the blame because Facebook had a broken IPv6 record. The fact is that the application can claim it is more standards compliant but that is meaningless if users can no longer access their data or visit their favorite sites. In addition, putting the onus on Web developers and content authors to always write standards compliant code is impossible given the acknowledged low level of expertise of said Web content authors. It would seem that this actually causes a lot of pressure to always be backwards (or is that bugwards) compatible. I definitely wouldn’t want to be in the Internet Explorer team’s shoes these days.

It puts an interesting wrinkle on the exhortations to make markup languages friendly to “View Source” doesn’t it?

Now playing: Green Day - Welcome To Paradise