SGML on the Web: A Failed Dream? - Dare Obasanjo's weblog

October 19, 2003

@ 07:56 PM

The people who got together to produce the XML 1.0 recommendation were motivated to do this because they saw a need for SGML on the Web. Specifically, their discussions focused on two general areas:

Classes of software applications for which HTML was an inadequate information format
Aspects of the SGML standard itself that impeded SGML's acceptance as a widespread information technology

The first discussion established the need for SGML on the web. By articulating worthwhile, even mission-critical work that could be done on the web if there were a suitable information format, the SGML experts hoped to justify SGML on the web with some compelling business cases.

The second discussion raised the thornier issue of how to "fix" SGML so that it was suitable for the web.

And thus XML was born. One might ask for which classes of documents HTML proved to be an inadequate format. For an answer, consider Lou Burnard's presentation "SGML on the Web: too little too soon, or too much too late?" from seven years ago, where he wrote:

Lastly, what is wrong with HTML? Well, rather a lot, if we compare it with other general purpose document type definitions...Compare for example the following two declarations:
<!ELEMENT Book - - ((Title, TitleAbbrev?)?, BookInfo?, ToC?, LoT*, Preface*, (((%chapter.gp;)+, Reference*) | Part+ | Reference+ | Article+), (%appendix.gp;)*, Glossary?, Bibliography?, (%index.gp;)*, LoT*, ToC? ) +(%ubiq.gp;) >
<!ENTITY % html.content "HEAD, BODY">
<!ELEMENT HTML O O (%html.content)>
<!ENTITY % body.content "(%heading | %text | %block | HR | ADDRESS)*">
<!ELEMENT BODY O O %body.content>

The first, from the DocBook dtd, makes explicit that books potentially contain a number of subcomponents, each of which is distinguishable, and has a proper place. The second, from the HTML 2.0 dtd, states that the body of an HTML document contains just about anything in just about any order....

HTML's permissiveness makes it difficult or impossible to do many of the things for which we go to the trouble of making information digitally accessible. Specifically, it is hard to:

validate document data structures (for example where documents are to be managed by database software)
impose editorial control (for example in co-operatively authored projects)
generate navigational aids such as tables of contents directly from the document itself
generate or manage cross-document (or even intra-document) links in anything other than an ad hoc and manual manner
address or manage objects smaller or larger than a single document
efficiently re-use document components
search within semantically significant components of a document
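To make a couple of the items in the list above concrete, here is a minimal Python sketch of what explicit structure buys you. The vocabulary (book, chapter, title, para), the sample document, and both function names are assumptions invented for illustration, only loosely modelled on the DocBook excerpt; this is not real DocBook and not a substitute for DTD validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary and sample data, invented for this sketch.
BOOK = """\
<book>
  <title>Markup in Practice</title>
  <chapter id="c1"><title>Why Structure Matters</title><para>Text.</para></chapter>
  <chapter id="c2"><title>Content Models</title><para>More text.</para></chapter>
</book>
"""


def table_of_contents(xml_text):
    """Return (id, title) pairs for every chapter, in document order."""
    root = ET.fromstring(xml_text)
    return [(ch.get("id"), ch.findtext("title")) for ch in root.iter("chapter")]


def untitled_chapters(xml_text):
    """Return the ids of chapters that lack a direct <title> child.

    A crude stand-in for one rule a content model would enforce.
    """
    root = ET.fromstring(xml_text)
    return [ch.get("id") for ch in root.iter("chapter") if ch.find("title") is None]


toc = table_of_contents(BOOK)
for chapter_id, title in toc:
    print(chapter_id, title)
```

Because every chapter is a distinguishable element with a known place for its title, the table of contents falls out of the document itself. With a body that allows just about anything in any order, the same task comes down to guessing which headings mean what.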

These are the problems that web authors supposedly had with HTML that would be fixed by bringing SGML to the Web (i.e. inventing XML). Seven years later, using XML and XML-related technologies to produce content for the Web does alleviate a number of the issues with producing content in HTML, but it is a far cry from actually having "SGML on the Web". What has happened instead is that XML and related technologies can be used to produce content for the Web, but this content is placed "on the Web" as HTML.

The W3C's attempts to get people to author XML directly on the Web have mostly failed, as can be seen from the dismal adoption rate of XHTML. In fact many [including myself] have come to the conclusion that the benefits of adopting XHTML are too low compared to the costs, if not non-existent. There was once an expectation that content producers would be able to place documents conformant to their own XML vocabularies on the Web and that display would be handled entirely by stylesheets, but this has yet to become widespread. In fact, at least one member of a W3C working group has called this a bad practice since it means that User Agents that aren't sophisticated enough to understand style sheets are left out in the cold.

Interestingly enough, although XML has not been as successful as its originators initially expected as a markup language for authoring documents on the Web, it has found significant success as the successor to the Comma Separated Value (CSV) file format. XML's primary usage on the Web and even within internal networks is for exchanging machine-generated, structured data between applications. Speculatively, the largest usage of XML on the Web today is RSS, and it conforms to this pattern.
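As a rough illustration of this "CSV successor" pattern, the Python sketch below treats an RSS-style feed purely as a list of records and flattens it to CSV. The feed content, the example.com links, and the function names are made up for the example, and only two of the fields an item can carry are read.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical feed, invented for this sketch.
FEED = """\
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>
"""


def feed_items(xml_text):
    """Read each <item> as a flat record of title and link."""
    root = ET.fromstring(xml_text)
    return [
        {"title": item.findtext("title"), "link": item.findtext("link")}
        for item in root.iter("item")
    ]


def items_to_csv(items):
    """Serialize the records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["title", "link"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(items)
    return buf.getvalue()


items = feed_items(FEED)
print(items_to_csv(items))
```

Nothing here relies on mixed content, entities, or any of the other document-oriented features of XML. The feed is consumed as rows and columns, which is the point.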

A lot of the idiosyncrasies of XML that developers tend to get hung up on are due to XML's legacy as a document authoring format. However, in much the same way that Oak, a programming language and environment designed for programming embedded systems, transformed into Java, a programming language and environment mostly used for building mid-tier applications, so also has XML outgrown its roots.

Unfortunately, a lot of people working on XML technologies today fail to understand its history. Even worse, a lot of those who know its history fail to realize that its usage scenarios and users have changed from what they originally envisioned.

Categories: XML


Tuesday, 21 October 2003 03:45:57 (GMT Daylight Time, UTC+01:00)

I agree that using XML to exchange human-readable documents over the Web hasn't happened, but I don't think this was the sole motivation behind producing the original XML 1.0 Rec. What made XML happen was a desire to see SGML in the mainstream and a realization that

- SGML was never going to fly as a mainstream technology until a lot of the unnecessary cruft was removed

- W3C enjoyed a much greater degree of vendor support than ISO, and

- the Web was the focus of a lot of excitement and interest

The real dream was getting markup into the mainstream and I don't think that's failed.

I don't find the CSVish applications of XML that are prevalent on the Web at the moment terribly inspiring. But I think we will continue to see more and more interesting applications that take advantage of the rich structuring capabilities of XML, e.g. SVG and XForms.

I also think the use of XML for communication between applications on an intranet or even on a single machine is just as important as the use of XML on the Web. I see lots of interesting uses of XML here that go beyond anything CSVish. A good example is InfoPath. I think that's just the sort of thing an SGMLer might have dreamed of back in 1996.

James Clark

Tuesday, 21 October 2003 06:46:21 (GMT Daylight Time, UTC+01:00)

>The real dream was getting markup into the mainstream and I don't think that's failed.

I would posit that HTML already brought markup to the mainstream. I don't think XML made markup any more mainstream than HTML did, although it did enjoy more hype than HTML ever did. Besides that, I tend to agree with the rest of your statements.

Dare Obasanjo
