In his post I Want RELAX NG! Tim Ewald writes

This recent post on Mark Nottingham's site pushed me over the edge. I agree with Sean's comment: I want Relax NG. Can I make systems work with XSD? Yes, sort of. But it adds a ludicrous amount of complexity. First you have to know how it works, then what not to do because it's too complicated (like complicated type or element substitution models), then figure out how to contort your schema to do what you want (like extensibility and versioning). Relax NG is much simpler and much closer to how XML actually works. And yes, you can still map it to /from objects if you want to.

I can't help but wonder why, if WS-* and SOAP 1.2 keep XSD at arms length (referencing simple types only and providing non-normative schema definitions) and WSDL 2.0 defines its own simple types, everyone assumes I want to use XSD to define my Web service interface. Pretty much everyone I know who works in this space agrees that Relax NG is a better choice. What is stopping us from making this change?

This is one of those times where I both agree and disagree with Tim. To explain why, I first need to list the three reasons people tend to write schemas.

  1. To provide a way to annotate an XML document with type information and thus create a type-annotated infoset.
  2. To provide a means to ensure that an XML document satisfies the constraints of a given message contract.
  3. To provide terse, human readable documentation of an XML format.

In most developer scenarios [including XML Web Services] the most popular use case is the first from the list above. An XML Schema is used primarily for mapping the contents of an XML document either into relational tables (e.g. SQLXML, ADO.NET DataSet) or into a set of programming language objects (e.g. System.Xml.Serialization.XmlSerializer). Every XML Web Service toolkit I have encountered emphasizes this scenario and in fact most customers do not use XML schemas for validation of business documents for either performance reasons or the fact that their business rules cannot be adequately described using an XML schema. The main problem with XSD for this use case is that it is actually too expressive and has a richer type system than either the relational model or traditional object oriented programming languages. This leads to impedance mismatches which makes it hard for XML Web Service stacks to map schema declarations to objects thus leading to calls from folks like the WS-I to propose creating a subset or profile of XSD.

On the other hand, XSD is notoriously bad at dealing with the second use case described above. The language either makes it hard to describe common XML idioms (see the hoops I have to jump through in my Designing Extensible, Versionable XML Formats article) or makes it impossible (e.g. requiring that an element have a certain content model if an attribute has a certain value, or providing a choice of attributes). This is where RELAX NG shines. Of course, being more expressive than XSD means that the impedance mismatch between it and the relational and OO models is even more significant.
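To make the attribute-dependent content model concrete, here is a minimal Python sketch of such a rule checked by hand. The payment element, its method attribute and the child element names are all hypothetical, invented for illustration. The code is neither RELAX NG nor XSD; it only spells out the kind of co-occurrence rule being discussed, which RELAX NG can state as a choice between two attribute-plus-content patterns.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule: the value of the "method" attribute decides which
# single child element <payment> must contain.
REQUIRED_CHILD = {"card": "cardNumber", "check": "checkNumber"}


def satisfies_contract(xml_text):
    """Return True if the root element has exactly the child its method demands."""
    payment = ET.fromstring(xml_text)
    expected = REQUIRED_CHILD.get(payment.get("method"))
    if expected is None:
        return False  # missing or unknown method value
    children = [child.tag for child in payment]
    return children == [expected]


good = '<payment method="card"><cardNumber>42</cardNumber></payment>'
bad = '<payment method="check"><cardNumber>42</cardNumber></payment>'
print(satisfies_contract(good), satisfies_contract(bad))  # prints: True False
```

The point of the sketch is that the content model depends on an attribute value, which is exactly the dependency a schema language needs to express if validation is to replace hand-written checks like this one.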

In practice today, most XML Web Services need an XML schema language for creating type-annotated infosets, not for validating message structure. This means that for their use cases XSD is preferable to RELAX NG. Ideally, a simple language that just allowed creating named structures and primitive types, such as Microsoft's now-obsolete XML Data Reduced (XDR), would be even better.

Of course, the XML Web Services world could one day evolve to the point where being able to validate incoming messages against a schema is deemed more important than being able to deserialize the XML into objects and vice versa. In that case, Aaron Skonnard's statement in his post Could RelaxNG Replace XSD?, which describes the existing industry inertia around XSD, is also a point to consider.


 

Categories: XML

August 18, 2004
@ 10:16 AM

I saw the following excerpt in Shelley Powers's post entitled Differences of Humor where she wrote

Sam Ruby has posted a note about the upcoming Applied XML Conference put on by Chris Sells. When I looked at the agenda and realized that the conference managed to put together two days worth of presentations without one woman speaker,

Knowing the nature of Chris Sells's conferences this is unsurprising. They seem to mostly be an opportunity for Chris's DevelopMentor clique and their buddies to hang out. However Shelley's post did make me start thinking about how many women I knew who worked with XML. Just like the time I started to keep a list of Seinfeld episodes in which at least one African or African American appeared (don't ask), I started tracking down the number of women I knew of who worked on XML technologies and whose work I'd rather see presented than at least one of the presentations currently on the roster. Here is my list

Non-Microsoft

  • Eve Maler - Sun's most notable XML geek after Jon Bosak and Tim Bray. She's worked on SAML and UBL. I met her at XML 2003 where we chatted about versioning in UBL and what is truly meant by polymorphic XML processing.

  • Jeni Tennison - the most knowledgeable person on the planet about W3C XML Schema. I've lost count of the number of times I've seen her school members of the W3C XML Schema working group about the technology on various mailing lists. Also an XSLT and XPath guru. She's always pushing boundaries in the XML world such as with her work on layered hierarchies in markup vocabularies with LMNL.

  • Priscilla Walmsley - the author of Definitive XML Schema which is probably the best book on W3C XML Schema on the market. She's also co-written a book on XML in Office 2003 which I haven't read but would love to get a presentation on especially with regard to some finer details on how Office uses XML schemas. 

  • Amelia Lewis - a co-author of the WS-ReliableMessaging specification and the author of an excellent critique of the W3C XML Schema primitive types in her article Not My Type: Sizing Up W3C XML Schema Primitives

Microsoft

  • Elena - the Microsoft XML Web Service stack rests on her shoulders. What makes Visual Studio.NET an awesome XML Web Service environment is that there is functionality that lets you point at a WSDL and automatically you get handy dandy .NET classes generated for you. Elena owns the meat of this code, a lot of which resides in the XmlSerializer class

  • Denise Draper - an architect on our team who in a past life has been a member of the XQuery working group, worked on an XML data integration suite for Nimble Technology and worked in the AI field.

  • Priya Lakshminarayanan - the developer for the W3C XML Schema validation technology in the .NET Framework. She's the most knowledgeable about the technology at Microsoft; I'm a distant second to her breadth of knowledge about this somewhat arcane and cryptic technology. She's the first person I've seen implement a tool for generating sample XML documents from XML schemas that didn't suck.

  • Helena Kupkova - the developer for the XML parser in the .NET Framework. She completely gutted our old implementation and doubled the perf in some scenarios. A totally impressive developer. More impressive is that she ships stuff like the XML Diff and Patch demo on GotDotNet in her spare time.

  • Nithya Sampathkumar - the developer on the XML schema inference technology in the .NET Framework. Once I took over as the program manager for this technology I grew to understand the subtleties involved in trying to infer a schema for arbitrary XML documents. A presentation on the techniques used in her implementation and the limitations of XML schema inference would be quite interesting.

  • Neetu Rajpal - the program manager for XML tools in Visual Studio. I've overheard some interesting conversations involving her discussing some of the trickiness involved in implementing an XSLT debugger. An in-depth presentation about what the XML tools team is planning to ship and the issues they encountered would be killer.

  • Vinita - the program manager for MSXML which is the most widely deployed XML library on the planet. Even without shipping in Internet Explorer, Windows and Office they still get millions of downloads a year.

  • Tejal Joshi - works on the XML tools in Visual Studio. At last year's XML 2003 conference I enjoyed hearing James Clark discuss implementation strategies for his nxml-mode in Emacs. I'm sure Tejal would have similarly interesting stories to tell.

  • Lanqing Dai - used to be the developer for the XmlDocument class but has moved on to WinFS. I'd love to hear her thoughts on how working in an XML-centric world compares to living in the item-centric world of WinFS.

There are more women I know of in the XML field both within and outside Microsoft but these are the ones whose presentations I'd rather see than something like XML as a Better COM (for example). Maybe next time Chris Sells should look around the usual XML hang outs both online (like the xml-dev mailing list) and within Microsoft internally for conference speakers instead of announcing them in his blog. It may lead to a more diverse list of topics and speakers.

I need to go watch Berserk. Talk to you guys later.


 

Categories: XML

My issue of Playboy came in the mail so I got to read the infamous Google interview. If you don't have a Playboy subscription or balk at buying the magazine from the newsstands you can get the interview from Google's amended SEC filings. I didn't read the entire interview but there were no surprises in what I read.

I was recently talking to a coworker who's on the fence about whether to go to Google or stay at Microsoft and it was interesting talking about the pros and cons of both companies. As we talked Google began to remind me of Netscape in its heyday. A company full of bright, young guys who've built a killer application for the World Wide Web and is headed for a monster IPO. The question is whether Google will squander their lead like Netscape did (Yes, I realize my current employer may have had something to do with that) or whether they'll be the next Yahoo!

There are a couple of things Google has done over the past few years that have made me wonder whether the company has enough adult supervision and business acumen to rise above being a one trick pony in the constantly changing Internet landscape. Some of them are touched on by Larry and Sergey in their interview

  1. http://www.google.com is non-sticky: Nothing on the main Google site encourages the user to hang around the site or even return to the website besides the quality of the search results. According to the company's founders this is by design. The problem with this reasoning is that if and when its competitors such as MSN Search and Yahoo! Search get good enough there isn't anything keeping people tied to the site. It seems unfathomable now but there was a time that it seemed unfathomable that anyone would use anything besides AltaVista or Excite to search the Web. It's happened before and it can happen again. Google seems ill-prepared for this occurrence.

  2. Inability to tie together disparate offerings: The one thing that has separated Yahoo! from all the Web portals that were all the rage a couple of years ago is that it managed to tie its many offerings into a single cohesive package with multiple revenue streams. The Yahoo! experience seamlessly ties in My Yahoo!, Yahoo! Groups (formerly eGroups), Yahoo! Calendar, Yahoo! Maps, Yahoo! Shopping, Yahoo! Finance, Yahoo! News, Yahoo! Movies, Yahoo! Messenger and the Yahoo! Companion. I use most of these Yahoo sites and tools on a daily basis and use all of them at least once a month. Besides advertising related to search there are several entry points for Yahoo! to get revenue from me.

    Compare this to Google, which has a number of other offerings available from the Google website but hasn't figured out how to make several of them synergistic, such as its purchase of Blogger or sites like Orkut. Yahoo! would have gotten a lot more mileage out of either site than Google has so far. Another aspect of this issue is gleaned from this excerpt from a post by Dave Winer entitled Contact with Google

    Another note, I now have four different logins at Google: Orkut, AdSense, Blogger and Gmail. Each with a different username and password. Now here's an area where Google could be a leader, provide an alternative to Passport, something we really need, a Google-size problem.

    Yahoo! has a significantly larger number of distinct offerings yet I access all of them through a single login. This lack of cohesiveness indicates that either there isn't a unified vision as to how to unite these properties under a single banner or Google has been unable to figure out how to do so.

  3. GMail announced too quickly: Google announced GMail with its strongest selling point being that it gave you 100 times more space than competing free email services. However GMail is still in beta and not available to the general public while its competitors such as Hotmail and Yahoo! Mail have announced upping their limits to 250MB and 100MB respectively with gigabytes of storage available and other features available to users for additional fees. This has basically stolen Google's thunder and halted a potential exodus of users from competing services while GMail isn't even out of beta yet.

  4. Heavy handed tactics in the Web syndication standards world: Recently Google decided to use an interim draft of a technology specification instead of a de facto industry standard for syndicating content from their Blogger website, thus forcing users to upgrade or change their news aggregators as well as ensuring that there would be at least two versions of the Atom syndication format in the wild (the final version and the interim version supported by Google). This behavior upset a lot of users and aggregator developers. In fact, the author of the draft specification of the Atom syndication format that Google supported over RSS has also expressed dismay at the choice Google made and is encouraging others not to repeat their actions.

All of these are examples of less than stellar decision making at Google. Even though in previous entries such as What Is Google Building? and What is Google Building II: Thin Client vs. Rich Client vs. Smart Client I've implied that Google may be on the verge of a software move so bold it could upstage Microsoft the same way Netscape planned to with the browser upstaging the operating system as a development and user platform, it isn't a slam dunk that they have what it takes to get there.

It will be interesting watching the Google saga unfold.


 

Categories: Technology

According to the current version of the Chris Sells XML DevCon page (don't bother bookmarking it, Chris Sells doesn't believe in permalinks so all the content on that page is transient) I noticed that Chris Anderson is presenting the following

Developers Hate XML

Chris Anderson

While everyone is currently infatuated with XML, developers are constantly doing battle with trying to rationalize and leverage XML in their applications. I'll talk about having to balance correct XML-isms vs. usability in XAML, about the preponderance of XML reader/writer/DOM/serialization APIs, and about how all of this throws you into a horrible programming experience of loosely typed runtime errors. This reveals XML for what it is: a data encoding. XML is the ASCII text file of the 2000s. While web services are often called "XML Web Services," the reality is that every web service API abstracts the developer from the XML view.

Nothing says vote of confidence like when one of the chief architects of one of the teams you work closely with says your technology sucks. :)

Seriously though, I am curious as to what his presentation actually will be about. Reading the abstract, it seems like it is another iteration of a data-centric user of XML coming to the realization that for their scenarios XML is just CSV on steroids. People's behavior when they realize this usually follows a pattern similar to the five stages of grief. First there is denial, which usually takes the form of an initial disbelief that after all the hype they've heard about XML it isn't working out fantastically for them. Then there is bargaining, which usually manifests itself as attempts to not use XML but still use it. Often you hear phrases like "binary XML", "XML subset" or "XML profile" at this point. Then there is anger at XML for being more complex and verbose than they need. At this point you get to read a rant-filled email, blog post, conference paper or in this case conference presentation about how badly the technology is suited for its purpose. Then there is either despair or acceptance. One doesn't follow the other. If the next stage is despair, the person ends up not using XML to solve that particular problem. On the other hand if it is acceptance, XML is still used but in some cases it is in one of the forms that were mentioned in the bargaining stage such as a binary representation of an XML stream or using some subset of XML.

Hopefully Chris Anderson will post his slides online.


 

Categories: XML

Anders Norås has an interesting blog entry entitled A JavaScript XmlSerializer where he shows how to build a class equivalent to the .NET Framework's System.Xml.Serialization.XmlSerializer class in JavaScript. He writes

In ASP.NET 2.0 it is possible to invoke server events from client side script without posting back the page. This is supported through a new mechanism called script callbacks. For more information on this technology, read Dino Espositio’s excellent “Cutting Edge” article on the subject.

Mostly out-of-band calls have been used with fairly simple return values such as strings and numerals. The “advanced” uses have typically been passing arrays as comma separated lists. This has greatly limited the applicability of the technology and created a wide functionality gap between the object oriented programming environment of the server world and the more primitive environment in the browser...

A JavaScript XmlSerializer

One of the most celebrated classes in the .NET framework is the XmlSerializer class. This class enables you to serialize objects into XML documents and deserialize XML documents into objects. As we all know, XML documents are represented as strings, so it is simple to pass an XML document as either a parameter or a return value on an out-of-band call.

By implementing a client side XML based serialization and deserialization it would be possible to pass an object from a client script to a server method and vice versa. There are of course huge differences between the powerful .NET platform and the simple JavaScript language, but these have little impact on a client to server communications channel as it would only make sense to pass data transfer objects.

Definitely an interesting bit of code. What is also very interesting is that he has a previous article entitled Declarative JavaScript programming where he implements metadata annotations (akin to .NET Framework attributes) for JavaScript. Excellent stuff.
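For readers who haven't used XmlSerializer, the round trip described in the excerpt can be sketched in a few lines. This is a minimal illustration in Python rather than JavaScript, and it is not Anders's implementation. The Customer class and its fields are hypothetical, and the sketch only handles flat objects whose fields are all strings.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields


@dataclass
class Customer:  # hypothetical data transfer object
    name: str
    city: str


def serialize(obj):
    """Turn a flat dataclass instance into an XML string, one element per field."""
    root = ET.Element(type(obj).__name__)
    for f in fields(obj):
        ET.SubElement(root, f.name).text = str(getattr(obj, f.name))
    return ET.tostring(root, encoding="unicode")


def deserialize(cls, xml_text):
    """Rebuild a flat, string-only dataclass instance from serialize() output."""
    root = ET.fromstring(xml_text)
    return cls(**{f.name: root.findtext(f.name) for f in fields(cls)})


xml_text = serialize(Customer("Ada", "Oslo"))
print(xml_text)  # <Customer><name>Ada</name><city>Oslo</city></Customer>
print(deserialize(Customer, xml_text) == Customer("Ada", "Oslo"))  # True
```

Because the serialized form is just a string, it can be passed as a parameter or return value across any channel that carries text, which is the property the excerpt relies on.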


 

One important lesson I've learned about designing software is that sometimes it pays to smother one's perfectionist engineer instincts and be less ambitious about the problems one is trying to solve. Put more succinctly, a technology doesn't have to solve every problem, just enough problems to be useful. Two examples come to mind which hammered this home to me: Tim Berners-Lee's World Wide Web and collaborative filtering which sites like Amazon use.

  1. The World Wide Web: Almost every history of the World Wide Web you find online mentions how Tim Berners-Lee was inspired by Ted Nelson's Xanadu. The current Web is a pale imitation of what Ted Nelson described over forty years ago as the capabilities of a rich hypertext system. However you're reading these words of mine over Tim Berners-Lee's Web not Ted Nelson's. Why is this?

    If you read the descriptions of the Xanadu model you'll notice it has certain lofty goals. Some of these include the ability to create bi-directional links, links that do not break, and built-in version management. To me it doesn't seem feasible to implement all these features without ending up building a closed system. It seems Tim Berners-Lee came to a similar conclusion and greatly simplified Ted Nelson's dream, thus making it feasible to implement and adopt on a global scale. Tim Berners-Lee's Web punts on all the hard problems. How does the system ensure that documents once placed on the Web are always retrievable? It doesn't. Instead you get 404 pages and broken links. How does the Web ensure that I can find all the pages that link to another page? It doesn't. Does the Web enable me to view old versions of a Web page and compare revisions of it side by side? Nope.

    Despite these limitations Tim Berners-Lee's Web sparked a global information revolution. Even more interestingly over time various services have shown up online that have attempted to add the missing functionality of the Web such as The Internet Archive, Technorati and the Google Cache.

  2. Collaborative Filtering on Amazon: The first place I ever bought CDs online was CDNow.com (now owned by Amazon). One feature of the site that blew my mind was the ability to get a list of recommended CDs to buy based on your purchase history and the ratings you gave various albums. The suggestions were always quite accurate and many times it suggested CDs I already owned and liked a lot.

    This feature always seemed like magic to me. I imagined how difficult it must have been to come up with a categorization and ranking system for music CDs that could accurately match people up with music based on their tastes. It wasn't until Amazon debuted this feature that I realized the magic algorithms were simply 'people who purchased X also purchased Y'. My magic algorithms were just a bunch of not very interesting SQL queries.

    There are limitations to this approach (you need a large enough user base and enough purchases of certain albums to make them statistically significant) but the system works for the most part.
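The 'people who purchased X also purchased Y' idea is simple enough to sketch in a few lines. The version below is a hypothetical Python illustration, not the SQL any real store runs; the customers and purchase data are made up, and it does nothing about the statistical significance problem mentioned above.

```python
from collections import Counter
from itertools import permutations

# Hypothetical purchase history: customer -> set of albums bought.
purchases = {
    "ann": {"Illmatic", "Aquemini", "Reasonable Doubt"},
    "bob": {"Illmatic", "Aquemini"},
    "cat": {"Illmatic", "Aquemini", "Stankonia"},
    "dan": {"Aquemini", "Stankonia"},
}


def also_purchased(purchases):
    """Count, for each album X, how many buyers of X also bought each album Y."""
    counts = {}
    for basket in purchases.values():
        for x, y in permutations(basket, 2):
            counts.setdefault(x, Counter())[y] += 1
    return counts


def recommend(album, counts, limit=3):
    """Albums most often bought together with `album`, most frequent first."""
    return [title for title, _ in counts.get(album, Counter()).most_common(limit)]


counts = also_purchased(purchases)
print(recommend("Illmatic", counts, limit=1))  # ['Aquemini']
```

There is no categorization or taste model anywhere in this code, only co-occurrence counts, which is why the approach needs a large user base before its suggestions mean anything.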

Every once in a while I am part of endless discussions about how we need to complicate a technology to satisfy every use case when in truth we don't have to solve every problem. Edge cases should not dictate a software system's design but too often they do.


 

Categories: Technology

August 12, 2004
@ 03:27 AM

From the post entitled XP Lite. Seriously... they cannot be serious....  on Gerry Steele's weblog

BBC News Xp Lite
Forbes.com XP Lite System
From the horse's mouth

Perhaps I'm a little late on picking this up, but at our last weekly meeting a colleague mentioned "XP Lite". My immediate reaction was: "Oh Dear, what have they schemed up now?!" Well, I sound like a Microsoft cynic I know, but I'd love to convince you that this is not because of a hate of Microsoft... I'm just a technology enthusiast who feels insulted an pained by pretty much every executive descision the company makes. Which I feel is a shame. A lot of smart people work there, but the mandate of their work is to create products which tie people to more of their expensive (in some way or other) products. Thus work that could go towards the creation of better software for everyone and for the benefit of all customers is lost. The recent "truce" between Sun & MS has bred a new submission to a standards body on web delivery services I cannot help but fear is just a token gesture to repreave the closed-shop image of Microsoft.

Anyway. XP Lite. Now I feared this would be bad. But no. It is much worse than i thought. A cut down version of XP aimed at the Asian market where Linux and ripped off copies of Windows abound. what's different I hear you ask? Well here's a few:

  • Cut down Networking capabilities
  • You won't be allowed To switch to higher resolutions (800x600 maximum!)
  • You may only run three programs at once, with each program opening only three windows at a time
  • No multiple user accounts

On the up side you get a locale specific patriotic desktop background included. Super duper eh? At the end of the press release is a list of positive quotes from distinguished Asian representatives (PhDs first! mortals, kids get in queue) who seems to have very little clue about anything to do with technology (indicated by their positive comments)

Perhaps the clearest indication that Microsoft are worried about the superior alternative open & free Linux products thus far. But unfortunately they will convince OEMs to ship this alleged software with the hardware (the OEM route is the key to printing money if the past is anything to go by). I don't see why Microsoft don't just go the whole hog and force hardware manufacturers to insert a coin slot on every box and have a trained monkey vist us each and collect the money (we would be so distracted by the cute ickle monkey visiting to notice the daylight robbery)

I could write all day about why XP Lite or XP starter or whatever it's called is bad for you, me, your children, the economy, free speech and liberty, but unlike Microsoft I'm hoping that your smart enough to know why this software is bad and why Microsoft should feel ashamed for their manipulation of the markets. I'm off to go and find my skin which has crawled off somewhere else in anger. I hope it hasn't hurt anyone

If you are unfamiliar with the phrase "two minute hate", see the Newspeak Dictionary


 

August 11, 2004
@ 06:26 PM

Working at a software company (i.e. around geeks with disposable income) and living in the yuppie part of downtown (i.e. around yuppies with disposable income) I see a lot of people driving around in expensive cars that have a popular brand name but not the quality associated with that brand. Such people would have gotten more value for their money by buying equivalently priced cars from other manufacturers but decided that riding around in a big name brand was better. Examples of such vehicles include the

  1. Jaguar X-Type

  2. Hummer H2

Some others come to mind but I won't mention so as not to risk offending people I know. I always shake my head when I see people driving around in these cars.


 

Categories: Ramblings

I have been remiss about talking about the ongoing and upcoming content on the MSDN XML Developer Center. The most interesting recent article has been Priya's Generating XML Documents from XML Schemas. In this article she provides a tool that allows you to generate sample documents that validate against a particular schema. This is very useful if you have a schema document but would like to see what a valid instance of that schema looks like. This is one of those tools I'd love to see added to the XML editor in Visual Studio. The one caveat is that her tool requires .NET Framework v2.0 beta 1 or higher to run.

The next couple of articles we have scheduled include an overview of how to use EXSLT to make one more productive as an XSLT developer by Oleg Tkachenko, an implementation of XPathNavigator over the ADO.NET DataSet by Arpan Desai, and an introduction to Daniel Cazzulino's implementation of Schematron for the .NET Framework by myself.

I'm currently working on the content plan for the next quarter of the year and would like to know what articles people have liked and would like to see in the future. Also if you are interested in writing for MSDN about XML technologies on Microsoft platforms go ahead and send me an email at my work address.


 

Categories: XML

According to the RSS Bandit Product Roadmap there is a service pack for the current version of RSS Bandit scheduled for release at the end of the summer. The primary issues that will be fixed by the service pack are

  1. There is a memory corruption bug that manifests itself by RSS Bandit crashing with the following error message, StartMainGui() exiting main event loop on exception.
  2. When RSS Bandit is configured to use proxy settings from Internet Explorer, downloading certain feeds sometimes fails with the error (407) Proxy authentication required even though one can access these feeds directly using a web browser with no problem.

There are also a number of smaller bug fixes such as better mouse wheel support and no longer producing duplicate entries when the title of an entry changes in the RSS feed. If any of these problems have affected you then download RssBandit1.2.0.114SP1_RC1.zip and give us feedback on whether these issues have been fixed or not.

I've started working on the Wolverine release more. The first thing I'll be implementing is NNTP support. So far I've figured out how to list newsgroups on a server, fetch posts, post responses and the like. One of the questions I'm interested in users answering is whether newsgroups should be put in a separate category from feeds in the tree view or whether we shouldn't differentiate between newsgroups and feeds in the tree view so one could have a category that contained both feeds and newsgroups. I'm also interested in finding public NNTP servers so I don't end up coding against just the peculiarities of the news.microsoft.com server.


 

Categories: RSS Bandit