April 13, 2004
@ 05:26 PM

Yesterday, I decided to work from home since it was a nice day and I had a bunch of documentation to write. Halfway through the day I got a number of emails that required me to connect to work via the VPN. In trying to do so, and after spending some time with folks from the MSFT HelpDesk, I managed to hose my Windows box to the point of requiring a reinstall.

So I spent half of yesterday getting my machine back up to speed and I'm still not done. I just figured out the IIS permission issues and got my blog back online. I still need to figure out how to get both versions of Visual Studio back online. Then there's the fact that I never got done the stuff I needed to do at work, nor finished writing the documentation I was planning to write in the first place.

Mondays really suck.


 

Categories: Ramblings

Rory Blyth recently blogged that he was invited to attend the 2004 Microsoft MVP Summit without being an MVP. This has caused some outcry from members of the .NET programming community. I've seen complaints from Roy Osherove, who wrote Non MVPs at the MVP summit - looks bad, smells bad, and Ted Neward, who wrote MVP: What's in a name?, arguing that inviting people who aren't MVPs to the MVP Summit diminishes the value of being an MVP. As the community lead for my team I tend to share their sentiments. I'll reserve comment about this particular incident since all the details are not available.

I did feel drawn to post because Ted Neward wrote 

 Meanwhile, for those who've asked over blogs and mailing lists, becoming an MVP is not a mystical or mysterious process, but it is somewhat subjective and arbitrary: as I understand it, in essence, when it comes time to select new MVPs, existing MVPs nominate names they've seen active within the community (writing books, speaking, blogging, activity on the newsgroups and various community portals, and so on), and from there the rest of the MVPs in that "group" sort of hash out who's worthy and who's not.

I don't know about other products and technologies, but that isn't how it has worked in the case of the “Visual Developer - XML” category. Various internal and external folks can nominate MVPs through me or through two people from the MVP program (Ben Miller is one of them), and we actually enter the nomination information. Then the three of us vote on who should be an MVP based on what we know first hand or second hand about the individual. I'm sure some teams encourage their MVPs to be directly involved in the process, but that doesn't mean it is the rule for how MVP nominations work. Also, I'm sure the product teams and the MVP program have veto power over whoever is awarded as an MVP, even for folks nominated by other MVPs.


 

Categories: Life in the B0rg Cube

I was reading Mark Pilgrim's article entitled The Vanishing Image: XHTML 2 Migration Issues and stumbled on the following comment

You (the author of this article) have a valid point when you say people will want to upgrade to XHTML 2, against the HTML Working Group's expectation/intention. From the tone of this article, one would assume you find this a bad development; I however disagree: I think people should update their website to comply with the latest standards. Authors will have to rewrite their pages into XHTML 2.0, but, with server-side scripting and CSS in mind, this should be a not so very difficult task.

I'm always surprised when I see web design geeks advocating the latest and greatest standards with no indication of what the real benefits are besides the fact that they can place meaningless badges on their website. Technology standards are a means to an end, not an end in themselves. The purpose of having well specified technology standards is to guarantee interoperability between software and hardware produced by different vendors. In the case of XHTML, many who have looked at the situation objectively have failed to find any reasons why one should migrate to XHTML 1.0 from HTML 4.01. Arguing that people should migrate to a version of XHTML that isn't even backwards compatible with HTML seems highly illogical since it runs counter to the entire point of standardizing on a markup language in the first place: interoperability.


 

Categories: Ramblings

Many people designing XML formats, whether for application-specific configuration files, website syndication formats or new markup languages, have to face the problem of how to design their formats to be extensible and yet resilient to changes across versions of the format. One thing I have noticed in talking to various teams at Microsoft and some of our customers is that many people think about extensibility of formats and confuse that with the versioning problem. I have previously written On Versioning XML Vocabularies, in which I stated

At this point I'd like to note that this is a versioning problem which is a special instance of the extensibility problem. The extensibility problem is how does one describe an XML vocabulary in a way that allows producers to add elements and attributes to the core vocabulary without causing problems for consumers that may not know about them. The versioning  problem is specific to when the added elements and attributes actually are from a subsequent version of the vocabulary (i.e. a version 2.0 server talking to a version 1.0 client).

The problem with the above paragraph is that it focuses on a narrow aspect of the versioning problem. A versioning policy should not only be concerned with when new elements and attributes are added to the format but also with when existing ones are changed or even removed.

The temptation to think about versioning as a variation of the extensibility problem is due to the fact that the focus of the XML family of technologies has been about extensibility. As I wrote in my previous posting

One of the primary benefits of using XML for building data interchange formats is that the APIs and technologies for processing XML are quite resistant to additions to vocabularies. If I write an application which loads RSS feeds looking for item elements then processes their link and title elements using any one of the various technologies and APIs for processing XML such as SAX, the DOM or XSLT it is quite straightforward to build an application that processes said elements which is resistant to changes in the RSS spec or extensions to the RSS spec as the link and title elements always appear in a feed.  
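
As a minimal illustration of the kind of loosely coupled processing described in the quote above, here is a sketch of an XSLT 1.0 stylesheet that prints the title and link of each item in a feed. It assumes an RSS 0.91/2.0 style feed whose core elements are in no namespace, and since it only ever looks at the elements it cares about, additions elsewhere in the feed don't break it.

<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: pull out the title and link of each item, ignoring everything else -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="/rss/channel/item">
      <xsl:value-of select="title"/>
      <xsl:text> : </xsl:text>
      <xsl:value-of select="link"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>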

Similarly XML schema languages such as W3C XML Schema have a number of features that promote extensibility such as wildcards, substitution groups and xsi:type but few if any that target the versioning problem. I've written about a number of techniques for adding extensibility to XML formats using W3C XML Schema in my article W3C XML Schema Design Patterns: Dealing With Change but none so far on approaches to versioning in combination with your favorite XML schema language.
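
For example, here is a sketch of the wildcard technique in W3C XML Schema. The person element and its namespace are hypothetical; the point is the xs:any wildcard with lax processing, which lets producers append elements from other namespaces without invalidating documents against the version 1.0 schema.

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.example.com/person/v1"
           elementFormDefault="qualified">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="age" type="xs:int"/>
        <!-- extension point: elements from any other namespace are allowed
             here and validated laxly, so older consumers can simply skip them -->
        <xs:any namespace="##other" processContents="lax"
                minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>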

There are a number of things that could change about the constructs in a data transfer format including

  1. New concepts are added (e.g. new elements or attributes added to the format or new values for enumerations)

  2. Existing concepts are changed (e.g. existing elements & attributes should be interpreted differently, added elements or attributes alter semantics of their parent/owning element)

  3. Existing concepts are deprecated (e.g. existing elements & attributes should now issue warning when consumed by an application)

  4. Existing concepts are removed (e.g. existing elements & attributes should no longer work when consumed by an application)

How all four of the above changes between versions of the XML format are handled should be considered when designing the format. Below are sample solutions for each of the aforementioned changes

  1. New concepts are added: In some cases the new concepts are completely alien to those in the existing format. For example, the second version of XQueryX will most likely have to deal with the addition of data update commands such as insert or delete while the existing format only has query constructs. In such cases it is most prudent to eschew backwards compatibility by either changing the version number or the namespace of the XML format. On the other hand, if the new additions are optional or ignorable and the format has extensibility rules for items from a different namespace than that of the format itself, then the new additions (elements and attributes) can be placed in a different namespace from that of the format. In more complex cases, it may be likely that there are some additions that cannot be ignored by older processors while others can. In such cases, serious consideration should be given to adding a concept similar to the mustUnderstand attribute in SOAP 1.1, where one can indicate which additions to the format are backwards compatible and which ones are not (see the first sketch after this list).

    In the case of new possible values being added to an enumeration (e.g. a color attribute that had the option of being "red" or "blue" in version 1.0 of the format has "green" added as a possible value in a future version of the format), the specification for the format needs to determine what the behavior of older processors should be when they see values they do not understand.

  2. Existing concepts are changed: In certain cases the interpretation of an element or attribute may be changed across versions of a vocabulary. For example, the current working draft of XSLT 2.0 has a list of incompatibilities between it and XSLT 1.0 when the same elements and attributes are used in a stylesheet. In such cases it is most prudent to change the major version number of the format if one exists or change the namespace of the format otherwise. This means the format will not be backwards compatible.

  3. Existing concepts are deprecated: Sometimes as a format evolves, one realizes that some concepts need to be reworked and replaced by improved implementations of those concepts. An example of this is the deprecation of the requiredRuntime element in favor of the supportedRuntime element in .NET Framework application configuration files. Format designers need to consider how to make such changes work in a backwards compatible manner. In the case of .NET Framework configuration files, both elements are used for applications targeting version 1.0 of the .NET Framework since the former is understood by the configuration engine while the latter is ignored (see the second sketch after this list).

  4. Existing concepts are removed: Constructs may sometimes be removed from formats because they prove to be inappropriate or insecure. For example, the most recent draft of XHTML 2.0 removes familiar elements like img and br (descriptions of the backwards incompatible changes in XHTML 2.0 from XHTML 1.1 are available in the following articles by Mark Pilgrim, All That We Can Leave Behind and The Vanishing Image: XHTML 2 Migration Issues). This approach removes forwards compatibility and in such cases it is most prudent to either change the version number or namespace of the XML format.
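
To make the first case more concrete, here is a sketch of what a SOAP-style mustUnderstand flag might look like in a hypothetical format; the item element, the extension namespace and the mustUnderstand attribute are all made up for illustration. A version 1.0 processor that sees the flag set to "true" on an element it doesn't recognize knows it should reject the document rather than silently ignore the element.

<item xmlns="http://www.example.com/format/v1"
      xmlns:v1="http://www.example.com/format/v1"
      xmlns:ext="http://www.example.com/format/ext">
  <title>Some item</title>
  <!-- an ignorable extension: an older processor can safely skip this -->
  <ext:rating>5</ext:rating>
  <!-- a non-ignorable extension: v1:mustUnderstand="true" tells an older
       processor it should fail rather than silently ignore this element -->
  <ext:expires v1:mustUnderstand="true">2004-05-01T00:00:00Z</ext:expires>
</item>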

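And for the third case, a sketch of an application configuration file that uses both elements side by side, assuming the usual startup section of an app.config file. The version strings shown (v1.0.3705 and v1.1.4322 are the familiar build numbers for .NET 1.0 and 1.1) are illustrative rather than prescriptive.

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <startup>
    <!-- read by the version 1.1 configuration engine, in order of preference -->
    <supportedRuntime version="v1.0.3705" />
    <supportedRuntime version="v1.1.4322" />
    <!-- read by the version 1.0 engine, which simply ignores the
         supportedRuntime elements above -->
    <requiredRuntime version="v1.0.3705" />
  </startup>
</configuration>
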
This blog post just scratches the surface of what can be written about the various concerns when designing XML formats to be version resilient. There are a couple of open issues as to how best to represent such changes in an XML schema and whether one should even bother trying in certain cases. I'll endeavor to put together an article about this on MSDN in the next month or two.


 

Categories: XML

Matevz Gacnik points out a Serious bug in System.Xml.XmlValidatingReader. He writes

The schema spec and especially RFC 2396 state that xs:anyURI instance can be empty, but System.Xml.XmlValidatingReader keeps failing on such an instance.

To reproduce the error use the following schema:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="AnyURI" type="xs:anyURI">
  </xs:element>
</xs:schema>

And this instance document:

<?xml version="1.0" encoding="UTF-8"?>
<AnyURI/>

There is currently no workaround for .NET FX 1.0/1.1. Actually Whidbey is the only patch that fixes this. :)

The schema validation engine in the .NET Framework uses the System.Uri class for parsing URIs. This class doesn't consider an empty string to be a valid URI, which is why our schema validation considers the above instance to be invalid according to its schema. However, it isn't clear cut in the specs whether this is valid or not, at least not without a bunch of sleuthing. As Michael Kay (XSLT working group member) and C. M. Sperberg-McQueen (chairman of the XML Schema working group) wrote on XML-DEV

To: Michael Kay <michael.h.kay@ntlworld.com>
Subject: RE: [xml-dev] Can anyURI be empty?
From: "C. M. Sperberg-McQueen" <cmsmcq@acm.org>
Date: 07 Apr 2004 10:49:51 -0600
Cc: xml-dev@lists.xml.org

On Wed, 2004-04-07 at 03:47, Michael Kay wrote:
> > If it couldn't, it would be wrong. An empty string is a valid URI.
>
> On this, like so many other things, RFC 2396 is a total disaster. An empty
> string is not valid according to the BNF syntax, but the RFC gives detailed
> semantics for what it means (detailed semantics, though very imprecise
> semantics).
>
> And the schema REC doesn't help. It has the famous note saying that the
> definition places "only very modest obligations" on an implementation, and
> it doesn't say what those obligations are.

Yes.  This is a direct result of our realization that
we have as much trouble understanding RFC 2396 as anyone
else.  The anyURI type imposes the obligations of
RFC 2396, whatever those are.  Any attempt to paraphrase
them on our part would lead, I fear, to an unsatisfactory
result: either we would make some mistake (like believing
that since the BNF does not accept the empty string,
it must not be legal)
or we would make no mistakes.  In
the one case, we'd be misleading our readers, and in
either case, we'd find ourselves mired in a never-ending
effort to prove that our paraphrase was, or was not,
correct. 

RFC 2396 is one of the fundamental specifications of the World Wide Web, yet it is vague and contradictory in a number of key places. Those of us implementing standards often have to go on gut feel or try to track down the spec authors whenever we bump into issues like this, but sometimes we miss them.

All I can do is apologize to people like Matevz Gacnik who have to bear the brunt of the lack of interoperability caused by vaguely written specifications implemented on our platform and for the fact that a fix for this problem won't be available until Whidbey.


 

Categories: XML

This week was the 2004 MVP Summit, where several hundred MVPs for various Microsoft technologies and products descended on Redmond to interact with the various product teams at Microsoft. It was a little hectic organizing things to ensure that the XML MVPs got enough face time but I think everyone was happy with the way things turned out.

On Monday, a number of Microsoft folks and MVPs had dinner at Rikki Rikki, a rather nice sushi restaurant in Kirkland, WA. The geeks in attendance included Tim Ewald, Ted Neward, Don Box, Rory Blyth, Kirk Allen Evans, Drew Marsh, Julia Lerman, Jeff Julian, Sam Gentile, Joshua Allen, Chris Anderson, Arpan Desai, Mark Fussell, DonXML Demsak, Daniel Cazzulino, Aaron Skonnard, Christoph Schittko, Rick Strahl, Joe Fawcett, Peter Provost, Cathi Gero, Michael Rys, Bryant Likes, Jeffrey Richter and a number of others. The dinner was Kirk's idea and Don suggested the place; it was definitely a pleasant evening of XML geekery. One of the things we talked about was why some of the APIs that were in the PDC preview won't make it into Whidbey, such as the XmlAdapter and the XPathChangeNavigator. In retrospect the functionality provided by both APIs was complex to implement yet could be satisfied through other mechanisms.

At the end of the dinner Kirk took a group photograph. Afterwards a couple of us stragglers saw the movie Hellboy, which was an entertaining superhero movie although the ending could have been better.

On Wednesday, eight XML MVPs (DonXML Demsak, Joe Fawcett, Daniel Cazzulino, Jeff Julian, Matevz Gacnik, Rolandas Gricius, Bryant Likes, and J. Michael Palermo IV) got to spend a day with the WebData XML team. Shortly after 9 AM there was an hour long open panel discussion with the MVPs on one side and a few dozen members of the WebData XML team on the other, with questions flying back and forth. For many members of the team, getting candid feedback from an array of customers with different backgrounds was very illuminating. The rest of the day was filled with presentations and Q & A sessions with the MVPs. They got a preview of what we'll be doing in Whidbey and maybe Orcas in the area of XML tools, XML<->Relational mapping technologies, XQuery and core XML APIs. Since one of the complaints we'd heard was that a number of sessions they'd seen earlier in the week were just rehashed PDC slides, I endeavored to ensure that the MVPs would see newer content or at least get more in depth information about what we plan to do. Based on the feedback I got, they were pleased with the experience.

On Friday, Robert Scoble and Charles Torre swung by my office and interviewed me for Channel 9. I gave them a tour of my office and showed them my budding collection of Spawn action figures and my demotivators hanging on the wall. I'm not sure if I gave a good interview or not but I guess that's part of the charm of Channel 9. I'll post a link to the interview whenever it shows up online.


 

Categories: Life in the B0rg Cube

April 6, 2004
@ 05:15 PM

I'd like to congratulate Robert Scoble, Jeff Sandquist and all the others involved on the launch of Channel 9. The doctrine of Channel 9 positions it as an avenue for Microsoft employees and their customers to interact in an honest manner. The Who We Are page states that it is an "attempt to move beyond the newsgroup, the blog, and the press release to talk with each other, human to human".

My personal take on Channel 9 is that it reminds me a lot of VBTV; people either really liked it or really hated it. Already I've begun to see posts from both ends of the spectrum: there are posts like Channel 9 - a very commendable effort and then those like Why Channel 9 is stupid. I tend to agree with the latter post but do think it is an interesting experiment.

I think Microsoft has been doing a good job of providing avenues for its employees to interact directly with their customers, from newsgroups to blogs to the various developer websites such as MSDN and ASP.NET. If anything, I feel like there are probably too many options rather than too few. Every day I check the microsoft.public.dotnet.xml newsgroup, the microsoft.public.xml newsgroup, the Extreme XML message board on MSDN, various blogs I'm subscribed to, the comments in my work blog as well as various internal mailing lists for feedback on the technologies I am responsible for. Then there are days like yesterday when I got to hang out, drink beer and eat sushi with Tim Ewald, Ted Neward, Don Box, Rory Blyth, Kirk Allen Evans, Drew Marsh, Julia Lerman, Jeff Julian, Sam Gentile, Joshua Allen, Chris Anderson, Arpan Desai, Mark Fussell, DonXML Demsak, Daniel Cazzulino, Aaron Skonnard, Christoph Schittko, Rick Strahl, Joe Fawcett and a bunch of others.

The thought that Microsoft needs to move “beyond the newsgroup, the blog, and the press release” doesn't jibe with my daily experience interacting with our customers. In fact, I know a number of our customers dislike the fact that there's a decision tree that needs to be traversed to figure out how to get information from Microsoft (do they go to newsgroups? MSDN? find the relevant blog? go to GotDotNet? call PSS? etc).

However as I mentioned earlier, I think it is an interesting experiment which means I will participate to some degree. I'm already scheduled to do an interview with Scoble this Friday so I'll probably be hamming it up in one of those streaming videos in the next few weeks.

Work calls.


 

Categories: Life in the B0rg Cube

Joshua Allen has a post entitled RSS Last Mile where he complains about the lack of a clear story with regards to one click subscription to RSS/ATOM feeds. I wrote about the various approaches to achieving one click subscription to ATOM and RSS feeds a few months ago, which led to drafting the feed URI scheme. Three months later, one click subscription to syndication feeds is still as confused as it's always been. A lot of the major aggregators support the feed URI scheme but none of the major blogging tools has decided to support it yet. Instead a lot of folks still use the 127.0.0.1 hack popularized by Radio Userland, which is now utilized by a wide number of aggregators. However most websites just do nothing with regards to one click subscription and simply have a hyperlinked image which points to the RSS feed.
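
For anyone who hasn't seen it, here is roughly what the feed URI scheme approach looks like in a page's markup; the blog URL is made up, and the only trick is that clicking the link hands the URL to whatever desktop aggregator has registered itself as the handler for the feed scheme.

<!-- one click subscription via the feed URI scheme (hypothetical blog URL) -->
<a href="feed:http://www.example.com/blog/rss.xml">Subscribe to this weblog</a>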

The only new thing I've seen is that yet another person has cooked up their own one click subscription scheme that is incompatible with all the others. Thanks to Joshua's post I found an RFC for one click subscription to syndication feeds which seems to me to be the least advantageous of the approaches that have shown themselves thus far.

The author of the RFC wrote the following about existing approaches; I've annotated his comments with mine in red text

Current solutions:

  • have the aggregator clients register with some mime-type (for either RSS or OPML), I don't believe anyone's actually implemented this since most aggregator authors know this doesn't work for a variety of reasons listed in my post on one click subscription to ATOM and RSS feeds
  • have a new protocol (feed:), actually this is a URI scheme not a protocol, and the process is the same as the above: have the aggregator clients register as the handler for some URI scheme
  • support as many clients as possible via javascript (see QuickSub),
  • transform the RSS with XSL in the browser to help newbies (not really a one-click subscription solution though). This could be a one click subscription option if the prettied up RSS feed shown to the user also displays a link that uses one of the other 3 techniques mentioned above. So this approach is really orthogonal to the others and in fact can be considered complementary

The author of the RFC then goes on to suggest an Internet Explorer specific solution, namely that

Replace the orange Feed button:
The orange feed button needs to be wrapped with an object tag:

<object classid="clsid:0123456789ABCDEF [1]">
  <param name="feedurl" value="http://feedurl [2]">
  <param name="description" value="blah blah [3]">
  <param name="imageurl" value="http://buttonimageurl [4]">

  <a href="http://feedurl [2]"><img src="http://buttonimageurl [4]" /></a>
</object>

If the ActiveX control with class ID [1] is installed, it displays a custom "subscribe" button. When you click on it, it uses the feedurl parameter [2] to subscribe.

Besides the fact that this approach is Internet Explorer specific since it requires an ActiveX control, it doesn't offer anything that the other approaches don't. I don't see why Joshua thinks it's a good idea, considering that all 3 of the other approaches work in a variety of browsers on a variety of platforms.


 

Categories: RSS Bandit

I finally got to take a look at the WS-MetadataExchange specification while hanging out in Don's office last week. The spec is fairly straightforward: it defines a mechanism for requesting the WSDL, Policy or XML Schema associated with a target namespace (i.e. a URI) from an XML Web Service endpoint. Basically one can ask what services an endpoint supports and what the messages the endpoint accepts should look like.

Both Don and Omri have suggested that WS-MetadataExchange can solve a problem I had with the SOAP-based version of the ATOM API. The problem is how an ATOM client is supposed to know what services an ATOM end point supports. Here are three descriptions of ATOM-enabled sites that I might want to interact with as an RSS Bandit user.  

  1. A weblog that supports user comments posted anonymously and provides the ability to search the weblog archives. The user comments must use a subset of HTML. For example, Sam Ruby's weblog.

  2. A weblog that doesn't have comments enabled but does provide the ability to search the weblog archives. For example, Mark Pilgrim's weblog

  3. A weblog that only supports comments that have been authenticated with TypeKey and doesn't support search. Again user comments must use a subset of HTML. Any Movable Type blog that supports TypeKey is an example.

All three would require a smart client to give the user visual hints and clues as to how they can interact with the site. At the very minimum a search box that is grayed out when the target weblog doesn't support search.

So far the only mechanism I've seen proposed for solving this problem in the case of the ATOM API is the link element used for locating service endpoints. This allows you to get the URI of service end points, like where to post comments or where to send search requests if they exist, but it does not answer finer grained questions. Questions such as “What subset of HTML can I use in comments?” or “Do I need to be authenticated before I post comments?” are currently not answered by any of the draft ATOM specs.
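
For reference, here is a sketch of the kind of link elements I mean, loosely based on the autodiscovery markup in the draft ATOM API; the URLs are made up and the exact rel and type values may differ from whatever the drafts finally settle on. Note that nothing here says what HTML is allowed in comments or whether authentication is required.

<!-- hypothetical service discovery links in a weblog page's head -->
<link rel="service.post" type="application/x.atom+xml"
      href="http://www.example.com/atom/comments" title="Post a comment" />
<link rel="service.feed" type="application/x.atom+xml"
      href="http://www.example.com/atom/feed" title="Atom feed" />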

So far WS-MetadataExchange or something like it looks like the best way to support such scenarios for SOAP-enabled ATOM end points in a way that is consistent with the Global XML Web Services architecture. I would be interested in seeing an ATOM-specific solution evolve as well, since some of these issues hurt the usability of weblogs. I've lost count of the number of times I've posted a comment or seen someone post a comment only to complain about the fact that the weblog doesn't support HTML or mangled some text. Having a way to inquire about this in a standard way would definitely improve the user experience.


 

Categories: XML

This week Torsten figured out how to add a “Subscribe in RSS Bandit” entry to the context menu in Internet Explorer and Firefox when you right-click on a link. Click below for a screenshot of what it looks like in Internet Explorer.
 

Categories: RSS Bandit