Shannon J Hager writes

Jeff Key wants to end default buttons on Focus-Stealing Dialogs but I think the problem is bigger than that. I don't think ANYTHING should be able to steal my focus while typing. I have ranted about this before both in places where it could help (emails with MS employees) and in places where it can't (certain blogs). Not only is it annoying to suddenly find myself typing in a IM conversation with someone on AOL when less than half a word ago I was typing an invoice for a client, it is DANGEROUS for programs to be able to steal focus like this

I agree. I didn't realize how much applications that steal focus irritate me until I used a friend's iBook running Mac OS X, which instead of having applications steal your focus has them try to get your attention by hopping around at the bottom of the screen. I thought it was cute and a lot less intrusive than finding myself typing in a different window because some application decided it was so important that it was going to interrupt whatever I was doing.

An operating system that enforces application politeness, sweet.


Choosing a name for a product or software component that can stand the test of time is often difficult, and the name can become a source of confusion for users if the software's usage outgrows that implied by its name. I have examples from both my personal life and my professional life.

RSS Bandit

When I chose this name I never considered that there might one day be another popular syndication format (i.e. ATOM) which I'd end up supporting. Given that Blogger, Movable Type, and LiveJournal are going to provide ATOM feeds and utilize the ATOM API for weblog editing/management, it is a foregone conclusion that RSS Bandit will support ATOM once the specifications are in slightly less flux, which should be in the next few months.

Once that happens, the name "RSS Bandit" will be an anachronism given that RSS will no longer be the only format supported by the application. In fact, the name may become a handicap in the future once ATOM becomes popular because there is the implicit assumption that I support the "old" and "outdated" syndication format, not the "shiny" and "new" one.


In version 1.0 of the .NET Framework we shipped three classes that acted as in-memory representations of an XML document

  1. XmlDocument - an implementation of the W3C Document Object Model (DOM) with a few .NET specific extensions [whose functionality eventually made it into later revisions of the spec]
  2. XmlDataDocument - a subclass of the XmlDocument which acts as an XML view of a DataSet
  3. XPathDocument - a read-only in-memory representation of an XML document which conforms to the XPath data model as opposed to the DOM data model upon which the XmlDocument is based. This class primarily existed as  a more performant data source for performing XSLT transformations and XPath queries

Going forward, various limitations of all of the above classes led us to come up with a fourth class which we planned to introduce in Whidbey. After an internal review we decided that it would be too confusing to add yet another in-memory representation of an XML document to the mix and decided instead to improve on the ones we had. The XmlDataDocument is really a DataSet-specific class so it doesn't really fall into this discussion. That left the XmlDocument and the XPathDocument. Various aspects of the XmlDocument made it unpalatable for a number of the plans we had in mind, such as acting as a strongly typed XML data source and moving away from a tree-based DOM model for interacting with XML.

Instead we decided to go forward with the XPathDocument and add a bunch of functionality to it, such as the ability to bind it to a store, retrieve strongly typed values via integrated support for W3C XML Schema datatyping, track changes, and write data to it using the XmlWriter.

The primary feedback we've gotten about the new improved XPathDocument from usability studies and WinFX reviews is that there is little chance that anyone who hasn't read our documentation would realize that the XPathDocument, not the XmlDocument, is the preferred in-memory representation of an XML document for certain scenarios. In v1.0 we could argue that the class was only of interest to people doing advanced stuff with XPath (or XSLT, which is significantly about XPath) but now the name doesn't jibe with its purpose as much. The same goes for the primary mechanism for interacting with the XPathDocument (i.e. the XPathNavigator), which should be the preferred mechanism for representing and passing data as XML in the .NET Framework going forward.
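As a concrete illustration, here's a minimal sketch of the cursor-style processing that the XPathDocument and XPathNavigator enable in v1.0 (the document contents here are made up for illustration):

```csharp
using System;
using System.IO;
using System.Xml.XPath;

class XPathDocumentDemo {
    static void Main() {
        string xml = "<books>" +
                     "<book><title>Essential XML</title></book>" +
                     "<book><title>XML in a Nutshell</title></book>" +
                     "</books>";
        // XPathDocument is a read-only representation of the document
        // that conforms to the XPath data model rather than the DOM
        XPathDocument doc = new XPathDocument(new StringReader(xml));
        // The XPathNavigator acts as a cursor over the document
        XPathNavigator nav = doc.CreateNavigator();
        XPathNodeIterator it = nav.Select("/books/book/title");
        while (it.MoveNext())
            Console.WriteLine(it.Current.Value);
    }
}
```

Since the v1.0 XPathDocument is read-only, scenarios that require editing still push people toward the XmlDocument, which is part of why the naming question matters.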

If only I had a time machine and could go back and rename the classes XmlDocument2 and XmlNavigator. :(


Categories: Life in the B0rg Cube | XML

December 23, 2003
@ 07:29 PM

I'm kind of embarrassed to write this but last week was the first time I'd installed a build of Whidbey (the next version of the .NET Framework) in about six months. I used to download builds on a daily basis at the beginning of the year when I was a tester working on XQuery but fell out of the habit once I became a PM. Given that certain bits were in flux, I decided to wait until things were stable before installing Whidbey on my machine and writing a number of sample/test applications.

Over the next couple of weeks I'll be refining some of the stuff we've been designing for the next version of System.Xml and will most likely be blogging about various design issues we've had to contend with, as well as perhaps giving a sneak preview of some of our end user documentation, which will include answers to questions raised by some of the stuff that was shown at PDC, such as whether there is any truth to the claims that XmlDocument is dead.


Categories: Life in the B0rg Cube

Torsten and I (mostly Torsten) have been working on a feature which we hope will satisfy multiple feature requests at one shot. Screenshot and details available by clicking the link below.

Categories: RSS Bandit

I just spotted the following on the wiki Ward Cunningham set up to request advice as a new hire at Microsoft.

Take a running start and don't look back

  1. Recognize that your wonderful inventiveness is the most valuable thing you will own in a culture that values its employees solely by their latest contributions. In a spartan culture like this, you will rise quickly.

  2. Keep spewing ideas, even when those ideas are repeatedly misunderstood, implemented poorly, and excised from products for reasons that have nothing to do with the quality of the idea. When you give up on communicating your new ideas, you will just go insane waiting to vest.

  3. Be patient, or better yet, don't even look back. Don't try to track and control what people do with your ideas. It will just make you jaded and cynical. (Like many of us who have gone before :)

  4. Communicate by writing things down in compact and considered form. The most senior people, who can take your ideas the furthest fastest, are very busy. As an added side-benefit, when random program managers who just don't get it come around for the fortieth time, begging for explanations, you can provide them references to your wiki, blog, or papers for the thirty-seventh time.

  5. Don't count on the research division for anything but entertaining politics.

Have a good time, and as Don said, plan for the long-haul!

I've been in the B0rg Cube just shy of two years but the above advice rings true in more ways than one. It is a very interesting culture and with the wrong attitude one could end up being very cynical. However as with all things, the best thing to do is learn how the system works and learn how to work it. The five points above are a good starting point.   

Categories: Life in the B0rg Cube

There were a number of sessions I found particularly interesting either because they presented novel ways to utilize and process XML or because they gave an insightful glance at how others view the XML family of technologies. 

Imperative Programming with Rectangles, Triangles, and Circles - Erik Meijer
This was a presentation about a research language called Xen that experiments with various ways to reduce the Relational<->Objects<->XML (ROX) impedance mismatch by adding concepts and operators from the relational and XML (specifically W3C XML Schema) worlds to an object-oriented programming language. The main thesis of the paper was that heavily used APIs and programming idioms eventually tend to be likely candidates for inclusion in the language. An example was given with the foreach operator in the C# language, which transformed the following regularly used idiom

IEnumerator e = ((IEnumerable)ts).GetEnumerator();
  try {
     while(e.MoveNext()) { T t = (T)e.Current; t.DoStuff(); }
  } finally {
     IDisposable d = e as System.IDisposable;
     if(d != null) d.Dispose();
  }

into the much more concise

foreach(T t in ts){
   t.DoStuff();
}
The majority of the presentation was about XML integration. Erik spent some time talking about the XML-to-object impedance mismatch and how cumbersome programming with XML could be. Either you wrote a bunch of code for walking trees manually or you queried nodes with XPath, but then you are embedding one language into another and don't get type safety, etc. (if there is an error in my XPath query I can't tell until runtime). He pointed out that various XML<->object mapping technologies fall short because they don't map a rich enough set of W3C XML Schema constructs to relevant object structures, and even if they did, one then loses the power of being able to do rich XPath queries or XSLT/XQuery transformations. The XML integration in Xen basically came in three flavors: the ability to initialize classes from XML strings, support for W3C XML Schema constructs like union types and sequences in the language, and the ability to do XPath-like queries over the fields and properties of a class.

There were also a few other things like adding the constraint "not null" into the language (which would be a handy modifier for parameter names in any language given how often one must check parameters for null in method bodies) and the ability to apply the same method to all the members of a collection which seemed like valuable additions to a programming language independent of XML integration.

Thinking about it, I am unsure of the practicality of some features, such as being able to initialize objects from an XML literal in the code, especially since Xen only supported XML documents with schemas, although in some cases I could imagine such an approach being more palatable than using XQuery or XSLT 2.0 for constructing or querying strongly typed XML documents. I was also suspicious of the usefulness of being able to do wildcard queries (i.e. give me all the fields in class Foo), although this could potentially be used to get the string value of an XML element with mixed content.

The language also had integrated SQL like querying with a "select" operator but I didn't pay much attention to this since I was only really interested in XML.

The meat of this presentation is available online in the paper entitled Programming with Circles, Triangles and Rectangles. The presentation was well received although sparsely attended (about two or three dozen people) and the most noteworthy feedback was from James Clark, who was so impressed he kept saying "I'm speechless" in between asking questions about the language. Sam Ruby was also impressed by the fact that not only was there a presentation but a demo, which involved compiling and running various samples, showing that you could implement such a language on the CLR and even integrate it into Visual Studio.

Namespace Routing Language (NRL) - James Clark
This was a presentation about a language for validating a single XML document with multiple schemas simultaneously. It is specifically aimed at validating documents that contain XML from multiple vocabularies (e.g. XML content embedded in a SOAP envelope, RDF embedded in HTML, etc.).

The core processing model of NRL is that it divides an XML document into sections, each containing elements from a single namespace, and then each section can be validated using the schema for its namespace. There is no requirement that the same schema language be used, so one could validate one part of the document using RELAX NG and use W3C XML Schema for another. There is also the ability to specify named modes, like XSLT, which allows you to match element names against a particular schema instead of just keying off the namespace name. This functionality could be used to validate interleaved documents (such as XHTML within an XSLT stylesheet) but I suspect that this will be easier said than done in practice.
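To give a flavor of what this looks like, here's a small NRL document sketched from my reading of the spec (the schema filenames are hypothetical): content in the SOAP envelope namespace is routed to a RELAX NG schema while everything else goes to a W3C XML Schema.

```xml
<rules xmlns="http://www.thaiopensource.com/ns/nrl">
  <!-- Validate the SOAP envelope vocabulary with RELAX NG -->
  <namespace ns="http://schemas.xmlsoap.org/soap/envelope/">
    <validate schema="soap-envelope.rng"/>
  </namespace>
  <!-- Validate any other namespace with a W3C XML Schema -->
  <anyNamespace>
    <validate schema="payload.xsd"/>
  </anyNamespace>
</rules>
```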

All in all this was a very interesting talk and introduced some ideas I'd never have considered on my own.  

There is a spec for the Namespace Routing Language available online.


Categories: XML

December 16, 2003
@ 05:33 PM

The XML 2003 conference was a very interesting experience. Compared to the talks at XML 2002, I found the talks at XML 2003 to be of more interest and relevance to me as a developer building applications that utilize XML. The various hallway and lunchtime conversations I had with various people were particularly valuable. Below are the highlights from the various conversations I had with some XML luminaries at lunch and over drinks. Tomorrow I'll post about the various talks I attended.

James Clark: He gave two excellent presentations, one on his Namespace Routing Language (NRL) and the other about some of the implementation techniques used in his nxml-mode for Emacs. I asked whether the fact that he gave no talks about RELAX NG meant that he was no longer interested in the technology. He responded that there wasn't really anything more to do with the language besides shepherd it through the standardization process and evangelization. However, given how entrenched support for W3C XML Schema was with major vendors, evangelization was an uphill battle.

I pointed out that at Microsoft we use XML schema language technologies for two things:

    1. Describing and enforcing the contract between producers and consumers of XML documents
    2. Creating the basis for processing and storing typed data represented as XML documents

The only widely used XML schema language that fits the bill for both tasks is W3C XML Schema. However, W3C XML Schema is too complex yet doesn't have enough features for the former, and has too many features which introduce complexity for the latter case. In my ideal world, people would use something like RELAX NG for the former and XML-Data Reduced (XDR) for the latter. James asked if I saw value in creating a subset of RELAX NG which also satisfied the latter case, but I didn't think there would be a compelling argument for people who've already baked W3C XML Schema into the core of their being (e.g. XQuery, XML Web Services, etc.) to find interest in such a subset.

In fact, I pointed out that in designing for Whidbey (the next version of the .NET Framework) we had originally designed the architecture to have a pluggable XML type system so that one could potentially generate Post Schema Validation Infosets (PSVI), but realized that this was a case of YAGNI. First of all, only one XML schema language exists that can generate PSVIs, so creating a generic architecture makes no sense if there is no other XML schema language that could be plugged in to replace W3C XML Schema. Secondly, one of the major benefits of this approach as I had envisioned it was that one would be able to plug one's own type system into XQuery. This turned out to be more complicated than I thought because XQuery has W3C XML Schema deeply baked into it and it would take more than genericizing at the PSVI level to make it work (we'd also have to genericize operators, type promotion rules, etc.), and once all that effort had been expended, any language that could be plugged in would have to act a lot like W3C XML Schema anyway. Basically, if some RELAX NG subset suddenly came into existence it wouldn't add much that we don't already get from W3C XML Schema (except less complexity, but you could get the same from coming up with a subset of W3C XML Schema or following my various W3C XML Schema Best Practices articles).

I did think that there would be some value to developers building applications on Microsoft platforms who need more document validation features than W3C XML Schema provides in having access to RELAX NG tools. This would be nice to have but isn't a showstopper preventing development of XML applications on Microsoft platforms (translation: Microsoft won't be building such tools in the foreseeable future). However, if such tools existed I would definitely evangelize them to our users who need more features than W3C XML Schema provides for their document validation needs.

Sam Ruby: I learned that Sam is in one of the "emerging technologies" groups at IBM. Basically he works on stuff that's about to become mainstream in a big way and helps it along. In the past this has included PHP, Open Source and Java (i.e. the Apache project), XML Web Services and now weblogging technologies. Given his track record I asked him to give me a buzz whenever he finds some new technology to work on. : )

I told him that I felt syndication formats weren't the problem with weblogging technologies and he seemed to agree, but pointed out that some of the problems they are trying to solve with ATOM make more sense in the context of using the same format for your blog editing/management API and archival format. There was also assorted interpersonal conflict and psychological baggage which needs to be discarded to move the technology forward, and a clean break seems to be the best way to do that. On reflection, I agreed with him.

I did point out that the top three problems I'd like to see fixed in syndication were one-click subscription, subscription harmonization and adding calendar events to feeds. I mentioned that I should have RFCs for the first two written up over the holidays but the third is something I haven't thought hard about. Sam pointed out that instead of going the route of coming up with a namespaced extension element to describe calendar events in an RSS feed, perhaps a better option is the ATOM approach that uses link tags. Something like

   <link type="text/calendar" href="...">

In fact he seemed to have liked this idea so much it ended up in his presentation.

As Sam and I were finishing our meals, Sam talked about the immense effect blogging has had on his visibility. Before blogging he was well known in tight-knit technical circles such as amongst the members of the Apache project, but now he knows people from all over the world working at diverse companies and regularly has people go "Wow, you're Sam Ruby, I read your blog". As he said this, the guy sitting across from us at the table said "Wow, you're Sam Ruby, I read your blog". Sam turned to me and said "See what I mean?"

The power of blogging...

Eve Maler: I spoke to her about a talk I'd seen on UBL given by Eduardo Gutentag and Arofan Gregory, where they talked about putting the polymorphic features of W3C XML Schema to good use in business applications. The specific scenario they described was the following.

Imagine a small glue supplier that provides glue to various diverse companies such as a shoe manufacturer, an automobile manufacturer and an office supplies company. This supplier uses UBL to talk to each of its customers, who also use UBL, but since the types for describing purchase orders and the like are not specific enough for them, the customers use the type derivation features of W3C XML Schema to create specific types (e.g. a hypothetical LineItem type from UBL is derived to AutomobilePart or ShoeComponent by the various companies). However, the small glue company can handle all the new types with the same code if it uses type-aware processing, such as the following XPath 2.0 or XQuery expression, which matches all instances of the LineItem type

element(*, LineItem)
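The kind of derivation they described would look something like the following W3C XML Schema fragment (the type and element names are hypothetical, following the example above). Since AutomobilePart derives from LineItem by extension, the element(*, LineItem) expression matches its instances too:

```xml
<xs:complexType name="AutomobilePart">
  <xs:complexContent>
    <!-- AutomobilePart is-a LineItem in the schema type hierarchy -->
    <xs:extension base="ubl:LineItem">
      <xs:sequence>
        <!-- customer-specific extension the generic code knows nothing about -->
        <xs:element name="PartNumber" type="xs:string"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
```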

The presenters then pointed out that there could be data loss if one of the customers extended the LineItem type by adding information that was pertinent to their business (e.g. priority, pricing information, preferred delivery options, etc.) since such code would not know about the extensions.

This seems like a horrible idea and yet another reason why I view all the "object oriented" features of W3C XML Schema with suspicion.

Eve agreed that it probably was a bad idea to recommend that people process XML documents this way, then stated that she felt calling such processing "polymorphic" didn't sit right with her since true polymorphism doesn't require subtype relationships. I agreed and disagreed with her. There are at least four types of polymorphism in programming language parlance and the kind used above is subtype polymorphism. This is just one of the four (the others being coercion, overloading and parametric polymorphism) but the behavior above is polymorphism nonetheless. From talking to Eve it seemed that she was more interested in parametric polymorphism because subtype polymorphism is not a loosely coupled approach. I pointed out that just using XPath expressions to match on predicates could be considered parametric polymorphism since you are treating instances similarly even though they are of different types but satisfy the same constraints. I'm not sure she agreed with me. :)

Jon Udell: We discussed the online exchange we had about WinFS types and W3C XML Schema types. He apologized if he seemed to be coming on too strong in his posts and I responded that of the hundreds of articles and blog posts I'd read about the technologies unveiled at the recent Microsoft Professional Developer's Conference (PDC) that I'd only seen two people provide insightful feedback; his was the first and Miguel de Icaza's PDC writeup was the second. 

Jon felt that WinFS would be more valuable as an XML database as opposed to an object-oriented database (I think the terms he used were "XML store" and "CLR store"), especially given his belief that XML enables the "Universal Canvas". I agreed with him but pointed out that Microsoft isn't a single entity, and even though some parts may think that XML is one step closer to giving us a universal data interchange format, and thus universal data access, there are others who see XML as "that format you use for config files" and express incredulity when they hear about things like XQuery because they wonder why anyone would need a query language for their config files. :)

Reading Jon's blog post about Word 11, XML and the Universal Canvas, it seems he's been anticipating a unified XML storage model for a while, which explains his disappointment that the WinFS unveiled at PDC was not it.

He also thought that the fact that so many people at Microsoft were blogging was fantastic. 


Categories: XML

December 16, 2003
@ 06:52 AM

Robert Scoble writes

Here's what I'd do if I were at Harvard and in charge of the RSS spec:

1) Announce there will be an RSS 3.0 and that it will be the most thought-out syndication specification ever.

2) Announce that RSS 3.0 will ship on July 1, 2005. That date is important. For one, 18 months is long enough to really do some serious work. For two, RSS 3.0 should be positioned as "the best way to do syndication on Microsoft's Longhorn." ...

3) Open up a mailing list, a wiki, and a weblog to track progress on RSS 3.0 and encourage community inclusion.

4) Work with Microsoft to ensure that RSS 3.0 will be able to take advantage of Longhorn's new capabilities (in specific, focus on learning Indigo and WinFS)...

5) Make sure RSS 3.0 is simply the best-of-breed syndication protocol. Translation: don't let Microsoft or Google come up with a better spec that has more features.

I'm terribly amused by the fact that Robert Scoble likes to claim that he doesn't represent Microsoft in his blog, then posts items where he basically acts like he does. An RSS 3.0 that competes with Atom is probably the worst possible proposal for resolving the current conflict in the website syndication space and a very clear indication that this is all about personality conflicts. The problem with the Atom syndication format is that it is an incompatible alternative to RSS 1.0/RSS 2.0 which provides little if any benefit to content producers or news aggregator consumers. Coming up with another version of RSS doesn't change this fact unless it is backwards compatible, and even then, besides clarifications to the original spec, I'm unsure what could be added to the core, although I can think of a number of potential candidates. Even so, this would still be a solution looking for a problem.

While talking to Tim Bray and Sam Ruby at XML 2003 last week, I stated that a number of the problems with syndication have little to do with the core spec, and most aggregator authors wouldn't consider any of the problems harped upon on the Atom lists a big deal. The major problems with syndication today have little to do with the syndication format and more to do with its associated technologies.

As little interest as I have in an Atom syndication format, I have an order of magnitude less interest in a new version of RSS that exists solely to compete with Atom.

PS: Am I the only one who caught the trademark Microsoft arrogance (which really comes from working on Windows[0]) in Scoble's post? I especially liked

"Here's what I'd do if I were at Harvard and in charge of the RSS spec...Work with Microsoft to ensure that RSS 3.0 will be able to take advantage of Longhorn's new capabilities (in specific, focus on learning Indigo and WinFS). Build a prototype (er, have MSN build one) that would demonstrate some of the features of RSS 3.0 -- make this prototype so killer that it gets used on stage at the Longhorn launch

I literally guffawed out loud. So if Harvard doesn't tie RSS to Windows then all is lost? I guess this means that NetNewsWire and Straw should get ready to be left behind in the new Microsoft-controlled RSS future. Hilarious. 

[0] When you work on the most popular piece of software in the world you tend to have a different perspective from most other software developers in the world including within Microsoft.


Categories: Ramblings

December 15, 2003
@ 05:04 PM

James Robertson writes

Ed Foster points out that MS - like many other vendors - is forbidding benchmarks as part of their standard contracts:

Is it possible Microsoft has something to hide about the performance of its server and developer products? It's hard to escape that conclusion when you see how many of its license agreements now contain language forbidding customers of those products from disclosing benchmark results.

So what are MS and the other vendors afraid of?

I'm not sure what the official line is on these contracts, but I've come to realize why the practice is popular among software vendors. A lot of the time, people who perform benchmarks are familiar with one or two of the products they are testing and know how to tune those for optimal performance but not the others, which leads to skewed results. I know that at least on the XML team at Microsoft we don't block people from publishing benchmarks if they come to us; we just ensure that their tests are apples-to-apples comparisons and not unfairly skewed to favor the other product(s) being tested.

Just a few days ago I attended a session at XML 2003 entitled A Comparison of XML Processing in .NET and J2EE where the speaker stated that push-based XML parsers like SAX were more performant than pull-based XML parsers like the .NET Framework's XmlReader when dealing with large documents. He didn't give any details and implied that they were lacking because of the aforementioned EULA clauses. Without any details, sample code or a definition of what document size is considered "large" (1MB, 10MB, 100MB, 1GB?) it's difficult to agree or disagree with his statement. Off the top of my head, there aren't any inherent limitations of pull-based XML parsing that should make it perform worse than push-based parsing of XML documents, although differences in implementations make all the difference. I suspect that occurrences like this are why many software vendors tend to have clauses in their EULAs that limit the disclosure of benchmark information.
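For context, here's a minimal sketch of the pull model with the v1.0 XmlReader (the document contents are made up): the application asks the reader to advance and inspects the current node, rather than registering handlers that a SAX parser pushes events into.

```csharp
using System;
using System.IO;
using System.Xml;

class PullParseDemo {
    static void Main() {
        string xml = "<items><item>one</item><item>two</item></items>";
        // The application pulls nodes from the reader one at a time
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        while (reader.Read()) {
            // Only react to the node types we care about, skip the rest
            if (reader.NodeType == XmlNodeType.Text)
                Console.WriteLine(reader.Value);
        }
    }
}
```

Because the caller controls the loop, a pull parser can skip uninteresting content cheaply, which is one reason there's no obvious inherent performance penalty versus the push model.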

Disclaimer: The above is my personal opinion and is in no way, shape or form an official statement of the position of my employer.


Categories: Life in the B0rg Cube

I'm now experimenting with various Windows CVS clients to see which best suits my needs for RSS Bandit development. So far I have tried WinCVS which seems OK and I'm about to evaluate Tortoise CVS which Torsten seems very happy with.

Later on I'll experiment with CVS plugins that are integrated into Visual Studio, such as Jalindi Igloo or the SCC Plugin for Tortoise CVS. I never got to use the Visual Studio plugin for GotDotNet workspaces when RSS Bandit was hosted there because the original IDE I started developing RSS Bandit with (Visual C# 2002 Standard Edition) did not support said plugin, so I am curious as to what development with source repository access as part of the IDE feels like.


Categories: RSS Bandit