In less than a week we'll be launching the XML Developer Center on MSDN and replacing the existing site. The main differences between the XML Developer Center and what exists now will be:

  1. The XML Developer Center will provide an entry point to working with XML in Microsoft products such as Office and SQL Server.

  2. The XML Developer Center will have an RSS feed.

  3. The XML Developer Center will pull in content from my work weblog.

  4. The XML Developer Center will provide links to recommended books, mailing lists and weblogs.

  5. The XML Developer Center will have content focused on explaining the fundamentals of the core XML technologies such as XML Schema, XPath, XSLT and XQuery.

  6. The XML Developer Center will provide sneak peeks at advances in XML technologies at Microsoft that will be shipping in future releases of the .NET Framework, SQL Server and Windows.

During the launch, the feature article will be the first in a series by Mark Fussell detailing the changes we've made to the System.Xml namespaces in Whidbey. His first article will focus on the core System.Xml classes like XmlReader and XPathNavigator. A follow-up article is scheduled that will talk about additions to System.Xml since the last version of the .NET Framework, such as XQuery. Finally, either Mark or Matt Tavis will write an article about the changes coming to System.Xml.Serialization, including the various hooks for custom code generation from XML schemas such as IXmlSerializable (which is no longer an unsupported interface) and SchemaImporterExtensions.

During the launch I'll also be publishing our guidelines for exposing XML in .NET applications. If there is anything else you'd like to see on the XML Developer Center, let me know.


Categories: XML

Both Dave Walker and Tim Bray state their aggregators of choice barfed when trying to read a post entitled because their aggregators of choice didn't know how to deal with tags in content. Weird. RSS Bandit dealt with it fine. Click below for the screenshot.

Categories: RSS Bandit

I just noticed that Arve Bersvendsen has written a post entitled 11 ways to valid RSS, in which he states that he has seen 11 different ways of providing content in an RSS feed, namely:

Content in the description element

I have so far identified five different variants of content in the <description> element:

  1. Plaintext as CDATA with HTML entities - Validate
  2. HTML within CDATA - Validate
  3. HTML escaped with entities - Validate
  4. Plain text in CDATA - Validate
  5. Plaintext with inline HTML using escaping - Validate


I have encountered and identified two different ways of using <content:encoded>:

  1. Using entities - Validate
  2. Using CDATA - Validate

XHTML content

Finally, I have encountered and identified four different ways in which people has specified XHTML content:

  1. Using <xhtml:body> - Validate
  2. Using <xhtml:div> - Validate
  3. Using <body> with default namespace - Validate
  4. Using <div> with default namespace - Validate

At first this seems like a lot, until you actually try to program against it using an XML parser. The first thing you notice then is that there is no difference between programming against CDATA and against escaped entities, since both are syntactic sugar. For example, the XML infoset and data models compatible with it, such as the XPath data model, do not differentiate between character content written as character references, written as CDATA sections or entered directly. So the following

    <test><![CDATA[ ]]>2</test>
    <test> 2</test>

are all equivalent. More directly, if you loaded all three into an instance of System.Xml.XmlDocument and checked their InnerText property, they'd all return the same result. So this reduces Arve's first two lists to

Content in the description element

I have so far identified two different variants of content in the <description> element:

  1. HTML
  2. Plain text


I have encountered and identified one way of using <content:encoded>:

  1. Containing escaped HTML content
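The claim that character references, CDATA sections and directly entered characters are indistinguishable to a parser is easy to check. The snippet below is a minimal sketch in Python rather than the System.Xml code mentioned above; the three `<test>` variants, and the choice of a non-breaking space (U+00A0) as the character in question, are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Three spellings of the same character content (illustrative assumption:
# the character is U+00A0, a non-breaking space, followed by "2").
variants = [
    "<test>&#160;2</test>",              # numeric character reference
    "<test><![CDATA[\u00a0]]>2</test>",  # CDATA section
    "<test>\u00a02</test>",              # character entered directly
]

# ElementTree exposes only the resulting text, not how it was written.
texts = [ET.fromstring(doc).text for doc in variants]

for doc, text in zip(variants, texts):
    print(repr(text), "<-", ascii(doc))

print("all equivalent:", len(set(texts)) == 1)
```

Code that reads the element text this way never needs to know which of the three spellings the feed author used.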

If your code makes any distinctions other than these then it is a sign that you either (a) have misunderstood how to process RSS or (b) are using a crappy XML parser. When I first started working on RSS Bandit I was also confused by these distinctions, but after a while things became clearer. The only real problem here is the description element, since you can't tell whether it contains HTML without guessing. Since RSS Bandit always hands the content to an embedded web browser this isn't a problem for it, but I can see how it could be one for aggregators that don't know how to process HTML (although I've never seen one).
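For an aggregator that does have to guess, a heuristic along the following lines is one option. This is only a sketch of the "guessing" mentioned above, not how RSS Bandit or any other aggregator actually does it; the function name `looks_like_html` and the regular expression are hypothetical choices.

```python
import re

# Matches something that looks like an HTML tag (<p>, </div>, <br/>) or an
# entity (&amp;, &#160;, &#xA0;). This is a guess, not a guarantee.
_HTML_HINT = re.compile(
    r"</?[a-zA-Z][^<>]*>|&(?:[a-zA-Z][a-zA-Z0-9]*|#\d+|#x[0-9a-fA-F]+);"
)


def looks_like_html(description: str) -> bool:
    """Guess whether an already-parsed <description> value contains HTML."""
    return bool(_HTML_HINT.search(description))


if __name__ == "__main__":
    for sample in (
        "Shipping <b>next week</b>",
        "Tom &amp; Jerry",
        "2 < 3 and 5 > 4",
        "Just plain text.",
    ):
        print(looks_like_html(sample), "<-", sample)
```

The input here is the text the XML parser returns for the element, after CDATA sections and escapes have already been resolved. Plain text that happens to contain something tag-shaped will be misclassified, which is exactly why this remains a guess.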

Another misunderstanding by Arve seems to be about how namespaces work in XML. A few years ago I wrote an article, XML Namespaces and How They Affect XPath and XSLT, in which I wrote

A qualified name, also known as a QName, is an XML name called the local name optionally preceded by another XML name called the prefix and a colon (':') character...The prefix of a qualified name must have been mapped to a namespace URI through an in-scope namespace declaration mapping the prefix to the namespace URI. A qualified name can be used as either an attribute or element name.

Although QNames are important mnemonic guides to determining what namespace the elements and attributes within a document are derived from, they are rarely important to XML aware processors. For example, the following three XML documents would be treated identically by a range of XML technologies including, of course, XML schema validators.

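    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
            <xs:complexType id="123" name="fooType"/>
    </xs:schema>

    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
            <xsd:complexType id="123" name="fooType"/>
    </xsd:schema>

    <schema xmlns="http://www.w3.org/2001/XMLSchema">
            <complexType id="123" name="fooType"/>
    </schema>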
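A namespace-aware parser makes the point concrete. The sketch below uses Python's ElementTree, which reports every element name in expanded `{namespace-uri}local-name` form; it assumes the three documents above all bind to the W3C XML Schema namespace URI, and the `expanded_names` helper is a name invented for this example.

```python
import xml.etree.ElementTree as ET

# Assumption: the namespace in question is the W3C XML Schema namespace.
XSD_NS = "http://www.w3.org/2001/XMLSchema"

docs = [
    f'<xs:schema xmlns:xs="{XSD_NS}">'
    '<xs:complexType id="123" name="fooType"/></xs:schema>',
    f'<xsd:schema xmlns:xsd="{XSD_NS}">'
    '<xsd:complexType id="123" name="fooType"/></xsd:schema>',
    f'<schema xmlns="{XSD_NS}">'
    '<complexType id="123" name="fooType"/></schema>',
]


def expanded_names(xml_text):
    """Return the namespace-qualified name of every element, in document order."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


names = [expanded_names(doc) for doc in docs]
for result in names:
    print(result)

# The prefixes differ, but the expanded names do not.
print("identical:", names[0] == names[1] == names[2])
```

The prefix never reaches the application at all; only the namespace URI and the local name do.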

Bearing this information in mind this reduces Arve's example to

XHTML content

Finally, I have encountered and identified two different ways in which people has specified XHTML content:

  1. Using <xhtml:body>
  2. Using <xhtml:div>

Thus with judicious use of an XML parser (which makes sense since RSS is an XML format), Arve's list of eleven ways of providing content in RSS is actually whittled down to five. I assume Arve is unfamiliar with XML processing which led to his initial confusion.

NOTE: Before anyone bothers to start pointing out that Atom somehow frees aggregator authors from this myriad of options, I'll point out that Atom has more ways of encoding content than these. Even ignoring the inconsequential differences in syntactic sugar in XML (escaped tags vs. unescaped tags in CDATA sections), the various combinations of the <summary> and <content> elements, the mode attribute (escaped vs. xml) and MIME types (text/plain, text/html, application/xhtml+xml) more than double the number of variations possible in RSS.


Categories: XML

March 16, 2004
@ 05:10 PM

While hanging around I discovered a series on Naked Objects. It's an interesting idea that eschews separating application layers in GUIs (via MVC) or server applications (presentation/business logic/data access layers) in favor of coding only domain model objects, which then have a standard GUI autogenerated for them. There are currently five articles in the series; they are listed below, followed by my initial impressions of each.

Part 1: The Case for Naked Objects: Getting Back to the Object-Oriented Ideal
Part 2: Challenging the Dominant Design of the 4-Layer Architecture
Part 3: Write an application in Java and deploy it on .Net
Part 4: Modeling simultaneously in UML, Java, and User Perspectives
Part 5: Fat is the new Thin: Building Rich Internet Applications with Naked Objects

Part 1 points out that many N-tier server-side applications have four layers: persistence, the domain model, the controller and presentation. The author points out that object-relational mapping frameworks are now popular as a mechanism for collapsing the domain model and persistence layers. Naked objects comes from the other angle and attempts to collapse the domain model, controller and presentation layers. The article also argues that current application development practices such as web services and component-based architectures, which separate data access from the domain model, reduce many of the benefits of object-oriented programming.

In a typical naked objects application, the framework uses reflection to determine the methods of an object and render them using a generic graphical user interface (screenshot). This encourages objects to be 'behaviourally complete': all significant actions that can be performed on the object must exist as methods on the object.
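The reflection step can be sketched in a few lines. The example below is a Python analogy and not the Naked Objects framework itself, which is Java-based. The `Customer` class, the `discover_actions` and `render_menu` helpers, and the text "menu" standing in for a generated GUI are all hypothetical.

```python
import inspect


class Customer:
    """A hypothetical domain object for an auto servicing shop."""

    def __init__(self, name):
        self.name = name
        self.bookings = []

    def book_service(self, car):
        """Book a car in for a service."""
        self.bookings.append(car)
        return f"{car} booked for {self.name}"

    def bill(self, amount):
        """Bill the customer."""
        return f"Billed {self.name} {amount}"

    def _audit(self):
        """Internal helper; not a user-visible action."""


def discover_actions(obj):
    """Use reflection to find the public methods of a domain object."""
    return {
        name: method
        for name, method in inspect.getmembers(obj, predicate=inspect.ismethod)
        if not name.startswith("_")
    }


def render_menu(obj):
    """Render a generic text 'user interface' listing every discovered action."""
    lines = [type(obj).__name__]
    for name, method in discover_actions(obj).items():
        params = ", ".join(inspect.signature(method).parameters)
        lines.append(f"  {name}({params})")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_menu(Customer("Ada")))
```

Because the interface is derived entirely from the object's methods, an action that is not a method on the object simply does not appear, which is the pressure towards behaviourally complete objects described above.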

The author states that there are six benefits of using naked objects:

  • Higher development productivity through not having to write a user interface
  • More maintainable systems through the enforced use of behaviourally-complete objects.
  • Improved usability deriving from a pure object-oriented user interface
  • Easier capture of business requirements because the naked objects would constitute a common language between developers and users.
  • Improved support for test-driven development
  • It facilitates cross-platform development

What I found interesting about the first article in the series is that the author rails against separating the domain model from data access layer but it seems naked objects are more about blending the GUI layer with the domain model. There seem to be some missing pieces to the article. Perhaps the implication is that one should use object-relational mapping technologies in combination with naked objects to collapse an application from 4 layers to a single 'behaviorally complete' domain model?

Part 2 focuses on implementing the functionality of a 4-layer application using naked objects. One of the authors had written a tutorial application for a book: software for running an auto servicing shop, which performed tasks like booking in cars for service and billing the customer. The conclusion after the rewrite was that the naked objects implementation took fewer lines of code and had fewer classes than the previous 4-layer implementation. It also took less time to add new functionality, such as obtaining the customer's sales history, to the naked objects implementation than to the 4-layer one.

There were caveats. One was that the user interface was not as rich as the one where the developer had an explicit presentation layer, as opposed to relying on a generic autogenerated user interface. Also, complex operations such as 'undoing' a business action were not supported in the naked objects implementation.

Part 3 points out that if you write a naked objects implementation targeting Java 1.1 then you can compile it using J# without modification. Thus porting from Java to .NET should be a cinch as long as you use only Java 1.1. Nothing new here.

Part 4 points out that naked objects encourages “code first design” which the authors claim is a good thing. They also point out if one really wants to get UML diagrams out of a naked objects application they can use tools like Together which can generate UML from source code.

I'm not sure I agree that banging out code first and writing use cases or design documents afterwards is a software development methodology worth encouraging.

Part 5 trots out the old saw about rich internet applications and how much better they are than the limiting HTML-based browser applications. The author points out that writing a Java applet that uses the naked objects framework gives a richer user experience than an HTML-based application. However, as mentioned in previous articles, you could build an even richer client interface with an explicit presentation layer instead of relying on the generic user interface provided by the naked objects framework.

Interesting ideas. I'm not sure how well they'd scale up to building real-world applications but it is always good to challenge assumptions so developers don't get complacent. 


Categories: Technology

March 15, 2004
@ 04:35 PM

In what seems to be the strangest news story I've read this year, I find out that Sun snatches up XML guru. What I found particularly interesting in the story was the following excerpt

One of the areas Bray expects to work on is developing new applications for Web logs, or "blogs," and the RSS (Resource Description Framework Site Summary) technology that grew out of them. "I think that this is potentially a game-changer in some respects, and there are quite a few folks at Sun who share that opinion," he said.

Though RSS is traditionally thought of as a Web publishing tool, it could be used for much more than keeping track of the latest posts to blogs and Web sites, Bray said. "I would like to have an RSS feed to my bank account, my credit card, and my stock portfolio," he said.

Personally I think it's a waste of Tim Bray's talents having him work on RSS or its competitor du jour, Atom, but it should be fun seeing whether he can get Sun out of its XML funk as well as stop them from spreading poisonous ideas like replacing XML with ASN.1.

Update: Tim Bray has a post about his new job entitled Sunny Boy where he writes

That aside, I’m comfy being officially a direct competitor of Microsoft. On the technical side, I find the APIs inelegant, the UI aesthetics juvenile, and the neglect of the browser maddening.

Sounds like fighting words. This should be fun. :)


Categories: XML

My homegirl, Gretchen Ledgard (y'know Josh's wife), has helped start the Technical Careers @ Microsoft weblog. According to her introductory post you'll find stuff like

  • Explanation of technical careers Microsoft.  What do people really do at Microsoft?  What does a “typical” career path look like?  What can you do to prepare yourself for a career at Microsoft?
  • Sharing of our recruiting expertise.  Learn “trade secrets” from Microsoft recruiters!  What does a good resume look like?  How can you get noticed on the internet?  How should you best prepare for an interview?
  • Information on upcoming Microsoft Technical Recruiting events and programs. 

    I hope Gretchen dishes up the dirt on how the Microsoft recruiters deal with competition for a candidate, such as when a prospective hire also has an offer from another attractive company such as Google. Back in my college days, the company that was most competitive with Microsoft was Trilogy (what a difference a few years make).

    I remember when I first got my internship offer and I told my recruiter I also had an offer from i2 technologies, she quickly whipped out a pen and did the math comparing the compensation I'd get at Microsoft to that I'd get from i2. I eventually picked Microsoft instead of i2 for that summer internship which definitely turned out to be a life altering decision. Ahhh, memories.    


    After lots of procrastination we now have online documentation for RSS Bandit. As usual, the current table of contents is just a placeholder and the real content is Phil Haack's Getting Started with RSS Bandit. The table of contents for the documentation I plan to write [once the MSDN XML Developer Center launches in about a week or so] is laid out below.

    • Bandit Help
      • Getting Started
        • What is an RSS feed?
        • What is an Atom feed?
        • The RSS Bandit user interface
        • Subscribing to a feed
        • Locating new feeds
        • Displaying feeds
        • Changing the web browser security settings
        • Configuring proxy server settings
      • Using RSS Bandit from Multiple Computers
        • Synchronization using FTP
        • Synchronization using a dasBlog weblog
        • Synchronization using a local or network file
      • Advanced Topics
        • Customizing the Way Feeds Look using XSLT
        • Creating Search Folders
        • Adding Integration with your Favorite Search Engine
        • Building and Using RSS Bandit plugins
      • Frequently Asked Questions
      • How to Provide Feedback
      • Contributing to RSS Bandit

    If you are an RSS Bandit user I'd love to get your feedback.


    Categories: RSS Bandit

    I'm not sure which takes the cake for geekiest wedding proposal, popping the question on a customized PC case or on a custom Magic: The Gathering™ card.

    Anyone else have some particularly geeky wedding proposals to share?


    It is now possible to use RSS Bandit to read protected LiveJournal feeds, since they now support HTTP authentication. Brad Fitzpatrick wrote

    Digest auth for RSS
    Aparently this was never announced:

    Get RSS feeds (including protected entries) by authenticating with HTTP Digest Auth. Good for aggregators.

    Good indeed. Given that I agitated for this in a previous post I'd like to thank the LiveJournal folks for implementing this feature.
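For readers curious what this looks like from the aggregator's side, here is a minimal sketch of fetching a feed with HTTP Digest authentication using Python's standard library. It is not how RSS Bandit is implemented, and the feed URL and credentials are placeholders rather than real LiveJournal endpoints.

```python
import urllib.request


def build_digest_opener(feed_url, username, password):
    """Create a URL opener that answers HTTP Digest challenges for feed_url."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # None means "use these credentials for whatever realm the server names".
    password_mgr.add_password(None, feed_url, username, password)
    handler = urllib.request.HTTPDigestAuthHandler(password_mgr)
    return urllib.request.build_opener(handler)


def fetch_protected_feed(feed_url, username, password):
    """Download the raw feed bytes, authenticating only if challenged."""
    opener = build_digest_opener(feed_url, username, password)
    with opener.open(feed_url) as response:
        return response.read()


if __name__ == "__main__":
    # Placeholder URL and credentials, for illustration only.
    data = fetch_protected_feed("http://example.com/protected/rss", "user", "secret")
    print(len(data), "bytes downloaded")
```

With Digest authentication the password itself is never sent over the wire; the client answers the server's challenge with a hash computed from it.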


    Categories: RSS Bandit

    I've written about this before but a recent mail from David Stutz and rumblings about slipped dates pushed this topic to the forefront of my mind today. If you have competition whose mantra is to ship "little components that can be combined together" and "release early, release often", is it wise to counter this with a strategy that involves integrating monolithic applications into even larger applications, thus multiplying the complexity and dealing with integration issues?

    On the one hand, no one can argue that the success of Microsoft Office isn't related to the fact that it is a suite of programs that work well together, but on the other hand, as David Stutz wrote in his farewell email

    As the quality of this software improves, there will be less and less reason to pay for core software-only assets that have become stylized categories over the years: Microsoft sells OFFICE (the suite) while people may only need a small part of Word or a bit of Access. Microsoft sells WINDOWS (the platform) but a small org might just need a website, or a fileserver. It no longer fits Microsoft's business model to have many individual offerings and to innovate with new application software. Unfortunately, this is exactly where free software excels and is making inroads. One-size-fits-all, one-app-is-all-you-need, one-api-and-damn-the-torpedoes has turned out to be an imperfect strategy for the long haul.

    Digging in against open source commoditization won't work - it would be like digging in against the Internet, which Microsoft tried for a while before getting wise. Any move towards cutting off alternatives by limiting interoperability or integration options would be fraught with danger, since it would enrage customers, accelerate the divergence of the open source platform, and have other undesirable results. Despite this, Microsoft is at risk of following this path, due to the corporate delusion that goes by many names: "better together," "unified platform," and "integrated software." There is false hope in Redmond that these outmoded approaches to software integration will attract and keep international markets, governments, academics, and most importantly, innovators, safely within the Microsoft sphere of influence. But they won't.

    Exciting new networked applications are being written. Time is not standing still. Microsoft must survive and prosper by learning from the open source software movement and by borrowing from and improving its techniques. Open source software is as large and powerful a wave as the Internet was, and is rapidly accreting into a legitimate alternative to Windows. It can and should be harnessed. To avoid dire consequences, Microsoft should favor an approach that tolerates and embraces the diversity of the open source approach, especially when network-based integration is involved.

    I don't agree with the general implication of David's comments but I do believe there is a grain of truth in what he writes. The issues aren't as black and white as he paints them but his opinions can't be written off either. The writing is definitely on the wall; I just wonder if anyone is reading it.


    Categories: Life in the B0rg Cube