Bryan Kam has reviewed a couple of free RSS aggregators for Windows. Below are excerpts of his reviews including his final choice  

I began with the aptly-named and small FeedReader 2.5. While it has all the basic features covered, it lacks a lot of things I like...Score: 3/10. Fast but featureless.

Next I tried Sharpreader This is a pretty good one, which features different sorting, various update times, alerts, inherited properties, can import/export OPML...It would take 40+ MB RAM on my desktop computer, and sometimes would take 100% cycles for no reason...Score: 5/10. Full of features, but slow as hell!

Another one I tried a while back was Syndirella 0.9b. While I was not a big fan of the Windows 3.1-esque interface, it does have a rudimentary scraper... This is great for sites that don't offer feeds. Other than that, though, this reader is pretty lacking, not even having categories which are a necessity in my opinion. Score: 5/10. Nice scraper, the rest kinda sucks.

Currently I'm using Abilon 2.0, which has many of the features I like...The interface is divided into three vertical columns: the far left is the list of feeds, the middle is the items in the selected feed, the right is the detail for the selected item. I find this very weird. Score: 7/10. It's got the goods, it's small, but it's not fun to use.

Okay, another brief RSS reader review. This one is called RSS Bandit and I've discarded Abilon in favor of it...Feature-wise it's pretty standard. The little slide-up alerts, which many of these readers have, is actually reliably click-able in this program...Another good feature is its "Locate RSS" feeds which attempts to find a feed for whatever websites or keywords you enter.8/10. Decent, but lacks that extra something.

It's good to read first hand accounts of what people like or dislike about RSS Bandit especially when compared to other RSS aggregators. I tend to agree with Bryan that RSS Bandit currently leads the pack amongst the major free RSS aggregators for Windows. The next release will aim at being competitive with commercial aggregators such as FeedDemon and NewzCrawler.

This should be a fun summer.  


Mark Pilgrim has a blog post entitled how to make a linkblog in Atom which shows one technique for syndicating a list of links in an Atom feed.  Unfortunately there is one problem with Mark's article, the technique it recommends violates the ATOM 0.3 specification and generates an invalid feed.

There are two problem sections in Mark's article. In the first How to link to an article he writes

But what about the super-fascinating thing we're actually linking to? That goes in its own element.

<link rel="related" type="text/html"
     title="Setting up an iTunes server in FreeBSD"/>

and in the section entitled How to credit people whose links you republish he writes

Simply put, a "via" link is a link back to where you found the link you're posting. In this example, I discovered the article on setting up a FreeBSD iTunes server via Jeffrey Veen, so let's give him some credit:

<link rel="via" type="text/html" href="" title="Jeffrey Veen"/>

The problem with both sections is that Mark uses values for the rel attribute that are not considered valid by the Atom 0.3 specification. In Section 3.4.1 of the Atom specification it states

3.4  Link Constructs

A Link construct is an element that MUST NOT have any child content, and has the following attributes:

3.4.1  "rel" Attribute

The "rel" attribute indicates the type of relationship that the link represents. Link constructs MUST have a rel attribute, whose value MUST be a string, and MUST be one of the values enumerated in the Atom API specification

On navigating to the provided URL and reading Section 5.4.1 of the Atom specification which defines the valid values of the rel attribute of the link element there is the following list

5.4.1  rel

This attribute describes the relationship from the current document, be it HTML or Atom, to the anchor specified by the href attribute. The value of this attribute is a space-separated list of link types. Note that these values are case insensitive. With type="application/x.atom+xml" we have the following interpretations of the relations.

The URI in the href attribute points to an alternate representation of the containing resource.
The Atom feed at the URI supplied in the href attribute contains the first feed in a linear sequence of entries.
The Atom feed at the URI supplied in the href attribute contains the next N entries in a linear sequence of entries.
The Atom feed at the URI supplied in the href attribute contains the previous N entries in a linear sequence of entries.
The URI given in the href attribute is used to edit a representation of the referred resource.
The URI in the href attribute is used to create new resources.
The URI given in the href attribute is a starting point for navigating content and services.

As can be seen neither related nor via which are used in Mark's article are in the above list. I had expected the Feed Validator written by Mark Pilgrim and Sam Ruby to flag this error but currently when one validates Mark's b-links feed it validates as Valid Atom. I have filed a filed bug# 963354 in the Feed Validator's Bug Database about this issue. Hopefully this error will be resolved soon.

On a final note, it is bad enough that we are going to have to deal with two versions of Atom in the wild (Atom 0.3 and whatever comes out of the standards process) it would be unfortunate to further fragment this by deploying intermediate versions of the format based on mailing list discussions. One of the benefits of Atom is supposed to be that it will usher in an era of rigorously defined specifications in the syndication space, that won't be worth much if people ignore the specifications and go their own way.


Yesterday I went to the Apple store in the Bellevue mall to replace the headphones on my iPod which had begun to fray. When I walked up to the counter and told the girl there what I wanted she ushered me to a customer service desk claiming that if my iPod was under warranty I could get the headphones replaced for free. I was highly skeptical of this since I didn't buy the iPod at the Apple Store but at Best Buy and didn't even have my receipt anyway.

Waiting at the customer service desk I got to soak in some of the ambiance of Apple Store. It is definitely a cool place, I liked the flat screen TV over the customer service desk with quotes from luminaries across history such as

  • Plato is my friend, Aristotle is my friend but my best friend is truth - Sir Isaac Newton
  • We must be the change we wish to see in the world - Mahatma Ghandi

When it was finally my turn, my name was displayed on the flat screen TV above the customer support desk and I walked up to be served. I told the guy behind the desk that I needed some new headphones and the girl behind the counter had directed me to him to see if I could get them replaced by the warranty. I explained that I thought this would be unlikely given that I bought the iPod at Best Buy not the Apple Store and didn't have a receipt. To which he replied “It's an Apple product right? I'll just check the serial number”. To my surprise he did just that and I walked out of there with brand new head phones. To cap the experience he also fixed some weird issues I'd been having with my iPod by pointing me to the recent iPod firmware update.

That's what I call fantastic customer service. I felt so good about Apple afterwards I felt like going back to the store and buying some Apple stuff but there's nothing I need right now.  


Categories: Ramblings

May 28, 2004
@ 06:52 PM

C.J. Date, one of the most influential names in the relational database world, had some harsh words about XML's encroachment into the world of relational databases in a recent article entitled Date defends relational model  that appeared on Key parts of the article are excerpted below

Date reserved his harshest criticism for the competition, namely object-oriented and XML-based DBMSs. Calling them "the latest fashions in the computer world," Date said he rejects the argument that relational DBMSs are yesterday's news. Fans of object-oriented database systems "see flaws in the relational model because they don't fully understand it," he said.

Date also said that XML enthusiasts have gone overboard.

"XML was invented to solve the problem of data interchange, but having solved that, they now want to take over the world," he said. "With XML, it's like we forget what we are supposed to be doing, and focus instead on how to do it."

Craig S. Mullins, the director of technology planning at BMC Software and a expert, shares Date's opinion of XML. It can be worthwhile, Mullins said, as long as XML is only used as a method of taking data and putting it into a DBMS. But Mullins cautioned that XML data that is stored in relational DBMSs as whole documents will be useless if the data needs to be queried, and he stressed Date's point that XML is not a real data model.

Craig Mullins points are more straightforward to answer since his comments don't jibe with the current state of the art in the XML world. He states that you can't query XML documents stored in databases but this is untrue. Almost three years ago, I was writing articles about querying XML documents stored in relational databases. Storing XML in a relational database doesn't mean it has to be stored in as an opaque binary BLOB or as a big, bunch of text which cannot effectively be queried. The next version of SQL Server will have extensive capabilities for querying XML data in relational database and doing joins across relational and XML data, a lot of this functionality is described in the article on XML Support in SQL Server 2005. As for XML not having a data model, I beg to differ. There is a data model for XML that many applications and people adhere to, often without realizing that they are doing so. This data model is the XPath 1.0 data model, which is being updated to handled typed data as the XQuery and XPath 2.0 data model.

Now to tackle the meat of C.J. Date's criticisms which is that XML solves the problem of data interchange but now is showing up in the database. The thing first point I'd like point out is that there are two broad usage patterns of XML, it  is used to represent both rigidly structured tabular data (e.g., relational data or serialized objects) and semi-structured data (e.g., office documents). The latter type of data will only grow now that office productivity software like Microsoft Office have enabled users to save their documents as XML instead of proprietary binary formats. In many cases, these documents cannot simply shredded into relational tables. Sure you can shred an Excel spreadsheet written in spreadsheetML into relational tables but is the same really feasible for a Word document written in WordprocessingML? Many enterprises would rather have their important business data being stored and queried from a unified location instead of the current situation where some data is in document management systems, some hangs around as random files in people's folders while some sits in a database management system.

As for stating that critics of the relational model don't understand it, I disagree. One of the major benefits of using XML in relational databases is that it is a lot easier to deal with fluid schemas or data with sparse entries with XML. When the shape of the data tends to change or is not fixed the relational model is simply not designed to deal with this. Constantly changing your database schema is simply not feasible and there is no easy way to provide the extensibility of XML where one can say "after the X element, any element from any namespace can appear". How would one describe the capacity to store “any data” in a traditional relational database without resorting to an opaque blob?

I do tend to agree that some people are going overboard and trying to model their data hierarchically instead of relationally which experience has thought us is a bad idea. Recently on the XML-DEV mailing list entitled Designing XML to Support Information Evolution where Roger L. Costello described his travails trying to model his data which was being transferred as XML in a hierarchical manner. Micheal Champion accurately described the process Roger Costello went through as having "rediscovered the relational model". In a response to that thread I wrote "Hierarchical databases failed for a reason".

Using hierarchy as a primary way to model data is bad for at least the following reasons

  1. Hierarchies tend to encourage redundancy. Imagine I have a <Customer> element who has one or more <ShippingAddress> elements as children as well as one or more <Order> elements as children as well. Each order was shipped to an address, so if modelled hierarchically each <Order> element also will have a <ShippingAddress> element which leads to a lot of unnecessary duplication of data.
  2. In the real world, there are often multiple groups to which a piece of data belongs which often cannot be modelled with a single hierarchy.  
  3. Data is too tightly coupled. If I delete a <Customer> element, this means I've automatically deleted his entire order history since all the <Order> elements are children of <Customer>. Similarly if I query for a <Customer>, I end up getting all the <Order> information as well.

To put it simply, experience has taught the software world that the relational model is a better way to model data than the hierarchical model. Unfortunately, in the rush to embrace XML many a repreating the mistakes from decades ago in the new millenium.


Categories: XML recently ran an article entitled Document-Centric .NET, that highlights the various technologies for working with XML that exist in the .NET Framework. The article provides a good high level overview of the various options you have for processing XML in the .NET Framework. The article includes an all important caveat which I wish more people knew about and which I keep wanting to write an article about but never get around to doing. The author writes 

However, keep in mind that there are W3C XML Schema features that are not directly compatible with .NET's XML-to-database and XML-to-object mapping tools.

This is very true. Besides our schema validation technologies, most Microsoft technologies or products that utilize W3C XML Schema support a subset of the language due to impedance mismatches between the language and the underlying data model or type system of the target environment.

In fact the only complaint I have about the article is a nitpick about its title. In XML circles, document-centric implies a usage of XML that isn't borne out by his article. If you are interested in the difference between data-centric XML and document-centric XML you should read my article Can One Size Fit All? in XML Journal. In that article I talk about the differences between XML that is used to represent both rigidly structured tabular data (e.g., relational data or serialized objects) and semi-structured data (e.g., office documents). The former is data-centric XML while the latter is document-centric.



Categories: Mindless Link Propagation | XML

I recently stumbled on an entry by Lucas Gonze where he complains about the RSS <enclosure> element. He writes

Problems with the enclosure element:

  • It causes users to download big files that they will never listen to or watch, creating pointless overload on web hosts.
  • It doesn't allow us to credit the MP3 host, so we can't satisfy the netiquette of always linking back.
  • For broadband users, MP3s are not big enough to need advance caching in the first place.
  • The required content-type attribute is a bad idea in the first place. Mime settings are already prone to breakage, adding an intermediary will just create another source of bugs. There are no usecases for this attribute that can't be more easily and robustly satisfied by having clients HEAD the URL for themselves.
  • The required content-length attribute should not be there. It requires people who link to MP3s to HEAD them and calculate the length, which is sometimes not practical. It makes variable-length MP3s illegal. There are no usecases for this attribute that can't be more easily and robustly satisfied by having clients HEAD the URL for themselves.

The primary problem with the <enclosure> element is that it is overspecified. Having an element that says, here is a pointer to some data that is related to this entry that is too large to fit in the feed is a good idea. Similarly providing a hint at what the MIME type is so the reader knows whether it can handle that MIME type or can display something specific to that media type in the user interface without making an additional request to the server is very useful. The description of the enclosure element in RSS 2.0 states

<enclosure> sub-element of <item> 

<enclosure> is an optional sub-element of <item>.

It has three required attributes. url says where the enclosure is located, length says how big it is in bytes, and type says what its type is, a standard MIME type.

The url must be an http url.

<enclosure url="" length="12216320" type="audio/mpeg" />

Syndication geeks might notice that this is akin to the <link> element in the ATOM 0.3 syndication format which is described as

3.4  Link Constructs

A Link construct is an element that MUST NOT have any child content, and has the following attributes:

3.4.1  "rel" Attribute

The "rel" attribute indicates the type of relationship that the link represents. Link constructs MUST have a rel attribute, whose value MUST be a string, and MUST be one of the values enumerated in the Atom API specification <eref></eref>.

3.4.2  "type" Attribute

The "type" attribute indicates an advisory media type; it MAY be used as a hint to determine the type of the representation which should be returned when the URI in the href attribute is dereferenced. Note that the type attribute does not override the actual media type returned with the representation.

Link constructs MUST have a type attribute, whose value MUST be a registered media type [RFC2045].

3.4.3  "href" Attribute

The "href" attribute contains the link's URI. Link constructs MUST have a href attribute, whose value MUST be a URI [RFC2396].

xml:base [W3C.REC-xmlbase-20010627] processing MUST be applied to the atom:url element.

3.4.4  "title" Attribute

The "title" attribute conveys human-readable information about the link. Link constructs MAY have a title attribute, whose value MUST be a string.

So the ideas behind the <enclosure> element were good enough that they appear in ATOM with some additional niceties and a troublesome bit (the length attribute) removed. So if the concepts behid the <enclosure> element are so good that they are first class members of the ATOM syndication format. Why does Lucas not like it? The big problem with RSS enclosures is how Dave Winer expected them to be used. An aggregator was supposed to act like a TiVo, automatically downloading files in the background and presenting them to you when it's done. The glaring problem with doing this is that it means lots of people are automatically downloading large files that they didn't request which is a significant waste of bandwidth. In fact, most aggregators either do not support enclosures or simply show them as links which is what FeedDemon and RSS Bandit (with the Outlook 2K3 skin) do. The funny thing is that the actual RSS specification doesn't describe this behavior, instead this behavior is implied by Dave Winer's descriptions of use cases.

Lucas also complains about the required length attribute which is problematic if you are pointing to a file on a server you don't own because you have to first download the file or perform a HTTP HEAD to get its size. The average blogger isn't going to go through that kind of trouble. Although tools could help it makes sense for the  length attribute  to have been an optional hint.

I have to disagree with Lucas's complaints about putting the MIME type in the <enclosure> element. He complains that the MIME type in the <enclosure> could be wrong and in fact that in many cases web servers  serve a file with the wrong MIME type. Thus he concludes that putting the MIME type in the enclosure is wrong. Client software should be able, to decide how to react to the enclosure [e.g. if it is audio/mpeg display a play button] without having to make additional HTTP requests especially since as Lucas points out it is not a 100% guaranteed that performing an HTTP HEAD of the linked file will actually get you the correct MIME type from the web server.

In conclusion, I agree that the <enclosure> element is problematic but most of the problems are due to the implied use case suggested by the spec author, Dave Winer, as opposed to the actual information provided by the element. The ATOM approach of describing the information provided by each element in a feed but not explicitly describing the expected behavior of clients is a welcome approach. Of course, there will always be developers who require structure or take an absence of explicit guidelines to mean do stupid things (like aggregators that fetch your feed every 5 minutes)  but these are probably better handled in "Best Practices" style documents or test suites than in the actual specification.


Categories: XML

May 26, 2004
@ 05:22 PM

One of the hardest problems in software development is how to version software and data formats. One of the biggest problems for Windows for years has been DLL Hell which is a versioning problem. One of the big issues I have to deal with at work is how to deal with versioning issues when adding or removing functionality from classes.

For a few weeks, I've been planning to write up some guidelines and concerns for versioning XML formats based on my experiences and those of others at Microsoft. I've got some folks on the XML Web Services team interested in riding shotgun such as Gudge and Doug. It also looks like Edd Dumbill is interested in the abstract for the article, so it with any luck it should end up on when it is done.

I was reminded of the importance of writing this article when I saw a post on the atom-syntax list by Google's Steve Jensen which implied that it just occured to the folks at Google that they'd have to support multiple versions of ATOM. This is excarberated by the fact that they are deploying implementations based on draft specs. Like I said before, never ascribe to malice that which can be explained by incompetence


Categories: XML

When I first got to Microsoft a few years ago, there was an acknowledgement from upper management that Microsoft technologies tended not to attract the vibrant sense of community that existed in Java or Open Source communities. This began a push, first in the Developer Division which soon spread across the company for Microsoft employees to become more involved and help nurture the developer communities surrounding our technologies and products. Two or three years later, I am heartened to read posts such as this entry from Norman Alex Rupp from his first day at TechEd 2004, note that I have italicized some of the text from the entry to emphasize some key points I found interesting

The User Group Summit was headed up by the International .NET Association (INETA). From what I can tell, INETA User Groups are analogous to the Java User Groups. They're an independent organization, and their founder goes to great lengths to maintain a comfortable operating distance from Microsoft's PR machine, while simultaneously being careful not to alienate them. It strikes me that the INETA groups highly value their independence and don't want to come across as a Microsoft vendorfest to their members. They focus on C# development topics and although they thankfully accept Microsoft's sponsorship, they do maintain a good degree of independence. That's a difficult balance to strike.

What really fascinated me about the UG Leaders Summit was that the .NET Group Leaders from around the country knew each other, had their own community structure, and genuinely seemed to enjoy being around each other. These guys were rowdy. They were having a good time. And it wasn't just because we each got a 30 oz bottle of Tequila at the end of the meeting. People were really positive and nice. This was a slight cultural change for me, because all too often I find the Open Source Java community to be extremely high strung and competitive--sometimes to the point of being vicious. I like to think of the dynamic of our community as an extreme form of tough love. I haven't worked a lot with the Java User Group communities from around the country, and I have an inkling that things are a bit different in those circles than they are in the Jakarta / JBoss / TSS / Bile Blog OS Javasphere that used to form my only umbilical link to our community. (For the record, I don't think this "tough love" culture extends into the community--the folks from Sun's "shining city on the hill" are pretty amiable).

It was just a different vibe--not necessarily better, just different. I can see more of that in the future of the Javasphere. We live in a pressure cooker, but as the language and platform mature and we continue to carve out our niche, gain credibility in the industry and grow as developers, I think we'll see less of the infighting and more of the cooperation typified by last year's OpenEJB / Geronimo alliance and by the general good will surrounding

One surprising thing I learned at today's Summit is that in the last 3 years, INETA and Microsoft have built up a 200,000 to 250,000 member developer community, and they're continuing to push forward, doing everything they can to make sure that .NET technologies take off at the local community level. They're hyperactively heading up programs to develop high school and college students, and they recognize the long term importance of bringing fresh blood into the industry. They are investing time, software and significant amounts of money into their evangelism efforts.

Essentially, what INETA and Microsoft are trying to do is outgrok the ASF on community building. And from what I just saw, they're way ahead of the curve. In their words, "we're trying to get it. You can help us REALLY get it." And by "get it" I think they mean to figure out how to have a successful user community in every city and on every major college campus in the world. I'm speculating, but it's hard not to smell ambition this raw.

This is one of the reasons I like working at Microsoft. When the hive mind in the b0rg cube decides to do something, it goes all out and does it well. The challenge is figuring out what the right thing to do is and then convincing the right people. I am continually amazed at how much things have changed in the past few years with regards to the degree of openness and the higher level of interaction between product groups and their customers. 


Categories: Life in the B0rg Cube

I've mentioned in the past why I think XML 1.1 was a bad idea in my post XML 1.1: The W3C Gets It Wrong. It seems at least one W3C working group, the XML Protocols working group to be exact, has now realized why XML 1.1 is a bad idea a few months later. Mark Nottingham recently posted a message to the W3C Technical Architecture Group's mailing list entitled Deployment and Support of XML 1.1 where he writes

In the Working Group's view, this highlights a growing misalignment in
the XML architecture. Until the advent of XML 1.1, XML 1.0 was a single
point of constraint in the XML stack, with all of the benefits (e.g.,
interoperability, simplicity) that implies. Because XML 1.1 has
introduced variability where before there was only one choice, other
standards now need to explicitly identify what versions of XML they are
compatible with. This may lead to a chicken-and-egg problem; until
there is a complete stack of XML 1.1-capable standards available, it is
problematic to use it.

Furthermore, XML-based applications will likewise need to identify
their capabilities and constraints; unfortunately, there is no
consistent way to do this in the Web architecture (e.g., RFC3023 does
not provide a means of specifying XML versions in media types).

As I mentioned in my previous post about the topic, XML 1.1 hurts the interoperability story of XML which is one of the major reasons of using it in the first place. Unfortunately, the cat is already out of the bag, all we can do now is try to contain or avoid it without getting our eyes clawed out. I tend to agree with my coworker Michael Rys, the day XML 1.1 became a W3C recommendation was a day of mourning.


Categories: XML

This is mostly a bugfix release. Major features will show up in the next release scheduled for the end of the summer or later.

Download the installer from here. Differences between v1.2.0.112 and v1.2.0.114 below

  • FEATURE: Local search now supports boolean operators so one can write queries like "IBM | Microsoft & !Java" which means search for entries containing Microsoft or IBM but not Java. Queries can also be grouped with parenthesis such as "iPod & (iTunes | (Apple & !fruit))". Thanks to Brian Leonard for the patch.

  • FEATURE: Tree view and list view now support the scroll wheel on Microsoft Intellimouse. 

  • FIXED: "My Feeds" root node displays incorrect unread messages count after remote synchronization.

  • FIXED: Installed version doesn't support Windows XP themes.

  • FIXED: home/end key pressed in the listview don't refresh the detail pane

  • FIXED: Changes on the Options|Feeds default refresh rate dropdown inputbox are not immediatly validated.

  • FIXED: Locating feeds by keyword caused an exception on searches that contains non-ASCII characters.

  • FIXED: Internal browser is not able to display web pages with frames.

  • FIXED: Synchronizing state through WebDAV doesn't use proxy information.

  • FIXED: Temporary search results are no longer persisted (or synchronized).


Categories: RSS Bandit

Myriad shots of Donald Rumsfeld during press briefings have unearthed his deadly secret. Behold, the Rumsfeld Fighting Technique.


May 25, 2004
@ 04:37 PM

The next version of SQL Server will have a significant amount of functionality related to storing, querying and extracting XML from the database. To accompany the information being imparted at TechEd 2004, myself and rest of the folks behind the XML Developer Center on MSDN decided to run a series of articles on the XML features of SQL Server 2005. The articles will run through the month of June.

The first article in the series is XML Support in Microsoft SQL Server 2005. Read this article if you are interested in learning how SQL Server 2005 has become an fully XML-aware database including the addition of the XML datatype, support for XML Schemas, indexing  of XML data, XQuery, querying XML views of relational data using XPath and much more.


Categories: XML

May 22, 2004
@ 06:02 PM

Joshua Allen has a post entitled RSS Politics which does a good job of properly framing the growing Microsoft and RSS vs. Google and Atom silliness spurred by Joi Ito that I've been seeing in the comments on Robert Scoble's weblog. Joshua writes

First, be very clear.  The “debate“ over Atom vs. RSS is a complete non-issue for Microsoft.  We use RSS to serve thousands of customers right now, and most of the people setting up RSS feeds have never heard of the political “debates“.  RSS works for them, and that's all they care about.  On the other hand, if Atom ever reaches v1.0 and we had a business incentive to use it, we would use it.  No need for debate.

Now, of the three or four people at Microsoft who know enough about Atom to have said anything about it, I wouldn't say that anyone has trashed the format.  I and others have pointed out that it's just fine for what it does; just like RSS.  If anything, I have asked hard questions about why I or any business decision maker should be spending resources on the whole debate right now.  If a business has deployed using RSS, what financial motive would they have to switch to a new, nearly identical, format once it ships?  I've got nothing against the Atom people inventing new syndication formats, but I just don't see why *I* should be involved right now.  There's no good reason.

The other comment I've made before is that the Atom community is not being served by the polarizing attitudes of some participants.  The “us vs. them“ comments are not helpful, especially when untrue, and the constant personalization (”Support Atom because I hate Dave Winer!”) just damages the credibility of the whole group (many of whom might have good motives for being involved).

I totally echo his sentiments. In the past couple of months more and more folks at Microsoft have pinged me about syndication and blogging technologies once they learn I wrote RSS Bandit. Every single time I've given them the same advice I gave in my post, Mr. Safe's Guide to the RSS vs. ATOM debate. If you are a feed consumer you'll need to support the various flavors of RSS and the various flavors of ATOM (of which there'll at least be two, ATOM 0.3 and whatever is produced from the IETF/W3C process). If you are a feed producer, you should stick with RSS 0.91/2.0 since this is the widest supported format and the most straightforward.

Although no one has asked yet, I'm also going to give my advice on Mr. Safe at Microsoft should consider adopting the ATOM API. In my personal opinion, the current draft of the ATOM API seems better designed and falls more inline with Microsoft's technologies than the existing alternatives (Blogger API/MetaWeblog API/LiveJournal API), etc. However the API lacks lots of functionality and in fact already there are extensions to the ATOM API showing up in the wild. Currently, these "innovations" are being lauded but given the personalities behind ATOM it is likely that if Microsoft products supported the API and extended it there could be a negative backlash. In which case perhaps going with a product specific API may be the best option if there is sensitivity to such feedback or the ATOM API has to be significantly extended to fit the product's needs.


Categories: Life in the B0rg Cube | XML

I've posted a few entries in the past questioning the value of the Semantic Web as currently envisioned by the W3C along with its associated technologies like RDF and OWL. My most recent post about this was On Semantic Integration and XML. It seems I'm not the only XML geek who's been asking the same questions after taking a look at the Semantic Web landscape. Elliotte Rusty Harrold is at WWW2004 and wrote the following opinions of the Semantic Web on Day 4 of WWW2004

This conference is making me think a lot about the semantic web. I'm certainly learning more about the details (RDF, OWL etc.). However, I still don't see the point. For instance what does RDF bring to the party? The basic idea of RDF is that a collection of URIs forms a vocabulary. Different organizations and people define different vocabularies, and the URIs sort out whose name, date, title, etc. property you're using at any given time. Remind you of anything? It reminds me a lot of XML + namespaces. What exactly does RDF bring to the party? OWL (if I understand it) lets you connect different vocabularies. But so does XSLT. I guess the RDF model is a little simpler. It's all just triples, that can be automatically combined with other triples, and thereby inferences can be drawn. Does this actually produce anything useful, though? I don't see the killer app. Theoretically a lot of people are talking about combining RDF and ontologies from mulktiple sources too find knowledge that isn't obvious from any one source. However, no one's actually publishing their RDF. They're all transforming to HTML and publishing that.

I've written variations of the same theme over the past couple of months. It's just hard to point at any practical value that RDF/OWL/etc provide over XML/XSLT/etc for semantic integration.


Categories: XML

Every couple of months someone asks me why I haven't written up my thoughts about the current and future trends in social software, blogging and syndication as part of a Bill Gates "Think Week" paper. I recently was asked this again and I'm now considering whether to spend some time doing so or not. If you are unfamiliar with a "Think Week", below is a description of one taken from an interview with Bill Gates

I actually do this thing where I take a week and I call it "Think Week" where I just get to go off and read the latest Ph.D. theses, try out new technologies, and try and write down my thoughts about where the market is going. Things are going fast enough that instead of doing one think a year, last year I started doing two a year. And that’s one of the most fun parts of my job. So, you know, not only trying things out, but seeing how the pieces fit together and thinking ahead what kind of software will that require, that’s a big part of my job. And I get lots of great ideas coming from the people inside Microsoft, whether it’s sending e-mail, or meeting with me, and it’s important for me to synthesize that and so there’s a lot of thinking that I’ve got to do. And, you know, that’s fun.

I have been balking at writing one for a few reasons. The first was that it seems like a bunch of effort for relatively small return [the people I know who've written one first hand got the equivalent of a "virtual pat in the back"], the second was that I didn't think this topic would be interesting enough to get past the layer of VPs and technical assistants that probably screen these papers before Bill Gates reads them.

After thinking about this some more it seems that I was wrong about whether BillG would be interested in this topic given his recent endorsement of blogging and syndication. I still don't think much would come out of it but I've now see myself bursting with a lot of ideas about the current and future landscape of blogging and syndication technologies that I definitely want to write something down anyway regardless of who reads it. If I write this paper I plan to make it available online along with my other writings. The question is whether there are any folks out there interested in reading such a paper? If not, it is easier for me to just keep notes on the various ideas and blog bits & pieces of the ideas as I have been doing thus far.

So what do you guys think?


Categories: Ramblings | RSS Bandit

Scoble has a misleading post entitled Microsoft attending Atom meeting

Microsoft attending Atom meeting

Some people have already tried to paint me into a corner when it comes to RSS vs. Atom. Just to be clear. Microsoft's Chris Sells and George Bullock, of Microsoft, are attending the June 2 Atom group meeting.

These post reads like official representatives of Microsoft are attending the Atom conference. Considering that Chris Sells and George Bullock are MSDN folks it is highly unlikely that they are going to be representative of all of Microsoft or of the major parts of Microsoft that would be interested in Atom. I work with standards and product groups every day and I always try to make the distinction between official Microsoft position and personal positions. Even then official Microsoft opinion may vary from product group to product group (it is really a bunch of small companies in here).


Categories: Ramblings

I've been reading the various pieces of feedback on my recent blog post on Why You Won't See XSLT 2.0 or XPath 2.0 in the Next Version of the .NET Framework including the 40 comments in response to the post and the "Microsoft is killing XSLT" thread on xsl-list. Most of it has been flames witrh little useful feedback but there was an interesting response by Norm Walsh entitled XQuery 1.0 or XSLT 2.0? which I've been drawn to respond to. Norm writes

Dare Obasanjo argues that “XQuery is strongly and statically typed while XPath 2.0 is weakly and dynamically typed.” What’s not clear from his post is that he is comparing XQuery 1.0 to XPath 2.0 in backwards compatibility mode (Michael Rys did provide a clarification). That’s an odd comparison to make. XPath 2.0 needs a backwards compatibility mode so that it stands some chance of doing the right thing when used in the context of an XSLT 1.0 stylesheet, but that’s not the expected mode for long-term use.

I thought my point was self evident here but if Norm missed it then it means most of the people who read my original blog post did as well. XPath 2.0 is a subset of XQuery 1.0, the parts of XQuery missing are XML construction, the query prolog, the let-where-orderby parts of the FLWOR expression, typeswitch and a few other things.  XPath 2.0 has a backwards compatibility mode which has different semantics from regular XPath 2.0 and XQuery. When I talked about Microsoft not implementing XPath 2.0 I meant XPath 2.0 in backwards compatibility mode since implementing XQuery means you already have regular XPath 2.0. After all, everything you can do in XPath 2.0 you can do in XQuery. 

Norm also writes

The funniest arguments are the ones that imply that XQuery is a competitor in the same problem space as XSLT, that users will use XQuery instead of XSLT. I say that’s funny because there are so many problems that you simply cannot solve with XQuery. If your data is regular and especially if it’s all stored in a database already so that your XQuery implementation can run really fast, then XQuery absolutely makes sense, but didn’t the database folks already have a query language? Nevermind. If your customers don’t need to solve the kinds of problems for which XSLT was designed, or if you want to sell them some sort of proprietary system to solve them, then implementing XSLT 2.0 probably doesn’t make sense.

I've seen variations of the above theme (XSLT is for transformation, XQuery is for query) in various responses to my original post. Taking away the words query and transformation out of the picture both XQuery and XSLT are designed to reshape XML data. SQL is primarily a query language but you can use it to reshape relational data, this is exactly how SQL views work. For most people, the transformations they want to perform using XSLT also be expressed using XQuery. Per Bothner wrote an article over a year ago on about Generating XML and HTML using XQuery showing how you could use XQuery to transform an XML document to another XML format or HTML. There are a few niceties in XSLT 2.0 that don't exist in XQuery such as the ability to write to multiple output streams but in general most of the things you can do in XSLT 2.0 can also be done in XQuery. In fact this leads me to something else Norm wrote

If you want to transform documents that aren’t regular, especially documents that have a lot of mixed content, XSLT is clearly the right answer. I’ll wager dinner at your favorite restaurant that XQuery cannot be used to implement the functionality of the DocBook XSLT Stylesheets. (You produce the XQuery that does the job, I buy you dinner.)

First of all XSLT is actually very bad at dealing with XML that isn't regular and has lots of mixed content. This is why a number of XSLT gurus got together to created EXSLT and why I started the EXSLT.NET project (grab the latest version from the download servers here). As for transforming DocBook with XQuery, as I mentioned before Per Bothner wrote an article about using XQuery for transformations. In fact, he specifically writes about Transforming DocBook to HTML using XQuery.

The bottom line is that XQuery is as much a "transformation language" as XSLT. XSLT may have some functionality that XQuery does not have but there isn't much I've seen that couldn't be implemented using extension functions. Perhaps I should start an EXQuery.NET project? :)



Categories: XML

A few months ago Mark Fussell wrote an article entitled What's New in System.Xml for Visual Studio 2005 and the .NET Framework 2.0 Release. Mark Ihimoyan has a followup series of blog posts which mentions which of the new features of System.Xml mentioned in Mark's article will actually be in the .NET Compact Framework. The blog posts are listed below

  1. System.Xml in NETCF v2.0 part I
  2. System.Xml in NETCF v2.0 part II
  3. System.Xml in NETCF v2.0 part III



Categories: XML

I recently stumbled on Mike Padula's website. Mike is a Cornell student who's analyzing blogging at Microsoft as part of his course work. We've exchanged some email, and I've offered to answer some questions he raised and clarify some points in my blog. Below are links to Mike's writings thus far

  • Expanding on Interesting Developments
  • Interesting Developments
  • Typical Features of a Blog and their Use
  • Blog What? That's What.
  • Blog What? Part 2
  • Blog What? Part 1
  • Let's start things off: A little intro
  • Proposal
  • One of Mike's main questions is whether blogging at Microsoft is a concerted effort or not. To answer this I'll give a brief history lesson. A couple of years ago, there was one visible Microsoft blogger, Joshua Allen. Back in 2001, Joshua was blogging about life at Microsoft, XML and current affairs way before blogging was a hip buzzword that every new age marketing department is trying to adopt. Joshua took a lot of the early heat for being a Microsoft blogger from arguments with Open Source advocates such as Eric Raymond to having coworkers try to get him fired for revealing too much about working at Microsoft in his blog to having to deal with PR and HR folks concerned about blogging. Through all this Joshua stuck it through and was an example to other Microsoft folks who later showed up with an interest in blogging who wondered if it was kosher to do so. I was one of them. The catalyst for much of the growth of blogging at Microsoft today are twofold, the first being Chris Anderson starting to blog. Chris was gung ho about blogging and not only wrote his own software but created an internal server for hosting blogs which became moderately popular. Many Microsoft folks who blog today started with internal blogs hosted on Chris's machine. The other catalyst is the changing climate internally towards interacting with customers and being 'community' focused. I believe this was directly pushed by execs like Eric Rudder and Steve Balmer. Once Microsoft people saw highly internally and externally visible people like Don Box and Robert Scoble blogging, the rest was inevitable and now we have the current situation today where there are almost six hundred Microsoft folks blogging at

    Mike assumes that there is a concerted effort at Microsoft to blog. This isn't the case as far as I've seen. I know that many product teams now require that people engage in some form of customer interaction and blogs are one way of doing so but there's never been a formal edict. In many cases, some coworkers see a colleague's blog think blogging is a neat idea then start blogging themselves.

    Since we're going down memory lane I should point out that kinda happened by accident. About a year ago most Microsoft bloggers were hosted on disparate locations until was launched with Betsy Aoki being in charge. Around the same time Scott Watermasysk had launched .NET Weblogs which was aimed at being a community for developers interested in the .NET Framework. After a while, the bandwidth costs for .NET Weblogs got too high and Scott needed help. The Microsoft ASP.NET came to the rescue and Weblogs@ASP.NET was born. Eventually, also couldn't take the traffic and Betsy was getting overloaded with feature requests and bug reports since Blogs@GotDotNet was running Chris Anderson's BlogX software which he'd stopped maintaining and I'd offered to take over but never actually did. Again the ASP.NET team came to the rescue and the plan was to migrate the Blogs@GotDotNet to Weblogs@ASP.NET. The original plan was simply to have all Microsoft blogs just merge with those on the Weblogs@ASP.NET site without demarcating who worked for Microsoft and who didn't. This was met with some resistance by the existing users of Weblogs@ASP.NET as well as criticism from folks like myself and Josh Ledgard on internal mailing lists. Eventually the folks at MSDN recanted and decided to go with a plan where the Microsoft bloggers were merged with Weblogs@ASP.NET but one could filter for Microsoft employees by going to Folks like Sara Williams, Betsy, Scott and the ASP.NET folks deserve the praise for getting this done. This solution seemed to satisfy everyone involved. Now is basically where most Microsoft people (not just employees of MSDN) who want to blog in the context of their jobs have their blogs hosted.

    This should answer a bunch of Mike's open questions about blogging at Microsoft. There are two questions specific to my blog I should answer as well. Mike wonders what kind of traffic I get. Based on tracking unique IPs, I'd say I have about a thousand or so regular readers of my personal weblog. Since my work weblog is syndicated on the MSDN XML Developer Center its readership is in the tens of thousands. Mike also wondered how I noticed his project. Three words, Technorati Link Cosmos.


    Categories: Life in the B0rg Cube

    In his blog posting entitled On probation Eric Gunnerson writes

    I got an email today from the owner of all the MSDN columns, telling me that if I wasn't able to produce a column every other month, my column would be put on probation, and then cancelled.

    I'm frankly surprised it took this long - the whole essence of a column or any other periodical is that it is just that - periodical. The combination of me writing a blog and spending a lot of time doing PM stuff has meant that my column has been neglected. June, September, and February does not a periodical column make.

    I got the same email that Eric got since I'm the author of the Extreme XML column on MSDN. From looking at the history of the column since I took it over about two years ago I have averaged an article every two months so the new schedule actually accurately reflects my rate of output and I no longer have to make excuses about publishing missing an article every other month or so.

    Eric asks whether he should keep blogging or keep the column. I believe this is a false dichotomy and shouldn't be an EITHER-OR choice. Comparing an MSDN column to the typical Microsoft blog at I see a number of key differences. A column on MSDN hits different needs and different audiences from a blog. A column is read by tens of thousands of developers online and several thousands more who get it as part of the MSDN library CDs/DVDs. A blog is read by a couple of hundred people or few thousand people. A blog posting typically contains quick tips, interesting tidbits about future technological directions or answers to an FAQ question. A column provides insight into a particular technology or feature and usually provides significant code samples that developers can build upon which helps them significantly in their daily lives. Up until I started RSS Bandit I had never written a GUI application but learned most of what I know about building multithreaded GUI apps in Winforms from Chris Sells' Wonders of Windows Forms column. Then there is the information on how system tray applications work that I learned from Eric Gunnerson's article Creating a System Tray Application. Oleg Tkachenko was inspired to build XInclude.NET by Chris Lovett's article XInclude, Anyone?. The list goes on...

    I think Microsoft's developer customers should get both and don't think one necessarily replaces the other. If given the choice between less postings from Eric about things he wishes were different in C# and why C# doesn't have const methods or articles about how to get Exchange to talk to Excel using C# or building a system tray application complete with code samples, I definitely could do with less blog postings and more articles.

    As blogging became the vogue at Microsoft I've worried that people will think blogging should replace traditional, more useful means of providing information to our customers. It's one thing that blogging lets out the voices that couldn't surmount the high barriers to producing official content (MSDN articles, product documentation, KB articles, etc) but it's another when it is expected to replace official documentation. By the way, don't get me started on the email discussion I saw where some product team wanted to use a link to some blog postings in lieu of product documentation. *sigh*


    Categories: Life in the B0rg Cube

    The Microsoft Pattern and Practices folks have produced an excellent guide to Improving .NET Application Performance and Scalability with a chapter on Improving XML Performance. If you build .NET Framework applications that utilize XML then you owe it to yourself to take a look at the guidelines in that document. There is also a handy, easily printable XML Performance checklist which can be used as a quick way to check that your application is doing the right thing with regards to getting the best performance for XML applications.

    On a similar note, Mark Fussell has posted XmlNameTable: The Shiftstick of System.Xml and XmlNameTable Revisited which provide some tips about how to use the XmlNameTable class to improve processing speed by up to 10% when processing XML documents.


    Categories: XML

    I collect about half a dozen comic book titles and I've noticed a growing trend in blurring the line between the secret identity of a super hero and their super hero identity. In the old days, a super hero had a regular dayjob with coworkers, girlfriends and bills to pay and put on his or her tights at night to fight crime. Nowadays I read comics about the same characters I used to read about as a child who now either no longer hide their non-super hero identity as a secret or have had their secret identity revealed to so many people that it might as well not be a secret.

    This trend is particularly true in Marvel's Ultimate universe. If you are unfamiliar with the Marvel Ultimate universe here is a brief description of it from the Wikipedia entry for Marvel Universe

    A greater attempt has been made with the Ultimate titles; this series of titles is in a universe unrelated to the main Marvel continuity, and essentially is starting the entire Marvel Universe over again, from scratch. Ultimate comics now exist for the X-Men, the Avengers, Spider-Man, and the Fantastic Four. Sales of these titles are strong, and indications are that Marvel will continue to expand the line, effectively creating two Marvel Universes existing concurrently. (Some rumors exist that if sales continue to increase and more titles are added, Marvel may consider making the Ultimate universe its main universe.)

    In the Marvel Ultimate universe the Avengers (now known as the Ultimates) are government agents who treated as celebrities by the tabloids and whose non-super hero identities are known to the public. The Ultimate X-Men appear on the cover of Time magazine and have met with the president several times. The anti-mutant hysteria that is a mainstay of the regular Marvel universe is much more muted in the Ultimate Marvel universe (Thank God for that, they had gone overboard with it although classics like God Loves, Man Kills will always have a special place in my heart). The identity of Ultimate spider-man isn't known to the general public but it is known by his girlfriend (Mary Jane), an orphan adopted by his aunt (Gwen Stacy), the Ultimates, all of major villains spidey has met (Doc Ock, Green Goblin, Kraven the Hunter, Sandman & Electro) as well as most of the staff of S.H.I.E.L.D. 

    This has also spread to the regular Marvel universe, most noticeably with Daredevil. His secret identity was known by the Kingpin for a long time and eventually was an open secret to most of the Kingpin's criminal organization. In recent issues, Daredevil has been outed as Matt Murdock in the tabloids and has to deal with assassination attempts on him in his regular life as well as when he is Daredevil.

    DC Comics is also playing along somewhat with Batman. Although it isn't common knowledge that Batman is Bruce Wayne there are now so many heroes (the entire Justice League, Robin, Nightwing, Spoiler, Batgirl, Huntress, Oracle) and villains (the Riddler, Hush, Bane, Ra's al Ghul) that it might as well be public.

    I suspect that one of the reasons for this trend is a point that the character Bill makes in Kill Bill vol.2 towards the end of the movie. He points out that most super heroes are regular people with regular lives that have a secret identity as a super hero while Superman was actually a super hero who had a secret identity as a regular person. Getting rid of the artificial division between super hero and alter ego makes sense because we tend to look at them as different people (Bruce Wayne is nothing like Batman) when in truth they are different facets of the same character. The increased connectedness of society as a whole has also made it easier to blur the lines between various aspects of one's character that used to be kept separate. I think comic book authors are just reflecting this trend.

    Speaking of reflecting current trends in comics I was recently disappointed and then impressed by statements made the Ultimate version of Captain America. In Ultimates #12, Cap is fighting the apparently indestructible leader of the alien invasion army who's just survived getting half his head blown off by an assault rifle when this exchange takes place

    Alien Leader: Now let's get back to business, eh, Captain? The world was about to go up and you were about to surrender in these few brief moments we've got left. Let me hear you say it. “I surrender Herr Kleiser! Make it quick!”.

    Captain America: *head butts and then starts to beat up the alien leader while saying* - Surrender? Surrender? You think the letter A on my head stands for France?

    This issue came out when the "freedom fries" nonsense was still somewhat fresh in people's minds and I was very disappointed to read this in a comic book coming from a character I liked. However he recently redeemed himself with his line from a conversation with Nick Fury in Ultimate Six #7

    Captain America: You know, being a veteran of war it occured to me, that really it's men of influence and power that decide what these wars will be about. They decide who we are going to fight and how we will fight them. And then they go about planning the fight. In a sense, really, these people will the war into existence.

    I remember thinking the same thoughts as a preteen in military school trying to decide whether to follow in my dad's footsteps and join the military or not. I fucking love comics.


    Categories: Ramblings

    Ryan Farley has a blog post entitled In Search of the Perfect RSS Reader where he compares a number of the most popular desktop aggregators for Windows in search of the best of breed application. Ryan compared RSS Bandit, SharpReader, Newsgator, FeedDemon, SauceReader and Gush. The application that Ryan rated as the best was RSS Bandit. He writes

    RSSBandit has the best of everything. One of the things that I was wanting in an aggregator was support for the CommentAPI so I could read and post comments from it. RSSBandit has a nice interface and has a really clean and professional look to it. I like nice looking software. For me, that was one of the biggest things in the favor of RSSBandit. I love the “auto-discover” feeds, where you can scan a given URI for feeds. Search folders and some cool searching features. Written in .NET (I love to support the cause). Also, when a post is updated, it just updates the content of the post (seems to pull it each time you view it instead of caching it?). I like that it does not pull down a second copy of the post, however I do wish it would somehow indicate that the contents of the post has changed. The only gripes I had about RSSBandit are very small (and they're not really gripes, just small things I'd change if it were mine). I hate the splash screen. It is ugly and does not match the rest of the clean and XP/Office/VS/etc look of the application. Also, I don't like the icon. The smiley-face with the eye-patch. Give me a break. I don't really care for silly looking software (at least since it is open source I can change that myself if I really want to). But overall, a completly awesome job Dare (and other sourceforge team members)

    Mad props go to Torsten and Phil Haack who contributed a great deal to the most recent release. We've tried to address a lot of user pain points in dealing with RSS Bandit and there's still a lot of stuff we can do to make information management with an aggregator even better.

    Thanks to all the users who gave us feedback and helped us improve the application. Expect more.


    Categories: RSS Bandit

    I just saw this in Wired and have been surprised that it hasn't been circulating around the blogs I read. It looks like the Mexican Air Force Captured a UFO Encounter on VideoTape. According to Wired

    Mexican Air Force pilots filmed 11 unidentified flying objects in the skies over southern Campeche state, a spokesman for Mexico's Defense Department confirmed Tuesday.A videotape made widely available to the news media on Tuesday shows the bright objects, some sharp points of light and others like large headlights, moving rapidly in what appears to be a late-evening sky.

    The lights were filmed on March 5 by pilots using infrared equipment. They appeared to be flying at an altitude of about 11,500 feet, and reportedly surrounded the jet as it conducted routine anti-drug trafficking vigilance in Campeche. Only three of the objects showed up on the plane's radar.

    That is pretty amazing. I haven't been able to locate the entire video online. That definitely would be interesting to watch.


    One of the problems I have at work is that I have 3 applications open for participating in online community. I have Outlook for email, RSS Bandit for blogs & news sites and Outlook Express for USENET newsgroups. My plan was to collapse this into two applications by adding the ability to read USENET newsgroups to RSS Bandit. However recently I discovered, via a post by Nick Bradbury that

    Hey, just noticed that the Google Groups 2 BETA offers Atom feeds for each group. To see feeds for a specific group, use this format:

    Here are a few examples:

    So I'm now at a cross roads. On the one hand I could abandon my plans for implementing USENET support in RSS Bandit but there is functionality not exposed by Google Groups ATOM feeds currently. I can't view newsgroups that aren't public nor can I actually post to the newsgroups from the information in the feed currently.  I see two options

    1. Implement a mechanism for posting to newsgroup posts viewed via the Google Groups ATOM feeds from RSS Bandit. Don't worry about adding USENET support for the case of people who want to read newsgroups not indexed by Google.
    2. Implement the ability to view and post to any USENET newsgroups using RSS Bandit. This may also include figuring out how to deal with password protected newsgroups. 

    The Program Manager in me says to do (1) because it leverages the work Google has done, it requires minimal effort in me and doesn't require significantly complicating the RSS Bandit code base. The hacker in me says to do (2) since it provides maximum functionality and would be fun to code as well. What do RSS Bandit users think?

    PS: Does anyone know how Google plans to deal with the deployment nightmare of moving people off of the ATOM 0.3 syndication format? Given the recent political wranglings between the W3C and IETF over ATOM it is clear that we won't see a spec rubber stamped by a standards body this year. There'll be thousands of users who are subscribed to Blogger & Google Groups feeds who may be broken whenever Google moves to the finalized version of ATOM. Will Google abandon the feeds thus forcing thousands of people to either change their aggregator or upgrade? Will they support two potentially incompatible versions ATOM? 


    Categories: RSS Bandit

    Charles Cook has a blog posting on XML and Performance where he writes

    XML-based Web Services look great in theory but I had one nagging thought last week while on the WSA course: what about performance? From my experience with VoiceXML over the last year it is obvious that processing XML can soak up a lot of CPU and I was therefore interested to see this blog post by Jon Udell in which he describes how Groove had problems with XML:

    Sayonara, top-to-bottom XML I don't believe that I pay a performance penalty for using XML, and depending on how you use XML, you may not believe that you do either. But don't tell that to Jack Ozzie. The original architectural pillars of Groove were COM, for software extensibility, and XML, for data extensibility. In V3 the internal XML datastore switches over to a binary record-oriented database.

    You can't argue with results: after beating his brains out for a couple of years, Jack can finally point to a noticeable speedup in an app that has historically struggled even on modern hardware. The downside? Debugging. It was great to be able to look at an internal Groove transaction and simply be able to read it, Jack says, and now he can't. Hey, you've got to break some eggs to make an omelette.

    Is a binary representation of the XML Infoset a useful way of improving performance when handling XML? Would it make a big enough difference?

    For the specific case of Groove I'd be surprised if they used a binary representation of the XML infoset as opposed to a binary representation of their application object model. Lots of applications that utilize XML for data storage or configuration data immediately populate this data into application objects. This is a layer of unnecessary processing since one could skip the XML reading and writing step and directly read and write serialized binary objects. If performance is that important to your application and there are no interoperability requirements it is a better choice to serialize binary objects instead of going through the overhead of XML serialization/deserialization. The main benefit of using XML in such scenarios is that in many cases there is existing infrastructure for working with XML such as parsers, XML serialization toolkits and configuration handlers. If your performance requirements are so high that the overhead of going from XML to application objects is too high then getting rid of the step in the middle is a wise decision. Although as pointed out by Jon Udell you loose the ease of debugging that comes with using a text based format.

    If you are considering using XML in your applications always take the XML Litmus Test


    Categories: XML

    The XML team at Microsoft has recently started getting questions about our position on XQuery 1.0, XPath 2.0 and XSLT 2.0. My boss, Mark Fussell, posted about why we have decided to implement XQuery 1.0 but not XSLT 2.0 in the next version of the .NET Framework. Some people misinterpreted his post to mean that the we chose to implement XQuery 1.0 over XSLT 2.0 because we prefer the syntax of the former over that of the latter. However decisions of such scale aren't made that lightly.

    There are several reasons why we aren't implementing XSLT 2.0 and XPath 2.0

    It takes a lot of effort and resources to implement all 3 technologies (XQuery, XSLT 2.0 & XPath 2.0). Our guiding principle was that we believe creating a proliferation of XML query technologies is confusing to end users. We'd rather implement one more language that we push people to learn than have to support and explain three more XML query and transformation languages, in addition to XPath 1.0 & XSLT 1.0 which already exist in the .NET Framework. Having our customers and support people have to deal with the complexity of 3 sophisticated XML query languages two of which are look similar but behave quite differently in the case of XPath 2.0 and XQuery seemed to us not to be that beneficial. 

    XPath 2.0 has different semantics from XQuery, XQuery is strongly and statically typed while XPath 2.0 is weakly and dynamically typed. So it isn't simply a case that if you implementing XQuery means that you can simply flip some flag and disable a feature or two to turn it into an XPath 2.0 implementation. However all of the use cases satisfied by XPath 2.0 can be satisfied by XQuery. In the decision to go with XQuery over XSLT 2.0, Mark is right that we felt that developers would prefer the familiar procedural model and syntax of XQuery to the template based model and syntax of XSLT 2.0. Most developers working with XSLT try to use it as a procedural language anyway, and don't really harness the power of templates. There's always the steep learning curve until you get to the “Aha“ moment and everything clicks. XQuery with its FLWOR construct and user defined functions fits more naturally with how both programmers and database administrators access and manipulate data than does XSLT 2.0. Thus we feel XQuery and not XSLT is the future of XML based query and transformation. 

    This doesn't mean that we will be removing XSLT 1.0 or XPath 1.0 support from the .NET Framework. It just means that our innovation and development efforts will be focused around XQuery going forward. 


    Categories: Life in the B0rg Cube | XML

    I'm sure many have already seen the Google blogs. Below are a few suggestions to the various Google folks suggesting ways they can improve the blogging experience for themselves and their readers

    1. The blogs currently don't have names attached to each post nor do they have a way to post comments in response to each entry. The power of blogs is that they allow you to have a conversation with your users. An anonymous weblog that doesn't provide a mechanism for readers to provide feedback is little more than a corporate PR website. Currently I don't see much difference between the Google Press Center and Google Blog besides the fact that the Press Center actually has the email addresses of real people at Google on the page while the blogs do not.

    2. Pick better URLs for your articles. An example of a bad URL is which provides a link to Google's explanation of the Jew Watch fiasco. Why is this URL bad? Well do you think this is the only explanation Google will ever give? Does the URI provide any hint to what the content is actually about? I suggest the folks in charge of managing the Google URI namespace take a gander at the W3C's Choose URIs wisely and Tim Berners-Lee's excellent Cool URIs Don't Change

    3. Either explain to your bloggers that they should use their common sense when blogging or come up with a policy where blog posts are reviewed before being posted. Google is about to become a multibillion dollar company whose every public utterance will be watched by thousands of customers and hundreds of journalists, it can't afford PR gaffes like the one's described in C|Net's article Google blog somewhat less than 'bloggy'.

    4. Provide an RSS feed. I understand that Evan and the rest of Blogger have had their beefs with Dave Winer but this is getting ridiculous. Dave Winer has publicly flamed me on more than one occassion but I don't think that means I shouldn't use RSS on the MSDN XML Developer Center or remove support for it from RSS Bandit. If an evil Microsoft employee can turn the other cheek and rise above holding grudges, I don't see why Google employees whose company motto is “Do No Evil” can't do the same.

    5. Let us know if working at Google is really as cool as we all think it is. :)


    Categories: Ramblings

    May 11, 2004
    @ 06:19 PM

    Jon Udell recently wrote

    In a recent column on how we use and abuse email, I mentioned the idea of passing attachments "by reference" rather than "by value." Unfortunately I overlooked a product recently reviewed by InfoWorld that does exactly that. The Xythos WebFile Server has a companion WebFile Client that hooks File Attach (in Notes and Outlook) and replaces attachments with secure links to an access-controlled and versioned instance of the document. Cool!

    The $50K price tag, as our reviewer noted, "may keep smaller companies away." But other implementations of the idea are clearly possible.

    SharePoint provides the ability to provide shared workspaces for documents and even integrates nicely with the rest of Microsoft Office such as Word. I know quite a few people who've gone from sending documents back and forth over email to sending links to documents in shared workspaces. Of course, old habits die hard and even at Microsoft lots of people tend to send documents in email as attachments.  


    May 11, 2004
    @ 03:21 PM

    In Steve Burnap's blog post entitled Infamy he writes

    I've been opposed to the Iraq war since the beginning, but until this, I didn't feel disgust.
    And that makes me sad. There are only villians here. Villians who have made the US no better than Saddam Hussein. We got into this war with lies, but even if there were no WMDs and no Al Qaeda connection, we could say "well, at lest we got rid of a tyrant". Now all we can say is that we've replaced a tyrant.

    I share the exact same sentiments. The Iraq War has hit every worst case scenario I could have come up with; no weapons of mass destruction, no concrete ties to Al Qaeda shown, militant resistance from members of the local populace, and US troops actively violating the Geneva Convention. The funny thing is that polls show that 44 per cent of people polled in the US think the war is worthwhile.  Sad.


    The folks behind the InfoPath team blog have posted a short series on how to programmatically modify InfoPath form templates. The example in the series shows how to change the URL of a XML Web Service end point used by the InfoPath form using a script instead of having to do it manually by launching InfoPath. The posts in the series are linked below

    1. Modifying InfoPath manifest.xsf file from script (1/5)
    2. Modifying InfoPath manifest.xsf file from script (2/5)
    3. Modifying InfoPath manifest.xsf file from script (3/5)
    4. Modifying InfoPath manifest.xsf file from script (4/5)
    5. Modifying InfoPath manifest.xsf file from script (5/5) 

    The posts highlight that the InfoPath XSN format is really a CAB file, and the files that make up the template are XML files which can easily be modified programmatically.


    Categories: XML

    I recently submitted the Design Guidelines for Exposing XML Data as part of the WinFX design guidelines. You can read the guidelines in Krzysztof Cwalina's weblog. These are also the application guidelines that developers working with XML should follow for working with XML in the Whidbey timeframe. I'll be working with the FxCop team to get some rules written to check for compliance to these guidelines in the Whidbey base class library over the next few weeks.


    Categories: Life in the B0rg Cube | XML

    I recently stumbled on blog posting by Phil Ringnalda called a little chip in the concept where he notes

    Still, I was a bit surprised when Xiven linked to a post to the validator mailing list, pointing out that the utterly wrong HTML <a href=""><b><a href=""></a></b></a>, which is reported as invalid in HTML, is ignored in XHTML. Nesting links is one of those basic, there's absolutely no way you can ever do this, things, but in XHTML if you put a nested link inside an inline element, the validator won't catch it. According to Hixie's answer, it's because the validator uses an XML DTD for XHTML, and an SGML DTD for HTML, and while you can say that a/b/a is wrong in an SGML DTD, you can't in an XML DTD. As he puts it, in XHTML it's XML-valid but non-compliant.

    Phil has just stumbled on just one of many limitations of XML schema languages. At first, when people see an XML schema language they expect that they will be able to use it to declaratively describe all the rules of their vocabulary. However this is rarely the case, every XML schema language has limitations in the constraints it can express. For example, W3C XML Schema can't express constraints such as a choice between attributes (either an uptime or downtime attribute appears on an element), DTDs can't express constraints on the range a text value can be (must be an integer between 5 and 10), RELAX NG can't express identity constraints on numeric values (e.g. each book in the inventory must have a unique ISBN) , and so on.

    This means that developers using an XML schema language should be very careful when designing XML applications or XML vocabularies about what rules they can validate when they receive an input document. In some cases, the checks performed by schema validation may be so limited for a vocabulary that it is better to check the constraints using custom code or at the very least augment schema validation with some custom checks as well.

    The fact is that many XML vocabularies are complex enough that their constraints aren't easily be expressible using a conventional XML schema language. XML vocabulary designers and developers of XML applications should always be on the look out for such cases else incorrect decisions be made in choosing a validation framework for incoming XML documents.


    Categories: XML

    Based on some feedback from yesterday, the RSS Bandit installer has been refreshed and the following fixes made.

    • FEATURE: RSS Bandit now comes with a documentation helpfile so one doesn't need to go online to access the user documentation.
    • FIXED: During synchronization a feedlist may become corrupted and fail to load with the following error 'Keyref fails to refer some key'.
    • FIXED: When launched on a Brazilian Porteguese configured computer some text in dialog boxes is in English.
    • FIXED: If installing over an old version of RSS Bandit, existing search folders are replaced with the 'Unread Items' search folder.

    If you downloaded it yesterday you should still get the latest installer because a few of the bugs fixed may affect you.  


    Categories: RSS Bandit

    Download the installer from here. Differences between v1.2.0.90 and v1.2.0.112 below

    • FEATURE: Support for Atom 0.3 syndication format
    • FEATURE: Instances of RSS Bandit can now be synchronized using WebDAV, FTP or a file share. The data is transfered in a ZIP file containing information about the current state of your search folders, flagged items, replied items, subscribed feeds and read/unread messages. This is extremely useful for people who use RSS Bandit on different computers like from home and work or from school and home.
    • FEATURE: Ability to configure appearance of MSN Messenger style popup windows in the system tray when new items show up on a per feed basis. Also the ability to turn this feature on and off now available from the context menu in the system tray.
    • FEATURE: Browser address bar has autocomplete functionality
    • FEATURE: RSS Bandit now translated to the following languages; English, German, Russian, Simplified Chinese, Brazilian Portuguese and Polish. If the language configured on your machine is one of these then all text should appear in your target language.
    • FEATURE: Online documentation now provided and is available from Help menu. This is a work in progress but all the information for figuring out the basics of using RSS Bandit is available now.
    • FEATURE: Logos and other distinguishing images from websites that provide them in their feed now displayed in the details pane.
    • FEATURE: Feeds imported from an OPML file can now be imported into a specific category.
    • FEATURE: Export to OPML now contains htmlUrl and description attributes in elementsfor each feed.
    • FEATURE: Custom stylesheets for rendering feeds can now reference external .css or image file(s)
    • FEATURE: Added one more commandline option '-c[ulture]:', e.g. '-culture:"ru-RU"' forces to start the russian UI.
    • FIXED: If executing/downloading ActiveX controls is disabled, warning message(s) no longer displayed about sites with ActiveX controls.
    • FIXED: Speed improvements in behavior of [Next Unread Item] button.
    • FIXED: Alert balloon in system tray pops up too frequently.
    • FIXED: UI hangs temporarily if a tree node is clicked for a feed that hasn't been downloaded before.
    • FIXED: Clicking on links with mailto: or news: URL schemes opened a new browser tab.
    • FIXED: Certain feed URLs not being updated if they had mixed case (e.g. or were malformed (e.g. http:/
    • FIXED: Issue with screen font resolutions other than 96dpi.
    • FIXED: Exception on accessing feeds secured with windows authentication. Now the default credentials are correctly used.
    • FIXED: Closing a web browser tab on slow sites could cause hundrets of new browser tabs opened, or exception.
    • FIXED: startup issue on Windows 98/ME systems
    • FIXED: DTDs not used for resolving entities in RSS feeds. This mostly affected RSS 0.91 feeds
    • FIXED: Memory footprint reduced (5 to 15 percent), some speed improvements
    • FIXED: Various issues with deleting search folders.


    Categories: RSS Bandit

    Every once in a while I see a developer of a news aggregator that decides to add a 'feature' that unnecessarily chomps down the bandwidth of a web server in a manner one could classify as rude. The first I remember was Syndirella which had a feature that allowed you to syndicate an HTML page then specify regular expressions for what parts of the feed you wanted it to treat as titles and content. There are three reasons I consider this rude,

    • If a site hasn't put up an RSS feed it may be because they don't want to deal with the bandwidth costs of clients repeatedly hitting their sites on behalf of a few users
    • An HTML page is often larger than the corresponding RSS feed. The Slashdot RSS feed is about 2K while just the raw HTML of the front page of slashdot is about 40K
    • An HTML page could change a lot more often than the RSS feed [e.g. rotating ads, trackback links in blogs, etc] in situations where an RSS feed would not 

    For these reasons I tend to think that the riught thing to do if a site doesn't support RSS is to send them a request that they do highlighting its benefits instead of eating up their bandwidth.

    The second instance I've seen of what I'd call rude bandwidth behavior is a feature of NewsMonster that Mark Pilgrim complained about last year where every time it finds a new RSS item in your feed, it will automatically download the linked HTML page (as specified in the RSS item's link element), along with all relevant stylesheets, Javascript files, and images. Considering that the user may never click through to web site from the RSS view this is potentially hundreds of unnecessary files being downloaded by the aggregator a day. This is not an exaggeration, I'm subscribed to a hundred feeds in my aggregator and there are is an average of two posts a day to each feed so downloading the accompanying content and images is literally hundreds of files in addition to the RSS feeds being downloaded.

    The newest instance of unnecessary bandwidth hogging behavior I've seen from a news aggregator was pointed out by Phil Ringnalda's comments about excessive hits from NewsCrazwler which I'd also seen in my referrer logs and had been puzzled about. According to the answer on the NewzCrawler support forums when NewzCrawler updates the channel supporting wfw:commentRss it first updates the main feed and then it updates comment feeds. Repeatedly downloading the RSS feed for the comments to each entry in my blog when the user hasn't requested them is unnecessary and quite frankly wasteful.

    Someone really needs to set up an aggregator hall of shame.


    Categories: Technology

    In response to a post Greg Robinson about cancelling his MSDN subscriptions due to excessive focus on unreleased technologies by Robert Scoble writes

     MSDN, by the way, is like a PDC. It focuses on newer stuff and is produced by Microsoft. The other magazines are not tasked with covering the new stuff as much.

    This assumption is wrong on so many levels I don't know where to start. It is telling that MSDN has strayed from being where developers get up to speed on existing Microsoft technologies to being a primarily a monger of vaporware. If MSDN is to be likened to a conference it should be TechEd (focus on existing technologies and best practices with a smattering of sneak peaks at the future) not PDC (focus on technology that won't be released for 2 to 4 years or that may not make it past the chopping block).  I've pinged various content strategists at MSDN and Sara Williams about this in the past with the responses being that most of the content is still about existing technologies.

    I suspect that what people resent is the fact that the releases are so far off (Whidbey is 1 year away, Longhorn is 2 years away) and there aren't any publiclly generally available releases that are easy to get into which makes the frustration at articles about MSBuild, Avalon, Indigo, etc more difficult to stomach. It's one thing to see articles about beta technology you can download easily off the 'net and another to see articles about technology that is still in alpha which isn't publiclly available.  I don't intend for the MSDN XML Developer Center to go this route and so far have tended to focus the content appearing on the site about working with existing technologies.


    Categories: Life in the B0rg Cube

    May 2, 2004
    @ 10:07 PM

    I just finished fixing some bugs in the synchronization code in RSS Bandit and now the feature should work as expected. All I can say is, "Wow". I've just taken it for granted that I can open an instance of my mail client on different machines and get the same state but the same isn't the case for my news aggregator. Being able to click 'Download Feeds' on startup and have everything I read/flagged/replied to at home or at work synced up is totally sweet.

    The only thing preventing a release now is that I'd like Torsten and I to come up with a way to improve the responsiveness of the GUI when loading feeds with thousands of items. On my machine where I have 3 months of posts from blogs like Weblogs @ ASP.NET it definitely seems like the user interface is creaking and groaning. This shouldn't take more than a few days so we should be on track for a release this week.


    Categories: RSS Bandit

    Ted Neward writes

    So if I need a custom build task to do ASP.NET deployment, do I build a NAnt task knowing its lifetime is probably scoped to this Whidbey beta cycle? Or do I build a MSBuild task knowing that it'll probably be two years before it's really useful, and that it's entirely likely that I'll have to rewrite it at least once or twice in the meantime? Recommendation: Don't try to "build a better tool" originating out of the open-source community; choose instead to support the community by adopting it into the IDE or toolchain and commit resources to develop on it if it doesn't do what you want.

    I tend to disagree with Ted. The big problem that Microsoft faces in this space is that developers tend to want all their tools to come from a single vendor. Ted shows this mentality himself by implying that even though an existing 3rd party tool to perform the tasks he wants exists (Nant) he'd rather wait for vaporware from Microsoft (MSBuild). Here I use the term vaporware to describe any product that has been announced and demoed but not released.

    This problem means that Microsoft is constantly being hit with a barrage of requests by developers to implement technologies when existing 3rd party products can satisfy the needs of our developers. For example, our team constantly gets requests from developers who'd like the equivalent of a full fledged XML development environments like XML Spy in Visual Studio.

    My personal opinion is that Microsoft should build what it thinks the essentials are for developers into their development tools and the .NET Framework then provide extensibility hooks for others to plugin and extend the existing functionality as needed. Contrary to the tone of Ted's post the Visual Studio team isn't averse to shipping third party plugins in the box as witnessed by its inclusion of Dotfuscator Community Edition with Visual Studio.NET 2003. Ironically, the first review of Visual Studio.NET I could find  that mentions Dotfuscator contains this text in the conclusion

     I was kind of hoping that Microsoft was developing an obfuscator in-house that would be completely integrated into Visual Studio .NET

    Yet another request for Microsoft to produce a tool when third party tools exist and one is even included in the box. Even if the Visual Studio team considered Nant another third party tool they'd want to ship in this manner I'm sure the fact that it is licensed under the GNU Public Licence (GPL) would give them cause for pause.

    The other reason I disagree with Ted is that I believe in competition. I don't think the existence of WinAmp and Quicktime should mean Windows Media player shouldn't exist, the existence of Borland's IDEs mean Visual Studio shouldn't exist or the existence of Yahoo! Mail means Hotmail shouldn't exist. I definitely don't think that the existence of a developer tool or API being produced by Microsoft or a third party precludes anyone else from developing one. The mentality that only one tool or one technology should exist in a given space is what is stifling to competition and community building not the fact that Microsoft develops some technology that overlaps with something similar some other entity is building.


    Categories: Life in the B0rg Cube