Bryan Kam has reviewed several free RSS aggregators for Windows. Below are excerpts from his reviews, including his final choice.

I began with the aptly-named and small FeedReader 2.5. While it has all the basic features covered, it lacks a lot of things I like...Score: 3/10. Fast but featureless.

Next I tried Sharpreader 0.9.4.1. This is a pretty good one, which features different sorting, various update times, alerts, inherited properties, can import/export OPML...It would take 40+ MB RAM on my desktop computer, and sometimes would take 100% cycles for no reason...Score: 5/10. Full of features, but slow as hell!

Another one I tried a while back was Syndirella 0.9b. While I was not a big fan of the Windows 3.1-esque interface, it does have a rudimentary scraper... This is great for sites that don't offer feeds. Other than that, though, this reader is pretty lacking, not even having categories which are a necessity in my opinion. Score: 5/10. Nice scraper, the rest kinda sucks.

Currently I'm using Abilon 2.0, which has many of the features I like...The interface is divided into three vertical columns: the far left is the list of feeds, the middle is the items in the selected feed, the right is the detail for the selected item. I find this very weird. Score: 7/10. It's got the goods, it's small, but it's not fun to use.

Okay, another brief RSS reader review. This one is called RSS Bandit and I've discarded Abilon in favor of it...Feature-wise it's pretty standard. The little slide-up alerts, which many of these readers have, is actually reliably click-able in this program...Another good feature is its "Locate RSS" feeds which attempts to find a feed for whatever websites or keywords you enter. 8/10. Decent, but lacks that extra something.

It's good to read first-hand accounts of what people like or dislike about RSS Bandit, especially when compared to other RSS aggregators. I tend to agree with Bryan that RSS Bandit currently leads the pack amongst the major free RSS aggregators for Windows. The next release will aim at being competitive with commercial aggregators such as FeedDemon and NewzCrawler.

This should be a fun summer.  


 

Mark Pilgrim has a blog post entitled how to make a linkblog in Atom which shows one technique for syndicating a list of links in an Atom feed. Unfortunately there is one problem with Mark's article: the technique it recommends violates the ATOM 0.3 specification and generates an invalid feed.

There are two problem sections in Mark's article. In the first, How to link to an article, he writes

But what about the super-fascinating thing we're actually linking to? That goes in its own element.

<link rel="related" type="text/html"
     href="http://home.introweb.nl/~dodger/itunesserver.html"
     title="Setting up an iTunes server in FreeBSD"/>

and in the section entitled How to credit people whose links you republish he writes

Simply put, a "via" link is a link back to where you found the link you're posting. In this example, I discovered the article on setting up a FreeBSD iTunes server via Jeffrey Veen, so let's give him some credit:

<link rel="via" type="text/html" href="http://www.veen.com/jeff/archives/000545.html" title="Jeffrey Veen"/>

The problem with both sections is that Mark uses values for the rel attribute that are not considered valid by the Atom 0.3 specification. In Section 3.4.1 of the Atom specification it states

3.4  Link Constructs

A Link construct is an element that MUST NOT have any child content, and has the following attributes:

3.4.1  "rel" Attribute

The "rel" attribute indicates the type of relationship that the link represents. Link constructs MUST have a rel attribute, whose value MUST be a string, and MUST be one of the values enumerated in the Atom API specification http://bitworking.org/projects/atom/draft-gregorio-09.html.

On navigating to the provided URL and reading Section 5.4.1 of the Atom API specification, which defines the valid values of the rel attribute of the link element, one finds the following list

5.4.1  rel

This attribute describes the relationship from the current document, be it HTML or Atom, to the anchor specified by the href attribute. The value of this attribute is a space-separated list of link types. Note that these values are case insensitive. With type="application/x.atom+xml" we have the following interpretations of the relations.

alternate
The URI in the href attribute points to an alternate representation of the containing resource.
start
The Atom feed at the URI supplied in the href attribute contains the first feed in a linear sequence of entries.
next
The Atom feed at the URI supplied in the href attribute contains the next N entries in a linear sequence of entries.
prev
The Atom feed at the URI supplied in the href attribute contains the previous N entries in a linear sequence of entries.
service.edit
The URI given in the href attribute is used to edit a representation of the referred resource.
service.post
The URI in the href attribute is used to create new resources.
service.feed
The URI given in the href attribute is a starting point for navigating content and services.

As can be seen, neither related nor via, both of which are used in Mark's article, is in the above list. I had expected the Feed Validator written by Mark Pilgrim and Sam Ruby to flag this error, but currently Mark's b-links feed validates as Valid Atom. I have filed bug #963354 in the Feed Validator's bug database about this issue. Hopefully this error will be resolved soon.
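The check I expected the validator to perform can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Feed Validator's actual code: the function name invalid_rel_values and the example.org sample feed are made up, and the allowed values are copied from the list quoted above.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://purl.org/atom/ns#"  # namespace used by Atom 0.3 feeds

# The link types enumerated in section 5.4.1, as quoted above.
ALLOWED_REL = {
    "alternate", "start", "next", "prev",
    "service.edit", "service.post", "service.feed",
}

def invalid_rel_values(feed_xml):
    """Return (rel, href) pairs for links whose rel is missing or not enumerated."""
    root = ET.fromstring(feed_xml)
    problems = []
    for link in root.iter(f"{{{ATOM_NS}}}link"):
        rel = link.get("rel", "")
        # The value is a space-separated, case-insensitive list of link types.
        tokens = rel.lower().split()
        if not tokens or any(token not in ALLOWED_REL for token in tokens):
            problems.append((rel, link.get("href")))
    return problems

# Hypothetical sample feed; the URLs are placeholders.
SAMPLE = """<feed version="0.3" xmlns="http://purl.org/atom/ns#">
  <entry>
    <link rel="alternate" type="text/html" href="http://example.org/1"/>
    <link rel="related" type="text/html" href="http://example.org/2"/>
    <link rel="via" type="text/html" href="http://example.org/3"/>
  </entry>
</feed>"""

print(invalid_rel_values(SAMPLE))
```

Run against the sample, the sketch reports the related and via links and accepts the alternate link.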

On a final note, it is bad enough that we are going to have to deal with two versions of Atom in the wild (Atom 0.3 and whatever comes out of the standards process). It would be unfortunate to fragment things further by deploying intermediate versions of the format based on mailing list discussions. One of the benefits of Atom is supposed to be that it will usher in an era of rigorously defined specifications in the syndication space; that won't be worth much if people ignore the specifications and go their own way.


 

Yesterday I went to the Apple store in the Bellevue mall to replace the headphones on my iPod which had begun to fray. When I walked up to the counter and told the girl there what I wanted she ushered me to a customer service desk claiming that if my iPod was under warranty I could get the headphones replaced for free. I was highly skeptical of this since I didn't buy the iPod at the Apple Store but at Best Buy and didn't even have my receipt anyway.

Waiting at the customer service desk, I got to soak in some of the ambiance of the Apple Store. It is definitely a cool place. I liked the flat screen TV over the customer service desk with quotes from luminaries across history such as

  • Plato is my friend, Aristotle is my friend but my best friend is truth - Sir Isaac Newton
  • We must be the change we wish to see in the world - Mahatma Gandhi

When it was finally my turn, my name was displayed on the flat screen TV above the customer support desk and I walked up to be served. I told the guy behind the desk that I needed some new headphones and that the girl behind the counter had directed me to him to see if I could get them replaced under the warranty. I explained that I thought this would be unlikely given that I bought the iPod at Best Buy, not the Apple Store, and didn't have a receipt. To which he replied, “It's an Apple product, right? I'll just check the serial number”. To my surprise he did just that and I walked out of there with brand new headphones. To cap the experience he also fixed some weird issues I'd been having with my iPod by pointing me to the recent iPod firmware update.

That's what I call fantastic customer service. I felt so good about Apple afterwards that I felt like going back to the store and buying some Apple stuff, but there's nothing I need right now.


 

Categories: Ramblings

May 28, 2004
@ 06:52 PM

C.J. Date, one of the most influential names in the relational database world, had some harsh words about XML's encroachment into the world of relational databases in a recent article entitled Date defends relational model that appeared on SearchDatabase.com. Key parts of the article are excerpted below

Date reserved his harshest criticism for the competition, namely object-oriented and XML-based DBMSs. Calling them "the latest fashions in the computer world," Date said he rejects the argument that relational DBMSs are yesterday's news. Fans of object-oriented database systems "see flaws in the relational model because they don't fully understand it," he said.

Date also said that XML enthusiasts have gone overboard.

"XML was invented to solve the problem of data interchange, but having solved that, they now want to take over the world," he said. "With XML, it's like we forget what we are supposed to be doing, and focus instead on how to do it."

Craig S. Mullins, the director of technology planning at BMC Software and a SearchDatabase.com expert, shares Date's opinion of XML. It can be worthwhile, Mullins said, as long as XML is only used as a method of taking data and putting it into a DBMS. But Mullins cautioned that XML data that is stored in relational DBMSs as whole documents will be useless if the data needs to be queried, and he stressed Date's point that XML is not a real data model.

Craig Mullins's points are more straightforward to answer since his comments don't jibe with the current state of the art in the XML world. He states that you can't query XML documents stored in databases, but this is untrue. Almost three years ago, I was writing articles about querying XML documents stored in relational databases. Storing XML in a relational database doesn't mean it has to be stored as an opaque binary BLOB or as a big bunch of text which cannot effectively be queried. The next version of SQL Server will have extensive capabilities for querying XML data in a relational database and doing joins across relational and XML data. A lot of this functionality is described in the article on XML Support in SQL Server 2005. As for XML not having a data model, I beg to differ. There is a data model for XML that many applications and people adhere to, often without realizing that they are doing so. This data model is the XPath 1.0 data model, which is being updated to handle typed data as the XQuery and XPath 2.0 data model.
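To make the point concrete, here is a minimal sketch of querying XML stored in a relational table and joining it against ordinary relational data. It is not SQL Server's XML support; it assumes SQLite through Python's sqlite3 module, and the xml_text function, the table names and the sample invoices are all hypothetical, invented for the illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

def xml_text(doc, path):
    """Return the text of the first element matching path, or None if absent."""
    node = ET.fromstring(doc).find(path)
    return None if node is None else node.text

conn = sqlite3.connect(":memory:")
# Expose the helper to SQL as a two-argument function (hypothetical name).
conn.create_function("xml_text", 2, xml_text)
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Contoso"), (2, "Fabrikam")])
conn.executemany("INSERT INTO documents (body) VALUES (?)", [
    ("<invoice><customer>1</customer><total>250</total></invoice>",),
    ("<invoice><customer>2</customer><total>75</total></invoice>",),
])

# Join relational rows against values pulled out of the stored XML documents.
rows = conn.execute("""
    SELECT c.name, xml_text(d.body, 'total')
    FROM documents d
    JOIN customers c ON c.id = CAST(xml_text(d.body, 'customer') AS INTEGER)
    WHERE CAST(xml_text(d.body, 'total') AS INTEGER) > 100
""").fetchall()
print(rows)
```

The stored documents stay whole, yet the query filters on the total element and joins on the customer element. A database with native XML support would do this with indexes rather than reparsing each document, which this sketch makes no attempt to show.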

Now to tackle the meat of C.J. Date's criticisms, which is that XML solves the problem of data interchange but is now showing up in the database. The first point I'd like to make is that there are two broad usage patterns of XML: it is used to represent both rigidly structured tabular data (e.g., relational data or serialized objects) and semi-structured data (e.g., office documents). The latter type of data will only grow now that office productivity software like Microsoft Office has enabled users to save their documents as XML instead of proprietary binary formats. In many cases, these documents cannot simply be shredded into relational tables. Sure, you can shred an Excel spreadsheet written in SpreadsheetML into relational tables, but is the same really feasible for a Word document written in WordprocessingML? Many enterprises would rather have their important business data stored in and queried from a unified location instead of the current situation, where some data is in document management systems, some hangs around as random files in people's folders, and some sits in a database management system.

As for stating that critics of the relational model don't understand it, I disagree. One of the major benefits of using XML in relational databases is that it is a lot easier to deal with fluid schemas or data with sparse entries with XML. When the shape of the data tends to change or is not fixed, the relational model is simply not designed to deal with this. Constantly changing your database schema is simply not feasible, and there is no easy way to provide the extensibility of XML where one can say "after the X element, any element from any namespace can appear". How would one describe the capacity to store “any data” in a traditional relational database without resorting to an opaque blob?

I do tend to agree that some people are going overboard and trying to model their data hierarchically instead of relationally, which experience has taught us is a bad idea. Recently there was a thread on the XML-DEV mailing list entitled Designing XML to Support Information Evolution, in which Roger L. Costello described his travails trying to model, in a hierarchical manner, data that was being transferred as XML. Michael Champion accurately described the process Roger Costello went through as having "rediscovered the relational model". In a response to that thread I wrote "Hierarchical databases failed for a reason".

Using hierarchy as a primary way to model data is bad for at least the following reasons

  1. Hierarchies tend to encourage redundancy. Imagine I have a <Customer> element that has one or more <ShippingAddress> elements and one or more <Order> elements as children. Each order was shipped to an address, so if modelled hierarchically each <Order> element will also have a <ShippingAddress> element, which leads to a lot of unnecessary duplication of data.
  2. In the real world, there are often multiple groups to which a piece of data belongs which often cannot be modelled with a single hierarchy.  
  3. Data is too tightly coupled. If I delete a <Customer> element, this means I've automatically deleted his entire order history since all the <Order> elements are children of <Customer>. Similarly if I query for a <Customer>, I end up getting all the <Order> information as well.
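The first and third points in the list above can be seen in a small sketch. The customer, the address and the id values below are invented for the illustration, and plain Python dictionaries stand in for relational tables.

```python
import xml.etree.ElementTree as ET

# Hierarchical model: every <Order> repeats the address it was shipped to.
HIERARCHICAL = """<Customer name="Alice">
  <ShippingAddress>1 Main St</ShippingAddress>
  <Order id="1"><ShippingAddress>1 Main St</ShippingAddress></Order>
  <Order id="2"><ShippingAddress>1 Main St</ShippingAddress></Order>
</Customer>"""

customer = ET.fromstring(HIERARCHICAL)
copies = [address.text for address in customer.iter("ShippingAddress")]
print(len(copies))  # the same address is stored three times

# Relational model: the address is stored once and referenced by key.
customers = {1: "Alice"}
addresses = {10: "1 Main St"}
orders = [
    {"id": 1, "customer_id": 1, "address_id": 10},
    {"id": 2, "customer_id": 1, "address_id": 10},
]

# Looking up a customer does not drag the order history along with it.
print(customers[1])
# Orders are found only when asked for, by following the key.
alice_orders = [order["id"] for order in orders if order["customer_id"] == 1]
print(alice_orders)
```

In the hierarchical document a change of address has to be applied in three places, and removing the Customer element removes both orders with it. In the keyed version the address lives in one row and the orders are separate rows that outlive any single query for the customer.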

To put it simply, experience has taught the software world that the relational model is a better way to model data than the hierarchical model. Unfortunately, in the rush to embrace XML, many are repeating the mistakes from decades ago in the new millennium.


 

Categories: XML

XML.com recently ran an article entitled Document-Centric .NET that highlights the various technologies for working with XML that exist in the .NET Framework. The article provides a good high-level overview of the various options you have for processing XML in the .NET Framework. The article includes an all-important caveat which I wish more people knew about and which I keep wanting to write an article about but never get around to doing. The author writes

However, keep in mind that there are W3C XML Schema features that are not directly compatible with .NET's XML-to-database and XML-to-object mapping tools.

This is very true. Besides our schema validation technologies, most Microsoft technologies or products that utilize W3C XML Schema support a subset of the language due to impedance mismatches between the language and the underlying data model or type system of the target environment.

In fact the only complaint I have about the article is a nitpick about its title. In XML circles, document-centric implies a usage of XML that isn't borne out by his article. If you are interested in the difference between data-centric XML and document-centric XML you should read my article Can One Size Fit All? in XML Journal. In that article I talk about the differences between XML that is used to represent both rigidly structured tabular data (e.g., relational data or serialized objects) and semi-structured data (e.g., office documents). The former is data-centric XML while the latter is document-centric.

 


 

Categories: Mindless Link Propagation | XML

I recently stumbled on an entry by Lucas Gonze where he complains about the RSS <enclosure> element. He writes

Problems with the enclosure element:

  • It causes users to download big files that they will never listen to or watch, creating pointless overload on web hosts.
  • It doesn't allow us to credit the MP3 host, so we can't satisfy the netiquette of always linking back.
  • For broadband users, MP3s are not big enough to need advance caching in the first place.
  • The required content-type attribute is a bad idea in the first place. Mime settings are already prone to breakage, adding an intermediary will just create another source of bugs. There are no usecases for this attribute that can't be more easily and robustly satisfied by having clients HEAD the URL for themselves.
  • The required content-length attribute should not be there. It requires people who link to MP3s to HEAD them and calculate the length, which is sometimes not practical. It makes variable-length MP3s illegal. There are no usecases for this attribute that can't be more easily and robustly satisfied by having clients HEAD the URL for themselves.

The primary problem with the <enclosure> element is that it is overspecified. Having an element that says, here is a pointer to some data that is related to this entry that is too large to fit in the feed is a good idea. Similarly providing a hint at what the MIME type is so the reader knows whether it can handle that MIME type or can display something specific to that media type in the user interface without making an additional request to the server is very useful. The description of the enclosure element in RSS 2.0 states

<enclosure> sub-element of <item> 

<enclosure> is an optional sub-element of <item>.

It has three required attributes. url says where the enclosure is located, length says how big it is in bytes, and type says what its type is, a standard MIME type.

The url must be an http url.

<enclosure url="http://www.scripting.com/mp3s/weatherReportSuite.mp3" length="12216320" type="audio/mpeg" />

Syndication geeks might notice that this is akin to the <link> element in the ATOM 0.3 syndication format which is described as

3.4  Link Constructs

A Link construct is an element that MUST NOT have any child content, and has the following attributes:

3.4.1  "rel" Attribute

The "rel" attribute indicates the type of relationship that the link represents. Link constructs MUST have a rel attribute, whose value MUST be a string, and MUST be one of the values enumerated in the Atom API specification <eref>http://bitworking.org/projects/atom/draft-gregorio-09.html</eref>.

3.4.2  "type" Attribute

The "type" attribute indicates an advisory media type; it MAY be used as a hint to determine the type of the representation which should be returned when the URI in the href attribute is dereferenced. Note that the type attribute does not override the actual media type returned with the representation.

Link constructs MUST have a type attribute, whose value MUST be a registered media type [RFC2045].

3.4.3  "href" Attribute

The "href" attribute contains the link's URI. Link constructs MUST have a href attribute, whose value MUST be a URI [RFC2396].

xml:base [W3C.REC-xmlbase-20010627] processing MUST be applied to the atom:url element.

3.4.4  "title" Attribute

The "title" attribute conveys human-readable information about the link. Link constructs MAY have a title attribute, whose value MUST be a string.

So the ideas behind the <enclosure> element were good enough that they appear in ATOM with some additional niceties and a troublesome bit (the length attribute) removed. If the concepts behind the <enclosure> element are so good that they are first-class members of the ATOM syndication format, why does Lucas not like it? The big problem with RSS enclosures is how Dave Winer expected them to be used. An aggregator was supposed to act like a TiVo, automatically downloading files in the background and presenting them to you when it was done. The glaring problem with doing this is that it means lots of people are automatically downloading large files that they didn't request, which is a significant waste of bandwidth. In fact, most aggregators either do not support enclosures or simply show them as links, which is what FeedDemon and RSS Bandit (with the Outlook 2K3 skin) do. The funny thing is that the actual RSS specification doesn't describe this behavior; instead it is implied by Dave Winer's descriptions of use cases.

Lucas also complains about the required length attribute, which is problematic if you are pointing to a file on a server you don't own because you have to first download the file or perform an HTTP HEAD to get its size. The average blogger isn't going to go through that kind of trouble. Although tools could help, it would have made more sense for the length attribute to be an optional hint.

I have to disagree with Lucas's complaints about putting the MIME type in the <enclosure> element. He complains that the MIME type in the <enclosure> could be wrong and in fact that in many cases web servers serve a file with the wrong MIME type. Thus he concludes that putting the MIME type in the enclosure is wrong. However, client software should be able to decide how to react to the enclosure [e.g. if it is audio/mpeg, display a play button] without having to make additional HTTP requests, especially since, as Lucas points out, it is not 100% guaranteed that performing an HTTP HEAD of the linked file will actually get you the correct MIME type from the web server.
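As a rough sketch of what I mean, an aggregator can pick a presentation for an enclosure from the type attribute alone, with no download and no HEAD request. The function name describe_enclosure and the three action labels are made up for the example, and treating length as an optional hint is my preference rather than what RSS 2.0 requires.

```python
import xml.etree.ElementTree as ET

def describe_enclosure(item_xml):
    """Decide how to present an item's enclosure without touching the network."""
    item = ET.fromstring(item_xml)
    enclosure = item.find("enclosure")
    if enclosure is None:
        return None
    mime = (enclosure.get("type") or "").lower()
    if mime.startswith("audio/"):
        action = "play button"
    elif mime.startswith("video/"):
        action = "video link"
    else:
        action = "plain link"
    # Assumption: a missing or malformed length is tolerated as an optional hint.
    length = enclosure.get("length")
    size = int(length) if length and length.isdigit() else None
    return {"url": enclosure.get("url"), "action": action, "size": size}

# The enclosure below is the example from the RSS 2.0 description quoted above.
ITEM = """<item>
  <title>Weather Report Suite</title>
  <enclosure url="http://www.scripting.com/mp3s/weatherReportSuite.mp3"
             length="12216320" type="audio/mpeg" />
</item>"""

print(describe_enclosure(ITEM))
```

If the advertised type turns out to be wrong, the aggregator finds out when the user actually clicks, which is the same moment it would have found out about a wrong Content-Type header.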

In conclusion, I agree that the <enclosure> element is problematic, but most of the problems are due to the implied use case suggested by the spec author, Dave Winer, as opposed to the actual information provided by the element. The ATOM approach of describing the information provided by each element in a feed but not explicitly describing the expected behavior of clients is a welcome approach. Of course, there will always be developers who require structure or take an absence of explicit guidelines to mean do stupid things (like aggregators that fetch your feed every 5 minutes), but these are probably better handled in "Best Practices" style documents or test suites than in the actual specification.


 

Categories: XML

May 26, 2004
@ 05:22 PM

One of the hardest problems in software development is how to version software and data formats. One of the biggest problems for Windows for years has been DLL Hell which is a versioning problem. One of the big issues I have to deal with at work is how to deal with versioning issues when adding or removing functionality from classes.

For a few weeks, I've been planning to write up some guidelines and concerns for versioning XML formats based on my experiences and those of others at Microsoft. I've got some folks on the XML Web Services team interested in riding shotgun, such as Gudge and Doug. It also looks like Edd Dumbill is interested in the abstract for the article, so with any luck it should end up on XML.com when it is done.

I was reminded of the importance of writing this article when I saw a post on the atom-syntax list by Google's Steve Jensen which implied that it just occurred to the folks at Google that they'd have to support multiple versions of ATOM. This is exacerbated by the fact that they are deploying implementations based on draft specs. Like I said before, never ascribe to malice that which can be explained by incompetence.


 

Categories: XML

When I first got to Microsoft a few years ago, there was an acknowledgement from upper management that Microsoft technologies tended not to attract the vibrant sense of community that existed in Java or Open Source communities. This began a push, first in the Developer Division and soon across the company, for Microsoft employees to become more involved and help nurture the developer communities surrounding our technologies and products. Two or three years later, I am heartened to read posts such as this entry from Norman Alex Rupp from his first day at TechEd 2004. Note that I have italicized some of the text from the entry to emphasize some key points I found interesting

The User Group Summit was headed up by the International .NET Association (INETA). From what I can tell, INETA User Groups are analogous to the Java User Groups. They're an independent organization, and their founder goes to great lengths to maintain a comfortable operating distance from Microsoft's PR machine, while simultaneously being careful not to alienate them. It strikes me that the INETA groups highly value their independence and don't want to come across as a Microsoft vendorfest to their members. They focus on C# development topics and although they thankfully accept Microsoft's sponsorship, they do maintain a good degree of independence. That's a difficult balance to strike.

What really fascinated me about the UG Leaders Summit was that the .NET Group Leaders from around the country knew each other, had their own community structure, and genuinely seemed to enjoy being around each other. These guys were rowdy. They were having a good time. And it wasn't just because we each got a 30 oz bottle of Tequila at the end of the meeting. People were really positive and nice. This was a slight cultural change for me, because all too often I find the Open Source Java community to be extremely high strung and competitive--sometimes to the point of being vicious. I like to think of the dynamic of our community as an extreme form of tough love. I haven't worked a lot with the Java User Group communities from around the country, and I have an inkling that things are a bit different in those circles than they are in the Jakarta / JBoss / TSS / Bile Blog OS Javasphere that used to form my only umbilical link to our community. (For the record, I don't think this "tough love" culture extends into the Java.net community--the folks from Sun's "shining city on the hill" are pretty amiable).

It was just a different vibe--not necessarily better, just different. I can see more of that in the future of the Javasphere. We live in a pressure cooker, but as the language and platform mature and we continue to carve out our niche, gain credibility in the industry and grow as developers, I think we'll see less of the infighting and more of the cooperation typified by last year's OpenEJB / Geronimo alliance and by the general good will surrounding Java.net.

One surprising thing I learned at today's Summit is that in the last 3 years, INETA and Microsoft have built up a 200,000 to 250,000 member developer community, and they're continuing to push forward, doing everything they can to make sure that .NET technologies take off at the local community level. They're hyperactively heading up programs to develop high school and college students, and they recognize the long term importance of bringing fresh blood into the industry. They are investing time, software and significant amounts of money into their evangelism efforts.

Essentially, what INETA and Microsoft are trying to do is outgrok the ASF on community building. And from what I just saw, they're way ahead of the curve. In their words, "we're trying to get it. You can help us REALLY get it." And by "get it" I think they mean to figure out how to have a successful user community in every city and on every major college campus in the world. I'm speculating, but it's hard not to smell ambition this raw.

This is one of the reasons I like working at Microsoft. When the hive mind in the b0rg cube decides to do something, it goes all out and does it well. The challenge is figuring out what the right thing to do is and then convincing the right people. I am continually amazed at how much things have changed in the past few years with regards to the degree of openness and the higher level of interaction between product groups and their customers. 


 

Categories: Life in the B0rg Cube

I've mentioned in the past why I think XML 1.1 was a bad idea in my post XML 1.1: The W3C Gets It Wrong. It seems at least one W3C working group, the XML Protocols working group to be exact, has now realized why XML 1.1 is a bad idea a few months later. Mark Nottingham recently posted a message to the W3C Technical Architecture Group's mailing list entitled Deployment and Support of XML 1.1 where he writes

In the Working Group's view, this highlights a growing misalignment in
the XML architecture. Until the advent of XML 1.1, XML 1.0 was a single
point of constraint in the XML stack, with all of the benefits (e.g.,
interoperability, simplicity) that implies. Because XML 1.1 has
introduced variability where before there was only one choice, other
standards now need to explicitly identify what versions of XML they are
compatible with. This may lead to a chicken-and-egg problem; until
there is a complete stack of XML 1.1-capable standards available, it is
problematic to use it.

Furthermore, XML-based applications will likewise need to identify
their capabilities and constraints; unfortunately, there is no
consistent way to do this in the Web architecture (e.g., RFC3023 does
not provide a means of specifying XML versions in media types).

As I mentioned in my previous post about the topic, XML 1.1 hurts the interoperability story of XML, which is one of the major reasons for using it in the first place. Unfortunately, the cat is already out of the bag; all we can do now is try to contain or avoid it without getting our eyes clawed out. I tend to agree with my coworker Michael Rys: the day XML 1.1 became a W3C recommendation was a day of mourning.


 

Categories: XML

This is mostly a bugfix release. Major features will show up in the next release scheduled for the end of the summer or later.

Download the installer from here. Differences between v1.2.0.112 and v1.2.0.114 are listed below.

  • FEATURE: Local search now supports boolean operators so one can write queries like "IBM | Microsoft & !Java" which means search for entries containing Microsoft or IBM but not Java. Queries can also be grouped with parentheses such as "iPod & (iTunes | (Apple & !fruit))". Thanks to Brian Leonard for the patch.

  • FEATURE: Tree view and list view now support the scroll wheel on Microsoft Intellimouse. 

  • FIXED: "My Feeds" root node displays incorrect unread messages count after remote synchronization.

  • FIXED: Installed version doesn't support Windows XP themes.

  • FIXED: Home/End keys pressed in the list view don't refresh the detail pane.

  • FIXED: Changes on the Options|Feeds default refresh rate dropdown inputbox are not immediately validated.

  • FIXED: Locating feeds by keyword caused an exception on searches that contain non-ASCII characters.

  • FIXED: Internal browser is not able to display web pages with frames.

  • FIXED: Synchronizing state through WebDAV doesn't use proxy information.

  • FIXED: Temporary search results are no longer persisted (or synchronized).
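For the curious, the boolean search syntax from the first item in the list above can be sketched as a tiny recursive-descent evaluator. This is not the patch that went into RSS Bandit, just an illustration in Python. It assumes that & and | have equal precedence and are applied left to right, since that is the reading that makes "IBM | Microsoft & !Java" mean "Microsoft or IBM, but not Java", and it assumes terms match as case-insensitive substrings. The function name matches is hypothetical.

```python
import re

# An operator character, or a run of characters that is neither space nor operator.
TOKEN = re.compile(r"[()&|!]|[^\s()&|!]+")

def matches(query, text):
    """Return True if text satisfies the boolean query."""
    tokens = TOKEN.findall(query)
    haystack = text.lower()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = peek()
        if token is None:
            raise ValueError("unexpected end of query")
        pos += 1
        return token

    def unary():
        token = take()
        if token == "!":
            return not unary()
        if token == "(":
            value = expr()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if token in (")", "&", "|"):
            raise ValueError(f"unexpected {token!r}")
        return token.lower() in haystack  # plain search term

    def expr():
        # Assumption: & and | share one precedence level, evaluated left to right.
        value = unary()
        while peek() in ("&", "|"):
            operator = take()
            right = unary()
            value = (value and right) if operator == "&" else (value or right)
        return value

    result = expr()
    if peek() is not None:
        raise ValueError(f"unexpected {peek()!r}")
    return result

print(matches("IBM | Microsoft & !Java", "Microsoft ships a new XML parser"))
print(matches("iPod & (iTunes | (Apple & !fruit))", "My iPod and Apple laptop"))
```

Under these assumptions an entry mentioning both IBM and Java is rejected by the first query, and parentheses are the way to get any other grouping.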


 

Categories: RSS Bandit