I recently stumbled on an entry by Lucas Gonze where he complains about the RSS <enclosure> element. He writes

Problems with the enclosure element:

  • It causes users to download big files that they will never listen to or watch, creating pointless overload on web hosts.
  • It doesn't allow us to credit the MP3 host, so we can't satisfy the netiquette of always linking back.
  • For broadband users, MP3s are not big enough to need advance caching in the first place.
  • The required content-type attribute is a bad idea in the first place. Mime settings are already prone to breakage, adding an intermediary will just create another source of bugs. There are no usecases for this attribute that can't be more easily and robustly satisfied by having clients HEAD the URL for themselves.
  • The required content-length attribute should not be there. It requires people who link to MP3s to HEAD them and calculate the length, which is sometimes not practical. It makes variable-length MP3s illegal. There are no usecases for this attribute that can't be more easily and robustly satisfied by having clients HEAD the URL for themselves.

The primary problem with the <enclosure> element is that it is overspecified. Having an element that says "here is a pointer to some data related to this entry that is too large to fit in the feed" is a good idea. Similarly, providing a hint at the MIME type, so the reader knows whether it can handle that media type or can display something specific to it in the user interface without making an additional request to the server, is very useful. The description of the enclosure element in RSS 2.0 states

<enclosure> sub-element of <item> 

<enclosure> is an optional sub-element of <item>.

It has three required attributes. url says where the enclosure is located, length says how big it is in bytes, and type says what its type is, a standard MIME type.

The url must be an http url.

<enclosure url="http://www.scripting.com/mp3s/weatherReportSuite.mp3" length="12216320" type="audio/mpeg" />

Syndication geeks might notice that this is akin to the <link> element in the ATOM 0.3 syndication format which is described as

3.4  Link Constructs

A Link construct is an element that MUST NOT have any child content, and has the following attributes:

3.4.1  "rel" Attribute

The "rel" attribute indicates the type of relationship that the link represents. Link constructs MUST have a rel attribute, whose value MUST be a string, and MUST be one of the values enumerated in the Atom API specification <eref>http://bitworking.org/projects/atom/draft-gregorio-09.html</eref>.

3.4.2  "type" Attribute

The "type" attribute indicates an advisory media type; it MAY be used as a hint to determine the type of the representation which should be returned when the URI in the href attribute is dereferenced. Note that the type attribute does not override the actual media type returned with the representation.

Link constructs MUST have a type attribute, whose value MUST be a registered media type [RFC2045].

3.4.3  "href" Attribute

The "href" attribute contains the link's URI. Link constructs MUST have a href attribute, whose value MUST be a URI [RFC2396].

xml:base [W3C.REC-xmlbase-20010627] processing MUST be applied to the atom:url element.

3.4.4  "title" Attribute

The "title" attribute conveys human-readable information about the link. Link constructs MAY have a title attribute, whose value MUST be a string.

So the ideas behind the <enclosure> element were good enough that they appear in ATOM with some additional niceties and a troublesome bit (the length attribute) removed. So if the concepts behind the <enclosure> element are so good that they are first class members of the ATOM syndication format, why does Lucas not like it? The big problem with RSS enclosures is how Dave Winer expected them to be used. An aggregator was supposed to act like a TiVo, automatically downloading files in the background and presenting them to you when it is done. The glaring problem with doing this is that lots of people end up automatically downloading large files they never requested, which is a significant waste of bandwidth. In fact, most aggregators either do not support enclosures or simply show them as links, which is what FeedDemon and RSS Bandit (with the Outlook 2K3 skin) do. The funny thing is that the actual RSS specification doesn't describe this behavior; instead it is implied by Dave Winer's descriptions of use cases.

Lucas also complains about the required length attribute, which is problematic if you are pointing to a file on a server you don't own, because you first have to download the file or perform an HTTP HEAD request to get its size. The average blogger isn't going to go through that kind of trouble. Although tools could help, it would have made more sense for the length attribute to be an optional hint.
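For what it's worth, the client-side HEAD request Lucas prefers really is trivial to write. Here is a minimal sketch in Python, just to illustrate how little work the alternative involves; the function name is mine, not anyone's actual API:

```python
# Minimal sketch of the client-side alternative: ask the server for the
# enclosure's MIME type and size via an HTTP HEAD request instead of
# requiring the feed producer to hardcode them in the feed.
from urllib.request import Request, urlopen

def probe_enclosure(url):
    """Return the (content_type, content_length) the server reports."""
    response = urlopen(Request(url, method="HEAD"))
    content_type = response.headers.get("Content-Type")
    content_length = response.headers.get("Content-Length")
    return content_type, int(content_length) if content_length else None
```

Of course, this shifts one request per enclosure onto every client, which is exactly the trade-off the type and length hints were meant to avoid.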

I have to disagree with Lucas's complaints about putting the MIME type in the <enclosure> element. He points out that the MIME type in the <enclosure> could be wrong, and that in many cases web servers serve files with the wrong MIME type, and concludes that putting the MIME type in the enclosure is a mistake. However, client software should be able to decide how to react to an enclosure [e.g. if it is audio/mpeg, display a play button] without having to make additional HTTP requests, especially since, as Lucas himself points out, it is not 100% guaranteed that performing an HTTP HEAD on the linked file will actually get you the correct MIME type from the web server.
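To make the point concrete, here is a sketch of how a client can act on the advisory type attribute without touching the network. The element and attribute names follow RSS 2.0; the decision table itself is hypothetical:

```python
# Sketch: decide how to present an enclosure purely from its advisory
# type attribute, with no extra HTTP requests. The decision table is
# hypothetical; element and attribute names follow RSS 2.0.
import xml.etree.ElementTree as ET

def ui_hint(item_xml):
    enclosure = ET.fromstring(item_xml).find("enclosure")
    if enclosure is None:
        return "no enclosure"
    mime = enclosure.get("type", "")
    if mime.startswith("audio/"):
        return "display a play button"
    if mime.startswith("video/"):
        return "display a video player"
    return "display a download link"

item = """<item>
  <title>Weather Report Suite</title>
  <enclosure url="http://www.scripting.com/mp3s/weatherReportSuite.mp3"
             length="12216320" type="audio/mpeg" />
</item>"""
```

The hint can still be wrong, but so can a HEAD response; either way the client needs a fallback when the actual media type turns out to differ.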

In conclusion, I agree that the <enclosure> element is problematic, but most of the problems are due to the use case implied by the spec author, Dave Winer, as opposed to the actual information provided by the element. The ATOM approach of describing the information provided by each element in a feed, but not explicitly prescribing the expected behavior of clients, is a welcome one. Of course, there will always be developers who require structure or take an absence of explicit guidelines as license to do unwise things (like aggregators that fetch your feed every 5 minutes), but these are probably better handled in "Best Practices" style documents or test suites than in the actual specification.


 

Categories: XML

May 26, 2004
@ 05:22 PM

One of the hardest problems in software development is how to version software and data formats. One of the biggest problems for Windows for years has been DLL Hell, which is a versioning problem. One of the big issues I face at work is how to handle versioning when adding or removing functionality from classes.

For a few weeks, I've been planning to write up some guidelines and concerns for versioning XML formats based on my experiences and those of others at Microsoft. I've got some folks on the XML Web Services team interested in riding shotgun, such as Gudge and Doug. It also looks like Edd Dumbill is interested in the abstract for the article, so with any luck it should end up on XML.com when it is done.

I was reminded of the importance of writing this article when I saw a post on the atom-syntax list by Google's Steve Jensen which implied that it had just occurred to the folks at Google that they'd have to support multiple versions of ATOM. This is exacerbated by the fact that they are deploying implementations based on draft specs. Like I said before, never ascribe to malice that which can be explained by incompetence.


 

Categories: XML

When I first got to Microsoft a few years ago, there was an acknowledgement from upper management that Microsoft technologies tended not to attract the vibrant sense of community that existed in Java or Open Source communities. This began a push, first in the Developer Division and soon across the company, for Microsoft employees to become more involved and help nurture the developer communities surrounding our technologies and products. Two or three years later, I am heartened to read posts such as this entry from Norman Alex Rupp about his first day at TechEd 2004. Note that I have italicized some of the text from the entry to emphasize key points I found interesting:

The User Group Summit was headed up by the International .NET Association (INETA). From what I can tell, INETA User Groups are analogous to the Java User Groups. They're an independent organization, and their founder goes to great lengths to maintain a comfortable operating distance from Microsoft's PR machine, while simultaneously being careful not to alienate them. It strikes me that the INETA groups highly value their independence and don't want to come across as a Microsoft vendorfest to their members. They focus on C# development topics and although they thankfully accept Microsoft's sponsorship, they do maintain a good degree of independence. That's a difficult balance to strike.

What really fascinated me about the UG Leaders Summit was that the .NET Group Leaders from around the country knew each other, had their own community structure, and genuinely seemed to enjoy being around each other. These guys were rowdy. They were having a good time. And it wasn't just because we each got a 30 oz bottle of Tequila at the end of the meeting. People were really positive and nice. This was a slight cultural change for me, because all too often I find the Open Source Java community to be extremely high strung and competitive--sometimes to the point of being vicious. I like to think of the dynamic of our community as an extreme form of tough love. I haven't worked a lot with the Java User Group communities from around the country, and I have an inkling that things are a bit different in those circles than they are in the Jakarta / JBoss / TSS / Bile Blog OS Javasphere that used to form my only umbilical link to our community. (For the record, I don't think this "tough love" culture extends into the Java.net community--the folks from Sun's "shining city on the hill" are pretty amiable).

It was just a different vibe--not necessarily better, just different. I can see more of that in the future of the Javasphere. We live in a pressure cooker, but as the language and platform mature and we continue to carve out our niche, gain credibility in the industry and grow as developers, I think we'll see less of the infighting and more of the cooperation typified by last year's OpenEJB / Geronimo alliance and by the general good will surrounding Java.net.

One surprising thing I learned at today's Summit is that in the last 3 years, INETA and Microsoft have built up a 200,000 to 250,000 member developer community, and they're continuing to push forward, doing everything they can to make sure that .NET technologies take off at the local community level. They're hyperactively heading up programs to develop high school and college students, and they recognize the long term importance of bringing fresh blood into the industry. They are investing time, software and significant amounts of money into their evangelism efforts.

Essentially, what INETA and Microsoft are trying to do is outgrok the ASF on community building. And from what I just saw, they're way ahead of the curve. In their words, "we're trying to get it. You can help us REALLY get it." And by "get it" I think they mean to figure out how to have a successful user community in every city and on every major college campus in the world. I'm speculating, but it's hard not to smell ambition this raw.

This is one of the reasons I like working at Microsoft. When the hive mind in the b0rg cube decides to do something, it goes all out and does it well. The challenge is figuring out what the right thing to do is and then convincing the right people. I am continually amazed at how much things have changed in the past few years with regards to the degree of openness and the higher level of interaction between product groups and their customers. 


 

Categories: Life in the B0rg Cube

I've mentioned in the past why I think XML 1.1 was a bad idea in my post XML 1.1: The W3C Gets It Wrong. It seems at least one W3C working group, the XML Protocols working group to be exact, has now realized why XML 1.1 is a bad idea a few months later. Mark Nottingham recently posted a message to the W3C Technical Architecture Group's mailing list entitled Deployment and Support of XML 1.1 where he writes

In the Working Group's view, this highlights a growing misalignment in
the XML architecture. Until the advent of XML 1.1, XML 1.0 was a single
point of constraint in the XML stack, with all of the benefits (e.g.,
interoperability, simplicity) that implies. Because XML 1.1 has
introduced variability where before there was only one choice, other
standards now need to explicitly identify what versions of XML they are
compatible with. This may lead to a chicken-and-egg problem; until
there is a complete stack of XML 1.1-capable standards available, it is
problematic to use it.

Furthermore, XML-based applications will likewise need to identify
their capabilities and constraints; unfortunately, there is no
consistent way to do this in the Web architecture (e.g., RFC3023 does
not provide a means of specifying XML versions in media types).

As I mentioned in my previous post about the topic, XML 1.1 hurts the interoperability story of XML, which is one of the major reasons for using it in the first place. Unfortunately, the cat is already out of the bag; all we can do now is try to contain or avoid it without getting our eyes clawed out. I tend to agree with my coworker Michael Rys: the day XML 1.1 became a W3C recommendation was a day of mourning.
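One concrete example of the breakage: XML 1.1 permits character references to most C0 control characters, which XML 1.0 forbids, so a document an XML 1.1 processor accepts is rejected by the XML 1.0-only parsers deployed everywhere. A quick sketch using Python's ElementTree, which wraps the 1.0-only expat parser:

```python
# A character reference like &#1; is legal in XML 1.1 but not XML 1.0,
# so the XML 1.0-only parsers deployed everywhere reject the document.
import xml.etree.ElementTree as ET

doc = "<log>control character: &#1;</log>"  # fine for an XML 1.1 processor

try:
    ET.fromstring(doc)
    accepted = True
except ET.ParseError:
    accepted = False  # expat only speaks XML 1.0
```

This is exactly the chicken-and-egg problem the working group describes: until the whole stack speaks 1.1, documents like this one cannot safely be exchanged.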


 

Categories: XML

This is mostly a bugfix release. Major features will show up in the next release scheduled for the end of the summer or later.

Download the installer from here. Differences between v1.2.0.112 and v1.2.0.114 are listed below:

  • FEATURE: Local search now supports boolean operators so one can write queries like "IBM | Microsoft & !Java" which means search for entries containing Microsoft or IBM but not Java. Queries can also be grouped with parenthesis such as "iPod & (iTunes | (Apple & !fruit))". Thanks to Brian Leonard for the patch.

  • FEATURE: Tree view and list view now support the scroll wheel on Microsoft Intellimouse. 

  • FIXED: "My Feeds" root node displays incorrect unread messages count after remote synchronization.

  • FIXED: Installed version doesn't support Windows XP themes.

  • FIXED: Home/End keys pressed in the list view don't refresh the detail pane.

  • FIXED: Changes to the Options|Feeds default refresh rate dropdown input box are not immediately validated.

  • FIXED: Locating feeds by keyword caused an exception on searches that contain non-ASCII characters.

  • FIXED: Internal browser is not able to display web pages with frames.

  • FIXED: Synchronizing state through WebDAV doesn't use proxy information.

  • FIXED: Temporary search results are no longer persisted (or synchronized).
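The boolean search syntax from the first item above can be illustrated with a small sketch. This is not RSS Bandit's actual search code (that came from Brian Leonard's patch), just a toy reimplementation that rewrites the query into a Python expression, so Python's operator precedence (! before & before |) applies:

```python
# Toy reimplementation of the boolean search syntax ("|" = or, "&" = and,
# "!" = not), not RSS Bandit's actual code. Each bare word becomes a
# case-insensitive substring test against the entry text.
import re

def matches(query, text):
    text = text.lower()
    expr = re.sub(r"\w+", lambda m: "(%r in text)" % m.group().lower(), query)
    expr = expr.replace("|", " or ").replace("&", " and ").replace("!", " not ")
    return bool(eval(expr, {"__builtins__": {}}, {"text": text}))
```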


 

Categories: RSS Bandit

Myriad shots of Donald Rumsfeld during press briefings have unearthed his deadly secret. Behold, the Rumsfeld Fighting Technique.


 

May 25, 2004
@ 04:37 PM

The next version of SQL Server will have a significant amount of functionality related to storing, querying and extracting XML from the database. To accompany the information being imparted at TechEd 2004, the rest of the folks behind the XML Developer Center on MSDN and I decided to run a series of articles on the XML features of SQL Server 2005. The articles will run through the month of June.

The first article in the series is XML Support in Microsoft SQL Server 2005. Read this article if you are interested in learning how SQL Server 2005 has become a fully XML-aware database, including the addition of the XML datatype, support for XML Schemas, indexing of XML data, XQuery, querying XML views of relational data using XPath and much more.


 

Categories: XML

May 22, 2004
@ 06:02 PM

Joshua Allen has a post entitled RSS Politics which does a good job of properly framing the growing Microsoft and RSS vs. Google and Atom silliness spurred by Joi Ito that I've been seeing in the comments on Robert Scoble's weblog. Joshua writes

First, be very clear.  The “debate“ over Atom vs. RSS is a complete non-issue for Microsoft.  We use RSS to serve thousands of customers right now, and most of the people setting up RSS feeds have never heard of the political “debates“.  RSS works for them, and that's all they care about.  On the other hand, if Atom ever reaches v1.0 and we had a business incentive to use it, we would use it.  No need for debate.

Now, of the three or four people at Microsoft who know enough about Atom to have said anything about it, I wouldn't say that anyone has trashed the format.  I and others have pointed out that it's just fine for what it does; just like RSS.  If anything, I have asked hard questions about why I or any business decision maker should be spending resources on the whole debate right now.  If a business has deployed using RSS, what financial motive would they have to switch to a new, nearly identical, format once it ships?  I've got nothing against the Atom people inventing new syndication formats, but I just don't see why *I* should be involved right now.  There's no good reason.

The other comment I've made before is that the Atom community is not being served by the polarizing attitudes of some participants.  The “us vs. them“ comments are not helpful, especially when untrue, and the constant personalization (”Support Atom because I hate Dave Winer!”) just damages the credibility of the whole group (many of whom might have good motives for being involved).

I totally echo his sentiments. In the past couple of months, more and more folks at Microsoft have pinged me about syndication and blogging technologies once they learn I wrote RSS Bandit. Every single time I've given them the same advice I gave in my post, Mr. Safe's Guide to the RSS vs. ATOM debate. If you are a feed consumer, you'll need to support the various flavors of RSS and the various flavors of ATOM (of which there will be at least two, ATOM 0.3 and whatever is produced by the IETF/W3C process). If you are a feed producer, you should stick with RSS 0.91/2.0 since it is the most widely supported format and the most straightforward.

Although no one has asked yet, I'm also going to give my advice on whether Mr. Safe at Microsoft should consider adopting the ATOM API. In my personal opinion, the current draft of the ATOM API seems better designed and falls more in line with Microsoft's technologies than the existing alternatives (Blogger API, MetaWeblog API, LiveJournal API, etc.). However, the API lacks a lot of functionality, and in fact extensions to the ATOM API are already showing up in the wild. Currently these "innovations" are being lauded, but given the personalities behind ATOM it is likely that if Microsoft products supported the API and extended it there could be a negative backlash. In that case, perhaps going with a product-specific API may be the best option if there is sensitivity to such feedback or if the ATOM API has to be significantly extended to fit the product's needs.


 

Categories: Life in the B0rg Cube | XML

I've posted a few entries in the past questioning the value of the Semantic Web as currently envisioned by the W3C, along with its associated technologies like RDF and OWL. My most recent post about this was On Semantic Integration and XML. It seems I'm not the only XML geek who's been asking the same questions after taking a look at the Semantic Web landscape. Elliotte Rusty Harold is at WWW2004 and wrote the following opinions of the Semantic Web on Day 4 of the conference:

This conference is making me think a lot about the semantic web. I'm certainly learning more about the details (RDF, OWL etc.). However, I still don't see the point. For instance what does RDF bring to the party? The basic idea of RDF is that a collection of URIs forms a vocabulary. Different organizations and people define different vocabularies, and the URIs sort out whose name, date, title, etc. property you're using at any given time. Remind you of anything? It reminds me a lot of XML + namespaces. What exactly does RDF bring to the party? OWL (if I understand it) lets you connect different vocabularies. But so does XSLT. I guess the RDF model is a little simpler. It's all just triples, that can be automatically combined with other triples, and thereby inferences can be drawn. Does this actually produce anything useful, though? I don't see the killer app. Theoretically a lot of people are talking about combining RDF and ontologies from multiple sources to find knowledge that isn't obvious from any one source. However, no one's actually publishing their RDF. They're all transforming to HTML and publishing that.

I've written variations on the same theme over the past couple of months. It's just hard to point at any practical value that RDF/OWL/etc. provide over XML/XSLT/etc. for semantic integration.
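For what it's worth, the triple model Harold describes really is simple enough to sketch in a few lines. Here is a hypothetical Python example (the subject URIs and the "sameAs" convention are made up for illustration) of merging two vocabularies and drawing the kind of trivial inference RDF promises:

```python
# Hypothetical sketch of the RDF triple model: two sources use different
# property URIs for "who wrote this"; one schema-level triple lets naive
# code merge them. Subject URIs and "sameAs" are made up for illustration.
DC = "http://purl.org/dc/elements/1.1/creator"
FOAF = "http://xmlns.com/foaf/0.1/name"

triples = {
    ("http://example.org/post/1", DC, "Lucas Gonze"),
    ("http://example.org/people/lg", FOAF, "Lucas Gonze"),
    (FOAF, "sameAs", DC),  # assert the two properties mean the same thing
}

def infer_equivalents(triples):
    """Copy statements across properties declared equivalent."""
    inferred = set(triples)
    for prop_a, pred, prop_b in triples:
        if pred != "sameAs":
            continue
        for subj, prop, obj in triples:
            if prop == prop_a:
                inferred.add((subj, prop_b, obj))
            elif prop == prop_b:
                inferred.add((subj, prop_a, obj))
    return inferred

merged = infer_equivalents(triples)
```

Which, of course, is also Harold's point: you can get this far with a dictionary and a loop, so the burden is on RDF/OWL to show what the extra machinery buys you.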


 

Categories: XML

Every couple of months someone asks me why I haven't written up my thoughts about the current and future trends in social software, blogging and syndication as part of a Bill Gates "Think Week" paper. I was recently asked this again and am now considering whether to spend some time doing so. If you are unfamiliar with a "Think Week", below is a description of one taken from an interview with Bill Gates

I actually do this thing where I take a week and I call it "Think Week" where I just get to go off and read the latest Ph.D. theses, try out new technologies, and try and write down my thoughts about where the market is going. Things are going fast enough that instead of doing one think a year, last year I started doing two a year. And that’s one of the most fun parts of my job. So, you know, not only trying things out, but seeing how the pieces fit together and thinking ahead what kind of software will that require, that’s a big part of my job. And I get lots of great ideas coming from the people inside Microsoft, whether it’s sending e-mail, or meeting with me, and it’s important for me to synthesize that and so there’s a lot of thinking that I’ve got to do. And, you know, that’s fun.

I have been balking at writing one for a few reasons. The first was that it seems like a lot of effort for relatively small return [the people I know who've written one firsthand got the equivalent of a virtual pat on the back]; the second was that I didn't think this topic would be interesting enough to get past the layer of VPs and technical assistants who probably screen these papers before Bill Gates reads them.

After thinking about this some more, it seems I was wrong about whether BillG would be interested in this topic, given his recent endorsement of blogging and syndication. I still don't think much would come of it, but I now find myself bursting with so many ideas about the current and future landscape of blogging and syndication technologies that I definitely want to write something down anyway, regardless of who reads it. If I write this paper I plan to make it available online along with my other writings. The question is whether there are any folks out there interested in reading such a paper. If not, it is easier for me to just keep notes on the various ideas and blog bits & pieces of them as I have been doing thus far.

So what do you guys think?


 

Categories: Ramblings | RSS Bandit