December 15, 2003
@ 05:04 PM

James Robertson writes

Ed Foster points out that MS - like many other vendors - is forbidding benchmarks as part of their standard contracts:

Is it possible Microsoft has something to hide about the performance of its server and developer products? It's hard to escape that conclusion when you see how many of its license agreements now contain language forbidding customers of those products from disclosing benchmark results.

...
So what are MS and the other vendors afraid of?

I'm not sure what the official line is on these contracts, but I've come to realize why the practice is popular among software vendors. A lot of the time, the people performing benchmarks are familiar with one or two of the products being tested and know how to tune those for optimal performance but not the others, which leads to skewed results. I know that at least on the XML team at Microsoft we don't block people from publishing benchmarks if they come to us; we just ensure that their tests are apples-to-apples comparisons and not unfairly skewed to favor the other product(s) being tested.

Just a few days ago I attended a session at XML 2003 entitled A Comparison of XML Processing in .NET and J2EE where the speaker stated that push-based XML parsers like SAX were more performant than pull-based XML parsers like the .NET Framework's XmlReader when dealing with large documents. He didn't give any details and implied that they were lacking because of the aforementioned EULA clauses. Without any details, sample code or a definition of what document size is considered "large" (1MB, 10MB, 100MB, 1GB?) it's difficult to agree or disagree with his statement. Off the top of my head there aren't any inherent limitations of pull-based XML parsing that should make it perform worse than push-based parsing of XML documents, although differences in implementations can make all the difference. I suspect that occurrences like this are why many software vendors tend to have clauses that limit the disclosure of benchmark information in their EULAs.
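To illustrate the push vs. pull distinction, here is a rough sketch in Python rather than .NET or Java, using xml.sax as the push model and ElementTree's iterparse as a stand-in for a pull-style model like XmlReader. Both walk the same document and produce identical results; any speed difference between them is an implementation detail, not something inherent to the model.

```python
import io
import xml.sax
from xml.etree.ElementTree import iterparse

SAMPLE = "<items>" + "<item>x</item>" * 1000 + "</items>"

# Push model (SAX): the parser drives, invoking our callbacks as it goes.
class ItemCounter(xml.sax.ContentHandler):
    def __init__(self):
        self.count = 0

    def startElement(self, name, attrs):
        if name == "item":
            self.count += 1

def count_push(doc):
    handler = ItemCounter()
    xml.sax.parseString(doc.encode("utf-8"), handler)
    return handler.count

# Pull model (in the style of .NET's XmlReader): the application drives,
# asking for the next node whenever it wants one.
def count_pull(doc):
    count = 0
    for event, elem in iterparse(io.StringIO(doc), events=("end",)):
        if elem.tag == "item":
            count += 1
            elem.clear()  # discard the processed node to keep memory use flat
    return count

print(count_push(SAMPLE) == count_pull(SAMPLE))  # prints True
```

Both approaches stream the document rather than building a full tree, which is exactly why it's unclear why either model should be inherently slower on "large" documents.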

Disclaimer: The above is my personal opinion and is in no way, shape or form an official statement of the position of my employer.


 

Categories: Life in the B0rg Cube

I'm now experimenting with various Windows CVS clients to see which best suits my needs for RSS Bandit development. So far I have tried WinCVS, which seems OK, and I'm about to evaluate TortoiseCVS, which Torsten seems very happy with.

Later on I'll experiment with CVS plugins that integrate into Visual Studio, such as Jalindi Igloo or the SCC plugin for TortoiseCVS. I never got to use the Visual Studio plugin for GotDotNet Workspaces when RSS Bandit was hosted there, because the original IDE I started developing RSS Bandit with (Visual C# 2002 Standard Edition) did not support said plugin, so I am curious as to what development with source repository access as part of the IDE feels like.


 

Categories: RSS Bandit

December 15, 2003
@ 04:32 PM

It seems my recent post about moving the RSS Bandit project from GotDotNet Workspaces to SourceForge has led to some discussion about the motivations for the move. I've seen the question asked on Daniel Cazzulino's weblog and on the RSS Bandit message board on GotDotNet. Below is my answer, phrased in the form of a top 10 list, which I posted in response to the question on the GotDotNet message board and also sent to Andy Oakley.

Top 10 reasons why we moved to SourceForge

1. Doesn't require people to have Passport accounts to download the RSS Bandit installer.

2. We get download and page load statistics.

3. Bug reports can have file attachments. This is great since a lot of the time we end up wishing people would attach their error.log or feedlist.xml file with their bug reports.

4. We can get a mailing list if we want.

5. Separate databases for features vs. bugs.

6. Source code can be browsed over HTTP via ViewCVS without having to install any software

7. Larger quotas on how much you can store on their servers.

8. Bug tracker remembers your queries and the default query is more useful to me (all open bugs) than GDN's (all bugs assigned to me even closed ones).

9. Activity score more accurately reflects activity of the project (on GDN, BlogX is scored at having 99% activity score even though the project has been dead for all intents and purposes for several months).

10. With SourceForge we get to use the BSD license.

I hope this satisfies the curiosity of those wondering why RSS Bandit moved to SourceForge. I've been using it for a few days and I'm already much happier with it, despite some initial teething problems adding modules to CVS.


 

Categories: RSS Bandit

According to Reuters

WASHINGTON (Reuters) - A Pentagon audit of Halliburton, the oil services firm once run by Vice President Dick Cheney, found the company may have overbilled the U.S. government by more than $120 million on Iraq contracts, U.S. defense officials said on Thursday.

Why am I not surprised? This entire Iraq war fiasco will be the subject of much consternation and entertainment to future generations.


 

December 12, 2003
@ 12:19 PM

Today is the last day of the XML 2003 conference. So far it's been a pleasant experience.

XML IN THE REAL WORLD

Attendance at the conference was much lower than last year. Considering that last year Microsoft announced Office 2003 at the conference while this year there was no such major event, this is no surprise. I suspect another reason is that XML is no longer new and is now so low down in the stack that a conference dedicated to just XML is no longer that interesting. Of course, this is only my second conference, so this level of attendance may be typical of previous years and I may have just witnessed an anomaly last year.

Like last year, the conference seemed targeted mostly at the ex-SGML crowd (or document-centric XML users), although this time there wasn't the significant focus on Semantic Web technologies such as topic maps that I saw last year. I did learn a new buzzword around Semantic Web technologies, Semantic Integration, and found out that there are companies selling products that claim to do what until this point I'd assumed was mostly theoretical. I tried to ask one such vendor how they deal with some of the issues of non-trivial transformation, such as the pubDate vs. dc:date example from a previous post, but he glossed over the details, implying that besides using ontologies to map between vocabularies they allowed people to inject code where it was needed. This seems to confirm my suspicions that in the real world you end up either using XSLT or reinventing XSLT to perform transformations between XML vocabularies.

From looking at the conference schedule, it is interesting to note that some XML technologies got a lot less coverage at the conference relative to how much discussion they generate in the news or the blogosphere. For example, I didn't see any sessions on RSS, although there is one by Sam Ruby on Atom scheduled for later this morning. There also didn't seem to be much about the XML Web Service technologies being produced by major vendors such as IBM, BEA or Microsoft. I can't tell if this is because there was no interest in submitting such sessions or whether the folks who picked the sessions didn't find these technologies interesting. Based on the fact that a number of the folks who had "Reviewer" on their conference badge were from the old-school SGML crowd, I suspect the latter. In a number of cases there definitely seemed to be a disconnect between the technologies covered during the conference and how XML is used in the real world.

MEETING XML GEEKS

I've gotten to chat with a number of people I've exchanged mail with but never met, including Tim Bray, Jon Udell, Sean McGrath, Norm Walsh and Betty Harvey. I also got to talk to a couple of folks I met last year like Rick Jelliffe, Sam Ruby, Simon St. Laurent, Mike Champion and James Clark. Most of the hanging out occurred at the soiree at Tim and Lauren's. As Tim mentions in his blog post, there were a couple of "Wow, you're Dare?" or "Wow, you're Sean McGrath?" moments throughout the evening. The coolest part of the evening was that I got to meet Eve Maler, who I was all star-struck about meeting since I'd been seeing her name crop up as one of the Über-XML geeks at Sun Microsystems since I was a programming whelp back in college, and there I am gushing "Wow, you're Eve Maler" and she was like "Oh, you're Dare? I read your articles, they're pretty good". Sweet. Since Eve worked at Sun I intended to give her some light-hearted flack over a presentation entitled UBL and Object-Oriented XML: Making Type-Aware Systems Work, which was spreading the notion that relying on the "object oriented" features of W3C XML Schema is a good idea, but it turned out that she agreed with me. Methinks another W3C XML Schema article on XML.com could be spawned from this. Hmmmm.


 

Categories: XML

December 11, 2003
@ 03:42 PM

The new home of the RSS Bandit project is on SourceForge. Various things precipitated this move, the most recent being the fact that a Passport account was needed to download RSS Bandit from GotDotNet. I'd like to thank Andy Oakley for all his help with GotDotNet Workspaces while RSS Bandit was hosted there.

The most current release of RSS Bandit is still v1.2.0.61; you can now download it from SourceForge here. The source code is still available, and you can now browse the RSS Bandit CVS repository if you are interested in such things.


 

Categories: RSS Bandit

Jeremy Zawodny writes

The News RSS Feeds are great if you want to follow a particular category of news. For example, you might want to read the latest Sports (RSS) or Entertainment (RSS) news in your aggregator. But what if you'd like an RSS News feed generated just for you? One based on a word or phrase that you could supply?
...
 For example, if you'd like to follow all the news that mentions Microsoft, you can do that. Just subscribe to this url. And if you want to find news that mentions Microsoft in a financial context, use Microsoft's stock ticker (MSFT) as the search parameter like this.

Compare this to how you'd programmatically do the same thing with Google using the Google Web API, which utilizes SOAP & WSDL. Depending on whether or not you have the right toolkit, the Google Web API is either much simpler or much harder to program against than the Yahoo RSS-based search. With the Yahoo RSS-based search, a programmer has to deal directly with HTTP and XML, while with the Google API and the appropriate XML Web Service tools this is all hidden behind the scenes and for the most part the developer programs against objects that represent the Google API without dealing directly with XML or HTTP. For example, see this example of talking to the Google API from PHP. Without appropriate XML Web Service tools, the Google API is more complex to program against than the Yahoo RSS search because one now has to deal with sending and receiving SOAP requests, not just regular HTTP GETs. However, there are a number of freely available XML Web Service toolkits, so there should be no reason to program against the Google API directly.
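To make the contrast concrete, here is a rough Python sketch. The URL shape and the SOAP envelope are illustrative rather than the services' exact documented endpoints: the RESTful request is nothing more than a URL, while the RPC-style request is an XML document that has to be POSTed to the service.

```python
from urllib.parse import urlencode

# RESTful search: the whole request is just a URL you could bookmark,
# mail to someone, or fetch with a plain HTTP GET.
def yahoo_search_url(query):
    # Base URL shape is illustrative, not Yahoo's documented endpoint.
    return "http://news.search.yahoo.com/news/rss?" + urlencode({"p": query})

# RPC-style search: the request is a SOAP envelope that must be sent
# via HTTP POST, so a toolkit is needed to hide the plumbing.
def google_soap_body(query, key):
    return f"""<SOAP-ENV:Envelope
  xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <ns1:doGoogleSearch xmlns:ns1="urn:GoogleSearch">
      <key>{key}</key>
      <q>{query}</q>
    </ns1:doGoogleSearch>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""

print(yahoo_search_url("MSFT"))
```

The first function's output can be handed to anything that speaks HTTP; the second's output is only half the work, since you still need to POST it and unwrap the SOAP response.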

This being said, there are a number of benefits to the URI-based (i.e. RESTful) search that Yahoo provides which come from being a part of the Web architecture.

  1. I can bookmark a Yahoo RSS search or send a link to it in an email. I can't do the same with an RPC-style SOAP API.
  2. Intermediaries between my machine and Google are unlikely to cache the results of a search made via the Google API since it uses HTTP POST, but could cache requests that use the Yahoo RSS-based search since it uses HTTP GET. This improves the scalability of the Yahoo RSS-based search without any explicit work from me or Yahoo; it comes simply from utilizing the benefits of the Web architecture.

The above contrast of the differing techniques for returning search results as XML used by Yahoo and Google is a good way to compare and contrast RESTful XML Web Services to RPC-based XML Web Services and understand why some people believe strongly [perhaps too strongly] that XML Web Services should be RESTful not RPC-based.

By the way, I noticed that Adam Bosworth is trying to come to grips with REST which should lead to some interesting discussion for those who are interested in the RESTful XML Web Services vs. RPC-based XML Web Services debate.

 

Categories: XML

In the most recent release of RSS Bandit we started down the path of making it look like Outlook 2003 by using Tim Divil's Winforms controls. The primary change we made was to change the toolbar. This change wasn't good enough for Thomas Freudenberg, who made a couple of other changes to RSS Bandit that make it look even more like Outlook 2003. He wrote

Anyway, contrary to SharpReader, you can get the bandit's source code. Because I really like the UI of Outlook 2003 (and FeedDemon), I patched RSS Bandit:

It took about 15 minutes. Mainly I docked the feed item list and the splitter to the left. Additionally, I've created a custom feed item formatter, which bases on Julien Cheyssial's MyBlogroll template. You can download my XSLT file here.

You can find a screenshot on his website. Torsten's already made similar changes to the official RSS Bandit source code after seeing Thomas's feedback.


 

Categories: RSS Bandit

I accidentally caught Al Sharpton on Saturday Night Live last night and it was a horrible experience. Not only was the show as unfunny as getting needles shoved in your eyeballs (why the fuck do good shows like Futurama and Family Guy get cancelled while this turd continues to stink up the airwaves?) but our pal Al kept fumbling his lines like he'd forgotten them and kept having to surreptitiously read them from the teleprompter. What a joke.

Definitely a horrible way to end a Saturday night.


 

Categories: Ramblings

Shelley Powers writes

For instance, The W3C TAG team -- that's the team that's defining the architecture of the web, not a new wrestling group -- has been talking about defining a new URI scheme just for RSS, as brought up today by Tim Bray. With a new scheme, instead of accessing a feed with:

http://weblog.burningbird.net/index.rdf

You would access the feed as:

feed://www.tbray.org/ongoing/ongoing.rss

I've been trying to avoid blogging about this discussion since I'll be leaving for Philly to attend the XML 2003 conference in a few hours and won't be able to participate in any debate. However, since some folks have started blogging about this topic and there are some misconceptions in their posts, I've thrown my hat in the ring.

The first thing I want to point out is that although Shelley is correct that some discussion about this has happened on the W3C Technical Architecture Group's mailing list, they are not proposing a new URI scheme. Tim Bray was simply reporting on current practices in the RSS world that I mentioned in passing on the atom-syntax list.

THE PROBLEM
The problem statement is "How does a user who goes to a website such as http://news.yahoo.com or http://www.slashdot.org, and who'd like to subscribe to information from these sites in a client-side news aggregator, do so in a quick and painless manner?". The current process is to click on an icon (most likely an orange button with the white letters 'XML' on it) that represents an RSS feed, copy the URL from the browser address bar, fire up your RSS client and click on its subscribe dialog (if it has one).

This is a lot of steps, and many attempts have been made to collapse them into one (click a link and the right dialog pops up).

PREVIOUS SOLUTIONS TO THE PROBLEM
The oldest one I am aware of was pioneered by Dave Winer and involved client-side aggregators listening on a port on the local machine and a hyperlink on the website linking to a URL of the form http://127.0.0.1:5335/system/pages/subscriptions. This technique is used by every Radio Userland weblog and is even used by dasBlog, my blogging tool of choice, as is evidenced by clicking on the icon with a picture of a coffee mug and the letters "XML" on it at the bottom of my weblog.

There are two problems with this approach. The first is the security issue brought on by the fact that you have a news aggregator listening on a socket on your local machine, which could lead to hack attempts if a security exploit is found in your news aggregator of choice; however, this can be mitigated by firewalls and thus far hasn't been a problem. The second is that if one has multiple aggregators installed there is contention for which one should listen on that port. For this reason different aggregators listen on different local ports; Radio listens on port 5335, AmphetaDesk listens on port 8888, Awasu listens on port 2604, nntp//rss listens on port 7810 and so on.

An alternative solution was chosen by various other aggregator authors, in which hyperlinks pointed to the URLs of RSS feeds with the crucial distinction that the http:// part of the URL was substituted with a custom URI scheme. Since most modern browsers have a mechanism for handing off unknown URI schemes to other client applications, this also allows "one-click feed subscription". Here also there is variance amongst news aggregators: Vox Lite, RSS Bandit & SharpReader support the feed:// URI scheme, WinRSS supports the rss:// URI scheme while NewsMonster supports the news-monster:// scheme.
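What the registered helper application has to do with such a URI is simple: swap the custom scheme back to http:// before fetching the feed. A rough Python sketch (the scheme list mirrors the aggregators mentioned above; the translation is common convention, not a formal spec):

```python
from urllib.parse import urlsplit, urlunsplit

# Custom schemes various aggregators have registered; each one is just
# an http URL wearing a different scheme so the browser hands it off.
FEED_SCHEMES = {"feed", "rss", "news-monster"}

def feed_uri_to_http(uri):
    """Translate e.g. feed://example.com/index.rss back to plain http."""
    parts = urlsplit(uri)
    if parts.scheme in FEED_SCHEMES:
        # Keep host, path, query and fragment; only the scheme changes.
        return urlunsplit(("http",) + tuple(parts)[1:])
    return uri  # already a normal URL; leave it alone

print(feed_uri_to_http("feed://www.tbray.org/ongoing/ongoing.rss"))
# prints http://www.tbray.org/ongoing/ongoing.rss
```

The downside described next follows directly from this: every scheme in that set needs its own hyperlink on any site that wants to support every aggregator.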

With all these varying approaches, any website that wants to provide a link that allows one-click subscription to an RSS feed needs to support almost a dozen different techniques and thus create a dozen different hyperlinks on its site. This isn't an exaggeration; this is exactly what Feedster does when one wants to subscribe to the results of a search. If memory serves correctly, Feedster uses the QuickSub javascript module to present these dozen links in a drop down list.

THE FURORE
The recent debate on both the atom-syntax and the www-tag mailing lists focuses on the feed:// URI proposal and its lack of adherence to guidelines set in the current draft of the Architecture of the World Wide Web document being produced by the W3C Technical Architecture Group. This document is an attempt to document the architecture of the World Wide Web ex post facto.

Specifically the debate hinges on the guideline that states

Authors of specifications SHOULD NOT introduce a new URI scheme when an existing scheme provides the desired properties of identifiers and their relation to resources.
...
If the motivation behind registering a new scheme is to allow an agent to launch a particular application when retrieving a representation, such dispatching can be accomplished at lower expense by registering a new Internet Media Type instead. Deployed software is more likely to handle the introduction of a new media type than the introduction of a new URI scheme.

The use of unregistered URI schemes is discouraged for a number of reasons:

  • There is no generally accepted way to locate the scheme specification.
  • Someone else may be using the scheme for other purposes.
  • One should not expect that general-purpose software will do anything useful with URIs of this scheme; the network effect is lost.

The above excerpt assumes that web browsers on the World Wide Web are more likely to know how to deal with unknown Internet Media Types than unknown URI schemes, which is in fact the case. For example, Internet Explorer will fall back to using the file extension if it doesn't know how to deal with the provided MIME type (see MIME Type Detection in Internet Explorer for more details). However, there are several problems with using MIME types for one-click feed subscription that do not exist in the previously highlighted approaches.

Greg Reinacker detailed them in his post RSS and MIME types a few months ago.

Problem 1: [severity: deal-breaker] In order to serve up a file with a specific MIME type, you need to make some changes in your web server configuration. There are a LOT of people out there (shared hosting, anyone?) who don't have this capability. We have to cater to the masses, people - we're trying to drive adoption of this technology.

Problem 1a: [severity: annoyance] There are even more people who wouldn't know a MIME type from a hole in the head. If Joe user figures out that he can build a XML file with notepad that contains his RSS data (and it's being done more often than you think), and upload it to his web site, you'd think that'd be enough. Sorry, Joe, you need to change the MIME type too. The what?

Problem 2: [severity: deal-breaker] If you register a handler for a MIME type, the handler gets the contents of the file, rather than the URL. This is great if you're a media player or whatever. However, with RSS, the client tool needs the URL of the RSS file, not the actual contents of the RSS file. Well, it needs the contents too, but it needs the URL so it can poll the file for changes later. This means that the file that's actually registered with a new MIME type would have to be some kind of intermediate file, a "discovery" file if you will. So now, not only would Joe user have to learn about MIME types, but he'd have to create another discovery file as well.
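To make Problem 1 concrete: on Apache the change amounts to a one-line directive, but many shared-hosting users are not allowed to touch server configuration at all. A hypothetical .htaccess fragment, using the commonly proposed (never formally registered) application/rss+xml type:

```apache
# Serve feed files with a dedicated media type so browsers can
# dispatch them to an aggregator; requires AllowOverride to permit it.
AddType application/rss+xml .rss
```

Even this assumes the host enables .htaccess overrides, which is exactly the capability Greg points out many people lack.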

Many people in the MIME type camp have pointed out that problem two can be circumvented by having the feed file contain its own location. Although this seems a tad redundant and may be prone to breakage if the website is reorganized, it should work for the most part. However, there is at least one other problem with using MIME types which people have glossed over.

Problem 3: If clicking on a link to an RSS feed in your browser always invokes your aggregator's feed subscription dialog, then you can't view an RSS feed in your browser if you have a client aggregator installed, and you may not be able to view one even if you don't, because your browser of choice may not know how to handle the MIME type if it isn't something like text/xml.

At least one person, Tim Bray, doesn't see this as a big deal and in fact stated, "why not? Every time I click on a link to a PDF doc I get a PDF reader. Every time I click on a .mov file I get Quicktime. Seems like a sensible default behavior".

THE BOTTOM LINE
Using MIME types to solve the one-click subscription problem is more difficult for weblog tools to implement than the other two approaches favored by news aggregators, and it requires changing web server configurations while the other approaches do not. Although the architecture astronauts will rail against the URI scheme based approach, it is unlikely that anyone who looks dispassionately at all three approaches will choose to use MIME types to solve this problem.

Of course, since one of the main forces behind the ATOM movement has stated that MIME types will be the mechanism used for performing one click subscription to ATOM feeds this just seems like one more reason for me to be skeptical about the benefits of adopting the ATOM syndication format.


 

Categories: XML