September 29, 2004
@ 08:33 AM

The Bloglines press release entitled New Bloglines Web Services Selected by FeedDemon, NetNewsWire and Blogbot to Eliminate RSS Bandwidth Bottleneck has this interesting bit of news

Redwood City, Calif.--September 28, 2004 -- Three leading desktop news feed and blog aggregators announced today that they have implemented new open application programming interfaces (API) and Web Services from Bloglines (www.bloglines.com) that connect their applications to Bloglines' free online service for searching, subscribing, publishing and sharing news feeds, blogs and rich web content. FeedDemon (www.bradsoft.com), NetNewsWire (www.ranchero.com), and Blogbot (www.blogbot.com) are the first desktop software applications to use the open Bloglines Web Services.

Bloglines Web Services address a key issue facing the growing RSS market by reducing the bandwidth demands on sites serving syndicated news feeds. Now, instead of thousands of individual desktop PCs independently scanning news sources, blogs and web sites for updated feeds, Bloglines will make low-bandwidth requests to each site on behalf of the universe of subscribers and cache any updates to its master web database. Bloglines will then redistribute the latest content to all the individuals subscribed to those feeds via the linked desktop applications -- FeedDemon, NetNewsWire or Blogbot -- or via Bloglines' free web service.
...
Bloglines Web Services Enable Synchronization for Desktop News Aggregators "Our customers have been looking for the ability to synchronize their feed subscriptions across multiple computers," said Nick Bradbury, founder of Bradbury Software and creator of FeedDemon, the leading RSS aggregator for Windows. "By partnering with Bloglines, we are now able to offer the rich desktop functionality FeedDemon customers have come to expect, with the flexible mobility and portability of a web service."

There are two aspects of this press release I'm skeptical about. The first is the claim that having desktop aggregators fetch feeds from Bloglines instead of from the original sources somehow "eliminates the RSS bandwidth bottleneck". It seems to me that the Bloglines proposal does the opposite. Instead of thousands of desktop aggregators fetching tens of thousands to hundreds of thousands of feeds from as many websites, it is proposed that they all poll the Bloglines server. That looks to me like creating a bottleneck, not eliminating one.

The second aspect of the proposal I call into question is the Bloglines Sync API. The information on this API is quite straightforward:

The Bloglines Sync API is used to access subscription information and to retrieve blog entries. The API currently consists of the following functions:

  • listsubs - The listsubs function is used to retrieve subscription information for a given Bloglines account.
  • getitems - The getitems function is used to retrieve blog entries for a given subscription.

All calls use HTTP Basic authentication. The username is the email address of the Bloglines account, and the password is the same password used to access the account through the Bloglines web site.
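
To make the API concrete, here is a rough sketch of what a client call looks like. Only the two function names and the HTTP Basic authentication scheme come from the description above; the rpc.bloglines.com host, the query parameter and the response formats are assumptions made purely for illustration.

# Rough sketch of calling the Bloglines Sync API described above. The
# host name, the "s" parameter and the response formats are assumptions
# for illustration; only listsubs/getitems and HTTP Basic auth come from
# the API description.
import base64
import urllib.request

EMAIL = "user@example.com"   # Bloglines account (hypothetical)
PASSWORD = "secret"          # same password as the Bloglines web site

def bloglines_get(path):
    """Perform an HTTP GET against the Sync API using Basic authentication."""
    request = urllib.request.Request("http://rpc.bloglines.com/" + path)
    credentials = base64.b64encode(f"{EMAIL}:{PASSWORD}".encode()).decode()
    request.add_header("Authorization", "Basic " + credentials)
    with urllib.request.urlopen(request) as response:
        return response.read()   # XML (e.g. an OPML subscription list)

# Retrieve the subscription list. Note that this is read-only; there is
# no corresponding call to add, delete or modify subscriptions.
subscriptions = bloglines_get("listsubs")

# Retrieve the entries for one subscription, identified here by a
# hypothetical numeric id taken from the listsubs response.
entries = bloglines_get("getitems?s=12345")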

I was interested in using this API to round out the existing feed synchronization support in RSS Bandit. In current versions a user can designate a file share, WebDAV server or FTP server as the central location for synchronizing multiple instances of RSS Bandit. I investigated what it would take to add Bloglines as a fourth synchronization point after reading the aforementioned press release and came to the conclusion that the API provided by Bloglines falls short of providing the functionality that exists in RSS Bandit today with the other synchronization sources.  

The problems with the Bloglines Sync API include

  1. The Bloglines Sync API only allows clients to retrieve the subscribed feeds. The user has to log in to the Bloglines site to perform feed management tasks like adding, deleting or modifying the feeds to which they are subscribed.
  2. There is no granular mechanism to get or set the read/unread state of the items in the user's feed list. 

These limitations mean the Bloglines Sync API isn't a terribly useful way to synchronize between two desktop aggregators. Instead, it primarily acts as a way for Bloglines to use various desktop aggregators as a UI for viewing a user's Bloglines subscriptions without the Bloglines team having to build a rich client application.

Thanks, but I think I'm going to pass.


 

September 27, 2004
@ 07:28 AM

Today my TiVo disappointed me for what I feel is the last time. Yesterday I set it to record four never-before-aired episodes of Samurai Jack. Sometime between 6:30 PM and 11 PM this evening the TiVo decided that recording suggestions was more important than keeping around one of the episodes of Samurai Jack.

For the past few months I've been disappointed with the TiVo's understanding of priority when keeping recordings. For example, it shouldn't delete a first-run episode over a rerun, especially when the Season Pass for the deleted episode is set to record only first runs. This is a pretty basic rule I'm sure I could write myself if I had access to the TiVo source code. This last mistake is the straw that broke the camel's back, and I'm now seeking a replacement for TiVo.

I now realize I prefer an Open Source solution so I can hack it myself. Perhaps I should take a look at MythTV.


 

Categories: Technology

September 26, 2004
@ 07:10 PM

As an author of a news reader that supports RSS and Atom, I often have to deal with feeds designed by the class of people Mark Pilgrim described as assholes in his post Why specs matter. These are people who

read specs with a fine-toothed comb, looking for loopholes, oversights, or simple typos.  Then they write code that is meticulously spec-compliant, but useless.  If someone yells at them for writing useless software, they smugly point to the sentence in the spec that clearly spells out how their horribly broken software is technically correct

This is the first in a series of posts highlighting such feeds as an example to others of how not to design syndication feeds for a website. Feeds in this series will often be technically valid RSS/Atom feeds but for one or more reasons cause unnecessary inconvenience to authors and users of news aggregators.

This week's gem is the Cafe con Leche RSS feed. Instead of pointing out what is wrong with this feed myself, I'll let its author do so. On September 24th Elliotte Rusty Harold wrote

I've been spending a lot of time reviewing RSS readers lately, and overall they're a pretty poor lot. Latest example. Yesterday's Cafe con Leche feed contained this completely legal title element:

<title>I'm very pleased to announce the publication of XML in a Nutshell, 3rd edition by myself and W.
          Scott Means, soon to be arriving at a fine bookseller near you.
          </title>

Note the line break in the middle of the title content. This confused at least two RSS readers even though there's nothing wrong with it according to the RSS 0.92 spec. Other features from my RSS feeds that have caused problems in the past include long titles, a single URL that points to several stories, and not including more than one day's worth of news in a feed.

Elliotte is technically right: none of the RSS specs say that the <link> element in an RSS feed should be unique for each item, so he can reuse the same link for multiple items and still have a valid RSS feed. So why does this cause problems for RSS aggregators?

Consider the following RSS feed

<rss version="0.92">
  <channel>
    <title>Example RSS feed</title>
    <link>http://www.example.com</link>
    <description>This feed contains an example of how not to design an RSS feed</description>  
    <item>
      <title>I am item 1</title>    
      <link>http://www.example.com/rssitem</link>
    </item>
    <item>
      <title>I am item 2</title>    
      <link>http://www.example.com/rssitem</link>
    </item>
  </channel>
</rss>

Now consider the same feed fetched a few hours later

<rss version="0.92">
  <channel>
    <title>Example RSS feed</title>
    <link>http://www.example.com</link>
    <description>This feed contains an example of how not to design an RSS feed</description>  
    <item>
      <title>I am item one</title>    
      <link>http://www.example.com/rssitem</link>
    </item>
    <item>
      <title>I am item 3</title>
      <link>http://www.example.com/rssitem</link>
    </item>
    <item>
      <title>I am item 2</title>
      <link>http://www.example.com/rssitem</link>
    </item>
  </channel>
</rss>

Now how does an RSS aggregator tell whether the item titled "I am item 1" is the same item as the one titled "I am item one" with a typo in the title fixed, or a different item altogether? The simple answer is that it can't. A naive hack is to compare the content of the <description> element, but that falls apart as soon as a typo is fixed or the content of the <description> is otherwise updated.

Every RSS aggregator has some sort of hack to deal with this problem. I describe them as hacks because there is no way an aggregator can determine with 100% accuracy whether items with the same link and no guid are the same item with changed content or different items. This means the behavior of different aggregators on feeds such as the Cafe con Leche RSS feed is extremely inconsistent.
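
To illustrate why these are hacks, here is a minimal sketch of the kind of identity heuristic an aggregator might fall back on. It is not RSS Bandit's actual algorithm, or any other aggregator's; it is just an example of the guesswork involved when a feed provides no guid.

# Sketch of an item-identity heuristic; not any particular aggregator's
# real implementation. Without a guid the client has to guess, which is
# why different aggregators behave inconsistently on feeds like this one.
def item_key(item):
    """Pick the most reliable identifier available for a feed item."""
    if item.get("guid"):                 # an RSS 2.0 guid is unambiguous
        return ("guid", item["guid"])
    if item.get("link"):
        # Links are not unique in feeds like Cafe con Leche's, so the
        # title is mixed in as a tie-breaker. The downside: an edited
        # title now looks like a brand new item.
        return ("link+title", item["link"], item.get("title", ""))
    return ("title", item.get("title", ""))

old = {"title": "I am item 1", "link": "http://www.example.com/rssitem"}
new = {"title": "I am item one", "link": "http://www.example.com/rssitem"}

# The typo fix changes the key, so the updated item is wrongly treated as new.
print(item_key(old) == item_key(new))   # False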

A solution to this problem is for Elliotte Rusty Harold to upgrade his RSS feed to RSS 2.0 and use guid elements to uniquely identify items.


 

September 26, 2004
@ 05:42 PM

As I mentioned in my post News Aggregators As Denial of Service Clients (part 2) 

the weblog software I use, dasBlog, does not support HTTP Conditional GET for comments feeds so I'm serving dozens of XML files to each user of Newzcrawler and SauceReader subscribed to my RSS feed every hour.

It also turned out that the newest version of dasBlog stopped supporting HTTP conditional GET for category-specific feeds when I upgraded from 1.5 to 1.6. This meant I was wasting a huge amount of bandwidth, since thousands of RSS Bandit users are subscribed to the feed for my RSS Bandit category.

I decided to download the dasBlog source code and patch my local instance. As I expected it took longer to figure out how to configure ASP.NET and Visual Studio to allow me to compile my own blog software than it did to fix the problem. I guess that's a testament to how well the dasBlog code is written.
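
For the curious, the idea behind HTTP conditional GET is small enough to sketch in a few lines. The snippet below is a generic illustration, not the actual dasBlog patch (which lives in ASP.NET/C# code): if the feed has not changed since the timestamp the client sent in its If-Modified-Since header, the server answers 304 Not Modified with an empty body instead of re-sending the XML.

# Generic illustration of conditional GET on the server side; this is
# not the dasBlog patch itself. feed_last_modified is assumed to be a
# timezone-aware datetime recording when the feed last changed.
from email.utils import parsedate_to_datetime, format_datetime

def serve_feed(request_headers, feed_xml, feed_last_modified):
    """Return (status, headers, body) for a request for a feed."""
    ims = request_headers.get("If-Modified-Since")
    if ims:
        try:
            if feed_last_modified <= parsedate_to_datetime(ims):
                # The client's cached copy is current: send a tiny 304
                # response with no body instead of the whole feed.
                return 304, {"Last-Modified": format_datetime(feed_last_modified)}, b""
        except (TypeError, ValueError):
            pass   # unparsable date; fall through and send the full feed
    return 200, {
        "Content-Type": "application/xml",
        "Last-Modified": format_datetime(feed_last_modified),
    }, feed_xml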

Mad props go out to Omar, Clemens and the rest of the dasBlog crew.


 

Categories: Ramblings

In her post Blog Activity Julia Lerman writes

There must be a few people who have their aggregators set to check rss feeds every 10 seconds or something. I very rarely look at my stats because they don't really tell me much. But I have to say I was a little surprised to see that there were over 14,000 hits to my website today (from 12am to almost 5pm).

So where do they come from?

10,000+ are from NewzCrawler then a whole lot of other aggregators and then a small # of browsers. 

This problem is due to the phenomenon originally pointed out by Phil Ringnalda in his post What's Newzcrawler Doing? and expounded on by me in my post News Aggregators As Denial of Service Clients. Basically 

According to the answer on the NewzCrawler support forums when NewzCrawler updates the channel supporting wfw:commentRss it first updates the main feed and then it updates comment feeds. Repeatedly downloading the RSS feed for the comments to each entry in my blog when the user hasn't requested them is unnecessary and quite frankly wasteful.  

I recently upgraded my web server to Windows 2003 Server because of problems with a limitation on the number of outgoing connections in Windows XP. Even so, I noticed that my web server was still getting overloaded with requests during hours of peak traffic. Checking my server logs I found out that another aggregator, Sauce Reader, has joined Newzcrawler in its extremely rude bandwidth-hogging behavior. This is compounded by the fact that the weblog software I use, dasBlog, does not support HTTP conditional GET for comments feeds, so I'm serving dozens of XML files to each user of Newzcrawler and SauceReader subscribed to my RSS feed every hour.
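
For contrast, the client half of the exchange is just as simple. The sketch below is a generic illustration of how an aggregator can poll a feed politely using conditional GET; it is not a description of what NewzCrawler or Sauce Reader actually do, and of course it only helps once the server supports conditional GET in the first place.

# Generic illustration of polite feed polling from the client side; not
# what NewzCrawler or Sauce Reader actually do. The client remembers the
# Last-Modified value from the previous fetch and echoes it back, so an
# unchanged feed costs a 304 status line instead of a full XML download.
import urllib.error
import urllib.request

def poll_feed(url, last_modified=None):
    """Return (feed_bytes, last_modified); feed_bytes is None if unchanged."""
    request = urllib.request.Request(url)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    try:
        with urllib.request.urlopen(request) as response:
            return response.read(), response.headers.get("Last-Modified", last_modified)
    except urllib.error.HTTPError as err:
        if err.code == 304:      # nothing new: nothing gets downloaded
            return None, last_modified
        raise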

I'm really irritated at this behavior and have considered banning Sauce Reader & Newzcrawler from fetching RSS feeds on my blog because they significantly contribute to bringing down my site on weekday mornings when people first fire up their aggregators at work or at home. Instead, I'll probably end up patching my local install of dasBlog to support HTTP conditional GET for comments feeds when I get some free time. In the meantime I've tweaked some options in IIS that should reduce the number of times the server is inaccessible due to being flooded with HTTP requests.

This doesn't mean I think this feature of the aforementioned aggregators is something that should be encouraged. I just don't want to punish readers of my blog because of decisions made by the authors of their news reading software.


 

In a post entitled When will Scoble earn his Longhorn pay? Robert Scoble writes

The thing is that I don't have any credibility left when it comes to Longhorn. Over the last 18 months I got out there and lead lots of Longhorn cheers. And now there's a changing of direction.

Tons of people, both inside and outside of Microsoft, have been talking with me about where we're going now. I've met in the past week with the Avalon and WinFS teams (yes, they both still exist).

The thing is, I am super sensitive right now to making a whole new round of promises. I'd rather wait to talk until there's beta build to hand you. Why? Cause what good does it do to write about the feature set if you can't see it? And if you're a developer, you don't want to hear FUD, you wanna see working APIs.

Shortly after Robert joined Microsoft I sent him a link to Joel Spolsky's Mouth Wide Shut article because I thought he was going overboard in pimping Longhorn. Experience working with product teams at Microsoft had already taught me that until a technology is in beta, almost everything about it can change. For example, the plans my team had for what we were shipping in Whidbey two years ago are very different from what we planned to ship a year ago, which in turn is very different from what we plan to ship today. Features get cut all the time, priorities change, and then there's the date-driven release dance.

Microsoft has always had a credibility problem due to what people have termed vaporware announcements. Although many have assumed that the company does this maliciously, the truth of the matter is that a lot of these incidents are product teams prematurely announcing their plans to the world. Personally I think Microsoft's evangelists and marketing folks could do the company, our customers and the software industry in general a service by shutting the hell up about future product plans until they are more than a glimmer in some software architect's eye.

Borrowing a leaf from Apple doesn't sound so bad right about now.


 

Categories: Life in the B0rg Cube

My article Improving XML Document Validation with Schematron is finally up on MSDN. It provides a brief introduction to Schematron, shows how to embed Schematron assertions in a W3C XML Schema document for improved validation capabilities and how to get the power of Schematron in the .NET Framework today. The introduction of the article is excerpted below

Currently the most popular XML schema language is the W3C XML Schema Definition language (XSD). Although XSD is capable of satisfying scenarios involving type annotated infosets it is fairly limited when it comes to describing constraints on the structure of an XML document. There are many examples of situations where common idioms in XML vocabulary design are impossible to express using the constraints available in W3C XML Schema. The three most commonly requested constraints that are incapable of being described by W3C XML Schema are:

  1. The ability to specify a choice of attributes. For example, a server-status element should either have a server-uptime attribute or a server-downtime attribute.

  2. The ability to group elements and attributes into model groups. Although one can group elements using compositors such as xs:sequence, xs:choice, and xs:all, the same cannot be done with both elements and attributes. For example, one cannot create a choice between one set of elements and attributes and another.

  3. The ability to vary the content model based on the value of an element or attribute. For example, if the value of the status attribute is "available" then the element should have an uptime child element; otherwise it should have a downtime child element. The technical name for such constraints is co-occurrence constraints.

Although these idioms are widely used in XML vocabularies it isn't possible to describe them using W3C XML Schema, which makes it difficult to rely on schema validation for enforcing the message contract. This article describes how to layer such functionality on top of the W3C XML Schema language using Schematron.

Embedding Schematron assertions in a W3C XML Schema document allows you to have your cake and eat it too.
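
For readers who want to see what such a constraint looks like in practice, here is a minimal sketch of the third idiom above (the co-occurrence constraint) written as a standalone Schematron schema and checked with Python's lxml. This is purely for illustration; the article itself shows how to embed the assertions inside a W3C XML Schema document and evaluate them on the .NET Framework.

# A standalone ISO Schematron version of the co-occurrence constraint
# described above, checked with lxml purely for illustration -- the
# article targets the .NET Framework rather than Python.
from lxml import etree, isoschematron

SCHEMATRON = b"""
<schema xmlns="http://purl.oclc.org/dsdl/schematron">
  <pattern>
    <rule context="server-status">
      <assert test="not(@status = 'available') or uptime">
        An 'available' server-status element must have an uptime child.
      </assert>
      <assert test="@status = 'available' or downtime">
        Otherwise the server-status element must have a downtime child.
      </assert>
    </rule>
  </pattern>
</schema>
"""

schematron = isoschematron.Schematron(etree.fromstring(SCHEMATRON))

good = etree.fromstring('<server-status status="available"><uptime>36000</uptime></server-status>')
bad = etree.fromstring('<server-status status="available"><downtime>120</downtime></server-status>')

print(schematron.validate(good))   # True
print(schematron.validate(bad))    # False: 'available' but no uptime child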


 

Categories: XML

September 19, 2004
@ 10:14 PM

Tim Bray has another rant on the proliferation of WS-* specs in the XML Web Services world. In his post The Loyal WS Opposition he writes

I Still Don't Buy It. No matter how hard I try, I still think the WS-* stack is bloated, opaque, and insanely complex. I think it's going to be hard to understand, hard to implement, hard to interoperate, and hard to secure.

I look at Google and Amazon and EBay and Salesforce and see them doing tens of millions of transactions a day involving pumping XML back and forth over HTTP, and I can't help noticing that they don't seem to need much WS-apparatus.

One way to view the various WS-* specifications is that they are akin to Java Specification Requests (JSRs) in the Java world. A JSR is basically a way for various Java vendors to standardize on a mechanism for solving a particular customer problem. Usually this mechanism takes the form of an Application Programming Interface (API). Some JSRs are widely adopted and have become an integral aspect of programming on the Java platform (e.g. the JAXP JSR). Some JSRs are pushed by certain vendors while being ignored by others, leading to overlap (e.g. the JDO JSR, which was voted against by BEA, IBM and Oracle but supported by Macromedia and Sun). Then there's Enterprise JavaBeans, which is generally decried as a bloated and unnecessarily complex solution to business problems. Again, that was the product of the JSR process.

The various WS-* specs are following the same pattern as JSRs, which isn't much of a surprise since a number of the players are the same (e.g. Sun & IBM). Just as Tim Bray points out that one can be productive without adopting any of the WS-* family of specifications, it is similarly true that one can be productive in Java without relying on the products of JSRs and instead rolling one's own solutions. However, this doesn't mean there aren't benefits to standardizing on high-level mechanisms for solving various business problems, beyond saying "We use XML and HTTP so we should interop".

Omri Gazitt, the Product Unit Manager of the Advanced XML Web Services team, has a post on WS-Transfer and WS-Enumeration which should hit close to home for Tim Bray since he is the co-chair of the Atom working group

WS-Transfer is a spec that Don has wanted to publish for a year now.  It codifies the simple CRUD pattern for Web services (the operations are named after their HTTP equivalents - GET, PUT, DELETE, and there is also a CREATE pattern.  The pattern of manipulating resources using these simple verbs is quite prevalent (Roy Fielding's REST is the most common moniker for it), and of course it underlies the HTTP protocol.  Of course, you could implement this pattern before WS-Transfer, but it does help to write this down so people can do this over SOAP in a consistent way.  One interesting existing application of this pattern is Atom (a publishing/blogging protocol built on top of SOAP).  Looking at the Atom WSDL, it looks very much like WS-Transfer - a GET, PUT, DELETE, and POST (which is the CREATE verb specific to this application).  So Atom could easily be built on top of WS-Transfer.  What would be the advantage of that?  The same advantage that comes with any kind of consistent application of a technology - the more the consistent pattern is applied, the more value it accrues.  Just the value of baking that pattern into various toolsets (e.g. VS.NET) makes it attractive to use the pattern. 

I personally think WS-Transfer is very interesting because it allows SOAP-based applications to model themselves as REST Web Services and get explicit support for this methodology from toolkits. I talked about WS-Transfer with Don a few months ago and I've had to bite my tongue for a while whenever I hear people complain that SOAP and SOAP-based toolkits don't encourage building RESTful XML Web Services.

I'm not as impressed with WS-Enumeration, but I find it interesting that it also covers another use case of the Atom API, which is a mechanism for pulling down the content archive from a weblog or similar system in a sequential manner. 


 

Categories: Technology | XML

September 19, 2004
@ 08:44 PM

I saw Ghost in the Shell 2: Innocence last night. I'm not a big fan of some of the critically acclaimed, plot-driven members of the anime genre such as the original Ghost in the Shell and Akira. I always thought they didn't have enough action, nor did they delve deeply enough into the philosophical questions they raised. Thus I have tended to prefer anime that doesn't have any pretension of intellectual depth and is just a straight violence fest, such as Crying Freeman, Ninja Scroll and most recently Berserk.

Ghost in the Shell 2 struck a happy balance for me. It had excellent action, especially the scene where Batou visits a local Yakuza hangout with a heavy machine gun. I also thought the exploration of what it means to be truly human in a world where the line between man and machine is continually blurred was better done in Innocence than in the original Ghost in the Shell. It just seemed a lot less heavy-handed.

I'm definitely going to pick up the Ghost in the Shell TV series from Fry's later this week.

Rating: **** out of *****


 

Categories: Movie Review

September 19, 2004
@ 08:00 PM

Reading a post on Dave Winer's blog, I caught the following snippet

NY Times survey of spyware and adware. "...a program that creeps onto a computer’s hard drive unannounced, is wrecking the Internet." 

I've noticed that every time I sit at the computer of a non-technical Windows user I end up spending at least an hour removing spyware from their computer. Yesterday, I encountered a particularly nasty piece of work that detected when the system was being scanned by Ad-Aware and forced a system reboot. I'd never realized how inadequate the functionality of the Add or Remove Programs dialog was for removing applications from your computer until spyware came around. After I was done yesterday, I began to suspect that some of the spyware that was polite enough to add an entry in "Add or Remove Programs" simply took the Uninstall instruction as a command to go into stealth mode. One application made you fill out a questionnaire before it let you uninstall it. I wondered if it would refuse to uninstall if it didn't like my answers to its questions.

Something definitely has to be done about this crap. In the meantime I suggest using at least two anti-spyware applications if attempting to clean a system. I've already mentioned Ad-Aware, my other recommendation is Spybot - Search & Destroy.


 

Categories: Technology