October 3, 2004
@ 06:38 PM

As the author of a news reader that supports RSS and Atom, I often have to deal with feeds that are technically valid RSS/Atom but that for one or more reasons cause unnecessary inconvenience to the authors and users of news aggregators. This is the second in a series of posts highlighting such feeds as examples to others of how not to design syndication feeds for a website.

This week's gem is the Sun Bloggers RSS feed. This is a combined feed for all the blogs hosted at http://blogs.sun.com, which means that at any given time it most likely contains posts by multiple authors.

To highlight the problem with the feed, here are two item elements taken from it a few minutes ago.

 <item>
    <title>Something fishy...</title>
    <description>A king was very fond of fish products. He went fishing in the only river of his kingdom. While fishing he accidently dropped his diamond ring presented by his wife - The Queen. A fish in the river mistook the sparkling ring for an insect and swallowed it. The fisherman caught the fish and sold it to a chef. The King on the other side was very sad and apologistic. Took the Queen to a restaurant for a dinner and ordered a fried fish. The chef presented the same which had the diamond ring inside. King was happy to find the ring back and rewarded the restaurant. The restaurant rewarded the chef and the Chef rewarded the fisherman. The fisherman then went back to the river, killed all the fishes in search of another diamond ring. I never understood the motto of the story but there is certainly something fishy about it!</description>
    <category>General</category>
    <guid isPermaLink="true">http://blogs.sun.com/roller/page/ashish/20041002#something_fishy</guid>
    <pubDate>Sat, 2 Oct 2004 08:53:15 PDT</pubDate>
  </item>
  <item>
    <title>Another one bytes the dust...</title>
    <description>Well, more like another one got bitten. Accoring to &lt;a href="http://www.heise.de/newsticker/meldung/51749"&gt;this&lt;/a&gt; (german) article from &lt;a href="http://www.heise.de"&gt;Heise&lt;/a&gt; Mr. Gates got himself some Spyware on his personal/private systems, and has now decided to take things into his own hand (or at least into those of his many and skilled engineers). Bravo!&lt;p&gt; Spyware or other unwanted executables like e.g. &lt;a href="http://securityresponse.symantec.com/avcenter/expanded_threats/dialers/"&gt;dialers&lt;/a&gt; are puzzeling me for some time now, since I simply don't understand how those thinks can be kept legal at all. No one needs dialers. There are enough good ways for online payment. No one in their right mind can honestly belive, that anyone with a serious business would need any of that crap. It's a plain ripoff scheme.&lt;p&gt;</description>
    <category>General</category>
    <guid isPermaLink="true">http://blogs.sun.com/roller/page/lars/20041002#another_one_bytes_the_dust</guid>
    <pubDate>Sat, 2 Oct 2004 07:32:18 PDT</pubDate>
  </item>

The problem with the feed is that even though the RSS 2.0 specification provides an author element, and the Dublin Core RSS module has a dc:creator element which can be used in its stead, the Sun Bloggers RSS feed eschews directly identifying the author of each post in the feed.

The obvious benefit of identifying authors in a collaborative feed is that it lets the reader judge whether the writer is an authority on the topic at hand, or begin to ascribe authority to an author they were previously unaware of. There are also aggregator-specific benefits: readers could group or filter the items in the feed by author, which improves their reading experience.

A solution to this problem is for the webmaster of the Sun Bloggers site to use author or dc:creator elements to identify the authors of the various posts in the Sun Bloggers feed.
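As a rough sketch of what that could look like, here is the first item from above with authorship added. Everything beyond the original item is my own illustration: the email address is made up, the author name is merely guessed from the "ashish" in the permalink, and the dc:creator variant additionally requires declaring xmlns:dc="http://purl.org/dc/elements/1.1/" on the rss element.

 <item>
    <title>Something fishy...</title>
    <!-- RSS 2.0 author element; the spec expects an email address here -->
    <author>ashish@example.com (Ashish)</author>
    <!-- Dublin Core alternative that takes a plain name -->
    <dc:creator>Ashish</dc:creator>
    <category>General</category>
    <guid isPermaLink="true">http://blogs.sun.com/roller/page/ashish/20041002#something_fishy</guid>
    <pubDate>Sat, 2 Oct 2004 08:53:15 PDT</pubDate>
  </item>

Either element on its own would be enough for an aggregator to group or filter items by author.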


 

October 3, 2004
@ 06:10 PM

I just came into some invites for Wallop a couple of days ago but have been too busy with work to explore it so far. I have about half a dozen invites to give away but don't have any friends who I think would use the system enough to give Lili and co. the sort of feedback they want for their research project. If you are a friend of mine who would be interested in exploring and using Wallop, then ping me over email or respond to this blog posting.

By the way, if you are curious about what Wallop is, the quick description of the project is that it

is a research project of the Social Computing Group at Microsoft Research, exploring how people share media and build conversations in the context of social networks.

With any luck I'll get a chance to explore Wallop over the next few days and perhaps will post my thoughts on how the experience compares to other online communities targeted at the same niche.


 

October 2, 2004
@ 05:57 AM

From Len Bullard

>What's the silver bullet?

It's a bar in Phoenix.

From Tim Bray

I disagree with virtually every technical argument Ted Nelson has ever
made and (in most cases) the implementations are on my side, but it
doesn't matter; Ted's place in history is secure because he asked more
important questions than just about anybody.   I think he usually
offered the wrong answers, but questions are more important.

The thread that produced these gems is Ted Nelson's "XML is Evil", which revisits his classic rant Embedded Markup Considered Harmful.

 


 

Categories: XML

October 2, 2004
@ 05:46 AM

It's always interesting to me how the same event can be reported completely differently depending on who's reporting the news. For example, compare the headline US army massacres over 100 civilians in Iraq from Granma International, which begins

BAGHDAD, October 1 (PL).—, More than 100 Iraqi civilians have been killed and some 200 injured in Samarra and Sadr City today during the cruelest retaliatory operations that the US occupation forces have launched do date.

According to medical sources quoted by the Arab TV network Al Arabiya, 94 people died and another 180 were injured when soldiers from the US 1st Infantry Division attacked a civilian area of the city of Samarra with heavy weaponry. 

In the Sadr City district, located in west Baghdad, US soldiers massacred nine civilians during an operation to eliminate militia forces loyal to the wanted Islamic Shiite cleric Moqtada Al Sadr. Another three people were seriously injured.

to the Telegraph's report of the same events, entitled '100 rebels dead' after US troops storm Samarra, which begins

American forces have stormed the rebel-held town of Samarra, claiming more than 100 insurgents killed, as coalition forces try to establish control in the Sunni triangle.

The US military said 109 fighters and one US soldier were killed in the offensive. Doctors at Samarra's hospital, said 47 bodies were taken in, including 11 women and five children.

An Iraqi spokesman said 37 insurgents were captured. During the push, soldiers of the US 1st Infantry Division rescued Yahlin Kaya, a Turkish building worker being held hostage in the city.

The operation came after "repeated attacks" on government and coalition forces had made the town a no-go zone, the US military said. Samarra lies at the heart of the Sunni Arab belt north and west of Baghdad where many towns are under the control of insurgents.

So was it 100 civilians killed or 100 insurgents? The truth is probably somewhere in the middle. Iraq is becoming even more of a giant Messopotamia. So far I can see only two choices for the US in Iraq over the next year: pull out, or attempt to retake the country in force. Either way there's even more significant and unnecessary loss of life coming up.

All this because of some chicken hawks in the Bush administration...


 

Categories: Ramblings

September 29, 2004
@ 08:33 AM

The Bloglines press release entitled New Bloglines Web Services Selected by FeedDemon, NetNewsWire and Blogbot to Eliminate RSS Bandwidth Bottleneck has this interesting bit of news

Redwood City, Calif.--September 28, 2004 -- Three leading desktop news feed and blog aggregators announced today that they have implemented new open application programming interfaces (API) and Web Services from Bloglines (www.bloglines.com) that connect their applications to Bloglines' free online service for searching, subscribing, publishing and sharing news feeds, blogs and rich web content. FeedDemon (www.bradsoft.com), NetNewsWire (www.ranchero.com), and Blogbot (www.blogbot.com) are the first desktop software applications to use the open Bloglines Web Services.

Bloglines Web Services address a key issue facing the growing RSS market by reducing the bandwidth demands on sites serving syndicated news feeds. Now, instead of thousands of individual desktop PCs independently scanning news sources, blogs and web sites for updated feeds, Bloglines will make low-bandwidth requests to each site on behalf of the universe of subscribers and cache any updates to its master web database. Bloglines will then redistribute the latest content to all the individuals subscribed to those feeds via the linked desktop applications -- FeedDemon, NetNewsWire or Blogbot -- or via Bloglines' free web service.
...
Bloglines Web Services Enable Synchronization for Desktop News Aggregators "Our customers have been looking for the ability to synchronize their feed subscriptions across multiple computers," said Nick Bradbury, founder of Bradbury Software and creator of FeedDemon, the leading RSS aggregator for Windows. "By partnering with Bloglines, we are now able to offer the rich desktop functionality FeedDemon customers have come to expect, with the flexible mobility and portability of a web service."

There are two aspects of this press release I'm skeptical about. The first is the claim that having desktop aggregators fetch feeds from Bloglines instead of from the original sources somehow "eliminates the RSS bandwidth bottleneck". It seems to me that the Bloglines proposal does the opposite. Instead of thousands of desktop aggregators fetching tens of thousands to hundreds of thousands of feeds from as many websites, it is proposed that they all poll the Bloglines server. That seems to be creating a bottleneck, not eliminating one.

The second aspect of the proposal I call into question is the Bloglines Sync API. The information on this API is quite straightforward

The Bloglines Sync API is used to access subscription information and to retrieve blog entries. The API currently consists of the following functions:

  • listsubs - The listsubs function is used to retrieve subscription information for a given Bloglines account.
  • getitems - The getitems function is used to retrieve blog entries for a given subscription.

All calls use HTTP Basic authentication. The username is the email address of the Bloglines account, and the password is the same password used to access the account through the Bloglines web site.
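As a rough sketch of what a call looks like on the wire, a listsubs request is just an authenticated HTTP GET. Note that the host name below is from memory and should be checked against the Bloglines documentation, and the credentials are obviously made up; the Authorization header is simply the Base64 encoding of user@example.com:password.

  GET /listsubs HTTP/1.1
  Host: rpc.bloglines.com
  Authorization: Basic dXNlckBleGFtcGxlLmNvbTpwYXNzd29yZA==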

I was interested in using this API to round out the existing feed synchronization support in RSS Bandit. In current versions a user can designate a file share, WebDAV server or FTP server as the central location for synchronizing multiple instances of RSS Bandit. After reading the aforementioned press release I investigated what it would take to add Bloglines as a fourth synchronization point, and came to the conclusion that the API provided by Bloglines falls short of the functionality that exists in RSS Bandit today with the other synchronization sources.

The problems with the Bloglines Sync API include

  1. The Bloglines Sync API only allows clients to retrieve the subscribed feeds. The user has to log in to the Bloglines site to perform feed management tasks like adding, deleting or modifying the feeds to which they are subscribed.
  2. There is no granular mechanism to get or set the read/unread state of the items in the user's feed list. 

These limitations mean the Bloglines Sync API isn't a terribly useful way to synchronize between two desktop aggregators. Instead, it primarily acts as a way for Bloglines to use various desktop aggregators as a UI for viewing a user's Bloglines subscriptions, without the Bloglines team having to build a rich client application.

Thanks, but I think I'm going to pass.


 

September 27, 2004
@ 07:28 AM

Today my TiVo disappointed me for what I feel is the last time. Yesterday I set it to record four never-before-aired episodes of Samurai Jack. Sometime between 6:30 PM this evening and 11 PM the TiVo decided that recording suggestions was more important than keeping around one of the episodes of Samurai Jack.

For the past few months I've been disappointed with the TiVo's understanding of priority when keeping recordings. For example, it shouldn't delete a first-run episode over a rerun, especially when the Season Pass for the deleted episode is set to record only first runs. This is a pretty basic rule I'm sure I could write myself if I had access to the TiVo source code. This last mistake is the straw that has broken the camel's back, and I'm now seeking a replacement for TiVo.

I now realize I prefer an Open Source solution so I can hack it myself. Perhaps I should take a look at MythTV.


 

Categories: Technology

September 26, 2004
@ 07:10 PM

As the author of a news reader that supports RSS and Atom, I often have to deal with feeds designed by the class of people Mark Pilgrim described as assholes in his post Why specs matter. These are people who

read specs with a fine-toothed comb, looking for loopholes, oversights, or simple typos.  Then they write code that is meticulously spec-compliant, but useless.  If someone yells at them for writing useless software, they smugly point to the sentence in the spec that clearly spells out how their horribly broken software is technically correct

This is the first in a series of posts highlighting such feeds as examples to others of how not to design syndication feeds for a website. Feeds in this series will often be technically valid RSS/Atom feeds but for one or more reasons cause unnecessary inconvenience to authors and users of news aggregators.

This week's gem is the Cafe con Leche RSS feed. Instead of pointing out what is wrong with this feed myself, I'll let its author do so. On September 24th Elliotte Rusty Harold wrote

I've been spending a lot of time reviewing RSS readers lately, and overall they're a pretty poor lot. Latest example. Yesterday's Cafe con Leche feed contained this completely legal title element:

<title>I'm very pleased to announce the publication of XML in a Nutshell, 3rd edition by myself and W.
          Scott Means, soon to be arriving at a fine bookseller near you.
          </title>

Note the line break in the middle of the title content. This confused at least two RSS readers even though there's nothing wrong with it according to the RSS 0.92 spec. Other features from my RSS feeds that have caused problems in the past include long titles, a single URL that points to several stories, and not including more than one day's worth of news in a feed.

Elliotte is technically right: none of the RSS specs says that the <link> element in an RSS feed must be unique for each item, so he can reuse the same link for multiple items and still have a valid RSS feed. So why does this cause problems for RSS aggregators?

Consider the following RSS feed

<rss version="0.92">
  <channel>
    <title>Example RSS feed</title>
    <link>http://www.example.com</link>
    <description>This feed contains an example of how not to design an RSS feed</description>  
    <item>
      <title>I am item 1</title>    
      <link>http://www.example.com/rssitem</link>
    </item>
     <item>
      <title>I am item 2</title>    
      <link>http://www.example.com/rssitem</link>
    </item>
  </channel>
</rss>

Now consider the same feed fetched a few hours later

<rss version="0.92">
  <channel>
    <title>Example RSS feed</title>
    <link>http://www.example.com</link>
    <description>This feed contains an example of how not to design an RSS feed</description>  
    <item>
      <title>I am item one</title>    
      <link>http://www.example.com/rssitem</link>
    </item>
     <item>
      <title>I am item 3</title>    
      <link>http://www.example.com/rssitem</link>
    </item>
     <item>
      <title>I am item 2</title>    
      <link>http://www.example.com/rssitem</link>
    </item>
   </channel>
</rss>

Now how does an RSS aggregator tell whether the item titled "I am item 1" is the same item as "I am item one" with a typo in the title fixed, or a different item altogether? The simple answer is that it can't. A naive hack is to compare the content of the <description> element, but what happens when a typo in the description is fixed or its content is otherwise updated?

Every RSS aggregator has some sort of hack to deal with this problem. I describe them as hacks because there is no way an aggregator can determine with 100% accuracy whether items with the same link and no guid are the same item with its content changed or different items. This means the behavior of different aggregators with feeds such as the Cafe con Leche RSS feed is extremely inconsistent.

A solution to this problem is for Elliotte Rusty Harold to upgrade his RSS feed to RSS 2.0 and use guid elements to distinctly identify items.
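Using the example feed above, that would look something like the sketch below (the guid values are placeholders I made up). Since the link URL is shared between items, the guids are marked as not being permalinks; an aggregator can then match items across fetches by guid no matter how the titles or descriptions change.

    <item>
      <title>I am item one</title>
      <link>http://www.example.com/rssitem</link>
      <guid isPermaLink="false">http://www.example.com/rssitem#item1</guid>
    </item>
    <item>
      <title>I am item 2</title>
      <link>http://www.example.com/rssitem</link>
      <guid isPermaLink="false">http://www.example.com/rssitem#item2</guid>
    </item>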


 

September 26, 2004
@ 05:42 PM

As I mentioned in my post News Aggregators As Denial of Service Clients (part 2) 

the weblog software I use, dasBlog, does not support HTTP Conditional GET for comments feeds so I'm serving dozens of XML files to each user of Newzcrawler and SauceReader subscribed to my RSS feed every hour.

It also turned out that the newest version of dasBlog stopped supporting HTTP Conditional GET for category-specific feeds when I upgraded from 1.5 to 1.6. This meant I was wasting a huge amount of bandwidth, since thousands of RSS Bandit users are subscribed to the feed for my RSS Bandit category.
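For anyone unfamiliar with the mechanism: with HTTP conditional GET the client echoes back the Last-Modified date and/or ETag it received on the previous fetch, and a server that supports it can answer with a bodyless 304 instead of resending the whole feed. A rough sketch of the exchange (the path, host, date and ETag are all made-up placeholders):

  GET /feeds/rssbandit.xml HTTP/1.1
  Host: example.com
  If-Modified-Since: Mon, 27 Sep 2004 15:00:00 GMT
  If-None-Match: "abc123"

  HTTP/1.1 304 Not Modified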

I decided to download the dasBlog source code and patch my local instance. As I expected, it took longer to figure out how to configure ASP.NET and Visual Studio so I could compile my own blog software than it did to fix the problem. I guess that's a testament to how well the dasBlog code is written.

Mad props go out to Omar, Clemens and the rest of the dasBlog crew.


 

Categories: Ramblings

In her post Blog Activity Julia Lerman writes

There must be a few people who have their aggregators set to check rss feeds every 10 seconds or something. I very rarely look at my stats because they don't really tell me much. But I have to say I was a little surprised to see that there were over 14,000 hits to my website today (from 12am to almost 5pm).

So where do they come from?

10,000+ are from NewzCrawler then a whole lot of other aggregators and then a small # of browsers. 

This problem is due to the phenomenon originally pointed out by Phil Ringnalda in his post What's Newzcrawler Doing? and expounded on by me in my post News Aggregators As Denial of Service Clients. Basically 

According to the answer on the NewzCrawler support forums when NewzCrawler updates the channel supporting wfw:commentRss it first updates the main feed and then it updates comment feeds. Repeatedly downloading the RSS feed for the comments to each entry in my blog when the user hasn't requested them is unnecessary and quite frankly wasteful.  

I recently upgraded my web server to Windows Server 2003 because I was having problems with a limitation on the number of outgoing connections under Windows XP. Even so, I noticed that my web server was still getting overloaded with requests during hours of peak traffic. Checking my server logs, I found that another aggregator, Sauce Reader, had joined Newzcrawler in this extremely rude bandwidth-hogging behavior. This is compounded by the fact that the weblog software I use, dasBlog, does not support HTTP Conditional GET for comments feeds, so I'm serving dozens of XML files to each user of Newzcrawler and SauceReader subscribed to my RSS feed every hour.

I'm really irritated by this behavior and have considered banning Sauce Reader and Newzcrawler from fetching RSS feeds from my blog, because they significantly contribute to bringing down my site on weekday mornings when people first fire up their aggregators at work or at home. Instead, I'll probably end up patching my local install of dasBlog to support HTTP conditional GET for comments feeds when I get some free time. In the meantime I've tweaked some options in IIS that should reduce the number of times the server is inaccessible due to being flooded with HTTP requests.

This doesn't mean I think this feature of the aforementioned aggregators is something that should be encouraged. I just don't want to punish readers of my blog because of decisions made by the authors of their news reading software.


 

In a post entitled When will Scoble earn his Longhorn pay? Robert Scoble writes

The thing is that I don't have any credibility left when it comes to Longhorn. Over the last 18 months I got out there and lead lots of Longhorn cheers. And now there's a changing of direction.

Tons of people, both inside and outside of Microsoft, have been talking with me about where we're going now. I've met in the past week with the Avalon and WinFS teams (yes, they both still exist).

The thing is, I am super sensitive right now to making a whole new round of promises. I'd rather wait to talk until there's beta build to hand you. Why? Cause what good does it do to write about the feature set if you can't see it? And if you're a developer, you don't want to hear FUD, you wanna see working APIs.

Shortly after Robert joined Microsoft I sent him a link to Joel Spolsky's Mouth Wide Shut article because I thought he was going overboard in pimping Longhorn. Experience working with product teams at Microsoft had already taught me that until a technology is in beta, almost everything about it can change. For example, the plans my team had for what we were shipping in Whidbey two years ago are very different from what we planned to ship a year ago, which in turn is very different from what we plan to ship today. Features get cut all the time, priorities change, and then there's the date-driven release dance.

Microsoft has always had a credibility problem due to what people have termed vaporware announcements. Although many have assumed that the company does this maliciously, the truth of the matter is that a lot of these incidents are product teams prematurely announcing their plans to the world. Personally I think Microsoft's evangelists and marketing folks could do the company, our customers and the software industry in general a service by shutting the hell up about future product plans until they are more than a glimmer in some software architect's eye.

Borrowing a leaf from Apple doesn't sound so bad right about now.


 

Categories: Life in the B0rg Cube