Nightcrawler Thoughts: Thumbs Up, Thumbs Down and Attention.xml

March 24, 2005

@ 06:13 PM

While at ETech I got to spend about half an hour chatting with Steve Gillmor about what he's called "the attention problem" which isn't the same thing as the attention.xml specification. The attention problem is the problem that faces every power users of XML syndication clients such as RSS Bandit or Bloglines. It is so easy to subscribe to various feeds that eventually readers get overwhelmed by the flood of information hitting their aggregator's inbox. Some have used the analogy "drinking from a firehose" to describe this phenomenon.

This problem affects me as well which is the impetus for a number of features in the most recent release of RSS Bandit such as newspaper views which allow one to view all the unread posts in a feed in single pane, adding more sortable columns such as author and comment count to the list view, and skim mode ('mark all items as read on exiting a feed or category'). However the core assumption behind all these features is that the user is reading every entry.

Ideally a user should be able to tell a client, "Here are the sites I'm interested in, here are the topics I'm interested in, and now only show me stuff I'd find interesting or important". This is the next frontier of features for RSS/ATOM aggregators and an area I plan to invest a significant amount of time in for the next version of RSS Bandit.

In my post Some Opinions on the Attention.xml Specification I faulted the attention.xml specification because it doesn't seem to solve the problems it sets out to tackle and some of the data in the format is unrealistic for applications to collect. After talking to Steve Gillmor I realize another reason I didn't like the attention.xml spec; it ignores all the hard problems and assumes they've been solved. Figuring out what data or what algorithms are useful for determining what items are relevant to a user is hard. Using said data to suggest new items to the user is hard. Coming up with an XML format for describing an arbitrary set of data that could be collected by an RSS aggregator is easy.

There are a number of different approaches I plan to explore over the next few months in various alphas of the Nightcrawler release of RSS Bandit. My ideas have run the gamut from using Bayesian filtering to using the Technorati link cosmos feature for weighting posts [in which case I'd need batch methods which is something I briefly discussed with Kevin Marks at Etech last week]. There is also weighting by author that needs to be considered, for example I read everything written by Sam Ruby and Don Box. Another example is a topic that may be mundane (e.g. what I had for lunch) and something I'd never read if published by a stranger but would be of interest to me if posted by a close friend or family member.

We will definitely need a richer extensibility model so I can try out different approaches [and perhaps others can as well] before the final release. Looks like I have yet another spring and summer spent indoors hacking on RSS Bandit to look forward to. :)