While at ETech I got to spend about half an hour chatting with Steve Gillmor about what he's called "the attention problem", which isn't the same thing as the attention.xml specification. The attention problem is the problem that faces every power user of XML syndication clients such as RSS Bandit or Bloglines. It is so easy to subscribe to various feeds that readers eventually get overwhelmed by the flood of information hitting their aggregator's inbox. Some have used the analogy of "drinking from a firehose" to describe this phenomenon.

This problem affects me as well, which is the impetus for a number of features in the most recent release of RSS Bandit, such as newspaper views which allow one to view all the unread posts in a feed in a single pane, adding more sortable columns such as author and comment count to the list view, and skim mode ('mark all items as read on exiting a feed or category'). However, the core assumption behind all these features is that the user is reading every entry.

Ideally, a user should be able to tell a client, "Here are the sites I'm interested in, here are the topics I'm interested in, and now only show me stuff I'd find interesting or important". This is the next frontier of features for RSS/Atom aggregators and an area I plan to invest a significant amount of time in for the next version of RSS Bandit.

In my post Some Opinions on the Attention.xml Specification I faulted the attention.xml specification because it doesn't seem to solve the problems it sets out to tackle, and some of the data in the format is unrealistic for applications to collect. After talking to Steve Gillmor I realized another reason I didn't like the attention.xml spec: it ignores all the hard problems and assumes they've been solved. Figuring out what data or what algorithms are useful for determining which items are relevant to a user is hard. Using that data to suggest new items to the user is hard. Coming up with an XML format for describing an arbitrary set of data that could be collected by an RSS aggregator is easy.

There are a number of different approaches I plan to explore over the next few months in various alphas of the Nightcrawler release of RSS Bandit. My ideas have run the gamut from using Bayesian filtering to using the Technorati link cosmos feature for weighting posts [in which case I'd need batch methods, which is something I briefly discussed with Kevin Marks at ETech last week]. Weighting by author also needs to be considered; for example, I read everything written by Sam Ruby and Don Box. Another example is a topic that may be mundane (e.g. what I had for lunch) and that I'd never read if published by a stranger, but that would be of interest to me if posted by a close friend or family member.
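
To make that concrete, here is a minimal sketch (in Python rather than RSS Bandit's C#, and with made-up names like AUTHOR_WEIGHTS and score_item) of how an aggregator might blend a per-author weight with a keyword-interest score when ranking unread items; a real Bayesian filter would replace the hand-tuned keyword lookup with probabilities learned from what the user actually reads.

```python
# Hypothetical sketch, not RSS Bandit code: combine per-author weights with a
# simple keyword-interest score to rank unread items.
from dataclasses import dataclass

@dataclass
class FeedItem:
    title: str
    author: str
    body: str

# Authors whose posts should always rank highly (user-configured).
AUTHOR_WEIGHTS = {"Sam Ruby": 5.0, "Don Box": 5.0}

# Topics the user has declared an interest in, with relative weights.
TOPIC_WEIGHTS = {"xml": 1.0, "rss": 1.0, "syndication": 2.0}

def score_item(item: FeedItem) -> float:
    """Higher scores mean the item is more likely to interest the user."""
    score = AUTHOR_WEIGHTS.get(item.author, 0.0)
    text = (item.title + " " + item.body).lower()
    for topic, weight in TOPIC_WEIGHTS.items():
        if topic in text:
            score += weight
    return score

items = [
    FeedItem("What I had for lunch", "A Stranger", "sandwich"),
    FeedItem("Thoughts on syndication", "Sam Ruby", "some xml musings"),
]
for item in sorted(items, key=score_item, reverse=True):
    print(f"{score_item(item):4.1f}  {item.title}")
```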

We will definitely need a richer extensibility model so I can try out different approaches [and perhaps others can as well] before the final release. Looks like I have yet another spring and summer spent indoors hacking on RSS Bandit to look forward to. :)

Thursday, March 24, 2005 8:57:38 PM (GMT Standard Time, UTC+00:00)
Now we're getting into some interesting stuff. This is exactly the type of research I'm interested in helping with. The final solution will probably be some combination of a rules-based system and Bayesian filtering, with a sprinkle of collaborative filtering.
Friday, March 25, 2005 12:06:19 AM (GMT Standard Time, UTC+00:00)
[Sorry about the length of this comment. I don't actually have a blog to post this to :) ]

I think it's great to go chasing after things like Bayesian filtering, collaboration/tagging, whatever. Some very powerful functionality will come from that, eventually. On the other hand, it's a really hard problem, and if you go down that route first we probably won't ever get to actually use it!

Assuming many methods (including, but not limited to, attention.xml) contribute to the generation of a set of items you should read ... what IS the result, actually? I'm assuming it's as simple as a list of [fully qualified item + priority] pairs. The RSS reader figures out how best to group them, sort them, render them, navigate between them, etc. in user-friendly ways.
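
As a rough illustration (the field names here are my own guess, not a defined format), such a pair might be nothing more than:

```python
# Illustrative only: the "list of [fully qualified item + priority] pairs"
# an attention engine might hand to the RSS reader.
from typing import NamedTuple

class PrioritizedItem(NamedTuple):
    feed_url: str    # together with item_id, fully qualifies the item
    item_id: str     # e.g. the entry's guid or atom:id
    priority: float  # higher = read sooner

reading_list = [
    PrioritizedItem("http://example.org/blog/feed.rss", "guid-123", 0.9),
    PrioritizedItem("http://example.org/other/feed.rss", "guid-456", 0.2),
]
# The reader is then free to group, sort and render these however it likes.
reading_list.sort(key=lambda p: p.priority, reverse=True)
```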

Given that, why not start easy? How about figuring out a priority at the *feed* level before trying at the item level?

Here's what I did: I took my top-level feed folders ("Coding", "Fun", "Opinion", "Friends", etc.) and split them up by priority. Now I have "1-Coding", "2-Coding", "3-Coding", "1-Fun", "2-Fun", etc. When I subscribe to a feed I stick it where I think it belongs. If I find myself keeping up with it regularly, I bump it up. If I find it filling with hundreds of unread posts, I bump it down.

It works very well, with the only downside being a little maintenance headache because RSS readers are not designed to work exactly this way. Also, it completely cured me of the feeling of "oh no I'm falling behind" - every once in a while I get a chance to go into my "archives" and confirm why I'm not paying attention to the low-priority feeds, or re-evaluate.

I think it would deliver incredible bang for the buck if we simply had a way to assign priorities to feeds in Wolverine/Bloglines/whatever, and a few interesting ways to sort them in folder lists ("sort by Name-Priority" and "sort by Priority-Name", at least).
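
A quick sketch of that feed-level scheme (the Feed class and sample data are illustrative, not an actual RSS Bandit or Bloglines API):

```python
# Illustrative sketch of feed-level priorities and the two sort orders
# suggested above.
from dataclasses import dataclass

@dataclass
class Feed:
    name: str
    folder: str    # e.g. "Coding", "Fun", "Friends"
    priority: int  # 1 = highest; bumped up or down by the user over time

feeds = [
    Feed("Sam Ruby", "Coding", 1),
    Feed("Some quiet blog", "Coding", 3),
    Feed("A friend's journal", "Friends", 2),
]

# "sort by Priority-Name": highest-priority feeds first, then by folder and name.
by_priority_name = sorted(feeds, key=lambda f: (f.priority, f.folder, f.name))
# "sort by Name-Priority": keep folders together, order by priority within each.
by_name_priority = sorted(feeds, key=lambda f: (f.folder, f.priority, f.name))
```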

This is very compatible with Bayesian filtering or whatever other fanciful ideas people may have. First off, the "bump-up" and "bump-down" can be automated or at least recommended. Next, new feeds can be automatically added at a medium priority; recommended feeds could even be added automatically. Hard statistics such as post frequency could be combined with soft statistics such as your interest level in the topic, through configurable formulas. ETC.
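
Purely as an illustration of such a configurable formula (the thresholds and names are mine, not anything proposed above), combining a hard statistic with a soft one might look like:

```python
# Hypothetical rule: combine a hard statistic (posts per week) with a soft one
# (the fraction of the feed's items the user actually opens) to suggest a bump.
def recommended_priority(posts_per_week: float, read_ratio: float,
                         current_priority: int) -> int:
    if read_ratio > 0.75:                         # user keeps up with nearly everything
        return max(1, current_priority - 1)       # bump up (1 = highest priority)
    if read_ratio < 0.10 and posts_per_week > 20:
        return current_priority + 1               # noisy feed the user ignores: bump down
    return current_priority

print(recommended_priority(posts_per_week=30, read_ratio=0.05, current_priority=2))  # 3
```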

Meanwhile work can proceed on an item-level version of the same thing. I tend to agree with Dare that attention.xml does not solve the "attention" problem. The questions it attempts to answer are definitely not ones that I've asked myself:

How many sources of information must you keep up with?
-> as many as I want to add

Tired of clicking the same link from a dozen different blogs?
-> nope, because I get different opinions and don't actually click the link after the first time

RSS readers collect updates, but with so many unread items, how do you know which to read first?
-> because the feeds have a higher priority
Steve Eisner
Friday, March 25, 2005 12:09:28 AM (GMT Standard Time, UTC+00:00)
BTW, just want to say thanks for writing a Priority 1 blog. ;)
Steve Eisner
Saturday, March 26, 2005 12:53:14 AM (GMT Standard Time, UTC+00:00)
It's always interesting to watch other people come to this same realisation and then try the same obvious things I tried when I started on my RSS aggregator. If you take a look at Aggrevator (http://www.oshineye.com/software/aggrevator.html) you can see an attempt at building an aggregator that lets users rate the things they like and, simply by recording their behaviour, lets the things they're most interested in float up to the top.

It doesn't use Bayesian filtering, latent semantic indexing, contextual network graphs or anything fancy, because simple scoring gets you a long way. Ask any Gnus user.
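
A minimal sketch of that kind of simple, Gnus-style adaptive scoring (all names here are illustrative): record what the user actually does with items and let the totals accumulate per author.

```python
# Illustrative only: accumulate a score per author from observed behaviour,
# then let high-scoring authors float to the top of the unread list.
from collections import defaultdict

scores = defaultdict(int)

def record(author, action):
    """Adjust the author's score based on what the user did with an item."""
    deltas = {"read": 1, "starred": 5, "deleted_unread": -2}
    scores[author] += deltas.get(action, 0)

record("Sam Ruby", "read")
record("Sam Ruby", "starred")
record("A Stranger", "deleted_unread")

print(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```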

Speaking as someone who has been interested in this for a _long_ time (I've been on Advogato for years, at least partly because it's one of the few successful applications of this kind of technology), I would love it if someone found a way to give users more powerful ways of filtering large amounts of data. Unfortunately, everyone I've seen investigate this has been sucked into either the RDF or the Bayesian black hole, where they spend ages on complex computer science and seldom emerge with anything useful.

A successful solution needs to first of all get the user experience right. You'd need answers to questions like:
- does this thing only work from my dataset or is it collaborative?
- what options are available if this thing makes decisions I dislike?
- how do I reinforce its decisions?
- how do I tell it that there are things I ought to like but which I hate?

Good luck.
Saturday, April 2, 2005 10:35:17 PM (GMT Daylight Time, UTC+01:00)
Blog software could also be helpful in this next wave you're describing, Dare. For example, I modded the blog software we're using for search so that a virtually unlimited number of Feedster-style keyword RSS feeds could be generated. It was a relatively basic script modification which also worked with A9 columns, so I was able to kill two birds with one stone.

This way readers can subscribe to only the keywords/topics they are interested in. I wish all blog software made that feature easily available to bloggers. Of course, it would still require bloggers to choose to enable the functionality.
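
For what it's worth, the idea behind those keyword feeds can be sketched in a few lines (this is my own illustration, not the actual script): filter the blog's posts by a search term and emit a per-keyword RSS document that readers can subscribe to.

```python
# Illustrative sketch of a Feedster-style keyword feed: only posts matching the
# keyword end up in the generated RSS document.
from xml.sax.saxutils import escape

posts = [
    {"title": "Hacking on RSS Bandit", "link": "http://example.org/1", "body": "aggregator plans"},
    {"title": "Lunch notes", "link": "http://example.org/2", "body": "sandwich"},
]

def keyword_feed(keyword):
    """Return a minimal RSS 2.0 document containing only matching posts."""
    matches = [p for p in posts if keyword.lower() in (p["title"] + " " + p["body"]).lower()]
    items = "".join(
        f"<item><title>{escape(p['title'])}</title><link>{p['link']}</link></item>"
        for p in matches
    )
    return ('<?xml version="1.0"?><rss version="2.0"><channel>'
            f"<title>Posts matching '{escape(keyword)}'</title>{items}</channel></rss>")

print(keyword_feed("rss"))
```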

This next wave of aggregators has great potential for users :)