The Perils of Premature Standardization: Attention Data and OPML

November 20, 2005

@ 08:01 PM

Nick Bradbury has a post entitled An Attention Namespace for OPML, where he writes:

In a recent post I said that OPML would be a great format for sharing attention data, but I wasn't sure whether this would be possible due to uncertainty over OPML's support for namespaces.
...
As I mentioned previously, FeedDemon already stores attention data in OPML, but it uses a proprietary fd: namespace which relies on attributes that make little sense outside of FeedDemon. What I propose is that aggregator users and developers have an open discussion about what specific attention data could (and should) be collected by aggregators.
Although there's a lot of attention data that could be stored in OPML, my recommendation is that we keep it simple - otherwise, we risk seeing each aggregator support a different subset of attention data. So rather than come up with a huge list of attributes, I'll start by recommending a single piece of attention data: rank.

We need a way to rank feeds that makes sense across aggregators, so that when you export OPML from one aggregator, the aggregator you import into would know which feeds you're paying the most attention to. This could be used for any number of things - recommending related feeds, giving higher ranked feeds higher priority in feed listings, etc.

Although user interface and workflow differences require each aggregator to have its own algorithm for ranking feeds, we should be able to define a ranking attribute that makes sense to every aggregator. In FeedDemon's case, a simple scale (say, 0-100) would work: feeds you rarely read would be ranked closer to zero, while feeds you read all the time would be ranked closer to 100. Whether this makes sense outside of FeedDemon remains to be seen, so I'd love to hear from developers of other aggregators about this.
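To make the proposal concrete, here is a minimal sketch of how an importing aggregator might read a namespaced rank attribute from exported OPML. Everything specific in it is an assumption for illustration: the namespace URI, the att: prefix, the rank attribute name and the feed URLs are hypothetical, and none of them comes from Nick's post or from FeedDemon's fd: namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; no such namespace has been agreed on.
ATT_NS = "http://example.org/attention"

# Hypothetical export: two feeds carry a 0-100 rank, one carries none.
OPML = """<?xml version="1.0"?>
<opml version="1.1" xmlns:att="http://example.org/attention">
  <head><title>Subscriptions</title></head>
  <body>
    <outline text="Feed A" xmlUrl="http://example.org/a.xml" att:rank="87"/>
    <outline text="Feed B" xmlUrl="http://example.org/b.xml" att:rank="12"/>
    <outline text="Feed C" xmlUrl="http://example.org/c.xml"/>
  </body>
</opml>"""


def read_ranks(opml_text):
    """Map each feed URL to its rank, or None when no rank was exported."""
    root = ET.fromstring(opml_text)
    ranks = {}
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url is None:
            continue  # a folder or other non-feed outline
        # ElementTree exposes namespaced attributes as "{uri}name".
        raw = outline.get("{%s}rank" % ATT_NS)
        ranks[url] = int(raw) if raw is not None else None
    return ranks


ranks = read_ranks(OPML)
```

An aggregator that does not understand the namespace can ignore the attribute, which is the main appeal of carrying attention data this way.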

I used to be the program manager responsible for a number of XML technologies in the .NET Framework while I was on the XML team at Microsoft. The technology I spent the most time working with was the XML Schema Definition Language (XSD). After working with XSD for about three years, I came to the conclusion that XSD has held back the proliferation and advancement of XML technologies by about two or three years. The lack of adoption of web services technologies like SOAP and WSDL on the world wide web is primarily due to the complexity of XSD. The fact that XQuery has spent over 5 years in standards committees and has evolved to become a technology too complex for the average XML developer is also primarily the fault of XSD. This is because XSD is extremely complex and yet is rather inflexible with minimal functionality. This state of affairs is primarily due to its nature as a one-size-fits-all technology with too many contradictory design objectives. In my opinion, the W3C XML Schema Definition language is a victim of premature standardization. The XML world needed to experiment more with various XML schema languages like XDR and RELAX NG before we decided to settle down and come up with a standard.

So what does this have to do with attention data and XML? Lots. We are a long way from standardization. We aren't even well into the experimentation stage yet. How many feed readers do a good job of giving you an idea of which among the various new items in your RSS inbox are worth reading? How many of them do a good job suggesting new feeds for you to read based on your reading habits? Until we get to a point where such features are commonplace in feed readers, it seems like putting the cart way before the horse to start talking about standardizing the XML representation of these features.

Let's look at the one field Nick talks about standardizing: rank. He wants all readers to track 'rank' using a numeric scale of 0-100. This seems pretty arbitrary. In RSS Bandit, users can flag posts as Follow Up, Review, Read, Reply or Forward. How does that map to a numeric scale? It doesn't. If I allowed users to prioritize feeds, it wouldn't be in a way that would map cleanly to a numeric scale.

My advice to Nick and others who are entertaining ideas around standardizing attention data in OPML: go build some features first and see which ones work for end users and which ones don't. Once we've figured that out amongst multiple readers with diverse user bases, then we can start thinking about standardization.

Categories: RSS Bandit | Syndication Technology


Sunday, 20 November 2005 22:30:44 (GMT Standard Time, UTC+00:00)

I don't disagree, I *can't* disagree. I've not had anything remotely like your experience with XSD but the time I spent with the stuff led to very similar conclusions.

But I'd add that standardizing on domain-specific languages at the syntax/schema level is generally painful over time because languages are generally brittle at that level. A more agile approach would be to use a common substrate language on top of the syntax and remove/add parts as they are found to not work or be needed. Maybe there is a way to get that kind of flexibility with RELAX NG, but until I see it in practice, personally I'll be sticking to RDF.

Danny

Monday, 21 November 2005 07:22:03 (GMT Standard Time, UTC+00:00)

"How many feed readers do a good job of giving you an idea of which among the various new items in your RSS inbox are worth reading? How many of them do a good job suggesting new feeds for you to read based on your reading habits? Until we get to a point where such features are common place in feed readers"
- Amen! I can't wait until it gets to that point, and I really think that will be when 'Web 2.0' takes users beyond flashy AJAX apps and into real internet strength...

Chris

Monday, 21 November 2005 17:59:31 (GMT Standard Time, UTC+00:00)

Wow, I am so glad you posted this. It is right on about XSD and about how a new OPML type of thing will best come out of experience rather than standardization beforehand. May the best feed readers lead the way!

Ben Bryant

Monday, 21 November 2005 18:50:07 (GMT Standard Time, UTC+00:00)

Doomed to fail for the simple reason that brownnosers are pushing OPML when it brings no benefit at all.

My advice?

Start with the "attention" tag, from there build the format.
Add as many tags and attributes as needed, organize, test, redesign, then start trimming unused or redundant info.

Apply the 80/20 rule and then create namespaces for very specific scenarios.

Of course the big pushers won't like this option, because they want their names in flashy lights when this hits the streets.

My name? It doesn't matter more than the standard itself.

The first tag is the wrong tag, it should start with "ATTENTION" not OPML.

Doomed

Monday, 21 November 2005 19:04:33 (GMT Standard Time, UTC+00:00)

We don’t have to define the standard right away, but there’s no reason not to discuss it right now since some aggregators are already experimenting with attention, and they're each using proprietary extensions for storing that data.

Why not start by talking about what attention data you’d like to collect in RSS Bandit? If 'rank' is arbitrary to you, then what would be useful?

Nick Bradbury

Saturday, 26 November 2005 03:08:07 (GMT Standard Time, UTC+00:00)

Why are people so obsessed with _collecting_ attention data rather than _using_ attention data? If you start out by building features that are useful for real people, you can derive the attention data you collect from the things that real people find useful about the different kinds of features you've tried out. Personally I'm subscribed to nearly 7k feeds and my aggregator has no problem showing me what's the most interesting out of that mass of feeds. Why? I have 2 years' worth of 'attention' data based on real usage and very simple scoring. It works.

It's only when you get to this point (where you have lots of raw data in whatever format and some knowledge of what helps end users achieve their goals) that you can start thinking about all-singing all-dancing universal interchange formats. Till then this is all just hand-waving rather than tackling the important questions.

Here's an example of the very crude data I have on my laptop installation of Aggrevator.
Dare's blog: 360 unread posts (including duplicates), the oldest post (which happens to be unread) dates from 2005-07-03, score is: 468.

This means that when I start up my aggregator his blog is number 14.

That's not terribly detailed and I haven't done much in the way of analysis about the distribution of scores between the various blogs. However the very simple scoring algorithm I use (1 point for every article read or link clicked with 10 points added/removed for articles I particularly like/dislike) is enough to sort all 7000 feeds into the order of preference.
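A minimal sketch of that scoring rule, assuming attention is logged as (feed, event) pairs. The event names, function names and sample feeds are hypothetical and are not taken from Aggrevator; only the point values (1 per article read or link clicked, 10 added or removed for a like or dislike) come from the description above.

```python
from collections import defaultdict

READ_OR_CLICK_POINTS = 1   # per article read or link clicked
LIKE_DISLIKE_POINTS = 10   # added for a like, removed for a dislike


def score_feeds(events):
    """Total the score per feed from an iterable of (feed, event) pairs."""
    scores = defaultdict(int)
    for feed, event in events:
        if event in ("read", "click"):
            scores[feed] += READ_OR_CLICK_POINTS
        elif event == "like":
            scores[feed] += LIKE_DISLIKE_POINTS
        elif event == "dislike":
            scores[feed] -= LIKE_DISLIKE_POINTS
        else:
            raise ValueError("unknown event: %r" % (event,))
    return dict(scores)


def order_by_preference(scores):
    """Return feed names sorted from highest score to lowest."""
    return sorted(scores, key=lambda feed: scores[feed], reverse=True)


# Hypothetical usage log for three feeds.
events = [
    ("feed-a", "read"), ("feed-a", "click"),
    ("feed-b", "like"),
    ("feed-c", "dislike"), ("feed-c", "read"),
]
scores = score_feeds(events)
ordering = order_by_preference(scores)
```

Because the score is just a running total, it needs no agreed scale: the ordering is all the aggregator uses.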

ade

Saturday, 26 November 2005 03:16:02 (GMT Standard Time, UTC+00:00)

Oops. Make that
"the oldest post (which happens to be unread) dates from 2004-07-03".

ade


Dare Obasanjo's weblog

"You can buy cars but you can't buy respect in the hood" - Curtis Jackson
