The problem with Google Reader was that by the time you solve all of the problems with the RSS experience, you’ve effectively invented Twitter.

And Jack Dorsey already did that.

Now Playing: Lil Wayne - Love Me (featuring Drake & Future)


 

Recently I came across two blogs I thought were interesting and would love to follow regularly: Chris Dixon's blog and the Inside Windows Live blog. What surprised me was that my first instinct was to see if they were on Twitter instead of adding their RSS feeds to my favorite RSS reader. I thought this was interesting and decided to analyze the internal thought process that led me to prefer following blogs via Twitter over consuming their RSS feeds in Google Reader + RSS Bandit.

I realized it comes down to two things: one I’ve mentioned before and a second that dawned on me recently.

  1. The first problem is that the user experience around consuming feeds in traditional RSS readers, which take their design cues from email readers, is all sorts of wrong. I’ve written about this previously in my post The Top 5 Reasons RSS Readers Went Wrong. Treating every blog post as important enough that I have to view the entire content and explicitly mark it as read is wrong. Not providing a consistent mechanism to give the author feedback or easily reshare the content is archaic in today’s world. And so on.

  2. The mobile experience for consuming Twitter streams is all sorts of awesome. I currently use Echofon to consume Twitter on my phone and have used Twitterrific, which is also excellent. I’ve also heard people say lots of good things about Tweetie. On the other hand, I haven’t found a great mobile application for consuming RSS feeds on my phone, which may be a consequence of #1 above.

So I’ve been thinking about how to make my RSS experience more like my Twitter experience given that not all the blogs I read are on Twitter or will ever be on the service. At first I flirted with building a tool that automatically creates a Twitter account for a given RSS feed but backed away from that when I remembered that the Twitter team hates people using it as a platform for rebroadcasting RSS feeds.

I realized that what I really need is a Twitter application that also understands RSS feeds and shows them in the same stream. I may have been fine with this being a new app on the Web, but I don’t want to lose the existing Twitter clients on my mobile phone. So I really want a web app that shows me a merged Twitter/RSS stream and that exposes the Twitter API so I can point apps like Echofon/Twitterrific/Tweetie at it.
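
To make the idea concrete, here’s a rough sketch of such a service in Python, assuming Flask and feedparser; the feed URL is a placeholder, and a real version would also have to proxy the rest of the Twitter API so mobile clients could authenticate against it:

import feedparser
from flask import Flask, jsonify

app = Flask(__name__)

# blogs I want in my stream that aren't on Twitter (URL is an example)
FEEDS = ["http://example.org/blog/feed.atom"]

@app.route("/statuses/home_timeline.json")
def home_timeline():
    # fold RSS/Atom entries into the stream as if they were tweets
    tweets = []
    for url in FEEDS:
        parsed = feedparser.parse(url)
        source = parsed.feed.get("title", url)
        for entry in parsed.entries:
            tweets.append({
                "created_at": entry.get("published", ""),
                "text": "%s %s" % (entry.get("title", ""), entry.get("link", "")),
                "user": {"screen_name": source},
                "sort_key": entry.get("published_parsed") or (),
            })
    # a real implementation would interleave the user's actual Twitter
    # stream here before sorting; that part is omitted
    tweets.sort(key=lambda t: t["sort_key"], reverse=True)
    for tweet in tweets:
        del tweet["sort_key"]
    return jsonify(tweets)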

As I thought about which web app is closest to doing this today, I landed on Brizzly and Seesmic Web. These sites are currently slightly different web interfaces to the Twitter service which [at least to me] haven’t provided enough value above and beyond the Twitter website to use on a regular basis. Being able to consume both my RSS feeds and my Twitter stream on such services would not only serve as a differentiator between them and other Twitter web clients but would also be functionality that Twitter wouldn’t be able to make obsolete given their stated dislike of RSS content on their service.

I’d write something myself except that I doubt the authors of Twitter mobile apps will be interested in making it easy to consume a Twitter stream from sites other than http://www.twitter.com unless lots of their users ask for this feature, which will only happen if services like Brizzly, Seesmic Web and others start providing a reason to consume Twitter-like streams from non-Twitter sources.


 

There were a couple of posts written this past weekend about services beginning to expose their functionality using the Twitter API and how this marks the rise of Twitter as a de facto standard for microblogging (or whatever we're calling it these days).

The first post I saw on this topic was from Fred Wilson in his post Open APIs and Open Standards where he writes

As Dave Winer has been pointing out in recent weeks, there is something quite interesting happening in the blogging/microblogging world.

First WordPress allowed posting and reading wordpress blogs via the Twitter API.

Then yesterday our portfolio company Tumblr did the same.

John Borthwick has been advising companies for a while now to build APIs that mimic the Twitter API. His reasoning is that if your API looks and feels similar to the Twitter API then third party developers will have an easier time adopting it and building to it. Makes sense to me.

But what Wordpress and Tumblr have done is a step farther than mimicking the API. They have effectively usurped it for their own blogging platforms. In the case of Tumblr, they are even replicating key pieces of their functionality in it

Anil Dash quickly followed up by declaring The Twitter API is Finished. Now What? and stating

Twitter's API has spawned over 50,000 applications that connect to it, taking the promise of fertile APIs we first saw with Flickr half a decade ago and bringing it to new heights. Now, the first meaningful efforts to support Twitter's API on other services mark the maturation of the API as a de facto industry standard and herald the end of its period of rapid fundamental iteration.

From here, we're going to see a flourishing of support for the Twitter API across the web, meaning that the Twitter API is finished. Not kaput, complete. If two companies with a significant number of users that share no investors or board members both support a common API, we can say that the API has reached Version 1.0 and is safe to base your work on. So now what?

This is a pattern that repeats itself regularly in the software industry; companies roll their own proprietary APIs or data formats in a burgeoning space until one or two leaders emerge and then the rest of the industry quickly wants to crown a winning data format or API to prevent Betamax vs. VHS style incompatibility woes for customers and developers.

Given that this is a common pattern, what can we expect in this instance? There are typically two outcomes when the clamoring for a company's proprietary platform or data format to become an industry standard reaches a fever pitch. The first outcome is similar to what Anil Dash and Fred Wilson have described. Some competitors or related companies adopt the format or API as-is to take advantage of the ecosystem that has sprung up around the winning platform. This basically puts the company (Twitter in this case) in a spot where they either have to freeze the API or bear the barbs of the community if they ever try to improve the API in a backwards incompatible way.

The problem with freezing the API is that once it becomes a de facto standard, all sorts of folks will show up demanding that it do more than it was originally expected to do, since they can't ship their own proprietary solutions now that there is a "standard". This is basically what happened during the RSS vs. Atom days when Dave Winer declared that RSS is Frozen. What ended up happening was that a lot of people felt that RSS and its sister specifications, such as the MetaWeblog API, were not the final word in syndicating and managing content on the Web. Dave Winer stuck to his guns and people were left with no choice but to create a conflicting de jure standard to compete with the de facto standard that was RSS. So Atom vs. RSS became the XML syndication world's Betamax vs. VHS or Blu-Ray vs. HD-DVD.

As a simple thought experiment, what happens if Twitter goes along with the idea that their API is some sort of de facto standard for microcontent delivered in real-time streams? What happens when a company like Facebook decides to adopt this API but needs it to be expanded because it doesn't support Facebook's features? And what if they need the API to be constantly updated since they add new features to Facebook at a fairly rapid clip? Amusingly enough, there are already people preemptively flaming Facebook for not abandoning their API and adopting Twitter's even though it is quite clear to any observer that Facebook's API predates Twitter's, has more functionality and is supported by more applications & websites.

Things get even more interesting if Facebook actually did decide to create its own fork or "profile" of the Twitter API due to community pressure to support its scenarios. Given how this has gone down in the past, such as the conflict between Dave Winer and the RSS Advisory Board or more recently Eran Hammer-Lahav's strong negative reaction to the creation of OAuth WRAP, which he viewed as a competitor to OAuth, it is quite likely that a world where Facebook or someone else with more features than Twitter adopted Twitter's API wouldn't necessarily lead to everyone singing Kumbaya.

Let's say Twitter decides to take the alternate road and ignores this hubbub since the last thing a fast moving startup needs is to have their hands tied by a bunch of competitors telling them they can't innovate in their API or platform any longer. What happens the first time they decide to break their API or even worse deprecate it because it no longer meets their needs? That isn't far fetched. Google deprecated the Blogger API in favor of GData (based on the Atom Publishing Protocol) even though Dave Winer and a bunch of others had created a de facto standard around a flavor of the API called the MetaWeblog API. About two weeks ago Facebook confirmed that they were deprecating a number of APIs used for interacting with the news feed. What happens to all the applications that considered these APIs to be set in stone? It is a big risk to bet on a company's platform plans even when they plan to support developers let alone doing so as a consequence of a bunch of the company's competitors deciding that they want to tap into its developer ecosystem instead of growing their own.

The bottom line is that it isn't as simple as saying "Twitter is popular and it's API is supported by lots of apps so everyone needs to implement their API on their web site as well". There are lots of ways to create standards. Crowning a company's proprietary platform as king without their participation or discussion in an open forum is probably the worst possible way to do so.

Now Playing: Eminem - Hell Breaks Loose


 

Last week John Panzer, who works on Blogger at Google, wrote about some of the work he’s been doing on creating a protocol for syndicating comments associated with activity streams in his post The Salmon Protocol: Introducing the Salmon Project. Key parts of his post are excerpted below

A few days ago, at the Real Time Web Summit, we had a session about Salmon, a protocol for re-aggregated distributed conversations around web content.  I was hoping for some feedback and to generate some interest, and I was overwhelmed by the positive reactions, especially after Louis Gray's post "Proposed Salmon Protocol aims to unify Conversations on the Web". Adina Levin's "Salmon - Re-assembling distributed conversations" is a good, insightful review as well. There's clearly a great deal of interest in this, and so I've gone ahead and expanded Salmon's home at salmon-protocol.org with an open source project, salmon-protocol.googlecode.com, and a mailing list, groups.google.com/group/salmon-protocol.

Louis Gray’s post on the topic includes an embedded presentation which captures the essence of the protocol

Before talking about the technical details of the protocol, it is a good idea to understand the end user problem it solves. For me, it addresses a problem with the way RSS Bandit integrates with Facebook. Although there is a way to get regular updates on changes to the user’s news feed by polling Facebook’s stream and getting data back in the Activity Streams format, there isn’t a mechanism today to get updates on the comments on items in the feed. In practice this means that once an item rolls off of the news feed, there is no way to keep its comments up to date in RSS Bandit.

The Salmon Protocol aims to address this by piggybacking on PubSubHubbub so that applications can get real-time updates on comments on items in an activity stream, not just updates on new activities.
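
As a rough sketch of the subscription half of that flow, here is what a PubSubHubbub subscription request looks like in Python using the requests library; all of the URLs are hypothetical, and per the PubSubHubbub spec the hub verifies the callback with a GET challenge before activating the subscription:

import requests

HUB_URL = "https://hub.example.com/"           # from the feed's rel="hub" link
TOPIC = "https://example.org/activities.atom"  # the activity stream to watch
CALLBACK = "https://reader.example.com/push"   # where the hub POSTs updates

resp = requests.post(HUB_URL, data={
    "hub.mode": "subscribe",
    "hub.topic": TOPIC,
    "hub.callback": CALLBACK,
    "hub.verify": "async",
})
assert resp.status_code in (202, 204)
# from then on the hub POSTs new entries, and with Salmon also comment
# entries, to CALLBACK as they are published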

There have also been several mentions of Salmon as a way to aggregate distributed conversations on an item (e.g. this blog post is syndicated to FriendFeed and there are comments there as well as in the comments on my blog) but I am less clear on those scenarios or whether Salmon is enough to solve the various tough problems that need to be solved to make that work end to end.

Any API for posting comments to a site needs to solve two problems: identity and dealing with comment spam. I decided to take a look at the Salmon Protocol Summary to see how it addresses them.

The meat of the Salmon Protocol format is excerpted below

A source provides an RSS/Atom feed of content. It includes a Salmon link in its feed:

<link rel="salmon" href="http://example.org/salmon-endpoint"/>

An aggregator reads the feed (ideally via a push mechanism such as PubSubHubbub), and sees from the link that it is Salmon-enabled. It remembers the endpoint URL for later use.

When an aggregator's user leaves a comment on a feed item, the aggregator stores the comment as usual, and then also POSTs a salmon version of it to the source's Salmon endpoint:

POST /salmon-endpoint HTTP/1.1
Host: example.org
Content-Type: application/atom+xml

<?xml version='1.0' encoding='UTF-8'?>
<entry xmlns='http://www.w3.org/2005/Atom'>
  <author>
    <name>John Doe</name>
    <uri>acct:johndoe@aggregator-example.com</uri>
  </author>
  <content>Yes, but what about the llamas?</content>
  <id>tag:aggregator-example.com,2009:cmt-441071406174557701</id>
  <updated>2009-09-28T18:30:02Z</updated>
  <thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0'
     ref='tag:example.org,1999:id-22717401685551851865'/>
  <sal:signature xmlns:sal='http://salmonprotocol.org/ns/1.0'>
    e55bee08b4c643bc8aedf122f606f804269b7bc7
  </sal:signature>
  <title/>
</entry>
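
To make the aggregator's side of this concrete, here is a minimal sketch in Python (requests plus the standard library); the feed URL and the file holding the signed entry above are placeholders, and it assumes an Atom feed so the link element lives in the Atom namespace:

import requests
import xml.etree.ElementTree as ET

FEED_URL = "http://example.org/feed.atom"  # hypothetical Salmon-enabled feed

# 1.) discover the Salmon endpoint from the feed's link element
feed = ET.fromstring(requests.get(FEED_URL).content)
endpoint = next(
    link.get("href")
    for link in feed.iter("{http://www.w3.org/2005/Atom}link")
    if link.get("rel") == "salmon"
)

# 2.) POST the signed comment entry (the XML shown above) upstream
salmon_entry = open("comment-entry.xml", "rb").read()
resp = requests.post(endpoint, data=salmon_entry,
                     headers={"Content-Type": "application/atom+xml"})
print(resp.status_code)  # 200 if accepted, 400 if the signature check fails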

The commenter is identified in the published comment using the atom:uri element. How this author is authenticated in situations outside of public comments on a blog, such as RSS Bandit posting a comment to Facebook on my behalf, isn’t really discussed. I noticed an offhand reference to OAuth headers which seems to imply that the publishing application should also be sending authentication headers when publishing the comment. How these authentication headers would flow through the systems involved is unclear to me, especially given the approach Salmon has taken to spam prevention.

The workflow for dealing with spam comments is described as follows

A major concern with this type of distributed protocol is how to prevent spam and abuse.  Salmon provides building blocks to allow in-depth defense against attacks.  Specifically, every salmon has a verifiable author and user agent.  The basic security flow when salmon swims upstream looks like this:

  1. aggregator-example.com: "Here is a salmon, authored and signed by 'acct:johndoe@aggregator-example.com'; please accept it."
  2. Recipient: "I know that this is really aggregator-example.com due to its OAuth headers, and it has a good reputation, but I do not trust it completely; I will do a double check."
  3. Recipient: Uses Webfinger/XRD to discover salmon validation service for acct:johndoe@aggregator-example.com, which turns out to be hosted by aggregator-example.com.
  4. Recipient: "Given that johndoe has delegated Salmon validation to aggregator-example, and I know I'm talking to aggregator-example already, I'll skip the actual check." (Returns HTTP 200 to aggregator-example.com)

The flow can get more complicated, especially if the aggregator is not also providing identity services for the user.  In the most general case, the recipient needs to take the salmon, discover a salmon validator service for the author via XRD discovery on the author's URI, and POST the salmon to the validator service. The validator service does an integrity / signature check against the salmon and returns 200 if the salmon checks out, 400 if not.  The signature check means that the given author (johndoe in this case) signed the salmon with the given id, parent id, and timestamp.  It does not attempt to do a full, XML-DSig style verification, though such a service is another reasonable extension.

This flow seems weird, and it is unclear to me that it actually solves the problems involved in distributed commenting. Let’s say I post a comment to Facebook from RSS Bandit; in step 3 above Facebook is now supposed to use WebFinger to look up my email address provider, determine which service I use for digitally signing comments, and then ask that service whether the comment looks like it came from me.
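
For reference, the discovery step looks roughly like this in Python, with several hedges: the Salmon drafts of the time used XRD documents discovered via /.well-known/host-meta, this sketch uses the JSON flavor WebFinger later standardized in RFC 7033, and the "salmon" link relation is my assumption:

import requests

def find_salmon_validator(acct_uri):
    # e.g. acct_uri = "acct:johndoe@aggregator-example.com"
    domain = acct_uri.split("@")[-1]
    resp = requests.get(
        "https://%s/.well-known/webfinger" % domain,
        params={"resource": acct_uri},
    )
    resp.raise_for_status()
    # scan the account's links for a Salmon validation service
    for link in resp.json().get("links", []):
        if "salmon" in link.get("rel", ""):
            return link.get("href")
    return None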

Hmm, this looks like a user authentication workflow disguised as a comment validation workflow. Shouldn’t the service receiving the comment (i.e. Facebook in the example above) be responsible for validating my identity, not some third party service? Maybe this protocol wasn’t meant for sites like Facebook?

Let’s say this protocol is really meant for situations where the comment recipient doesn’t intend to be the sole identity provider, such as commenting on Robert Scoble's blog, where anyone can comment with just an email address and an optional web page URL as identifiers. So each commenter needs to provide an email address from a provider that supports WebFinger and validates digital signatures, just for the Salmon protocol? Sounds like boiling the ocean. I wonder why this can’t work with OpenID or some other authentication protocol that has already been validated by developers and is seeing some adoption.

At the end of the day, I think the problem Salmon attempts to solve is one that needs solving as activity streams become a more popular and intrinsic feature across the Web. However in its current form it’s hard for me to see how it actually solves the real problems that exist today in a practical way.

Of course, this may just be my misunderstanding of the protocol documents currently published and I look forward to being corrected by one of the protocol gurus if that is the case.

Now Playing: Chris Brown - I Can Transform Ya (feat. Lil Wayne)


 

Sam Diaz over at ZDNet, responding to the announcement of a new feature in Google Reader, wrote the following in a blog entry titled RSS: A good idea at the time but there are better ways now

Once a big advocate for Google Reader, I have to admit that I haven’t logged in in weeks, maybe months. That’s not to say I’m not reading. Sometimes I feel like reading - and writing this blog - are the only things I do. But my sources for reading material are scattered across the Web, not in one aggregated spot.

I catch headlines on Yahoo News and Google News. I have a pretty extensive lineup of browser bookmarks to take me to sites that I scan throughout the day. Techmeme is always in one of my browser tabs so I can keep a pulse on what others in my industry are talking about. And then there are Twitter and Facebook. I actually pick up a lot of interesting reading material from people I’m following on Twitter and some friends on Facebook, with some of it becoming fodder for blog posts here.

 

The truth of the matter is that RSS readers are a Web 1.0 tool, an aggregator of news headlines that never really caught on with the mainstream the way Twitter and Facebook have.

I take issue with the title of Sam’s post since his complaint is really about the current generation of consumer tools for reading RSS feeds, not the underlying technology itself. In general, I agree with Sam that the current generation of RSS readers has failed users, and I now use pretty much the same tools that he does to catch up on blogs (i.e. Twitter & Techmeme). I’ve listed my gripes with RSS readers, including the one I wrote (RSS Bandit), in the past and will reiterate some of those points below

  1. Dave Winer was right about River of News style aggregators. A user interface where I see a stream of news and can click on the bits that interest me without doing a lot of management is superior to the current dominant RSS reader paradigm, where I need to click on multiple folders, manage read/unread state and wade through massive walls of text I don’t want to read to get to the gems.

  2. Today’s RSS readers are a one-way tool instead of a two-way tool. One of the things I like about shared links on Twitter & Facebook is that I can start or read a conversation about the story and otherwise give feedback (i.e. “like” or retweet) to the publisher of the news as part of the experience. This is where I think Sam’s comment that these are “Web 1.0” tools rings the truest. Google Reader recently added a “like” feature but it is broken in that the information about who liked one of my posts never gets back to me, whereas it does when I share a post on Twitter or Facebook.

  3. As Dave McClure once ranted, it's all about the faces. The user interface of RSS readers is sterile and impersonal compared to social sites like Twitter and Facebook because of the lack of pictures/faces of the people whose words you are reading. It always makes a difference to me when I read a blog and there is a picture of the author and the same goes for just browsing a Twitter account.

  4. No good ways to separate the wheat from the chaff. As if it isn’t bad enough that you are nagged about having thousands of unread blog posts when you don’t visit your RSS reader for a few days, there isn’t a good way to get an overview of what is most interesting/pressing and then move on by marking everything as read. On the other hand, when I go to Techmeme I can always see what the current top stories are and can even go back to see what was popular on the days I didn’t visit the site. 

  5. The process of adding feeds still takes too many steps. If I see your Twitter profile and think you’re worth following, I click the “follow” button and I’m done. On the other hand, if I visit your blog there’s a multi-step process involved to adding you to my subscriptions even if I use a web-based RSS aggregator like Google Reader.

These are the five biggest bugs in the traditional RSS reading experience today that I hope eventually get fixed, since they are holding back the benefits people can get from reading blogs and/or other activity streams using the open & standard infrastructure of the Web.


 

Farhad Manjoo has an article on Slate entitled Kill your RSS reader that captures a growing sentiment I’ve had for a while and ranted about during a recent panel at SXSW. Below are a few key excerpts from Farhad’s article that resonate strongly with me

In theory, the RSS reader is a great idea. Many years ago, as blogs became an ever-larger part of my news diet, I got addicted to Bloglines, one of the first popular RSS programs. I used to read a dozen different news sites every day, going to each site every so often to check whether something fresh had been posted. With Bloglines, I just had to list the sites I loved and it would do the visiting for me. This was fantastic—instead of scouring the Web for interesting stories, everything came to me!
...
But RSS started to bring me down. You know that sinking feeling you get when you open your e-mail and discover hundreds of messages you need to respond to—that realization that e-mail has become another merciless chore in your day? That's how I began to feel about my reader. RSS readers encourage you to oversubscribe to news. Every time you encounter an interesting new blog post, you've got an incentive to sign up to all the posts from that blog—after all, you don't want to miss anything. Eventually you find yourself subscribed to hundreds of blogs, many of which, you later notice, are completely useless. It's like having an inbox stuffed with e-mail from overactive listservs you no longer care to read.

It's true that many RSS readers have great tools by which to organize your feeds, and folks more capable than I am have probably hit on ways to categorize their blogs in a way that makes it easy to get through them. But that was just my problem—I began to resent that I had to think about organizing my reader.

This mirrors my experience and that of many of my friends who used to be enthusiastic users of RSS readers. Today I primarily find out what’s going on in blogs using a combination of Twitter, Techmeme and Planet Intertwingly. The interesting thing is that I’m already subscribed to about half of the blogs that end up getting linked to in these sources on a regular basis, yet I tend to avoid firing up my RSS reader.

The problem is that the RSS readers I use regularly, Google Reader and RSS Bandit, take their inspiration from email clients, which is the wrong model for consuming casual content like blogs. Whenever I fire up an email application like Outlook or Hotmail, it presents me with a list of tasks I must complete in the form of messages that need responses, work items, meeting invitations, spam that needs deleting, notifications related to commercial/financial transactions that I need to be aware of and so on. Reading email is a chore where you are constantly taunted by the BOLD unread messages indicator silently nagging you about the stuff you haven’t done yet.

Given that a significant percentage of the time the stuff in my email inbox is messages sent directly to me that need some form of response or acknowledgment, this model is somewhat sound, although as many have pointed out there is a lot of room for improvement.

When it comes to blogs and other casual content, this model breaks down. I really don’t need a constant nagging reminder that I haven’t read the half dozen reposts of the same tech news stories about Google, Twitter and Facebook after I’ve seen the first one. Furthermore, if I haven’t fired up my reader in a while then I don’t care to be nagged about all the stuff I missed since they are just blogs so it is OK if I never read them. This opinion isn’t new, Dave Winer has been evangelizing “River of News” style aggregators for several years and given the success of this model for social networking sites like Facebook and microblogging sites like Twitter, it’s clear that Dave was onto something.

Looking back at the time I’ve spent working on RSS Bandit, I realize there are a couple of features I added to attempt to glom the river of news model on top of an email based model for reading feeds. These features include

  • the ability to mark all items as read after navigating away from a feed. This allows you to skim the interesting headlines then not have to deal with the “guilt” of not reading the rest of the items in the feed.
  • a reading pane inspired by Google Reader where unread items are presented in a single flow and marked as read as you scroll past each item

Looking back now, it seems to me that the way we think of RSS readers needs to fundamentally change. Presenting information as a news feed where the user isn’t pressured to read every item or feel like a failure is one way to move the needle on the user experience here. What I wonder is whether it isn’t already too late for this category of applications as services like Twitter & Facebook take over as how people keep up to date with what’s going on with the people and content they care about.


 

Over the past two weeks I participated in panels at both SXSW and MIX 09 on the growing trend of providing streams of user activities on social sites and aggregating these activities from multiple services into a single experience. Aggregating activities from multiple sites into a single service for the purpose of creating an activity stream is fairly commonplace today and was popularized by FriendFeed. This functionality now exists on many social networking sites and related services including Facebook, Yahoo! Profile and Windows Live Profile.

In general, the model is to receive or retrieve user updates from a social media site like Flickr, make these updates available on the user's profile on the target social network and share them with the user's friends via an activity stream (or news feed) on the site. The diagram below attempts to capture this many-to-many relationship as it occurs today using some well known services as examples.

The bidirectional arrows are meant to indicate that the relationship can be push-based where the content-based social media site notifies the target social network of new updates from the user or pull-based where the social network polls the site on a regular basis seeking new updates from the target user.

There are two problems that sites have to deal with in this model

  1. Content sites like Flickr have to deal with being polled unnecessarily millions of times a day by social networks seeking photo updates from their users. There is the money quote from last year that FriendFeed polled Flickr 2.7 million times a day to retrieve a total of less than 7,000 updates. Even if they move to a publish-subscribe model, it would mean not only having to track which users are of interest to which social network but also targeting APIs on different social networks that are radically different (aka the beautiful f-ing snowflake API problem).

  2. Social aggregation services like FriendFeed and Windows Live have to target dozens of sites, each with different APIs or schemas. Even in the case where the content sites support RSS or Atom, they often use radically different schemas for representing the same data.

The approach I've been advocating along with others in the industry is that we need to adopt standards for activity streams in a way that reduces the complexity of this many-to-many conversation that is currently going on between social sites.

While I was at SXSW, I met one of the folks from Gnip who is advocating an alternate approach. He argued that even with activity stream standards we've only addressed part of the problem. Such standards may mean that FriendFeed gets to reuse their Flickr code to poll SmugMug with little to no changes, but it doesn't change the fact that they poll these sites millions of times a day to get a few thousand updates.

Gnip has built a model where content sites publish updates to Gnip, and social networking sites can then choose to either poll Gnip or receive updates from Gnip whenever an update matches one of the rules they have created (e.g. notify us if you get a digg vote from Carnage4Life). The following diagram captures how Gnip works.

The benefit of this model to content sites like Flickr is that they no longer have to worry about being polled millions of times a day by social aggregation services. The benefit to social networking sites is that they now get a consistent format for data from the social media sites they care about and can choose to either pull the data or have it pushed to them.
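
The rule-matching idea itself is easy to sketch; the toy Python below is purely illustrative and none of the names are Gnip's actual API:

from dataclasses import dataclass

@dataclass
class Update:
    site: str   # e.g. "digg"
    actor: str  # e.g. "Carnage4Life"
    verb: str   # e.g. "vote"

class Broker:
    """Sits between content sites (publishers) and social networks (consumers)."""
    def __init__(self):
        self.rules = []  # (predicate, deliver) pairs registered by consumers

    def register(self, predicate, deliver):
        self.rules.append((predicate, deliver))

    def publish(self, update):
        # a content site pushes once; only consumers whose rules match get notified
        for predicate, deliver in self.rules:
            if predicate(update):
                deliver(update)

broker = Broker()
# "notify us if you get a digg vote from Carnage4Life"
broker.register(
    lambda u: (u.site, u.actor, u.verb) == ("digg", "Carnage4Life", "vote"),
    lambda u: print("push to social network:", u),
)
broker.publish(Update(site="digg", actor="Carnage4Life", verb="vote"))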

The main problem I see with this model is that it sets Gnip up to be a central point of failure; I'd personally rather interact directly with the content services instead of injecting a middle man into the process. However I can see how their approach would be attractive to sites that might be buckling under the load of being constantly polled and to social aggregation services that are tired of hand coding adapters for each new social media site they want to integrate with.

What do you think of Gnip's service and the problem space in general?

Now Playing: Eamon - F**k It (I Don't Want You Back)


 

Bill de hÓra has a blog post entitled Format Debt: what you can't say where he writes

The closest thing to a deployable web technology that might improve describing these kind of data mashups without parsing at any cost or patching is RDF. Once RDF is parsed it becomes a well defined graph structure - albeit not a structure most web programmers will be used to, it is however the same structure regardless of the source syntax or the code and the graph structure is closed under all allowed operations.

If we take the example of MediaRSS, which is not consistently used or placed in syndication and API formats, that class of problem more or less evaporates via RDF. Likewise if we take the current Zoo of contact formats and our seeming inability to commit to one, RDF/OWL can enable a declarative mapping between them. Mapping can reduce the number of man years it takes to define a "standard" format by not having to bother unifying "standards" or getting away with a few thousand less test cases. 

I've always found this particular argument by RDF proponents to be suspect. When I complained about the lack of standards for representing rich media in Atom feeds, the thrust of the complaint was that you can't just plug a feed from Picasa into a service that understands how to process feeds from Zooomr without making changes to the service or the input feed.

RDF proponents often argue that if we all used RDF-based formats then instead of having to change your code to support every new photo site's Atom feed with custom extensions, you could create a mapping from the format you don't understand to the one you do using something like the OWL Web Ontology Language. The problem with this argument is that there is already a declarative approach to mapping between XML data formats that doesn't require boiling the ocean by convincing everyone to switch to RDF: XSL Transformations (XSLT).

The key problem is that in both cases (i.e. mapping with OWL vs. mapping with XSLT) Picasa feeds still won't work with an app that understands Zooomr's feeds until some developer writes code. Thus we're really debating whether it is cheaper to have the developer write declarative mappings like OWL or XSLT instead of writing new parsing code in their language of choice.

In my experience, a software system where you can drop in an XSLT, OWL or other declarative mapping document to deal with new data formats is cheaper to build and likely to be less error prone than one where you have to alter parsing code written in C#, Python, Ruby or whatever. However we don't need RDF or other Semantic Web technologies to build such solutions today. XSLT works just fine as a tool for solving exactly that problem.
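
Here's the shape of that drop-in approach using lxml in Python; the stylesheet and feed file names are hypothetical, with one stylesheet authored per feed dialect:

from lxml import etree

# one stylesheet per feed dialect, e.g. mapping Zooomr's markup to the
# schema this service already understands
transform = etree.XSLT(etree.parse("zooomr-to-internal.xsl"))

feed = etree.parse("zooomr-feed.xml")
normalized = transform(feed)
print(etree.tostring(normalized, pretty_print=True).decode())

# supporting a new photo site now means dropping in a new .xsl file
# rather than changing and redeploying parsing code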

Now Playing: Lady GaGa & Colby O'Donis - Just Dance


 


As I've mentioned previously, one of the features we shipped in the most recent release of Windows Live is the ability to import your activities from photo sharing sites like Flickr and PhotoBucket or even blog posts from a regular RSS/Atom feed onto your Windows Live profile. You can see this in action on my Windows Live profile.

One question that has repeatedly come up for our service and others like it is how users can get a great experience from just importing RSS/Atom feeds of sites that we don't support in a first class way. A couple of weeks ago Dave Winer asked this of FriendFeed in his post FriendFeed and level playing fields where he writes

Consider this screen (click on it to see the detail):

A picture named ffscrfeen.gif

Suppose you used a photo site that wasn't one of the ones listed, but you had an RSS feed for your photos and favorites on that site. What are you supposed to do? I always assumed you should just add the feed under "Blog" but then your readers will start asking why your pictures don't do all the neat things that happen automatically with Flickr, Picasa, SmugMug or Zooomr sites. I have such a site, and I don't want them to do anything special for it, I just want to tell FF that it's a photo site and have all the cool special goodies they have for Flickr kick in automatically.

If you pop up a higher level, you'll see that this is actually contrary to the whole idea of feeds, which were supposed to create a level playing field for the big guys and ordinary people.

We have a similar problem when importing arbitrary RSS/Atom feeds onto a user's profile in Windows Live. For now, we treat each item in an imported RSS feed as a blog post and assume it has a title and a body that can be used as a summary. This breaks down if you are someone like Kevin Radcliffe who would like to import his Picasa Web Albums. At this point we run smack-dab into the fact that there aren't actually consistent standards around how photo sharing sites represent photo albums in Atom/RSS feeds.

Let's look at the RSS/Atom feeds from three of the sites that Dave names that aren't natively supported by Windows Live's Web Activities feature.

Picasa

<item>
  <guid isPermaLink='false'>http://picasaweb.google.com/data/entry/base/user/bo.so.po.ro.sie/albumid/5280893965532109969/photoid/5280894045331336242?alt=rss&amp;hl=en_US</guid>
  <pubDate>Wed, 17 Dec 2008 22:45:59 +0000</pubDate>
  <atom:updated>2008-12-17T22:45:59.000Z</atom:updated>
  <category domain='http://schemas.google.com/g/2005#kind'>http://schemas.google.com/photos/2007#photo</category>
  <title>DSC_0479.JPG</title>
  <description>&lt;table&gt;&lt;tr&gt;&lt;td style="padding: 0 5px"&gt;&lt;a href="http://picasaweb.google.com/bo.so.po.ro.sie/DosiaIPomaraCze#5280894045331336242"&gt;&lt;img style="border:1px solid #5C7FB9" src="http://lh4.ggpht.com/_xRL2P3zJJOw/SUmBJ6RzLDI/AAAAAAAABX8/MkPUBcKqpRY/s288/DSC_0479.JPG" alt="DSC_0479.JPG"/&gt;&lt;/a&gt;&lt;/td&gt;&lt;td valign="top"&gt;&lt;font color="#6B6B6B"&gt;Date: &lt;/font&gt;&lt;font color="#333333"&gt;Dec 17, 2008 10:56 AM&lt;/font&gt;&lt;br/&gt;&lt;font color=\"#6B6B6B\"&gt;Number of Comments on Photo:&lt;/font&gt;&lt;font color=\"#333333\"&gt;0&lt;/font&gt;&lt;br/&gt;&lt;p&gt;&lt;a href="http://picasaweb.google.com/bo.so.po.ro.sie/DosiaIPomaraCze#5280894045331336242"&gt;&lt;font color="#3964C2"&gt;View Photo&lt;/font&gt;&lt;/a&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;</description>
  <enclosure type='image/jpeg' url='http://lh4.ggpht.com/_xRL2P3zJJOw/SUmBJ6RzLDI/AAAAAAAABX8/MkPUBcKqpRY/DSC_0479.JPG' length='0'/>
  <link>http://picasaweb.google.com/lh/photo/PORshBK0wdBV0WPl27g_wQ</link>
  <media:group>
    <media:title type='plain'>DSC_0479.JPG</media:title>
    <media:description type='plain'></media:description>
    <media:keywords></media:keywords>
    <media:content url='http://lh4.ggpht.com/_xRL2P3zJJOw/SUmBJ6RzLDI/AAAAAAAABX8/MkPUBcKqpRY/DSC_0479.JPG' height='1600' width='1074' type='image/jpeg' medium='image'/>
    <media:thumbnail url='http://lh4.ggpht.com/_xRL2P3zJJOw/SUmBJ6RzLDI/AAAAAAAABX8/MkPUBcKqpRY/s72/DSC_0479.JPG' height='72' width='49'/>
    <media:thumbnail url='http://lh4.ggpht.com/_xRL2P3zJJOw/SUmBJ6RzLDI/AAAAAAAABX8/MkPUBcKqpRY/s144/DSC_0479.JPG' height='144' width='97'/>
    <media:thumbnail url='http://lh4.ggpht.com/_xRL2P3zJJOw/SUmBJ6RzLDI/AAAAAAAABX8/MkPUBcKqpRY/s288/DSC_0479.JPG' height='288' width='194'/>
    <media:credit>Joanna</media:credit>
  </media:group>
</item>

SmugMug

<entry>
   <title>Verbeast's photo</title>
   <link rel="alternate" type="text/html" href="http://verbeast.smugmug.com/gallery/5811621_NELr7#439421133_qFtZ5"/>
   <content type="html">&lt;p&gt;&lt;a href="http://verbeast.smugmug.com"&gt;Verbeast&lt;/a&gt; &lt;/p&gt;&lt;a href="http://verbeast.smugmug.com/gallery/5811621_NELr7#439421133_qFtZ5" title="Verbeast's photo"&gt;&lt;img src="http://verbeast.smugmug.com/photos/439421133_qFtZ5-Th.jpg" width="150" height="150" alt="Verbeast's photo" title="Verbeast's photo" style="border: 1px solid #000000;" /&gt;&lt;/a&gt;</content>
   <updated>2008-12-18T22:51:58Z</updated>
   <author>
     <name>Verbeast</name>
     <uri>http://verbeast.smugmug.com</uri>
   </author>
   <id>http://verbeast.smugmug.com/photos/439421133_qFtZ5-Th.jpg</id>
   <exif:DateTimeOriginal>2008-12-12 18:37:17</exif:DateTimeOriginal>
 </entry>

Zooomr

 <item>
      <title>ギンガメアジとジンベイ</title>
      <link>http://www.zooomr.com/photos/chuchu/6556014/</link>
      <description>
        &lt;a href=&quot;http://www.zooomr.com/photos/chuchu/&quot;&gt;chuchu&lt;/a&gt; posted a photograph:&lt;br /&gt;

        &lt;a href=&quot;http://www.zooomr.com/photos/chuchu/6556014/&quot; class=&quot;image_link&quot; &gt;&lt;img src=&quot;http://static.zooomr.com/images/6556014_00421b6456_m.jpg&quot; alt=&quot;ギンガメアジとジンベイ&quot; title=&quot;ギンガメアジとジンベイ&quot;  /&gt;&lt;/a&gt;&lt;br /&gt;

      </description>
      <pubDate>Mon, 22 Dec 2008 04:14:52 +0000</pubDate>
      <author zooomr:profile="http://www.zooomr.com/people/chuchu/">nobody@zooomr.com (chuchu)</author>
      <guid isPermaLink="false">tag:zooomr.com,2004:/photo/6556014</guid>
      <media:content url="http://static.zooomr.com/images/6556014_00421b6456_m.jpg" type="image/jpeg" />
      <media:title>ギンガメアジとジンベイ</media:title>
      <media:text type="html">
        &lt;a href=&quot;http://www.zooomr.com/photos/chuchu/&quot;&gt;chuchu&lt;/a&gt; posted a photograph:&lt;br /&gt;

        &lt;a href=&quot;http://www.zooomr.com/photos/chuchu/6556014/&quot; class=&quot;image_link&quot; &gt;&lt;img src=&quot;http://static.zooomr.com/images/6556014_00421b6456_m.jpg&quot; alt=&quot;ギンガメアジとジンベイ&quot; title=&quot;ギンガメアジとジンベイ&quot;  /&gt;&lt;/a&gt;&lt;br /&gt;

      </media:text>
      <media:thumbnail url="http://static.zooomr.com/images/6556014_00421b6456_s.jpg" height="75" width="75" />
      <media:credit role="photographer">chuchu</media:credit>
      <media:category scheme="urn:zooomr:tags">海遊館 aquarium kaiyukan osaka japan</media:category>
    </item>

As you can see from the above XML snippets, there is no consistency in how they represent photo streams. Even though both Picasa and Zooomr use Yahoo's Media RSS extensions, they generate different markup. Picasa puts the media extensions inside a media:group element that is a child of the item element, while Zooomr simply places a grab bag of Media RSS elements, including media:thumbnail and media:content, as direct children of the item element. SmugMug takes the cake by simply tunneling escaped HTML in the atom:content element instead of using explicit metadata to describe the photos.

The bottom line is that it isn't possible to satisfy Dave Winer's request and create a level playing field today because there are no consistently applied standards for representing photo streams in RSS/Atom. This is unfortunate because it means that services have to write one-off code (aka the beautiful fucking snowflake problem) for each photo sharing site they want to integrate with. Not only is this a lot of unnecessary code, it also prevents such integration from being a simple plug and play experience for users of social aggregation services.
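
To see the cost, here's a sketch of the kind of one-off adapter code this forces on aggregators, one function per site just to pull out a thumbnail URL, keyed to the three snippets above (Python standard library):

import re
import xml.etree.ElementTree as ET

MEDIA = "{http://search.yahoo.com/mrss/}"
ATOM = "{http://www.w3.org/2005/Atom}"

def picasa_thumbnail(item):
    # Picasa nests its Media RSS elements inside a media:group child
    group = item.find(MEDIA + "group")
    return group.findall(MEDIA + "thumbnail")[-1].get("url")

def zooomr_thumbnail(item):
    # Zooomr hangs media:thumbnail directly off the item element
    return item.find(MEDIA + "thumbnail").get("url")

def smugmug_thumbnail(entry):
    # SmugMug exposes no media metadata at all, so we are reduced to
    # scraping the img tag out of the escaped HTML in atom:content
    match = re.search(r'img src="([^"]+)"', entry.find(ATOM + "content").text or "")
    return match.group(1) if match else None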

So far, the closest thing to a standard in this space is Media RSS but as the name states it is an RSS based format and really doesn't fit with the Atom syndication format's data model. This is why Martin Atkins has started working on Atom Media Extensions which is an effort to create a similar set of media extensions for the Atom syndication format.

What I like about the first draft of Atom media extensions is that it is focused on the basic case of syndicating audio, video and image for use in activity streams and doesn't have some of the search related and feed republishing baggage you see in related formats like Media RSS or iTunes RSS extensions.

The interesting question is how to get the photo sites out there to adopt consistent standards in this space? Maybe we can get Google to add it to their Open Stack™ since they've been pretty good at getting social sites to adopt their standards and have been generally good at evangelization.

Now Playing: DMX - Ruff Ryders' Anthem


 

There was an interesting presentation at OSCON 2008 by Evan Henshaw-Plath and Kellan Elliott-McCrea entitled Beyond REST? Building Data Services with XMPP PubSub. The presentation is embedded below.

The core argument behind the presentation can be summarized by this tweet from Tim O'Reilly

On monday friendfeed polled flickr nearly 3 million times for 45000 users, only 6K of whom were logged in. Architectural mismatch. #oscon08

On July 21st, FriendFeed had 45,000 users who had associated their Flickr profiles with their FriendFeed account. FriendFeed polls Flickr about once every 20 – 30 minutes to see if each user has uploaded new pictures; at 48 to 72 polls per user per day, that works out to roughly 2.2 – 3.2 million requests a day, which matches Tim's figure. However only about 6,000 of those users logged into Flickr that day, let alone uploaded pictures. Thus there were literally millions of HTTP requests made by FriendFeed that were totally unnecessary.

Evan and Kellan's talk suggests that instead of Flickr getting almost 3 million requests from FriendFeed, it would be a more efficient model for FriendFeed to tell Flickr which users they are interested in and then listen for updates from Flickr when they upload photos.

They are right. The interaction between Flickr and FriendFeed should actually be a publish-subscribe relationship instead of a polling relationship. Polling is a good idea for RSS/Atom for a few reasons

  • there are thousands to hundreds of thousands of clients that might be interested in a resource, so the server keeping track of subscriptions is prohibitively expensive
  • a lot of these end points aren't persistently connected (i.e. your desktop RSS reader isn't always running)
  • RSS/Atom publishing is as simple as plopping a file in the right directory and letting IIS or Apache work its magic

The situation between FriendFeed and Flickr is almost the exact opposite. Instead of thousands of clients interested in a single document, we have one subscriber interested in thousands of documents. Both end points are always on, or are at least expected to be. The cost of developing a publish-subscribe model is one that both sides can afford.

Thus this isn't a case of REST not scaling as implied by Evan and Kellan's talk. This is a case of using the wrong tool to solve your problem because it happens to work well in a different scenario. The above talk suggests using XMPP, an instant messaging protocol, as the publish-subscribe mechanism. In response to the talk, Joshua Schachter (founder of del.icio.us) suggested a less heavyweight publish-subscribe mechanism using a custom API in his post entitled beyond REST. My suggestion for people who believe they have this problem would be to look at using some subset of XMPP and experimenting with off-the-shelf tools before rolling your own solution. Of course, this is an approach that totally depends on network effects. Today everyone has RSS/Atom feeds while very few services use XMPP. There isn't much point in publishing via XMPP if your key subscribers can't consume it, and vice versa. It will be interesting to see if the popular "Web 2.0" companies can lead the way in using XMPP for publish-subscribe of activity streams from social networks in the same way they kicked off our love affair with RESTful Web APIs.
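
For the curious, per-user subscriptions are exactly what XMPP's publish-subscribe extension (XEP-0060) provides. A subscription request is a single stanza along these lines, with the JIDs and node name made up for illustration; afterwards the publisher pushes an event to the subscriber only when that user uploads a photo, instead of being polled on a timer:

<iq type='set' from='friendfeed.example.com' to='pubsub.flickr.example.com' id='sub1'>
  <pubsub xmlns='http://jabber.org/protocol/pubsub'>
    <subscribe node='photos/some_flickr_user' jid='friendfeed.example.com'/>
  </pubsub>
</iq>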

It should be noted that some "Web 2.0" companies already use XMPP to provide a stream of updates to subscribing services and so avoid the overload that comes from polling. For example, Twitter has confirmed that it provides an XMPP stream to FriendFeed, Summize, Zappos, Twittervision and Gnip. However they simply dump every update that occurs on Twitter to these services instead of having the services subscribe to updates for specific users. This approach is quite inefficient and brings its own set of scaling issues.

The interesting question is why people are only bringing this up now. Shouldn't people have already been complaining about Web-based feed readers like Google Reader and Bloglines for causing the same kinds of problems? I can only imagine how many millions of times a day Google Reader must fetch content from TypePad and WordPress.com, but I haven't seen explicit complaints about this issue from folks like Anil Dash or Matt Mullenweg.

Now Playing: The Pussycat Dolls - When I Grow Up


 

Via Sam Ruby's post Embrace, Extend then Innovate I found a link to Joe Gregorio's post entitled How to do RESTful Partial Updates. Joe's post recommends a way to extend the Atom Publishing Protocol (RFC 5023) to support updating the properties of an entry without having to replace the entire entry. Given that Joe works for Google on GData, I have assumed that Joe's post is Google's attempt to float a trial balloon before extending AtomPub in this way. This is a more community centric approach than the company has previously taken with GData, OpenSocial, etc. where these protocols simply appeared out of nowhere with proprietary extensions to AtomPub and an FYI to the community after the fact.

The Problem Statement

In the Atom Publishing Protocol, an atom:entry represents an editable resource. To edit that resource, an AtomPub client is expected to download the entire entry, edit the fields it needs to change and then use a conditional PUT request to upload the changed entry.

So what's the problem? Below is an example of the results one could get from invoking the users.getInfo method in the Facebook REST API.


   <user>
    <uid>8055</uid>
    <about_me>This field perpetuates the glorification of the ego.  Also, it has a character limit.</about_me>
    <activities>Here: facebook, etc. There: Glee Club, a capella, teaching.</activities>   
    <birthday>November 3</birthday>
    <books>The Brothers K, GEB, Ken Wilber, Zen and the Art, Fitzgerald, The Emporer's New Mind, The Wonderful Story of Henry Sugar</books>
    <current_location>
      <city>Palo Alto</city>
      <state>CA</state>
      <country>United States</country>
      <zip>94303</zip>
    </current_location>   
    <first_name>Dave</first_name>      
     <interests>coffee, computers, the funny, architecture, code breaking,snowboarding, philosophy, soccer, talking to strangers</interests>
     <last_name>Fetterman</last_name>  
     <movies>Tommy Boy, Billy Madison, Fight Club, Dirty Work, Meet the Parents, My Blue Heaven, Office Space </movies>
     <music>New Found Glory, Daft Punk, Weezer, The Crystal Method, Rage, the KLF, Green Day, Live, Coldplay, Panic at the Disco, Family Force 5</music>
     <name>Dave Fetterman</name> 
     <profile_update_time>1170414620</profile_update_time>
     <relationship_status>In a Relationship</relationship_status>
     <religion/>
     <sex>male</sex>
     <significant_other_id xsi:nil="true"/>
     <status>
       <message>Pirates of the Carribean was an awful movie!!!</message>
     </status>   
   </user>

If this user was represented as an atom:entry, then each time an application wants to edit the user's status message it needs to download all of the user's data with its over two dozen fields, change the status message in an in-memory representation of the XML document and then upload the entire user atom:entry back to the server. This is a fairly expensive way to change a status message compared to how this is approached in other RESTful protocols (e.g. PROPPATCH in WebDAV).
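
For comparison, a WebDAV client updating just the status message would send something along these lines with PROPPATCH (RFC 4918); the resource path and property namespace are made up for illustration, and note that no other field crosses the wire:

PROPPATCH /users/8055 HTTP/1.1
Host: api.example.com
Content-Type: application/xml; charset="utf-8"

<?xml version="1.0" encoding="utf-8"?>
<D:propertyupdate xmlns:D="DAV:" xmlns:F="http://example.com/ns/profile">
  <D:set>
    <D:prop>
      <F:status>Off to see a better movie</F:status>
    </D:prop>
  </D:set>
</D:propertyupdate>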

Previous Discussions on this Topic: When the Shoe is on the Other Foot

A few months ago I brought up this issue as one of the problems encountered when using the Atom Publishing Protocol outside of blog editing contexts in my post Why GData/APP Fails as a General Purpose Editing Protocol for the Web. In that post I wrote

Lack of support for granular updates to fields of an item: As mentioned in the previous section editing an entry requires replacing the old entry with a new one. The expected client interaction with the server is described in section 5.4 of the current APP draft and is excerpted below.

Retrieving a Resource

Client                                     Server
  |                                           |
  |  1.) GET to Member URI                    |
  |------------------------------------------>|
  |                                           |
  |  2.) 200 Ok                               |
  |      Member Representation                |
  |<------------------------------------------|
  |                                           |

  1. The client sends a GET request to the URI of a Member Resource to retrieve its representation.
  2. The server responds with the representation of the Member Resource.

Editing a Resource

Client                                     Server
  |                                           |
  |  1.) PUT to Member URI                    |
  |      Member Representation                |
  |------------------------------------------>|
  |                                           |
  |  2.) 200 OK                               |
  |<------------------------------------------|

  1. The client sends a PUT request to store a representation of a Member Resource.
  2. If the request is successful, the server responds with a status code of 200.

Can anyone spot what's wrong with this interaction? The first problem is a minor one that may prove problematic in certain cases. The problem is pointed out in the note in the documentation on Updating posts on Google Blogger via GData which states

IMPORTANT! To ensure forward compatibility, be sure that when you POST an updated entry you preserve all the XML that was present when you retrieved the entry from Blogger. Otherwise, when we implement new stuff and include <new-awesome-feature> elements in the feed, your client won't return them and your users will miss out! The Google data API client libraries all handle this correctly, so if you're using one of the libraries you're all set.

Thus each client is responsible for ensuring that it doesn't lose any XML that was in the original atom:entry element it downloaded. The second problem is more serious and should be of concern to anyone who's read Editing the Web: Detecting the Lost Update Problem Using Unreserved Checkout. The problem is that there is data loss if the entry has changed between the time the client downloaded it and when it tries to PUT its changes.

That post was negatively received by many members of the AtomPub community, including Joe Gregorio. Joe wrote a scathing response to my post entitled In which we narrowly save Dare from inventing his own publishing protocol where he addressed that particular issue as follows

The second complaint is one of data loss:

The problem is that there is data loss if the entry has changed between the time the client downloaded it and when it tries to PUT its changes.

Fortunately, the only real problem is that Dare seems to have only skimmed the specification. From Section 9.3:

To avoid unintentional loss of data when editing Member Entries or Media Link Entries, Atom Protocol clients SHOULD preserve all metadata that has not been intentionally modified, including unknown foreign markup as defined in Section 6 of [RFC4287].

And further, from Section 9.5:

Implementers are advised to pay attention to cache controls, and to make use of the mechanisms available in HTTP when editing Resources, in particular entity-tags as outlined in [NOTE-detect-lost-update]. Clients are not assured to receive the most recent representations of Collection Members using GET if the server is authorizing intermediaries to cache them.

Hey look, we actually reference the lost update paper that specifies how to solve this problem, right there in the spec! And Section 9.5.1 even shows an example of just such a conditional PUT failing. Who knew? And just to make this crystal clear, you can build a server that is compliant to the APP that accepts only conditional PUTs. I did, and it performed quite well at the last APP Interop.

The bottom line of Joe's response is that he didn't think it was a real problem. My assumption is that his perspective has broadened now that he is responsible for the wide breadth of AtomPub implementations at Google, as opposed to when his design decisions were influenced by a home grown blogging server he wrote in his free time.
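
For readers who haven't seen it, the conditional PUT Joe refers to looks like this: the client echoes the entity tag it received when it fetched the entry, and the server rejects the update with 412 Precondition Failed if the entry changed in the interim (the ETag value here is made up):

PUT /edit/first-post.atom HTTP/1.1
Host: example.org
Content-Type: application/atom+xml;type=entry
If-Match: "e180ee84f0671b1"

<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom">
  <!-- the client's modified copy of the full entry -->
</entry>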

The Google Solution: Embrace, Extend then Innovate

Now that Joe thinks supporting granular updates of a resource is a valid scenario, he and the folks at Google have proposed the following solution to the problem. Joe writes

Now if I wanted to update part of this entry, say the title, using the mechanisms in RFC 5023 then I would change the value of the title element and PUT the whole modified entry back to the URI http://example.org/edit/first-post.atom. Now this document isn't large, but we'll use it to demonstrate the concepts. The first thing we want to do is add a URI Template that allows us to construct a URI to PUT changes back to:

<?xml version="1.0"?>
<entry         
        xmlns="http://www.w3.org/2005/Atom"
        xmlns:t="http://blah...">
<t:link_template ref="sub" 
        href="http://example.org/edit/first-post/{-listjoin|;|id}"/>
    <title>Atom-Powered Robots Run Amok</title>
    <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
    <updated>2003-12-13T18:30:02Z</updated>
    <author><name>John Doe</name></author>
    <content>Some text.</content>
    <link rel="edit"
        href="http://example.org/edit/first-post.atom"/>
</entry>

Then we need to add id's to each of the pieces of the document we wish to be able to individually update. For this we'll use the W3C xml:id specification:

<?xml version="1.0"?>
<entry         
        xmlns="http://www.w3.org/2005/Atom"
        xmlns:t="http://blah...">   
    <t:link_template ref="sub" href="http://example.org/edit/first-post/{-listjoin|;|id}"/>
    <title xml:id="X1">Atom-Powered Robots Run Amok</title>
    <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
    <updated>2003-12-13T18:30:02Z</updated>
    <author xml:id="X2"><name>John Doe</name></author>
    <content xml:id="X3">Some text.</content>
    <link rel="edit"
        href="http://example.org/edit/first-post.atom"/>
</entry>

So if I wanted to update both the content and the title I would construct the partial update URI using the id's of the elements I want to update:

http://example.org/edit/first-post/X1;X3

And then I would PUT an entry to the URI with only those child elements:

PUT /edit/first-post/X1;X3
Host: example.org

<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom">
   <title xml:id="X1">False alarm on the Atom-Powered Robots things</title>
   <content xml:id="X3">Sorry about that.</content>
</entry>

The Problems with the Google Solution: Your Shipment of FAIL has Arrived

Ignoring the fact that this spec depends on specifications that are either experimental (URI Templates) or not widely supported (xml:id), there are still significant problems with how this approach (mis)uses the Atom Publishing Protocol. Sam Ruby eloquently points out the problems in his post Embrace, Extend then Innovate where he wrote

With HTTP PUT, the enclosed entity SHOULD be considered as a modified version of the one residing on the origin server.  Having some servers interpret the removal of elements (such as content) as a modification, and others interpret the requests in such a way that elided elements are to be left alone is hardly uniform or self-descriptive.  In fact, depending on usage, it is positively stateful.

I’m fine with a server choosing to interpret the request any way it sees fit.  As a black box, it could behave as if it updated the resource as requested and then one nanosecond later — and before it processes any other requests — fill in missing data with defaults, historical data, whatever.  My concern is with clients coding to the assumption as to how the server works.  That’s called coupling.

The main problem is that it changes the expected semantics of HTTP PUT in a way that conflicts not only with how PUT is typically used in other HTTP-based protocols but also with how it is used in AtomPub itself. It's also weird that the existence of xml:id in an Atom document is now used to imply special semantics (i.e. this field supports direct editing). I especially don't like that after all is said and done, the server controls which fields can be partially updated, which implies a tight coupling between clients and servers (e.g. some servers will support partial updates on all fields, some may only support partial updates on atom:title + atom:category while others will support partial updates on a different set of fields). So the code for editing a title or category changes depending on which AtomPub service you are talking to.

From where I stand Joe has pretty much invented yet another diff + patch protocol for XML documents. When I worked on the XML team at Microsoft, there were quite a few floating around the company including Diffgram, UpdateGram, and Patchgrams to name three. So I've been around the block when it comes to diff + patch formats for XML and this one has its share of issues. The most eyebrow-raising issue with this diff + patch protocol is that half the semantics of the update are in the XML document (which elements to add/edit) while the other half are in the URL (if an ID exists in the URL but is not in the document then it is a delete). This means the XML isn't very self-describing nor can it really be said that the URL is identifying a resource [more like it identifies an operation].
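To make that last point concrete, here is a hypothetical exchange under this proposal. The URI names both X1 (the title) and X2 (the author) but the body only carries a replacement title, which the server is expected to read as "update X1, delete X2":

PUT /edit/first-post/X1;X2
Host: example.org

<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom">
   <title xml:id="X1">Atom-Powered Robots Run Amok</title>
</entry>

Nothing in the message body says the author is being deleted; you have to diff the URI against the body to figure that out.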

Actual Solution: Read the Spec

In Joe's original response to my post his suggestion was that the solution to the "problem" of lack of support for granular updates of entries in AtomPub is to read the spec. In retrospect, I agree. If a field is important enough that it needs to be identifiable and editable then it should be its own resource. If you want to make it part of another resource then use atom:link to link both resources.
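Here's a sketch of what that looks like; the edit-title link relation and URIs are made up for illustration. The server exposes the title as its own resource and advertises it with atom:link:

<entry xmlns="http://www.w3.org/2005/Atom">
    <title>Atom-Powered Robots Run Amok</title>
    <link rel="edit" href="http://example.org/edit/first-post.atom"/>
    <link rel="http://example.org/rels/edit-title"
        href="http://example.org/edit/first-post/title"/>
    ...
</entry>

A client that wants to change just the title PUTs directly to that resource:

PUT /edit/first-post/title
Host: example.org
Content-Type: text/plain

False alarm on the Atom-Powered Robots thing

No new PUT semantics are invented, conditional requests still work as specified, and the server advertises which fields are editable via links instead of magic xml:id attributes.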

Case closed. Problem solved.

Now Playing: Too Short - Couldn't Be a Better Player Than Me (feat. Lil Jon & The Eastside Boyz)
 

I just realized that the current released version of RSS Bandit doesn’t have a working code name based on a character from the X-Men comic book. The previous 1.5.0.17 release was codenamed ShadowCat while the next release is codenamed Phoenix. Since the v1.6.0.x releases have been interim releases on the road to Phoenix, I’ve decided to give them the codename Jean Grey retroactively. Now, on to the updates.

Jean Grey (v1.6.0.x) Update

The last bug-fix release of RSS Bandit fixed a few bugs but introduced a couple of even worse bugs [depending on your perspective]. We’ve shipped version 1.6.0.2, which addresses the following issues:

  • Application crashes with AccessViolationException on startup on Windows XP.  
  • Application crashes and red 'X' shows in feed subscriptions window on Windows XP.  
  • User's credentials are not used when accessing feeds via a proxy server leading to proxy errors when fetching feeds.  
  • Duplicate feed URLs not detected if they differ by a trailing slash or "www." in the host name.
  • Application crashes when displaying an error dialog when a certificate issue is detected with a secure feed.

The first three issues are regressions that were introduced as part of refactoring the code and making it work better on Windows Vista. Yet another data point that shows that you can never have too many unit tests and that beta testing isn’t a bad idea either.

You can download the new release from http://downloads.sourceforge.net/rssbandit/RssBandit1.6.0.2_Installer.zip

Phoenix (v2.0) Update

I’m continuing with my plan to make RSS Bandit a desktop client for Web based feed readers like NewsGator Online and Google Reader. I’ve been slightly sidetracked by the realization that it would be pretty poor form for a Microsoft employee to write an application that synchronized with Google’s RSS reader but not any of Microsoft’s, even if it is a side project. My current coding project is to integrate with the Windows RSS platform which would allow one to manipulate the same set of feeds in RSS Bandit, Internet Explorer 7 and Outlook 2007. The good news is that with Outlook 2007 integration, you also get Exchange synchronization for free.

The bad news has been having to use the RSS reading features of Internet Explorer 7 and Outlook 2007 on a regular basis as a way of eating my own dog food with regards to the integration features. It’s pretty stunning to see not one but two RSS reading applications that assume “mark all items as read” or “delete all feeds” are actions that users never have to take. When you have people writing shell scripts to perform basic tasks in your application then it is a clear sign that somewhere along the line, the user experience for that particular set of features got the shaft. 

I’m about halfway through the integration after which I’ll continue with integrating with Google Reader and finally NewsGator Online using an Outlook + Exchange style model. While I’m working on this, both Oren and Torsten will be mapping out the rewrite of the graphical user interface using WPF. I’ll probably need to buy a book on XAML or something in the next few months so I can contribute to this effort. The only thing I’ve heard about any of the various books on the subject is that they all seem to have had their forewords written by Don Box. Does anyone have recommendations on which book or website I should use to start learning XAML + WPF?

Now playing: Eminem - Sing For The Moment


 

It seems I missed the news when this happened last week but according to Greg Reinacker’s post, NewsGator’s RSS clients are now free!

We’ve got a lot of big news today at NewsGator.

First, we’ve got new releases of our most popular applications: FeedDemon 2.6, NetNewsWire 3.1, Inbox 3.0 (beta), and NewsGator Go! for Windows Mobile 2.0. Each of these is a pretty major release on its own - tons of new features in all of them.

But second, every one of those applications is now free! Free as in beer, that is. And add to the free list NewsGator Go! for BlackBerry as well. And not only are they free, but our online services (including synchronization) are now free as well! Not to mention our iPhone reader, HTML mobile reader, and all of the other applications that are part of our online platform.

According to Greg and Nick Bradbury, the reason they are doing this is that the bulk of their profits/revenues come from selling enterprise licenses and the desktop readers are now being used as advertising to get enterprise customers.

They also mention that the other reason they are giving away their desktop application is that they see a lot of financial value from collecting information about what feeds their users are reading. My assumption was that this is because the demographic data is being resold to marketers although both Greg and Nick make it seem like the collection of this data is benign and only used for end user facing features. 

Anyway, this is pretty great news for fans of desktop RSS readers. If I didn’t already have RSS Bandit, FeedDemon would be my first choice when it comes to a desktop RSS reader for Windows. I also like the fact that we get a shout out as part of the setup experience for FeedDemon which is shown below

Thanks for the shout out, Nick. Smile

Now that these apps are free, it does encourage us to step our game up with RSS Bandit. Right now, my thinking is that the official version number for the release currently codenamed Phoenix will be RSS Bandit 2.0. This release will deserve the moniker “2.0” for two reasons

  1. The user interface will be completely rewritten from scratch by Oren and Torsten using WPF.
  2. The application will become capable of being a full-blown desktop client to both Google Reader and NewsGator Online.

If I seem to have been blogging less, it is because I’ve been spending more time reading about code, thinking about code and writing code in my free time. I can’t wait to get the first beta out to you guys in a few months.

Now playing: Kid Rock - Welcome To The Party (Ode 2 the Old School)


 

The top story in my favorite aggregator today is the announcement on Scott Guthrie’s blog of the ASP.NET 3.5 Extensions CTP Preview. Normally, announcements related to ASP.NET would not interest me except this time there is an interesting item in the list of technologies being released

ADO.NET Data Services: In parallel with the ASP.NET Extensions release we will also be releasing the ADO.NET Entity Framework.  This provides a modeling framework that enables developers to define a conceptual model of a database schema that closely aligns to a real world view of the information.  We will also be shipping a new set of data services (codename "Astoria") that make it easy to expose REST based API endpoints from within your ASP.NET applications.

Wow. It looks like Astoria has quickly moved from being an experimental project for seeing what it would look like to place RESTful interfaces on top of a SQL Server database to being very close to shipping a production version. I dug around for more posts about ADO.NET Data Services (aka Astoria) so I could find out what was in the CTP and came across two posts from Mike Flasko and Andy Conrad respectively.

In his post entitled ADO.NET Data Services ("Project Astoria") CTP is Released on the ADO.NET Data Services team blog Mike Flasko writes

The following features are in this CTP:

  • Support to create ADO.NET Data Services backed by:
    • A relational database by leveraging the Entity Framework
    • Any data source (file, web service, custom store, application logic layer, etc)
  • Serialization Formats:
    • Industry standard AtomPub serialization
    • JSON serialization
  • Simple HTTP interface
    • Any platform with an HTTP stack can easily consume a data service
    • Designed to leverage HTTP semantics and infrastructure already deployed at large
  • Client libraries:
    • .NET Framework
    • ASP.NET AJAX
    • Silverlight (coming soon)

This is sick. With Astoria I can expose my relational database or even just a local XML file using a RESTful interface that utilizes the Atom Publishing Protocol or JSON. I am somewhat amused that one of the options is placing a RESTful interface over a SOAP Web Service. My, how times have changed…

It is pretty cool that Microsoft is the first major database vendor to bring the dream of the Atom store to fruition. I also like that one of the side effects of this is that there is now an AtomPub client library for .NET Framework. Smile

Andy Conrad has a blog post entitled Linq to REST which gives an idea of what happens when you combine the Astoria client library with the Language Integrated Query (LINQ) features of C# 3.0

    [OpenObject("PropBag")]
    public class Product{
        private Dictionary<string, object> propBag = new Dictionary<string, object>();

        [Key]
        public int ProductID { get; set; }        
        public string ProductName { get; set; }        
        public int UnitsInStock { get; set; }
        public IDictionary<string, object> PropBag { get { return propBag; } }
    }

        static void Main(string[] args){
            WebDataContext context = new WebDataContext("http://localhost:18752/Northwind.svc");
            var query = from p in context.CreateQuery<Product>("Products")
                        where p.UnitsInStock > 100
                        select p;

            foreach (Product p in query){
                Console.WriteLine(p.ProductName + " , UnitsInStock= " + p.UnitsInStock);
            }

        } 

If you hover over the query variable, you will actually see the Astoria URI which the Linq query is translated into by the Astoria client library:

http://localhost:18752/Northwind.svc/Products?$filter=(UnitsInStock)%20gt%20(100)

So, there you go.  Linq to Astoria's RESTFUL API.  In other words, Linq to REST. 

Like I said earlier, this is sick. I need to holla at Andy and see if there is a dependency on the Atom feed containing Microsoft specific extensions or whether this Linq to REST capability can be utilized over any arbitrary Atom feed.

Now playing: Jay-Z - Success (feat. Nas)


 

Disclaimer: This post does not reflect the opinions, thoughts, strategies or future intentions of my employer. These are solely my personal opinions. If you are seeking official position statements from Microsoft, please go here.

One of the Google folks working on OpenSocial sent me a message via Facebook asking what I thought about the technical details of the recent announcements. Since my day job is working on social networking platforms for Web properties at Microsoft and I'm deeply interested in RESTful protocols, this is something I definitely have some thoughts about. Below is what started off as a private message but ended up being long enough to be its own blog post.

First Impressions

In reading the OpenSocial API documentation it seems clear that it is intended to be the functional equivalent of the Facebook platform. Instead of the Facebook users and friends APIs, we get the OpenSocial People and Friends Data API. Instead of the Facebook feed API, we get the OpenSocial Activities API. Instead of the Facebook Data Store API, we get the OpenSocial Persistence Data API. Instead of FQL as a friendly alternative to the various REST APIs we get a JavaScript object model.

In general, I personally prefer the Facebook platform to OpenSocial. This is due to three reasons

  • There is no alternative to the deep integration into the Web site's user experience that is facilitated with FBML.  
  • I prefer idiomatic XML to tunnelling data through Atom feeds in ways that [in my opinion] add unnecessary cruft.
  • The Facebook APIs encourage developers to build social and item relationship graphs within their application while OpenSocial seems only concerned with developers stuffing data in key/value pairs.

The Javascript API

At first I assumed the OpenSocial JavaScript API would provide similar functionality to FBML given the large number of sound bites quoting Google employees stating that instead of "proprietary markup" you could use "standard JavaScript" to build OpenSocial applications. However it seems the JavaScript API is simply a wrapper on top of the various REST APIs. I'm sure there's a comment one could make here: if REST APIs are so simple, why do developers feel the need to hide them behind object models?

Given the varying features and user interface choices in social networking sites, it is unsurprising that there is no rich mechanism specified for adding entry points for the application into the container site's user interface. However it is surprising that no user interface hooks are specified at all, especially given that there are some common metaphors in social networking sites (e.g. a profile page, a friends list, etc.) which can be interacted with in a standard way. It is also shocking that Google attacked Facebook's use of "proprietary markup" only to not even ship an equivalent feature.

The People and Friends Data API 

The People and Friends Data API is used to retrieve information about a user or the user's friends as an Atom feed. Each user is represented as an atom:entry which is a PersonKind (which should not be confused with an Atom person construct). It is expected that the URL structure for accessing people and friends feeds will be of the form  http://<domain>/feeds/people/<userid> and http://<domain>/feeds/people/<userid>/friends respectively.

Compare the following response to a request for a user's information using OpenSocial with the equivalent Facebook API call response.

GET http://orkut.com/feeds/people/14358878523263729569
<entry xmlns='http://www.w3.org/2005/Atom' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005'>
  <id>http://sandbox.orkut.com:80/feeds/people/14358878523263729569</id>
  <updated>2007-10-28T14:01:29.948-07:00</updated>
  <title>Elizabeth Bennet</title>
  <link rel='thumbnail' type='image/*' href='http://img1.orkut.com/images/small/1193601584/115566312.jpg'/>
  <link rel='alternate' type='text/html' href='http://orkut.com/Profile.aspx?uid=17583631990196664929'/>
  <link rel='self' type='application/atom+xml' href='http://sandbox.orkut.com:80/feeds/people/14358878523263729569'/>
  <georss:where>
    <gml:Point xmlns:gml='http://www.opengis.net/gml'>
      <gml:pos>51.668674 -0.066235</gml:pos>
    </gml:Point>
  </georss:where>
  <gd:extendedProperty name='lang' value='en-US'/>
  <gd:postalAddress/>
</entry>

Below is what the above information would look like if returned by Facebook's users.getInfo method

GET

<users_getInfo_response xmlns="http://api.facebook.com/1.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://api.facebook.com/1.0/ http://api.facebook.com/1.0/facebook.xsd" list="true">
  <user>
    <uid>14358878523263729569</uid>
    <current_location>
      <city>Palo Alto</city>
      <state>CA</state>
      <country>United States</country>
      <zip>94303</zip>
    </current_location>
    <first_name>Elizabeth</first_name>
    <is_app_user>1</is_app_user>
    <has_added_app>1</has_added_app>
    <pic>http://photos-055.facebook.com/ip007/profile3/1271/65/s8055_39735.jpg</pic>
  </user>
</users_getInfo_response>

I've already mentioned that I prefer idiomatic XML to tunnelling data through Atom feeds. Comparing the readability of both examples should explain why.

The Activities Data API 

A number of social networking sites now provide a feature which enables users to see the recent activities of members of their social network in an activity stream. The Facebook news feed, Orkut's updates from your friends, and the Windows Live Spaces what's new page are all examples of this feature. The OpenSocial Activities Data API provides a mechanism for OpenSocial applications to access and update this activity stream as an Atom feed. All of a user's activities, or all activities from a specific application, can be accessed using URIs of the form http://<domain>/activities/feeds/activities/user/<userID> and http://<domain>/activities/feeds/activities/user/<userID>/source/<sourceID> respectively.

Currently there is no reference documentation on this API. My assumption is that since Orkut is the only OpenSocial site that supports this feature, it is difficult to produce a spec that will work for other services without it being a verbatim description of Orkut's implementation.

There are some notes on how Orkut attempts to prevent applications from spamming a user's activity stream. For one, applications are only allowed to update the activity stream for their source directly instead of the activity stream for the user. I assume that Google applies some filter to the union of all the source-specific activity streams before generating the user's activity feed to eliminate spam. Secondly, applications are monitored to see if they post too many messages to the activity stream or if they post promotional messages instead of the user's activities to the stream. All of this makes it difficult to see how one could reliably specify the behavior of this API and feature set for a diverse set of social networking sites.

The Persistence Data API 

The OpenSocial Persistence API allows applications to store and retrieve key<->value pairs that are either user-specific or are global to the application. An example of the former is a user's stock portfolio while an example of the latter is a listing of company name and stock ticker pairs. The feed of global key<->value pairs for an application can be accessed at a URL of the form http://<domain>/feeds/apps/<appID>/persistence/global for the entire feed and http://<domain>/feeds/apps/<appID>/persistence/global/<key> if seeking a particular key<->value pair. User-specific key<->value pairs are available at the URL of the form http://<domain>/feeds/apps/<appID>/persistence/<userID>/instance/<instanceID>.

This is probably the least interesting aspect of the API. A simple persistence API like this is useful for applications that just need to store user preferences or small amounts of textual data. However you aren't going to use this as the data storage platform for applications like iLike, Flixster or Scrabulous.

However I will add that an Atom feed seems like a horrible representation for a list of key<->value pairs. It's so bad that the documentation doesn't provide an example of such a feed.
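Just to illustrate the overhead, here is a purely hypothetical sketch of what a single company name/stock ticker pair might look like wrapped in an atom:entry; since the documentation provides no example, the element choices below are guesses:

<entry xmlns="http://www.w3.org/2005/Atom" xmlns:gd="http://schemas.google.com/g/2005">
  <id>http://orkut.com/feeds/apps/app1/persistence/global/GOOG</id>
  <title>GOOG</title>
  <updated>2007-11-05T18:30:02Z</updated>
  <gd:extendedProperty name="GOOG" value="Google Inc."/>
</entry>

That is three elements of mandatory Atom ceremony (id, title, updated) to transport one name/value pair.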

Hosting OpenSocial Applications

The documentation on hosting OpenSocial applications implies that any site that can host Google gadgets can also host OpenSocial applications. In practice, it means that any site that you can place a <script> element on can point to a gadget and thus render it. Whether the application will actually work will depend on whether the hosting service has actually implemented the OpenSocial Service Provider Interface (SPI).

Unfortunately, the documentation on implementing the OpenSocial SPI is missing in action. From the Google site

To host OpenSocial apps, your website must support the SPI side of the OpenSocial APIs. Usually your SPI will connect to your own social network, so that an OpenSocial app added to your website automatically uses your site's data. However, it is possible to use data from another social network as well, should you prefer. Soon, we will provide a development kit with documentation and code to better support OpenSocial websites, along with a sample sandbox which implements the OpenSocial SPI using in-memory storage. The SPI implements:

  • Adding and removing friends
  • Adding and removing apps
  • Storing activities
  • Retrieving activity streams for self and friends
  • Storing and retrieving per-app and per-app-per-user data

The OpenSocial website development kit will include full SPI documentation. It will provide open source reference implementations for both client and server components.

I assume that the meat of the OpenSocial SPI documentation is just more detailed rules about how to implement the REST APIs described above. The interesting bits will be the reference implementations of the API, which will likely become the de facto standard implementations rather than encouraging dozens of buggy, incompatible versions of the OpenSocial API to bloom.

Conclusion

In general I believe that any effort to standardize the widget/gadget APIs exposed by various social networking sites and AJAX homepages (e.g. iGoogle, Netvibes, Live.com, etc) is a good thing. Niall Kennedy has an excellent series of articles on Web Widget formats and Web Widget update technologies that shows how diverse and disparate the technologies are that developers have to learn and utilize when they want to build widgets for various sites. Given that Web widgets are now a known quantity, the time is ripe for some standardization.

That said, there are a number of things that give me pause with regards to OpenSocial

  1. A common practice in the software industry today is to prefix "Open" to the name of your technology which automatically gives it an aura of goodness while attempting to paint competing technologies as being evil and "closed". Examples include OpenDocument, OpenID, OpenXML, OAuth, etc. In this case, OpenSocial is being positioned as an "open" alternative to the Facebook platform.  However as bloggers like Shelley Powers, Danny Ayers and Russell Beattie have pointed out, there isn't much "open" about OpenSocial. Russell Beattie asks in his post Where the hell is the Container API?

    Would people be jumping on this bandwagon so readily if it was Microsoft unilaterally coming up with an API, holding secret meetings geared towards undercutting the market leader, and then making sure that only those anointed partners get a head start on launch day by making sure a key part of the API isn't released - even in alpha. (It obviously exists already, all the partners have that spec and even sample code, I'm sure. The rest of us don't get access yet, until the GOOG says otherwise).

    Let's say we ignore that the process for creating the technology was not "open" nor have key aspects of the technology even been unveiled [which makes this more of a FUD announcement to take the wind out of Facebook's sails than an actual technology announcement]. Is the technology itself open? Shelley Powers points out in her post Terms that

    Perhaps the world will read the terms of use of the API, and realize this is not an open API; this is a free API, owned and controlled by one company only: Google. Hopefully, the world will remember another time when Google offered a free API and then pulled it. Maybe the world will also take a deeper look and realize that the functionality is dependent on Google hosted technology, which has its own terms of service (including adding ads at the discretion of Google), and that building an OpenSocial application ties Google into your application, and Google into every social networking site that buys into the Dream.

    Google has announced a technology platform that is every bit as proprietary as Facebook's. The only difference is that they've cut deals with some companies to utilize their proprietary platform while Facebook's platform is only for use on the Facebook site. If Zuckerberg announces next week that the Facebook platform is freely implementable by any 3rd party Web site, where does that leave OpenSocial? After all, the Facebook platform is actually a proven, working system with complete documentation instead of the incomplete rush job that OpenSocial clearly is right now.

    There are all sorts of forums for proposing and discussing open Web technologies including the IETF, W3C, OASIS and even ECMA. Until all of the underlying technologies in OpenSocial have been handed over to one or more of these standards bodies, this is a case of the proprietary pot calling the proprietary kettle black.

  2. One of the things that comes along with OpenSocial is that Google has now proposed GData as the standard protocol for interacting with social graphs on the Web. This is something that I've been worried about for a while and I've written a couple of blog posts to address this topic because it is not clear that the Atom Publishing Protocol upon which GData is based works well outside its original purpose of editing blog posts and the like. I'm not the only one that feels this way.

    Danny Ayers wrote in his post Open? Social?

    However the People Data API is cruel and unusual. It first stretches Atom until it creaks with "each entry in the People or Friends feed is a PersonKind"; then gives a further tug  (a person's name is represented using atom:title) then extends it even more (a person's email is gd:email) and finally mops up all the blood, sweat and dribble:

    Key value parameters - gd:extendedProperty - "As different social networks and other sources of People data have many different named fields, this provides a way for them to be passed on generally. Agreeing on common naming conventions is to be decided in future."

    Got to admire the attempt, but (to mix the metaphorical namespaces) silk purses don't make very good sow's ears either.

    In addition, AtomPub geek extraordinaire Tim Bray wrote in his blog post entitled Web3S

    If you decide you totally can’t model your world as collections of entries populated with hyperlinks to express relationships, well then I guess APP’s not for you. And at the level of engineering intuition, I have to say that a monster online address book does feel different at a deep level from most online “publications” (I thought that was why we had LDAP... but I repeat myself).

    Now that AtomPub/GData is a de facto standard protocol for accessing various kinds of non-microcontent data on the Web, I'm done debating its suitability for the task since the horse has already left the barn. However I will continue to ask when will GData be RFC 5023 compliant?

  3. At the end of the day, the most disappointing thing about OpenSocial is that it doesn't really further the conversation about actual interoperability across social networking sites. If I use Orkut, I still need a MySpace account to interact with my friends on that site. Some people have claimed that OpenSocial will enable routing around such lock-in via applications like iLike and Flixster which have their own social networks and thus could build cross-site social networking services since they will be hosted on multiple social networking sites. However the tough part of this problem is how a hosted application knows that carnage4life@windowslivespaces is the same user as DareObasanjo@Facebook. It seems OpenSocial completely punts on satisfying this scenario even though it wouldn't be hard to add this as a requirement of the system. I guess the various applications can create their own user account systems and then do the cross-site social network bridging that way, which sucks because it will be a lot of duplicative work and will require users to create even more accounts with various services.

    Given that the big widget vendors like iLike, Slide and RockYou already have their users creating accounts on their sites that can be tied back to which social networking site the user utilizes their widgets on, this might be a moot point. Wouldn't it be mad cool if the Top Friends Facebook application could also show your top friends from MySpace or Orkut? I suspect the valuation of various widget companies will be revised upwards in the coming months.

  4. There is no mention of a user-centric application authorization model. Specifically, there is no discussion of how users grant and revoke permission to access their personal data to various OpenSocial applications. Regular readers of my blog are familiar with my mantra of putting the user in control which is why I've been so enthusiastic about OAuth. Although there is some mention of Google's Authentication for Web Applications in the documentation, this seems specific to Google's implementation of OpenSocial hosting and it is unclear that we should expect the same model to be utilized by MySpace, Bebo, TypePad or any of the other social networking sites that have promised to implement OpenSocial. On the other hand, Facebook has a well thought out application permissions model and I would have thought it would be easier to simply reverse engineer that and add it to the OpenSocial spec than to punt on the problem entirely.

Despite these misgivings, I think this is a step in the right direction. Web widget and social graph APIs need to be standardized across the Web.

PS: I've subscribed to the Google OpenSocial blog. So far there have only been posts by clueless marketing types but I'm sure interesting technical information that addresses some of the points above will be forthcoming.


 

James Snell has a blog post entitled Batch! which talks about the Batch processing model in GData APIs. He provides a sample of a GData batch request and points out the following

If the mere sight of this doesn’t give you shivers and shakes, let me give you a few reasons why it should:

  1. It’s not valid Atom. Note the first entry in the feed for instance. An Atom entry has an id, a title, some content, an author, some links, maybe some categories, etc. If the type of objects you want to represent does not also have those things, Atom is not the right format to use.
  2. It only works with Atom. What about binary resources like Jpeg’s? I guess we could base64 encode the binary data and stuff that into our invalid Atom entries but doing so would suck.
  3. We can’t use Etag’s and conditional requests
  4. I’m sure there are more reasons but these should be enough to convince you that a better approach is needed.
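For those who haven't clicked through to James's post, a GData batch request is a single Atom feed POSTed to a batch URI with Google's batch extension elements describing the operation to perform for each entry. From memory it looks roughly like this (abbreviated, with made-up URIs):

POST /feeds/myfeed/batch HTTP/1.1
Host: www.google.com
Content-Type: application/atom+xml

<feed xmlns="http://www.w3.org/2005/Atom"
    xmlns:batch="http://schemas.google.com/gdata/batch">
  <entry>
    <batch:id>1</batch:id>
    <batch:operation type="insert"/>
    <title>A new item</title>
    <content>...</content>
  </entry>
  <entry>
    <batch:id>2</batch:id>
    <batch:operation type="delete"/>
    <id>http://www.google.com/feeds/myfeed/item42</id>
  </entry>
</feed>

Note that the first entry has no atom:id or atom:updated, which is exactly James's first complaint: it isn't valid Atom.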

In a previous post entitled One Protocol to Rule Them All and in the Darkness Bind Them I pointed out that since the Atom Publishing Protocol is not a good fit for interacting with data types that aren’t microcontent, it would need to be embraced and extended to satisfy those needs. In addition, this leads to problems because different vendors will embrace and extend it in different ways, which fragments interoperability.

An alternative approach would be for vendors to utilize protocols that are better suited for the job instead of creating incompatible versions of a standard protocol. However the response I’ve seen from various people is that it is better if we have multiple slightly incompatible implementations of a single standard than multiple completely incompatible proprietary technologies. I’ve taken to calling this “the ODF vs. OOXML lesson”. This also explains why there was so much heat in the RSS vs. Atom debates but not so much when it came to debates over Yahoo’s podcasting extensions vs. Apple’s podcasting extensions to RSS.

Let’s now take it as a given that there will be multiple proprietary extensions to a standard protocol and that this is preferable to the alternative. What ground rules should we have to ensure interoperability isn’t completely thrown out the window? A fundamental ground rule should be that vendors should actually provide standards compliant implementations of the protocol before deciding to embrace and extend it. That way clients and services that conform to the standard can interoperate with them. In this regard, GData falls down horribly as James Snell points out.

Given that Joe Gregorio now works at Google and is a co-author of RFC 5023, I assume it is just a matter of time before Google fixes this brokenness. The only question is when?

PS: Defining support for batch operations in a standard way is going to be rather difficult primarily because of how to deal with failure modes. The fact that there is always the struggle between consistency and availability in distributed systems means that some folks will want a failure in any of the batched operations to result in the equivalent of a rollback while there are others that don’t care if one or two out of a batch of fifty operations fails. Then there are some folks in the middle for whom “it depends on the context” for which kind of failure mode they want.  

Now playing: Method Man - Say (feat. Lauryn Hill)


 

February 27, 2007
@ 06:15 PM

With the hubbub now settling down I decided to go back and try out Yahoo! Pipes. For a while, I've wanted a feed for articles by Chris Kelly over on Huffington Post so I decided to build that. After a couple of false starts I created the feed, which currently doesn't have any items because there aren't any posts by Chris Kelly in the Huffington Post feed.

Now that I've actually used the service I'm pretty surprised that anyone thinks that this is a service that non-geeks will use. Programming with flowcharts to process RSS feeds seems even geekier than having a Star Trek wedding, which was my previous bar for the geekiest thing ever.


 

While everyone else was raving about the fact that Feedburner can now count RSS subscribers coming from Google Reader, I've been noticing that there was another discrepancy in the Feedburner data that didn't seem to be accounted for. Below is a screenshot of the number of hits from Web browsers on my RSS feed

It seems pretty unlikely that people have clicked on my RSS feed over 5,000 times today. At first I thought Feedburner was miscounting feeds subscribed to in IE 7 but a quick look in Fiddler shows that IE 7 requests feeds with Windows-RSS-Platform in the User-Agent string and so is correctly counted by Feedburner.

So I sent some mail to Eric Lunt, who's a co-founder and the CTO of Feedburner, to see if he knew what was wrong. He let me know that the problem is that Outlook 2007 doesn't identify itself in the User-Agent string and instead pretends to be Internet Explorer 7. This means there is no way to separate out accesses of your feed from Outlook 2007 from people clicking on your feed in IE 7.

This seems like a fairly rookie mistake to ship in a big-time product like Outlook. I don't have the latest version installed so I can't confirm that this is truly the case but if it is I hope they plan to fix this soon. It's really lame to not identify your product correctly in the User-Agent string.
...
Oops. I should have done a search before sending out mail. It looks like this was already covered in a blog post entitled Outlook, RSS, & the user-agent string by Michael Affronti who was the PM for RSS in Outlook 2007. He wrote

For Outlook 2007 we will unfortunately not be able to report any custom user agent string for our RSS aggregation.  Due to the way we integrate with IE across many parts of the application (the WININET stack is the underlying infrastructure for all of Outlook’s internet communication), we cannot easily and safely change the way we broadcast ourselves when connecting to external servers.  To do so would require a fundamental change in the way the WININET stack is called from Outlook and could affect all of the Office applications.  The scope of this fix is unfortunately outside of what we can provide this release.

I guess this won't be fixed anytime soon, if ever. Anyway, I hope this post helps out other users of Feedburner who've also been curious about their weird number of hits supposedly from IE 7. 


 

A couple of blogs I'm subscribed to are pimping the brand new Yahoo! Pipes which I unfortunately can't seem to access right now. You can read some of the hype in blog posts like Jeremy Zawodny's Yahoo! Pipes: Unlocking the Data Web and Tim O'Reilly's Pipes and Filters for the Internet where it is described as a "milestone in the history of the internet". I'd have loved to try out the service given my interest in mashups and feed syndication but the site seems to be down or is just really, really slow.

As Dave Winer writes in his post Pipes Investigation

I see that Yahoo has a new web app, called Pipes, that looks to me like a feed construction kit. It takes RSS inputs, processes them in ways that are specified by the user, and produces feeds as its output.
...
From a quick perusal of the functionality last night and the fact that the server isn't responding right now (5:45AM Pacific), it seems this app uses lots of CPU on the server

As someone who works on large scale online services for a living, Yahoo! Pipes seems like a scary proposition. It combines providing a service that is known for causing scale issues due to heavy I/O requirements (i.e. serving RSS feeds) with one that is known for scaling issues due to heavy CPU and I/O requirements (i.e. user-defined queries over rapidly changing data). I suspect that this combination of features makes Yahoo! Pipes resistant to popular caching techniques especially if the screenshot below is any indication of the amount of flexibility [and thus processing power required] that is given to users in creating queries.

Really interesting idea though. I agree with Dave Winer that this is definitely fodder for geeks and not the average Web user. After all, RSS still hasn't crossed the adoption chasm with average Web users let alone an RSS feed remixing service.


 

December 14, 2006
@ 05:09 PM

I've noticed some problems with viewing feeds of sites hosted on TypePad in RSS Bandit for the past few months. The problem was that every other post in a feed would display raw markup instead of correctly rendered HTML. I decided to look into it this morning and tracked down the cause. Take a look at http://blog.flickr.com/flickrblog/atom.xml. Here are relevant excerpts from the feed


<content type="html" xml:lang="en-ca" xml:base="http://blog.flickr.com/flickrblog/">
&lt;div xmlns=&quot;http://www.w3.org/1999/xhtml&quot;&gt;&lt;p&gt;&&nbsp;&lt;/p&gt;
&lt;div style=&quot;text-align: center;&quot;&gt;&lt;a href=&quot;http://www.flickr.com/gift/&quot;&gt;&lt;img border=&quot;0&quot; src=&quot;http://us.i1.yimg.com/us.yimg.com/i/ww/news/2006/12/12/gtfof.gif&quot; style=&quot;padding-bottom: 6px;&quot; /&gt;&lt;/a&gt;&lt;/div&gt;
&lt;p&gt;It&#39;s now easier than ever to spread joy this holiday season by giving the &lt;a href=&quot;http://www.flickr.com/gift/&quot;&gt;&lt;strong&gt;Gift of Flickr&lt;/strong&gt;&lt;/a&gt;. You can purchase a special activation code that you can give to anyone, whether or not they have an existing Flickr account. We&#39;ve even created a special Gift Certificate card that you can print out yourself, fold up and stuff in a stocking, under a tree or hidden away for after the candles are lit (of course, you can also send the gift code in an email).&lt;/p&gt;

&lt;p&gt;And it&#39;s even better to give the gift of Flickr since now your recipients will get &lt;a href=&quot;http://www.flickr.com/help/limits/#28&quot;&gt;&lt;strong&gt;unlimited uploads&lt;/strong&gt;&lt;/a&gt; — the two gigabyte monthly limit is no more (&lt;em&gt;yep, pro users have no limits on how many photos they can upload&lt;/em&gt;)! At the same time, we&#39;ve upped the limit for free account members as well, from &lt;a href=&quot;http://www.flickr.com/help/limits/#28&quot;&gt;&lt;strong&gt;20MB per month up to 100MB&lt;/strong&gt;&lt;/a&gt; (yep, five times more)!&lt;/p&gt;

&lt;p&gt;The Flickr team also wants to take this opportunity to thank you for a wonderful year and wish you and yours all the best of the season. Yay!&lt;/p&gt;&lt;/div&gt;
</content>
...
<content type="xhtml" xml:lang="en-ca" xml:base="http://blog.flickr.com/flickrblog/">
<div xmlns="http://www.w3.org/1999/xhtml"><p><a href="http://www.flickr.com/photos/eye_spied/313572883/" title="Photo Sharing"><img width="500" height="357" border="0" src="http://static.flickr.com/117/313572883_8af0cddbc7.jpg" alt="Dec 2 2006 208 copy" /></a></p>

<p><a title="Photo Sharing" href="http://www.flickr.com/photos/mrtwism/71294604/"><img width="500" height="375" border="0" alt="riding" src="http://static.flickr.com/34/71294604_b887c01815.jpg" /></a></p>

<p>See more photos in the <a href="http://www.flickr.com/photos/tags/biggame/clusters/cal-berkeley-stanford/">"Berkeley," "Stanford," "big game" cluster</a>.</p>

<p>Photos from <a href="http://www.flickr.com/photos/eye_spied/" title="Link to caryniam's photos">caryniam</a> and <a title="Link to mrtwism's photos" href="http://www.flickr.com/photos/mrtwism/">mrtwism</a>.</p></div>
</content>

So the first mystery is solved. The reason some posts look OK and some don't is that for some reason TypePad seems to alternate between escaped HTML and well-formed XHTML as the content of an entry in the feed. When the feed uses well-formed XHTML the item looks fine but when it uses escaped HTML it looks like crap. The next question is why the items aren't rendered correctly when escaped HTML is used.

So I referred to section 3.1 of the Atom 0.3 specification and saw the following

3.1.2  "mode" Attribute

Content constructs MAY have a "mode" attribute, whose value indicates the method used to encode the content. When present, this attribute's value MUST be listed below. If not present, its value MUST be considered to be "xml".

"xml":
A mode attribute with the value "xml" indicates that the element's content is inline xml (for example, namespace-qualified XHTML).
"escaped":
A mode attribute with the value "escaped" indicates that the element's content is an escaped string. Processors MUST unescape the element's content before considering it as content of the indicated media type.
"base64":
A mode attribute with the value "base64" indicates that the element's content is base64-encoded [RFC2045]. Processors MUST decode the element's content before considering it as content of the the indicated media type.

To prevent aggregators from having to use their psychic powers to determine when an item contains plain text or escaped HTML, the Atom folks introduced a mode attribute that indicated whether the content should be treated as is or should be unescaped. As you can see the default value for this is not "escaped". Since the TypePad Atom feeds do not state that the HTML content is escaped then the aggregator is not expected to unescape the content before rendering it. Second mystery solved. Buggy feeds are the culprit. 
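In other words, a well-behaved Atom 0.3 feed carrying escaped HTML is supposed to say so explicitly, along the lines of:

<content type="text/html" mode="escaped" xml:lang="en-ca">
&lt;p&gt;It&#39;s now easier than ever to spread joy this holiday season...&lt;/p&gt;
</content>

With the mode attribute missing, an aggregator is required by the spec to treat the content as inline XML, which is exactly why the escaped TypePad entries render as raw markup.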

Even though these feeds are broken it is probably faster for me to special-case feeds from TypePad than to try to track down and convince the folks at SixApart that this is a bug worth fixing. This issue will be fixed in the next beta of the Jubilee release of RSS Bandit.


 

Shelley Powers has a good pair of posts critical of TechMeme, a technology meme-tracker. In her post entitled Techmeme tells us to Feed the Daddy she writes

Techmeme heard the recent discussion about sites not appearing, and responds with a post (at http://blog.memeorandum.com/061115/how-to-show-up) on how to show up on techmeme. The money shot:

Early on I noticed my system occasionally missed good posts from blogs that link back to my sites. So recently I extended my system to take referrals into account. Now if your blog or news article sends a moderate level of traffic to one of my sites, it will be evaluated for inclusion. Linking certainly doesn't guarantee you'll appear, since all posts are run though the usual tests for newsworthiness. In fact, extra steps to avoid spam are now in effect since faked referrals and splogs are already commonplace. So in summary, sending memeorandum (or Techmeme or…) visitors is another way to "enable discovery of your post".

In other words, if you puff up Gabe Riviera's empire, giving it lots of Google rank, as well as do all the marketing for him (such as techmeme's primary gatekeeper, Scoble, for whom Riviera sends special love and kisses), you might be able to 'buy' your way into being listed.

I'm not sure what the goals of TechMeme are but it seems rather weird to use link exchange as a mechanism for getting sites into TechMeme's index. I doubt that will improve the 'quality' of the service and instead seems like a rather tacky 'scratch-my-back-and-I'll-scratch-yours' ploy. If the intent is to determine if the site has enough traffic to be worth including, why not look at its Alexa statistics or Technorati rank [as flawed as they are] instead of requesting a tit-for-tat link exchange? I think Gabe got some bad advice there.

In her followup post entitled Feed your Daddy Follow-up, Shelley adds

I wouldn't 'fix' Techmeme. What I would like to see is a growth in sites that provide topic aggregations, each using its own metrics and filtering algorithms. The more of these there are, the more likely we'll see a more fair distribution of attention, as well as a greater variety of stories, and more timely ones at that. In history, a way to discover an unbiased view of a fact or an event is to seek out at least three separate sources of information. The same can be said of topic aggregators. More than three; I'd actually like to see at least five.

One of the biggest problems with Techmeme is that it is asserted to be the 'ultimate authority' on what are the top stories in technology (or politics for Memeorandum). Yet according to it, 30% of us spend all of our time talking about Google, 10% of us discuss new startup funding, 10% talk about Microsoft, how it is, or is not clued; probably about 15% of us talk about some variation of gadget, typically iPod and now Zune; the rest talk about Techcrunch, Scoble, Second Life, or Techcrunch and Scoble in a Second Life. I could go on, but the point is that Techmeme is based more or less on seeded terms and seeded webloggers, and it can't shake that influence. As such, it provides an incredibly skewed look at the tech area of weblogging–completely ignoring most of what is truly technology.

Techmeme serves a purpose for those who are into Google and VC and San Francisco and startups and money, and Michael Arrington and Calacanis, and Scoble and the scene there, and that's fine. But that doesn't make it an authority on what's important, interesting, or even timely.

As usual Shelley hits the nail on the head. TechMeme is good at what it does, gathering the popular or interesting links among the Silicon Valley blogger crowd. However this is just one particular view into the technology industry and specifically the technology blogosphere. Most of the content isn't particularly relevant if you aren't a regular reader of sites like TechCrunch and Robert Scoble's blog.

Me, I personally would prefer a meme tracker that was heavy on bloggers like Sam Ruby, Tim Bray and Jon Udell instead of the large number of PR hacks and VCs that populate TechMeme. Where I disagree with Shelley is that I don't think the answer is more meme trackers, each with its own bias yet likely to overlap significantly with the others. We already have that today if you read sites like TechMeme, TailRank and Megite. I think the future is in personalization and not more news aggregators whose bias you can't control.


 

October 30, 2006
@ 02:38 PM

I had some DNS issues last week because I forgot to renew my domain name registration. According to my FeedBurner statistics page, one of the consequences of this lapse on my part is that I've lost about 1,000 subscribers to my feed on Rojo. It looks like they may have decommissioned my feed in their service once the domain name stopped resolving after a couple of hours. This seems like a reasonable thing to do.

I've sent some mail to the Rojo folks through their feedback email address but don't hold out much hope of ever getting a response. It may be that there is nothing for them to fix since the relationship between my feed and the subscribers may have been deleted from their database once my domain was considered to be gone.

Anyway, if you are a Rojo user wondering why my feed disappeared from your subscription list, now you know.


 

October 7, 2006
@ 02:33 AM

Tim Bray has a blog post entitled On Comments where he writes

I’ve had comments running for a few days here now (I prefer to say “contributions”, but whatever). People are irritated at me because an ongoing fragment shows up as unread in their feed-reader whenever a new comment comes in. I’m not sure what the right thing to do is. This piece outlines a few options and asks the community for discussion.

This is one of the reasons I've given for disliking the atom:updated element in blog posts like Indicating Updated Items in RSS Bandit. It should be up to the user to decide what counts as a 'significant' update that warrants marking the item as changed or new in the user interface, not the publisher. Tim thinks that new comments on a blog post should lead to the reader being notified by their aggregator; I think this should only be the case when the user has explicitly opted in to notifications about new comments. This doesn't extend to updates because the definition of what counts as a 'significant' update is going to vary from publisher to publisher and from user to user.

My advice to Tim is to use the Atom threading extensions, which provide explicit mechanisms for indicating changes to the number of comments as well as a way to link to comment feeds, as opposed to hacks like changing the value of atom:updated or putting the comments into the atom:content of the entry. Those both sound like recipes for a negative user experience when reading his blog in many aggregators.
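For example, with the threading extensions an entry can advertise its comment feed and comment count declaratively. A sketch based on my reading of the spec (the URIs are made up):

<entry xmlns="http://www.w3.org/2005/Atom"
    xmlns:thr="http://purl.org/syndication/thread/1.0">
    <id>tag:example.org,2006:/ongoing/fragment</id>
    <title>On Comments</title>
    <updated>2006-10-06T12:00:00Z</updated>
    <link rel="replies" type="application/atom+xml"
        href="http://example.org/fragment/comments.atom"
        thr:count="12" thr:updated="2006-10-07T02:33:00Z"/>
</entry>

An aggregator that understands thr:count and thr:updated can tell the user there are new comments without the publisher having to touch the entry's own atom:updated element.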

The title of this blog post is probably harsher than I intend. I think it is useful to have a last modified date in the form of atom:updated on items in a feed. What I disagree with is impacting the user experience based on changes to that element.


 

Niall Kennedy has a blog post entitled Authenticated and private feeds where he writes

Examples of private feeds intended for 1:1 communication include bank balances, e-mail notifications, project status, and the latest bids on that big contract. Data in the wrong hands could be dangerous, and many companies will stay away from the feed syndication space until they feel their users' personal data is secure.

A private feed's data could be exposed in a variety of ways. A desktop aggregator's feed content might be available to other users on the same computer, either through directory access or desktop search. An online aggregator might expose a feed and its content in search results or a preview mode.
...
A feed publisher could whitelist the user-agents it knows comply with its access policies. SSL encryption might not be a bad idea either as shared aggregation spaces might not store content requested over HTTPS. It would place extra load on the server as each request requires extra processing, but if the alternative is placing your customer's data in the Yahoo! search index then that's not such a bad thing.

I believe large publishers such as Salesforce.com or eBay would produce more feed content if they knew their customers' data was kept private and secure. There's a definite demand for more content transmitted over feed syndication formats but it will take the cooperation and collaboration of security formats and consistent aggregation practices to really move the needle in the right direction.

How to properly support private and authenticated feeds is a big problem, one which Niall highlights but doesn't explain why it is so hard. The main problem is that the sites providing the feed have to be sure that the application consuming the feed is secure. At the end of the day, can Bank of America trust that RSS Bandit or Bloglines is doing a good job of adequately protecting the feed from spyware or malicious hackers?

More importantly, even if they certify these applications in some way, how can they verify that those applications are the ones actually accessing the feed? Niall mentions whitelisting user agents but those are trivial to spoof. With Web-based readers one can whitelist their IP range, but there isn't a good way to verify that the desktop application accessing your web server is really who the user agent string says it is.
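To see just how trivial, here is everything it takes for an arbitrary client to masquerade as a whitelisted aggregator (the feed URL and User-Agent value are made up):

GET /feeds/accountbalance.xml HTTP/1.1
Host: bank.example.com
User-Agent: TrustedAggregator/1.0

The User-Agent header is entirely under the client's control, so it proves nothing about who is actually making the request.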

This seems to be yet another example of where Web-based software trumps desktop software.


 

Gabe Rivera, author of Techmeme, has a blog post entitled Why I don't offer a personal filter where he writes

I'm facing another round of inquiries on personal filtering, mostly from Techmeme fans who've read Ross Mayfield's or Dare Obasanjo's recent thoughts on the matter. (Just for the record, the first round included requests from Jeff Clavier and Ted Leung nearly a year ago!)

Why don't I offer a personal filter service aka "meMeme" aka "my.memeorandum"? Briefly, filters based on the editorial approach used for Techmeme/memeorandum don't work well outside of a few topic domains (like politics and tech), because cross linking is typically too sparse to produce a compelling mix of news. Sam Ruby unintentionally confirmed this yesterday should you pause to consider what sort of daily news selection could be derived from his Venus output. While it's true that cross linking is dense in some blogospheres, these are largely the same domains already covered by my existing sites.

Why not try editorial approaches based on new kinds of semantic analyses? My belief is that the requisite technology is harder than anything powering Google News, Topix, or my current sites. Attempts based on current technologies come up woefully short, with the resulting "Daily Me" consisting of a seemingly random mix of content missing most or all "must have" articles and posts. And having the "must haves" is essential for winning the earlier adopter types that would dominate the userbase of such a filter in the first place.

I reread the output from Sam's blogroll and it reminded me that there is a difference between the scenarios that sites like Techmeme and Tailrank are interested in and the goals of a personalized meme tracker. Here are a couple of questions to get you started on understanding the differences in implementation choices one might make between a personalized meme tracker and a topic-specific meme tracker:

  1. Q: How do you deal with "noise" links such as http://del.icio.us/tag/rest or http://www.technorati.com/tag/AJAX which may be common in the feeds the user is interested in?

    A: In both cases, it would seem the first step is to hard-code the application to understand certain kinds of links as "noise". The interesting question is how to deal with new types of "noise" links being introduced into the ecosystem. A web-based application can easily be updated as new "noisy" links enter the system, but things are a bit more difficult for a desktop application. Perhaps allowing users to nominate certain classes of links as noise (see the sketch after this list)?

  2. Q: What 'class' of news items or blog posts should be used in evaluating what is [currently] popular?

    A: It is quite obvious that simply using the entirety of the posts from a particular feed to calculate a link's popularity is flawed. Using that metric, I suspect that links such as http://adaptivepath.com/publications/essays/archives/000385.php or http://scobleizer.wordpress.com/2006/06/10/correcting-the-record-about-microsoft/ would always be the most popular links from my blogroll. Restricting the calculation to a specific time range (e.g. the past 24-48 hours) is what sites such as Techmeme and Tailrank appear to do. An aggregator such as RSS Bandit or FeedDemon could use other techniques, such as only considering 'unread' items when calculating currently popular topics.

  3. Q: How do you deal with link blogs?

    A: A number of people in my blogroll have blog posts that are basically a repost of all the links they have posted to del.icio.us that day (e.g. Stephen O'Grady and Mark Baker). Sites like Techmeme and Tailrank filter these posts out because no one wants to see a bunch of headlines that are all of the form 'links for 2006-09-05' with no real content. On the other hand, if a large number of folks in my blogroll are linking to a particular news item then it is likely to be interesting to me, regardless of whether there are 'meaty' blog posts behind their links or just linkblog-style postings.
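
As a concrete version of the noise-filtering idea from question #1, here's a minimal sketch; the noise patterns and the shape of the feed items are assumptions for illustration:

    import re

    # Hypothetical noise patterns: tag pages and similar links that inflate
    # counts without pointing at actual content. Users could nominate more.
    NOISE_PATTERNS = [
        re.compile(r"^https?://del\.icio\.us/tag/"),
        re.compile(r"^https?://(www\.)?technorati\.com/tag/"),
    ]

    def is_noise(url):
        """Return True if the link should be ignored when scoring popularity."""
        return any(p.match(url) for p in NOISE_PATTERNS)

    def count_votes(items):
        """Count how many items link to each non-noise URL. Each element of
        `items` is assumed to be a dict with a 'links' key."""
        votes = {}
        for item in items:
            for url in set(item["links"]):  # an item votes once per link
                if not is_noise(url):
                    votes[url] = votes.get(url, 0) + 1
        return votes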

These are a couple of the questions that I've been pondering since I started thinking about this feature a couple of months ago. At the end of the day, although Gabe's perspective is useful since he built the site that inspired this thinking, the scenarios are different enough to change some of the implementation choices in ways that may seem surprising to some.

PS: It seems Sam has already turned Gabe's feedback into code based on reading his blog post MeMeme 2.0. There are definitely interesting times ahead.


 

About six months ago, I wrote a blog post entitled Jubilee Thoughts: Tracking Hot Topics where I talked about adding meme tracking functionality similar to the features of Memeorandum and TailRank to RSS Bandit. Since I wrote that blog post I haven't written a lick of code that actually does this, though I've thought and talked about it a lot. And while I've been pontificating in my blog, I can't help but notice that a few others have been writing code.

In his blog post entitled Spyder Spots a Memetracker Nick Bradbury writes

Andy "Spyder" Herron writes about the "personal memetracker" that's hidden in FeedDemon 2.0.0.25.

I had hoped to complete this feature by now, but as Andy points out, it still needs some work (which is why I hid it and gave it a "beta" label). If you'd like to try it out, select "Popular Topics" from the Browse menu (or just add the "Popular Topics" toolbutton to the toolbar above FeedDemon's browser).

I should add that this feature will probably be useful only to people who subscribe to a lot of feeds since it relies on common links to determine popularity. So if you're not subscribed to feeds which link to the same articles, chances are it won't show you any results.

In another blog post entitled MeMeme Sam Ruby writes

Ross Mayfield: Cue up not what is popular, or what the people I subscribed to produced.  Cue up what my social network has found interesting.
Herewith, a simple demonstration of what aggressive canonicalization can produce.  Venus may be in Python, but suppose I’m in a Ruby mood.  The cache is simply files in Atom 1.0 format, with all textual content normalized to XHTML.

Lets make a few simplifying assumptions: all posts are created equal, each post can only vote once for any given link (this also takes care of things like summaries which partially repeat content), posts implicitly vote (once!) for themselves, and the weight of a vote degrades as the square of the distance between when the post was made and now.

Here’s the code, and here’s a snapshot of the output.  The output took 6.239 elapsed seconds to produce on my laptop.  I still have more work to do to eliminate some of the self-referential links (in fact, I a priori removed Bob Sutor’s blog from the analysis as it otherwise he would dominate the results).  But I am confident that this is solvable, in fact, I am working on expanding what filters can do.  I’ll post more on that shortly.
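
Sam's weighting rules are simple enough to restate in code. Here's a minimal sketch of the scoring he describes; the data structures and the exact decay constant are my assumptions:

    import time

    def score_links(posts, now=None):
        """Each post votes once per link, implicitly votes for itself, and a
        vote's weight degrades as the square of the post's age. `posts` is
        assumed to be a list of dicts with 'uri', 'links' and a 'published'
        Unix timestamp."""
        now = now or time.time()
        scores = {}
        for post in posts:
            age_days = max((now - post["published"]) / 86400.0, 0.0)
            weight = 1.0 / (1.0 + age_days ** 2)  # quadratic decay with age
            for url in set(post["links"]) | {post["uri"]}:  # self-vote included
                scores[url] = scores.get(url, 0.0) + weight
        return sorted(scores.items(), key=lambda kv: -kv[1])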

With both Sam and Nick on the case, I'm quite sure that within the next few months it will be taken for granted that one of the features of a news aggregator is to provide personalized meme tracking. Although I'm sure that we'll all use the same set of basic rules to provide this feature, I suspect that the problems we are trying to solve will end up being different, which will influence how we each implement the feature.

For example, the main reason I want this feature isn't to track the popular topics across multiple blogs but instead to find the popular topics within aggregated blog feeds such as blogs.msdn.com and the numerous planet sites. Reading Sam's blog, it seems he'd consider the same feed linking to the same news items as spam to filter out, while I consider it the only part of the feature I'd use. This issue illustrates the main problem I've had with designing the feature in my head: what "knobs" or options should we give users to control how the meme tracker decides what is interesting vs. what should be ignored when generating the list of 'hot topics' (e.g. the various meme trackers have said they filter out link blogs since they tend to dominate the results)?

Since I've decided to be more focused with regards to RSS Bandit development, I won't touch this feature until podcasting support is done. However I'd like to hear thoughts from our users in the meantime.


 

I just read the rather brief Feed Access Control RSS and ATOM specification from the Bloglines team. It defines the access:restriction element as

<access:restriction> element
Sub element of <rss> or <feed>. Used to indicate the re-distribution restrictions for a feed. The 'relationship' attribute is used to indicate whether a feed will 'allow' or 'deny' access.

To 'allow' access means a feed may be redistributed to other public sources, including search. To allow access, for example:

    <access:restriction relationship="allow" />

To 'deny' access means a feed should not be redistributed to other public sources, including search. To deny access, for example:

    <access:restriction relationship="deny" />

The default relationship is to allow access. However, if a feed is currently set to 'deny', the relationship must be explicitly set back to 'allow' for it to be registered (Simply ommiting it from the feed is not sufficient to turn access back on).

The problem with this 'specification' is that it says nothing about its goals, scenarios or expected use cases. Without these it is hard to tell whether this is a good idea or a bad idea. Danny Ayers points out that this mimics the behavior of the Robots META tag that can be placed in HTML pages. I guess this means it prevents search engines from indexing your page and showing it in search results which makes sense in certain limited scenarios. For example, it makes sense to exclude a search engine from indexing the search results page of another search engine or the RSS feed of some search results. Hints like the Robots META tag and robots.txt are good ways to prevent this from happening for HTML pages. I guess this proposal does the same for RSS and Atom feeds.
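
Honoring the element is cheap for a well-behaved crawler or aggregator. Here's a minimal sketch; I'm assuming the namespace URI from the Bloglines draft, so treat it as illustrative:

    import xml.etree.ElementTree as ET

    # Namespace from the Bloglines Feed Access Control draft; adjust if the
    # final specification uses a different URI.
    ACCESS_NS = "http://www.bloglines.com/about/specs/fac-1.0"

    def may_redistribute(feed_xml):
        """Return False if the feed opts out of redistribution via
        <access:restriction relationship="deny"/>; the spec default is allow."""
        root = ET.fromstring(feed_xml)
        # The element is a child of <rss> or <feed>, per the draft.
        restriction = root.find("{%s}restriction" % ACCESS_NS)
        if restriction is None:
            return True  # spec default: allow
        return restriction.get("relationship") != "deny"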

On the other hand, it is definitely not an access control mechanism. You wouldn't want your bank to tell you that the way that they prevent anyone from viewing your bank account details is via robots.txt would you?


 

July 18, 2006
@ 05:12 PM

Yesterday, I spent way too much time trying to figure out how to import an OPML feed list into Bloglines from the UI before giving up and performing a Web search to find out how to do it. Below is a screenshot of the key choices one has for managing one's feeds in Bloglines.

And this is what the Bloglines FAQ has in response to the question How Can I Import An Existing List of Subscriptions?

Once you have registered with Bloglines and replied to the confirmation email, click on the My Feeds tab at the top of the screen. Then, click on the Edit link. At the bottom of the left panel will be a link to import subscriptions. The subscription list must be in OPML format.
Why is importing a feed list an 'Edit' operation and not an 'Add'? Who designs this crud?

 

July 15, 2006
@ 10:25 PM

Nathan Torkington has a blog post entitled A Week in the Valley: GData on the O'Reilly Radar blog that talks about the growth of the usage of GData & the Atom Publishing Protocol within Google, as well as Mark Lucovsky's take on how this compares to his time at Microsoft working on Hailstorm. Nat writes

They're building APIs to your Google-stored data via GData, and it's all very reminiscent of HailStorm. Mark, of course, was the architect of that. So why's he coming up with more strategies to the same ends? I figure he's hoping Google won't screw it up by being greedy, the way Microsoft did...The reaction to the GData APIs for Calendar have been very positive. This is in contrast to HailStorm, of course, which was distrusted and eventually morphed its way through different product names into oblivion. Noting that Mark's trying again with the idea of open APIs to your personal data, I joked that GData should really be "GStorm". Mark deadpanned, " I wanted to call it ShitStorm but it didn't fly with marketing".

Providing APIs to access and manipulate data owned by your users is a good thing. It extends the utility of the data outside that of the Web applications that may be the primary consumer of the data and it creates an ecosystem of applications that harness the data. This is beneficial to customers as can be seen by looking around today at the success of APIs such as the MetaWeblog API, Flickr API or del.icio.us API.

Five years ago, while interning at Microsoft, I saw a demo about Hailstorm in which a user visiting an online CD retailer was shown an ad for a concert they'd be interested in based on their music preferences in Hailstorm. The thinking here was that it would be win-win because (i) all the user's data is entered and stored in one place, which is convenient for the user, (ii) the CD retailer can access the user's preferences from Hailstorm and cut a deal with the concert ticket provider to show their ads based on user preferences, and (iii) the concert ticket provider gets their ads shown in a very relevant context.

The big problem with Hailstorm was that it assumed that potential Hailstorm partners such as retailers and other businesses would give up their customer data to Microsoft. As expected, most of them told Microsoft to take a long walk off a short pier.

Unfortunately Microsoft didn't take the step of opening up these APIs to its online services such as Hotmail and MSN Messenger but instead quietly canned the project. Fast forward a few years later and the company is now playing catchup to ideas it helped foster. Amusingly, people like Mark Lucovsky and Vic Gundotra who were influential during the Hailstorm days at Microsoft are now at Google rebuilding the same thing.

I've taken a look at GData and have begun to question the wisdom of using Atom/RSS as the baseline for information interchange on the Web. Specifically, I have the same issues that Steven Ickman raised in a comment on DeWitt Clinton's blog, where he wrote

From a search perspective I’d argue that the use of either format, RSS or Atom, is pretty much a hack. I think OpenSearch is awesome and I understand the motivators driving the format choices but it still feels like a hack to me.

Just like you I want to see rich structured results returned for queries but both formats basically limit you to results of a single type and contain a few known fields (i.e. link, title, subject, author, date, & enclosure) that are expected to be common across all items.

Where do we put the 100+ Outlook defined contact fields and how do we know that a result is a contact and not an appointment or auction? Vista has almost 1000 properties defined in its schema so how do we convey that much metadata in a loseless way? Embedded Microformats are a great sugestion for how to deal with richer content but it sort of feels like a hack on top of a hack to me? What’s the Microformat for an auction? Do I have to wait a year for some committee to arrive at joint aggreement on what attributes define an auction before I can return structured auction results?

When you have a hammer, everything looks like a nail. It seems Steven Ickman and I reviewed OpenSearch/GData/Atom with the same critical lens and came away with the same list of issues. The only thing I'd change in his criticism is the claim that both formats (RSS & Atom) limit you to results of a single type; that isn't the case. Nothing stops a feed from containing data of wildly varying types. For example, a typical MSN Spaces RSS feed contains items that represent blog posts, photo albums, music lists, and book lists, which are all very different types.

The inability to represent hierarchical data in a natural manner is a big failing of both formats. I've seen the Atom Threading Extensions, but that seems to be a very un-XML way for an XML format to represent hierarchy, especially given how complicated message threading algorithms can be for clients to implement.
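
To illustrate the client-side work involved, here's a minimal sketch of reconstructing a reply tree from flat Atom entries using the thr:in-reply-to element; the shape of the return value is my own choice:

    from collections import defaultdict
    import xml.etree.ElementTree as ET

    ATOM = "http://www.w3.org/2005/Atom"
    THR = "http://purl.org/syndication/thread/1.0"

    def build_thread_tree(feed_xml):
        """Group entries by the atom:id they reply to, using thr:in-reply-to.
        Returns (roots, children) where `children` maps a parent id to the
        ids of entries replying to it."""
        feed = ET.fromstring(feed_xml)
        children = defaultdict(list)
        roots = []
        for entry in feed.findall("{%s}entry" % ATOM):
            entry_id = entry.findtext("{%s}id" % ATOM)
            reply = entry.find("{%s}in-reply-to" % THR)
            if reply is not None:
                children[reply.get("ref")].append(entry_id)
            else:
                roots.append(entry_id)
        return roots, children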

It'll be interesting to see how Google tackles these issues in GData.


 

About two weeks ago, Greg Reinacker wrote about NewsGator's past, present and future in two blog posts entitled NewsGator platform roadmap - Part I (a look back) and NewsGator platform roadmap - Part II (a look forward). The blog posts are a good look at the achievements of a company that has gone from a one-man shop building an RSS reading plugin for Outlook to being the dominant syndication platform company on almost every platform, from Windows & Mac to the Web & mobile phones. If you are interested in XML syndication, then Greg's posts are bookmark-worthy since they describe the future plans of a company that probably employs the best minds building RSS/Atom applications today. Below are some excerpts from his posts on my areas of interest

NewsGator Online

As I said 16 months ago, the proposed feature list is long and distinguished - and it still is.  There is so much to do here...some of the short-term planned additions range from more interactive feed discovery mechanisms (based on the larger community of users and their subscriptions), to completely different user interface paradigms (where a user could potentially select from different options, each catering to a different kind of user).

A larger initiative is around the whole paradigm. Techies aside, users don't want to think about feeds, and subscriptions, and searching for content...Given all that, we're really rethinking the way we present information to the user, and the way users discover new information.  We're designing ways for people to participate in a larger community if they wish, and get more value out of the content they consume, at the point they discover it.  While we all have our own set of feeds, and we all participate to some extent in the larger ecosystem, there is a lot of potential in linking people with similar interests to each other.  Some users will continue to use our system as they always have - and others will use it in completely different ways.  We're testing a couple of approaches on this right now - I think it's truly a game-changer.

NewsGator Inbox, FeedDemon, NetNewsWire

As I mentioned before, the enthusiasm around these products has continued to grow - people obviously see the value in a rich, synchronized, offline-capable user experience for consuming content.  Moving forward, online integration will get tighter, and more complete - ranging from the low hanging fruit like FeedDemon "News Bins" becoming Clippings (and thus synchronize with the entire platform), to more involved features like analytics-related features (recommendations, interest-based surfacing, etc.) and community-related features.
...
NewsGator core platform

This is the heart of our entire product line (with the exception of NewsGator Enterprise Server).  Moving forward, we're investing a lot in the platform.  We're building out more support for deep analytics (which we can use to deliver different kinds of user experience), and building out a much deeper metadata engine (which means if a client retrieves content from our system, they'll get much richer data than they otherwise would).  We'll have other ways to "slice" our data to get what you need, without having to subscribe to hundreds of feeds.

The API has been very successful, and we process millions of API calls per day from client applications, web services, and private label clients.  This traffic actually makes up a large percentage of our overall system traffic - which I think is a testament to the popularity and utility of the API.  Moving forward here, we're obviously very committed to the API story, and we'll continue to enhance it as we add platform capabilities.

There's lots of good stuff here. The first thing that pops out at me is that while a bunch of startups these days tend to proclaim the death of desktop software, NewsGator is actually seeing the best of both worlds and improving the quality of the desktop experience by harnessing a Web-based platform. It's not Web-based software replacing desktop software, it's desktop software becoming better by working in tandem with APIs and applications on the Web. When Ray Ozzie talks about "live software", NewsGator is the company that leaps most readily to my mind.

I like the idea of making discovery of new content more of a social experience. It'd be interesting to see what would happen if NewsGator Online had a del.icio.us-inspired interface for browsing and subscribing to people's feeds. I notice that Gordon Weakliem who works on the NewsGator API recently wrote a post entitled Needles in Haystacks where he talks about serendipitous discovery of new websites by browsing bookmarks of people with similar interests to him in del.icio.us. I'm sure it's just a matter of time before NewsGator adds these features to their platform.

I also like the idea of exposing richer metadata in the NewsGator API especially if it relates to the social features that they plan to unveil in the next couple of months. Unfortunately, I've never been able to get the NewsGator API to work quite right with RSS Bandit but I'll be revisiting that code later in the summer.


 

Mike Arrington of TechCrunch fame has a blog post where he lays out the demographics of the various RSS readers used to subscribe to his feed. Below is an excerpt of his post and a partial screenshot of his FeedBurner statistics showing the top fourteen feed readers used to access the TechCrunch feed

Firefox (including Flock) accounts for 20% of feed readers. Bloglines is in second place with 13%, followed by NewsGator at 12%, Rojo at 8%, FeedReader at 7%, and Netvibes at 7%. Other notables include Pageflakes, Pluck and Attensa. If you add NetNewsWire to the core NewsGator stats, NewsGator is actually bigger than bloglines.
...

The feed reader statistics are surprising to me both for the feed readers that show up in the list and for those that don't. For example, I'm surprised to see FeedReader at #5 yet not see FeedDemon in the top 14. Similarly, the popularity of AJAX home pages like Pageflakes and Netvibes over those from the big 3 (Google/Yahoo/Microsoft) is also unexpected. Of course, these statistics might be skewed because TechCrunch is one of the default feeds in Netvibes. A final surprise is that NewsGator Online is almost as popular as Bloglines among readers of TechCrunch. This seems to mean that NewsGator Online is finally getting a lot of cred among the early adopter crowd, especially since Bloglines has been slow to update in the past year.

For a completely different set of demographics, here are the top 14 feed readers used to access my RSS feed according to FeedBurner.

I wonder what conclusions you draw from how different the distribution of feed readers is in the above screenshots. For example, I think the fact that a bunch of Microsoft employees and developers on Microsoft's platforms read my blog explains why there are multiple feed readers based on the .NET Framework in the above list. In addition, I suspect this also explains why there is an entry for the Windows RSS platform in the top 10 applications hitting my feed.

On the flip side, I have no explanation for why it seems that NewsGator Online is half as popular as Bloglines among the readers of my blog.


 

Seeing Jon Udell's post about having difficulty with the Google PR team with regards to discussing the Google GData API reminded me that I needed to write down some of my thoughts on extending RSS and Atom based on looking at GData. There are basically three approaches one can take when deciding to extend an XML syndication format such as RSS or Atom

  1. Add extension elements in a different namespace: This is the traditional approach to extending RSS and it involves adding new elements as children of the item or atom:entry element which carry application/context specific data beyond that provided by the RSS/Atom elements. Microsoft's Simple Sharing Extensions, Apple's iTunes RSS extensions, Yahoo's Media RSS extensions and Google's GData common elements all follow this model.

  2. Provide links to alternate documents/formats as payload: This approach involves providing links to additional data or metadata from an item in the feed. Podcasting is the canonical example of this technique. One argument for this approach is that instead of coming up with extension elements that replicate existing file formats, one should simply embed links to files in the appropriate formats. This argument has been used in various discussions on syndicating calendar information (i.e. iCalendar payloads) and contact lists (i.e. vCard payloads). See James Snell's post Notes: Atom and the Google Data API for more on this topic.

  3. Embed microformats in [X]HTML content: A microformat is structured data embedded within another markup language (typically HTML/XHTML). This allows one to represent both human-readable data and machine-readable data in a single document. The Structured Blogging initiative is an example of this technique.

All three approaches have their pros and cons. Option #1 is problematic because it encourages a proliferation of duplicative extensions and may lead to fragmenting the embedded data into multiple unrelated elements instead of a single document/format. Option #2 requires RSS/Atom clients to either build parsers for non-syndication formats or rely on external libraries for consuming information in the feed. The problem with Option #3 above is that it introduces a dependency on an HTML/XHTML parser for extracting the embedded data from the content of the feed.

From my experience with RSS Bandit, I have a preference for Option #1 although there is a certain architectural purity with Option #2 which appeals to me. What do the XML syndication geeks in the audience think about this?


 

Last week there was an outage on NewsGator Online. This outage didn't just affect people who use NewsGator Online but also users of their desktop readers, such as FeedDemon, which synchronize the user's feed state between the desktop and the web-based reader.

In his post Dealing with Connectivity Issues in Desktop Applications Nick Bradbury writes

One of the more frustrating challenges when designing a desktop application that connects to the Internet is figuring out how to deal with connectivity issues caused by firewalls, proxy servers and server outages.
...
And as we discovered last week, when your application relies on a server-side API, it has to be able to deal with the server being unavailable without significantly impacting the customer. This was something FeedDemon 2.0 failed to do, and I have to take the blame for this. Because of my poor design, synchronized feeds couldn't be updated while our server was down, and to make matters worse, FeedDemon kept displaying a "synchronization service unavailable" message every time it tried to connect - so not only could you not get new content, but you were also bombarded with error messages you could do nothing about.

A couple of months ago I wrote a blog post entitled The Newsgator API Continues to Frustrate Me where I complained about the fact that NewsGator Online assumes that clients that synchronize with it fetch all their data, including feed content, from NewsGator Online. This is a bad design decision because it means they expect every desktop client that synchronizes with the web-based reader to have a single point of failure. As someone whose day job is working on the platforms that power a number of Windows Live services, I know from experience that service outages are a fact of life. In addition, I also know that you don't want clients making requests to your service unless they absolutely have to. This isn't a big deal at first, but once you have enough clients you start wanting them to do as much data retrieval and processing as they can without hitting your service. Having a desktop feed reader rely on a web service to fetch feeds instead of fetching them itself needlessly increases the cost of running your online service and doesn't buy your customers a significantly improved user experience.

I've bumped into Greg Reinacker since I complained about the Newsgator API and he's been adamant about the correctness of their design decisions. I hope the fallout from the recent outage makes them rethink some of the design of Newsgator's RSS platform.


 

April 10, 2006
@ 02:53 PM

Via Shelley Powers I found out that Mark Pilgrim has restarted his blog with a new post entitled After the Bath. Ironically, I didn't find this out from my favorite RSS reader because it correctly supports the HTTP 410 (Gone) status code, which Mark's feed has been returning for over a year.
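
For aggregator authors, honoring 410 is a one-branch change in the polling loop. A minimal sketch; the unsubscribe callback is an assumption about the host application:

    import urllib.request
    import urllib.error

    def poll_feed(url, unsubscribe):
        """Fetch a feed, permanently unsubscribing on HTTP 410 (Gone), which
        is the behavior that (correctly) hid Mark's resurrected blog."""
        try:
            with urllib.request.urlopen(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            if err.code == 410:
                unsubscribe(url)  # per the spec, never request this feed again
                return None
            raise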

Mark Pilgrim's feed being resurrected from the dead is another example of why simply implementing support for Web specifications as written sometimes bites you on the butt. :)


 

I stopped paying attention to the syndication wars a couple of months ago. I barely have time to stay on top of all the stuff I have to worry about as part of my day job, let alone keep track of the pointlessness that is the Atom vs. RSS debate. Unfortunately, every once in a while something happens that forces me to pay attention because I'm also the project lead for RSS Bandit.

One cool thing about XML syndication formats like RSS and Atom is that they are easily extensible. This means that anyone can come up with a new extension to the RSS/Atom formats which adds a new feature but is ignored by feed readers that don't understand the extension. Some of my favorite extensions are slash:comments, which provides a count of the number of comments on an item, and wfw:commentRss, which provides the URL of the feed for the comments on a blog post. One of my work items for the next version of RSS Bandit is to make it easier for people to 'watch' the comments on a blog post they are interested in if its feed supports these extensions. That way I can get a notification every time a comment thread I am interested in gets new posts, directly from my aggregator, instead of using other tools like CoComment.
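
Both extensions are just namespaced children of an RSS item, so 'watching' comments mostly reduces to reading two elements and polling. A minimal sketch with a made-up item:

    import xml.etree.ElementTree as ET

    NS = {
        "slash": "http://purl.org/rss/1.0/modules/slash/",
        "wfw": "http://wellformedweb.org/CommentAPI/",
    }

    ITEM = """<item xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
                   xmlns:wfw="http://wellformedweb.org/CommentAPI/">
      <title>Example post</title>
      <slash:comments>12</slash:comments>
      <wfw:commentRss>http://example.com/posts/1/comments.rss</wfw:commentRss>
    </item>"""

    item = ET.fromstring(ITEM)
    count = int(item.findtext("slash:comments", default="0", namespaces=NS))
    comments_feed = item.findtext("wfw:commentRss", namespaces=NS)
    # A comment watcher can poll comments_feed and notify the user whenever
    # the count reported by slash:comments increases.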

A few days ago Sam Ruby posted an entry entitled Rogers Switches! where he mentioned that he now redirects all requests for his RSS feeds to his Atom feed. This meant that in RSS Bandit I now no longer see the comment count for the blog posts in Sam's feed nor can I view the comments to his posts directly within my aggregator. Sam and I had the following exchange in his blog comments when I discovered the ramifications of his change

I was trying to figure out if I’d introduced a bug in RSS Bandit to make your comment count and inline comments disappear. Instead, it seems you have made your feed less useful as part of the fallout of yet another iteration of the eternal pissing match which is the XML syndication wars.

sigh

Posted by Dare Obasanjo at 20:25

Dare, is this is your way of saying that you don’t intend to support the Feed Thread Extension?  I’d think that you would be on the watch for it.

Posted by Sam Ruby at 22:08

I looked at the draft of the Feed Thread Extension specification Sam linked to and it seems like a reinvention of the functionality provided by the slash:comments, wfw:commentRss and annotate:reference extensions. Great, so not only do we have to deal with an increase in the number of competing XML syndication formats due to the Atom process ...by the way, have you seen the Atom 0.3 vs. Atom 1.0 debate? I told you so... we now also have to deal with duplicates of all the popular RSS extensions as well? Give me a break!

That said, you can expect support for these new extensions in the next version of RSS Bandit. At the end of the day, what matters is building useful software for our users regardless of how many petty annoyances are thrown in our way on the road there.


 

Thanks to numerous reports from RSS Bandit users it has come to my attention that the Atom feeds provided by Google's Blogger are invalid and in many cases aren't even well-formed XML. Please fix this. I'm tired of dealing with threads like Blogspot feeds - XML Failure in our support forums.

If you'd like an example of what is wrong with your feeds, click on http://feedvalidator.org/check?url=http://nothing-more.blogspot.com/atom.xml which shows the results of validating the feed for Derek Denny-Brown's blog. Below is the list of errors returned

This feed does not validate.

  • line 4, column 0: This feed uses an obsolete namespace [help]

    <feed xmlns="http://purl.org/atom/ns#" version="0.3" xml:lang="en-US">
  • line 4, column 0: Unexpected version attribute on feed element [help]

    <feed xmlns="http://purl.org/atom/ns#" version="0.3" xml:lang="en-US">
  • line 7, column 0: type attribute must be "text", "html", or "xhtml" [help]

    <title mode="escaped" type="text/html">only this, and nothing more</title>
  • line 7, column 0: Unexpected mode attribute on title element (7 occurrences) [help]

    <title mode="escaped" type="text/html">only this, and nothing more</title>
  • line 8, column 0: Undefined feed element: tagline [help]

    <tagline mode="escaped" type="text/html">irregular eccentic eclecticisms, di ...
  • line 11, column 0: Undefined feed element: modified [help]

    <modified>2006-03-27T00:01:47Z</modified>
  • line 12, column 0: Unexpected url attribute on generator element [help]

    <generator url="http://www.blogger.com/" version="5.15">Blogger</generator>
  • line 13, column 0: Undefined feed element: info [help]

    <info mode="xml" type="text/html">
  • line 4, column 0: Missing feed element: updated [help]

    <feed xmlns="http://purl.org/atom/ns#" version="0.3" xml:lang="en-US">
  • line 22, column 0: Undefined entry element: issued (6 occurrences) [help]

    <issued>2006-03-26T15:25:00-08:00</issued>
  • line 23, column 0: Undefined entry element: modified (6 occurrences) [help]

    <modified>2006-03-27T00:01:47Z</modified>
  • line 24, column 0: Undefined entry element: created (6 occurrences) [help]

    <created>2006-03-27T00:01:47Z</created>
  • line 27, column 0: type attribute must be "text", "html", or "xhtml" (6 occurrences) [help]

    <title mode="escaped" type="text/html">You call that Democracy?</title>
  • line 36, column 0: Missing entry element: updated (5 occurrences) [help]

    </entry>
  • line 153, column 156: XML parsing error: <unknown>:153:156: unbound prefix [help]

    ... S-X's niceties. If I knew people on the <st1:place st="on">Vista</st1:pl ...
                                                 ^

In addition, this feed has issues that may cause problems for some users. We recommend fixing these issues.

  • line 5, column 134: service.post is not a registered link relationship (2 occurrences) [help]

    ... hing more" type="application/atom+xml"/>
                                                 ^
  • line 7, column 66: text/html type used for a document fragment [help]

    <title mode="escaped" type="text/html">only this, and nothing more</title>
                                                                      ^
  • line 4, column 0: Missing atom:link with rel="self" [help]

    <feed xmlns="http://purl.org/atom/ns#" version="0.3" xml:lang="en-US">
  • line 18, column 150: service.edit is not a registered link relationship (6 occurrences) [help]

    ... emocracy?" type="application/atom+xml"/>
                                                 ^
  • line 27, column 63: text/html type used for a document fragment (6 occurrences) [help]

    <title mode="escaped" type="text/html">You call that Democracy?</title>
                                                                   ^
  • line 29, column 0: application/xhtml+xml type used for a document fragment (6 occurrences) [help]

    <div xmlns="http://www.w3.org/1999/xhtml">

Thanks for listening.


 

Recently there was a question asked on the RSS Bandit forums from a user who was Unable to Import RSSBandit-Exported OPML into IE7. The question goes

I exported my feeds from RSSBandit 1.3.0.42 to an OPML file in hopes of trying the feed support in IE7. IE7 seems to try to import, but ultimately tells me no feeds were imported. The exported file must have over a 100 feeds, so it's not that. Has anyone else been able to import feeds from RSSBandit into IE7?

I got an answer for why this is the case from the Internet Explorer RSS team. The reason is provided in the RSS Bandit bug report, Support type="rss" for export of feeds, where I found out that somewhere along the line someone came up with the convention of adding a type="rss" attribute to indicate which entries in an OPML file are RSS feeds. The Internet Explorer RSS team has decided to enforce this convention for indicating RSS feeds in an OPML file and will ignore entries that don't have this annotation.

Since RSS Bandit supports both RSS/Atom feeds and USENET newsgroups, I can see the need to be able to differentiate which are the feeds in an OPML file without having applications probe each URL. However I do think that type="rss" is a misnomer since it should also apply to Atom feeds. Perhaps type="feed" instead?
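
For anyone hitting the same import problem, the fix on the export side is tiny. A minimal sketch of writing an OPML outline with the type="rss" annotation IE7 expects; the feed data is made up:

    import xml.etree.ElementTree as ET

    feeds = [("Example Blog", "http://example.com/feed.rss")]

    opml = ET.Element("opml", version="1.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = "Exported subscriptions"
    body = ET.SubElement(opml, "body")
    for title, xml_url in feeds:
        # IE7 skips outlines without type="rss" (even for Atom feeds), which
        # is why an un-annotated export appears to import zero feeds.
        ET.SubElement(body, "outline", type="rss", text=title,
                      title=title, xmlUrl=xml_url)

    ET.ElementTree(opml).write("feeds.opml", encoding="utf-8",
                               xml_declaration=True)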


 

Work has been quite hectic the past few weeks so I've gotten behind on checking out the sexy new startups coming out of Silicon Valley. The startup that has recently caught my interest is Edgeio, the brainchild of Mike Arrington of TechCrunch.

The most succinct post I've found on the company is Edgeio - Mike's Little eBay Killer by Pete Cashmore where he writes

Essentially, Edgeio is an aggregator for classified listings. You can write a classified ad on your blog, tag it with "listing" and let Edgeio pick it up from your feed. Add a few more tags to describe your ad and Edgeio will grab those too. The service will pick up anything tagged with "listing" and obviously that raises the question of spam. But after speaking to Mike, I’m pretty sure he’s on top of it. For instance, you can claim your blog on Edgeio, just like on Technorati. Claiming your blog means that you are now a "member" and your listings are considered more trustworthy. There are also automated ways to remove the worst of the spam. And then there are the user-powered methods - "report spam" buttons and the like.
...
Last of all: the business model. Unlike about 90% of the stuff that gets labelled (tagged?) Web 2.0, Edgeio actually has one. Actually it has a few, but the main monetization method appears to be sponsored listings - pay 25 cents a day to get your listing bumped up to the top. I would have been tempted to pursue a transaction-based model (ie. you take a cut from every sale), but I can see why Edgeio isn’t taking that path for now - handling transactions is a huge job and requires a reputation system, among other things. (And if Edgeio did build a reputation system, I’m pretty sure it would be portable).

Calling Edgeio an eBay killer is probably a bit hyperbolic, but I do think it points the way to how decentralization will undermine the centralized business models of old. Your little walled garden will never be as large, rich and varied as the content that exists out on the open web.

As you can expect from a "Web 2.0 blog", Pete Cashmore's post is full of hyperbole and leaps of faith, but there are some interesting ideas here nonetheless. From a technology perspective I assume that Edgeio depends on microformats just like other metadata-in-your-blog-post initiatives such as Structured Blogging. This indicates to me that there now seems to be general consensus amongst the Silicon Valley startup crowd that building a company based on searching blogs and screen scraping their HTML, as PubSub and Technorati have done, is the new hotness.
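
I don't know exactly how Edgeio's crawler detects listings, but if the tag surfaces as an RSS category element the aggregation side is straightforward. A hedged sketch under that assumption:

    import xml.etree.ElementTree as ET

    def find_listings(feed_xml):
        """Return items tagged 'listing', assuming tags appear as RSS
        <category> elements; how Edgeio actually does this may differ."""
        root = ET.fromstring(feed_xml)
        listings = []
        for item in root.iter("item"):
            tags = [c.text.strip().lower()
                    for c in item.findall("category") if c.text]
            if "listing" in tags:
                listings.append({"title": item.findtext("title"),
                                 "link": item.findtext("link"),
                                 "tags": tags})
        return listings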

The more interesting thing to me is that the folks at Edgeio are implying that there is a market for a 'Make this blog post a classified listing' checkbox in traditional blog posting tools. From my perspective as someone who works closely with the MSN Spaces and Windows Live Expo teams, this sounds very interesting. There is already some integration planned between both services but I'm not sure this is one of the options that was considered. I wonder how much user validation of this belief Mike Arrington and company did before going ahead with launching their startup?

As far as business models go, I find it hard to imagine why anyone would consider this an eBay killer. I'm not going to claim that people posting things for sale on their blogs and having those posts picked up by classified listing services is inferior to eBay's model. However, I wonder why anyone thinks that services like eBay, Craigslist and Windows Live Expo wouldn't jump into this market if it turned out to be profitable. Since there doesn't seem to be much of a barrier to entry (the ability to write a Web crawler and do some minor HTML parsing is all that is required), I wouldn't start eulogizing eBay just yet.


 

February 7, 2006
@ 12:35 AM
I've recently been thinking about the overlap and differences between applications for reading email and applications for reading RSS. I started thinking more about this topic after reading the following excerpted blog posts.

In his blog post The RSS Experience in IE7 Joshua Allen wrote

Dare says as much; IE7 was not intended to replace tools like RSS Bandit, NewsGator, or Outlook 12. It's not a matter of trying to keep small ISVs in business, as much as a decision to put the RSS-Bandit style reading experience in the products where it belongs; namely Outlook and OE. IE7 doesn't read NNTP feeds either; that's what OE is for.

In his blog post Email is Abused Omar Shahine wrote

I firmly believe that email is a fantastic tool, and that it’s also heavily abused in the work place. More often than not, what you hear when you send an email is deafening silence or a flurry of incomprehensible replies breaking threading and screwing up the conversation flow.

It is my firm belief that many folks don’t have any system for dealing with their email. They get overwhelmed by the amount of mail that they have, and as a result are unpredictable in getting back to you (if they do).

What this means is that not only do you have to manage your inbox, but you have to manage their inbox. I’ve started to write things down that I want to talk to people about, and every so often, walk into their offices and talk about the issues. It’s weird as this is what I used to do long before email got crazy.

On the one hand, Joshua Allen argues that consuming RSS feeds should be the purview of traditional mail readers. On the other, Omar Shahine points out that traditional mail readers do a poor job of enabling people to manage information overload in environments with high rates of information flow. I agree 100% with the implications of Omar's post. Traditional 3-pane mail readers do a very poor job of enabling people to keep on top of the information they consume. Thus, I think it's a bad idea to add yet another fire hose of information into the mix (i.e. making a traditional mail reader like Outlook my primary RSS reader).

I've not always been of this opinion. A few years ago I wrote a blog post entitled RSS, WinFS and Building a Universal Information Client where I discussed the concept of a universal information aggregator and argued that Outlook was the closest application to what I envisioned. Since then I've become familiar with the term digital lifestyle aggregator (DLA) which is similar to and better defined than my idea of a universal information aggregator. I believe that the DLA concept gives a clear idea of what information aggregators such as personal information managers and RSS readers should evolve into.

Why did I change my mind about Outlook being the ideal DLA? Well, the longer I worked on RSS Bandit, the more I felt that mimicking Outlook in its entirety wasn't the right approach to building an RSS reader. I mentioned some of the problems I have with the Outlook model in my post The Problem With RSS Readers Inspired By Outlook where I wrote

The major problem is that the Outlook mail reading paradigm has a fundamental assumption which turns out to be flawed. It assumes you want to read every item you get in your inbox. This flawed assumption leads to the kind of information overload that hampers the productivity of lots of people I know at work. I've met several people who seem to always have hundreds unread items in their email inbox. For this reason I always have to learn who's easier to reach via IM or swinging by their office in person than sending them mail.

Most people I know get four classes of messages in their information aggregators (I am lumping reading email, reading news and reading RSS/Atom feeds into a single category). These are

1. notifications (checkin mails, comments to my blog, etc)
2. headlines (email newsletters, feeds from news sites, etc)
3. messages sent directly to me or that is similarly relevant
4. messages sent to an interest group I am a part of (XML-DEV mailing list, comp.text.xml newsgroup, etc)

The problem is that the typical Outlook inspired information aggregator treats all of the above as being of equal relevance. Even though Outlook does provide mechanisms for managing assigning relevance to incoming messages, they are either hard to find or cumbersome to use.

This is definitely one of the areas that needs to be improved in the world of information aggregators in general and RSS/Atom readers in particular.

The bottom line is that I think traditional mail readers do a poor job of enabling people to manage the amount of information they consume today. With RSS, we've had the opportunity to experiment with different models of presenting information to users, from "river of news" style aggregators to personalized portal pages, instead of sticking to the traditional 2 or 3 pane readers which dominate email and news readers.

Unfortunately, the major browser vendors haven't gotten in on the act. Instead of using RSS as an opportunity to explore new ways of presenting information to users we've seen rather lame attempts at RSS integration into the browser such as Firefox's Live Bookmarks feature and the upcoming integration of RSS into IE 7 which is just slightly better.

So where are we? The major browsers have punted on solving the information overload problem caused by RSS while integrating it into their products. Similarly, mail readers already suck at dealing with email information overload, let alone when RSS feeds are added to the mix. As it stands, I'm not sure where we go from here. In the meantime, I'm going to start exploring alternative Web browsers like Flock. Perhaps they'll be bolder in re-imagining how to improve the overall experience of people using the World Wide Web today.


 

Nick Bradbury has a post entitled Feedback on IE7 Beta 2 from the Developer of FeedDemon where he gives a lot of good feedback on the recently released IE 7 beta from his perspective as the developer of an RSS reader. Although I've given some feedback on the RSS reading functionality of the IE 7 beta, I realize it would be more valuable to give my thoughts on the Windows RSS platform, since this is supposed to make life easier for people like me who've built RSS readers. Below is a smattering of feedback divided into pros and cons of using the Windows RSS platform versus the version we've built for RSS Bandit. Note that, as Nick says in his post, given that I've already written a lot of the ugly code needed to handle feed downloading, caching, parsing, etc., actually switching to the Windows RSS platform is a load of unnecessary work for me. My feedback is based on the kind of support I'd need from the platform to implement the scenarios currently supported by RSS Bandit.

PRO

  • COM API was very straightforward to interact with from .NET applications
  • Built-in support for downloading enclosures in the background is nice
  • Good support for asynchronously downloading feeds. This means application developers don't need to write a bunch of multithreaded/asynchronous code themselves. That is definitely a godsend.
  • One can serialize feed objects to XML

CON

  • No support for application specific feeds. The Common Feed List assumes that a user wants the same list of feeds in every application used for subscribing to feeds. I think this assumption is fundamentally flawed. I might use one application for downloading podcasts (e.g. iTunes), another for reading blogs (e.g. RSS Bandit), and yet another for browsing photo feeds. Since it doesn't make sense for my blogs to show up in iTunes, it would be cool if I could identify either the type of feed (podcast, text-based, etc.) or the favored application for reading the feed via the API.
  • No support for password protected feeds. The number of password protected feeds on the Web continues to grow; Web sites such as GMail and LiveJournal provide authenticated feeds for users today. As the usage of syndication technologies like RSS continues to grow, the need for feed readers to support authentication will grow as well. I can imagine a day when I can subscribe to a password protected feed from my bank or credit card company. Not having support for this today is a non-starter.
  • No convenient support for obtaining XML elements which aren't mapped by the API. It would be nice if there was a property for obtaining extension elements in a feed that didn't involve having to convert the feed object to XML and then use XPath. Being able to make a call like Feed.GetItem("http://wellformedweb.org/CommentAPI/", "commentRss") to get an element which isn't mapped to a property on the Feed object is a lot more desirable than writing DOM or XPath code to extract that element from the results of calling Feed.Xml (see the sketch after this list).
  • No ability to append application specific metadata to feeds. RSS Bandit supports notions like flagging items and we'd need some way to indicate that items are flagged if we are using the API.
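
To show the kind of code the missing accessor forces on applications, here's what extracting an unmapped extension element looks like via generic XML processing; this is a Python sketch standing in for the equivalent .NET DOM/XPath code against Feed.Xml:

    import xml.etree.ElementTree as ET

    def get_extension_text(feed_xml, namespace_uri, element_name):
        """Pull every occurrence of a namespaced extension element out of raw
        feed XML, the way a client must when the platform API doesn't surface
        unmapped elements directly."""
        root = ET.fromstring(feed_xml)
        return [el.text for el in
                root.iter("{%s}%s" % (namespace_uri, element_name))]

    # Contrast with a single accessor call like the hypothetical
    # Feed.GetItem("http://wellformedweb.org/CommentAPI/", "commentRss").
    # Assumes a locally saved copy of the feed for illustration.
    urls = get_extension_text(open("feed.xml").read(),
                              "http://wellformedweb.org/CommentAPI/",
                              "commentRss")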

Most of this is just based on reading the Using the Microsoft Feeds API document on MSDN. I'm sure I'd have more feedback if I took a pass at replacing all the feed processing code in RSS Bandit with the Windows RSS platform. However I don't think I'll have time to do that anytime soon.


 

A few days ago, in my blog post entitled Some Thoughts on the IE 7 Beta 2 Preview release I described the RSS features of IE 7 as unsatisfactory and disappointing. It seems I'm not the only one who dislikes what the Internet Explorer team has done with RSS.

In his post RSS Is a Glorified "Favorites" Feature Scott Karp writes

RSS is in Internet Explorer 7!!! The blogosphere is shouting from the rooftops. Yawn. I tried RSS in IE7, and it highlights the true shortcoming of current RSS applications — it’s really not much of an improvement over “favorites” or “bookmarks.”

IE7 goes so far as to put the RSS reader in same menu as favorites (or as TDavid puts it “A separate “Feed Center” exists inside the Favorites area.”), which appears in a left-hand navigation column.

So what’s the real innovation over Favorites/Bookmarks in terms of user experience? That it “automatically updates”? That I can get everything all in one place? That it highlights what’s new?

In his post RSS Really Sucks Paul Kedrosky writes

A while back I wrote that RSS sucks, and now that I've had some more time to think about it I've come to a deeper and more nuanced conclusion: RSS Really Sucks. The point was driven home recently as I read articles by people arguing that IE7 from Microsoft does RSS well enough to kill off a few standalone aggregators. I suppose, although that's a little like saying that buggy whips drive milk-wagons so well that people will soon stop using willow branches to goad horses.

Why? Because, as Scott Karp points out, the IE7 RSS implementation is as glorified "favorites" -- bookmarks, in other words. And they are particularly irritating bookmarks, ones that continually change and needle you as more "information" (I use that advisedly) comes beeping and streaming into your computer.

The main reason I am so irritated by IE 7's lackluster user experience around RSS is that you only get one chance to make a first impression. Using IE 7 will be the first time millions of people are introduced to RSS, and it would be unfortunate if they come away thinking that this potentially transformative and liberating technology is simply a kind of "bookmarks that nag you all the time" feature.

I've heard some people say that if Microsoft integrated a high quality RSS reader into the browser then it would kill the desktop aggregator market, which is the kind of thing Microsoft gets in legal trouble for all the time. My response? That is Death by Risk Aversion. What matters is making end users happy, not shipping features that suck just enough that people have to go out and buy software that does the job well so we don't get in legal trouble.


 

In his post Thanks Bloglines Mike Torres writes

Over the course of the last few days, we noticed a problem in the way Bloglines was displaying feeds from MSN Spaces.  This problem was due to our recent URL change and the way we're redirecting visitors from http://spaces.msn.com/members/mike (as an example) to http://spaces.msn.com/mike.  Instead of providing the absolute URL to the RSS feed when Bloglines and others requested the feed, we're only returning the relative URL (i.e. "/mike").
 
Because of this, Bloglines had to turn around a fix to support relative URL redirects in record time.  Within just a couple of hours of contacting them, they had diagnosed the issue, fixed up all the Spaces feeds in their entire system, and patched the redirect logic to make sure it wouldn't happen again.  During this time, the subscriber lists/counts associated with a feed weren't updated for a little while (my 362 subscribers showed as 9, but my ego wasn't bruised) and they even did the extra work to merge "new" feeds with "old" feeds (because when the feed broke, and someone subscribed to the correct feed, Bloglines then had two records for the 'same' feed).
 
In short, this was truly great work by Mark Fletcher and the Bloglines staff.  Sorry guys for keeping you up so late on a Tuesday night!  We'll be making a change to the way we redirect shortly just to make sure this won't be a problem for anyone else in the future.  And for you Bloglines users, you should be back to normal for any MSN Spaces feeds in your list!

Mike and I exchanged mail with Mark Fletcher about this issue on Tuesday, and as he writes we were both grateful and impressed at how quickly the Bloglines folks made changes to fix the consequences of a bug in how we were sending HTTP redirects. Mad props to Mark Fletcher and the folks at Bloglines. You guys definitely rock.
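
The underlying bug is a common one: an HTTP client receives a Location header that is relative rather than absolute. The robust client-side defense is to resolve it against the request URL; I'm assuming that's roughly what Bloglines patched:

    from urllib.parse import urljoin

    def resolve_redirect(request_url, location_header):
        """Resolve a possibly-relative Location header against the URL that
        was requested, so a redirect to "/mike" still yields a usable URL."""
        return urljoin(request_url, location_header)

    # resolve_redirect("http://spaces.msn.com/members/mike", "/mike")
    #   -> "http://spaces.msn.com/mike"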


 

In a recent post on the RSS Bandit forums entitled Microsoft Feeds API, one of our users asked when we plan to take advantage of the Windows RSS Platform. Specifically, the question asked was

I downloaded IE7 beta 2 preview and one of the new features is feed support. There really is no comparison between RSS Bandit and the minimilist support IE7 provideds however it is my understanding that there is more to the feed support then the basic UI IE provides. Apparently the Feed API is highly integrated into Vista and has been backported to XP as part of IE7. I would like to suggest (and I have no idea if this is even possible) that RSS Bandit embrace this new API. It's in it's early stages and I'd like to see it mature with the help of RSS Bandit into a usable common feed store for any number of apps (RSS Bandit, IE, plus anything else that decides to use it like the sample screensaver app in the API documentation.) Perhaps that's asking too much but I figured I should at least put the idea out there. It would be really great if, while surfing the net in IE I could subscribe to rss feeds using the IE mechanism, maybe take a quick look through them in IE, then later go back in RSS Bandit and be able to use the much more powerful features (such as stored searches, and folder aggregation) without having to have two copies of my actual feed lists to maintain.

Here's a link to the only info I could find on it so far:
http://msdn.microsoft.com/library/en-us/FeedsAPI/rss/rss_entry.asp

To the various people who have asked this question [including my friends on the IE team], the answer is YES, we will support the Windows RSS Platform. As Walter wrote in his post on the Windows RSS Platform, there are 3 main components of the platform: the Common Feed List, the Feed Synchronization Engine, and the Feed Store. Ideally I'd like to use all 3 in RSS Bandit, but I suspect it'll be difficult to switch to using the Feed Synchronization Engine or the Feed Store provided by the Windows RSS platform. For example, our feed synchronization engine supports subscribing to USENET newsgroups, which I doubt the Feed Synchronization Engine in the Windows RSS platform supports. On the other hand, it should be straightforward to satisfy the scenario requested in the quoted post, where items subscribed to in IE 7 are reflected in RSS Bandit and vice versa.

I need to work out the user interface with Torsten but it should be easy for us to support the Common Feed List. My current thinking is that we'll have a special folder for "My Internet Explorer Subscriptions" as opposed to mirroring the entire feed list in RSS Bandit within IE 7 and vice versa. Thoughts?
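For the curious, here's roughly what enumerating the Common Feed List could look like from .NET code. Treat this as a hedged sketch rather than working code: the ProgID and member names (Microsoft.FeedsManager, RootFolder, Feeds, Subfolders, Name, DownloadUrl) are my recollection of the Feeds API documentation and should be verified against msfeeds.dll before you rely on them.

using System;

class CommonFeedListSketch
{
    static void Main()
    {
        // Hedged sketch: the ProgID and member names below are assumptions
        // to verify against the Windows RSS Platform (Feeds API) docs.
        Type comType = Type.GetTypeFromProgID("Microsoft.FeedsManager");
        dynamic feedsManager = Activator.CreateInstance(comType);
        Dump(feedsManager.RootFolder, "");
    }

    static void Dump(dynamic folder, string indent)
    {
        // Folders hold feeds plus nested subfolders, much like RSS Bandit's
        // own category tree, so a "My Internet Explorer Subscriptions" node
        // could mirror this hierarchy directly.
        foreach (dynamic feed in folder.Feeds)
            Console.WriteLine(indent + feed.Name + " -> " + feed.DownloadUrl);
        foreach (dynamic sub in folder.Subfolders)
            Dump(sub, indent + "  ");
    }
}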


 

December 27, 2005
@ 05:40 PM

Niall Kennedy has a blog post entitled Exclusive: Google to offer feed API where he reveals

Google plans to offer a feed reader API to allow third-party developers to build new views of feed data on top of Google's backend. The new APIs will include synchronization, feed-level and item-level tagging, per-item read and unread status, as well as rich media enclosure and metadata handling. Google Reader PM Jason Shellen and engineer Chris Wetherell both confirmed Google's plans after I posted my reverse-engineering analysis of the Google Reader backend.

The new APIs will allow aggregator developers to build new views and interactions on top of Google's data. Google currently has at least two additional Google Reader views running on current development builds.

Google may offer public access to the feed API as early as next month. Shellen said the team wants to nail a few more bugs before publicly making the service available to the world.
...
Google's new offering is direct competition to NewsGator's synchronization APIs but is easier to code against (no SOAP required). Google currently does not have the same reach across devices as NewsGator but an easy-to-use API from the guys who brought you the Blogger API and "Blog This!" might really shake up the feed aggregator ecosystem.

As someone who's been thinking about synchronization between RSS readers for a few years, I definitely see this as a welcome development. The Bloglines sync API is too limited in its functionality to be useful while the NewsGator API is both complex and designed with too many assumptions to be widely usable. However, unlike Niall, I blame the complexity of the NewsGator API more on its data model and expected data flow than on whether it uses SOAP versus Plain Old XML (POX) as the wire format.

Once the Google Reader API ships, I'll definitely investigate the feasibility of adding support for it to the Jubilee release of RSS Bandit.


 

We were planning to ship a bugfix release of RSS Bandit before Christmas which fixed all the major issues reported in the recently released Nightcrawler edition of RSS Bandit.

Unfortunately, it seems that either due to complexity or bugginess I simply can't get the NewsGator API to perform the straightforward task of marking something as read in NewsGator Online which was viewed in RSS Bandit. I spent all of yesterday afternoon plus a couple of hours this morning working on it and I've finally given up. This feature simply won't work in the bugfix release shipping later this week. Maybe I'll have better luck when we ship the Jubilee release.

To make myself feel better, I'll work on fixing some of the Atom parsing bugs reported by Phil Ringnalda and the issues with password protected newsgroups. Nothing like having your self-worth defined by how many bugs you close in a database on SourceForge.

Update: So not only have I already fixed the newsgroup issues and the problems with parsing Atom feeds pointed out by Phil Ringnalda but I just got pinged by Gordon Weakliem who is the developer of the NewsGator API. Perhaps my Christmas can be salvaged after all.


 

Alan Kleymeyer has a post entitled NewsGator Online where he writes

I've switched my online news aggregator from Bloglines to Newsgator.  First, I wanted to try it out and compare it to Bloglines.  I like the interface better, especially in how you mark things as read.  I've switched for good.  I mainly switched so that I can continue using RSS Bandit and get the benefit of syncing between it and an online news aggregator (supported in latest RSS Bandit 1.3.0.38 release)

Alan's post describes exactly why creating APIs for your online service and treating it as a Web platform and not just a web site is important. What would you rather use, a web-based aggregator which provides limited integration with a few desktop aggregators (i.e. Bloglines) OR a web-based aggregator which provides full integration with a variety of free and payware aggregators including RSS Bandit, NetNewsWire and FeedDemon? Building out a Web platform is about giving users choice, which is what the NewsGator guys have done by providing the NewsGator API.

The canonical example of the power of Web APIs and Web platforms is RSS. Providing an RSS feed liberates your readers from the limitations of using one application (the Web browser) and one user interface (your HTML website) to view your content. They can consume it on their own terms using the applications that best fit their needs. Blogging wouldn't be as popular as it is today if not for this most fundamental of web services.

The topic of my ThinkWeek paper was turning Web sites into Web platforms and I was hoping to get to give a presentation about it at next year's O'Reilly Emerging Technology Conference but it got rejected. I guess I'll just have to keep shopping it around. Perhaps I can get it into Gnomedex or Mix '06. :)


 

The number one problem that faces developers of feed readers is how to identify posts. How does a feed reader tell a new post from an old one whose title or permalink changed? In general, the way you do this is to pick a unique identifier from the metadata of the feed item and use it to tell the item apart from others. If you are using the Atom 0.3 & 1.0 syndication formats, the identifier is the <atom:id> element; for RSS 1.0 it is the rdf:about attribute; and for RSS 0.9x & RSS 2.0 it is the <guid> element.

The problem is that many RSS 0.9x & 2.0 feeds do not have a <guid> element which usually means a feed reader has to come up with its own custom mechanism for identifying items. In many cases, using the <link> element is enough because most items in a feed map to a single web resource with a permalink URL. In some pathological cases, a feed may not have <guid> or <link> OR even worse may use the same value in the <link> element for each item in the feed. In such cases, feed readers usually resort to heuristics which are guaranteed to be wrong at least some of the time.
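To make the fallback chain concrete, here's a minimal sketch for RSS 2.0 items; the names are illustrative rather than RSS Bandit's actual code.

using System;
using System.Security.Cryptography;
using System.Text;
using System.Xml;

static class ItemIdentity
{
    // Pick the best available identifier for an RSS 2.0 <item>.
    public static string GetKey(XmlElement item)
    {
        // Best case: the publisher provided an explicit <guid>.
        string guid = item["guid"]?.InnerText;
        if (!string.IsNullOrEmpty(guid)) return guid;

        // Usual fallback: the permalink is unique for most well-behaved feeds.
        string link = item["link"]?.InnerText;
        if (!string.IsNullOrEmpty(link)) return link;

        // Pathological case: hash whatever metadata exists. Like all such
        // heuristics, this is guaranteed to be wrong some of the time.
        string basis = item["title"]?.InnerText + "|"
                     + item["pubDate"]?.InnerText + "|"
                     + item["description"]?.InnerText;
        using (MD5 md5 = MD5.Create())
            return Convert.ToBase64String(md5.ComputeHash(Encoding.UTF8.GetBytes(basis)));
    }
}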

So what does this have to do with the Newsgator API? Users of recent versions of RSS Bandit can synchronize the state of their RSS feeds with Newsgator Online using the Newsgator API. Where things get tricky is that this means that both RSS Bandit and Newsgator Online either need to use the same techniques for identifying posts OR have a common way to map between their identification mechanisms. When I first used the API, I noticed that Newsgator has its own notion of a "Newsgator ID" which it expects clients to use. In fact, it's worse than that. Newsgator Online assumes that clients that synchronize with it actually just fetch all their data from Newsgator Online, including feed content. This is a pretty big assumption to make but I'm sure it made it easier to solve a bunch of tricky development problems for their various products. Instead of worrying about keeping data and algorithms on the clients in sync with the server, they just replace all the data on the client with the server data as part of the 'synchronization' process.

Now that I've built an application that deviates from this fundamental assumption I've been having all sorts of interesting problems. The most recent being that some users complained that read/unread state wasn't being synced via the Newsgator API. When I investigated, it turned out that this is because I use <guid> elements to identify posts in RSS Bandit while the Newsgator API uses the "Newsgator ID". Even worse is that they don't even expose the original <guid> element in the returned feed items. So now it looks like fixing the bug where read/unread state isn't synced involves bigger and more fundamental changes than I expected. More than likely I'll have to switch to using <link> elements as unique identifiers since it looks like the Newsgator API doesn't throw those away.

Frustrating.


 

November 30, 2005
@ 05:35 PM
From TechCrunch we learn RSS is Now Integrated into Yahoo Mail and Alerts. This is a great addition to the service and something I'll definitely be trying out in my Yahoo! Mail account once it is available. I wonder if they'll provide an API to allow desktop RSS readers to synchronize their state like Bloglines and Newsgator Online do?

The following is an excerpt from Michael Arrington's post on the announcement

Yahoo gathered a small group of bloggers, press and others at Sauce in San Francisco tonight to announce the launch of two new RSS products. They have integrated an RSS reader directly into Yahoo Mail Beta, and are expanding Alerts to include RSS feeds.

These are significant new products, aimed squarely at new and mainstream RSS users. The service is not live as of the time I am posting this. I’ve added a screen shot picture from the live demo.

Mail

Yahoo has deeply integrated RSS into the Yahoo Mail beta experience. Directly below the email folders are “RSS folders”. Clicking on the top folder shows all posts in a “river of news” format, meaning all posts for all subscribed feeds are listed in the order they have appeared in feeds.

Each feed also has its own folder, allowing the user to read feeds individually (more like bloglines).

A post from any feed is treated exactly like an email - any post can be forwarded as an email or dragged into a folder and saved. All of the great AJAX functionality already working in Yahoo’s Mail beta works with the new RSS functionality as well.

Adding feeds is straightforward - include the feed URL or choose from a number of popular feeds.



 

In his post Really Simple Sharing, Ray Ozzie announced Simple Sharing Extensions for RSS and OPML. He writes

As an industry, we have simply not designed our calendaring and directory software and services for this “mesh” model. The websites, services and servers we build seem to all want to be the “owner” and “publisher”; it’s really inconsistent with the model that made email so successful, and the loosely-coupled nature of the web.

Shortly after I started at Microsoft, I had the opportunity to meet with the people behind Exchange, Outlook, MSN, Windows Mobile, Messenger, Communicator, and more. We brainstormed about this “meshed world” and how we might best serve it - a world where each of these products and others’ products could both manage these objects and synchronize each others’ changes. We thought about how we might prototype such a thing as rapidly as possible – to get the underpinnings of data synchronization working so that we could spend time working on the user experience aspects of the problem – a much better place to spend time than doing plumbing.

There are many great item synchronization mechanisms out there (and at Microsoft), but we decided we’d never get short term network effects among products if we selected something complicated – even if it were powerful. What we really longed for was "the RSS of synchronization" ... something simple that would catch on very quickly.

Using RSS itself as-is for synchronization wasn't really an option. That is, RSS is primarily about syndication - unidirectional publishing - while in order to accomplish the “mesh” sharing scenarios, we'd need bi-directional (actually, multi-directional) synchronization of items. But RSS is compelling because of the power inherent in its simplicity.
...
And so we created an RSS extension that we refer to as Simple Sharing Extensions or SSE. In just a few weeks time, several Microsoft product groups and my own 'concept development group' built prototypes and demos, and found that it works and interoperates quite nicely.

We’re pretty excited about the extension - well beyond the uses that catalyzed its creation. It’s designed in such a way that the minimum implementation is incredibly easy, and so that higher-level capabilities such as conflict handling can be implemented in those applications that want to do such things.

The model behind SSE is pretty straightforward; to synchronize data across multiple sources, each endpoint provides a feed and then subscribes to the feeds provided by the other endpoint(s). I hate to sound like a fanboy but SSE is an example of how Ray Ozzie showed up at Microsoft and just started kicking butt. I've been on the periphery of some of the discussions of SSE and reviewed early drafts of the spec. It's been impressive seeing how much quick progress Ray made internally on getting this idea polished and evangelized.
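At its core, the merge step each endpoint performs is simple. Here's a conceptual sketch, with the caveat that the id/version scheme below is my reading of the draft spec, and real SSE also carries per-update history so concurrent edits can be detected as conflicts; that part is omitted.

using System.Collections.Generic;

// Conceptual sketch of an SSE-style merge: items are keyed by a sync id and
// the copy with the higher version wins. Conflict detection is omitted.
record SyncItem(string Id, int Version, string Payload);

static class SseMerge
{
    public static Dictionary<string, SyncItem> Merge(
        IEnumerable<SyncItem> mine, IEnumerable<SyncItem> theirs)
    {
        var merged = new Dictionary<string, SyncItem>();
        foreach (var item in mine)
            merged[item.Id] = item;
        foreach (var item in theirs)
        {
            // Take the remote copy only if it is strictly newer than ours.
            if (!merged.TryGetValue(item.Id, out var local) || item.Version > local.Version)
                merged[item.Id] = item;
        }
        return merged;
    }
}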

The spec looks good modulo the issues that tend to dog Microsoft when it ships specs like this. For example, there is a lack of detail around data types (e.g. nowhere is the date format used by the spec documented, although you can assume it's RFC 822 dates based on the examples) and there is also the lack of any test sites that have feeds which use this format so enterprising hackers can quickly write some code to prototype implementations and try out ideas.

Sam Ruby has posted a blog entry critical of Microsoft's practices when it publishes RSS extension specifications in his post This is Sharing? where he writes

The first attribute that the Simple Sharing Extensions for RSS and OPML is to “treat the item list as an ordered set”.  This sounds like something from the Simple List Extensions Specification that was also hatched in private and then unleashed with great fanfare about five months ago. Sure a wiki was set up, but any questions posted there were promptly ignored.  The cone of silence has been so impenetrable that even invoking the name Scoble turns out to be ineffective.

Now the Simple List Extensions Specification URI redirects to an ad for vaporware.  Some things never change.

Should we wait for version 3.0?

I agree with all of Sam's feedback. Hopefully Microsoft will do better this time around.


 

Nick Bradbury has a post entitled An Attention Namespace for OPML where he writes

In a recent post I said that OPML would be a great format for sharing attention data, but I wasn't sure whether this would be possible due to uncertainty over OPML's support for namespaces.
...
As I mentioned previously, FeedDemon already stores attention data in OPML, but it uses a proprietary fd: namespace which relies on attributes that make little sense outside of FeedDemon. What I propose is that aggregator users and developers have an open discussion about what specific attention data could (and should) be collected by aggregators.

Although there's a lot of attention data that could be stored in OPML, my recommendation is that we keep it simple - otherwise, we risk seeing each aggregator support a different subset of attention data. So rather than come up with a huge list of attributes, I'll start by recommending a single piece of attention data: rank.

We need a way to rank feeds that makes sense across aggregators, so that when you export OPML from one aggregator, the aggregator you import into would know which feeds you're paying the most attention to. This could be used for any number of things - recommending related feeds, giving higher ranked feeds higher priority in feed listings, etc.

Although user interface and workflow differences require each aggregator to have its own algorithm for ranking feeds, we should be able to define a ranking attribute that makes sense to every aggregator. In FeedDemon's case, a simple scale (say, 0-100) would work: feeds you rarely read would be ranked closer to zero, while feeds you read all the time would be ranked closer to 100. Whether this makes sense outside of FeedDemon remains to be seen, so I'd love to hear from developers of other aggregators about this.

I used to be the program manager responsible for a number of XML technologies in the .NET Framework while I was on the XML team at Microsoft. The technology I spent the most time working with was the XML Schema Definition Language (XSD). After working with XSD for about three years, I came to the conclusion that XSD has held back the proliferation and advancement of XML technologies by about two or three years. The lack of adoption of web services technologies like SOAP and WSDL on the world wide web is primarily due to the complexity of XSD. The fact that XQuery has spent over 5 years in standards committees and has evolved to become a technology too complex for the average XML developer is also primarily the fault of XSD. This is because XSD is extremely complex and yet is rather inflexible with minimal functionality. This state of affairs is primarily due to its nature as a one-size-fits-all technology with too many contradictory design objectives. In my opinion, the W3C XML Schema Definition language is a victim of premature standardization. The XML world needed to experiment more with various XML schema languages like XDR and RELAX NG before we decided to settle down and come up with a standard.

So what does this have to do with attention data and XML? Lots. We are a long way from standardization. We aren't even well into the experimentation stage yet. How many feed readers do a good job of giving you an idea of which among the various new items in your RSS inbox are worth reading? How many of them do a good job suggesting new feeds for you to read based on your reading habits? Until we get to a point where such features are common place in feed readers, it seems like putting the cart way before the horse to start talking about standardizing the XML representation of these features.

Let's look at the one field Nick talks about standardizing: rank. He wants all readers to track 'rank' using a numeric scale of 0-100. This seems pretty arbitrary. In RSS Bandit, users can flag posts as Follow Up, Review, Read, Reply or Forward. How does that map to a numeric scale? It doesn't. If I allowed users to prioritize feeds, it wouldn't be in a way that would map cleanly to a numeric scale.
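To make the mismatch concrete, here's the kind of lossy shim an aggregator would be forced to write. The specific numbers are hypothetical and arbitrary, which is exactly the problem.

// Hypothetical mapping from RSS Bandit's flag states to the proposed 0-100
// rank scale. The numbers are arbitrary; a different aggregator would pick
// different ones, and round-tripping the data would lose the flag semantics.
enum FlagState { None, FollowUp, Review, Read, Reply, Forward }

static class AttentionShim
{
    public static int ToRank(FlagState flag) => flag switch
    {
        FlagState.FollowUp => 90, // why 90 rather than 75? no good answer
        FlagState.Reply    => 80,
        FlagState.Forward  => 70,
        FlagState.Review   => 50,
        FlagState.Read     => 30,
        _                  => 0,
    };
}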

My advice to Nick and others who are entertaining ideas around standardizing attention data in OPML: go build some features first and see which ones work for end users and which ones don't. Once we've figured that out amongst multiple readers with diverse user bases, then we can start thinking about standardization.


 

We are getting down to the end game for shipping the Nightcrawler release of RSS Bandit. This is where all the more unfun parts of the release happen, such as dealing with translations and tracking down performance bugs like memory leaks or issues with multithreading. To take my mind off some of the tedium I'm going to have to deal with today, I've decided to spend some time thinking about the Jubilee release of RSS Bandit which should ship sometime next year.

One of the features I'm evaluating is Reading lists for RSS which was discussed by Nick Bradbury in a blog entry he posted last month where he wrote

Last week Dave Winer proposed the idea of reading lists for RSS, which are more-or-less OPML subscriptions. I like this idea - a lot - and in fact a few FeedDemon customers have requested this feature in the past.

In a nutshell, the idea is that you'd subscribe to an OPML document which contains a list of feeds that someone is reading, some organization is recommending, or some service has generated (such as "Top 100" list). Changes to the source OPML document would be synchronized, so that you're automatically subscribed to feeds added to the reading list. Likewise, you'd be unsubscribed from feeds removed from the original OPML.

There are a number of implementation details that would need to be worked out (ex: would a FeedDemon user really want to be automatically unsubscribed from feeds dropped from the source OPML, especially if that user had flagged some posts in those feeds for future reference?), but details aside, I'm curious whether this is something you'd like to see, and if so, how do you think the idea can be improved upon?

This feature initially made me skeptical since it seems like a solution looking for a problem. Then again, I thought the same thing about enclosures in RSS and I've been proved wrong by the podcasting phenomenon. So instead of ignoring the idea I'd like to see whether our users think this feature makes sense and if so how they expect us to resolve certain problems that would arise from implementing such a feature.

The first problem that comes up in implementing RSS reading lists based on OPML is what to do when a feed is pulled from the list by the owner of the feed list. Do we automatically delete the subscription? Do we prompt the user and, if they decide to stay subscribed to the feed, move it out of the reading list? Another question is how to deal with feeds in the reading list that the user is already subscribed to.
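Mechanically, the bookkeeping is just a diff between the freshly fetched reading list and the snapshot from the previous fetch; it's what to do with the results that's the open question. A rough sketch:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

static class ReadingListDiff
{
    // Compare the current reading list OPML against the set of feed URLs we
    // saw on the previous fetch. 'Added' feeds are candidates for
    // auto-subscription; 'Removed' feeds are the ones to prompt about
    // (or auto-delete, depending on the answer to the questions above).
    public static (List<string> Added, List<string> Removed) Diff(
        XmlDocument currentOpml, ISet<string> previousSnapshot)
    {
        var listed = currentOpml.SelectNodes("//outline[@xmlUrl]")
            .Cast<XmlElement>()
            .Select(o => o.GetAttribute("xmlUrl"))
            .ToHashSet(StringComparer.OrdinalIgnoreCase);

        var added = listed.Except(previousSnapshot).ToList();
        var removed = previousSnapshot.Except(listed).ToList();
        return (added, removed);
    }
}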

What do you think?
 

Google has reintroduced their Google Desktop with a vengeance. It was evil enough the first time around, but this time it’s downright scary. My original complaint was that Google Desktop ignores basic practices amongst RSS readers for saving bandwidth on the sites it is polling. It was pinging my site every 5 minutes asking for updates without caching the results and thus was using an unreasonable proportion of my bandwidth.

Since a new version was recently released, I decided to try it out to see if the issue had been fixed since I sent them mail. I installed Fiddler to monitor the traffic of the application and what I found out surprised me a great deal. Google Desktop not only pings sites every 5 minutes in a manner inconsiderate of their bandwidth but it also does so without the user's direction. Below is a screenshot of some of the HTTP traffic generated by Google Desktop

The highlighted requests are requests by Google Desktop to URLs of Atom & RSS feeds that were in my browser cache. I did not configure the application to fetch these feeds. So not only does Google Desktop flood websites with feed requests in a manner bordering on the behavior of a malicious application, it also does this automatically without the end user explicitly subscribing to the feed.

That's messed up.


 

In the post Feeds and well-formed XML Sean Lyndersay of the IE RSS team writes

Our years of experience with HTML in Internet Explorer have taught us the long-term pain that results from being too liberal with what you accept from others. Hence, we’ve adopted the following overriding principle for IE 7 and the RSS platform in Windows Vista: 

 We will only support feeds that are well-formed XML.

This principle allows us to build a more predictable feed parser. As a platform, it's important that applications using the platform to consume feeds can rely on the fact that the platform will always be providing information in the way that the publisher intended (trying to guess what a publisher meant to do when there is an error in a feed can be tricky, at best). We also spoke to several people in the RSS and developer community at Gnomedex and at PDC, and they wholeheartedly supported this.

Hell Yeah!!!
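In case it isn't obvious why this makes platform developers happy: strict handling is just letting a conforming XML parser do its job and refusing the feed on error, instead of guessing what the publisher meant. A minimal sketch:

using System;
using System.Xml;

static class StrictFeedLoader
{
    // Draconian handling in a nutshell: parse with a conforming XML parser
    // and reject the feed outright if it is not well-formed.
    public static XmlDocument TryLoad(string feedXml)
    {
        var doc = new XmlDocument();
        try
        {
            doc.LoadXml(feedXml);
            return doc;
        }
        catch (XmlException e)
        {
            // Report the error to the user/publisher; don't try to "fix" it.
            Console.Error.WriteLine("Rejecting ill-formed feed: " + e.Message);
            return null;
        }
    }
}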


 

In his post Betty Dylan, Railroad Tavern, Sunday 8PM Jon Udell writes

I wondered why online services like upcoming.org hadn't yet gone viral, and I made a few suggestions, which were well received. But to be honest, the Keene, NH metro in Upcoming is no more lively now -- a day after Yahoo acquired Upcoming -- than it was six months ago.

Case in point: the Betty Dylan band is coming to Keene on Sunday and Monday. I know this because a friend organized the event. But neither of the venues' websites -- Railroad Tavern and Keene State College -- has the information. Nor does the Keene Sentinel. What's more, none of these three websites makes calendar information available as RSS feeds.

Yahoo's acquisition of Upcoming will certainly help move things along. As will the growing visibility of other such services, notably EVDB's Eventful. But since I expect no single one of these to dominate, or to supplant the existing calendars maintained by newspapers, colleges, and other venues, we have to think in terms of syndication and federation.

RSS is a big part of the story. Calendar publishers need to learn that information made available in RSS format will flow to all the event sites as well as to individual subscribers.

I think that, like me, Jon Udell has grabbed the wrong end of the stick. When I first started working on the platform behind MSN Spaces, one of my pet scenarios was making it easier to create blog posts about events and then syndicate them easily. One of the things I slowly realized is that unlike blogging, which has killer apps for consuming syndicated content (RSS readers), there really isn't anything similar for calendar events, nor is there likely to be anything compelling in that space in the near future. The average home user doesn't utilize calendaring software nor is there incentive to start using such software. Even if every eventing website creates RSS feeds of events, the fact is that my girlfriend, my mom and even I don't maintain calendars which would benefit from being able to consume this data.

The corporate user is an easier sell since calendaring software is part of communication clients like Outlook and Lotus Notes. However, corporate users aren't really the target of sites like Upcoming or Eventful, even though I suspect they are their best bet for potential users in the near term.


 

October 11, 2005
@ 02:32 PM

I never got to try out Google Reader last week because the service was too slow, so I gave it a shot again this morning. My thoughts on the application are pretty much identical to Dave Winer's thoughts, where he wrote

I tried the Google news reader again, this morning, after it had loaded all my feeds (it seems to take quite a few hours to do that). This is the second blog-related product they've come out with recently that appears not to have been touched by human beings before it was introduced to the world (the other was the ridiculous blog search). I think they need to start using their own stuff before releasing it. And maybe look at the competition for ideas. When you're first into a market there's an excuse for being so wrong. But the first of this kind of software shipped six years ago. To give you a comparison, Visicalc shipped in 1979. By 1985 we had been through two generations of spreadsheets with Lotus 1-2-3 and Excel. Google's reader is a huge step backward from what was available in 1999. The arrogance is catching up with them.

I actually tried writing my own review but gave up because it kept seeming too negative and I try not to snipe at products made by our competitors. Still, I am stunned that they let this application out the door in the shape it's in.


 

September 20, 2005
@ 12:22 PM

I have to agree with Robert Scoble that Google's blog search is not as good at link searching.

The only feature I use the various blog search engines like Feedster, Technorati, IceRocket and Google Blog Search for is looking for references to my posts which may not have shown up in my referrer logs. Therefore, the only feature I care about is link searching and my main quality criterion is how fresh the index is. Here, Bloglines Citations Search is head and shoulders above everything else out there today. I've been using the various blog search engines every day for the past few weeks and Bloglines is definitely at the head of the pack.

Compare and contrast,

  1. http://www.bloglines.com/citations?url=http://www.25hoursaday.com/weblog 

  2. http://blogs.icerocket.com/search?q=http://www.25hoursaday.com/weblog

  3. http://blogsearch.google.com/blogsearch?hl=en&q=link:www.25hoursaday.com/weblog

  4. http://www.technorati.com/search/www.25hoursaday.com/weblog


 

I mentioned last week that currently with traditional portal sites like MyMSN or MyYahoo, I can customize my data sources by subscribing to RSS feeds but not how they look. Instead all my RSS feeds always look like a list of headlines. Start.com fundamentally changes this model by turning it on its head. I can create an RSS feed and specify how it should render in Start.com using JavaScript in extension elements, which basically makes it a Start.com gadget, no different from the default ones provided by the site. For example, I can create an RSS feed for weekly weather reports and specify that they should be rendered as shown below within Start.com

Scott Isaacs gives some descriptions of how the RSS extensions used by Start.com work in his post Declaring Gadgets for Start.com using "RSS". He writes

Introduction to the Gadget Manifest

First, let's look at the Gadget manifest format. For defining manifests, we basically reused the RSS schema.  This format decision was driven by the fact we already have a parser in Start.com's application model for RSS, there is broad familiarity with the RSS format, and I personally did not want to invent yet another schema :-). While we reused the RSS schema, we do recognize that these are not typical RSS feeds as they are not intended to be consumed and directly rendered by an aggregator. Therefore, we are considering whether we should use a different file extension or root element (e.g., replace RSS with Manifest) but still leverage the familiar tags. For the sake of simplicity, we chose to ship reusing RSS as the format and then listen to the community on how to proceed. We are very open to suggestions.

Looking at the Gadget manifest, we extended the RSS schema with one custom tag, and one custom attribute. We defined those under the binding namespace. Below is a sample Gadget manifest:

<?xml version="1.0"?>
<rss version="2.0" xmlns:binding="http://www.start.com">
   <channel>
      <title>Derived Hello World</title>
      <link>http://yourhomepage.com</link>
      <description>A sample hello world binding.</description>
      <language>en-us</language>
      <pubDate>Wed, 27 Apr 2005 04:00:00 GMT</pubDate>
      <lastBuildDate>Wed, 27 Apr 2005 04:00:00 GMT</lastBuildDate>
      <binding:type>Demo.MyHelloWorld</binding:type>
      <item>
         <link>http://siteexperts.com/bindings/MyHello.js</link>
      </item>
      <item>
         <link binding:type="inherit">http://siteexperts.com/bindings/hello.xml</link>
      </item>
      <item>
         <link binding:type="css">http://siteexperts.com/bindings/myHelloWorld.css</link>
      </item>
   </channel>
</rss>

Looking at the Gadget manifest, until we reach an RSS item, the semantics of the existing RSS tags are maintained. The title serves as the Gadget title, link typically points to your home page or page about your Gadgets, description is your Gadget's description, and so on.  The added binding:type element serves as the Gadget class to instantiate from the associated resources.

Looking at each item, we do know that we left off the required title and description since this file is not intended to be directly viewed. However, adding those tags could be useful to help describe the resources being used.

The last change is we added a binding:type attribute to each resource. We currently support three types: script (the default), css, and inherit. Inherit would point to another "RSS" manifest file that would be further consumed.

Associating a Manifest with a Feed

Start.com supports loading stand-alone Gadgets directly from a manifest. In addition, you can now define a Gadget that presents a custom experience for your feed. This is very useful for a number of scenarios...The custom experiences are defined using the "RSS" Manifest format described above. However, since these Gadgets for RSS feeds are driven by the feed itself, we needed to extend traditional RSS with a single extension. This extension associates a manifest with the feed. We created a new channel element, binding:manifest, that can be included in any RSS feed. This element specifies the Gadget manifest to use for the feed.

<binding:manifest environment="Start" version="1.0">
  http://siteexperts.com/bindings/rumorcity.xml
</binding:manifest>

We created this element to not be coupled to any single implementation. Hence the required environment element. Aggregators that understand the manifest tag can examine the environment value. If they support the specified environment, they can choose to present the custom experience.

Despite the fact that I kicked off some of the initial discussions with Steve Rider for what are now Start.com gadgets, I haven't paid much attention to the design since Start.com is a work in progress. Based on the current design, I have two primary pieces of feedback.

  1. I'd suggest picking a different namespace URI. XML namespace URIs usually point to documentation about the format; in the cases where they don't, it is often a cause of consternation amongst developers. For example, most XML namespaces used by Microsoft are from the schemas.microsoft.com domain, which often point to schemas for the various Microsoft XML vocabularies. In the cases where they don't, it is likely that they will in the future. See http://www.google.com/search?q=+site:schemas.microsoft.com+%22schemas.microsoft.com%22 for some examples.

  2. If Gadget manifests aren't supposed to be consumed by traditional RSS aggregators then Start.com should not use RSS as its manifest format. The value of using RSS is that even if a client doesn't understand your extensions then the feed is still useful. Start.com currently breaks that assumption, which to me is an abuse of RSS.

Scott is currently seeking feedback for the Start.com RSS extensions and I'd suggest that interested parties should post some comments about what they like or dislike so far in response to his blog post.

Update: Since writing this post I've exchanged some mail with the Start.com team and in addition to my feedback we've discussed feedback from folks like Phil Ringnalda and James Snell. The Start.com team used RSS as the gadget manifest file format as an experiment in the spirit of the continuous experiment that is http://www.start.com. Based on the feedback from the community, alternatives will be considered and fully documented when the choices have been made. Given my experience in XML syndication technologies I'll be working with the Start.com team on exploring alternatives to the current techniques used for creating gadget manifests as well as documenting them.

Keep the feedback coming.


 

While perusing my referrer logs I noticed that I was receiving a large number of requests from Google Desktop. In fact, I was serving up more pages to Google Desktop users than to RSS Bandit users. Considering that my RSS feed is included by default in RSS Bandit and there have been about 200,000 downloads of RSS Bandit this year it seemed extremely unlikely that there are more people reading my feed from Google Desktop than RSS Bandit.

After grepping my referrer logs, I noticed an interesting pattern when it came to accesses from the Google Desktop RSS reader. Try and see if you notice it from this snapshot of a ten-minute window of time.

2005-09-13 16:31:05 GET /weblog/SyndicationService.asmx/GetRss - 80.58.*.* Mozilla/4.0+(compatible;+Google+Desktop) - 200
2005-09-13 16:32:13 GET /weblog/SyndicationService.asmx/GetRss - 65.57.*.* Mozilla/4.0+(compatible;+Google+Desktop) - 200
2005-09-13 16:32:36 GET /weblog/SyndicationService.asmx/GetRss - 209.221.*.* Mozilla/4.0+(compatible;+Google+Desktop) - 200
2005-09-13 16:33:05 GET /weblog/SyndicationService.asmx/GetRss - 64.116.*.* Mozilla/4.0+(compatible;+Google+Desktop) - 200
2005-09-13 16:33:12 GET /weblog/SyndicationService.asmx/GetRss - 209.204.*.* Mozilla/4.0+(compatible;+Google+Desktop) - 200
2005-09-13 16:33:20 GET /weblog/SyndicationService.asmx/GetRss - 68.188.*.* Mozilla/4.0+(compatible;+Google+Desktop) - 200
2005-09-13 16:34:48 GET /weblog/SyndicationService.asmx/GetRss - 209.221.*.* Mozilla/4.0+(compatible;+Google+Desktop) - 200
2005-09-13 16:35:25 GET /weblog/SyndicationService.asmx/GetRss - 64.116.*.* Mozilla/4.0+(compatible;+Google+Desktop) - 200
2005-09-13 16:35:32 GET /weblog/SyndicationService.asmx/GetRss - 209.204.*.* Mozilla/4.0+(compatible;+Google+Desktop) - 200
2005-09-13 16:35:40 GET /weblog/SyndicationService.asmx/GetRss - 68.188.*.* Mozilla/4.0+(compatible;+Google+Desktop) - 200
2005-09-13 16:36:14 GET /weblog/SyndicationService.asmx/GetRss - 80.58.*.* Mozilla/4.0+(compatible;+Google+Desktop) - 200
2005-09-13 16:37:33 GET /weblog/SyndicationService.asmx/GetRss - 65.57.*.* Mozilla/4.0+(compatible;+Google+Desktop) - 200
2005-09-13 16:37:46 GET /weblog/SyndicationService.asmx/GetRss - 209.204.*.* Mozilla/4.0+(compatible;+Google+Desktop) - 200
2005-09-13 16:37:55 GET /weblog/SyndicationService.asmx/GetRss - 68.188.*.* Mozilla/4.0+(compatible;+Google+Desktop) - 200
2005-09-13 16:37:55 GET /weblog/SyndicationService.asmx/GetRss - 64.116.*.* Mozilla/4.0+(compatible;+Google+Desktop) - 200
2005-09-13 16:39:19 GET /weblog/SyndicationService.asmx/GetRss - 196.36.*.1* Mozilla/4.0+(compatible;+Google+Desktop) - 200
2005-09-13 16:39:31 GET /weblog/SyndicationService.asmx/GetRss - 12.103.*.* Mozilla/4.0+(compatible;+Google+Desktop) - 200
2005-09-13 16:39:55 GET /weblog/SyndicationService.asmx/GetRss - 18.241.*.* Mozilla/4.0+(compatible;+Google+Desktop) - 200
2005-09-13 16:40:11 GET /weblog/SyndicationService.asmx/GetRss - 63.211.*.* Mozilla/4.0+(compatible;+Google+Desktop) - 200
2005-09-13 16:40:15 GET /weblog/SyndicationService.asmx/GetRss - 64.116.*.* Mozilla/4.0+(compatible;+Google+Desktop) - 200
2005-09-13 16:41:23 GET /weblog/SyndicationService.asmx/GetRss - 80.58.*.* Mozilla/4.0+(compatible;+Google+Desktop) - 200

The *'s are there to protect the privacy of the people accessing my RSS feed. However it is clear that not only is Google Desktop fetching my RSS feed every 5 minutes it is also not using HTTP Conditional GET requests. WTF?
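For the record, here's what a well-behaved client does instead; a minimal sketch using the classic HttpWebRequest API. Echo back the ETag and Last-Modified values from the previous fetch, and a 304 response means nothing changed, costing the server almost no bandwidth.

using System;
using System.IO;
using System.Net;

static class PoliteFetcher
{
    // Fetch a feed using an HTTP conditional GET. Returns null when the
    // server answers 304 Not Modified, i.e. there is nothing new to download.
    public static string FetchIfChanged(string url, ref string etag, ref DateTime lastModified)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        if (etag != null)
            request.Headers["If-None-Match"] = etag;
        if (lastModified != DateTime.MinValue)
            request.IfModifiedSince = lastModified; // sends If-Modified-Since

        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                // Remember the validators for the next poll.
                etag = response.Headers["ETag"];
                lastModified = response.LastModified;
                return reader.ReadToEnd();
            }
        }
        catch (WebException e)
            when ((e.Response as HttpWebResponse)?.StatusCode == HttpStatusCode.NotModified)
        {
            return null; // 304: feed unchanged
        }
    }
}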

Since I couldn't find a place to send feedback about this product, I've posted it to my blog. I hope Google fixes this soon, I'd hate to have to ban their client because it is wasting my bandwidth.


 

September 1, 2005
@ 07:49 PM

In his post The saga of RSS (dis)continuity Jon Udell writes

It's been almost three years since I first wrote about the problem of RSS feed redirection. From time to time I'm reminded that it's still a problem, and today I noticed that two of the blogs I read were affected by it. I was subscribed to John Ludwig at www.theludwigs.com/index.rdf, and today's entry says "Feed moved -- pls check out www.theludwigs.com/index.xml." In fact he's got an index.xml and an atom.xml, and the latter seems to correspond to what's actually published on the blog, but either way the issue is that we've still yet to agree on a standard way for newsreaders to follow relocated feeds.

Jon Udell is incorrect. There is a standard way to redirect feeds that is supported by a large number of RSS readers and it is called "just use HTTP". Many RSS readers including RSS Bandit support the various status codes for indicating that the location of a resource has changed temporarily or permanently as well as when the resource is no longer available.
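Concretely, the contract looks something like this sketch: follow temporary redirects without touching the subscription, rewrite the stored URL only on a permanent 301, and drop the feed on 410 Gone. Note the relative Location resolution, which is exactly the case that tripped up the Spaces/Bloglines redirect story above.

using System;
using System.Net;

static class RedirectAwareFeedClient
{
    // Check a subscription URL and apply HTTP redirect semantics:
    // 301 -> permanently update the stored URL, 302/307 -> keep the old URL,
    // 410 -> the feed is gone and the user should be unsubscribed.
    public static string ResolveFeedUrl(string storedUrl, out bool feedIsGone)
    {
        feedIsGone = false;
        var request = (HttpWebRequest)WebRequest.Create(storedUrl);
        request.AllowAutoRedirect = false; // inspect the status code ourselves
        request.Method = "HEAD";

        try
        {
            using (var response = (HttpWebResponse)request.GetResponse())
            {
                if (response.StatusCode == HttpStatusCode.MovedPermanently) // 301
                {
                    // Location may be relative, so resolve it against the
                    // request URI before storing it.
                    string location = response.Headers["Location"];
                    return new Uri(new Uri(storedUrl), location).AbsoluteUri;
                }
                return storedUrl; // 2xx, or a temporary 302/307 redirect
            }
        }
        catch (WebException e)
            when ((e.Response as HttpWebResponse)?.StatusCode == HttpStatusCode.Gone) // 410
        {
            feedIsGone = true;
            return storedUrl;
        }
    }
}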

Instead of constantly reinventing the wheel and looking for solutions to problems that have already been solved, a better use of our energy would be evangelizing how to properly use the existing technology.

Jon Udell does point out

So far as I know, that's where things stand today. If you control your server, you can of course do an HTTP-level redirect. But if your blog is hosted, you probably can't, in which case you need to use the feed itself to signal the redirect.

This part just boggles my mind. If the user's blog is hosted (e.g. they are a LiveJournal, MSN Spaces or BlogSpot user) then not only can't they control the HTTP headers emitted by the server but they don't control their web feed either. So what exactly is the alternate solution that works in that case? If anything, this points to the fact that blog hosting services should give users the ability to redirect their RSS feed when they leave the service. This is a feature request for the various blog hosting services not an indication that a new technical solution is needed.

 


 

Since I've been in the process of adding support for synchronization between RSS Bandit and Newsgator Online, I've been trying to eat my own dogfood and use both applications. A ready opportunity presented itself when I travelled to Nigeria a few weeks ago and wanted to keep up with my RSS feeds. While I was in Nigeria I was always on a dialup connection and used about four different PCs and one Mac. It seemed to make sense to favor a web-based RSS reader as opposed to trying to install RSS Bandit, and most likely the .NET Framework, on all these machines, some of which I didn't even have administrator access to.

After unsuccessfully trying to use Newsgator Online I ended up settling on Bloglines instead for a number of reasons. The first being that Bloglines is a lot faster than Newsgator Online, whose interface seems to move at a snail's pace over dialup. The second being that a basic feature like "Mark All Items As Read" seems to be missing from Newsgator Online. Trying to visit every feed individually to mark all its items as read became such an ordeal, I simply gave up.

I'd rather not think that I've wasted the time I've spent working on implementing synchronization between RSS Bandit and Newsgator Online since the current user experience of the latter service leaves much to be desired. I sincerely hope there are some changes in the works for the service.


 

Sean Lyndersay has posted about an update to the Simple List Extensions specification. The update fixes some of the issues raised by members of the RSS community, such as the problem pointed out in Phil Ringnalda's post MS Embraces RSS, where RSS elements were being reused outside their original context. The cf:listinfo element now has the following structure

<cf:listinfo>
   <cf:sort
      ns="namespace"
      element="element"
      data-type="date|text|number"
      label="User-readable name for the sort field"
      default="yes|no" />

   <cf:group
      ns="namespace"
      element="element"
      label="User-readable name for the grouping" />
</cf:listinfo>

This is a lot better than the original spec* which instead of naming the element being sorted on using attributes of  the cf:sort element actually included it as a child element instead. The only problem I have with the spec is that I don't see where it states the date format that is expected to be used if the data type is date. I guess this was problematic since different syndication formats use different date formats. RSS 2.0 uses the RFC 822 format, Atom 1.0 uses the RFC 3339 format while Dublin Core [which is what RSS 1.0 dates typically are] uses the format from the W3C Note on Date and Time formats. So an extension element really can't define what the date format will be in the feed it is embedded in since it may be embedded in an RSS or Atom feed.
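The upshot for implementers is date parsing that has to try every format in turn. A rough sketch:

using System;
using System.Globalization;
using System.Xml;

static class FeedDates
{
    // Try the date formats an extension element might inherit from its host
    // feed: RFC 3339 / W3C date-time (Atom 1.0, Dublin Core) and RFC 822 (RSS 2.0).
    public static DateTime? ParseFeedDate(string text)
    {
        // RFC 3339 / W3C-style dates, e.g. "2005-07-15T12:00:00Z".
        try { return XmlConvert.ToDateTime(text, XmlDateTimeSerializationMode.Utc); }
        catch (FormatException) { }

        // RFC 822 dates, e.g. "Wed, 27 Apr 2005 04:00:00 GMT".
        if (DateTime.TryParse(text, CultureInfo.InvariantCulture,
                DateTimeStyles.AdjustToUniversal, out DateTime parsed))
            return parsed;

        return null; // unparseable; the spec gives no guidance here
    }
}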

This date ambiguity is a potential gotcha for implementers of the spec. I think I'll pass on implementing support for the Simple List Extensions spec in RSS Bandit this time around since my plate is full for this release. I'll add it to the list of features that will show up in the following release [tentatively codenamed Jubilee].

* I would have linked to the spec but as usual MSDN has broken permalinks such as http://msdn.microsoft.com/longhorn/understanding/rss/simplefeedextensions/. Someone really needs to force everybody that works there to read Cool URIs don't change.


 

Tim Bray has a recent post entitled The Real Problem that opens up the quarterly debate on the biggest usability problem facing XML syndication technologies like RSS and Atom: there is no easy way for end users to discover or subscribe to a website's feed.

Tim writes

One-Click Subscription First of all, most people don’t know about feeds, and most that do don’t subscribe to them. Check out the comments to Dwight Silverman’s What’s Wrong with RSS? (By the way, if there were any doubt that the blogging phenomenon has legs, the fact that so many people read them even without the benefits of RSS should clear that up).

Here’s the truth: an orange “XML” sticker that produces gibberish when you click on it does not win friends and influence people. The notion that the general public is going to grok that you copy the URI and paste it into your feed-reader is just ridiculous.

But, as you may have noticed, the Web has a built-in solution for this. When you click on a link to a picture, it figures out what kind of picture and displays it. When you click on a link to a movie, it pops up your favorite movie player and shows it. When you click on a link to a PDF, you get a PDF viewer.

RSS should work like this; it never has, but it can, and it won’t be very hard. First, you have to twiddle your server so RSS is served up correctly, for example as application/rss+xml or application/atom+xml. If you don’t know what this means, don’t worry, the person who runs your web server can do it in five minutes.

Second, you either need to switch to Atom 1.0 or start using <atom:link rel="self"> in RSS. If our thought leaders actually stepped up and started shouting about this, pretty well the whole world could have one-click subscriptions by next summer, using well-established, highly-interoperable, wide-open standards.

As long as people expect one-click subscription to depend on websites using the right icons, the right HTML and the right MIME types for their documents, it won't become widespread. On the other hand, this debate is about to become moot anyway because every major web browser is going to have a [Subscribe to this website] button on it in a year or so. Firefox already has Live Bookmarks, there's Safari RSS for Mac OS X users and Internet Explorer 7 will have Web Feeds.

As far as I'm concerned, the one-click subscription problem has already been solved. I guess that's why Dave Winer is now arguing about what to name the feature across different Web browsers. After all, RSS geeks must always have something to argue about. :)


 

Today I was working on completing the support for Atom 1.0 in the next version of RSS Bandit and decided to make the changes for parsing out enclosure/podcast elements while I was in that part of the code. RSS 2.0 is pretty straightforward: there is an <enclosure> element that is a child of the <item> element.

On the other hand, the Atom 1.0 specification has two completely different mechanisms for creating podcasts. Both mechanisms are described in the article by James Snell entitled An overview of the Atom 1.0 Syndication Format. From the article

Support for enclosures

Listing 4. Atom 1.0 podcasting example

<feed xmlns="http://www.w3.org/2005/Atom">
  <id>http://www.example.org/myfeed</id>
  <title>My Podcast Feed</title>
  <updated>2005-07-15T12:00:00Z</updated>
  <author>
    <name>James M Snell</name>
  </author>
  <link href="http://example.org" />
  <link rel="self" href="http://example.org/myfeed" />
  <entry>
    <id>http://www.example.org/entries/1</id>
    <title>Atom 1.0</title>
    <updated>2005-07-15T12:00:00Z</updated>
    <link href="http://www.example.org/entries/1" />
    <summary>An overview of Atom 1.0</summary>
    <link rel="enclosure" 
          type="audio/mpeg"
          title="MP3"
          href="http://www.example.org/myaudiofile.mp3"
          length="1234" />
    <link rel="enclosure"
          type="application/x-bittorrent"
          title="BitTorrent"
          href="http://www.example.org/myaudiofile.torrent"
          length="1234" />
    <content type="xhtml">
      <div xmlns="http://www.w3.org/1999/xhtml">
        <h1>Show Notes</h1>
        <ul>
          <li>00:01:00 -- Introduction</li>
          <li>00:15:00 -- Talking about Atom 1.0</li>
          <li>00:30:00 -- Wrapping up</li>
        </ul>
      </div>
    </content>
  </entry>
</feed>

Atom enclosures allow you to do more than just distribute audio content. Enclosure links can reference any type of resource. Listing 5, for instance, uses multiple enclosures within a single entry to reference translated versions of a single PDF document that's accessible through FTP. The hreflang attribute identifies the language that each PDF document has been translated into.

Content-by-reference

In addition to support for links and enclosures, Atom introduces the ability to reference entry content by URI. Listing 6, for instance, illustrates how an Atom feed for a photo weblog might appear. The content element references each individual photograph in the blog. The summary element provides a caption for the image.


Listing 6. A simple list of images using Atom 1.0

<feed xmlns="http://www.w3.org/2005/Atom"
      xml:base="http://www.example.org/">
  <id>http://www.example.org/pictures</id>
  <title>My Picture Gallery</title>
  <updated>2005-07-15T12:00:00Z</updated>
  <author>
    <name>James M Snell</name>
  </author>
  <entry>
     <id>http://www.example.org/entries/1</id>
     <title>Trip to San Francisco</title>
     <link href="/entries/1" />
     <updated>2005-07-15T12:00:00Z</updated>
     <summary>A picture of my hotel room in San Francisco</summary>
     <content type="image/png" src="/mypng1.png" />
  </entry>
  <entry>
    <id>http://www.example.org/entries/2</id>
    <title>My new car</title>
    <link href="/entries/2" />
    <updated>2005-07-15T12:00:00Z</updated>
    <summary>A picture of my new car</summary>
    <content type="image/png" src="/mypng2.png" />
  </entry>
</feed>

This content-by-reference mechanism provides a very flexible means of expanding the types of content that one can syndicate through Atom.

After looking at this from all angles for about 30 minutes, the only conclusion I can come to is that Atom provides two completely different mechanisms for achieving the same goal. This is a potential gotcha for aggregator authors, who might end up supporting one or the other of the mechanisms instead of both.
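To cover feeds in the wild, an aggregator ends up sniffing for all three shapes. Here's the rough sketch I'm working from (illustrative, with namespace checks elided for brevity):

using System.Collections.Generic;
using System.Xml;

static class EnclosureSniffer
{
    // Collect enclosure URLs from an RSS <item> or Atom <entry>, covering
    // RSS 2.0 <enclosure>, Atom <link rel="enclosure">, and Atom's
    // content-by-reference <content src="...">.
    public static IEnumerable<string> FindEnclosureUrls(XmlElement itemOrEntry)
    {
        foreach (XmlNode node in itemOrEntry.ChildNodes)
        {
            var child = node as XmlElement;
            if (child == null) continue;

            if (child.LocalName == "enclosure")                       // RSS 2.0
                yield return child.GetAttribute("url");
            else if (child.LocalName == "link" &&
                     child.GetAttribute("rel") == "enclosure")        // Atom link
                yield return child.GetAttribute("href");
            else if (child.LocalName == "content" &&
                     child.HasAttribute("src"))                       // Atom by-reference
                yield return child.GetAttribute("src");
        }
    }
}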

After this, I still have to add some code to also support Yahoo! Media RSS and then track down some feeds that actually use all the various enclosure techniques so I can test my code with actual real world scenarios. I'd appreciate any pointers to test feeds especially for the Yahoo! Media extensions to RSS [which I'm considering not supporting if there aren't that many feeds that use it].

No rest for the wicked. ;)


 

In a post entitled Atom 0.3 Denouement, Sam Ruby begins his advocacy for developers to stop supporting Atom 0.3 and states his intent to start flagging such feeds as invalid in the Feed Validator come the fall. I planned to avoid blogging about his post until I saw the following comment by Mark Pilgrim where he wrote

Atom 1.0 will shortly be an IETF RFC, which makes it as much of a web standard as HTTP.  Atom 0.3 was just some guys (and gals) dicking around on a wiki.  As it turned out, some guys dicking around on a wiki were able to produce a relatively decent standard, but that isn't saying much given the competition.  Atom 1.0 is a great standard, worthy of the label and worthy of being pushed by standards advocacy groups like WaSP...

Although what Mark Pilgrim has written is factual, it is misleading as well. Atom 0.3 was not backed by a standards body (and neither was any flavor of RSS, by the way) but it still became a de facto standard thanks to the advocacy of people like Mark & Sam. Specifically, once Google decided to switch their RSS feeds to Atom 0.3 feeds, they used their power as a dominant content producer to force every major aggregator to support Atom 0.3.

At the time I blogged about how this was a stupid thing to do since it basically guaranteed that there would be two conflicting versions of Atom for the immediate future. Now there are hundreds of thousands to millions of aggregator users who will potentially be screwed when Google decides to switch to Atom 1.0. These end users are sacrificial pawns in what has basically been a battle of [male] geek egos over whether a blog post in an XML feed should be contained in an element named atom:entry or item.

The only bright light in all this crap is that, a few years after everyone else figured it out, some of these XML syndication geeks are now realizing that instead of arguing over XML element names it is more interesting to figure out what other kinds of data can be syndicated beyond blog posts and news stories. See Danny Ayers's post Brownian Motion and Bill de hÓra on Atoms in a small world for examples of some of the Atom geeks finally getting it.

Better late than never, I guess.


 

Robert Scoble has posted a series of entries comparing the Bloglines Citations feature with Technorati.com for finding out how many sites link to a particular URL. His conclusion seems to be that Technorati sucks compared to Bloglines, which has led to an interesting back & forth discussion between him and David Berlind.

I've been frustrated by Technorati.com for quite a while and have been quietly using Bloglines Citations as an alternative when I want results from a web search, and PubSub for results I want to subscribe to in my favorite RSS reader. Technorati seems to lack the breadth of either service when it comes to finding actual blog posts that link to a site, and neither of those services brings up unrelated crap such as blogrolls in its results the way Technorati does.

The only problem with Bloglines is that their server can't handle the load and the citations feature is typically down several times during the day. Technorati has also had similar problems recently.

At this point all that Technorati seems to have going for it is first mover advantage. Or is there some other reason to use Technorati over competitors like Bloglines or PubSub that I've missed?


 

July 17, 2005
@ 05:54 AM

From Tim Bray's post entitled Atom 1.0 we learn

There are a couple of IETF process things to do, but this draft (HTML version) is essentially Atom 1.0. Now would be a good time for implementors to roll up their sleeves and go to work.

I'll add this to the list of things I need to support in the next version of RSS Bandit. The Longhorn RSS team will need to update their implementation as well. :)

I couldn't help but notice that Tim Bray has posted an entry entitled RSS 2.0 and Atom 1.0, Compared which is somewhat misleading and inaccurate. I find it disappointing that Tim Bray couldn't simply announce the upcoming release of Atom 1.0 without posting a FUD-style anti-RSS post as well.

I'm not going to comment on Tim Bray's comparison post beyond linking to other opinions such as those from Alex Bosworth on Atom Failings and Don Park on Atom Pendantics.


 

Since Sam Ruby asked, I feel I must oblige. There have been a bunch of posts in Sam's blog pointing out that the RSS parser used by Apple's iTunes accepts invalid RSS feeds, which in turn encourages content producers to publish invalid RSS feeds that only work in iTunes.

In the post entitled Insensitive iTunes Sam wrote 

Mark Pilgrim: it appears that iTunes uses a real, draconian, namespace-aware XML parser... except that namespaces are case-insensitive.

What’s worse, is that the high profile Disney The Gears Behind the Ears feed appears to be depending on this functionality, as well as on other non-standard element definitions.

There are a couple more issues with the iTunes parser mentioned by Mark Pilgrim in the comments to that post. The reason this is actually an issue at all is spelled out by Mark in another response to Sam's post, where he wrote

Am I the only one who doesn’t think this is such a big deal?

Apple is an 800-lb. gorilla in this space (at least until Microsoft releases an RSS-enabled IE in Longhorn).  iTunes is to podcasting as Internet Explorer is to HTML.  RSS interoperability, at least as far as podcasting goes, now means “works with iTunes.”  Thousands of people and companies will begin making podcasts that “work with iTunes,” but unintentionally rely on iTunes quirks (e.g. Disney’s incorrect namespace).  This in turn will affect every developer who wants to consume RSS feeds, and who will be required to emulate all the quirks of iTunes to remain competitive.

Apple has effectively redefined the entire structure of an RSS feed, added multiple core RSS elements, made all RSS elements case-insensitive, made XML namespaces case-insensitive, created a new date format, made several previously required attributes optional, and created a morass of undocumented and poorly-documented extensions... to what was already a pretty messy format to begin with.

Case in point: my Universal Feed Parser, which already has 2751 test cases and is so incredibly liberal that it can parse an ill-formed EBCDIC-encoded RDF feed with regular expressions, will require hundreds of new test cases to cover all the schlock that iTunes accepts.  And I’m one of the lucky ones.

The supreme irony of all this is that I remember Dave Hyatt (Apple Safari developer) bitching and moaning about all the work he had to do to make Safari emulate the buggy, undocumented behavior of Internet Explorer, and how the world would be so much better if only everything used XML and everyone implemented draconian error handling.  Never mind the fact that the vast majority of problems that iTunes creates have nothing to do with XML well-formedness; iTunes doesn’t even require well-formed XML in the first place.  Utopia, it seems, will have to wait another decade.

Just like the browser wars, I suspect this is going to get a lot worse before it gets any better. Hopefully the folks working on RSS at Apple [and at Microsoft] are paying attention to this discussion and will do the right thing.

The main problem is that every RSS reader is "liberal" to some degree, which means aggregator developers end up being asked to be bug-compatible with whichever RSS reader is popular. I get complaints all the time that RSS Bandit is stricter than readers like SharpReader, but I usually resist copying every quirk in other RSS readers. Once an RSS reader rises to dominance, the definition of a valid RSS feed stops being what is in the spec and becomes whatever that reader supports. This happens all over the software industry, from web browsers to C compilers. It's great to see Sam fighting this trend in the RSS space; his Feed Validator has gone a long way toward holding the line. I can only hope that the iTunes folks realize that it is best for everyone if they favor spec compliance over being liberal in what they receive.
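To make the tension concrete, here is a minimal C# sketch of what "liberal to some degree" looks like in practice: attempt a strict, conforming XML parse first and only patch up the input when that fails. The single fix shown (escaping bare ampersands) is illustrative, not RSS Bandit's actual recovery logic.

using System;
using System.Text.RegularExpressions;
using System.Xml;

class FeedLoader
{
    public static XmlDocument Load(string feedXml)
    {
        XmlDocument doc = new XmlDocument();
        try
        {
            doc.LoadXml(feedXml); // draconian: throws on ill-formed XML
        }
        catch (XmlException)
        {
            // Liberal fallback: escape '&' characters that don't begin an entity.
            string patched = Regex.Replace(feedXml,
                @"&(?!\w+;|#\d+;|#x[0-9a-fA-F]+;)", "&amp;");
            doc.LoadXml(patched); // may still throw for other kinds of damage
        }
        return doc;
    }
}

Every fallback like this becomes a quirk that publishers can unknowingly start depending on, which is exactly the dynamic described above.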


 

Today I learned that Apple brings podcasts into iTunes, which is excellent news. This will definitely push subscribing to music and videos via RSS feeds into the mainstream. I wonder how long it'll take MTV to start providing podcast feeds.

One interesting aspect of the announcement which I didn't see in any of the mainstream media coverage was pointed out to me in Danny Ayers's post Apple - iTunes - Podcasting where he wrote

Apple - iTunes - Podcasting and another RSS 2.0 extension (PDF). There are about a dozen new elements (or “tags” as they quaintly describe them) but they don’t seem to add anything new. I think virtually everything here is either already covered by RSS 2.0 itself, except maybe tweaked to apply to the podcast rather than the item.
They’ve got their own little category taxonomy and this delightful thing:

<itunes:explicit>
This tag should be used to note whether or not your Podcast contains explicit material.
There are 2 possible values for this tag: Yes or No

I wondered at first glance whether this was so you could tell when you were dealing with good data or pure tag soup. However, the word has developed a new meaning:

If you populate this tag with “Yes”, a parental advisory tag will appear next to your Podcast cover art on the iTunes Music Store
This tag is applicable to both Channel & Item elements.

So, in summary it’s a bit of a proprietary thing, released as a fait accompli. Ok if you’re targetting for iTunes, for anything else use Yahoo! Media RSS. I wonder where interop went.

This sounds interesting. So now developers of RSS readers that want to consume podcasts have to know how to consume the RSS 2.0 <enclosure> element, Yahoo!'s extensions to RSS and Apple's extensions to RSS to make sure they cover all the bases. Similarly, publishers of podcasts have to figure out which of these they want to publish.

I guess all that's left is for Real Networks and Microsoft to publish their own extensions to RSS for dealing with providing audio and video metadata in RSS feeds to make it all complete. This definitely complicates my plans for adding podcasting support to RSS Bandit. And I thought the RSS 1.0 vs. RSS 2.0 vs. Atom discussions were exciting. Welcome to the world of syndication.

PS: The title of this post is somewhat tongue in cheek. It was inspired by Slashdot's headline over the weekend, Microsoft To Extend RSS, about Microsoft's creation of an RSS module that makes syndicating lists work better. Similar headlines haven't been run about Yahoo!'s or Apple's extensions to RSS but that's to be expected since we're Microsoft. ;)


 

Categories: Syndication Technology | XML

As the developer of an RSS aggregator I'm glad to see Microsoft's Simple List Extensions for RSS. Many of the aggregator developers I spoke to at Gnomedex this weekend felt the same way. The reason for being happy about these extensions is that they provide a way to fix a number of key feeds that are broken in RSS aggregators today. This includes feeds such as the MSN Music Top 100 Songs feed, iTunes Top 25 Songs feed and Netflix Top 100 Movies feed.

The reasons these feeds appear broken in every aggregator in which I have tried them are covered in a previous post of mine entitled The Netflix Problem: Syndicating Ordered Lists in RSS. For those who don't have time to go back and read the post, the following list summarizes the problems with these feeds:

  1. When the list changes, some items change position, new ones enter the list and old ones leave. An RSS reader doesn't know to remove items that have left the list from the display and in some cases may not know to eliminate duplicates. Eventually you have a garbled list with last week's #25 song, this week's #25 song and last month's #25 song all in the same view.

  2. There is no way to know how to sort the list. Traditionally RSS aggregators sort entries by date, which doesn't make sense for an ordered list.  

The RSS extensions provided by Microsoft are meant to solve these problems and improve the current negative user experience of people who subscribe to ordered lists using RSS today.  

To solve the first problem Microsoft has provided the cf:treatAs element with the value "list", to be used as a signal to aggregators that whenever the feed is updated, the previous contents should be dumped or archived and replaced by the new contents of the list. That way we no longer have last week's Top 25 song list commingled with this week's list. The interesting question for me is whether RSS Bandit should always refresh the contents of the list view when a list feed is updated (i.e. the feed always contains the current list) or whether to keep the old version of the list, perhaps grouped by date. My instinct is to go with the first option. I know Nick Bradbury also had some concerns about what the right behavior should be for treating lists in FeedDemon.

To solve the second problem Microsoft has provided the cf:sort element, which can be used to specify which elements on an item should be used for sorting, whether a field is numeric or textual so we know how to sort it, and what the human-readable name of the field should be when displayed to the user. I'm not really sure how to support this in RSS Bandit. Having every feed be able to specify what columns to show in the list view complicates the user interface somewhat and requires a degree of flexibility in the code, although the code changes themselves should be straightforward.

On the other hand there are some user interface problems. For one, I'm not sure what should be the default sort field for lists. My gut instinct is to add a "Rank" column to the list of columns RSS Bandit supports by default and have it be a numeric field that is numbered using the document order of the feed. So the first item has rank 1, the second has rank 2, etc. This handles the case where a feed has a cf:treatAs element but has no cf:sort values. This will be needed for feeds such as the Netflix Top 100 feed which doesn't have a field that can be used for sorting. The second problem is how to show the user what columns can be added to a feed. Luckily we already have a column chooser that is configurable per feed in RSS Bandit. However we now have to make the list of columns in that list configurable per feed. This might be confusing to users but I'm not sure what other options we can try.
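For illustration, here is a minimal C# sketch of how an aggregator might act on these extensions. The namespace URI is the one given in the Simple List Extensions draft; treat it (and the element names) as assumptions to verify against the spec.

using System;
using System.Xml;

class ListFeedHelper
{
    // Assumed namespace URI from the Simple List Extensions draft.
    const string CfNs = "http://www.microsoft.com/schemas/rss/core/2005";

    // A feed is a "list" if the channel carries <cf:treatAs>list</cf:treatAs>,
    // in which case previously cached items should be dumped or archived.
    public static bool IsListFeed(XmlDocument feed)
    {
        XmlNamespaceManager nsmgr = new XmlNamespaceManager(feed.NameTable);
        nsmgr.AddNamespace("cf", CfNs);
        XmlNode treatAs = feed.SelectSingleNode("/rss/channel/cf:treatAs", nsmgr);
        return treatAs != null && treatAs.InnerText.Trim() == "list";
    }

    // Default "Rank" column: number the items by document order so a list
    // feed with cf:treatAs but no cf:sort hints still sorts sensibly.
    public static void AssignRanks(XmlDocument feed, Action<XmlNode, int> setRank)
    {
        int rank = 1;
        foreach (XmlNode item in feed.SelectNodes("/rss/channel/item"))
            setRank(item, rank++);
    }
}

With something like this, a list feed's cached items get replaced wholesale on each update and the default Rank column falls out of document order.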


 

I missed the first few minutes of this talk.

Bob Wyman of PubSub stated that he believed Atom was the future of syndication. Other formats would eventually become legacy formats, analogous to RTF in the word processing world: supported, but rarely chosen for new efforts.

Mark Fletcher of Bloglines then interjected and pleaded with the audience to stop the practice of providing the same feed in multiple formats. Bob Wyman agreed with his plea and also encouraged members of the audience to pick one format and stick to it. Having the same feed in multiple syndication formats confuses end users who are trying to subscribe to the feed and leads to duplicate items showing up in search engines that specialize in syndication formats like PubSub, Feedster or the Bloglines search features.

A member of the audience responded that he used multiple formats because different aggregators support some formats better than others. Bob Wyman replied that bugs in aggregators should result in pressure on aggregator developers to fix them instead of causing confusion to end users by spitting out multiple versions of the same feed. Bob then advocated picking Atom since a lot of lessons had been learned via the IETF process to improve the format. Another audience member mentioned that 95% of his syndication traffic was for his RSS feed, not his Atom feed, so he knows which format is winning in the marketplace.

A question was raised about whether the admonition to avoid multiple versions of a feed also applied to sites that have multiple feeds for separate categories of content. The specific example was having a regular feed and a podcast feed. Bob Wyman thought that this was not a problem; the problem was the same content served in different formats.

The discussion then switched to ads in feeds. Scott Rafer of Feedster said he agreed with Microsoft's presentation from the previous day that Subscribing is a new paradigm that has come after Browsing and Searching for content. Although we have figured out how to provide ads to support the Browse & Search scenarios, we are still experimenting with how to provide ads to support the Subscribe scenarios. Some sites like the New York Times use RSS to draw people to their websites by providing excerpts in their feeds; certain consultants publish full-text feeds which they view as advertising their services; others put ads in their feeds. Bob Wyman mentioned that PubSub is waiting to see which mechanism the market settles on before deciding on an approach. He added that finding a model for advertising and syndication is imperative so that intermediary services like PubSub, Feedster and Bloglines can continue to exist.

An audience member then asked why these services couldn't survive by providing free services to the general public and charging corporate users instead of resorting to advertising. The response was that both PubSub and Feedster already have corporate customers who pay for their services but this revenue is not enough to keep providing services to the general public. The Bloglines team considered fee-based services but discarded the idea because they felt it would be a death knell for the service, given that most service providers on the Web are free, not fee-based.

An audience member asked if any of the services would have done anything different two years ago when they started given the knowledge they had now. The answers were that Feedster would have chosen a different back-end architecture, Bloglines would have picked a better name and PubSub would have started a few months to a year sooner.

I asked the speakers what features they felt were either missing in RSS or not being exploited. Mark Fletcher said that he would like to see more usage of the various comment related extensions to RSS which currently aren't supported by Bloglines because they aren't in widespread use. The other speakers mentioned that they will support whatever the market decides is of value.


 

Scott Gatz of Yahoo! started by pointing out that there are myriad uses for RSS. For this reason he felt that we need more flexible user experiences for RSS that map to these various uses. For example, a filmstrip view is more appropriate for reading a feed of photos than a traditional blog and news based user interface typically favored by RSS readers. Yahoo! is definitely thinking about RSS beyond just blogs and news which is why they've been working on Yahoo! Media RSS which is an extension to RSS that makes it better at syndicating digital media content like music and videos. Another aspect of syndication Yahoo! believes is key is the ability to keep people informed about updates independent of where they are or what device they are using. This is one of the reasons Yahoo! purchased the blo.gs service.

Dave Sifry of Technorati stated that he believed the library model of the Web where we talk about documents, directories and so on is outdated. The Web is more like a river or stream of millions of state changes. He then mentioned that some trends to watch that emphasized the changing model of the Web were microformats and tagging.

BEGIN "Talking About Myself in the Third Person"

Steve Gillmor of ZDNet began by pointing out Dare Obasanjo in the audience and saying that Dare was his hero and someone he admired for the work he'd done in the syndication space. Steve then asked why in a recent blog posting Dare had mentioned that he would not support Bloglines' proprietary API for synchronizing a user's subscriptions with a desktop RSS reader but then went on to mention that he would support Newsgator Online's proprietary API. Specifically he wondered why Dare wouldn't work towards a standard instead of supporting proprietary APIs.

At this point Dare joined the three speakers on stage. 

Dare mentioned that from his perspective there were two major problems confronting users of an RSS reader. The first is that users eventually need to be able to read their subscriptions from multiple computers, since many people read news and blogs from more than one machine (e.g. home & work or home & school). The second is that, because subscribing to feeds is so easy, people eventually succumb to information overload and need a way to see only the most important or interesting content in the feeds they are subscribed to. This is the "attention problem" that Steve Gillmor is a strong advocate of solving. The issue discussed in Dare's blog post is the former, not the latter. The reason for working with the proprietary APIs provided by online RSS readers instead of advocating a standard is that the online RSS readers are the ones in control. At the end of the day, they provide the API, so they are the ones that have to decide whether to create a standard or not.

Dare rejoined the audience after speaking.  

END "Talking About Myself in the Third Person"

Dave Sifry followed up by encouraging cooperation between vendors to solve the various problems facing users. He gave Yahoo!'s work with Marc Canter on digital media as an example.

Steve Gillmor then asked audience members to raise their hand if they felt that the ability to read their subscriptions from multiple computers was a problem they wanted solved. Most of the audience raised their hands in response.

A member of the audience responded to the show of hands by advocating that people use web-based RSS readers like Bloglines. Scott Gatz agreed that using a web-based aggregator was the best way to access one's subscriptions from multiple computers. There was some disagreement between members of the audience and the speakers about whether problems using Bloglines from mobile devices prevent it from being the solution to this problem.

From the audience, Dave Winer asked Dave Sifry why Technorati invented Attention.xml instead of reusing OPML. The response was that the problem goes beyond synchronizing the list of feeds the user is subscribed to.

Steve Gillmor ended the session by pointing out that once RSS usage becomes widespread someone will have to solve the problem once and for all.  


 

This was a keynote talk given by Dean Hachamovitch and Amar Gandhi that revealed the RSS platform that will exist in Longhorn and the level of RSS support in Internet Explorer 7, and showed some RSS extensions that Microsoft is proposing.

Dean started by talking about Microsoft's history with syndication. In 1997, there was Active Desktop and channels in IE 4.0 & IE 5.0, which weren't really successful. We retreated from the world of syndication for a while after that. Then in 2002, Don Box started blogging on GotDotNet. In 2003, we hired Robert Scoble. In 2004, Channel 9 was launched. Today we have RSS feeds coming out of lots of places at Microsoft. This includes the various feeds on the 15 million blogs on MSN Spaces, the 1500 employee blogs on http://blogs.msdn.com and http://blogs.technet.com, hundreds of feeds on the Microsoft website and even MSN Search, which provides RSS feeds for search results.

Using XML syndication is an evolution in the way people interact with content on the web. The first phase was browsing the Web for content using a web browser. Then came searching the Web for content using search engines. And now we have subscribing to content using aggregators. Each step hasn't replaced the previous one but instead has enhanced the user experience of the Web. In Longhorn, Microsoft is betting big on RSS both for end users and for developers in three key ways

  1. Throughout Windows, various experiences will be RSS-enabled and easy for end users to consume
  2. An RSS platform will be provided that makes it easy for developers to RSS-enable various scenarios and applications
  3. The number of scenarios RSS handles will be increased by proposing extensions

Amar then demoed the RSS integration in Internet Explorer 7. Whenever Internet Explorer encounters an RSS feed, a button in the browser chrome lights up to indicate that a feed is available. Clicking on the button shows a user-friendly version of the feed that provides rich search, filtering and sorting capabilities. The user can then hit a '+' button and subscribe to the feed. Amar then navigated to http://search.msn.com and searched for "Gnomedex 5.0". Once he got to the search results, the RSS button lit up and he subscribed to the search results. This showed one possible workflow for keeping abreast of news of interest using the RSS capabilities of Internet Explorer 7 and MSN Search.

At this point Amar segued to talk about the Common RSS Feed List. This is a central list of feeds that a user is subscribed to that is accessible to all applications not just Internet Explorer 7. Amar then showed a demo of an altered version of RSS Bandit which used the Common RSS Feed List and could pick up both feeds he'd subscribed to during the previous demo in Internet Explorer 7. I got a shout out from Amar at this point and some applause from the audience for helping with the demo. :)

Dean then started to talk about the power of the enclosure element in RSS 2.0. What is great about it is that it enables one to syndicate all sorts of digital content: video, music, calendar events, contacts, photos and so on, due to the flexibility of enclosures.

Amar then showed a demo using Outlook 2003 and an RSS feed of the Gnomedex schedule he had created. The RSS feed had an item for each event on the schedule and each item had an iCalendar file as an enclosure. Amar had written a 200 line C# program that subscribed to this feed then inserted the events into his Outlook calendar so he could overlay his personal schedule with the Gnomedex schedule. The point of this demo was to show that RSS isn't just for aggregators subscribing to blogs and news sites.
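Amar's actual code wasn't shown, but the feed-processing half of such a demo might look like the C# sketch below. The feed URL is a placeholder, and the Outlook import step is elided to a comment.

using System;
using System.IO;
using System.Net;
using System.Xml;

class CalendarFeedSync
{
    static void Main()
    {
        // Hypothetical feed URL; the demo's actual feed wasn't published.
        string feedUrl = "http://example.org/gnomedex-schedule.rss";

        XmlDocument feed = new XmlDocument();
        feed.Load(feedUrl);

        using (WebClient client = new WebClient())
        {
            foreach (XmlElement enclosure in feed.SelectNodes("/rss/channel/item/enclosure"))
            {
                // Only pull down iCalendar enclosures.
                if (enclosure.GetAttribute("type") != "text/calendar")
                    continue;

                string url = enclosure.GetAttribute("url");
                string file = Path.Combine(Path.GetTempPath(),
                    Path.GetFileName(new Uri(url).LocalPath));
                client.DownloadFile(url, file);
                Console.WriteLine("Downloaded {0}", file);

                // Importing the .ics into the user's calendar (e.g. via the
                // Outlook object model) is the part the 200-line demo covered.
            }
        }
    }
}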

Finally, Dean talked about syndicating lists of content. Today lots of people syndicate Top 10 lists, ToDo lists, music playlists and so on. However RSS is limited in how it can describe the semantics of a rotating list. Specifically, the user experience is pretty bad when the list changes, such as when a song in a top 10 list drops off or moves to another position. I discussed this very issue in a blog post from a few months ago entitled The Netflix Problem: Syndicating Ordered Lists in RSS.

Microsoft has proposed some extensions to RSS 2.0 that allow RSS readers to deal with ordered lists better. A demo was shown that used data from Amazon Web Services to create an RSS feed of an Amazon wish list (the data was converted to RSS with the help of Jeff Barr). The RSS extensions provided information that enabled the demo application to know which fields to use for sorting and/or grouping the items in the wish list feed.

The Microsoft Simple List Extensions Specification is available on MSDN. In the spirit of the RSS 2.0 specification, the specification is available under the Creative Commons Attribution-ShareAlike License (version 2.5).

A video was then shown of Lawrence Lessig where he commended Microsoft for using a Creative Commons license.

The following is a paraphrase of the question & answer session after their talk:

Q: What syndication formats are supported?
A: The primary flavors of RSS such as RSS 0.91, RSS 1.0 and RSS 2.0 as well as the most recent version of Atom.

Q: How granular is the common feed list?
A: The Longhorn RSS object model models all the data within the RSS specification including some additional metadata. However it is fairly simple with 3 primary classes.

Q: Will Internet Explorer 7 support podcasting?
A: The RSS platform in Longhorn will support downloading enclosures.

Q: What is the community process for working on the specifications?
A: An email address for providing feedback will be posted on the IE Blog. Robert Scoble also plans to create a wiki page on Channel 9.  

Q: What parts of the presentation are in IE 7 (and thus will show up in Windows XP) and what parts are in Longhorn?
A: The RSS features of Internet Explorer 7 such as autodiscovery and the Common RSS Feed List will work in Windows XP. It is unclear whether other pieces such as the RSS Platform Sync Engine will make it to Windows XP.

Q: What are other Microsoft products such as Windows Media Player doing to take advantage of the RSS platform?
A: The RSS platform team is having conversations with other teams at Microsoft to see how they can take advantage of the platform.


 

May 21, 2005
@ 02:35 AM

From Bob Wyman's post Microsoft to support Atom!

Robert Scoble, a Microsoft employee/insider very familiar with Microsoft's plans for syndication, declares in comments on his blog "we are supporting Atom in any aggregator we produce." Microsoft's example in supporting Atom should be followed by all other aggregator developers in the future and Microsoft should be commended for supporting the adoption of openly defined standards for syndication.

Given that virtually every aggregator in use today and virtually every blog hosting and syndication platform (except MSN Spaces) already supports both RSS and Atom, it is clear that the heyday for the historical RSS format has passed. RSS is a historical format, Atom represents the future. We don't need two formats -- or twenty... We should consolidate on that format which incorporates the most learning and experience with the syndication problem. That format is Atom V1.0.

I'm surprised that Bob Wyman is crowing over such a non-issue. Supporting both versions of Atom (Atom 0.3 and Atom 1.0) is a must for any information aggregator that wants to be taken seriously. This was all covered in my post from a year and a half ago (damn, has this debate been going on for that long?) entitled Mr. Safe's Guide to the RSS vs. ATOM debate where I wrote

The Safe Syndication Consumer's Perspective
If you plan to consume feeds from a wide variety of sources then one should endeavor to support as many syndication formats as possible. The more formats a feed consumer supports the more content is available for its users.

Based on their current popularity, degree of support and ease of implementation one should consider supporting the major syndication formats in the following order of priority

  1. RSS 0.91/RSS 2.0
  2. RSS 1.0
  3. Atom

RSS 0.91 support is the simplest to implement and most widely supported by websites while Atom is the most difficult to implement being the most complex and will be least supported by websites in the coming years.

The Safe Syndication Producer's Perspective
...
The average user of a news aggregator will not be able to tell the difference between an Atom or RSS feed from their aggregator if it supports both. However users of aggregators that don't support Atom will not be able to subscribe to feeds in that format. In a few years, the differences between RSS and Atom will most likely be the same as those that are different between RSS 1.0 and RSS 0.91/RSS 2.0; only of interest to a handful of XML syndication geeks. Even then the simplest and safest bet would still be to use RSS as a syndication format. This is the same as the fact that even though the W3C has published XHTML 1.0 & XHTML 1.1 and is working on XHTML 2.0, the safest bet to get the widest reach with the least problems is to publish a website in HTML 3.2 or HTML 4.01.

So far this thinking has aligned with the thinking around RSS I have seen at MSN, so it is extremely unlikely that MSN Spaces will do something as disruptive as switching its RSS feeds to Atom feeds or as confusing to end users as providing multiple feeds in different formats.


 

From the RSS feed for the post MSN: RSS Everywhere


 

MSDN just published my article, Fun with IXMLHttpRequest and RSS. The article attempts to illuminate two growing trends: using DHTML & IXMLHttpRequest to build dynamic web applications, and the building of interesting applications layered on top of RSS.

In my recent post Ideas for my next Extreme XML column on MSDN, I asked what people would like to see me write about next. Although this topic came second, I felt that it highlighted some interesting disruptive trends that warranted writing about sooner rather than later.

Coincidentally, I checked my favorite RSS reader this morning and found out that our CEO decided to downplay the importance of RSS in favor of XML Web Services in a Q&A on the RSS weblog. I find it interesting that his core argument against RSS is that it is not as complex as XML Web Service technologies. On the flip side, we have Mark Lucovsky, who in his post Don Box and Hailstorm argues that the simple technologies and techniques of RSS may succeed in building an ecosystem of applications built on open data access where his attempt with Hailstorm at Microsoft failed. Combining this with the thinking in Adam Bosworth's Web of Data, it seems clear that key people at Google are beginning to understand the power of REST in combination with the flexible nature of RSS.

This all seems like classic Innovator's Dilemma stuff. Thankfully, in this case even though there are lots of people who want us [Microsoft] to bury our heads in the sand when it comes to recognizing these emerging trends, there are also annoying people like me at work who keep preaching this stuff to anybody who is willing to listen.

Will RSS change the world? That's a silly question, it already has.


 

From Greg Reinacker we find out that Newsgator Acquires FeedDemon and Nick Bradbury confirms this in his post NewsGator Acquires FeedDemon, TopStyle...and Me!. I think this is a great acquisition for Newsgator. Acquisitions are usually about getting great people, key technology or lots of users; with this acquisition Newsgator gets all three. It'll be interesting to see how they rationalize the existence of two desktop clients, even if one of them is a Microsoft Office Outlook plugin.

I think this will have an interesting ripple effect on the aggregator market. Nick & Greg have already raised the bar for RSS readers to include synchronization with a web-based aggregator; desktop aggregators that don't do this will eventually be left in the dust. So far Newsgator Online and Bloglines have been the premiere web-based aggregators but it's been difficult building applications that synchronize with them. Newsgator Online doesn't seem to have any public documentation for its API while the Bloglines API is not terribly useful for synchronization.

This definitely puts pressure on Bloglines to provide a richer API since two of the most popular desktop aggregators on the Windows platform will now have a richer synchronization story with its most notable competitor. It also puts pressure on other desktop aggregators to figure out a strategy for their own synchronization stories. For example, I had planned to add support for synchronizing with both services in the Nightcrawler release of RSS Bandit but now it is unclear whether there is any incentive for Newsgator to provide an API for other RSS readers. Combining this with the fact that the Bloglines API isn't terribly rich means I'm between a rock and a hard place when it comes to providing synchronization with a web-based aggregator in the next version of RSS Bandit.

Sometimes I wonder whether my energies wouldn't be better spent convincing Steve Rider to let me hack an API for http://www.start.com/1. :)


 

A few months ago Robert Scoble wrote a post titled Yahoo announces API for its search engine where he asked

Seriously. Blogs are increasing noise to lots of searches. We already have good engines that let you search blogs (Feedster, Pubsub, Newsgator, Technorati, and Bloglines all are letting you search blogs). What about an engine that lets you search everything BUT blogs? Where's that?

Is Yahoo's API good enough to do that? It doesn't look like it. It looks like Yahoo just gave us an API to embed its search engine into our applications. Sigh. That's not what I want. OK, MSN, your turn. Are you gonna really give us an API that'll let us build a custom search engine and let us have access to the variables that determine the result set?

The first question Robert asks is hard but you can take shortcuts to get approximate results. How do you determine what a blog is? Do you simply exclude all results from LiveJournal, Blogspot and MSN Spaces? That would exclude millions of blogs but it wouldn't catch the various blogs on self-hosted domains like mine. Of course, you could get even trickier by also excluding pages that match certain words like "DasBlog", "Movable Type" or "WordPress", which would probably take out another large chunk. By then the search results would be about as blog-free as you can get without resorting to expensive matching techniques. For icing on the cake it would probably be useful to be able to skew results by popularity or freshness.

The second question Scoble asks is whether there is a search engine that gives you an API that can do all this stuff. Well, MSN Search gives you RSS feeds, which as I've mentioned in a previous post are sometimes the only API your website needs. More importantly, as pointed out in a recent post by Andy Edmonds entitled Search Builder Revealed, one can control how variables such as popularity or freshness affect search results. For example,

  1. Search results for "star wars revenge of the sith" by popularity

  2. Search results for "star wars revenge of the sith" by freshness

One could probably write a first cut at the search engine Robert is asking for using the MSN Search RSS feeds in about an hour or so. In a day, it could be made quite polished, with most of the work being in the user interface. Yet another coding project for a rainy day.
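A first cut could look something like the sketch below. The format=rss parameter and the -site:/phrase exclusion syntax are assumptions about how MSN Search exposed RSS results at the time; treat the endpoint as a placeholder.

using System;
using System.Xml;

class BlogFreeSearch
{
    static void Main()
    {
        // Crude "everything but blogs" filter along the lines described
        // above: exclude the big blog hosts and telltale platform strings.
        string query = "star wars revenge of the sith"
            + " -site:spaces.msn.com -site:blogspot.com -site:livejournal.com"
            + " -\"Movable Type\" -WordPress -DasBlog";

        // Assumed endpoint/parameter for MSN Search's RSS-formatted results.
        string url = "http://search.msn.com/results.aspx?format=rss&q="
            + Uri.EscapeDataString(query);

        XmlDocument results = new XmlDocument();
        results.Load(url);

        foreach (XmlElement item in results.SelectNodes("/rss/channel/item"))
            Console.WriteLine("{0}\n{1}\n",
                item["title"].InnerText, item["link"].InnerText);
    }
}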


 

Categories: MSN | Syndication Technology

Over the past few weeks there have been a bunch of reports on internal mailing lists about problems with MSN Spaces RSS feeds and Bloglines. The specific problem is that every once in a while old posts containing photos are marked as new in Bloglines. There have also been complaints indicating that the problem manifests itself in Newsgator as well.

After some investigation we discovered that this problem seemed to occur only in RSS items containing links to photos hosted on our storage servers, such as blog posts with photo attachments or photo albums. This led to a hunch that the problem only affected RSS readers that mark old posts as new if any content in the <description> element changes. Once this was confirmed, we had our answer. For certain reasons, the permalink URL to an image stored on our storage servers changes over time*. Whenever one of these URL changes takes place, RSS readers that detect changes to the contents of the <description> element of a feed will indicate that the post has been altered.

A brief discussion with the folks behind Bloglines indicates that there isn't a straightforward solution to this problem. It is unlikely that they will change their RSS parsing code to deal with the idiosyncrasies of RSS feeds provided by MSN Spaces. Being the author of an RSS reader as well, I can understand not wanting to litter the code with special cases. Similarly, it is unlikely that we will change the behavior that causes URLs to images hosted on our servers to change in the short term.

After chatting with Mike and Jason about this, one of the solutions we came up with was to use the dcterms:modified element in our RSS feeds. The element would contain the date of the last time a user-directed change was made to the item; in this case the item would be a blog post or photo album. This means that RSS readers can simply test the value of the dcterms:modified element to determine if a post was changed by the user instead of performing inefficient textual comparisons of the contents of the post. In fact, the main reason I don't provide support for detecting changes in RSS items in RSS Bandit is the high rate of false positives as well as slowdowns caused by performing lots of text comparisons. Having this element in RSS feeds would make it a lot easier for me to support detecting changes to the contents of items in an RSS feed without degrading the user experience in the general case.
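A sketch of what that check could look like on the aggregator side; the dcterms namespace URI is the standard Dublin Core terms one, and persisting the last-seen timestamp is left out.

using System;
using System.Xml;

class ModifiedCheck
{
    // The standard Dublin Core terms namespace.
    const string DcTermsNs = "http://purl.org/dc/terms/";

    // Treat an item as updated only when its dcterms:modified stamp is
    // newer than the one recorded on a previous fetch, instead of diffing
    // the contents of <description>.
    public static bool HasUserDirectedChange(XmlElement item, DateTime lastSeenModified)
    {
        XmlNodeList modified = item.GetElementsByTagName("modified", DcTermsNs);
        if (modified.Count == 0)
            return false; // no hint in the feed; fall back to existing policy

        DateTime stamp = XmlConvert.ToDateTime(modified[0].InnerText,
            XmlDateTimeSerializationMode.Utc);
        return stamp > lastSeenModified.ToUniversalTime();
    }
}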

Of course, unless RSS readers decide to support the dcterms:modified element this will continue to be a problem. I need to send some mail to Mark Fletcher and the RSS-AggDev mailing list to see what people think about supporting this element as a way to get around the "bogus new items" problem.

* Note that this doesn't break links that reference that image with the old URL.


 

Categories: MSN | Syndication Technology

Daniel Steinberg has an article entitled Bosworth's Web of Data where he discusses some of the ideas Adam Bosworth evangelized in his keynote at the MySQL Users Conference 2005. Daniel writes,

Bosworth explained that the key factors that enabled the web began with simplicity. HTTP was simple enough that any "P" language or JavaScript programmer could build applications. On the consumption side, web browsers such as Internet Explorer 4 were committed to rendering whatever they got. This meant that people could be sloppy and they didn't need to be high priests of syntax. Because it was a sloppy standard, people who otherwise couldn't have authored content did. The fact that it was a standard allowed this single, simple, sloppy, open wire format to run on every platform.
...
The challenge is to take a database and do for the web what was done for content. Bosworth explained that you "need a model that allows for massively linear scalability and federation of information that can spread effortlessly across a federated web."

Solutions that were suggested were to use XML and XQuery. The problem with XML is that unlike HTML, there is not a single grammar. This removed the simple and sloppy aspects of the web. The problem with XQuery is the time it took to finish the specification. Bosworth noted that it took more than four years and that "anything that takes four years is not worth doing. It is over-designed. Instead, take six months and learn from customers."
...
The next solution used web services, which began as an easy idea: you send an XML request and you get XML back. Instead, the collection of WS-* specs were huge and again, overly complicated. Bosworth said that this was a deliberate effort on the part of the companies that control the specs, like IBM and Microsoft, which deliberately made the specification hard, because then only they could deliver technology to do it.
...
Bosworth predicts that RSS 2.0 and Atom will be the lingua franca that will be used to consume all data from everywhere. These are simple formats that are sloppily extensible. Anyone who wants to can use these formats to consume content or to author content. Contrast this with the Semantic Web, which requires that you get a large group of people to agree on the schema of everything.

There are lots of interesting ideas here. I won't dwell on the criticisms of XQuery & WS-* mainly because I tend to agree that they are both overdesigned and complicated. I also won't dwell on the apparent contradiction inherent in claiming that the Semantic Web is doomed because it requires people to agree on the same schema for everything while proposing that everyone agree on using RSS as the schema for all data on the Web. I have a suspicion of what he sees as the difference but I'll wait for a blog post from him clarifying that.

What I find very interesting is using RSS as the data access format for the Web. RSS gained popularity as a way to syndicate blog posts and news sites but it's turned out to be a lot more versatile than that. Sites like Feedster and Amazon's OpenSearch technology show you can use RSS as a mechanism for providing search results and integrating search engines respectively. Podcasting shows you can use RSS to syndicate digital media content instead of just plain old text or HTML. With Amazon's syndicated feeds one can keep abreast of when new CDs, books and more are released.

Over the weekend I wrote the MSN Spaces photo album browser page which displays slideshows of all the photos in the various albums on a particular user's MSN Spaces space. This page also can display the photos on a randomly selected space. This webpage is entirely powered by RSS. The photos are obtained from the RSS feed for the Space and the list of random spaces is obtained by querying MSN search with the query "site:spaces.msn.com photo album" and requesting the results as RSS. In fact, the information from the MSN Spaces RSS feeds is enough to build something like the Flickr related tags browser, where instead of showing related tags one could show spaces related to the user from the information in their blog roll which happens to also be provided in the RSS feed. Pretty nifty and all without requiring building a REST, SOAP or XML-RPC API.

In situations where one simply wants to expose read-only data via a service on the Web, it's looking like RSS is the technology to beat. As more and more information is exposed as RSS feeds, there will be even more interesting things people can do with the technology. At Microsoft we are definitely gung ho about exposing as much data as possible via RSS and I've been amazed at how much enthusiasm there is around the opportunities in this area.

Side Note: Yesterday while at the Microsoft Research Social Computing Symposium I was chatting with Randy Farmer, who's one of the guys behind Yahoo! 360° and Yahoo's purchase of Flickr, and I mentioned that it seemed like 2003 was the year that RSS really started to take off. This was also the year that Dave Winer froze the RSS 2.0 spec and Sam Ruby gathered all the malcontents in the XML syndication space and gave them a shiny new toy to play with in Atom. Coincidence?


 

Categories: Syndication Technology | XML

I've recently been thinking about the problems facing search and navigation systems that depend on metadata applied to content by the creator of the content. This includes systems like Technorati Tags, which searches the <category> elements in various RSS feeds, and folksonomies like del.icio.us, which searches tags applied to links submitted by users.

A few months ago I wrote a post entitled Technorati Tags: Why Do Bad Ideas Keep Resurfacing? which pointed out that Technorati Tags had the same problems that had plagued previous metadata self-annotation schemes on the Web such as HTML META tags. The main problem being that People Lie. Since then I've seen a number of complaints from developers of search engines that depend on RSS metadata.

In a comment to a post entitled Blogspot Spam in Matthew Mullenweg's weblog, Bob Wyman of PubSub.com writes

A very high percentage of the spam blogs that we process at PubSub.com also come from blogspot. We’ve got more serious “problems” in Japan and China, however, for the English language, blogspot is pretty much “spamspot.” It is, as always, disappointing to see people abuse a good and free service like that offered by Google/Blogspot in such a way.

In a post entitled Turning Blogspot Off Scott Johnson of Feedster wrote

All Blogspot blogs right now are included in every Feedster search by default. And now, due to the massive problems with spam on Blogspot, we're actually at the point of saying "Why don't we make searching Blogspot optional for all Feedster users". What's going on is that spammers have learned how to massively exploit Blogspot -- to the point where at times 90% of the blog traffic we get from Blogspot is spam.

Now that's bad. Actually this spam issue just plain sucks. And its starting to ruin the user experience that people have with Feedster.

The main reason these spam blogs haven't started affecting the Technorati Tags feature is that Blogspot doesn't support categories. However it is clear that the same problems search engines faced when they decided to trust HTML metadata are beginning to show up when it comes to searching RSS metadata. This is one place where established search engines would have a leg up on upstarts like Feedster and PubSub if they got into the RSS search market since they've already had to adapt to all sorts of 'search engine optimization' tricks.

On a related note, combining the above information about the high number of spam blogs on Google's Blogspot service with the recent article Bloggers Pitch Fits Over Glitches which among other things states

In fact, enter "Blogger sucks" in Google and you get 720,000 results, with most of the entries on the first few pages (read: the most popular) dedicated to these exasperating tech snafus. It can make for some pretty ugly reading. Imagine what they might say if they actually paid for the service?

But if you look at Blogger's status page, which lists service outages, you can see why they are so mad.

It seems that Doc Searls may have been onto something about Google no longer innovating in Blogger.


 

In his post Waiting for Attention… or something like it Steve Gillmor describes our conversation at ETech and responds to some of the thoughts in my post Nightcrawler Thoughts: Thumbs Up, Thumbs Down and Attention.xml. My post ignored some of the collaborative aspects of the solution to the attention problem that Steve would like to see. Specifically

First I go to my reputational thought leaders, the subs and recurring items that bubble to the top of my attention list. It’s a second-degree-of-separation effect, where the feeds and items that a Jon Udell and a Doc Searls and a Dave Winer prioritize are gleaned for hits and duplicates, and returned as a weighted stream. In turn, each of those hits can be measured for that author’s patterns and added in to provide a descending algorithim of influence. All the while, what is not bubbling up is driven further down the stack, making more room for more valuable content.

It’s important to remember that this is an open pool of data, massaged, sliced, and diced by not just a Technorati but a PubSub or Feedster or Google or Yahoo or any other service, and my inforouter will gladly consume any return feed and weight it according to the success (or lack of it) that the service provides in improving my results. Proprietary services can play here as well, by providing their unique (and contractually personalized) streams as both a value proposition for advertisers and marketers and as an attractor for more users of their service.

The part of the attention problem I focused on in my previous post was "Based on my reading habits, tell me what new stuff I should read" but Steve Gillmor points out that the next level beyond that is "Based on the reading habits of the people whose opinion I trust, tell me what new stuff I should read". People already do this to a lesser extent by hand today. People who subscribe to Robert Scoble's link blog or various individual RSS feeds in del.icio.us are basically trusting a member of their social network to filter out the blogosphere for them.

Once one knows how to calculate the relative importance of various information sources to a reader, it does make sense that the next step would be to leverage this information collaboratively.

The only cloud I see on the horizon is that if anyone figures out how to do this right, it is unlikely that it will be made available as an open pool of data. The 'attention.xml' for each user would be demographic data that would be worth its weight in gold to advertisers. If Bloglines could figure out my likes and dislikes right down to what blog posts I'd want to read, I find it hard to imagine why the Bloglines team would make that information available to anyone including the user. For comparison, it's not like Amazon makes my 'attention.xml' for books and CDs available to myself or their competitors. 

By the way, why does every interesting wide spanning web service idea eventually end up sounding like Hailstorm?


 

March 31, 2005
@ 03:45 PM

Bloglines just published a new press release entitled Bloglines is First to Go Beyond the Blog with Unique-to-Me Info Updates which is excerpted below

Oakland, CA -- March 30, 2005 -- Ask Jeeves®, Inc. (Nasdaq: ASKJ), today announced that Bloglines™ (www.bloglines.com), the world’s most popular free online service for searching, subscribing, publishing and sharing news feeds, blogs and rich web content has released the first of a wave of new capabilities that help consumers monitor customized kinds of dynamic web information. With these new capabilities, Bloglines is the first web service to move beyond aggregating general-audience blogs and RSS news feeds to enable individuals to receive updates that are personal to their daily lives.

Starting today, people can track the shipping progress of package deliveries from some of the world’s largest parcel shipping companies—FedEx, UPS, and the United States Postal Service—within their Bloglines MyFeeds page. Package tracking in Bloglines encompasses international shipments, in English. Bloglines readers can look forward to collecting more kinds of unique-to-me information on Bloglines in the near future, such as neighborhood weather updates and stock portfolio tracking.

“Bloglines is a Universal Inbox that captures all kinds of dynamic information that helps busy individuals be more productive throughout the day—at the office, at school, or on the go,” said Mark Fletcher, vice president and general manager of Bloglines at Ask Jeeves. “With an index of more than 370 million blog and news feed articles in seven languages, we’re already one of the largest wells of dynamic web information. With unique-to-me news updates we’re aiming to be the most comprehensive and useful personalized information resource on the web.”

So it looks like Bloglines is evolving into MyYahoo! or MyMSN which already provide a way to get customized personal information from local news and weather reports to RSS feeds and email inboxes. 

I've been pitching the concept of the digital information hub to folks at work but I think "universal inbox" is a more attractive term. The more time a user spends in front of an information consumption tool, be it an email reader, RSS reader or online portal, the more data sources the user wants the tool to support. Online portals are now supporting RSS. Web-based RSS readers are now supporting content that would traditionally show up in a personalized view at an online portal.

At MSN, specifically with http://www.start.com/2/, we are exploring what would happen if you completely blurred the lines between a web-based RSS reader and the traditional personalized dashboard provided by an online portal. It is inevitable that both mechanisms of consuming information online will eventually be merged in some way. I suspect the result will look more like what Steve Rider's team is building than MyYahoo! or Bloglines do today.

As I mentioned before, we'd love feedback about all the stuff we are doing at start.com. Don't be shy; send your feedback.


 

Categories: MSN | Syndication Technology

Recently I was chatting with Steve Rider on the Start.com team about the various gotchas awaiting them as they continue to improve the RSS aggregator at http://www.start.com/1/. I mentioned issues like feeds that don't use titles, like Dave Winer's, and HTML showing up in titles.

I thought I knew all the major RSS gotchas and that RSS Bandit handled them pretty well. However I recently got two separate bug reports from users of WordPress about RSS Bandit's inability to handle extension elements in their feeds. The first complaint was about Kevin Devin's RSS feed, which couldn't be read at all. A similar complaint was made by Jason Bock, who was helpful enough to debug the problem himself and provide an answer in his post RSS Bandit problem fixed where he wrote

I can't believe what it took to fix my feed such that Rss Bandit could process it.

I'm just dumbfounded.

Basically, this is the way it looked:

<rss xmlns:blog="urn:blog">
   <blog:info directory="JB\Blog" />
   <channel>
      <!-- Channel stuff goes here... -->
   </channel>
</rss>

This is what I did to fix it:

<rss xmlns:blog="urn:blog">
   <channel>
      <!-- Channel stuff goes here... -->
   </channel>
   <blog:info directory="JB\Blog" />
</rss>

After debugging the Rss Bandit code base, I found out what the problem was. Rss Bandit reads the file using an XmlReader. Basically, it goes through the elements sequentially, and since the next node after <rss> wasn't <channel>, it couldn't find any information in the feed, and that's what was causing the choke. Moving <blog:info> to the end of the document solved it.

The assumption I made when developing the RSS parser in RSS Bandit was that the top-level rss element would have a channel element as its first child. I handle extension elements if they appear as children of the channel or item elements, since those seem logical, but I never thought anyone would apply an extension to the rss element itself. I took a look at what the RSS 2.0 specification says about where extension elements can appear and it seems my assumption was wrong, since it states

RSS originated in 1999, and has strived to be a simple, easy to understand format, with relatively modest goals. After it became a popular format, developers wanted to extend it using modules defined in namespaces, as specified by the W3C.

RSS 2.0 adds that capability, following a simple rule. A RSS feed may contain elements not described on this page, only if those elements are defined in a namespace.

Since there is no explicit restriction on where extension elements can appear, it looks like I'll have to change the parser to expect extension elements anywhere in the feed.
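The fix amounts to skipping over anything that precedes the channel element rather than assuming it comes first. A minimal sketch of that tolerant scan with XmlReader:

using System.Xml;

class RssChannelLocator
{
    // Instead of assuming <channel> is the first child of <rss>, skip over
    // any namespaced extension elements (like <blog:info>) until the
    // channel element is found.
    public static bool MoveToChannel(XmlReader reader)
    {
        reader.MoveToContent();               // position on the root <rss> element
        if (reader.LocalName != "rss")
            return false;

        reader.Read();                        // descend into the children of <rss>
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element)
            {
                if (reader.LocalName == "channel" && reader.NamespaceURI.Length == 0)
                    return true;              // positioned on <channel>
                reader.Skip();                // skip an extension element wholesale
            }
            else
            {
                reader.Read();
            }
        }
        return false;
    }
}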

My apologies to the folks who've had problems reading feeds because of this oversight on my part. I'll fix the issue today and refresh the installer later this week.


 

While at ETech I got to spend about half an hour chatting with Steve Gillmor about what he's called "the attention problem", which isn't the same thing as the attention.xml specification. The attention problem is the problem that faces every power user of XML syndication clients such as RSS Bandit or Bloglines. It is so easy to subscribe to various feeds that eventually readers get overwhelmed by the flood of information hitting their aggregator's inbox. Some have used the analogy "drinking from a firehose" to describe this phenomenon.

This problem affects me as well, which is the impetus for a number of features in the most recent release of RSS Bandit, such as newspaper views which allow one to view all the unread posts in a feed in a single pane, more sortable columns such as author and comment count in the list view, and skim mode ('mark all items as read on exiting a feed or category'). However the core assumption behind all these features is that the user is reading every entry.

Ideally a user should be able to tell a client, "Here are the sites I'm interested in, here are the topics I'm interested in, and now only show me stuff I'd find interesting or important". This is the next frontier of features for RSS/ATOM aggregators and an area I plan to invest a significant amount of time in for the next version of RSS Bandit.

In my post Some Opinions on the Attention.xml Specification I faulted the attention.xml specification because it doesn't seem to solve the problems it sets out to tackle and some of the data in the format is unrealistic for applications to collect. After talking to Steve Gillmor I realize another reason I didn't like the attention.xml spec; it ignores all the hard problems and assumes they've been solved. Figuring out what data or what algorithms are useful for determining what items are relevant to a user is hard. Using said data to suggest new items to the user is hard. Coming up with an XML format for describing an arbitrary set of data that could be collected by an RSS aggregator is easy.

There are a number of different approaches I plan to explore over the next few months in various alphas of the Nightcrawler release of RSS Bandit. My ideas have run the gamut from using Bayesian filtering to using the Technorati link cosmos feature for weighting posts [in which case I'd need batch methods, which is something I briefly discussed with Kevin Marks at ETech last week]. There is also weighting by author to consider; for example, I read everything written by Sam Ruby and Don Box. Another example is a topic that may be mundane (e.g. what I had for lunch), something I'd never read if published by a stranger but that would interest me if posted by a close friend or family member.
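None of this exists yet, but the author-weighting idea alone could start as small as this hypothetical sketch; the weights and formula are illustrative, not a planned RSS Bandit algorithm.

using System;
using System.Collections.Generic;

class AttentionScorer
{
    // "Read everything they write" authors get a high static weight.
    readonly Dictionary<string, double> authorWeights = new Dictionary<string, double>
    {
        { "Sam Ruby", 1.0 },
        { "Don Box", 1.0 },
    };

    public double Score(string author, int itemsRead, int itemsSeen)
    {
        double weight;
        if (!authorWeights.TryGetValue(author, out weight))
            weight = 0.1; // unknown authors start low

        // Blend the static author weight with the observed read ratio for the feed.
        double readRatio = itemsSeen == 0 ? 0.0 : (double)itemsRead / itemsSeen;
        return 0.5 * weight + 0.5 * readRatio;
    }
}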

We will definitely need a richer extensibility model so I can try out different approaches [and perhaps others can as well] before the final release. Looks like I have yet another spring and summer spent indoors hacking on RSS Bandit to look forward to. :)


 

Steve Rider, one of the great folks behind start.com, has started a category on his blog devoted to the site. His first post discusses some of the changes they've made to the site in the past week. He writes

As soon as I finish this post I'll be digging in my heels for the afternoon and working on OPML import support and increasing the number of headlines per feed.  Hey, what are rainy Sunday afternoons for?

Here are some of the improvements we've made since we were "discovered" a week and a half ago:

Start.com/1

  • Full Firefox support
  • Migrated from cookie-based solution to back-end store for feeds and preferences
  • Removed the restriction on the number of feeds that can be added
  • Added ability to delete items from My Feeds and Recent Searches
  • Title of module is now hyperlinked (oops) and also gets updated if the title in the RSS feed is different
  • Show search history in correct order
  • Lots of fit and finish and minor cosmetic changes

Start.com/2

  • Fixed a few problems with the ActiveX control that were causing bookmarks not to be imported (there are still a couple of issues affecting some users)
  • Added OPML import support
  • Increased performance when fetching from server by making more async calls

One of the features I asked Steve for was OPML import so it's good to see that it's already being added to the site. I didn't realize how fast they'd be turning around on feature requests. Looks like I should dust off my list of feature requests for online aggregators and swing by Steve's office sometime this week. Sweet.


 

Categories: MSN | Syndication Technology

These are my notes from the Odeo -- Podcasting for Everyone session by Evan Williams.

Evan Williams was the founder of Blogger and Odeo is his new venture. Just as in his post How Odeo Happened, Evan likens podcasting to audioblogging and jokingly states that he and Noah Glass invented podcasting with AudioBlogger. Of course, the audience knew he was joking and laughed accordingly. I do wonder, though, how many people think that podcasting is simply audioblogging instead of realizing that the true innovation is the time shifting of digital media to the user's player of choice.

The Odeo interface has three buttons across the top; Listen, Sync and Create. Users can choose to listen to a podcast from a directory of podcasts on the site directly from the Web page. They can choose to sync podcasts from the directory down to their iPod using a Web download tool which also creates Odeo specific playlists in iTunes. 

The Odeo directory also contains podcasts that were not created on the site so they can be streamed to users. If third parties would rather not have their podcasts hosted on Odeo they can ask for them to be taken down.  

The Create feature was most interesting. The website allows users to record audio directly on the website without needing any desktop software. This functionality seems to be built with Flash. Users can also save audio or upload MP3s from their hard drive which can then be spliced into their audio recordings. However one cannot mix multiple audio tracks at once (i.e. I can't create an audio post then add in background music later, I can only append new audio).

The revenue model for the site will most likely be providing hosting and creating services that allow people to charge for access to their podcasts. There was some discussion on hosting music but Evan pointed out that there were already several music sites on the Web and they didn't want to be yet another one.

Odeo will likely be launching in a few weeks but will be invitation-only at first.


 

These are my notes on the Build Content-centric Applications on RSS, Atom, and the Atom API session by Ben Hammersley.

This was a 3.5 hour tutorial session [which actually only lasted 2.5 hours].

At the beginning, Ben warned the audience that the Atom family of specifications is still being worked on but should begin to enter the finalization stages this month. The specs have been stable for about the last 6 months; however, anything based on work older than that (e.g. anything based on the Atom 0.3 syndication format spec) may be significantly outdated.

He indicated that there were many versions of syndication formats named RSS, mainly due to acrimony and politics in the online syndication space. However there are basically 3 major flavors of syndication formats; RSS 2.0, RSS 1.0 and Atom.

One thing that sets Atom apart from the other formats is that a number of items which are optional in RSS 1.0 and RSS 2.0 are mandatory in Atom. For example, in RSS 2.0 an item can contain only a <description> and be considered valid, while in RSS 1.0 an item with a blank title and an rdf:about (i.e. link) can be considered valid. This is a big problem for consumers of feeds, when basic information like the date of the item isn't guaranteed to show up.
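To make the difference concrete, here is a rough sketch of a minimal item in each format (based on the Atom drafts circulating at the time, so the exact element names may differ from the final spec). The RSS 2.0 item is valid despite saying almost nothing about itself, while the Atom entry must identify and date itself:

<!-- Valid RSS 2.0: a description alone is sufficient -->
<item>
  <description>Some text.</description>
</item>

<!-- Atom requires an identifier, a title and a timestamp on every entry -->
<entry>
  <id>tag:example.org,2005:entry-123</id>
  <title>An example entry</title>
  <updated>2005-03-15T12:00:00Z</updated>
  <content>Some text.</content>
</entry>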

There then was a slide attempting to show when to use each syndication format. Ben contended that RSS 2.0 is good for machine-readable lists but not useful for much else beyond displaying information in an aggregator; RSS 1.0 is useful for complex data mining but not for small ad hoc web feeds; and Atom is the best of both worlds, a simple format yet strictly defined data.

I was skeptical of this breakdown, especially since the fact that people are using RSS 2.0 for podcasting flies in the face of his contentions about what RSS 2.0 is good for. In talking about this part of the talk later with some members of the IE team who attended it with me, they agreed that Ben didn't present any good examples of use cases that the Atom syndication format satisfied which RSS 2.0 didn't.

Atom has a feed document and an entry document, the latter being a new concept in syndication. Atom also has a reusable syntax for generic constructs (person, link, text, etc). At this point Marc Canter raised the point that there weren't constructs in Atom for certain popular kinds of data on the Web. Some examples Marc gave were that there are no explicit constructs to handle tags (i.e. folksonomy tags) or digital media. Ben responded that the former could be represented with category elements while the latter could be binary payloads that were either included inline or linked from an entry in the feed.

Trying a different tack, I asked how one represents the metadata for digital content within an entry, using album reviews in Atom as an example. How would I provide the metadata for my album review (name, title, review content, album rating) as well as the metadata for the album I was reviewing (artist, album, URL, music sample(s), etc)? His response was that I should use RSS 1.0 since it was more oriented to resources talking about other resources.

The next part of the talk was about the Atom API which is now called the Atom publishing protocol. He gave a brief history of weblog APIs starting with the Blogger API and ending with the MetaWeblog API. He stated that XML-RPC is inelegant while SOAP is "horrible overkill" for solving the problem of posting to a weblog from an API. On the other hand REST is elegant. The core principles of REST are using the HTTP verbs like GET, PUT, POST and DELETE to manipulate representations of resources. In the case of Atom, these representations are Atom entry and feed documents. There are four main URI endpoints: the PostUri, EditUri, FeedUri, and the ResourcePostUri. In a technique reminiscent of RSD, websites that support Atom can place pointers to the API end points by using <link> tags with appropriate values for the rel attribute.
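For illustration, the autodiscovery technique looks roughly like the following <link> tags in a page's header. The rel values shown reflect the conventions from API drafts of the era, so treat the exact strings as an assumption, and the URLs are made up:

<link rel="service.post" type="application/atom+xml"
      href="http://example.org/atom" title="Create a new entry" />
<link rel="service.edit" type="application/atom+xml"
      href="http://example.org/atom/entry-123" title="Edit this entry" />
<link rel="service.feed" type="application/atom+xml"
      href="http://example.org/atom/feed" title="The source feed" />

A client fetches the page, picks out the endpoint it needs, then issues HTTP POST, PUT or DELETE requests against that URI with Atom entry documents as the payload.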

At the end of the talk I asked what the story was for versioning both the Atom syndication format and the publishing protocol. Ben floundered somewhat in answering this question but eventually pointed to the version attribute in an Atom feed. I asked how an application would tell from the version attribute whether it had encountered a newer but backwards compatible version of the spec, or whether the intention was that clients should only be coded against one version of Atom. His response was that I was 'looking for a technological solution to a social problem' and, more importantly, that there was little chance the Atom specifications would change anyway.

Yeah, right.

During the break, Marc Canter and I talked about the fact that both the Atom syndication format and Atom publishing protocol are simply not rich enough to support existing blogging tools, let alone future advances in blogging technologies. For example, in MSN Spaces we already have data types such as music lists and photo albums which don't fit in the traditional blog entry syndication paradigm that Atom is based upon. More importantly, it is unclear how one would even extend the format to support them in an acceptable way. Similar issues exist with the API. The API already has less functionality than existing APIs such as the MetaWeblog API. It is unclear how one would perform the basic act of querying one's blog for a list of categories to populate the drop-down list used by a rich client, a feature commonly used by such tools, let alone doing things like managing one's music list or photo album which is what I'd eventually like us to do in MSN Spaces.

The conclusion that Marc and I drew was that just to support existing concepts in popular blogging tools, both the Atom syndication format and the Atom API would need to be extended.

There was a break, after which there was a code sample walkthrough which I zoned out on.


 

March 13, 2005
@ 07:48 PM

This time tomorrow I'll be at the O'Reilly Emerging Technology Conference. Checking out the conference program, I saw that Evan Williams will be hosting a session entitled Odeo -- Podcasting for Everyone. I've noticed the enthusiasm around podcasting among certain bloggers and the media but I am somewhat skeptical of the vision folks like Evan Williams have espoused in posts such as How Odeo Happened.

In thinking about podcasting, it is a good thing to remember the power law and the long tail. In his post Weblogs, Power Laws and Inequality, Clay Shirky wrote

The basic shape is simple - in any system sorted by rank, the value for the Nth position will be 1/N. For whatever is being ranked -- income, links, traffic -- the value of second place will be half that of first place, and tenth place will be one-tenth of first place. (There are other, more complex formulae that make the slope more or less extreme, but they all relate to this curve.) We've seen this shape in many systems. What we've been lacking, until recently, is a theory to go with these observed patterns.
...
A second counter-intuitive aspect of power laws is that most elements in a power law system are below average, because the curve is so heavily weighted towards the top performers. In Figure #1, the average number of inbound links (cumulative links divided by the number of blogs) is 31. The first blog below 31 links is 142nd on the list, meaning two-thirds of the listed blogs have a below average number of inbound links. We are so used to the evenness of the bell curve, where the median position has the average value, that the idea of two-thirds of a population being below average sounds strange. (The actual median, 217th of 433, has only 15 inbound links.)

The bottom line here is that a majority of weblogs will have small to miniscule readership. However the focus of the media and the generalizations made about blogging will be on popular blogs with large readership. But the wants and needs of popular bloggers often do not mirror those of the average blogger. There is a lot of opportunity and room for error when trying to figure out where to invest in features for personal publishing tools such as weblog creation tools or RSS reading software. Clay Shirky also mentioned this in his post where he wrote

Meanwhile, the long tail of weblogs with few readers will become conversational. In a world where most bloggers get below average traffic, audience size can't be the only metric for success. LiveJournal had this figured out years ago, by assuming that people would be writing for their friends, rather than some impersonal audience. Publishing an essay and having 3 random people read it is a recipe for disappointment, but publishing an account of your Saturday night and having your 3 closest friends read it feels like a conversation, especially if they follow up with their own accounts. LiveJournal has an edge on most other blogging platforms because it can keep far better track of friend and group relationships, but the rise of general blog tools like Trackback may enable this conversational mode for most blogs.

The value of weblogging to most bloggers (i.e. the millions of people using services like LiveJournal, MSN Spaces and Blogger) is that it allows them to share their experiences with friends, family & strangers on the Web and it reduces the friction for getting content on the Web when compared to managing a personal homepage which was the state of the art in personal publishing on the Web last decade. In addition, there are the readers of weblogs to consider. The existence of RSS syndication and aggregators such as RSS Bandit & Bloglines have made it easy for people to read multiple weblogs with ease. According to Bloglines, their average user reads just over 20 feeds.

Before going into my list of issues with podcasting, I will point out that I think the current definition of podcasting which limits it to subscribing to feeds of audio files is fairly limiting. One could just as easily subscribe to other digital content such as video files using RSS. To me podcasting is about time shifting digital content, not just audio files.

With this setup out of the way I can list the top three reasons I am not as enthusiastic about podcasting as folks like Evan Williams:

  1. Creating digital content and getting it on the Web isn't easy enough: The lowest friction way I've seen thus far for personal publishing of audio content on the Web is the phone posting feature of LiveJournal but it is still a suboptimal solution. It gets worse when one considers how to create and share richer digital content such as videos. I suspect mobile phones will have a big part to play in podcast creation if it becomes mainstream. On the other hand, sharing your words with the world doesn't get much easier than using the average blogging tool.
  2. Viewing digital content is more time-consuming than reading text content: I believe it takes the average person less time to read an average blog posting than to listen to an average audio podcast. This automatically reduces the size of the podcast market compared to plain old text blogging. As mentioned earlier, the average Bloglines user subscribes to 20 feeds. Over the past two years, I've gone from subscribing to about 20 feeds to subscribing to around 160. However it would be impossible for me to find the time to listen to 20 podcast feeds a week, let alone scaling up to 160.
  3. Digital content tends to be opaque and lack metadata: Another problem with podcasting is that there are no explicit or implicit metadata standards forming around syndicating digital media content. The fact that an RSS feed is structured data that provides a title, author name, blog name, a permalink and so on allows one to build rich applications for processing RSS feeds both globally like Technorati & Feedster or locally like RSS Bandit. As long as digital media content is just an opaque blob of data hanging off an item in a feed, the ecosystem of tools for processing and consuming it will remain limited (see the sketch after this list).
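To make the point in #3 concrete, the sketch below shows roughly everything a standard RSS 2.0 enclosure tells an aggregator about a media file: a URL, a size in bytes and a MIME type (the URL here is made up). Everything else about the content is locked inside the binary itself:

<item>
  <title>Show #42</title>
  <link>http://example.org/show42</link>
  <!-- No duration, no speakers, no topics, no chapter markers -->
  <enclosure url="http://example.org/audio/show42.mp3"
             length="12582912" type="audio/mpeg" />
</item>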

This is not to say that podcasting won't go a long way in making it easier for popular publishers to syndicate media content to users. It will; however, it will not be the revolution in personal publishing that the combination of RSS and weblogging has been.

I'll need to remember to bring some of these up during Evan Williams' talk. I'm sure he'll have some interesting answers.


 

In his post Maybe a better posting api is needed  James Robertson writes

I've had harsh words to say about Atom in the past, but that was mostly over the feed format. I haven't looked at the posting API yet - maybe I should. The Blogger API and the MetaWebLog API are simply nightmares. There doesn't seem to be any standard way for client tools to interact with a server - I was debugging the interaction between a client and my server last night via IRC. Even better - the client was set to use the MetaWebLog api, but was sending requests to blogger.apiNameHere names. Sheesh. There was also an interesting difference in api points - I had implemented 'getUserBlogs', and the client was sending 'getUsersBlogs'. A quick Google search turned up references to both. Sigh.

I implemented both names, pointing to the same method. I had to map blogger names over to MetaWebLog entry points, at least for the tool being tested last night - who knows what oddness will turn up next. What a complete mess...

I've been similarly stunned by the complete and utter mess the state of weblogging APIs is in. As I mentioned in my post What Blog Posting APIs are supported by MSN Spaces?, one of my duties at work is to investigate the options and design the blogging API story for MSN Spaces. In doing this, I have discovered all the issues James Robertson brought up and more. Mark Pilgrim has an ApacheCon presentation entitled The Atom API which highlights some of the various issues. One of the lowlights from his presentation is the fact that the MetaWeblog API spec significantly contradicts itself by stating that the data model of structs passed between client and server is based on RSS 2.0 and then including examples of requests and responses that show that it clearly isn't.

My personal favorite bit of information that can only be discovered by trial and error is the existence of the blogger.deletePost method which isn't listed in the Blogger API documentation but is supported by a number of blog posting clients and weblog servers.
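For the curious, a call to this undocumented method looks something like the XML-RPC request below. The parameter order is reconstructed from how clients and servers behave in practice rather than from any official documentation, so treat it as an assumption; note that the password travels in plaintext, which is typical of this family of APIs:

<?xml version="1.0"?>
<methodCall>
  <methodName>blogger.deletePost</methodName>
  <params>
    <param><value><string>0123456789ABCDEF</string></value></param>  <!-- appkey -->
    <param><value><string>1234</string></value></param>              <!-- postid -->
    <param><value><string>dare</string></value></param>              <!-- username -->
    <param><value><string>secret</string></value></param>            <!-- password, sent in plaintext -->
    <param><value><boolean>1</boolean></value></param>               <!-- publish -->
  </params>
</methodCall>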

I can't believe that anyone who wants to write a client or server that uses the standard weblogging APIs has to go through this crap. It almost makes me want to go join in the atom-protocol discussions. Almost.


 

I had promised myself I wouldn't get involved in this debate but it seems every day I see more and more people giving credence to bad ideas and misconceptions. The debate I am talking about is one click subscriptions to RSS feeds which Dave Winer recently called the Yahoo! problem. Specifically Dave Winer wrote

Yahoo sends emails to bloggers with RSS feeds saying, hey if you put this icon on your weblog you'll get more subscribers. It's true you will. Then Feedster says the same thing, and Bloglines, etc etc. Hey I did it too, back when Radio was pretty much the only show in town, you can see the icon to the right, if you click on it, it tries to open a page on your machine so you can subscribe to it. I could probably take the icon down by now, most Radio users probably are subscribed to Scripting News, since it is one of the defaults. But it's there for old time sake, for now.

Anyway, all those logos, when will it end? I can't imagine that Microsoft is far behind, and then someday soon CNN is going to figure out that they can have their own branded aggregator for their own users (call me if you want my help, I have some ideas about this) and then MSNBC will follow, and Fox, etc. Sheez even Best Buy and Circuit City will probably have a "Click here to subscribe to this in our aggregator" button before too long.

That's the problem.

Currently I have four such one click subscription buttons on my homepage: an Add to MyMSN button, an Add to MyYahoo button, an Add to Bloglines button, and an Add to Newsgator button. Personally, I don't see this as a problem since I expect market forces and common sense to come into play here. But let's see what Dave Winer proposes as a solution.

Ask the leading vendors, for example, Bloglines, Yahoo, FeedDemon, Google, Microsoft, and publishers, AP, CNN, Reuters, NY Times, Boing Boing, etc to contribute financially to the project, and to agree to participate once it's up and running.
...
Hire Bryan Bell to design a really cool icon that says "Click here to subscribe to this site" without any brand names. The icon is linked to a server that has a confirmation dialog, adds a link to the user's OPML file, which is then available to the aggregator he or she uses. No trick here, the technology is tried and true. We did it in 2003 with feeds.scripting.com.

This 'solution' to the 'problem' boggled my mind. So every time someone wants to subscribe to an RSS feed it should go through a centralized server? The privacy implications of this alone are significant, let alone the creation of a central point of failure. In fact Dave Winer recently posted a link that highlights a problem with centralized services related to RSS.

Besides my issues with the 'solution' I have quite a few problems with the so-called 'Yahoo problem' as defined. The first problem is that it only affects web-based aggregators that don't have a client application installed on the user's desktop. Desktop aggregators already have a de facto one click subscription mechanism via the feed URI scheme, which is supported by at least a dozen of the most popular aggregators across multiple platforms including the next version of Apple's web browser, Safari. Recently both Brent Simmons (in his post feed: protocol) and Nick Bradbury (in his post Really Simple Subscription) reiterated this fact. In fact, all four services whose buttons I have on my personal blog can utilize the feed URI scheme as a subscription mechanism since all four have desktop applications they can piggyback this functionality onto: Yahoo and MSN have toolbars, Newsgator has its desktop aggregator and Bloglines has its notifier.
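The mechanism couldn't be much simpler: the publisher prefixes the feed URL with the scheme, and clicking the link hands the URL off to whatever desktop aggregator registered itself as the feed: handler. A subscription link on a web page would look something like this (URL made up; some clients also accept the feed:http://... form):

<a href="feed://example.org/blog/rss.xml">Subscribe to this feed</a>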

The second problem I have with this is that it aims to stifle an expression of competition in the market place. If the Yahoo! aggregator becomes popular enough that every website puts a button for it beside a link to their RSS feed then they deserve kudos for spreading their reach since it is unlikely that this would happen without some value being provided by their service. I don't think Yahoo! should be attacked for their success nor should their success be termed some sort of 'problem' for the RSS world to fix.

As I mentioned earlier in my post, I was going to ignore this re-re-reiteration of the one click subscription debate until I saw rational folks like Tim Bray in his post One-Click subscription actually discussing the centralized RSS subscription repository idea without treating it with the incredulity it deserves. Of course, I also see some upturned eyebrows at this proposal from folks like Bill de hÓra in his post A registry for one click subscription, anyone?


 

I recently found a complaint about how Netflix's RSS feeds appear in RSS Bandit from Danny Glasser, a dev manager on my team, in his post Netflix sucks less?. He wrote

Netflix has recently created RSS feeds for subscribers' current queues and recent rental activity, so in theory I can exchange the URLs with friends and view their queues in an RSS aggregator.  I've been playing with this a bit and unfortunately it doesn't render particularly well in RSS Bandit.  It doesn't sort nicely and old entries aren't expired properly.  I'm not sure if this is true with other aggregators but I suppose I could ask Dare

I decided to take a look at the various Netflix RSS feeds and the problem became instantly obvious. Below is an excerpted version of the Netflix Top 100 RSS feed which I'll use to discuss the various problems with syndicating lists in RSS.

<rss version="2.0">
  <channel>
    <title>Netflix Top 100</title>
    <ttl>20160</ttl>
    <link>http://www.netflix.com/Top100</link>
    <description>Top 100 Netflix movies, published every 2 weeks.</description>
    <language>en-us</language>
    <item>
      <title>1- Mystic River</title>
      <link>http://www.netflix.com/MovieDisplay?movieid=60031232&amp;trkid=134852</link>
      <description><![CDATA[Three childhood friends, Sean (Kevin Bacon), Dave (Tim Robbins) and Jimmy (Sean Penn) are reunited in Boston 25 years later when they are linked together in the murder investigation of Jimmy's daughter. ]]></description>
    </item>
    <item>
      <title>2- The Last Samurai</title>
      <link>http://www.netflix.com/MovieDisplay?movieid=60031274&amp;trkid=134852</link>
      <description><![CDATA[Tom Cruise stars as Captain Nathan Algren in this epic movie set in 1870s Japan. ]]></description>
    </item>
    <item>
      <title>3- Something's Gotta Give</title>
      <link>http://www.netflix.com/MovieDisplay?movieid=60031278&amp;trkid=134852</link>
      <description><![CDATA[Sixty and still sexy, Harry (Jack Nicholson) is having the time of his life, wining, dining and bedding women half his age.]]></description>
    </item>
  </channel>
</rss>

There are several problems with the above feed. The first is a combination of the fact that no mechanism is provided for uniquely identifying items in the feed using GUIDs and the fact that the feed contains no dates. The problem manifests itself when, two weeks from now, the top 100 list is refreshed. Using the above feed as an example, imagine that a new entry becomes number 1, thus moving Mystic River and The Last Samurai one notch down. Now several things break at once.

The first problem is that the user has no way of grouping together top 100 lists for each week, so I can't have last month's top 100 list and this week's top 100 list in my aggregator in any sort of meaningful way. Even if there were dates, the lack of GUIDs means that the aggregator will likely use the <link> element to uniquely identify the item for determining whether the user has seen it or not. This means that only the new entrant to the list will be marked as unread while movies that were already in the list and have been seen remain unhighlighted. I can see arguments for both viewpoints. On the one hand Netflix may expect that the aggregator should always have 100 items in it with only the new entrants in the list being marked as unread and positions of movies changing from week to week. On the other hand, a user may want to keep the top 100 feeds for each time period in their aggregator so they can see a timeline of the movie rankings. In that case, every two weeks there should be 100 new items waiting for the user. Unfortunately neither of these happens in RSS Bandit or a number of other aggregators with Netflix's current implementation. Instead old entries in the feed and new entries show up munged together with no separation based on date, so users can't group by date.

Another problem is that the link to the movie's page is the only thing used to uniquely identify the item. So when the feed is fetched and the position of a movie changes (i.e. the title changes), instead of creating a new item in the aggregator, RSS Bandit assumes it is a post whose title has been changed and simply updates the item in place. This makes sense in 99% of aggregator scenarios, where changing the title usually means a typo was fixed in a blog post. However in the Netflix case this means a movie will always show up with its most recent position in the top 100 list, BUT once the movie leaves the list (i.e. is dropped from the feed) it will remain at the last position seen in the feed within the aggregator.

The second problem is the fact that there is no way to tell the aggregator how to sort the list of movies. Sorting by title won't work because it would be an alphabetical sort, and ditto for the description. Even if there were dates, using those for sorting wouldn't make much sense either. Ideally there would have to be some way for an item to specify its position relative to the other items that appear in the list with it at a given point in time. Again, this would require that dates be attached to the items in the feed.

There are a number of issues raised by the Netflix problem. One could look at the problem as an indication that there should be an item expiry mechanism in RSS so the aggregator knows to dump the list every 2 weeks and refresh it with the new list. Others could argue that this could be solved by giving each item a unique ID independent of the movie and specifying its date as well as a sort position. This would allow the user to track changing lists over time even if the same item appears in the list multiple times.
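As a sketch of that second option, each snapshot of the list could date its items, give them GUIDs independent of the movie, and carry the rank in an extension element. The nf: namespace below is invented for illustration and is not something Netflix actually provides:

<item xmlns:nf="http://example.org/netflix-lists">
  <title>Mystic River</title>
  <link>http://www.netflix.com/MovieDisplay?movieid=60031232</link>
  <guid isPermaLink="false">top100-2005-03-13-rank-1</guid>
  <pubDate>Sun, 13 Mar 2005 00:00:00 GMT</pubDate>
  <nf:rank>1</nf:rank>
</item>

With a scheme like this an aggregator could group items by pubDate to show each biweekly list separately and sort within a list by nf:rank.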

I don't think I've seen anyone raise any of the various problems with the Netflix feeds online. This is surprising since I'd be hard pressed to imagine how any aggregator does the 'right' thing with these feeds. More importantly the Netflix feeds show a significant hole in RSS as well as syndication formats like Atom whose primary goal seems to be RSS feature parity.

I'm going to bring this up on the RSS-AggDev mailing list and see what the other aggregator developers think about this problem.


 

Thanks to Danny Ayers' post entitled Attention, Attention.xml I finally found a link to the attention.xml specification that was referenced in Robert Scoble's post Gillmor's report on Attention.xml is done where he wrote

One of my 2005 predictions is coming true. Steve Gillmor's report on Attention.xml is included in Esther Dyson's Release 1.0. Thanks to Mike Manuel for letting us know the report is now available for $80. I'll have to check our corporate library and see if it's available there (I believe it is).

Danny Ayers does a good job of taking a critical look at the syntax chosen for the attention.xml format. I, on the other hand, have fundamental questions about the purpose of the format and how it expects to solve the problems highlighted in its problem statement. As of the time I wrote this post, the attention.xml problem statement read

  • How many sources of information must you keep up with?

  • Tired of clicking the same link from a dozen different blogs?

  • RSS readers collect updates, but with so many unread items, how do you know which to read first?

Attention.XML is designed to solve these problems and enable a whole new class of blog and feed related applications.

These are rather lofty goals and as the author of a moderately popular RSS reader I am interested in solutions to these problems. Looking at the attention.xml format schema description, it seems the format is primarily a serialization of the internal state of an RSS reader, including information such as the following (a rough sketch follows the list)

  • what feeds the user reads
  • when feeds were added or removed from the user's subscription list
  • the last time a user read a feed
  • the amount of time the user spent reading a post
  • which links in the post the user clicked on
  • the user's rating for a post or feed
  • etc
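Serialized naively, that state might look something like the hypothetical sketch below. To be clear, this is my own illustration of the kinds of data the spec describes, not the actual attention.xml syntax:

<!-- Hypothetical sketch only; element names invented for illustration -->
<feed url="http://example.org/rss.xml">
  <dateAdded>2005-01-01T09:00:00Z</dateAdded>
  <lastRead>2005-02-14T17:30:00Z</lastRead>
  <item guid="http://example.org/posts/42">
    <duration>45</duration>            <!-- seconds spent reading -->
    <followedLinks>2</followedLinks>   <!-- links clicked in the post -->
    <rating>4</rating>                 <!-- user-assigned rating -->
  </item>
</feed>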

This list of data seems suspiciously like a format for synchronizing the state between multiple aggregators or an aggregator client and server. This makes it very similar to the Synchronization of Information Aggregators using Markup (SIAM) format  which I authored with input from a number of aggregator authors including Luke Hutteman (author of SharpReader), Morbus Iff (author of AmphetaDesk) and Brent Simmons (author of NetNewsWire).

Before going into some of the details around the technical difficulties in recording some of the information that the attention.xml format requires I want to go back and address the problem statement. I can't see how the internal state of an RSS reader serialized to some XML format solves problems like users seeing multiple blog posts from people linking to the same item or determining the relative importance of various unread items in a user's queue. The former can be solved quite readily by aggregators today (I don't do it in RSS Bandit because I think the performance cost is too high and it is unclear that this feature is actually beneficial) while the latter is bordering on an AI problem which isn't going to be solved with the limited set of information contained in the attention.xml format. In short, I can't see how the information in an attention.xml document actually solves the problems described in the problem statement.

Now on to the technical and social difficulties of creating the attention.xml format. The first problem is that not every aggregator can record all the information that is required by the format. Some aggregators don't have post rating features, some won't or can't track how long a user was reading an item [which will vary from user to user anyway due to people's different reading speeds], and others don't record the user's relationship to the author of the feed. So attention.xml requires a lot of new features from RSS readers. Assuming that the spec gets some traction, I expect that different aggregators will add support for different features while ignoring others (e.g. I can see myself adding post rating features to RSS Bandit but I doubt I'll ever track reading times), which is the case with support for RSS itself within various RSS readers today. The fact that various RSS readers will most likely support different subsets of the attention.xml format is one problem. There is also the fact that logging all this information may be cumbersome in certain cases, which further reduces how likely it is that all the information described in the spec will be recorded.

Then there is the problem of what to do when clients speak different dialects of attention.xml. Are they expected to round trip? If I send Bloglines an attention.xml file with rating information even though it doesn't have that feature, should it track that information for the next time it is asked for my attention.xml by Newsgator, which supports ratings?

Don't take this post to mean that I think something like attention.xml isn't necessary. As it stands now I want to increase the number of synchronization sources supported by RSS Bandit to include the Bloglines sync API and Newsgator Online synchronization, but they use different web services. It looks like Technorati is proposing a third with attention.xml. I'd love for there to be some standardization in this area which would make my life as an aggregator author much easier. Client<->server synchronization of user subscriptions is something that users of information aggregators really would like to see (I get requests for this feature all the time) and it would be good to see some of the major players in this area get together with aggregator authors to see how we can make the ecosystem healthier and provide a better story for users all around.

I don't believe that attention.xml is a realistic solution to the problems facing aggregator authors and users of RSS readers. I just hope that some solution shows up soon as opposed to the current fragmentation that exists in the syndication marketplace.


 

December 22, 2004
@ 05:27 PM

It seems there's been some recent hubbub in the world of podcasting about how to attach multiple binary files to a single post in an RSS feed. In a post entitled Multiple-enclosures on RSS items?, Dave Winer weighs in on the issue. He writes

This question comes up from time to time, and I've resisted answering it directly, thinking that anyone who really read the spec would come to the conclusion that RSS allows zero or one enclosures per item, and no more. The same is true for all other sub-elements of item, except category, where multiple elements are explicitly allowed. The spec refers to "the enclosure" in the singular. Regardless, some people persist in thinking that you may have more than one enclosure per item.

Okay, let's play it out. So if I have more than one enclosure per item, how do I specify the publication date for each enclosure? How do I specify the title, author, a link to comments, a description perhaps, or a guid? The people who want multiple enclosures suggest schemes that are so complicated that they're reduced to hand-waving before they get to the spec, which I would love to read, if it could be written. Some times some things are just too hard to do. This is one of them.

And there's a reason why it's too hard. Because you're throwing out the value of RSS and then trying to figure out how to bring it back. There's no need for items any more, so you might as well get rid of them. At the top level of channel would be a series of enclosures, and then underneath each enclosure, all the meta-data. Voila, problem solved. Only what have you actually solved? You've just re-created RSS, but instead of calling the main elements "item" we now call them "enclosure".
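For reference, the single-enclosure pattern the spec's wording implies looks like the minimal sketch below (URLs and values made up). All of the item-level metadata, the title, date and guid, unambiguously describes the one attached file, which is exactly the property that multiple enclosures per item would destroy:

<item>
  <title>Morning show for December 22</title>
  <guid isPermaLink="false">show-2004-12-22</guid>
  <pubDate>Wed, 22 Dec 2004 12:00:00 GMT</pubDate>
  <enclosure url="http://example.org/audio/show-2004-12-22.mp3"
             length="23592960" type="audio/mpeg" />
</item>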

The value of RSS is fairly self evident to me but it seems that, given the number of people who keep wanting to reinvent the wheel, it may not be as clear to others. As someone who used to work on core XML technologies at Microsoft, the value of XML was obvious to me. It allowed developers to agree to use the same data format for information interchange which led to a proliferation of a wide and uniform set of tools for working with data formats. XML is not an optimal format for most of the tasks it is used for but it more than makes up for this with the plethora of tools and technologies that exist for processing XML.

My expectation about XML was always that the software industry would move on to agreeing on other higher level protocols built on XML for application information interchange. So I've always been frustrated to see many attempts by various parties, including the W3C with efforts such as XML 1.1 and binary XML, take us steps back by wanting to fragment the interoperability promise of XML.

RSS is a wonderful example of the higher level of interoperability that can be built upon XML formats. Instead of information sources using various incompatible mechanisms for providing information to end users, such as NOAA's SOAP web service and the Microsoft.com web services which each require a separate custom application to consume them, sites can all standardize on RSS. This standardization creates an ecosystem of applications that produce and consume RSS feeds which is a lot larger than what would exist for site-specific web services or market-specific XML syndication formats. Specifically, it allows for the evolution of the digital information hub where users can view data from the various information sources they care about (blogs, news, weather reports, etc) in their choice of applications.

Additionally, RSS is extensible. This means that even if the core elements and attributes do not satisfy all the requirements of a particular problem domain, then domain-specific information can be added to the feed. This allows for regular consumers of RSS to still be able to consume the content while domain specific applications can give users a richer experience. This is a much better solution for both content producers and consumers than coming up with domain specific applications.
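As a quick sketch of what that extensibility looks like in practice, a weather site could annotate a plain RSS item with elements from its own namespace (the w: namespace below is invented for illustration). A generic aggregator simply renders the title and description, while a weather-aware client can additionally pick out the structured data:

<item>
  <title>Seattle forecast for Tuesday</title>
  <description>Rain, with a high of 48F.</description>
  <w:forecast xmlns:w="http://example.org/weatherml">
    <w:high unit="F">48</w:high>
    <w:conditions>rain</w:conditions>
  </w:forecast>
</item>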

As a user I want fewer formats, not more. I want my email to come in my RSS aggregator, I want my favorite newsgroups to show up in my RSS aggregator; I'm tired of having a separate application for what is essentially the same kind of data. In fact, it seems Google agrees with me as evidenced by them exposing XML feeds for your GMail inbox and for USENET newsgroups via Google Groups. Unfortunately, if you have a plain old RSS reader, you can't view these feeds and instead have to find an aggregator that supports Atom 0.3. Two steps forward, one step back.

We need fewer data interchange formats, not more. It is better for content producers, better for end users and better for developers of applications that use these formats. Efforts in syndication should focus on how to make the existing formats work for us instead of inventing new formats.

Vive la RSS. 


 

Categories: Syndication Technology | XML

I saw a recent post from Dave Winer berating Yahoo! where he wrote

Yahoo is the strangest most jealous and behind-the-scenes plotting and scheming of tech companies. When any of the other "giants" moves in RSS space I get plenty of advance notice so that I can help them promote it, maybe even make it better before it's announced. Yahoo, as a company seems jealous and insecure, seems to have as a goal, replacing me. Hey it's been tried before, probably isn't worth the trouble. And it's amazing for all the lack of respect, how much of my (unpatented) work they're using to reshape their company. If I didn't know better I might think that someone inside the company is claiming credit for my work and doesn't want the boss to know. ";->"

I wasn't sure what this post was about so I did a little Googling and came upon a post on the atom-syntax mailing list entitled Yahoo and "Media RSS" which points out that Yahoo! has created a specification entitled "Media RSS" Specification Version .9 (DRAFT). I found it interesting that Yahoo! is throwing its weight behind a spec to replace the current mechanisms used for podcasting. I am not surprised that Dave Winer was irritated, especially since some of the stuff in the spec seems extremely questionable (media:people is a single element that can contain multiple people separated by the '|' character, attributes like playerWidth & playerHeight are supposed to control how big the media player window used to consume the content should be, etc).

However before getting deeper into the Yahoo! specification I stumbled on a post by Danny Ayers on the atom-syntax mailing list which expressed some confusion about how XML vocabularies are defined

Correct me if I'm wrong, but it looks a little broken:

<media:content url="http://www.foo.com/movie.mov" fileSize="12216320"
type="video/quicktime"
    playerUrl="http://www.foo.com/player?id=1111" playerHeight="200"
playerWidth="400"
    isDefault="true" expression="full" bitrate="128" duration="185">

The attributes aren't namespace-qualified, yet aren't defined in the
RSS 2.0 spec.

Danny Ayers seems to think that the absence of a namespace name on an attribute is equivalent to the attribute being in some 'empty' namespace along with other types that are in no namespace in that vocabulary. That is actually incorrect. The best documentation to set one straight on how to think about elements and attributes in today's age of XML namespaces is the W3C XML Schema Primer. The XML Schema recommendation is the primary specification which describes how defining XML vocabularies in a namespace aware manner is supposed to work.

An attribute with an explicit namespace name (i.e. that has a prefix) is a global attribute which belongs to a particular vocabulary. There is only one declaration of an attribute with that name (namespace URI & local name pair) in the vocabulary. On the other hand, an attribute without a namespace name is scoped locally to the element it is declared on and is only defined in the context of that element. This means in a particular vocabulary multiple definitions of an attribute with a particular name can exist if it is un-namespaced since it is scoped locally to its owner element. 
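In XML Schema terms the distinction looks roughly like the sketch below, using an invented namespace. The global attribute is declared at the top level of the schema and must appear with a prefix in instance documents, while the local attribute is declared inside an element and stays unqualified:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.org/media"
           xmlns:m="http://example.org/media">
  <!-- Global: namespace-qualified, one definition for the whole vocabulary -->
  <xs:attribute name="rating" type="xs:string" />

  <xs:element name="content">
    <xs:complexType>
      <!-- Local: unqualified, defined only in the context of this element -->
      <xs:attribute name="url" type="xs:anyURI" />
      <!-- A reference to the global attribute; instances write it as m:rating -->
      <xs:attribute ref="m:rating" />
    </xs:complexType>
  </xs:element>
</xs:schema>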

Since Danny Ayers is the co-author of the upcoming book entitled Beginning RSS and Atom Programming, I hope he does some more research on designing XML vocabularies before the book is published. A lot of the power of RSS is the ability authors have of defining their own vocabularies as RSS modules and I'd hate to see a new generation of RSS module designers inherit a bunch of bad habits because they read the wrong stuff in a book.


 

When I first started working on RSS Bandit I wanted an application that looked and acted as much like Microsoft Outlook as possible. Two years and over a hundred thousand downloads later I realize that there are a number of drawbacks to using this model for reading feeds [or any information for that matter]. Mike Torres describes some of these reasons in his post Why I dig Bloglines, he writes

Part of the problem for me is that applications that look and feel like Microsoft Outlook tend to make me feel like I am working, and I am immediately in "information overload" mode (we get hundreds of pieces of email each day at Microsoft.)  Catching up with friends, reading Scripting.com, or checking out Engadget shouldn't be tedious.  But for some reason, it was.  Until I switched to Bloglines.
...
Anyway, here is what I like about Bloglines:

...
  • I can scan dozens of feeds in less than a minute.  With NewsGator for Outlook and other Outlook-style interfaces, it just simply took longer.  Probably because Bloglines shows me the feed in the way it is supposed to be presented - reverse chronological order on a single page.  Not as individual messages that I have to click through. 
...

This is the bane of the information viewing paradigm favored by email and newsgroup readers which many RSS aggregators have decided to inherit. The major problem is that the Outlook mail reading paradigm has a fundamental assumption which turns out to be flawed. It assumes you want to read every item you get in your inbox. This flawed assumption leads to the kind of information overload that hampers the productivity of lots of people I know at work. I've met several people who seem to always have hundreds of unread items in their email inbox. For this reason I always have to learn who's easier to reach via IM or by swinging by their office in person than by sending them mail.

Most people I know get four classes of messages in their information aggregators (I am lumping reading email, reading news and reading RSS/Atom feeds into a single category). These are

  1. notifications (checkin mails, comments to my blog, etc)
  2. headlines (email newsletters, feeds from news sites, etc)
  3. messages sent directly to me or that are similarly relevant
  4. messages sent to an interest group I am a part of (XML-DEV mailing list, comp.text.xml newsgroup, etc)

The problem is that the typical Outlook inspired information aggregator treats all of the above as being of equal relevance. Even though Outlook does provide mechanisms for assigning relevance to incoming messages, they are either hard to find or cumbersome to use.

This is definitely one of the areas that needs to be improved in the world of information aggregators in general and RSS/Atom readers in particular. There are a number of features that I'm working on for the next version of RSS Bandit aimed at making it easier for people to consume information from various sources in a flexible manner according to what relevance they place on the information source.


 

November 26, 2004
@ 08:03 PM

Several months ago I wrote a draft spec entitled Synchronization of Information Aggregators using Markup (SIAM) which was aimed at providing a lightweight mechanism for aggregators to synchronize state across multiple machines. There was a flurry of discussion about this between myself and a number of other aggregator authors on the [now defunct] information_aggregators mailing list.

However, although there was some interest amongst aggregator authors, there wasn't much incentive to implement the functionality for a number of reasons. These range from the fact that it makes it easier for users to migrate between aggregators (which payware aggregator authors aren't enthusiastic about) to the fact that there was no server-side infrastructure for supporting such functionality. Ideally this feature would have been supported by a web service end point exposed by a person's weblog or online aggregator. So not much came of it.

Since then I've implemented syncing in RSS Bandit in an application specific manner. So also have the authors of Shrook and NewsGator. There is also the Bloglines Sync API which provides a web service end point to doing limited syncing of feed state whose limitations I pointed out in my post Thoughts on the Bloglines Sync API.

This post is primarily meant to answer questions asked by Dmitry Jemerov, the creator of Syndirella who is now the dev lead for JetBrains' Omea Reader.


 

Adam Bosworth has posted his ISCOC04 talk on his weblog. The post is interesting although I disagreed with various bits and pieces of it. Below are some comments in response to various parts of his talk

On the one hand we have RSS 2.0 or Atom. The documents that are based on these formats are growing like a bay weed. Nobody really cares which one is used because they are largely interoperable. Both are essentially lists of links to content with interesting associated metadata. Both enable a model for capturing reputation, filtering, stand-off annotation, and so on. There was an abortive attempt to impose a rich abstract analytic formality on this community under the aegis of RDF and RSS 1.0. It failed. It failed because it was really too abstract, too formal, and altogether too hard to be useful to the shock troops just trying to get the job done. Instead RSS 2.0 and Atom have prevailed and are used these days to put together talk shows and play lists (podcasting) photo albums (Flickr), schedules for events, lists of interesting content, news, shopping specials, and so on. There is a killer app for it, Blogreaders/RSS Viewers.

Although it is clear that RSS 2.0 seems to be edging out RSS 1.0, I wouldn't say it has failed per se. I definitely wouldn't say it failed for being too formal and abstract. In my opinion it failed because it was more complex with no tangible benefit. This is the same reason XHTML has failed when compared to HTML. This doesn't necessarily mean that more rigid systems will fail to take hold when compared to less rigid systems; if that were the case we'd never have seen the shift from C to C++ and then from C++ to C#/Java.

Secondly, it seems Adam is throwing out some Google spin here by trying to lump the nascent and currently in-progress Atom format in the same group as RSS 2.0. In fact, if not for Google jumping on the Atom bandwagon it would be even more of an intellectual curiosity than RSS 1.0.

As I said earlier, I remember listening many years ago to someone saying contemptuously that HTML would never succeed because it was so primitive. It succeeded, of course, precisely because it was so primitive. Today, I listen to the same people at the same companies say that XML over HTTP can never succeed because it is so primitive. Only with SOAP and SCHEMA and so on can it succeed. But the real magic in XML is that it is self-describing. The RDF guys never got this because they were looking for something that has never been delivered, namely universal truth. Saying that XML couldn't succeed because the semantics weren't known is like saying that Relational Databases couldn't succeed because the semantics weren't known or Text Search cannot succeed for the same reason. But there is a germ of truth in this assertion. It was and is hard to tell anything about the XML in a universal way. It is why Infopath has had to jump through so many contorted hoops to enable easy editing. By contrast, the RSS model is easy with an almost arbitrary set of known properties for an item in a list such as the name, the description, the link, and mime type and size if it is an enclosure. As with HTML, there is just enough information to be useful. Like HTML, it can be extended when necessary, but most people do it judiciously. Thus Blogreaders and aggregators can effortlessly show the content and understanding that the value is in the information. Oh yes, there is one other difference between Blogreaders and Infopath. They are free. They understand that the value is in the content, not the device.

Lots of stuff to agree with and disagree with here. Taking it from the top, the assertion that XML is self-describing is a myth. XML is a way to attach labels to islands of data, the labels are only useful if you know what they mean. Where XML shines is that one can start with a limited set of labels that are widely understood (title, link, description) but attach data with labels that are less likely to be understood (wfw:commentRss, annotate:reference, ent:cloud) without harming the system. My recent talk at XML 2004, Designing XML Formats: Versioning vs. Extensibility, was on the importance of this and how to bring this flexibility to the straitjacketed world of XML Schema.

I also wonder who the people are that claim that XML over HTTP will never succeed. XML over HTTP has already succeeded in a lot of settings. However I'd question whether it is all you need. The richer the set of interactions allowed by the web site, the more an API is needed. Google, Amazon and eBay all have XML-based APIs. Every major blogging tool has an XML-based API even though those same tools are using vanilla XML over HTTP for serving RSS feeds. XML over HTTP can succeed in a lot of settings but as the richness of the interaction between client and server grows, so also does the need for a more powerful infrastructure.

The issue is knowing how to pick the right tool for the job. You don't need the complexity of the entire WS-* stack to build a working system. I know a number of people at Microsoft realize that this message needs to get out more which is why you've begun to see things like Don Box's WS-Why Talk and the WS Kernel.

What has been new is information overload. Email long ago became a curse. Blogreaders only exacerbate the problem. I can't even imagine the video or audio equivalent because it will be so much harder to filter through. What will be new is people coming together to rate, to review, to discuss, to analyze, and to provide 100,000 Zagat's, models of trust for information, for goods, and for services. Who gives the best buzz cut in Flushing? We see it already in eBay. We see it in the importance of the number of deals and the ratings for people selling used books on Amazon. As I said in my blog, My mother never complains that she needs a better client for Amazon. Instead, her interest is in better community tools, better book lists, easier ways to see the book lists, more trust in the reviewers, librarian discussions since she is a librarian, and so on.
This is what will be new. In fact it already is. You want to see the future. Don't look at Longhorn. Look at Slashdot. 500,000 nerds coming together everyday just to manage information overload. Look at BlogLines. What will be the big enabler? Will it be Attention.XML as Steve Gillmor and Dave Sifry hope? Or something else less formal and more organic? It doesn't matter. The currency of reputation and judgment is the answer to the tragedy of the commons and it will find a way. This is where the action will be. Learning Avalon or Swing isn't going to matter. Machine learning and inference and data mining will. For the first time since computers came along, AI is the mainstream.

I tend to agree with most of this although I'm unsure why he feels the need to knock Longhorn and Java. What he seems to be overlooking is that part of the information overload problem is the prevalence of poor data visualization and user interface metaphors for dealing with significant amounts of data. I now believe that one of the biggest mistakes I made in the initial design of RSS Bandit was modelling it after mail readers like Outlook even though I knew lots of people who had difficulty managing the flood of email they get using them. This is why the next version of RSS Bandit will borrow a leaf from FeedDemon along with some other tricks I have up my sleeve.

A lot of what I do in RSS Bandit is made easy due to the fact that it's built on the .NET Framework and not C++/MFC so I wouldn't be as quick to knock next generation GUI frameworks as Adam is. Of course, now that he works for a Web company the browser is king.


 

Categories: Syndication Technology | XML

November 9, 2004
@ 01:28 PM

A recent post entitled Finally, a Use for Atom by Charles Miller got me thinking about the usefulness or lack thereof of the IETF Atom effort. It seems I wasn't the only one who started thinking this given a mail thread started by Tim Bray on the atom-syntax list entitled Posted PaceDeclareVictoryOnFormat where he writes

To: Atom WG <atom-syntax@xxxxxxx>
Subject: Posted PaceDeclareVictoryOnFormat
From: Tim Bray <Tim.Bray@xxxxxxx>
Date: Mon, 08 Nov 2004 14:13:17 -0800

See http://www.intertwingly.net/wiki/pie/PaceDeclareVictoryOnFormat

The world can use Atom, sooner rather than later. The return-on-investment of further WG time invested in polishing something that's already pretty good is starting to be very unattractive. Particularly when the Protocol draft seriously needs work and progress.

Note that this has not been formally placed at the front of the queue yet. -Tim

I posted some comments to the thread that reflect the same opinions from my post Mr. Safe's Guide to the RSS vs. ATOM debate: the relationship between the Atom syndication format and RSS is the same as that of XHTML and HTML; geeks will like it but there's no real concrete reason to use it over the old stuff that already works pretty well for the most part.

However I also reiterated that I think the Atom API is a worthwhile addition to the world of blogging technologies. I listed the problems with the current crop of blog posting APIs such as the Blogger API and MetaWeblog API in my post What's Wrong with the MetaWeblog API? from a year and a half ago

Security: The MetaWeblog API has no concept of security. Passwords are sent in plaintext as parameters to XML-RPC functions (i.e. they are sent in plain text on the wire as part of the XML message).

Strongly Coupled To XML-RPC: RSS and Joe Gregorio's CommentAPI have shown that one can build applications that retrieve and send XML documents from client to server directly using HTTP GET and POST instead of going through an added layer of indirection by using explicit RPC mechanisms.
...
I also believe that the API should not just be tied to XML-RPC but should have interfaces that utilize the XML Web Services family of technologies like SOAP and WSDL. There are many products and toolkits that support SOAP/WSDL/etc plus more are being built every day. It makes little sense to me that almost everywhere else in the software industry people are either exchanging XML documents using RESTian principles (i.e. HTTP GET and POST) or the XML Web Services family of technologies but when it comes to web content related technologies there is this anachronism where an arbitrarily different methodology is used.

Limited Functionality: The MetaWeblog API only allows one to post and edit blog entries, fetch information about a specific user or change the website template. This is a drop in the bucket considering all the things one would like to do with a weblog engine that could be supported by an API.

As time has passed some of my opinions on this matter have mellowed. Security is a big problem and one I don't think can be ignored. The fact that existing APIs depend on XML-RPC instead of more accepted industry practices such as using RESTian principles or SOAP+WSDL isn't that great but it isn't that big a deal. The issue of limited functionality is probably something that has to be lived with since for the API to be widely adopted it has to support a lowest common denominator of features. As long as the API can be extended then the fact that there isn't some functionality in the core isn't that bad.

So for me, the high order bit is security. I can see at least two ways to solve this problem, listed in order from least disruptive to most disruptive:

  1. Blog editing tools and blog vendors moving towards using XML-RPC over HTTPS/SSL, or at least using digest HTTP authentication instead of plain HTTP (a sketch of what this looks like in practice follows this list).
  2. Blog editing tools and blog vendors moving towards using the Atom API over HTTPS/SSL, or at least using digest HTTP authentication instead of plain HTTP.
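
To make the first option concrete, here is a minimal sketch of what a blog editing tool built on the .NET Framework might do; the endpoint URL and credentials are invented, and a real tool would use an XML-RPC library rather than hand-rolled XML. The only change from what tools do today is the https:// scheme, which encrypts the password parameters on the wire.

    using System;
    using System.IO;
    using System.Net;
    using System.Text;

    class SecureXmlRpcClient
    {
        static void Main()
        {
            // Hypothetical endpoint; the https:// scheme is what provides the security.
            string endpoint = "https://blog.example.com/xmlrpc";

            // A blogger.getUsersBlogs call. The password is still a parameter,
            // but over SSL it is encrypted instead of sent in plaintext.
            string xmlRpcCall =
                "<?xml version=\"1.0\"?>" +
                "<methodCall><methodName>blogger.getUsersBlogs</methodName><params>" +
                "<param><value>appkey</value></param>" +
                "<param><value>username</value></param>" +
                "<param><value>password</value></param>" +
                "</params></methodCall>";

            HttpWebRequest request = (HttpWebRequest) WebRequest.Create(endpoint);
            request.Method = "POST";
            request.ContentType = "text/xml";

            byte[] body = Encoding.UTF8.GetBytes(xmlRpcCall);
            request.ContentLength = body.Length;
            using (Stream stream = request.GetRequestStream())
            {
                stream.Write(body, 0, body.Length);
            }

            using (WebResponse response = request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                Console.WriteLine(reader.ReadToEnd());
            }
        }
    }

Using digest HTTP authentication instead would mean setting request.Credentials = new NetworkCredential(...) and letting HTTP handle authentication, rather than shipping the password as an XML-RPC parameter.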

A number of blog hosting services such as Blogger/Google and SixApart have moved towards doing (2) above. However, it is unclear to me how much this has been embraced by the builders of popular blog editing tools such as BlogJet and w::bloggar. Looking at the list of Atom Enabled client software I only see aggregators listed, not blog editing tools.

So I was curious: are there any major blog editing tools that support the Atom API? If not, do these tools support using the Blogger/MetaWeblog API over HTTPS/SSL? If not, is there any interest in doing the former or the latter any time soon?


 

I've been watching the hype about podcasting with some wariness but it looks like it is here to stay. I just noticed that Greg Reinacker (NewsGator) and Nick Bradbury (FeedDemon) have announced that they will have better support for RSS 2.0 enclosures and thus podcasting. This weekend I also laid the groundwork for podcasting support in RSS Bandit; Torsten will likely finish this work once he is done with the GUI for NNTP newsgroup support.

Speaking of podcasting and RSS 2.0 enclosures, I agree 100% with Joshua Allen's points in his post, History of Podcasting. He wrote

Dave Winer doesn't want to end up like Eric Bina, written out of the history of a creation he helped usher into reality.  Adam steps up to make sure Dave gets credit.  This time, there is less reason to worry.  First, the WWW (which Eric helped enable) is now an independent and democratic public record which can triangulate the major media.  And blogs, which Dave helped enable, are one source of that public record.  The public record shows that Dave was planning “Radio” via RSS for a very long time.  Dave has talked about these ideas for a long time, but I have to admit that I wasn't quite prepared for how fast it would actually happen.  I believe credit goes to Adam for such a fast and effective bootstrap, but it also proves that all of the work on RSS laid a good foundation for quick incremental innovation.  

I also think that one of the major success factors was that the nattering nabobs ignored podcasting and dismissed it until it was too late to inject their stop energy.  Many of the nabobs were so convinced of their own stories about “RSS is broken”,  that it never occured to them that something like podcasting could be successful.  They were so busy trying to reinvent RSS that they ignored an idea that Dave has been giving away for free for years. 

There's a lot of innovation and many interesting end user applications that can be built on RSS today. However, many XML syndication geeks are prideful and would rather reinvent the wheel than use existing technology to solve real-world problems.


 

October 3, 2004
@ 06:38 PM

As the author of a news reader that supports RSS and Atom, I often have to deal with feeds that are technically valid RSS/Atom feeds but that for one or more reasons cause unnecessary inconvenience to the authors and users of news aggregators. This is the second in a series of posts highlighting such feeds as an example to others of how not to design syndication feeds for a website.

This week's gem is the Sun Bloggers RSS feed. This RSS feed is a combined feed for all the blogs hosted at http://blogs.sun.com. This means that at any given time the feed most likely contains posts by multiple authors.

To highlight the problem with the feed I present the following two item elements taken from the feed a few minutes ago.

 <item>
    <title>Something fishy...</title>
    <description>A king was very fond of fish products. He went fishing in the only river of his kingdom. While fishing he accidently dropped his diamond ring presented by his wife - The Queen. A fish in the river mistook the sparkling ring for an insect and swallowed it. The fisherman caught the fish and sold it to a chef. The King on the other side was very sad and apologistic. Took the Queen to a restaurant for a dinner and ordered a fried fish. The chef presented the same which had the diamond ring inside. King was happy to find the ring back and rewarded the restaurant. The restaurant rewarded the chef and the Chef rewarded the fisherman. The fisherman then went back to the river, killed all the fishes in search of another diamond ring. I never understood the motto of the story but there is certainly something fishy about it!</description>
    <category>General</category>
    <guid isPermaLink="true">http://blogs.sun.com/roller/page/ashish/20041002#something_fishy</guid>
    <pubDate>Sat, 2 Oct 2004 08:53:15 PDT</pubDate>
  </item>
  <item>
    <title>Another one bytes the dust...</title>
    <description>Well, more like another one got bitten. Accoring to &lt;a href="http://www.heise.de/newsticker/meldung/51749"&gt;this&lt;/a&gt; (german) article from &lt;a href="http://www.heise.de"&gt;Heise&lt;/a&gt; Mr. Gates got himself some Spyware on his personal/private systems, and has now decided to take things into his own hand (or at least into those of his many and skilled engineers). Bravo!&lt;p&gt; Spyware or other unwanted executables like e.g. &lt;a href="http://securityresponse.symantec.com/avcenter/expanded_threats/dialers/"&gt;dialers&lt;/a&gt; are puzzeling me for some time now, since I simply don't understand how those thinks can be kept legal at all. No one needs dialers. There are enough good ways for online payment. No one in their right mind can honestly belive, that anyone with a serious business would need any of that crap. It's a plain ripoff scheme.&lt;p&gt;</description>
    <category>General</category>
    <guid isPermaLink="true">http://blogs.sun.com/roller/page/lars/20041002#another_one_bytes_the_dust</guid>
    <pubDate>Sat, 2 Oct 2004 07:32:18 PDT</pubDate>
  </item>

The problem with the feed is that even though the RSS 2.0 specification has a provision for an author element, and the Dublin Core RSS module has the dc:creator element which can be used in its stead, the Sun Bloggers RSS feed eschews directly identifying the author of each post in the feed.

The obvious benefits of identifying authors in collaborative feeds include enabling the reader to better determine whether the speaker is an authority on the topic at hand, or to begin ascribing authority to an author the reader was previously unaware of. Then there are aggregator-specific benefits, such as the fact that readers could group or filter items in the feed based on the author, thus improving their reading experience.

A solution to this problem is for the webmaster of the Sun Bloggers site to begin using author or dc:creator elements to identify the authors of the various posts in the Sun Bloggers feed.
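
As a minimal sketch, the first item shown above could identify its author as follows. The email address is invented since the feed doesn't expose one, which is also why the Dublin Core dc:creator element (requiring only a name, plus the xmlns:dc="http://purl.org/dc/elements/1.1/" declaration on the feed) is often the more practical choice:

  <item>
    <title>Something fishy...</title>
    <author>ashish@example.com (Ashish)</author>
    <!-- or, with the Dublin Core module: -->
    <dc:creator>Ashish</dc:creator>
    ...
  </item>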


 

September 29, 2004
@ 08:33 AM

The Bloglines press release entitled New Bloglines Web Services Selected by FeedDemon, NetNewsWire and Blogbot to Eliminate RSS Bandwidth Bottleneck has this interesting bit of news

Redwood City, Calif.--September 28, 2004 -- Three leading desktop news feed and blog aggregators announced today that they have implemented new open application programming interfaces (API) and Web Services from Bloglines (www.bloglines.com) that connect their applications to Bloglines' free online service for searching, subscribing, publishing and sharing news feeds, blogs and rich web content. FeedDemon (www.bradsoft.com), NetNewsWire (www.ranchero.com), and Blogbot (www.blogbot.com) are the first desktop software applications to use the open Bloglines Web Services.

Bloglines Web Services address a key issue facing the growing RSS market by reducing the bandwidth demands on sites serving syndicated news feeds. Now, instead of thousands of individual desktop PCs independently scanning news sources, blogs and web sites for updated feeds, Bloglines will make low-bandwidth requests to each site on behalf of the universe of subscribers and cache any updates to its master web database. Bloglines will then redistribute the latest content to all the individuals subscribed to those feeds via the linked desktop applications -- FeedDemon, NetNewsWire or Blogbot -- or via Bloglines' free web service.
...
Bloglines Web Services Enable Synchronization for Desktop News Aggregators "Our customers have been looking for the ability to synchronize their feed subscriptions across multiple computers," said Nick Bradbury, founder of Bradbury Software and creator of FeedDemon, the leading RSS aggregator for Windows. "By partnering with Bloglines, we are now able to offer the rich desktop functionality FeedDemon customers have come to expect, with the flexible mobility and portability of a web service."

There are two aspects of this press release I'm skeptical about. The first is the claim that having desktop aggregators fetch feeds from Bloglines instead of from the original sources somehow "eliminates the RSS bandwidth bottleneck". It seems to me that the Bloglines proposal does the opposite. Instead of thousands of desktop aggregators fetching tens of thousands to hundreds of thousands of feeds from as many websites, it is proposed that they all poll the Bloglines server. That seems to be creating a bottleneck, not eliminating one.

The second aspect I call into question is the Bloglines Sync API. The information on this API is quite straightforward:

The Bloglines Sync API is used to access subscription information and to retrieve blog entries. The API currently consists of the following functions:

  • listsubs - The listsubs function is used to retrieve subscription information for a given Bloglines account.
  • getitems - The getitems function is used to retrieve blog entries for a given subscription.

All calls use HTTP Basic authentication. The username is the email address of the Bloglines account, and the password is the same password used to access the account through the Bloglines web site.
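
As a sketch, calling the listsubs function from .NET code looks something like the following; the endpoint URL is the one I recall from the Bloglines documentation, so treat it as illustrative, and the credentials are obviously invented. If memory serves, the response is an OPML document describing the subscribed feeds.

    using System;
    using System.IO;
    using System.Net;

    class BloglinesListSubs
    {
        static void Main()
        {
            HttpWebRequest request = (HttpWebRequest)
                WebRequest.Create("http://rpc.bloglines.com/listsubs");

            // HTTP Basic authentication with the Bloglines account's
            // email address and password, per the API documentation.
            request.Credentials = new NetworkCredential("user@example.com", "password");

            using (WebResponse response = request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                Console.WriteLine(reader.ReadToEnd());
            }
        }
    }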

I was interested in using this API to round out the existing feed synchronization support in RSS Bandit. In current versions a user can designate a file share, WebDAV server or FTP server as the central location for synchronizing multiple instances of RSS Bandit. I investigated what it would take to add Bloglines as a fourth synchronization point after reading the aforementioned press release and came to the conclusion that the API provided by Bloglines falls short of providing the functionality that exists in RSS Bandit today with the other synchronization sources.  

The problems with the Bloglines Sync API include:

  1. The Bloglines Sync API only allows clients to retrieve the subscribed feeds. The user has to log in to the Bloglines site to perform feed management tasks like adding, deleting or modifying the feeds to which they are subscribed.
  2. There is no granular mechanism to get or set the read/unread state of the items in the user's feed list. 

These limitations make the Bloglines Sync API not terribly useful for synchronizing between two desktop aggregators. Instead, it primarily acts as a way for Bloglines to use various desktop aggregators as a UI for viewing a user's Bloglines subscriptions without the Bloglines team having to build a rich client application.

Thanks, but I think I'm going to pass.


 

September 26, 2004
@ 07:10 PM

As an author of a news reader that supports RSS and Atom, I often have to deal with feeds designed by the class of people Mark Pilgrim described in his post Why specs matter as assholes. These are people who

read specs with a fine-toothed comb, looking for loopholes, oversights, or simple typos.  Then they write code that is meticulously spec-compliant, but useless.  If someone yells at them for writing useless software, they smugly point to the sentence in the spec that clearly spells out how their horribly broken software is technically correct

This is the first in a series of posts highlighting such feeds as an example to others of how not to design syndication feeds for a website. Feeds in this series will often be technically valid RSS/Atom feeds but will for one or more reasons cause unnecessary inconvenience to the authors and users of news aggregators.

This week's gem is the Cafe con Leche RSS feed. Instead of pointing out what is wrong with this feed myself, I'll let the author of the feed do so. On September 24th Elliotte Rusty Harold wrote

I've been spending a lot of time reviewing RSS readers lately, and overall they're a pretty poor lot. Latest example. Yesterday's Cafe con Leche feed contained this completely legal title element:

<title>I'm very pleased to announce the publication of XML in a Nutshell, 3rd edition by myself and W.
          Scott Means, soon to be arriving at a fine bookseller near you.
          </title>

Note the line break in the middle of the title content. This confused at least two RSS readers even though there's nothing wrong with it according to the RSS 0.92 spec. Other features from my RSS feeds that have caused problems in the past include long titles, a single URL that points to several stories, and not including more than one day's worth of news in a feed.

Elliotte is technically right: none of the RSS specs say that the <link> element in an RSS feed should be unique for each item, so he can reuse the same link for multiple items and still have a valid RSS feed. So why does this cause problems for RSS aggregators?

Consider the following RSS feed

<rss version="0.92">
  <channel>
    <title>Example RSS feed</title>
    <link>http://www.example.com</link>
    <description>This feed contains an example of how not to design an RSS feed</description>  
    <item>
      <title>I am item 1</title>    
      <link>http://www.example.com/rssitem</link>
    </item>
     <item>
      <title>I am item 2</title>    
      <link>http://www.example.com/rssitem</link>
    </item>
  </channel>
</rss>

Now consider the same feed fetched a few hours later

<rss version="0.92">
  <channel>
    <title>Example RSS feed</title>
    <link>http://www.example.com</link>
    <description>This feed contains an example of how not to design an RSS feed</description>  
    <item>
      <title>I am item one</title>    
      <link>http://www.example.com/rssitem</link>
    </item>
     <item>
      <title>I am item 3</title>    
      <link>http://www.example.com/rssitem</link>
    </item>
     <item>
      <title>I am item 2</title>    
      <link>http://www.example.com/rssitem</link>
    </item>
   </channel>
</rss>

Now how does the RSS aggregator tell whether the item with the title "I am item 1" is the same as the one named "I am item one" with a typo in the title fixed, or a different item entirely? The simple answer is that it can't. A naive hack is to look at the content of the <description> element to see if it is the same, but what happens when a typo is fixed there or the content of the <description> is otherwise updated?

Every RSS aggregator has some sort of hack to deal with this problem. I describe them as hacks because there is no way that an aggregator can 100% accurately determine when items with the same link and no guid are the same item with content changed or different items. This means the behavior of different aggregators with feeds such as the Cafe con Leche RSS feed is extremely inconsistent.

A solution to this problem is for Elliotte Rusty Harold to upgrade his RSS feed to RSS 2.0 and use guid elements to distinctly identify items.
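
A minimal sketch of what that looks like, applied to the example feed above: once each item carries a guid, an aggregator can treat the guid as the item's identity no matter how the title or description changes.

  <item>
    <title>I am item 1</title>
    <link>http://www.example.com/rssitem</link>
    <guid isPermaLink="false">http://www.example.com/rssitem#item1</guid>
  </item>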


 

In her post Blog Activity Julia Lerman writes

There must be a few people who have their aggregators set to check rss feeds every 10 seconds or something. I very rarely look at my stats because they don't really tell me much. But I have to say I was a little surprised to see that there were over 14,000 hits to my website today (from 12am to almost 5pm).

So where do they come from?

10,000+ are from NewzCrawler then a whole lot of other aggregators and then a small # of browsers. 

This problem is due to the phenomenon originally pointed out by Phil Ringnalda in his post What's Newzcrawler Doing? and expounded on by me in my post News Aggregators As Denial of Service Clients. Basically 

According to the answer on the NewzCrawler support forums when NewzCrawler updates the channel supporting wfw:commentRss it first updates the main feed and then it updates comment feeds. Repeatedly downloading the RSS feed for the comments to each entry in my blog when the user hasn't requested them is unnecessary and quite frankly wasteful.  

Recently I upgraded my web server to Windows Server 2003 due to having problems with a limitation on the number of outgoing connections in Windows XP. Even so, I noticed that my web server was still getting overloaded with requests during hours of peak traffic. Checking my server logs I found out that another aggregator, Sauce Reader, has joined NewzCrawler in its extremely rude bandwidth-hogging behavior. This is compounded by the fact that the weblog software I use, dasBlog, does not support HTTP conditional GET for comment feeds, so I'm serving dozens of XML files every hour to each user of NewzCrawler and Sauce Reader subscribed to my RSS feed.

I'm really irritated by this behavior and have considered banning Sauce Reader & NewzCrawler from fetching RSS feeds on my blog because they significantly contribute to bringing down my site on weekday mornings, when people first fire up their aggregators at work or at home. Instead, I'll probably end up patching my local install of dasBlog to support HTTP conditional GET for comment feeds when I get some free time. In the meantime I've tweaked some options in IIS that should reduce the number of times the server is inaccessible due to being flooded with HTTP requests.

This doesn't mean I think this feature of the aforementioned aggregators is something that should be encouraged. I just don't want to punish readers of my blog because of decisions made by the authors of their news reading software.


 

September 12, 2004
@ 02:17 AM

In his post Full text RSS on MSDN gets turned off Robert Scoble writes

Steve Maine: what the hell happened to blogs.msdn.com?

RSS is broken, is what happened. It's not scalable when 10s of thousands of people start subscribing to thousands of separate RSS feeds and start pulling down those feeds every few minutes (default aggregator behavior is to pull down a feed every hour).

Bandwidth usage was growing faster than MSDN's ability to pay for, or keep up with, the bandwidth. Terrabytes of bandwidth were being used up by RSS.

So, they are trying to attack the problem by making the feeds lighter weight. I don't like the solution (I've unsubscribed from almost all weblogs.asp.net feeds because they no longer provide full text) but I understand the issues.

This is becoming a broken record. Every couple of months some web site that hasn't properly prepared for the amount of bandwidth consumed by having a popular RSS feed loudly complains and the usual suspects complain that RSS is broken. This time the culprit is Weblogs @ ASP.NET and their mistake was not providing HTTP compression to clients speaking HTTP 1.0. This meant that they couldn't get the benefits of HTTP compression when talking to popular aggregators like Straw, FeedDemon, SharpReader, NewsGator and RSS Bandit. No wonder their bandwidth usage was so high.

But let's ignore the fact that the site wasn't properly configured to utilize all the bandwidth-saving capabilities of HTTP. Instead let's assume Weblogs @ ASP.NET had done all the right things but was still seeing too much bandwidth consumption. Mark Nottingham covered this ground in his post The Syndication Sky is Falling!

A few people got together in NYC to talk about Atom going to the W3C this morning. One part of the minutes of this discussion raised my eyebrows a fair amount;

sr: […] Lots of people are saying RSS won’t scale. Somebody is going to say I told you so.
bw: Werner Vogels at Cornell has charted it out. We're at the knee of the curve. I don’t think we have 2 years.
sr: I have had major media people who say, until you solve this, I’m not in.
bw: However good the spec is, unless we deal with the bag issues, it won’t matter. There are fundamental flaws in the current architecture.

Fundamental flaws? Wow, I guess I should remind the folks at Google, Yahoo, CNN and my old colleagues at Akamai that what they’re doing is fundamentally flawed; the Web doesn’t scale, sorry. I guess I’ll also have to tell the people at the Web caching workshops that what they do is futile, and those folks doing Web metrics are wasting their time. What a shame...

Bad Reasons to Change the Web Architecture

But wait, there’s more. "Media people" want to have their cake and eat it too. It’s not good enough that they’re getting an exciting, new and viable (as compared to e-mail) channel to eyeballs; they also have to throw their weight around to reduce their costs with a magic wand. What a horrible reason to foist new protocols, new software, and added complexity upon the world.

The amusing new wrinkle is that everybody's favorite leader of the "RSS is broken, let's start all over" crowd, Sam Ruby, has decided it is time to replace both blogs pinging weblogs.com when they update and the use of HTTP for fetching RSS feeds. Hopefully, this will be more successful than his previous attempts to replace RSS and the various blogging APIs with Atom. It's been over a year, and all we have to show from the creation of Atom is yet another crufty syndication format with the promise of one more incompatible one on the way.

Anyway, the point is that RSS isn't broken. After all, it is just an XML file format. If anything is broken it is using HTTP for fetching RSS feeds. But then again, do you see people complaining about how HTTP is broken and needs to be replaced every time some poor web site suffers the Slashdot effect? If you are running a popular web site, you will need to spend money to afford the traffic. AOL.com, Ebay.com and Microsoft.com are all serving terabytes of content each month. If they were serving that content on the budget I have for my website, those sites would roll over and die. Does this mean we should stop using web browsers and HTTP for browsing the Web and resort to BitTorrent for fetching HTML pages? It definitely would reduce the bandwidth costs of sites like AOL.com, Ebay.com and Microsoft.com.

The folks paying for the bandwidth that hosts Weblogs @ ASP.NET (the ASP.NET team, not MSDN as Scoble incorrectly surmises) decided they had reached their limits and reduced the content of the feeds. It's basically a non-story. The only point of interest is that if they had announced this internally with enough warning, folks would have advised them to turn on HTTP compression for HTTP 1.0 clients before resorting to crippling the RSS feeds. Act in haste, repent at leisure.


 

July 21, 2004
@ 07:15 AM

It seems every 3 months some prominent online publication complains about the amount of traffic RSS news readers cause to websites that provide RSS feeds. This time it is Slashdot with their post When RSS Traffic Looks Like a DDoS which references a post by Chad Dickerson, the CTO of Infoworld, entitled RSS growing pains. Chad writes

Several months ago, I spoke to a Web architect at a large media site and asked why his site didn’t support RSS. He raised the concern that thousands (or even millions) of dumb clients could wreak havoc on a popular Web site. Back when I was at CNN.com, I recall that our servers got needlessly pounded by a dumb client (IE4) requesting RSS-like CDF files at frequent intervals regardless of whether they had changed. As the popularity of RSS feeds at InfoWorld started to surge, I began to notice that most of the RSS clients out there requested and downloaded our feeds regardless of whether the feeds themselves had changed. At the time, we hadn’t quite reached the RSS tipping point, so I filed these thoughts away for later -- but “later” came sooner than I thought.

At this point I'd like to note that HTTP provides two mechanisms for web servers to tell clients whether a network resource has changed or not. The basics of these mechanisms are explained in the blog post HTTP Conditional Get for RSS Hackers; together they provide a way to prevent clients such as news readers from repeatedly downloading a Web document that hasn't been updated. As it turns out, at the current time the InfoWorld RSS feed supports neither.

Another technique for reducing bandwidth consumption by HTTP clients is HTTP compression, which greatly reduces the amount of data that has to be sent to a client when the feed does have to be downloaded. For example, the current InfoWorld feed is 7427 bytes, which shrinks to 2551 bytes when compressed with GZip on my home machine. That is a reduction by a factor of about three; on larger files the compression ratio is even better. Again, InfoWorld doesn't support this technique for reducing bandwidth consumption.

It is unsurprising that they are seeing significant bandwidth consumption from news aggregators. An RSS reader polling the InfoWorld site once an hour over an 8 hour period downloads about 60 kilobytes of XML; if the site supported HTTP conditional GET requests and HTTP compression via GZip encoding, that number would be under 3 kilobytes.  
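
For aggregator authors, supporting both techniques is only a few lines of code. Here is a minimal sketch using the .NET Framework's HttpWebRequest; the feed URL is a placeholder, a real aggregator would persist the validators between runs, and decompressing the gzipped response body is omitted for brevity.

    using System;
    using System.Net;

    class ConditionalFeedFetcher
    {
        static void Main()
        {
            // Placeholder URL; the validators below would come from the aggregator's cache.
            string feedUrl = "http://www.example.com/rss.xml";
            string etag = null;

            HttpWebRequest request = (HttpWebRequest) WebRequest.Create(feedUrl);
            // Ask the server for a gzip-compressed response...
            request.Headers["Accept-Encoding"] = "gzip";
            // ...and tell it what we already have so it can reply 304 Not Modified.
            request.IfModifiedSince = new DateTime(2004, 7, 20); // time of the last fetch
            if (etag != null)
            {
                request.Headers["If-None-Match"] = etag;
            }

            try
            {
                using (HttpWebResponse response = (HttpWebResponse) request.GetResponse())
                {
                    // 200 OK: the feed changed; remember the new validators for next time.
                    etag = response.Headers["ETag"];
                    Console.WriteLine("Feed changed; {0} bytes on the wire", response.ContentLength);
                }
            }
            catch (WebException ex)
            {
                // The .NET Framework surfaces a 304 response as a WebException.
                HttpWebResponse response = ex.Response as HttpWebResponse;
                if (response != null && response.StatusCode == HttpStatusCode.NotModified)
                    Console.WriteLine("304 Not Modified -- nothing to download");
                else
                    throw;
            }
        }
    }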

The one thing HTTP doesn't provide is a way for a site to deal with numerous clients making connections at once. However, this problem isn't much different from the traditional scaling problem that web sites have to deal with today when they get a lot of traffic from regular readers.  


 

Dave Winer recently wrote

We added a link to a page of encoding examples for descriptions, under Elements of <item>. The change is also noted on the Change Notes page.

I was one of the people who gave feedback on making this clarification to the RSS 2.0 specification and I'm glad it made it in. Funnily enough, not even a week went by before I needed to forward the link to an RSS feed producer explaining how to properly escape the content in <description> elements. In this case it was the Microsoft Research RSS feeds. It's pretty clear that this clarification was needed if the folks at MSR didn't get it right the first time they took a shot at it.


 

I find it interesting how often developers reinvent the wheel because they look at a problem from only one perspective. Today I read a blog post by Sean Gephardt called RSS and syndication Ideas? in which he repeats two common misconceptions about RSS and syndication technologies. He wrote

What if I only want certain folks to has access to my RSS?

I could require the end user to signin to my site, then provide them access to my RSS feeds, but then they would be required to sign in everytime they tried to update thier view.

More specifically, how could a company track people that have subscribed to a particular RSS feed once they are viewing it in an aggregator? Obviously, if someone actually views the page referenced, then web site tracking applies, but some aggregators I've seen simply render the contents of the description, which if it contains a URL to somewhere, and the user clicks that link, the reader gets taken over to that URL, bypassing the orignal.

Since there is no security around RSS and aggregrators, and no way to prompt users for say, a Passport authentication, should RSS be used only for "public" information? Do you make people sign in once they try to access the “deeper” content? Do you keep the RSS content limited to help drive people to the “real“ content?

Am I missing something glaringly obvious?

Considering that fetching an RSS feed is simply fetching an XML document over the Web using HTTP, and that there are existing technologies for authenticating and encrypting HTTP requests, I'd have to say "Yes, you have missed something glaringly obvious, Sean". Not only can you authenticate and encrypt RSS feeds with the same mechanisms used by the rest of the World Wide Web, aggregators like RSS Bandit already support this functionality. In fact, here is a list of aggregators that support private RSS feeds.
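
To make the point concrete, here is a minimal sketch of a client fetching a password-protected feed over SSL with the .NET Framework; the URL and credentials are invented. Any web server can protect a feed this way today, exactly as it would protect any other resource.

    using System;
    using System.IO;
    using System.Net;

    class PrivateFeedFetch
    {
        static void Main()
        {
            HttpWebRequest request = (HttpWebRequest)
                WebRequest.Create("https://www.example.com/private/rss.xml");

            // Standard HTTP authentication (Basic or Digest, negotiated with the
            // server); SSL keeps both the credentials and the feed content encrypted.
            request.Credentials = new NetworkCredential("subscriber", "secret");

            using (WebResponse response = request.GetResponse())
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                Console.WriteLine(reader.ReadToEnd());
            }
        }
    }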

As for tracking the readership of content in RSS feeds, a number of tools such as dasBlog and .TEXT already support tracking such statistics using web bugs. One could also utilize alternate approaches if the feeds are private feeds, since one could then assign a separate URL to each user.

All of this is stuff that already works on today's World Wide Web when interacting with HTML and HTTP. It is interesting that some people think that once you swap out HTML for XML, entirely new approaches must be built from the ground up.

 


 

Mark Pilgrim has a blog post entitled how to make a linkblog in Atom which shows one technique for syndicating a list of links in an Atom feed. Unfortunately there is one problem with Mark's article: the technique it recommends violates the Atom 0.3 specification and generates an invalid feed.

There are two problem sections in Mark's article. In the first, How to link to an article, he writes

But what about the super-fascinating thing we're actually linking to? That goes in its own <link> element.

<link rel="related" type="text/html"
     href="http://home.introweb.nl/~dodger/itunesserver.html"
     title="Setting up an iTunes server in FreeBSD"/>

and in the section entitled How to credit people whose links you republish he writes

Simply put, a "via" link is a link back to where you found the link you're posting. In this example, I discovered the article on setting up a FreeBSD iTunes server via Jeffrey Veen, so let's give him some credit:

<link rel="via" type="text/html" href="http://www.veen.com/jeff/archives/000545.html" title="Jeffrey Veen"/>

The problem with both sections is that Mark uses values for the rel attribute that are not considered valid by the Atom 0.3 specification. Section 3.4.1 of the Atom specification states

3.4  Link Constructs

A Link construct is an element that MUST NOT have any child content, and has the following attributes:

3.4.1  "rel" Attribute

The "rel" attribute indicates the type of relationship that the link represents. Link constructs MUST have a rel attribute, whose value MUST be a string, and MUST be one of the values enumerated in the Atom API specification http://bitworking.org/projects/atom/draft-gregorio-09.html.

On navigating to the provided URL and reading Section 5.4.1 of the Atom API specification, which defines the valid values of the rel attribute of the link element, one finds the following list

5.4.1  rel

This attribute describes the relationship from the current document, be it HTML or Atom, to the anchor specified by the href attribute. The value of this attribute is a space-separated list of link types. Note that these values are case insensitive. With type="application/x.atom+xml" we have the following interpretations of the relations.

alternate
The URI in the href attribute points to an alternate representation of the containing resource.
start
The Atom feed at the URI supplied in the href attribute contains the first feed in a linear sequence of entries.
next
The Atom feed at the URI supplied in the href attribute contains the next N entries in a linear sequence of entries.
prev
The Atom feed at the URI supplied in the href attribute contains the previous N entries in a linear sequence of entries.
service.edit
The URI given in the href attribute is used to edit a representation of the referred resource.
service.post
The URI in the href attribute is used to create new resources.
service.feed
The URI given in the href attribute is a starting point for navigating content and services.

As can be seen, neither related nor via, which are used in Mark's article, is in the above list. I had expected the Feed Validator written by Mark Pilgrim and Sam Ruby to flag this error, but currently when one validates Mark's b-links feed it validates as Valid Atom. I have filed bug# 963354 in the Feed Validator's Bug Database about this issue. Hopefully this error will be resolved soon.
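
For contrast, the only value in the list above that fits an entry's HTML link is alternate; a conforming (if semantically weaker) version of Mark's first example would be

  <link rel="alternate" type="text/html"
       href="http://home.introweb.nl/~dodger/itunesserver.html"
       title="Setting up an iTunes server in FreeBSD"/>

which is exactly why Mark needed new rel values in the first place.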

On a final note, it is bad enough that we are going to have to deal with two versions of Atom in the wild (Atom 0.3 and whatever comes out of the standards process); it would be unfortunate to further fragment this by deploying intermediate versions of the format based on mailing list discussions. One of the benefits of Atom is supposed to be that it will usher in an era of rigorously defined specifications in the syndication space. That won't be worth much if people ignore the specifications and go their own way.