Real-time, Distributed Conversations: Some Thoughts on the Salmon Protocol

November 2, 2009

@ 02:50 PM

Last week John Panzer, who works on Blogger at Google, wrote about some of the work he’s been doing on creating a protocol for syndicating comments associated with activity streams in his post The Salmon Protocol: Introducing the Salmon Project. Key parts of his post are excerpted below

A few days ago, at the Real Time Web Summit, we had a session about Salmon, a protocol for re-aggregated distributed conversations around web content. I was hoping for some feedback and to generate some interest, and I was overwhelmed by the positive reactions, especially after Louis Gray's post "Proposed Salmon Protocol aims to unify Conversations on the Web". Adina Levin's "Salmon - Re-assembling distributed conversations" is a good, insightful review as well. There's clearly a great deal of interest in this, and so I've gone ahead and expanded Salmon's home at salmon-protocol.org with an open source project, salmon-protocol.googlecode.com, and a mailing list, groups.google.com/group/salmon-protocol.

Louis Gray’s post on the topic includes an embedded presentation which captures the essence of the protocol

Before talking about the technical details of the protocol it is a good idea to understand the end user problem the protocol solves. For me, it solves a problem I have in the way that RSS Bandit integrates with Facebook. The problem is that although there is a way to get regular updates on changes to the user’s news feed by polling Facebook’s stream and getting data back in the Activity Stream format there isn’t a mechanism today to get updates on the comments on items in the feed. What it means in practice today is that once an item rolls off of the news feed, there is no way to keep the comments up to date in RSS Bandit.

The Salmon Protocol aims to address this problem by piggybacking on PubSubHubBub as a way for applications to get real-time updates on comments on items in an activity stream not just updates on new activities.

There have also been several mentions of Salmon being a way to aggregate distributed conversations on an item (e.g. this blog post is syndicated to FriendFeed and there are comments there as well as in the comments on my blog) but I am less clear on those scenarios or whether Salmon is enough to solve the various tough problems that need to be solved to make that work end to end.

Any API for posting comments to a site needs to solve two problems; identity and dealing with comment spam. I decided to take a look at the Salmon Protocol Summary to see how it addresses these problems.

The meat of the Salmon Protocol format is excerpted below

A source provides an RSS/Atom feed of content. It includes a Salmon link in its feed:

<link rel="salmon" href="http://example.org/salmon-endpoint"/>

An aggregator reads the feed (ideally via a push mechanism such as PubSubHubbub), and sees from the link that it is Salmon-enabled. It remembers the endpoint URL for later use.

When an aggregator's user leaves a comment on a feed item, the aggregator stores the comment as usual, and then also POSTs a salmon version of it to the source's Salmon endpoint:

POST /salmon-endpoint HTTP/1.1

Host: example.org

Content-Type: application/atom+xml

<?xml version='1.0' encoding='UTF-8'?>

    <entry xmlns='http://www.w3.org/2005/Atom'>

    <author>

      <name>John Doe</name>

      <uri>acct:johndoe@aggregator-example.com</uri>

    </author>

    <content>Yes, but what about the llamas?</content>

    <id>tag:aggregator-example.com,2009:cmt-441071406174557701</id>

    <updated>2009-09-28T18:30:02Z</updated>

    <thr:in-reply-to xmlns:thr='http://purl.org/syndication/thread/1.0'

       ref='tag:example.org,1999:id-22717401685551851865'/>

    <sal:signature xmlns:sal='http://salmonprotocol.org/ns/1.0'>

        e55bee08b4c643bc8aedf122f606f804269b7bc7

    </sal:signature>

    <title/>

</entry>

The commenter is identified in the published comment using the atom:uri element. How this author is authenticated in situations outside of public comments on a blog such as RSS Bandit posting a comment to Facebook on my behalf isn’t really discussed. I noticed an offhand reference to OAuth headers which seems to imply that the publishing application should also be sending authentication headers as well when publishing the comment. How these authentication headers would flow through the systems involved is unclear to me especially given the approach Salmon has taken to deal with spam prevention.

The workflow for dealing with spam comments is described as follows

A major concern with this type of distributed protocol is how to prevent spam and abuse. Salmon provides building blocks to allow in-depth defense against attacks. Specifically, every salmon has a verifiable author and user agent. The basic security flow when salmon swims upstream looks like this:

aggregator-example.com: "Here is a salmon, authored and signed by 'acct:johndoe@aggregator-example.com'; please accept it."

Recipient: "I know that this is really aggregator-example.com due to its OAuth headers, and it has a good reputatation, but I do not trust it completely; I will do a double check."

Recipient: Uses Webfinger/XRD to discover salmon validation service for acct:johndoe@aggregator-example.com, which turns out to be hosted by aggregator-example.com.

Recipient: "Given that johndoe has delegated Salmon validation to aggregator-example, and I know I'm talking to aggregator-example already, I'll skip the actual check." (Returns HTTP 200 to aggregator-example.com)

The flow can get more complicated, especially if the aggregator is not also providing identity services for the user. In the most general case, the recipient needs to take the salmon, discover a salmon validator service for the author via XRD discovery on the author's URI, and POST the salmon to the validator service. The validator service does an integrity / signature check against the salmon and returns 200 if the salmon checks out, 400 if not. The signature check means that the given author (johndoe in this case) signed the salmon with the given id, parent id, and timestamp. It does not attempt to do a full, XML-DSig style verification, though such a service is another reasonable extension.

This flow seems weird and it is unclear to me that it actually solves the problems involved in distributed commenting. So let’s say I post a comment to Facebook from RSS Bandit, in step 3 above they are now supposed to use WebFinger to lookup my email address provider and determine which service I use for digitally signing comments. Then they ask it if the comment looks like it was from me.

Hmm, this looks like a user authentication workflow in disguise as a comment validation workflow. Shouldn’t the service receiving the comment (i.e. Facebook) in the example above be responsible for validating my identity not some third party service? Maybe this protocol wasn’t meant for sites like Facebook?

Let’s say this protocol is really meant for situations when the comment recipient doesn’t intend to be the sole identity provider such as commenting on Robert Scoble's blog where he allows comments from anyone with just an email address and an optional web page URL as identifiers. So each commenter needs to provide an email address on an email service provider that supports WebFinger and validates digital signatures in the specific situation related to the Salmon protocol? Sounds like boiling the ocean. I wonder why this can’t work with OpenID validation or some other authentication protocol that has already been validated by developers and is seeing some adoption?

At the end of the day, I think the problem Salmon attempts to solve is one that needs solving as activity streams become a more popular and intrinsic feature across the Web. However in its current form it’s hard for me to see how it actually solves the real problems that exist today in a practical way.

Of course, this may just be my misunderstanding of the protocol documents currently published and I look forward to being corrected by one of the protocol gurus if that is the case.

Note Now Playing: Chris Brown - I Can Transform Ya (feat. Lil Wayne) Note

Categories: Social Software | Syndication Technology

« Keeping the Stream Pure: FriendFeed vs. ... | Home | New MSN Homepage with Activity Streams f... »

Monday, 02 November 2009 22:05:17 (GMT Standard Time, UTC+00:00)

Yeah, I think they went overboard with identity/spam handling. IMO the natural place to deal with those issues is at the point-of-entry (POE), site or client through which comment is submitted. POE injects what it knows of the commenter, signing where it needs to, then forward the tagged salmon up the stream. As it moves up stream, further tags are added as needed to reduce scope of validation at destination.

Don Park

Tuesday, 03 November 2009 17:53:37 (GMT Standard Time, UTC+00:00)

Dare, thanks for the post. I'll re-read this more carefully because I think you're raising good points that deserve in-depth consideration and replies. Note: I've tried to bang existing specs together to make them work for all of the requirements we have, and have concluded that at least some invention is needed. If not, nobody would be happier then I!

John Panzer

Comments are closed.

Dare Obasanjo's weblog

"You can buy cars but you can't buy respect in the hood" - Curtis Jackson

Navigation for Real-time, Distributed Conversations: Some Thoughts on the Salmon Protocol - Dare Obasanjo's weblog