I've just finished the first draft of a specification for Synchronization of Information Aggregators using Markup (SIAM), which is the result of a couple of weeks of discussion between myself and a number of other authors of news aggregators. From the introduction:

A common problem for users of desktop information aggregators is that there is currently no way to synchronize the state of information aggregators used on different machines in the same way that can be done with email clients today. The most common occurrence of this is a user who uses an information aggregator at home and at work or at school and who'd like to keep the state of each aggregator synchronized, independent of whether the same aggregator is used on both machines.

The purpose of this specification is to define an XML format that can be used to describe the state of an information aggregator, which can then be used to synchronize another information aggregator instance to the same state. The "state" of an information aggregator includes information such as which feeds are currently subscribed to by the user and which news items have been read by the user.

This specification assumes that an information aggregator is software that consumes an XML syndication feed in one of the following formats: ATOM, [RSS0.91], [RSS1.0] or [RSS2.0]. If more syndication formats gain prominence then this specification will be updated to take them into account.

This final draft owes a lot of its polish to comments from Luke Hutteman (author of SharpReader), Brent Simmons (author of NetNewsWire) and Kevin Hemenway aka Morbus Iff (author of AmphetaDesk). There are no implementations out there yet, although once enough feedback has been gathered about the current spec I'll definitely add this to RSS Bandit and deprecate the existing mechanisms for subscription harmonization.

Brent Simmons has a post entitled The challenges of synching, which highlights some of the various issues that came up in our discussions.


Categories: Technology | XML
Tracked by:
http://www.rendelmann.info/blog/PermaLink.aspx?guid=f971b2a4-2488-479f-93a0-6005... [Pingback]

Tuesday, 06 January 2004 05:24:20 (GMT Standard Time, UTC+00:00)
Um.. I grok the problem you're trying to address, but why can't this be done using Atom? It would need just a few relatively minor tweaks, maybe a couple of new element types, but that's about all. I don't see the need for a brand new format.
Tuesday, 06 January 2004 06:14:10 (GMT Standard Time, UTC+00:00)
It doesn't use ATOM for the same reason it doesn't use RSS or the MetaWeblog API, they are solving different problems.
Tuesday, 06 January 2004 07:42:55 (GMT Standard Time, UTC+00:00)
First, THANK YOU! I've been waiting for something like this for a while.

- The title hash must be of the title encoded in UTF-8. Does that include the UTF-8 marker (U+FEFF)?
- Can the preferred synchronization endpoint URI be included in the schema? This makes sense to me: I could reconfigure my aggregator completely with a single file (or perhaps this file plus an OPML file).
- Speaking of which, would extensions to OPML be a better solution? Obviously it isn't usually used at the item level.
- I suggest that there be at least one more built-in "status" value for items: "marked" or "flagged". Most aggregators have ways of flagging items for preservation or call-out, and I think synching that state is important for those who use those features.
Tuesday, 06 January 2004 13:52:35 (GMT Standard Time, UTC+00:00)
- titleHash doesn't include the UTF-8 marker.
- this is a nice idea but I'm wary of putting such information in the format for the cases where a user may have multiple synchronization end points. I'll see what the other aggregator authors think.
- we considered and discarded extensions to OPML. We would have had to extend OPML significantly; in fact I wrote a draft spec which used extensions to OPML, but we felt it was trying to fit a square peg in a round hole.
- the problem with "flagged" is that it is orthogonal to "read" and "unread", in which case the value of that attribute starts looking like it should be a list.
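
The titleHash answer above can be sketched as follows. This is only an illustration: the hash algorithm is not named in this thread, so MD5 is an assumption here, and the function name `title_hash` is hypothetical.

```python
import codecs
import hashlib


def title_hash(title: str) -> str:
    """Hash a title's UTF-8 bytes, leaving out any byte order mark.

    MD5 is an assumption made for illustration; the thread does not say
    which hash algorithm the spec uses.
    """
    data = title.encode("utf-8")  # Python's "utf-8" codec never adds a BOM
    # Defensive: drop the marker if the title text itself began with U+FEFF.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return hashlib.md5(data).hexdigest()


# A title with and without a leading U+FEFF hashes to the same value.
same = title_hash("\ufeffThe challenges of synching") == title_hash("The challenges of synching")
```

Stripping the marker before hashing means two aggregators agree on the hash even if one of them read the title from a file that started with a BOM.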
Tuesday, 06 January 2004 18:39:03 (GMT Standard Time, UTC+00:00)
While it may seem obvious, you may want to explicitly state how I tell if two items refer to the same thing. For example, it can't just be a straight string match on the URIs, as the host name part of the URI is case insensitive. [Also, as one of the Syndirella devs, how do I join the mailing list? I tried but got denied.]
Tuesday, 06 January 2004 19:19:03 (GMT Standard Time, UTC+00:00)
To figure out if two items are identical you compare the values of all their attributes and they must have the same attributes. Considering that there is no normative way to compare 2 URIs (see http://www.textuality.com/tag/uri-comp-4 for details) I don't think anything beyond a string match will end up being specified. I'll explicitly specify this in the spec.
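
A minimal sketch of that comparison rule, assuming each item is held as a mapping of attribute names to string values. The attribute name `link` and the example values are hypothetical; only `titleHash` is mentioned in this thread.

```python
from typing import Mapping


def same_item(a: Mapping[str, str], b: Mapping[str, str]) -> bool:
    """Items match only if they have the same attributes with identical values.

    Values are compared as plain strings, so no URI normalization is applied.
    """
    return dict(a) == dict(b)


item = {"link": "http://example.com/post/1", "titleHash": "abc123"}
other_case = {"link": "http://EXAMPLE.com/post/1", "titleHash": "abc123"}

exact = same_item(item, dict(item))       # identical attributes and values
host_case = same_item(item, other_case)   # host differs only in letter case
```

Under a plain string match the second comparison fails even though both URIs name the same host, which is the trade-off the reply accepts.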

The informal process for getting invited to join the mailing list is for someone to suggest an invitee and for another to concur with the suggested invitee. I'm leery of having non-developers of news aggregators on the list or people who're merely "interested" in aggregator technology as opposed to actual developers. From a little bit of Googling it doesn't seem that there is anything that ties you to Syndirella development besides the fact that you've made some changes to a personal copy.
Tuesday, 06 January 2004 23:34:41 (GMT Standard Time, UTC+00:00)
See http://sourceforge.net/project/memberlist.php?group_id=81544 which is the developers list for Syndirella on SourceForge.
Wednesday, 07 January 2004 11:13:40 (GMT Standard Time, UTC+00:00)
Hello! I found out about this from Luke Hutteman's blog, and couldn't believe it.
The RFC seems very cool to me, I'm going to study it more deeply as soon as I have time.

Your idea is interesting to me because I'm writing a site/application in .NET to share and synch one's OPMLs.
Now I'm going to redesign it, in the wake of SIAM, even if it's not clear to me if it will support the definitive version of SIAM or will be totally rewritten to be BASED ON it.
I think it will depend mainly on the spreading of the technology among aggregators.

I'll keep checking the spec for updates, and will notify you if and when the OPML share site is operational.

Please do not hesitate to drop me a mail for whatever info you may need.
Wednesday, 07 January 2004 21:17:52 (GMT Standard Time, UTC+00:00)
It seems to me that some parts of this specification could be repurposed to allow a blog to list the current entries in a more compact manner than normal RDF. This would allow an aware aggregator to check the compact list and compare that to what was already brought down (to see if there is anything new) before downloading a more bandwidth heavy RSS or Atom feed. A simple extension to this specification would allow an aware aggregator to *only* download the new/changed items instead of a 'full' feed.

Considering the issues of the bandwidth costs of aggregation that have been discussed lately, this might be a good thing to work into the specification.
Jack William Bell
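
The check-before-download idea in this comment could look roughly like the sketch below. It assumes the blog publishes a compact list of item identifiers and that the aggregator remembers the identifiers it already has; the function name and the example identifiers are hypothetical, and nothing like this is part of the spec as described in the post.

```python
from typing import Iterable, List


def items_to_fetch(compact_ids: Iterable[str], local_ids: Iterable[str]) -> List[str]:
    """Return the ids from the compact list that are not stored locally.

    The order of the compact list is preserved.
    """
    seen = set(local_ids)
    return [item_id for item_id in compact_ids if item_id not in seen]


published = ["post-3", "post-2", "post-1"]   # compact list from the blog
downloaded = ["post-1", "post-2"]            # what the aggregator already has
missing = items_to_fetch(published, downloaded)
```

If `missing` is empty the aggregator can skip the full feed download entirely.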
Wednesday, 07 January 2004 22:01:46 (GMT Standard Time, UTC+00:00)
I had no idea this RFC existed. I guess obscurity was your strategy? Hehe. I'll respond by immediately adding a synching capability to feeds.scripting.com. I'll do it tonight. Let's have fun!!
Dave Winer
Wednesday, 07 January 2004 23:47:50 (GMT Standard Time, UTC+00:00)
Further note: Categories in this context make no sense. You might have a feed in a different category in your work aggregator instance than you do in your home aggregator instance. All you really need to know (in order to transfer the information on what has been read between two different aggregators) is the feed and item descriptors.
Jack William Bell
Thursday, 08 January 2004 22:13:42 (GMT Standard Time, UTC+00:00)
Jack: the purpose of SIAM is not just to synchronize the read/unread status of your items, but also to synchronize your subscriptions list. If you add a new feed in your aggregator at work and subsequently synch it at home, your home-aggregator should show this new feed in the same category you put it in at work.
Thursday, 08 January 2004 23:43:29 (GMT Standard Time, UTC+00:00)
OK, I'll buy that SIAM can also synchronize a subscription list. But...

* First there is an existing 'standard' for this; OPML. It ain't perfect, but it is supported by lots of aggregation tools. Synchronizing read/unread status is something for which no current standard exists.

* Second (and this is also a problem with OPML, so perhaps this should be kicked up to the application layer to deal with), how about situations where you are already subscribed to a feed under a different category?
Jack William Bell
Friday, 09 January 2004 03:00:17 (GMT Standard Time, UTC+00:00)
SIAM is expected to be used to synchronize the state of an aggregator in much the same way you use IMAP with email clients today. Just as your mail doesn't show up in different folders depending on your email client or which machine you access mail from, the category of a subscription isn't meant to change depending on which client you read the SIAM document from.
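
As a rough illustration of that IMAP-like behaviour, the sketch below applies the categories from a synchronized state to a local subscription list. It assumes subscriptions are held as a mapping from feed URL to category; the function name and the feed URLs are hypothetical, and keeping feeds that exist only locally is an assumption, not something stated here.

```python
from typing import Dict


def apply_synced_categories(local: Dict[str, str], synced: Dict[str, str]) -> Dict[str, str]:
    """Return feed URL -> category after applying the synchronized state.

    Where both sides list a feed, the synchronized category wins.
    Feeds known only locally are kept unchanged (an assumption).
    """
    merged = dict(local)
    merged.update(synced)
    return merged


home = {"http://example.com/a.xml": "Misc", "http://example.com/b.xml": "News"}
work = {"http://example.com/a.xml": "XML", "http://example.com/c.xml": "Blogs"}
result = apply_synced_categories(home, work)
```

Here the feed filed under "Misc" at home ends up under "XML", matching the category it has in the synchronized document.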

Wednesday, 14 January 2004 11:10:20 (GMT Standard Time, UTC+00:00)
A crazy idea: use an IMAP server for synchronization. I have used IMAP since 1997 and have never had such problems; my mailbox at home is always the same as at work. Did you use IMAP concepts in your protocol?
Monday, 19 January 2004 17:05:47 (GMT Standard Time, UTC+00:00)
Could this be used to mark an item as read even if it appears in multiple feeds? Think of how when one is reading NNTP items, the item is marked as read in all groups, not just the first one you happen to read it in. Maybe this requires some sort of GUID be inserted into RSS or ATOM; I don't know.

Scott Mace