November 26, 2004
@ 08:03 PM

Several months ago I wrote a draft spec entitled Synchronization of Information Aggregators using Markup (SIAM) which was aimed at providing a lightweight mechanism for aggregators to synchronize state across multiple machines. There was a flurry of discussion about this between myself and a number of other aggregator authors on the [now defunct] information_aggregators mailing list.

Although there was some interest among aggregator authors, there wasn't much incentive to implement the functionality, for a number of reasons. These range from the fact that it makes it easier for users to migrate between aggregators, which payware aggregator authors aren't enthusiastic about, to the fact that there was no server-side infrastructure for supporting such functionality. Ideally this feature would have been supported by a web service end point exposed by a person's weblog or online aggregator. So not much came of it.

Since then I've implemented syncing in RSS Bandit in an application-specific manner. So have the authors of Shrook and NewsGator. There is also the Bloglines Sync API, which provides a web service end point for doing limited syncing of feed state, and whose limitations I pointed out in my post Thoughts on the Bloglines Sync API.

This post is primarily meant to answer questions asked by Dmitry Jemerov, the creator of Syndirella who is now the dev lead for JetBrains' Omea Reader.


 

November 26, 2004
@ 05:52 PM

This morning I updated the RSS Bandit road map. Mostly we punted a bunch of low priority features to a future release codenamed 'Nightcrawler'. We also took one or two features out of the 'Wolverine' release. The current plan is to ship a beta of 'Wolverine' next month with the big features being the ability to delete items, newspaper views that are 100% compatible with FeedDemon Newspapers and being able to read & post to USENET newsgroups. Afterwards a final version will show up at the end of January or early in February at the latest.

There are also dozens of bug fixes in the 'Wolverine' release. Thanks a lot to the numerous users who took the time to submit bug reports and have been patiently waiting for a release with their fixes. We are now approaching the end game. A lot of the hard work has been done in the underlying infrastructure, and the work that is left is fairly straightforward.


 

Categories: RSS Bandit

November 26, 2004
@ 03:58 PM

Thanks to my sister coming over for Thanksgiving I find out that G-Unit will be performing in concert in Nigeria. The interesting bit for me is that since my mom works for a TV station back home she'll get to meet 50 Cent and crew personally. While talking to her on the phone this morning it took all my restraint to not ask her to get an autograph for me.

I felt like I was twelve years old. :)


 

Categories: Ramblings

November 23, 2004
@ 04:17 AM

For the past few years I've used the citation search feature of CiteSeer to look for references to papers or articles I'd written and could only come up with one: Concurrency And Computation: Practice And Experience. Running the same search on Google Scholar comes back with 12 papers which reference articles or papers I've written.

As I expected, my C# vs. Java comparison was my most referenced article. Speaking of which, it looks like it is about time I got cranking on updating the document to take into account Tiger and Whidbey. All I need now is some Java expert [preferably a Sun employee] to agree to review it from the Java perspective.

I am definitely curious as to how Google could come up with a more extensive database of published papers than CiteSeer. Interesting.


 

Categories: Ramblings

I was procrastinating this morning from doing any real work and stumbled on a feature request in the RSS Bandit feature request database on SourceForge requesting that we support adding links to http://del.icio.us from RSS Bandit. For those who are unfamiliar with the site, del.icio.us is a social bookmarks manager. It allows you to easily add sites you like to your personal collection of links, to categorize those sites with keywords, and to share your collection not only between your own browsers and machines, but also with others.

Since RSS Bandit supports the IBlogExtension plugin interface I thought it would make more sense to implement this as a plugin that should just work in any .NET Framework based RSS aggregator that supports it. It took about two hours to put it together and now you can simply drop the DeliciousPlugin.dll file in the plugin folder of RSS Bandit, SharpReader or NewsGator and get the ability to post item links to http://del.icio.us.  

Download it from here: DeliciousPlugin.zip

Configuration
The first thing you have to do is configure the plugin by specifying your username and password. There is also the option of changing the del.icio.us API URL, which isn't needed at the current time. The only reason it is there is that the del.icio.us API documentation states that the URL will change in the near future. The screenshots below should show what this looks like

and this is the configuration dialog

Posting Links
The dialog box for posting enables you to edit the URL, description and associated tags before submitting to the site. If any of these fields isn't filled then this is considered an error and no submission is made. Below is a screenshot of the post dialog.

Known Issues
There seems to be a problem posting URLs that contain the '#' character. The website accepts the links without error but they don't show up in your inbox. I'd appreciate any pointers from anyone who can tell me what I did wrong.
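
One thing worth checking, offered as a guess rather than a confirmed diagnosis: if the link is placed into the request's query string without being percent-encoded, everything from the '#' onwards is treated as a URL fragment, and fragments are never sent to the server. The Python sketch below illustrates that effect only. The endpoint and the parameter names (`url`, `description`) are placeholders for illustration, not the real del.icio.us API, and the plugin itself is .NET code, not Python.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint and parameter names, for illustration only.
API_URL = "http://example.invalid/api/posts/add"


def build_naive(link, description):
    # Plain string concatenation: a '#' in the link starts a URL fragment.
    return API_URL + "?url=" + link + "&description=" + description


def build_escaped(link, description):
    # urlencode percent-encodes '#' as %23, so it stays inside the query.
    return API_URL + "?" + urlencode({"url": link, "description": description})


def url_seen_by_server(request_url):
    # The fragment is not transmitted, so only the query part is parsed here.
    query = urlsplit(request_url).query
    return parse_qs(query).get("url", [None])[0]


link = "http://example.com/page#section"
naive_result = url_seen_by_server(build_naive(link, "demo"))
escaped_result = url_seen_by_server(build_escaped(link, "demo"))
```

In the naive case the server would see a shortened link (and no description at all), which would be accepted without error but would not match the link that was submitted. Whether this is what actually happens with the plugin is an open question.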


 

Categories: RSS Bandit

I just noticed the eWeek article MSN Hotmail Nears Storage Finish Line which reports

Microsoft Corp.'s Internet division on Thursday started offering 250 MB of storage to new users of free Hotmail accounts in the United States and eight other countries. New accounts previously received 2MB of storage. As for current Hotmail users, the majority has gained the added storage and the rest will be upgraded over the next few weeks, said Brooke Richardson, MSN lead product manager. Hotmail has about 187 million customers worldwide.
...
New Hotmail users will get the storage in two steps. They first will receive 25MB of e-mail storage as MSN verifies that the accounts are for legitimate senders of e-mail and not spammers, Richardson said. After 30 days, they will gain the full 250MB of storage.

The increased storage also comes with an increase in the maximum attachment size to 10MB for free accounts.

A new photo-sharing feature in Hotmail lets users browse thumbnails of digital images and include multiple photos in an e-mail with one click, Richardson said. The feature also compressed the image files.

The article doesn't mention the eight other countries where the large Hotmail inbox feature has been deployed. They are the U.K., Australia, Canada, France, Germany, Italy, Japan, and Spain.

I am curious as to how much of a deterrent the 30 day waiting period will be to spammers. You'd think that using CAPTCHA technologies to prevent automated sign ups would get rid of most spammers but it seems like they are still a problem.


 

Categories: Mindless Link Propagation | MSN

Adam Bosworth has posted his ISCOC04 talk on his weblog. The post is interesting although I disagreed with various bits and pieces of it. Below are some comments in response to various parts of his talk.

On the one hand we have RSS 2.0 or Atom. The documents that are based on these formats are growing like a bay weed. Nobody really cares which one is used because they are largely interoperable. Both are essentially lists of links to content with interesting associated metadata. Both enable a model for capturing reputation, filtering, stand-off annotation, and so on. There was an abortive attempt to impose a rich abstract analytic formality on this community under the aegis of RDF and RSS 1.0. It failed. It failed because it was really too abstract, too formal, and altogether too hard to be useful to the shock troops just trying to get the job done. Instead RSS 2.0 and Atom have prevailed and are used these days to put together talk shows and play lists (podcasting) photo albums (Flickr), schedules for events, lists of interesting content, news, shopping specials, and so on. There is a killer app for it, Blogreaders/RSS Viewers.

Although it is clear that RSS 2.0 seems to be edging out RSS 1.0, I wouldn't say RSS 1.0 has failed per se. I definitely wouldn't say it failed for being too formal and abstract. In my opinion it failed because it was more complex with no tangible benefit. This is the same reason XHTML has failed when compared to HTML. This doesn't necessarily mean that more rigid systems will fail to take hold when compared to less rigid systems. If that were so, we'd never have seen the shift from C to C++ and then from C++ to C#/Java.

Secondly, it seems Adam is throwing out some Google spin here by trying to lump the nascent and currently in-progress Atom format in the same group as RSS 2.0. In fact, if not for Google jumping on the Atom bandwagon it would be even more of an intellectual curiosity than RSS 1.0.

As I said earlier, I remember listening many years ago to someone saying contemptuously that HTML would never succeed because it was so primitive. It succeeded, of course, precisely because it was so primitive. Today, I listen to the same people at the same companies say that XML over HTTP can never succeed because it is so primitive. Only with SOAP and SCHEMA and so on can it succeed. But the real magic in XML is that it is self-describing. The RDF guys never got this because they were looking for something that has never been delivered, namely universal truth. Saying that XML couldn't succeed because the semantics weren't known is like saying that Relational Databases couldn't succeed because the semantics weren't known or Text Search cannot succeed for the same reason. But there is a germ of truth in this assertion. It was and is hard to tell anything about the XML in a universal way. It is why Infopath has had to jump through so many contorted hoops to enable easy editing. By contrast, the RSS model is easy with an almost arbitrary set of known properties for an item in a list such as the name, the description, the link, and mime type and size if it is an enclosure. As with HTML, there is just enough information to be useful. Like HTML, it can be extended when necessary, but most people do it judiciously. Thus Blogreaders and aggregators can effortlessly show the content and understanding that the value is in the information. Oh yes, there is one other difference between Blogreaders and Infopath. They are free. They understand that the value is in the content, not the device.

Lots of stuff to agree with and disagree with here. Taking it from the top, the assertion that XML is self-describing is a myth. XML is a way to attach labels to islands of data, the labels are only useful if you know what they mean. Where XML shines is that one can start with a limited set of labels that are widely understood (title, link, description) but attach data with labels that are less likely to be understood (wfw:commentRss, annotate:reference, ent:cloud) without harming the system. My recent talk at XML 2004, Designing XML Formats: Versioning vs. Extensibility, was on the importance of this and how to bring this flexibility to the straitjacketed world of XML Schema.
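
As a rough illustration of that last point, here is a minimal Python sketch of a consumer that only knows the three core labels and simply skips anything else. The sample item, the set of known labels and the function name are all made up for the example; it is not how any particular aggregator is implemented.

```python
import xml.etree.ElementTree as ET

# A made-up item: three widely understood labels plus one extension element.
FEED_ITEM = """
<item xmlns:wfw="http://wellformedweb.org/CommentAPI/">
  <title>Sample post</title>
  <link>http://example.com/post/1</link>
  <description>A short summary.</description>
  <wfw:commentRss>http://example.com/post/1/comments.xml</wfw:commentRss>
</item>
"""

# The labels this (hypothetical) consumer knows how to interpret.
KNOWN_LABELS = {"title", "link", "description"}


def read_item(xml_text):
    """Split an item's children into understood fields and skipped extensions."""
    item = ET.fromstring(xml_text.strip())
    understood, ignored = {}, []
    for child in item:
        # ElementTree reports namespaced tags as '{namespace-uri}local-name'.
        if child.tag in KNOWN_LABELS:
            understood[child.tag] = (child.text or "").strip()
        else:
            ignored.append(child.tag)
    return understood, ignored


understood, ignored = read_item(FEED_ITEM)
```

The consumer still gets the title, link and description it cares about, and the wfw:commentRss element just ends up in the ignored list instead of breaking anything. A smarter consumer could start interpreting that label later without the format having to change.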

I also wonder who the people are that claim that XML over HTTP will never succeed. XML over HTTP already has in a lot of settings. However I'd question that it is all you need. The richer the set of interactions allowed by the web site the more an API is needed. Google, Amazon and eBay all have XML-based APIs. Every major blogging tool has an XML-based API even though those same tools are using vanilla XML over HTTP for serving RSS feeds. XML over HTTP can succeed in a lot of settings but as the richness of the interaction between client and server grows so also does the need for a more powerful infrastructure.

The issue is knowing how to pick the right tool for the job. You don't need the complexity of the entire WS-* stack to build a working system. I know a number of people at Microsoft realize that this message needs to get out more, which is why you've begun to see things like Don Box's WS-Why Talk and the WS Kernel.

What has been new is information overload. Email long ago became a curse. Blogreaders only exacerbate the problem. I can't even imagine the video or audio equivalent because it will be so much harder to filter through. What will be new is people coming together to rate, to review, to discuss, to analyze, and to provide 100,000 Zagat's, models of trust for information, for goods, and for services. Who gives the best buzz cut in Flushing? We see it already in eBay. We see it in the importance of the number of deals and the ratings for people selling used books on Amazon. As I said in my blog, My mother never complains that she needs a better client for Amazon. Instead, her interest is in better community tools, better book lists, easier ways to see the book lists, more trust in the reviewers, librarian discussions since she is a librarian, and so on.
This is what will be new. In fact it already is. You want to see the future. Don't look at Longhorn. Look at Slashdot. 500,000 nerds coming together everyday just to manage information overload. Look at BlogLines. What will be the big enabler? Will it be Attention.XML as Steve Gillmor and Dave Sifry hope? Or something else less formal and more organic? It doesn't matter. The currency of reputation and judgment is the answer to the tragedy of the commons and it will find a way. This is where the action will be. Learning Avalon or Swing isn't going to matter. Machine learning and inference and data mining will. For the first time since computers came along, AI is the mainstream.

I tend to agree with most of this although I'm unsure why he feels the need to knock Longhorn and Java. What he seems to be overlooking is that part of the information overload problem is the prevalence of poor data visualization and user interface metaphors for dealing with significant amounts of data. I now believe that one of the biggest mistakes I made in the initial design of RSS Bandit was modelling it after mail readers like Outlook even though I knew lots of people who had difficulty managing the flood of email they get using them. This is why the next version of RSS Bandit will borrow a leaf from FeedDemon along with some other tricks I have up my sleeve.

A lot of what I do in RSS Bandit is made easy due to the fact that it's built on the .NET Framework and not C++/MFC so I wouldn't be as quick to knock next generation GUI frameworks as Adam is. Of course, now that he works for a Web company the browser is king.


 

Categories: Syndication Technology | XML

November 19, 2004
@ 08:33 AM

My XML in the .NET Framework: Past, Present & Future talk went well yesterday. The room was full and people seemed to like what they heard. The audience was most enamored with the upcoming System.Xml.Schema.XmlSchemaInference class that provides the ability to generate schemas from sample documents and the new XSLT debugger.

It was nice having people walk up to me yesterday to tell me how much they liked my talk from the previous day. There were even a couple of RSS Bandit users who walked up to me to tell me how much they liked it. This was definitely my best XML conference experience.

Arpan did comment on the irony of me giving more talks about XML after leaving the XML team at Microsoft than when I was on the team. :)


 

Categories: Ramblings | XML

November 18, 2004
@ 07:12 PM

My XML 2004 talk, Designing XML Formats: Versioning vs. Extensibility, went over well yesterday. Lots of interesting questions were asked during the Q&A session for my talk and the following talk by Dave Orchard, Achieving Distributed Extensibility and Versioning.

One issue that came up during the discussions after our talk was the cost/benefit of using a mustUnderstand construct in an XML format similar to the SOAP mustUnderstand attribute. The primary benefit of having such a construct is that it enables third parties to create mandatory extensions to an XML format. However, there are a number of costs to having such a construct:

  1. Entire Element or Document Must Be Read: A processor that just wants to extract a subset of the data in the document still has to parse the entire document and see if there are any mustUnderstand constructs before it can process the document. This increases the cost of processing instances of the format.
  2. Ambiguity as to what is Meant by 'Understand': The concept of what it means to "understand" an XML vocabulary is context specific. For example, should a stylesheet that pretty prints an XML document fail because the format contains a mustUnderstand construct that is not explicitly handled by the stylesheet? A mustUnderstand construct is particularly limiting since it forces all consumers to fail even though there may be some consumers that can still use the format even if they don't explicitly understand certain elements or attributes in the document.
  3. Causes Confusion for Intermediaries: In certain cases, a format may be processed by an intermediary on the way to the client from the server. For example, HTTP requests often pass through proxy servers and there are also web-based aggregators of RSS/Atom feeds such as Feedster & PubSub which can then be subscribed to by other aggregators. In such cases, it is ambiguous whether intermediaries are expected to fail if a construct which isn't explicitly handled is labelled as mustUnderstand or whether they are expected to pass it on with that label to third party aggregators. In fact certain formats thus have separate mustUnderstand constructs for hop-to-hop versus end-to-end transmission.
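
To make the first two costs concrete, here is a minimal Python sketch of a consumer that only wants the titles from a document but still has to walk every element looking for mandatory extensions first. The namespaces, the `ext:mustUnderstand` attribute, the sample document and the function names are all hypothetical. This is not SOAP's actual processing model, just an illustration of the general shape of the problem.

```python
import xml.etree.ElementTree as ET

# Hypothetical extension namespace and attribute name, for illustration only.
EXT_NS = "http://example.com/ext"
MUST_UNDERSTAND = "{%s}mustUnderstand" % EXT_NS

DOC = """
<feed xmlns:ext="http://example.com/ext" xmlns:pay="http://example.com/pay">
  <title>Example feed</title>
  <entry>
    <title>First entry</title>
    <pay:license ext:mustUnderstand="true">members-only</pay:license>
  </entry>
</feed>
"""


class NotUnderstood(Exception):
    """Raised when a mandatory extension is not handled by this consumer."""


def extract_titles(xml_text, understood_tags):
    root = ET.fromstring(xml_text.strip())
    # Every element has to be visited before any data can be used,
    # even though this consumer only cares about the titles.
    for element in root.iter():
        flagged = element.get(MUST_UNDERSTAND) == "true"
        if flagged and element.tag not in understood_tags:
            raise NotUnderstood(element.tag)
    return [title.text for title in root.iter("title")]


try:
    titles = extract_titles(DOC, understood_tags={"feed", "entry", "title"})
    rejected = None
except NotUnderstood as err:
    titles = None
    rejected = str(err)
```

The title extraction would have worked perfectly well on this document, yet the consumer has to reject it because of an element it never intended to look at. Only a consumer that lists the pay:license element among its understood tags gets the titles back.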

From my perspective, the cost of having a mustUnderstand construct is often not worth the benefits provided. This wasn't explicitly in my talk but is a conclusion I came to recently which I expanded upon during the Q&A session.


 

Categories: XML

November 17, 2004
@ 01:13 PM

Recently I've been having the same problems with my iPod that Omar Shahine described in his post PlaysForSure:

So, here is the landscape today. I have an iPod, it's beautiful, small, light and has a great out of box experience. I plug it into a Mac or a PC with iTunes installed and the rest is mostly magic. iTunes can automatically communicate with the iPod, sync all my music over firewire and charge the device at the same time. However, my iPod seems to think that after hours and hours of charging the battery is half full. As you use it though the battery meter increases before it decreases. If I leave the iPod sitting for a few days, via osmosis or some process, the battery drains. So most of the time when I want to use it, I can't cause it's dead. It also won't even last for a complete transatlantic flight.

I love my iPod but this is beginning to get old. It looks like it's time I replaced my battery; at least the price seems to be only about $30.00. Anyone out there have any experience with replacing their iPod battery?


 

Categories: Ramblings