The MSN Search team recently and quietly released http://addins.msn.com which, among other things, provides the API documentation for the Windows Desktop Search that ships with the MSN Search Toolbar. The lowdown on what the API is good for is right at the beginning and is excerpted below:

Which Extension Technology to Use?

There are two basic methods for creating add-ins for Desktop Search.

1.     Adding new file types by creating IFilters

  • Extend Desktop Search with an IFilter add-in that knows how to “crack” the contents of a new file type in order to index its text and metadata.
  • To do this, you need to build and register an object supporting the IFilter interface.
  • You can add file-specific icons or context-menu handlers by following this documentation on extending the Windows Explorer file types by creating IContextMenu and IExtractIcon interfaces.
2.     Adding a new store by creating protocol handlers

  • Extend Desktop Search so that it can index a new data store, such as the database of an e-mail application.
  • To do this, you need to build a protocol handler object supporting the ISearchProtocol interface, along with an IUrlAccessor to pull the items. If the contents of the data store are file types not already indexed by Desktop Search, you may also need to implement one or more IFilters.
  • To add icons or context-menu handlers, you need to implement portions of an IShellFolder.

You can also find custom IFilters at http://addins.msn.com including ones for PDF, ZIP, CHM and Mozilla Thunderbird mail formats. If only there were C# wrappers for all the gnarly COM interfaces I'd provide one for indexing and searching the cache file format used by RSS Bandit. Then people would be able to search their feeds directly from desktop search. That would be kinda hot. 


 

Categories: MSN

It seems both Google and Yahoo! provided interesting news on the personalized search front recently.

Yahoo! MyWeb 2.0 seems to merge the functionality of del.icio.us with the power of Yahoo! search. I can now add tags to my Yahoo! bookmarks, view cached versions of my bookmarked pages and perform searches restricted to the sites in my bookmark list. Even cooler is that I can share my bookmarks with members of my contact list or just make them public. The search feature also allows one to search sites restricted to those shared by others. All they need to do is provide an API and add RSS feeds for this to be a del.icio.us killer.

Google Personalized Search takes a different tack in personalizing search results. Google's approach involves tracking your search history and which search results you clicked on. The next time you perform a search, Google Personalized Search brings certain results closer to the top based on your search history or previous click behavior.

As for this week's news about MSN Search? Well you can catch what our CEO had to say about us in the ZDNet article Ballmer confident, but admits failings. We definitely have a lot of catching up to do but I don't think the race is over yet.


 

Categories: Social Software | Technology

Just when you think we've missed the boat on software development trends on the Web, Microsoft surprises folks. First it was announcing that we will be baking RSS support into the next version of Windows. Now we've announced that we will be shipping a toolkit for building AJAX-style Web applications. In his post about the Atlas Project, Scott Guthrie writes

All of the pieces of AJAX – DHTML, JScript, and XMLHTTP – have been available in Internet Explorer for some time, and Outlook Web Access has used these techniques to deliver a great browser experience since 1998. In ASP.NET 2.0, we have also made it easier to write AJAX-style applications for any browser using asynchronous callbacks, and we use them in several of our built-in controls.

 

Recently, however, the technologies used by AJAX have become broadly available in all browsers, and use of this model for rich web applications has really taken flight. There are a number of high-profile new AJAX-style websites out there today, including a number by Google, as well as sites like A9 and Flickr. Microsoft will also have more sites that use this technology out there soon – check out Start.com and the MSN Virtual Earth project for examples.

 

The popularity of AJAX shows the growing demand for richer user experiences over the web. However, developing and debugging AJAX-style web applications is a very difficult task today. To write a rich web UI, you have to know a great deal of DHTML and JavaScript, and have a strong understanding of all the differences and design details of various browsers. There are very few tools to help your design or build these applications easily. Finally, debugging and testing these applications can be very tricky.

...

For this work, we’ve been working on a new project on our team, codenamed “Atlas”. Our goal is to produce a developer preview release on top of ASP.NET 2.0 for the PDC this September, and then have a website where we can keep updating the core bits, publishing samples, and building an active community around it.

 

Here are some of the pieces of Atlas that we are going to be delivering over time:

 

 

Atlas Client Script Framework

 

The Atlas Client Script Framework is an extensible, object-oriented 100% JavaScript client framework that allows you to easily build AJAX-style browser applications with rich UI and connectivity to web services. With Atlas, you can write web applications that use a lot of DHTML, Javascript, and XMLHTTP, without having to be an expert in any of these technologies.

 

The Atlas Client Script Framework will work on all modern browsers, and with any web server. It also won’t require any client installation at all – to use it, you can simply include references to the right script files in your page.

 

The Atlas Client Script Framework will include the following components:

  • An extensible core framework that adds features to JavaScript such as lifetime management, inheritance, multicast event handlers, and interfaces
  • A base class library for common features such as rich string manipulation, timers, and running tasks
  • A UI framework for attaching dynamic behaviors to HTML in a cross-browser way
  • A network stack to simplify server connectivity and access to web services
  • A set of controls for rich UI, such as auto-complete textboxes, popup panels, animation, and drag and drop
  • A browser compatibility layer to address scripting behavior differences between browsers.
This is excellent news which I know a lot of our UX developers at MSN will be glad to hear. Already Scott Isaacs who's been a key part of the AJAX development we've been doing at MSN has posted his opinions about Atlas in his blog entry entitled My personal thoughts on an AJAX (DHTML) framework..... His post highlights some of the history of AJAX as well as the issues a toolkit like Atlas could solve.

First RSS, now AJAX. All that's left is to see some announcement that we will be shipping a REST toolkit to make this a trifecta of utmost excellence. More nagging I must do...

 

Today I learned that Apple brings podcasts into iTunes which is excellent news. This will definitely push subscribing to music and videos via RSS feeds into the mainstream. I wonder how long it'll take MTV to start providing podcast feeds.

One interesting aspect of the announcement which I didn't see in any of the mainstream media coverage was pointed out to me in Danny Ayers's post Apple - iTunes - Podcasting where he wrote

Apple - iTunes - Podcasting and another RSS 2.0 extension (PDF). There are about a dozen new elements (or “tags” as they quaintly describe them) but they don’t seem to add anything new. I think virtually everything here is either already covered by RSS 2.0 itself, except maybe tweaked to apply to the podcast rather than the item.
They’ve got their own little category taxonomy and this delightful thing:

<itunes:explicit>
This tag should be used to note whether or not your Podcast contains explicit material.
There are 2 possible values for this tag: Yes or No

I wondered at first glance whether this was so you could tell when you were dealing with good data or pure tag soup. However, the word has developed a new meaning:

If you populate this tag with “Yes”, a parental advisory tag will appear next to your Podcast cover art on the iTunes Music Store
This tag is applicable to both Channel & Item elements.

So, in summary it’s a bit of a proprietary thing, released as a fait accompli. Ok if you’re targetting for iTunes, for anything else use Yahoo! Media RSS . I wonder where interop went.

This sounds interesting. So now developers of RSS readers that want to consume podcasts have to know how to consume the RSS 2.0 <enclosure> element, Yahoo!'s extensions to RSS and Apple's extensions to RSS to make sure they cover all the bases. Similarly publishers of podcasts also have to figure out which ones they want to publish as well.
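To make the "cover all the bases" point concrete, here is a rough sketch of the kind of code an aggregator ends up with just for the first and third of those: the plain RSS 2.0 <enclosure> element and Apple's <itunes:explicit> tag. It is only an illustration under assumptions: the function name `podcast_items` and the shape of the returned dictionaries are made up for this example, the iTunes namespace URI is the one I understand Apple's spec to use, and Yahoo! Media RSS is left out entirely. Since the tag applies to both channel and item elements, the item value is taken when present and the channel value is used as the fallback.

```python
import xml.etree.ElementTree as ET

# Namespace URI assumed for Apple's podcast extension elements.
ITUNES_NS = "http://www.itunes.com/dtds/podcast-1.0.dtd"


def podcast_items(rss_xml):
    """Return one dict per <item> that carries an RSS 2.0 <enclosure>.

    Hypothetical helper for illustration; not part of any real aggregator.
    """
    channel = ET.fromstring(rss_xml).find("channel")
    if channel is None:
        return []

    explicit_tag = "{%s}explicit" % ITUNES_NS
    # Channel-level value is the default for items that do not override it.
    channel_explicit = channel.findtext(explicit_tag)

    results = []
    for item in channel.findall("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # ordinary text item, nothing to download
        explicit = item.findtext(explicit_tag, default=channel_explicit)
        results.append({
            "title": item.findtext("title", default=""),
            "url": enclosure.get("url"),
            "type": enclosure.get("type"),
            "explicit": (explicit or "").strip().lower() == "yes",
        })
    return results
```

Every additional vendor extension means another namespace constant and another branch like the one above, which is exactly the complication.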

I guess all that's left is for Real Networks and Microsoft to publish their own extensions to RSS for dealing with providing audio and video metadata in RSS feeds to make it all complete. This definitely complicates my plans for adding podcasting support to RSS Bandit. And I thought the RSS 1.0 vs. RSS 2.0 vs. Atom discussions were exciting. Welcome to the world of syndication.

PS: The title of this post is somewhat tongue in cheek. It was inspired by Slashdot's headline over the weekend titled Microsoft To Extend RSS about Microsoft's creation of an RSS module for making syndicating lists work better in RSS. Similar headlines haven't been run about Yahoo! or Apple's extensions to RSS but that's to be expected since we're Microsoft. ;)


 

Categories: Syndication Technology | XML

As the developer of an RSS aggregator I'm glad to see Microsoft's Simple List Extensions for RSS. Many of the aggregator developers I spoke to at Gnomedex this weekend felt the same way. The reason for being happy about these extensions is that they provide a way to fix a number of key feeds that are broken in RSS aggregators today. This includes feeds such as the MSN Music Top 100 Songs feed, iTunes Top 25 Songs feed and Netflix Top 100 Movies feed.

The reason these feeds appear broken in every aggregator in which I have tried them is covered in a previous post of mine entitled The Netflix Problem: Syndicating Ordered Lists in RSS. For those who don't have time to go back and read the post, the following list summarizes the problems with the feeds:

  1. When the list changes some items change position, new ones enter the list and old ones leave. An RSS reader doesn't know to remove items that have left the list from the display and in some cases may not know to eliminate duplicates. Eventually you have a garbled list with last week's #25 song, this week's #25 song and last month's #25 song all in the same view.

  2. There is no way to know how to sort the list. Traditionally RSS aggregators sort entries by date, which doesn't make sense for an ordered list.

The RSS extensions provided by Microsoft are meant to solve these problems and improve the current negative user experience of people who subscribe to ordered lists using RSS today.  

To solve the first problem Microsoft has provided the cf:treatAs element with the value "list" to be used as a signal to aggregators that whenever the feed is updated the previous contents should be dumped or archived and replaced by the new contents of the list. That way we no longer have last week's Top 25 song list commingled with this week's list. The interesting question for me is whether RSS Bandit should always refresh the contents of the list view when a list feed is updated (i.e. the feed always contains the current list) or whether to keep the old version of the list perhaps grouped by date. My instinct is to go with the first option. I know Nick Bradbury also had some concerns about what the right behavior should be for treating lists in FeedDemon.
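Here is a minimal sketch of the first option, i.e. replacing the cached items wholesale when a feed declares itself a list. It is not RSS Bandit code. The function names are hypothetical, items are reduced to their titles to keep the example short, and the namespace URI bound to the cf prefix is what I believe the extension uses, so treat it as an assumption.

```python
import xml.etree.ElementTree as ET

# Namespace URI assumed for the Simple List Extensions (cf: prefix).
CF_NS = "http://www.microsoft.com/schemas/rss/core/2005"


def is_list_feed(channel):
    """True when the channel carries <cf:treatAs>list</cf:treatAs>."""
    treat_as = channel.findtext("{%s}treatAs" % CF_NS)
    return (treat_as or "").strip().lower() == "list"


def update_cache(cached_titles, rss_xml):
    """Return the new cached item list after fetching rss_xml.

    List feeds replace the cache; ordinary feeds keep old items and
    put previously unseen ones on top.
    """
    channel = ET.fromstring(rss_xml).find("channel")
    if channel is None:
        return list(cached_titles)

    fresh = [item.findtext("title", default="")
             for item in channel.findall("item")]

    if is_list_feed(channel):
        # The feed always contains the current list, so drop the old one.
        return fresh

    seen = set(cached_titles)
    return [t for t in fresh if t not in seen] + list(cached_titles)
```

Keeping old versions grouped by date would mean archiving `cached_titles` under a timestamp in the list branch instead of discarding it.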

To solve the second problem Microsoft has provided the cf:sort element which can be used to specify what elements on an item should be used for sorting, whether the field is numeric or textual so we know how to sort it and what the human readable name of the field should be when displayed to the user. I'm not really sure how to support this in RSS Bandit. Having every feed be able to specify what columns to show in the list view complicates the user interface somewhat and requires a degree of flexibility in the code. Changing the code to handle this should be straightforward although it may add some complexity.

On the other hand there are some user interface problems. For one, I'm not sure what should be the default sort field for lists. My gut instinct is to add a "Rank" column to the list of columns RSS Bandit supports by default and have it be a numeric field that is numbered using the document order of the feed. So the first item has rank 1, the second has rank 2, etc. This handles the case where a feed has a cf:treatAs element but has no cf:sort values. This will be needed for feeds such as the Netflix Top 100 feed which doesn't have a field that can be used for sorting. The second problem is how to show the user what columns can be added to a feed. Luckily we already have a column chooser that is configurable per feed in RSS Bandit. However we now have to make the list of columns in that list configurable per feed. This might be confusing to users but I'm not sure what other options we can try.
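The "Rank" idea is easy to sketch. In the following, `ListItem`, `assign_ranks` and `sort_items` are hypothetical names invented for this example and are not RSS Bandit types. The sort field and whether it is numeric are passed in as plain parameters rather than read from a cf:sort element, since how that element gets parsed is a separate matter.

```python
from dataclasses import dataclass, field


@dataclass
class ListItem:
    title: str
    fields: dict = field(default_factory=dict)  # extension values as strings
    rank: int = 0


def assign_ranks(items):
    """Number items by document order: first item is rank 1, and so on."""
    for position, item in enumerate(items, start=1):
        item.rank = position
    return items


def sort_items(items, sort_field=None, numeric=False):
    """Sort by the given field, or by rank when the feed names no field."""
    if sort_field is None:
        return sorted(items, key=lambda i: i.rank)

    def key(item):
        raw = item.fields.get(sort_field, "")
        if numeric:
            try:
                return (0, float(raw))
            except (TypeError, ValueError):
                return (1, 0.0)  # missing or malformed values go last
        return (0, str(raw).lower())

    return sorted(items, key=key)
```

With this, a feed that has cf:treatAs but no sortable field still shows up in a sensible order, because the default column is always available.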


 

I missed the first few minutes of this talk.

Bob Wyman of PubSub stated he believed Atom was the future of syndication. Other formats would eventually be legacy formats that would be analogous to RTF in the word processing world. They will be supported but rarely chosen for new efforts in the future.

Mark Fletcher of Bloglines then interjected and pleaded with the audience to stop the practice of providing the same feed in multiple formats. Bob Wyman agreed with his plea and also encouraged members of the audience to pick one format and stick to it. Having the same feed in multiple syndication formats confuses end users who are trying to subscribe to the feed and leads to duplicate items showing up in search engines that specialize in syndication formats like PubSub, Feedster or the Bloglines search features.

A member of the audience responded that he used multiple formats because different aggregators support some formats better than others. Bob Wyman replied that bugs in aggregators should result in putting pressure on RSS aggregator developers to fix them instead of causing confusion to end users by spitting out multiple versions of the same feed. Bob then advocated picking Atom since a lot of lessons had been learned via the IETF process to improve the format. Another audience member mentioned that 95% of his syndication traffic was for his RSS feed, not his Atom feed, so he knows which format is winning in the marketplace.

A question was raised about whether the admonition to avoid multiple versions of a feed also included sites that have multiple feeds for separate categories of content. The specific example was having a regular feed and a podcast feed. Bob Wyman thought that this was not a problem. The problem was the same content served in different formats.

The discussion then switched to ads in feeds. Scott Rafer of Feedster said that he agreed with Microsoft's presentation from the previous day that Subscribing is a new paradigm that has come after Browsing and Searching for content. Although we have figured out how to provide ads to support Browse & Search scenarios we are still experimenting with how to provide ads to support the Subscribe scenarios. Some sites like the New York Times use RSS to draw people to their websites by providing excerpts in their feeds. Certain consultants have full text feeds which they view as advertising their services, while others put ads in their feeds. Bob Wyman mentioned that PubSub is waiting to see which mechanism the market settles on for advertising in feeds before deciding on an approach. Bob Wyman added that finding a model for advertising and syndication was imperative so that intermediary services like PubSub, Feedster and Bloglines can continue to exist. An audience member then followed up and asked why these services couldn't survive by providing free services to the general public and charging corporate users instead of resorting to advertising. The response was that both PubSub and Feedster already have corporate customers whom they charge for their services but this revenue is not enough for them to continue providing services to the general public. The Bloglines team considered having fee-based services but discarded the idea because they felt it would be a death knell for the service given that most service providers on the Web are free, not fee-based.

An audience member asked if any of the services would have done anything different two years ago when they started given the knowledge they had now. The answers were that Feedster would have chosen a different back-end architecture, Bloglines would have picked a better name and PubSub would have started a few months to a year sooner.

I asked the speakers what features they felt were either missing in RSS or not being exploited. Mark Fletcher said that he would like to see more usage of the various comment related extensions to RSS which currently aren't supported by Bloglines because they aren't in widespread use. The other speakers mentioned that they will support whatever the market decides is of value.


 

Scott Gatz of Yahoo! started by pointing out that there are myriad uses for RSS. For this reason he felt that we need more flexible user experiences for RSS that map to these various uses. For example, a filmstrip view is more appropriate for reading a feed of photos than a traditional blog and news based user interface typically favored by RSS readers. Yahoo! is definitely thinking about RSS beyond just blogs and news which is why they've been working on Yahoo! Media RSS which is an extension to RSS that makes it better at syndicating digital media content like music and videos. Another aspect of syndication Yahoo! believes is key is the ability to keep people informed about updates independent of where they are or what device they are using. This is one of the reasons Yahoo! purchased the blo.gs service.

Dave Sifry of Technorati stated that he believed the library model of the Web where we talk about documents, directories and so on is outdated. The Web is more like a river or stream of millions of state changes. He then mentioned that some trends to watch that emphasized the changing model of the Web were microformats and tagging.

BEGIN "Talking About Myself in the Third Person"

Steve Gillmor of ZDNet began by pointing out Dare Obasanjo in the audience and saying that Dare was his hero and someone he admired for the work he'd done in the syndication space. Steve then asked why in a recent blog posting Dare had mentioned that he would not support Bloglines' proprietary API for synchronizing a user's subscriptions with a desktop RSS reader but then went on to mention that he would support Newsgator Online's proprietary API. Specifically he wondered why Dare wouldn't work towards a standard instead of supporting proprietary APIs.

At this point Dare joined the three speakers on stage. 

Dare mentioned that from his perspective there were two major problems that confronted users of an RSS reader. The first was that users eventually need to be able to read their subscriptions from multiple computers. This is because many people have multiple computers (e.g. home & work or home & school) where they read news and blogs. The second problem is that, because subscribing to feeds is so easy, people eventually succumb to information overload and need a way to see only the most important or interesting content in the feeds to which they are subscribed. This is the "attention problem" that Steve Gillmor is a strong advocate of solving. The issue discussed in Dare's blog post is the former not the latter. The reason for working with the proprietary APIs provided by online RSS readers instead of advocating a standard is that the online RSS readers are the ones in control. At the end of the day, they are the ones that provide the API so they are the ones that have to decide whether they will create a standard or not.

Dare rejoined the audience after speaking.  

END "Talking About Myself in the Third Person"

Dave Sifry followed up by encouraging cooperation between vendors to solve the various problems facing users. He gave the example of Yahoo! working with Marc Canter on digital media.

Steve Gillmor then asked audience members to raise their hand if they felt that the ability to read their subscriptions from multiple computers was a problem they wanted solved. Most of the audience raised their hands in response.

A member of the audience responded to the show of hands by advocating that people use web based RSS readers like Bloglines. Scott Gatz agreed that using a web based aggregator was the best way to access one's subscriptions from multiple computers. There was some disagreement between members of the audience and the speakers about whether problems using Bloglines from mobile devices prevent it from being the solution to this problem.

From the audience, Dave Winer asked Dave Sifry why Technorati invented Attention.Xml instead of reusing OPML. The response was that the problem was beyond just synchronizing the list of feeds the user is subscribed to.

Steve Gillmor ended the session by pointing out that once RSS usage becomes widespread someone will have to solve the problem once and for all.