A few months ago Robert Scoble wrote a post titled "Yahoo announces API for its search engine" where he asked:
Seriously. Blogs are increasing noise to lots of searches. We already have good engines that let you search blogs (Feedster, Pubsub, Newsgator, Technorati, and Bloglines all are letting you search blogs). What about an engine that lets you search everything BUT blogs? Where's that?
Is Yahoo's API good enough to do that? It doesn't look like it. It looks like Yahoo just gave us an API to embed its search engine into our applications. Sigh. That's not what I want. OK, MSN, your turn. Are you gonna really give us an API that'll let us build a custom search engine and let us have access to the variables that determine the result set?
The first question Robert asks is hard, but you can take shortcuts to get approximate results. How do you determine what a blog is? Do you simply exclude all results from LiveJournal, Blogspot and MSN Spaces? That would exclude millions of blogs, but it wouldn't catch the various blogs on self-hosted domains like mine. Of course, you could get even trickier by always excluding pages that match certain words like "DasBlog", "Movable Type" or "WordPress", which would probably take out another large chunk. By then the search results would probably be about as blog-free as you can get without resorting to expensive matching techniques. As icing on the cake, it would probably also be useful to be able to skew results by popularity or freshness.
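The shortcut described above boils down to appending exclusion operators to the user's query. Here is a minimal Python sketch, assuming the engine understands the common `-site:` and `-"phrase"` operators. The function name `build_blog_free_query` and the host and phrase lists are illustrative choices, not an exhaustive blog detector.

```python
def build_blog_free_query(query, excluded_hosts, excluded_phrases):
    """Append exclusion operators so the search engine itself filters out
    known blog hosts and pages that mention common blog engines.

    Assumes the engine supports -site:host and -"exact phrase" operators."""
    parts = [query]
    # Drop whole hosting services such as LiveJournal or Blogspot.
    parts += [f"-site:{host}" for host in excluded_hosts]
    # Drop pages mentioning blog engine names, quoted as exact phrases.
    parts += [f'-"{phrase}"' for phrase in excluded_phrases]
    return " ".join(parts)


# Illustrative lists only; extend them as you find more blog signatures.
BLOG_HOSTS = ["livejournal.com", "blogspot.com", "spaces.msn.com"]
BLOG_ENGINES = ["DasBlog", "Movable Type", "WordPress"]

print(build_blog_free_query("star wars revenge of the sith", BLOG_HOSTS, BLOG_ENGINES))
# star wars revenge of the sith -site:livejournal.com -site:blogspot.com -site:spaces.msn.com -"DasBlog" -"Movable Type" -"WordPress"
```

The phrase exclusions are the cheap part of the trade-off: they also drop non-blog pages that merely mention WordPress or Movable Type, which is the price of avoiding expensive per-page matching.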
The second question Scoble asks is whether there is a search engine that gives you an API that can do all of this. Well, MSN Search gives you RSS feeds, which, as I've mentioned in a previous post, are sometimes the only API your website needs. More importantly, as Andy Edmonds pointed out in a recent post entitled Search Builder Revealed, one can control how variables such as popularity or freshness affect search results. For example:
Search results for "star wars revenge of the sith" by popularity
Search results for "star wars revenge of the sith" by freshness
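The popularity and freshness links above are ordinary search URLs, so a program can generate them. The Python sketch below builds such a request URL. Everything service-specific in it is an assumption rather than documented behaviour: the `results.aspx` endpoint, the `format=rss` parameter, and the `{frsh=N}` / `{popl=N}` query tokens (our reading of what the Search Builder produced). The helper name `build_feed_url` is hypothetical.

```python
from urllib.parse import urlencode

# ASSUMPTION: endpoint and parameter names are illustrative; verify them
# against the live service before relying on them.
SEARCH_URL = "http://search.msn.com/results.aspx"


def build_feed_url(query, freshness=None, popularity=None):
    """Return a search URL requesting RSS output, optionally skewing the
    ranking toward fresh or popular pages (weights on a 0-100 scale)."""
    terms = [query]
    # ASSUMPTION: ranking tweaks are expressed as {name=N} tokens in the query.
    if freshness is not None:
        terms.append(f"{{frsh={freshness}}}")
    if popularity is not None:
        terms.append(f"{{popl={popularity}}}")
    # urlencode escapes spaces, braces and '=' so the tokens survive intact.
    return SEARCH_URL + "?" + urlencode({"q": " ".join(terms), "format": "rss"})


print(build_feed_url("star wars", popularity=100))
# http://search.msn.com/results.aspx?q=star+wars+%7Bpopl%3D100%7D&format=rss
```

Keeping the ranking knobs as keyword arguments means the exclusion query from the previous step can be passed straight in as `query`.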
One could probably write a first cut at the search engine Robert is asking for using the MSN Search RSS feeds in about an hour or so. In a day, it could be made quite polished, with most of the work going into the user interface. Yet another coding project for a rainy day.
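The core of such a first cut is small: fetch the feed (for example with `urllib.request.urlopen`), then drop any result whose link points at a known blog host. The sketch below covers only the filtering step and assumes the feed uses the standard RSS 2.0 `<item>`, `<title>` and `<link>` elements. The host list and function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical list of blog hosting services; extend as needed.
BLOG_HOSTS = ("livejournal.com", "blogspot.com", "spaces.msn.com")


def is_blog_host(url):
    """True if the URL's host is a known blog service or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    # Match the service itself and subdomains such as someone.blogspot.com,
    # but not unrelated hosts that merely end in the same letters.
    return any(host == h or host.endswith("." + h) for h in BLOG_HOSTS)


def non_blog_results(rss_xml):
    """Yield (title, link) pairs from an RSS 2.0 document, skipping blog hosts."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        if link and not is_blog_host(link):
            yield title, link


sample = """<rss version="2.0"><channel>
<item><title>Official site</title><link>http://www.starwars.com/</link></item>
<item><title>My review</title><link>http://someone.livejournal.com/1.html</link></item>
</channel></rss>"""

print(list(non_blog_results(sample)))
# [('Official site', 'http://www.starwars.com/')]
```

Filtering on the client like this complements the query-side exclusions: it catches blog hosts the query missed without sending an ever-longer query string to the engine.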