A few months ago Robert Scoble wrote a post titled "Yahoo announces API for its search engine" where he asked:

Seriously. Blogs are increasing noise to lots of searches. We already have good engines that let you search blogs (Feedster, Pubsub, Newsgator, Technorati, and Bloglines all are letting you search blogs). What about an engine that lets you search everything BUT blogs? Where's that?

Is Yahoo's API good enough to do that? It doesn't look like it. It looks like Yahoo just gave us an API to embed its search engine into our applications. Sigh. That's not what I want. OK, MSN, your turn. Are you gonna really give us an API that'll let us build a custom search engine and let us have access to the variables that determine the result set?

The first question Robert asks is hard, but you can take shortcuts to get approximate results. How do you determine what a blog is? Do you simply exclude all results from LiveJournal, Blogspot and MSN Spaces? That would exclude millions of blogs, but it wouldn't catch the various blogs on self-hosted domains like mine. Of course, you could get even trickier by always asking to exclude pages that match certain words like "DasBlog", "Movable Type" or "WordPress", which would probably take out another large chunk. By then the search results would probably be as blog-free as you can get without resorting to expensive matching techniques. For icing on the cake, it would probably be useful to also be able to skew results by popularity or freshness.
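A minimal sketch of those two shortcuts in Python, assuming you already have a result's URL and some of its page text in hand. The function name `looks_like_blog` and the exact domain spellings are my own assumptions for illustration, not anything a search engine provides.

```python
from urllib.parse import urlparse

# Hosted blog services named above; the exact domain spellings are assumptions.
BLOG_HOST_SUFFIXES = ("livejournal.com", "blogspot.com", "spaces.msn.com")

# Blogging-engine names that often show up in a page footer or generator tag.
BLOG_ENGINE_MARKERS = ("dasblog", "movable type", "wordpress")


def looks_like_blog(url: str, page_text: str = "") -> bool:
    """Cheap heuristic: known blog host, or page text names a blog engine."""
    host = (urlparse(url).hostname or "").lower()
    # Shortcut 1: the result lives on (a subdomain of) a hosted blog service.
    if any(host == s or host.endswith("." + s) for s in BLOG_HOST_SUFFIXES):
        return True
    # Shortcut 2: the page mentions a blogging engine by name.
    text = page_text.lower()
    return any(marker in text for marker in BLOG_ENGINE_MARKERS)
```

The word match is deliberately crude: it will also throw out an ordinary page that merely talks about WordPress, which is the price of avoiding more expensive matching.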

The second question Scoble asks is whether there is a search engine that gives you an API that can do all this stuff. Well, MSN Search gives you RSS feeds, which, as I've mentioned in a previous post, are sometimes the only API your website needs. More importantly, as pointed out in a recent post by Andy Edmonds entitled "Search Builder Revealed", one can control how variables such as popularity or freshness affect search results. For example,

  1. Search results for "star wars revenge of the sith" by popularity

  2. Search results for "star wars revenge of the sith" by freshness

One could probably write a first cut at the search engine Robert is asking for using the MSN Search RSS feeds in about an hour or so. In a day, it could be made to be quite polished with most of the work being in the user interface. Yet another coding project for a rainy day.
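Here is roughly what that first cut might look like in Python, as a sketch only. It assumes the results arrive as an ordinary RSS 2.0 document with `item`, `title`, `link` and `description` elements; the sample feed, the function name `non_blog_items` and the domain list are all made up for illustration, and fetching the real feed is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Assumed spellings of the hosted blog domains; adjust to taste.
BLOG_HOSTS = ("livejournal.com", "blogspot.com", "spaces.msn.com")
BLOG_WORDS = ("dasblog", "movable type", "wordpress")


def non_blog_items(rss_xml: str) -> list[dict]:
    """Return title/link pairs for feed items that do not look like blogs."""
    root = ET.fromstring(rss_xml)
    kept = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        description = item.findtext("description", default="")
        host = (urlparse(link).hostname or "").lower()
        on_blog_host = any(host == h or host.endswith("." + h) for h in BLOG_HOSTS)
        # Only the title and snippet are available in the feed, not the full page.
        snippet = (title + " " + description).lower()
        mentions_engine = any(word in snippet for word in BLOG_WORDS)
        if not (on_blog_host or mentions_engine):
            kept.append({"title": title, "link": link})
    return kept


# Hypothetical feed used only to exercise the filter.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>sample results</title>
<item><title>Official film site</title><link>http://www.example.com/film</link><description>Cast and crew</description></item>
<item><title>My review</title><link>http://fan.blogspot.com/2005/05/review.html</link><description>Loved it</description></item>
<item><title>Thoughts</title><link>http://www.example.org/thoughts</link><description>Powered by WordPress</description></item>
</channel></rss>"""

if __name__ == "__main__":
    for result in non_blog_items(SAMPLE_FEED):
        print(result["title"], result["link"])
```

Filtering on the client like this only sees the title and snippet in the feed, so it will miss self-hosted blogs whose snippets never name their engine. Pushing the exclusions into the query itself would do better, if the engine supports it.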


 

Thursday, May 12, 2005 8:57:26 PM (GMT Daylight Time, UTC+01:00)
The premise seems strange. When someone searches for a web page, they rarely care how that page got published. The fact that the page was published with Frontier instead of FrontPage is irrelevant to someone searching for information.

I suspect that in the very few cases when people care about filtering out blogs, they really just mean that they want to filter out livejournal and blogger -- so your workaround works.