These are my notes on the Vertical Search and A9 session by Jeff Bezos.

The core idea behind this talk was powerful yet simple.

Jeff Bezos started off by talking about vertical search. In certain cases, specialized search engines can provide better results than generic search engines. One example is searching Google for Vioxx and performing the same search on a medical search engine such as PubMed. The former returns results that are mainly about class action lawsuits while the latter returns links to various medical publications about Vioxx. For some users the Google results are what they are looking for, while for others the PubMed results would be considered more relevant.

Currently at A9.com, they give users the ability to search both generic search engines like Google as well as vertical search engines. The choice of search engines is currently small, but they'd like to see users have the choice of building a search homepage that could pull results from thousands of search engines. Users should be able to add any search engine they want to their A9 page and have those results displayed in A9 alongside Google or Amazon search results. To facilitate this, they now support displaying search results from any search engine that can provide its results as RSS. A number of search engines already do this, such as MSN Search and Feedster. They have made some extensions to RSS to support providing search results in RSS feeds. From where I was standing, some of the extension elements I saw include startIndex, resultsPerPage and totalResults.
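Since I only caught the extension element names off the slides, here is a rough sketch in Python of what consuming such a feed might look like; the namespace URI and feed URL below are my guesses, not anything announced in the talk.

    import urllib.request
    import xml.etree.ElementTree as ET

    # Guessed namespace for the search-result extensions; the real URI
    # wasn't legible from where I was standing.
    NS = "{http://a9.com/-/spec/opensearchrss/1.0/}"

    def search_results(url):
        tree = ET.parse(urllib.request.urlopen(url))
        channel = tree.getroot().find("channel")
        # The three extension elements mentioned in the talk:
        total = channel.findtext(NS + "totalResults")
        start = channel.findtext(NS + "startIndex")
        per_page = channel.findtext(NS + "resultsPerPage")
        items = [(item.findtext("title"), item.findtext("link"))
                 for item in channel.findall("item")]
        return total, start, per_page, items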

Amazon is calling this initiative OpenSearch.

I was totally blown away by this talk when I attended it yesterday. This technology has lots of potential, especially since it doesn't seem tied to Amazon in any way, so MSN, Yahoo or Google could implement it as well. However, there are a number of practical issues to consider. Most search engines make money from ads on their sites, so creating a mechanism whereby other sites can repurpose their results would run counter to their business model, especially if this were being done by a commercial interest like Amazon.


 

Categories: Technology | Trip Report

These are my notes on the Remixing Technology at Applied Minds session by W. Daniel Hillis.

This was a presentation by one of the key folks at Applied Minds. It seems they dabble in everything from software to robots. There was an initial demo showing a small crawling robot, where he explained that they discovered six-legged robots were more stable than four-legged ones. Since this wasn't about software I lost interest for several minutes, but I did hear the audience clap once or twice for the gadgets he showed.

Towards the end, the speaker started talking about an open marketplace of ideas. The specific scenario he described was the ability to pull up a map and have people's opinions of various places show up overlaid on the map. Given that people are already providing these opinions on the Web today for free, there isn't a need to go through some licensed database of reviews to get this information. The ability to harness the collective consciousness of the World Wide Web in this manner is the promise of the Semantic Web, which the speaker felt was finally going to be delivered. His talk reminded me a lot of the Committee of Gossips vision of the Semantic Web that Joshua Allen continues to evangelize.

It seems lots of smart people are arriving at the same ideas about what the Semantic Web should be. Unfortunately, they'll probably have to route around the W3C crowd if they ever want to realize this vision.


 

Categories: Technology | Trip Report

These are my notes on The App is the API: Building and Surviving Remixable Applications session by Mike Shaver. I believe I heard it announced that the original speaker couldn't make it and that the person who gave the talk was a stand-in.

This was one of the 15-minute keynotes (aka high order bits). The talk was about Firefox and its extensibility model. Firefox has three main extensibility points: components, RDF data sources and XUL overlays.

Firefox components are similar to Microsoft's COM components. A component has a contract id, which is analogous to a GUID in the COM world. Components can be MIME type handlers, URL scheme handlers, XUL application extensions (e.g. mouse gestures) or inline plugins (similar to ActiveX). The Firefox team is championing a new plugin model, similar to ActiveX, that is expected to be supported by Opera and Safari as well. User-defined components can override built-in components by claiming their contract id, a process which seemed fragile but which the speaker claimed has worked well so far.
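As a toy illustration of the override mechanism (this is a Python sketch of the pattern as I understood it, not Mozilla's actual XPCOM API, and the contract id is made up):

    # A registry keyed by contract id; the most recent registration wins.
    # That is what lets a user-defined component shadow a built-in one, and
    # also why the process is fragile if two extensions claim the same id.
    registry = {}

    def register(contract_id, factory):
        registry[contract_id] = factory  # silently replaces any prior claim

    def create_instance(contract_id):
        return registry[contract_id]()

    class BuiltInHandler:
        def handle(self, data):
            print("built-in behavior")

    class UserHandler:
        def handle(self, data):
            print("extension behavior")

    CID = "@example.org/content-handler;1?type=image/jpeg"  # made-up id
    register(CID, BuiltInHandler)
    register(CID, UserHandler)           # user component overrides built-in
    create_instance(CID).handle(b"...")  # prints "extension behavior"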

Although RDF is predominantly used as a storage format by both Thunderbird and Firefox, the speaker gave the impression that this decision was a mistake. He repeatedly stated that the graph-based data model was hard for developers to wrap their minds around and that it was too complex for their needs. He also pointed out that whenever they criticized RDF, advocates of the technology [and the Semantic Web] would claim that there were future benefits to be reaped from using RDF.

XUL overlays can be used to add toolbar buttons, tree widget columns and context menus to the Firefox user interface. They can also be used to make style changes in viewed pages. A popular XUL overlay is GreaseMonkey, which the speaker showed could be used to add features to web sites, such as persistent searches in GMail, all using client-side script. The speaker did warn that such overlays which apply style changes are inherently fragile, since they depend on processing the HTML of the site, which could change without warning if the site is redesigned. He also mentioned that it was unclear what the versioning model would be for such scripts once new versions of Firefox showed up.


 

Categories: Technology | Trip Report

These are my notes on the Web Services as a Strategy for Startups: Opening Up and Letting Go session by Stewart Butterfield and Cal Henderson.

This was one of the 15-minute keynotes (aka high order bits). The talk was about Flickr and its API. I came in towards the end so I missed the meat of the talk, but its focus seemed to be showing the interesting applications people have built using the Flickr API. The speakers pointed out that having an API meant that cool features were being added to the site by third parties, thus increasing the value and popularity of the site.

There were some interesting statistics, such as the fact that their normal traffic over the API is 2.93 calls per second but can be up to 50-100 calls per second at its peak. They also estimate that about 5% of the website's traffic consists of calls to the Flickr API.


 

Categories: Trip Report | XML Web Services

These are my notes on the Build Content-centric Applications on RSS, Atom, and the Atom API session by Ben Hammersley.

This was a 3.5-hour tutorial session [which actually only lasted 2.5 hours].

At the beginning, Ben warned the audience that the Atom family of specifications is still being worked on but should begin to enter the finalization stages this month. The specs have been stable for about the last six months; however, anything based on work older than that (e.g. anything based on the Atom 0.3 syndication format spec) may be significantly outdated.

He indicated that there are many versions of syndication formats named RSS, mainly due to acrimony and politics in the online syndication space. However, there are basically three major flavors of syndication format: RSS 2.0, RSS 1.0 and Atom.

One thing that sets Atom apart from the other formats is that a number of items which are optional in RSS 1.0 and RSS 2.0 are mandatory in Atom. For example, in RSS 2.0 an item can contain only a <description> and be considered valid, while in RSS 1.0 an item with a blank title and an rdf:about (i.e. link) can be considered valid. This is a big problem for consumers of feeds, since basic information like the date of an item isn't guaranteed to show up.
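To make the consumer's pain concrete, here is a sketch of the defensive code an aggregator ends up writing for RSS 2.0 items precisely because almost everything is optional; the fallback choices are mine, not from the talk.

    def normalize_item(item):
        """Coerce a parsed RSS 2.0 item (a dict of its child elements)
        into something displayable, papering over missing fields."""
        return {
            "title": item.get("title") or "(untitled)",
            "link": item.get("link") or item.get("guid") or "",
            "date": item.get("pubDate"),  # may simply be absent
            "content": item.get("description") or "",
        }

    # Perfectly valid RSS 2.0, yet nearly useless to an aggregator:
    print(normalize_item({"description": "something happened today"}))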

Then there was a slide attempting to show when to use each syndication format. Ben contended that RSS 2.0 is good for machine-readable lists but not useful for much else beyond displaying information in an aggregator, and that RSS 1.0 is useful for complex data mining but not for small ad hoc web feeds. Atom, he argued, is the best of both worlds: a simple format yet strictly defined data.

I was skeptical of this breakdown, especially since the fact that people are using RSS 2.0 for podcasting flies in the face of his contentions about what RSS 2.0 is good for. Talking about this part of the talk later with some members of the IE team who attended it with me, they agreed that Ben didn't present any good examples of use cases that the Atom syndication format satisfied but RSS 2.0 didn't.

Atom has a feed document and an entry document, the latter being a new concept in syndication. Atom also has reusable syntax for generic constructs (person, link, text, etc.). At this point Marc Canter raised the point that there weren't constructs in Atom for certain popular kinds of data on the Web. Some examples Marc gave were that there are no explicit constructs to handle tags (i.e. folksonomy tags) or digital media. Ben responded that the former could be represented with category elements, while the latter could be binary payloads that were either included inline or linked from an entry in the feed.

Trying a different tack, I asked how one would represent the metadata for digital content within an entry. For example, I asked about doing album reviews in Atom. How would I provide the metadata for my album review (name, title, review content, album rating) as well as the metadata for the album I was reviewing (artist, album, URL, music sample(s), etc.)? His response was that I should use RSS 1.0, since it is more oriented toward resources talking about other resources.

The next part of the talk was about the Atom API, which is now called the Atom publishing protocol. He gave a brief history of weblog APIs, starting with the Blogger API and ending with the MetaWeblog API. He stated that XML-RPC is inelegant while SOAP is "horrible overkill" for solving the problem of posting to a weblog from an API. REST, on the other hand, is elegant. The core principle of REST is using HTTP verbs like GET, PUT, POST and DELETE to manipulate representations of resources. In the case of Atom, these representations are Atom entry and feed documents. There are four main URI endpoints: the PostUri, EditUri, FeedUri, and the ResourcePostUri. In a technique reminiscent of RSD, websites that support Atom can point to these API endpoints by using <link> tags with appropriate values for the rel attribute.
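As a sketch of the interaction he described, here is what the verb-per-operation style might look like in Python; the endpoint URL, entry markup and namespace are illustrative, since the spec was still in flux at the time.

    import urllib.request

    POST_URI = "http://example.com/atom"  # hypothetical PostUri

    ENTRY = b"""<entry xmlns="http://purl.org/atom/ns#">
      <title>Hello from ETech</title>
      <content>Posted via the Atom publishing protocol.</content>
    </entry>"""

    def call(method, url, body=None):
        req = urllib.request.Request(url, data=body, method=method)
        req.add_header("Content-Type", "application/atom+xml")
        return urllib.request.urlopen(req)

    resp = call("POST", POST_URI, ENTRY)  # create a new entry
    edit_uri = resp.headers["Location"]   # the server hands back the EditUri
    call("PUT", edit_uri, ENTRY)          # update the entry in place
    call("DELETE", edit_uri)              # and finally remove it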

At the end of the talk I asked what the story was for versioning both the Atom syndication format and the publishing protocol. Ben floundered somewhat in answering this question but eventually pointed to the version attribute in an Atom feed. I asked how an application would tell from the version attribute whether it had encountered a newer but backwards-compatible version of the spec, or whether the intention was that clients should only be coded against one version of Atom. His response was that I was 'looking for a technological solution to a social problem' and, more importantly, that there was little chance the Atom specifications would change anyway.

Yeah, right.

During the break, Marc Canter and I talked about the fact that both the Atom syndication format and the Atom publishing protocol are simply not rich enough to support existing blogging tools, let alone future advances in blogging technologies. For example, in MSN Spaces we already have data types such as music lists and photo albums which don't fit into the traditional blog entry syndication paradigm that Atom is based upon. More importantly, it is unclear how one would even extend the format to do this in an acceptable way. Similar issues exist with the API, which already has less functionality than existing APIs such as the MetaWeblog API. It is unclear how one would perform the basic act of querying one's blog for the list of categories used to populate the drop-down list in a rich client, a commonly used feature of such tools. Let alone do things like manage one's music list or photo album, which is what I'd eventually like us to do in MSN Spaces.

The conclusion that Marc and I drew was that just to support existing concepts in popular blogging tools, both the Atom syndication format and the Atom API would need to be extended.

There was a break, after which there was a code sample walkthrough which I zoned out on.


 

March 14, 2005
@ 01:57 AM

Recently I saw a post by Ed Oswald entitled Has Spaces Changed the Way You Blog? where he wrote

In the coming weeks I will be writing a commentary on the success of MSN Spaces for BetaNews.. I have made it no secret through several of my posts as well as comments to my friends that I truly think MSN has really struck gold with Spaces, and could change the way people think about blogs. Blogging before Spaces was more unidirectional -- where the author posted to an group which he likely did not know -- and were usually somewhat impersonal. However, with Spaces it's more omnidirectional -- yes, these can be your old fashioned blog -- however, through integration with MSN Messenger and the like, Spaces becomes an extension of your online self. You match it to your interests -- and people can learn more about you than a simple blog can provide. What music interests you -- photos of your recent trip to Australia -- and what not. Plus -- when you have something to say, all your friends will known in seconds with the "gleam".

Many people [especially in the mainstream media] view blogging as amateur punditry. However, the truth is that for most people blogging and related activities are about communicating their thoughts and sharing their experiences with others [mainly friends and family]. This is a key aspect of the vision behind MSN Spaces. We aren't the first service provider to design a blogging service based on this premise, LiveJournal being one of the best examples, but I believe we have done one of the best jobs so far in truly focusing on building a platform for sharing experiences with friends, family & strangers.

At ETech, I am supposed to demo how the integration of MSN Messenger, MSN Spaces and Hotmail improves our users' ability to communicate with their friends and family beyond what each service offers in isolation. It is clear that this provides enormous value to our users, as evidenced by posts such as Ed's; I just hope that I end up presenting it in a way that clearly shows why what we've built is so cool.


 

Categories: MSN

March 13, 2005
@ 07:48 PM

This time tomorrow I'll be at the O'Reilly Emerging Technology Conference. Checking out the conference program, I saw that Evan Williams will be hosting a session entitled Odeo -- Podcasting for Everyone. I've noticed the enthusiasm around podcasting among certain bloggers and the media but I am somewhat skeptical of the vision folks like Evan Williams have espoused in posts such as How Odeo Happened.

In thinking about podcasting, it is a good thing to remember the power law and the long tail. In his post Weblogs, Power Laws and Inequality, Clay Shirky wrote

The basic shape is simple - in any system sorted by rank, the value for the Nth position will be 1/N. For whatever is being ranked -- income, links, traffic -- the value of second place will be half that of first place, and tenth place will be one-tenth of first place. (There are other, more complex formulae that make the slope more or less extreme, but they all relate to this curve.) We've seen this shape in many systems. What we've been lacking, until recently, is a theory to go with these observed patterns.
...
A second counter-intuitive aspect of power laws is that most elements in a power law system are below average, because the curve is so heavily weighted towards the top performers. In Figure #1, the average number of inbound links (cumulative links divided by the number of blogs) is 31. The first blog below 31 links is 142nd on the list, meaning two-thirds of the listed blogs have a below average number of inbound links. We are so used to the evenness of the bell curve, where the median position has the average value, that the idea of two-thirds of a population being below average sounds strange. (The actual median, 217th of 433, has only 15 inbound links.)
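His numbers are easy to check for the idealized 1/N curve with a few lines of Python:

    # Checking the "most elements are below average" claim for a pure 1/N
    # power law over 433 ranked items (the size of Shirky's sample).
    N = 433
    values = [1.0 / rank for rank in range(1, N + 1)]
    average = sum(values) / N
    below = sum(1 for v in values if v < average)
    print(f"{below} of {N} ({below / N:.0%}) fall below the average")
    # The idealized curve puts roughly 85% below the mean; Shirky's actual
    # inbound link data was less extreme, at about two-thirds.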

The bottom line here is that a majority of weblogs will have small to minuscule readership. However, the focus of the media and the generalizations made about blogging will be on popular blogs with large readership. But the wants and needs of popular bloggers often do not mirror those of the average blogger. There is a lot of opportunity, and a lot of room for error, when trying to figure out where to invest in features for personal publishing tools such as weblog creation tools or RSS reading software. Clay Shirky also mentioned this in his post, where he wrote

Meanwhile, the long tail of weblogs with few readers will become conversational. In a world where most bloggers get below average traffic, audience size can't be the only metric for success. LiveJournal had this figured out years ago, by assuming that people would be writing for their friends, rather than some impersonal audience. Publishing an essay and having 3 random people read it is a recipe for disappointment, but publishing an account of your Saturday night and having your 3 closest friends read it feels like a conversation, especially if they follow up with their own accounts. LiveJournal has an edge on most other blogging platforms because it can keep far better track of friend and group relationships, but the rise of general blog tools like Trackback may enable this conversational mode for most blogs.

The value of weblogging to most bloggers (i.e. the millions of people using services like LiveJournal, MSN Spaces and Blogger) is that it allows them to share their experiences with friends, family & strangers on the Web, and that it reduces the friction of getting content onto the Web compared to managing a personal homepage, which was the state of the art in personal publishing last decade. In addition, there are the readers of weblogs to consider. The existence of RSS syndication and aggregators such as RSS Bandit & Bloglines has made it easy for people to read multiple weblogs. According to Bloglines, their average user reads just over 20 feeds.

Before going into my list of issues with podcasting, I will point out that I think the current definition of podcasting, which limits it to subscribing to feeds of audio files, is rather narrow. One could just as easily subscribe to other digital content, such as video files, using RSS. To me, podcasting is about time shifting digital content, not just audio files.
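Mechanically there is nothing audio-specific about this: an RSS enclosure is just a URL, a MIME type and a length, so a client can time shift any media type the same way. A sketch (the feed URL is hypothetical):

    import urllib.request
    import xml.etree.ElementTree as ET

    def enclosures(feed_url):
        """Yield (url, mime_type, length) for each enclosure in an RSS feed."""
        tree = ET.parse(urllib.request.urlopen(feed_url))
        for item in tree.iter("item"):
            enc = item.find("enclosure")
            if enc is not None:
                yield enc.get("url"), enc.get("type"), enc.get("length")

    for url, mime, length in enclosures("http://example.com/feed.xml"):
        if mime and mime.startswith(("audio/", "video/")):
            print("queue for later:", url)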

With this setup out of the way, I can list the top three reasons I am not as enthusiastic about podcasting as folks like Evan Williams:

  1. Creating digital content and getting it on the Web isn't easy enough: The lowest-friction way I've seen thus far for personal publishing of audio content on the Web is the phone posting feature of LiveJournal, but it is still a suboptimal solution. It gets worse when one considers how to create and share richer digital content such as videos. I suspect mobile phones will have a big part to play in podcast creation if it becomes mainstream. On the other hand, sharing your words with the world doesn't get much easier than using the average blogging tool.
  2. Viewing digital content is more time-consuming than reading text content: I believe it takes the average person less time to read an average blog posting than to listen to an average audio podcast. This automatically reduces the size of the podcast market compared to plain old text blogging. As mentioned earlier, the average Bloglines user subscribes to 20 feeds. Over the past two years, I've gone from subscribing to about 20 feeds to around 160. However, it would be impossible for me to find the time to listen to 20 podcast feeds a week, let alone scale up to 160.
  3. Digital content tends to be opaque and lack metadata: Another problem with podcasting is that no explicit or implicit metadata standards are forming around syndicating digital media content. The fact that an RSS feed is structured data providing a title, author name, blog name, a permalink and so on allows one to build rich applications for processing RSS feeds, both globally like Technorati & Feedster and locally like RSS Bandit. As long as digital media content is just an opaque blob of data hanging off an item in a feed, the ecosystem of tools for processing and consuming it will remain limited.

This is not to say that podcasting won't go a long way in making it easier for popular publishers to syndicate media content to users. It will; however, it will not be the revolution in personal publishing that the combination of RSS and weblogging has been.

I'll need to remember to bring some of these up during Evan Williams' talk. I'm sure he'll have some interesting answers.


 

Charlene Li has a post entitled Bloghercon conference proposed where she writes

Quick – name me five woman bloggers. You probably came up with Wonkette, and if you’re reading this post, you’ve got me on your list. Can you come up with three more?

This is why Lisa Stone’s suggestion to develop Bloghercon is such a great idea. (Elisa Camahort has a follow-up post with more details here.)

It’s not that there are no women bloggers out there – it’s that we haven’t built up a network comparable to the “blog-boy’s club” that dominates the Technorati 100. This is not to presume that there’s a conspiracy – just the reality that for a number of reasons, woman bloggers have had difficulty gaining visibility.

 

Interestingly enough, I actually counted ten women bloggers I know off the top of my head, without needing to count Charlene or knowing who this Wonkette person is. My list was Shelley Powers, Julia Lerman, Liz Lawley, Danah Boyd, Rebecca Dias, KC Lemson, Anita Rowland, Megan Anderson, Eve Maler and Lauren Wood. As I finished the list lots more came to mind; in fact, I probably could have hit ten just counting women at MSN I know who blog, but that would have been too easy.

 

I am constantly surprised by people who read the closed circle of white-male-dominated blogs commonly called the A-list and think that it somehow constitutes the entire blogosphere (I do dislike that word), or even a significant part of it.

 

I wonder when the NAACP or Jesse Jackson are going to get in on the act and hold a blaggercon conference for black bloggers. Speaking of which, it's my turn to ask "Quick – name me five black bloggers". Post your answers in the comments.


 

Categories: Ramblings

A bunch of folks at work have been prototyping a server-side RSS reader at http://www.start.com/1/. This isn't a final product but is instead intended to show people some of the ideas we at MSN are exploring for providing a rich experience around Web-based RSS/Atom aggregation.

The Read/Write Web blog has a post entitled Microsoft's Web-based RSS Aggregator? which has a number of screenshots showing the functionality of the site. The site has been around for a few weeks and I'm pretty surprised it took this long to get widespread attention.

We definitely would love to get feedback from folks about the site. I'm personally interested in where people would like to see this sort of functionality integrated into the existing MSN family of sites and products, if at all.

PS: You may also want to check out http://www.start.com/2/ to test drive a prototype of a Web-based bookmarks manager.


 

Categories: MSN

March 8, 2005
@ 03:39 PM

A couple of days ago I was contacted about writing the foreword for the book Beginning RSS and Atom Programming by Danny Ayers and Andrew Watt. After reading a few chapters, I agreed to introduce the book.

When I started writing I wasn't familiar with the format of the typical foreword for a technical book. Looking through my library, I ended up with two forewords that gave me some idea of how to proceed: Michael Rys's introduction to XQuery: The XML Query Language by Michael Brundage and Jim Miller's introduction to Essential .NET, Volume I: The Common Language Runtime by Don Box. I suspect I selected them because I've worked directly or indirectly with both authors and the folks who wrote the forewords to their books, so I felt familiar with both the subjects and the people involved.

From the various forewords I read, it seemed the goal of a foreword is twofold:

  1. Explain why the subject matter is important/relevant to the reader
  2. Explain why the author(s) should be considered an authority in this subject area

I believe I achieved both of these goals with the foreword I wrote. The book is definitely a good attempt to distill the important things a programmer should consider when deciding to work with XML syndication formats.

Even though I have written an academic paper, magazine articles and conference presentations, this was a new experience. I keep getting closer and closer to the process of writing a book. Too bad I never will, though.


 

Categories: Ramblings