Last week Andrew Conrad told me to check out a recent article by Adam Bosworth in ACM Queue because he wondered what I thought about it. I was rather embarrassed to note that although I'd seen some mention of it online, I hadn't read it. I read it today and, as usual, Adam Bosworth is on point.

The article is entitled Learning from THE WEB and it begins by listing eight "unintuitive lessons" we have learned from the Web. The lessons are listed below

  1. Simple, relaxed, sloppily extensible text formats and protocols often work better than complex and efficient binary ones.

  2. It is worth making things simple enough that one can harness Moore’s law in parallel.

  3. It is acceptable to be stale much of the time.

  4. The wisdom of crowds works amazingly well.

  5. People understand a graph composed of tree-like documents (HTML) related by links (URLs).

  6. Pay attention to physics.

  7. Be as loosely coupled as possible.

  8. KISS. Keep it (the design) simple and stupid.

Where the paper gets interesting is that it then tries to apply these lessons to XML. Remember that Adam was one of the founders of the XML team at Microsoft and knows a thing or two about it. So he writes

In my humble opinion, however, we ignored or forgot lessons 3, 4, and 5. Lesson 3 tells us that elements in XML with values that are unlikely to change for some known period of time (or where it is acceptable that they are stale for that period of time, such as the title of a book) should be marked to say this. XML has no such model.
...
Lesson 4 says that we shouldn’t over-invest in making schemas universally understood.
...
Lessons 1 and 5 tell us that XML should be easy to understand without schemas

I totally agree with his assessment of what lessons 4 & 5 mean for XML. However, the point about being able to mark an element in an XML file as 'relatively unchanging' in a generic way is lost on me. He then goes on to point out more of the problems with XML [and the Semantic Web/RDF]

There are some interesting implications in all of this.

One is that the Semantic Web is in for a lot of heartbreak. It has been trying for five years to convince the world to use it. It actually has a point. XML is supposed to be self-describing so that loosely coupled works. If you require a shared secret on both sides, then I’d argue the system isn’t loosely coupled, even if the only shared secret is a schema. What’s more, XML itself has three serious weaknesses in this regard:

  1. It doesn’t handle binary data well.
  2. It doesn’t handle links.
  3. XML documents tend to be monolithic.

Now it gets pretty interesting. At this point, Adam throws a curve ball.

Recently, an opportunity has arisen to transcend these limitations. RSS 2.0 has become an extremely popular format on the Web. RSS 2.0 and Atom (which is essentially isomorphic) both support a base schema that provides a model for sets. Atom’s general model is a container (a <feed>) of <entry> elements in which each <entry> may contain any namespace scoped elements it chooses (thus any XML), must contain a small number of required elements (<id>, <updated>, and <title>), and may contain some other well-known ones in the Atom namespace such as <link>s. Even better, Atom clearly says that the order doesn’t matter. This immediately gives a simple model for sets missing in XML.
...
Atom also supports links of other sorts, such as comments, so clearly an Atom entry can contain links to related feeds (e.g., Reviews for a Restaurant or Complaints for a Customer) or links to specific posts. This gives us the network and graph model that is missing in XML. Atom contains a simple HTTP-based way to INSERT, DELETE, and REPLACE <entry>s within a <feed>. There is a killer app for all these documents because the browsers already can view RSS 2.0 and Atom and, hopefully, will soon natively support the Atom protocol as well, which would mean read and write capabilities.
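Bosworth's description of Atom as a set model with an HTTP edit protocol can be sketched with Python's standard library. This is a minimal sketch, not code from the article: the `urn:example:` ids, the `http://example.com/...` URLs and the helper names are made-up placeholders, and the mapping of INSERT, REPLACE and DELETE onto HTTP `POST`, `PUT` and `DELETE` follows the Atom Publishing Protocol as it was later standardized (RFC 5023), which may differ from the drafts under discussion at the time. Each entry carries the three required elements, and a feed is read back as a set of `<id>` values so that entry order is irrelevant.

```python
import urllib.request
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def atom(tag):
    """Qualify a local name with the Atom namespace, e.g. '{...Atom}entry'."""
    return f"{{{ATOM_NS}}}{tag}"


def make_entry(entry_id, title, updated=None):
    """Build an <entry> with the three required elements: <id>, <updated>, <title>."""
    entry = ET.Element(atom("entry"))
    ET.SubElement(entry, atom("id")).text = entry_id
    ET.SubElement(entry, atom("title")).text = title
    stamp = updated or datetime.now(timezone.utc)
    ET.SubElement(entry, atom("updated")).text = stamp.isoformat()
    return entry


def make_feed(feed_id, title, entries):
    """Wrap entries in a <feed> container and serialize it to a string."""
    feed = ET.Element(atom("feed"))
    ET.SubElement(feed, atom("id")).text = feed_id
    ET.SubElement(feed, atom("title")).text = title
    ET.SubElement(feed, atom("updated")).text = datetime.now(timezone.utc).isoformat()
    feed.extend(entries)
    return ET.tostring(feed, encoding="unicode")


def entry_ids(feed_xml):
    """Read a feed back as a set of entry ids, since Atom says order doesn't matter."""
    root = ET.fromstring(feed_xml)
    return {entry.findtext(atom("id")) for entry in root.findall(atom("entry"))}


def edit_request(method, url, entry=None):
    """Build (but don't send) an AtomPub-style edit: POST inserts, PUT replaces, DELETE removes."""
    body = ET.tostring(entry) if entry is not None else None
    request = urllib.request.Request(url, data=body, method=method)
    if body is not None:
        request.add_header("Content-Type", "application/atom+xml;type=entry")
    return request
```

Because `entry_ids` returns a Python `set`, two feeds listing the same entries in different orders compare equal, which is the property Bosworth says plain XML lacks. The request objects are only constructed here; handing one to `urllib.request.urlopen` would actually send it.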

Now that's deep. Why not move up one level of abstraction from exchanging XML documents to exchanging Web Feeds (RSS/Atom documents)? Adam ends his article by throwing a challenge out to database vendors who he believes have failed to learn the lessons of the Web by writing

All of this has profound implications for databases. Today databases violate essentially every lesson we have learned from the Web.

  1. Are simple relaxed text formats and protocols supported? No.
  2. Have databases enabled people to harness Moore’s law in parallel? This would mean that databases could scale more or less linearly to handle both the volume of the requests coming in and even the complexity. The answer is no.
  3. Do databases optimize caching when it is OK to be stale? No.
  4. Do databases let schemas evolve for a set of items using a bottom-up consensus/tipping point? Obviously not.
  5. Do databases handle flexible graphs (or trees) well? No, they do not.
  6. Have the databases learned from the Web and made their queries simple and flexible? No, just ask a database if it has anyone who, if they have an age, are older than 40; and if they have a city, live in New York; and if they have an income, earn more than $100,000. This is a nightmare because of all the tests for NULL.
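Bosworth's NULL example is easy to reproduce. Below is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database; the `people` table, its columns and its rows are invented purely for illustration. The naive query silently drops anyone with a missing attribute, while the query that matches his wording needs an explicit `IS NULL` guard on every condition.

```python
import sqlite3

# Hypothetical schema and rows, made up purely to illustrate the NULL problem.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER, city TEXT, income INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        ("alice", 45, "New York", None),  # no income recorded
        ("bob", 30, None, 150000),        # has an age, but it is too low
        ("carol", None, None, None),      # no attributes at all
        ("dave", 50, "Boston", 200000),   # has a city, but the wrong one
    ],
)

# Naive version: a NULL makes its comparison unknown, so the row is dropped.
NAIVE = """
    SELECT name FROM people
    WHERE age > 40 AND city = 'New York' AND income > 100000
    ORDER BY name
"""

# "If they have an age ... if they have a city ... if they have an income":
# every test needs its own NULL guard.
GUARDED = """
    SELECT name FROM people
    WHERE (age IS NULL OR age > 40)
      AND (city IS NULL OR city = 'New York')
      AND (income IS NULL OR income > 100000)
    ORDER BY name
"""


def names(sql):
    return [row[0] for row in conn.execute(sql)]


print(names(NAIVE))    # []
print(names(GUARDED))  # ['alice', 'carol']
```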
The article ends by arguing that database vendors should add native support for the Atom Protocol and wire format. I find this interesting since, based on conversations on the atom-protocol list, it is clear that Google is very interested in the Atom API. Perhaps they have already built this Atom store that Adam is arguing for and will expose the Atom API as a way to interact with it. Perhaps this Atom store accessible via Atom feeds and the Atom API is Google Base? Speculation is fun.

As for me, I tend to agree with Adam that moving up layers of abstraction is a good idea. We've all agreed on XML; the next thing to do is to agree on applications of XML. We've all agreed on RSS; the next thing to do is figure out what scenarios are enabled by the subscription model. This is one of the reasons I disliked the unnecessary fragmentation caused by the RSS vs. Atom battles. As for whether we need to start seeing databases with native RSS/Atom support, I think it's too early in the game to jump there. Heck, RDF has been around for a while but we are just now seeing some decent things happening with SPARQL and various RDF stores. The same goes for XML and XQuery. I don't think enough lessons have been learned from either to start thinking about what it would mean to have a native RSS/Atom store. It is an interesting idea, though.


 

Categories: XML

There have been a couple of comparisons in the press between last week's announcement of Windows Live and Microsoft's Hailstorm initiative. Since I gave a talk last week on the differences between our platform thinking then versus now, a few folks have suggested I blog about it, so here goes.

My favorite article comparing Windows Live with Hailstorm was Mary Jo Foley's article Microsoft 'Live': 'Hailstorm' Take 2 where she wrote

Microsoft is mixing together rebranded MSN and bCentral properties, and seasoning with a dash of Hailstorm. Read all about that, and more, in Microsoft's 'Live' talking points.
...
We definitely were thinking Windows Live sounded a lot like the ill-fated Hailstorm, when we heard Microsoft's explanation of Windows Live. In case you need a refresher, Hailstorm, which Microsoft announced back in 2001, was designed to be is "a set of user-centric XML Web services that enable developers to build solutions that work seamlessly with one another over the Internet to deliver a more personalized and consistent user experience." Microsoft tabled Hailstorm shortly after its introduction, as a result of customer and partner concerns over privacy and security of the data.

As someone who's been working on our platform story in one way or another since back when it was the 'MSN platform story', this is something I'm quite aware of. The big difference between Windows Live and Hailstorm is the difference between empowerment and exploitation.

Four years ago, while interning at Microsoft, I saw a demo about Hailstorm in which a user visiting an online CD retailer was shown an ad for a concert they'd be interested in based on their music preferences stored in Hailstorm. The thinking here was that it would be a win-win because (i) all of the user's data is entered and stored in one place, which is convenient for the user, (ii) the CD retailer can access the user's preferences from Hailstorm and cut a deal with the concert ticket provider to show their ads based on user preferences, and (iii) the concert ticket provider gets their ads shown in a very relevant context.

The big problem with Hailstorm was that it assumed that potential Hailstorm partners such as retailers and other businesses would give up their customer data to Microsoft. As expected, most of them told Microsoft to take a long walk off a short pier.

Fast forward a few years. Microsoft now has some of the most popular services on the Web: the world's most popular IM client, the world's most popular web-based email service, one of the world's most popular blogging services, and a host of other services that are utilized by hundreds of millions of people every day.

At this point it is clear that a number of these services can be exposed as platforms to enable our customers to do more. Users deserve to have more options for creating content in their personal space, which is why we are exposing the MetaWeblog API for Spaces. People should be able to design and decide what components are shown on their personalized home page, which is why we have Microsoft Gadgets. You should be able to annotate maps with information of interest to yourself and your friends, which is why we have the Virtual Earth API. You should be able to subscribe to headlines about news of interest to you in your application of choice, which is why we provide RSS feeds for news search results in MSN Search.

It's about empowering our users.

We are currently thinking about how we transition from http://msdn.microsoft.com/msn and I'd love to hear what developers would like to see from us. What APIs would you like us to open up? What would you like to see on the site? Is it a problem that there are a number of different MSN and Windows Live developer sites like http://www.start.com/developer, http://msdn.microsoft.com/msn, and http://www.viavirtualearth.com, or is having a number of product-specific sites/communities better? We're building this platform for you, so we'd definitely love to hear your comments.

Holla at me


 

Categories: Windows Live

Dear LazyWeb,
    I've been using the Flickr viewer gadget on my Live.com page and there's just one problem with it. The gadget doesn't persist its state so every time I load the page I have to re-enter the tags of the photos I want to view. I know it's possible for gadgets to persist their state because the ToDo List gadget does exactly that.

What I need is for some kindly JavaScript junkie to fix the Flickr viewer so all of us Flickr fans on Live.com can enjoy its juicy goodness.


 

Categories: Windows Live

November 8, 2005
@ 06:41 PM

Over the weekend I got the following mail from an RSS Bandit user. The body of the message read

Have a suggestion on a set of blog entries (or articles for msdn or whatever) related to moving rss bandit to .net 2.0 / vs 2005. I’ve taken an application that uses the xpexplorerbar and imported it directly to vs 2005. I’m unable to compile due to a p/invoke issue (with LoadBitmap I believe). There is some documentation on MSDN for this, I’m sure it will be simple to correct.

In any case, RSS Bandit really is a better Windows Forms reference app than anything Microsoft is shipping with the product. It would be a very educational and worthwhile effort to document the steps required to move RSS Bandit to the new technology. As far as I’m concerned, Microsoft should champion the effort. Is there any chance this could happen?

We don't plan to move to v2.0 of the .NET Framework in the near future. RSS Bandit has a decent-sized user base, with over 200,000 downloads this year, and most of these users are on v1.1 of the .NET Framework. We don't plan to move until a large portion of our user base has migrated to v2.0 of the .NET Framework. Once this happens, we'll consider moving our development to .NET Framework v2.0. I expect this is about 1 to 2 years away.

I hate to promise anything so far in advance, but if I have time after we port our application, I will document some of the trickier issues in migrating to v2.0 of the .NET Framework. More than likely it'll just be a bunch of posts on my blog as opposed to a focused article on MSDN. I'm already so far behind when it comes to articles that I don't want to promise any more.


 

Categories: RSS Bandit

Yesterday I sent out mail to a few dozen beta testers to let them know that we had started the beta of our implementation of the MetaWeblog API for MSN Spaces. For the uninitiated, the MetaWeblog API will enable you to create, update and delete posts within the blog on a space. This means that you'll be able to use tools like Blogjet and W.Bloggar to manage the blog on your space. It also enables a number of other interesting scenarios, including:

  • Photo sites like Flickr that have a "Post photo(s) to your weblog" feature can now integrate with MSN Spaces.

  • Plugins for posting to MSN Spaces can be added to traditional text editors. An example of this is the Blogger for Word plugin.

  • RSS/Atom readers that have a "Blog This" feature can be used to post directly to your space.
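For readers who haven't seen the API, every one of these tools ultimately makes plain XML-RPC calls. Below is a minimal sketch using Python's `xmlrpc.client` that serializes `metaWeblog.newPost` and `metaWeblog.editPost` requests. MetaWeblog itself has no delete method, so clients conventionally fall back on the Blogger API's `blogger.deletePost`; this sketch assumes a server that accepts it. The blog id, credentials and endpoint URL are placeholders, since the post doesn't give the MSN Spaces endpoint, and the helper names are hypothetical.

```python
import xmlrpc.client


def new_post(blog_id, user, password, title, body, publish=True):
    """Serialize metaWeblog.newPost(blogid, username, password, struct, publish)."""
    post = {"title": title, "description": body}
    return xmlrpc.client.dumps((blog_id, user, password, post, publish),
                               methodname="metaWeblog.newPost")


def edit_post(post_id, user, password, title, body, publish=True):
    """Serialize metaWeblog.editPost(postid, username, password, struct, publish)."""
    post = {"title": title, "description": body}
    return xmlrpc.client.dumps((post_id, user, password, post, publish),
                               methodname="metaWeblog.editPost")


def delete_post(post_id, user, password, publish=True, appkey=""):
    """Serialize blogger.deletePost(appkey, postid, username, password, publish)."""
    return xmlrpc.client.dumps((appkey, post_id, user, password, publish),
                               methodname="blogger.deletePost")


# To actually send a call, a client would use a proxy for the blog's endpoint, e.g.
#   server = xmlrpc.client.ServerProxy("https://example.com/metaweblog")  # placeholder URL
#   post_id = server.metaWeblog.newPost(blog_id, user, password, post, True)
```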

There are a lot more interesting ideas one can explore once the API is widely available. If you are interested in beta testing our implementation of the MetaWeblog API, send me email at dareo at microsoft.com.


 

Categories: Windows Live

Google has reintroduced Google Desktop with a vengeance. It was evil enough the first time around, but this time it’s downright scary. My original complaint was that Google Desktop ignored the basic bandwidth-saving practices common among RSS readers when polling sites. It was pinging my site every 5 minutes asking for updates without caching the results, and thus was using an unreasonable proportion of my bandwidth.
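The post doesn't spell out which bandwidth-saving practices it means. The most common one among feed readers is HTTP conditional GET: the client echoes back the `ETag` and `Last-Modified` validators from its previous fetch, and the server answers `304 Not Modified` with no body if nothing has changed. A minimal sketch with Python's `urllib`; the `cached` dictionary layout and the function names are hypothetical.

```python
import urllib.error
import urllib.request


def conditional_headers(cached):
    """Validators from the previous response, echoed back to the server."""
    headers = {}
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    if cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]
    return headers


def poll_feed(url, cached):
    """Fetch a feed, returning the cached copy when the server answers 304."""
    request = urllib.request.Request(url, headers=conditional_headers(cached))
    try:
        with urllib.request.urlopen(request) as response:
            return {
                "etag": response.headers.get("ETag"),
                "last_modified": response.headers.get("Last-Modified"),
                "body": response.read(),
            }
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: the feed body is not re-downloaded
            return cached
        raise
```

Validators only shrink each response; the polling interval still matters. One request every 5 minutes is 24 × 60 / 5 = 288 requests per feed per day for every installed client.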

Since a new version was recently released, I decided to try it out to see if the issue had been fixed since I sent them mail. I installed Fiddler to monitor the application's traffic, and what I found surprised me a great deal. Google Desktop not only pings sites every 5 minutes in a manner inconsiderate of their bandwidth, but it also does so without the user's direction. Below is a screenshot of some of the HTTP traffic generated by Google Desktop

The highlighted requests are requests made by Google Desktop to the URLs of Atom & RSS feeds that were in my browser cache. I did not configure the application to fetch these feeds. So not only does Google Desktop flood websites with feed requests in a manner bordering on the behavior of a malicious application, it also does this automatically without the end user explicitly subscribing to the feed.

That's messed up.


 

In the post Feeds and well-formed XML Sean Lyndersay of the IE RSS team writes

Our years of experience in with HTML in Internet Explorer have taught us the long-term pain that results from being too liberal with what you accept from others. Hence, we’ve adopted the following overriding principle for IE 7 and RSS platform in Windows Vista: 

 We will only support feeds that are well-formed XML.

This principle allows us to build a more predictable feed parser. As a platform, it's important that applications using the platform to consume feeds can rely on the fact that the platform will always be providing information in the way that the publisher intended (trying to guess what a publisher meant to do when there is an error in a feed can be tricky, at best). We also spoke to several people in the RSS and developer community at Gnomedex and at PDC, and they wholeheartedly supported this.

Hell Yeah!!!
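The policy Sean describes amounts to treating any XML parse error as a rejected feed instead of guessing at what the publisher meant. A minimal sketch of that policy, using Python's `xml.etree.ElementTree` as a stand-in; this is not the parser IE 7 or the Windows RSS platform actually uses, and `is_well_formed` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def is_well_formed(feed_bytes):
    """Accept a feed only if it parses as XML; never try to repair broken markup."""
    try:
        ET.fromstring(feed_bytes)
    except ET.ParseError:
        return False
    return True
```

An unescaped ampersand such as `<title>AT&T</title>` or an unclosed element is enough to get a feed rejected under this rule.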


 

November 4, 2005
@ 06:09 PM

I saw someone reference Dave Luebbert's reasons to clone Google's API and wonder what my opinion was. In my post from yesterday entitled Clone the Google APIs: Kill That Noise, I gave some technical reasons why we wouldn't want to clone the Google APIs for Windows Live Search.

However, there is probably a clarification I should have made. In certain cases, there is one thing that trumps all technical arguments against cloning an API, and that is when the API has significant market share amongst developers. This is one of the reasons why, even though I think the MetaWeblog API is a disaster, we made the call that MSN Spaces will support the MetaWeblog API. Since the MetaWeblog API is a derivative of the Blogger API, you could argue that in this case we are cloning a Google API.

To me, the difference here is mindshare. The Blogger & MetaWeblog APIs are widely supported across the weblogging industry and have become de facto industry standards. I don't believe the same can be said for Google's search API. If anything, I'd say OpenSearch is the closest thing to a de facto industry standard for search APIs, although [for now] it has been ignored by the big three search players.
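Part of what makes OpenSearch easy to adopt is that a search engine is described by a URL template: the client substitutes parameters such as `{searchTerms}`, and a trailing `?` (as in `{startIndex?}`) marks a parameter as optional, to be replaced with an empty string when the client has no value. A minimal sketch of template expansion along the lines of the OpenSearch 1.1 specification; the example.com URL and the `fill_template` name are placeholders, and the regular expression skips edge cases a full implementation would handle.

```python
import re
from urllib.parse import quote

# Matches {name} or {name?}, allowing namespace-prefixed names like {geo:box?}.
_PARAM = re.compile(r"\{([\w:]+)(\??)\}")


def fill_template(template, values):
    """Expand an OpenSearch-style URL template with URL-encoded values."""
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""  # optional parameters without a value become empty
        raise KeyError(f"missing required template parameter: {name}")
    return _PARAM.sub(substitute, template)


template = "http://example.com/search?q={searchTerms}&start={startIndex?}&format=rss"
print(fill_template(template, {"searchTerms": "atom store"}))
# http://example.com/search?q=atom%20store&start=&format=rss
```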

On a similar note, I'd agree that the Google Maps API is probably on its way to becoming a de facto standard and Yahoo! & Microsoft should just go ahead and clone it. If I worked on the mapping API for either company, I'd give it six months, and if adoption hadn't increased significantly, I would consider cloning Google's API.


 

Categories: Web Development

One of the most eye-opening observations I heard recently was a comment by Terry Semel, CEO of Yahoo!, where he pointed out that only 5% of page views on the Web are from search, yet they account for about 40% of the revenue generated on the Web. To make this even clearer, consider this recent post on Om Malik's blog entitled Bigger Than Google, MySpace is different, which states

Like all community sites that rely mostly on their users to author content, MySpace has had a very difficult time trying to secure high advertising rates. Historically, advertisers have held little trust in content that is not tightly controlled editorially and, therefore, the value they are willing to attach for ads placed next to such uncontrollable content has been very low. The result is clear… MySpace ranks higher than Google in terms of pageviews, but Google will gross $6 billion in revenues this year, while MySpace will generate about $30 million. The delta, which can be measured in orders of magnitude, is almost unbelievable. I realize the comparison is not directly apples to apples, but even so!

I bring this up because this is where Murdoch’s strategic opportunity lies… in eliminating that gap. Put another way, MySpace has a multi-billion dollar opportunity to exploit, which promises to break News Corp out of the media stock depression that it and all its fellow conglomerates have been suffering. Success on this front will demonstrate that News Corp can tap into the fastest growing segment of the advertising industry in a manner that befits Google and Yahoo!

This disparity in ad revenue is quite stunning. I agree with Terry Semel and others that this represents a significant opportunity. I wonder who'll seize it first...
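The size of the gap can be made concrete with the two sets of figures quoted above. This is a back-of-the-envelope sketch that assumes Semel's 5% and 40% are shares of all Web page views and all Web ad revenue, and it takes Om Malik's $6 billion and $30 million estimates as given. As Malik says, the comparison isn't apples to apples.

```python
def search_premium(view_share, revenue_share):
    """Revenue per search page view, relative to revenue per non-search page view."""
    search_rate = revenue_share / view_share             # 0.40 / 0.05 = 8
    other_rate = (1 - revenue_share) / (1 - view_share)  # 0.60 / 0.95, about 0.63
    return search_rate / other_rate


premium = search_premium(0.05, 0.40)  # Semel's figures
google_vs_myspace = 6e9 / 30e6        # Malik's revenue estimates

print(f"{premium:.1f}x")            # 12.7x
print(f"{google_vs_myspace:.0f}x")  # 200x
```

Under those assumptions a search page view is worth roughly 13 times a non-search one, and Malik's revenue gap is a bit over two orders of magnitude.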


 

November 4, 2005
@ 02:59 AM

For the past few years my browser home page has alternated between http://news.google.com and http://my.yahoo.com. I like Google News for the variety of news they provide but end up gravitating back to My Yahoo! because I like having my stock quotes, weather reports and favorite news all in a single dashboard.

This morning I decided to try out live.com. After laying out my page, I went to microsoftgadgets.com to see what gadgets I could use to 'pimp my home page' and I found a beauty: the Seattle Bridge Traffic Gadget. I've talked about the power of gadgets in the past but this brought home to me how powerful it is to allow people to extend their personalized portal in whatever ways they wish. Below is a screenshot of my home page.

I'm definitely toying with building my own gadgets now. Matt has a killer gadget he's been working on in his free time that I think will be much appreciated by live.com users. If I ever find some free time, I suspect the gadget I'll end up writing will be one that has to do with movie listings. Perhaps a gadget that shows the box office rankings of the previous week and also upcoming listings with information on local showtimes. Or maybe an MSN Spaces photo album gadget in the same vein as the Flickr gadget. There are not enough hours in the day...


 

Categories: Windows Live