So I just read an interesting post about Technorati Tags on Shelley Powers's blog entitled Cheap Eats at the Semantic Web Café. As I read Shelley's post I kept feeling a strong sense of deja vu which I couldn't shake. If you were using the Web in the 1990s then the following descriptions of Technorati Tags taken from their homepage should be quite familiar.

What's a tag?

Think of a tag as a simple category name. People can categorize their posts, photos, and links with any tag that makes sense.

....

The rest of the Technorati Tag pages is made up of blog posts. And those come from you! Anyone with a blog can contribute to Technorati Tag pages. There are two ways to contribute:

  • If your blog software supports categories and RSS/Atom (like Movable Type, WordPress, TypePad, Blogware, Radio), just use the included category system and make sure you're publishing RSS/Atom and we'll automatically include your posts! Your categories will be read as tags.
  • If your blog software does not support those things, don't worry, you can still play. To add your post to a Technorati Tag page, all you have to do is "tag" your post by including a special link. Like so:
    <a href="http://technorati.com/tag/[tagname]" rel="tag">[tagname]</a>
    The [tagname] can be anything, but it should be descriptive. Please only use tags that are relevant to the post. No need to include the brackets. Just make sure to include rel="tag".

    Also, you don't have to link to Technorati. You can link to any URL that ends in something tag-like. These tag links would also be included on our Tag pages:
    <a href="http://apple.com/ipod" rel="tag">iPod</a>
    <a href="http://en.wikipedia.org/wiki/Gravity" rel="tag">Gravity</a>
    <a href="http://flickr.com/photos/tags/chihuahua" rel="tag">Chihuahua</a>

If you weren't using the Web in the 1990s this may seem new and wonderful to you, but the fact is we've all seen this before. The so-called Technorati Tags are glorified HTML META tags with all their attendant problems. The reason the arguments in Shelley's blog post seemed so familiar is that a number of them are the same ones Cory Doctorow made in his article Metacrap so many years ago. All the problems with META tags are still valid today, the most important being the fact that people lie (especially spammers) and that even well-intentioned people tend to categorize things incorrectly or according to their prejudices.

META tags simply couldn't scale to match the needs of the World Wide Web and are mostly ignored by search engines today. I wonder why people think that if you dress up an old idea with new buzzwords (*cough* folksonomies *cough* social tagging *cough*) that it somehow turns a bad old idea into a good new one?


 

Categories: Technology

I noticed an eWeek article this morning titled Microsoft Won't Bundle Desktop Search with Windows which has had me scratching my head all morning. The article contains the following excerpts

Microsoft Corp. has no immediate plans to integrate desktop search into its operating system, a company executive said at a conference here this weekend.
...
Indeed, while including desktop search in Windows might seem like a logical step to many, "there's no immediate plan to do that as far as I know," Kroese said. "That would have to be a Bill G. [Microsoft chairman and chief software architect Bill Gates] and the lawyers' decision."

I thought Windows already had desktop search. In fact, Jon Udell of InfoWorld recently provided a screencast in his blog post Where was desktop search when we needed it? which shows off the capabilities of the built-in Windows desktop search, which seems almost on par with the recent offerings from MSN, Google and the like.

Now I'm left wondering what the eWeek article means. Does it mean there aren't any plans to replace the annoying animated dog with the MSN butterfly? That Microsoft has access to a time machine and will go back and rip out desktop search from the operating system, including the annoying animated dog? Or is there some other obvious conclusion that can be drawn from the facts and the article that I have failed to grasp?

The technology press really disappoints me sometimes.


 

Categories: Technology

I had promised myself I wouldn't get involved in this debate, but it seems every day I see more and more people giving credence to bad ideas and misconceptions. The debate I am talking about is one-click subscription to RSS feeds, which Dave Winer recently called the Yahoo! problem. Specifically Dave Winer wrote

Yahoo sends emails to bloggers with RSS feeds saying, hey if you put this icon on your weblog you'll get more subscribers. It's true you will. Then Feedster says the same thing, and Bloglines, etc etc. Hey I did it too, back when Radio was pretty much the only show in town, you can see the icon to the right, if you click on it, it tries to open a page on your machine so you can subscribe to it. I could probably take the icon down by now, most Radio users probably are subscribed to Scripting News, since it is one of the defaults. But it's there for old time sake, for now.

Anyway, all those logos, when will it end? I can't imagine that Microsoft is far behind, and then someday soon CNN is going to figure out that they can have their own branded aggregator for their own users (call me if you want my help, I have some ideas about this) and then MSNBC will follow, and Fox, etc. Sheez even Best Buy and Circuit City will probably have a "Click here to subscribe to this in our aggregator" button before too long.

That's the problem.

Currently I have four such one-click subscription buttons on my homepage: an Add to MyMSN button, an Add to MyYahoo button, an Add to Bloglines button, and an Add to Newsgator button. Personally, I don't see this as a problem since I expect market forces and common sense to come into play here. But let's see what Dave Winer proposes as a solution.

Ask the leading vendors, for example, Bloglines, Yahoo, FeedDemon, Google, Microsoft, and publishers, AP, CNN, Reuters, NY Times, Boing Boing, etc to contribute financially to the project, and to agree to participate once it's up and running.
...
Hire Bryan Bell to design a really cool icon that says "Click here to subscribe to this site" without any brand names. The icon is linked to a server that has a confirmation dialog, adds a link to the user's OPML file, which is then available to the aggregator he or she uses. No trick here, the technology is tried and true. We did it in 2003 with feeds.scripting.com.

This 'solution' to the 'problem' boggled my mind. So every time someone wants to subscribe to an RSS feed it should go through a centralized server? The privacy implications of this alone are significant, to say nothing of the creation of a central point of failure. In fact Dave Winer recently posted a link that highlights a problem with centralized services related to RSS.

Besides my issues with the 'solution', I have quite a few problems with the so-called 'Yahoo problem' as defined. The first problem is that it only affects web-based aggregators that don't have a client application installed on the user's desktop. Desktop aggregators already have a de facto one-click subscription mechanism via the feed URI scheme, which is supported by at least a dozen of the most popular aggregators across multiple platforms including the next version of Apple's web browser, Safari. Recently both Brent Simmons (in his post feed: protocol) and Nick Bradbury (in his post Really Simple Subscription) reiterated this fact. In fact, all four services whose buttons I have on my personal blog can utilize the feed URI scheme as a subscription mechanism since all four services have desktop applications they can piggyback this functionality onto. Yahoo and MSN have toolbars, Newsgator has its desktop aggregator and Bloglines has its notifier.
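For those wondering how the feed URI scheme actually works on the desktop, it boils down to the aggregator registering itself as the handler for feed: links using the standard Windows custom-protocol registry layout, so that clicking a subscribe link launches whatever aggregator the user already has installed with the feed URL as an argument. Below is a rough C# sketch of that registration; it isn't taken from any particular aggregator, the paths and names are illustrative, and writing under HKEY_CLASSES_ROOT requires administrative rights.

using Microsoft.Win32;

class FeedProtocolRegistration {
  // Registers an application as the handler for feed: URIs using the standard
  // custom URL protocol layout under HKEY_CLASSES_ROOT.
  static void Register(string aggregatorExePath) {
    RegistryKey feed = Registry.ClassesRoot.CreateSubKey("feed");
    feed.SetValue("", "URL:feed Protocol");
    feed.SetValue("URL Protocol", "");

    RegistryKey command = feed.CreateSubKey(@"shell\open\command");
    command.SetValue("", "\"" + aggregatorExePath + "\" \"%1\"");

    command.Close();
    feed.Close();
  }

  static void Main() {
    Register(@"C:\Program Files\MyAggregator\MyAggregator.exe"); // hypothetical install path
  }
}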

The second problem I have with this is that it aims to stifle an expression of competition in the marketplace. If the Yahoo! aggregator becomes popular enough that every website puts a button for it beside a link to their RSS feed, then they deserve kudos for spreading their reach, since it is unlikely that this would happen without some value being provided by their service. I don't think Yahoo! should be attacked for their success, nor should their success be termed some sort of 'problem' for the RSS world to fix.

As I mentioned earlier in my post, I was going to ignore this re-re-reiteration of the one-click subscription debate until I saw rational folks like Tim Bray in his post One-Click subscription actually discussing the centralized RSS subscription repository idea without giving it the skepticism it deserves. Of course, I also see some upturned eyebrows at this proposal from folks like Bill de hÓra in his post A registry for one click subscription, anyone?


 

Coincidentally, just as I finished reading a post by Tim Bray about Private Syndication, I got two bug reports filed almost simultaneously about RSS Bandit's support for secure RSS feeds. The first was SSL challenge for non-root certs, where the user complained that instead of prompting the user when there is a problem with an SSL certificate, as browsers do, we simply fail. One could argue that this is the right thing to do, especially when you have folks like Tim Bray suggesting that bank transactions and medical records should be flowing through RSS. However given the precedent set by web browsers we'll probably be changing our behavior. The second bug was that RSS Bandit doesn't support cookies. Many services use cookies to track authenticated users as well as provide individual views tailored to a user. Although there are a number of folks who tend to consider cookies a privacy issue, most popular websites use them and they are mostly harmless. I'll likely fix this bug in the next couple of weeks.
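For the curious, the cookie fix mostly amounts to attaching a CookieContainer to the HTTP requests used to fetch feeds so that any Set-Cookie headers from the server are sent back on subsequent polls. Here is a minimal sketch of the idea; this is not the actual RSS Bandit code and the feed URL is made up.

using System;
using System.IO;
using System.Net;

class CookieAwareFetcher {
  // A single container shared across requests so cookies set by a feed's server
  // (e.g. an authentication ticket) are replayed on later fetches.
  static readonly CookieContainer cookies = new CookieContainer();

  static string FetchFeed(string url) {
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    request.CookieContainer = cookies; // without this, Set-Cookie headers are silently dropped

    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    using (StreamReader reader = new StreamReader(response.GetResponseStream())) {
      return reader.ReadToEnd();
    }
  }

  static void Main() {
    Console.WriteLine(FetchFeed("http://example.org/private/rss.xml")); // hypothetical feed URL
  }
}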

These bug reports, in combination with a couple more issues I've had to deal with while writing code to process RSS feeds in RSS Bandit, have given me my inspiration for my next Extreme XML column. I suspect there's a lot of mileage that can be obtained from an article that dives deep into the various issues one deals with while processing XML on the Web (DTDs, proxies, cookies, SSL, unexpected markup, etc.) using RSS as the concrete example. Any of the readers of my column care to comment on whether they'd like to see such an article and, if so, what they'd want to see covered?


 

Categories: RSS Bandit | XML

Mike has a post entitled 5 Things I Dislike About MSN Spaces where he lists the top 5 issues he has with the service. 

Five things I dislike about MSN Spaces:

1. I can't use BlogJet to post to my blog today.  Not a huge deal, but I love this little tool.  Web browsers (Firefox included!) just don't do it for me in the way of text editing.

2. We don't have enough themes, which means there isn't enough differentiation between spaces.  People come in a lot of flavors.

3. We don't have comments on anything other than blog entries, which means a lot of people are using comments to either a) comment on photos, or b) comment on the space itself.

4. The Statistics page doesn't roll-up RSS aggregator "hits", only web page-views.  I want to know who is reading this post in various aggregators.

5. The recent blog entry on the MSN Messenger 7.0 beta contact card doesn't show enough information to compel me to visit the user's space.  

Since I'm the eternal copycat, I also decided to put together a top 5 list of issues I'd like to see fixed in the service. Some of them I am in the process of fixing, and some I'm still nagging folks like Mike to fix since they are UI issues and have nothing to do with my areas of responsibility. Here's my list:

Five things I'd like to see fixed in MSN Spaces:

1. I can't post to my blog using standard weblog tools such as BlogJet, w.bloggar or Flickr. I now have 3 blogs and I'd love to author my posts in one place then have them automatically posted to all three of them. My MSN Spaces blog prevents this from happening today.

2. The text control for posting to your blog should support editing raw HTML as well as WYSIWYG editing. I find it stunning that open source projects like dasBlog & Community Server::Blogs have a more full-featured edit control for posting to one's blog than MSN Spaces does.

3. The user experience around managing and interacting with my blogroll is subpar. I'd love to be able to upload my aggregator subscription list as an OPML file to my MSN Spaces blogroll. While we're at it, it would be cool to render MSN Spaces blogs differently in my blogroll and in links in my posts, perhaps with a pawn such as the LiveJournal user pawn, which links to a user's profile.

4. I want to be able to upload music lists from playlist formats other than Windows Media Player's. I have a bunch of playlists in WinAmp and iTunes which I want to upload to my MSN Space but don't have the patience to transcribe. It is cool that we allow uploading Windows Media Player playlists, but what about iTunes and WinAmp?

5. I need more ways to find other MSN Spaces that I might be interested in. The recently updated spaces and newly created spaces widgets on the MSN Spaces homepage aren't cutting it for me. In addition, once that is fixed it would also be cool if it were made really simple to subscribe to interesting MSN Spaces in my RSS aggregator of choice using a one-click subscription mechanism.


 

Categories: MSN

I've been doing a bit of reading about folksonomies recently. The definition of folksonomy in Wikipedia currently reads

Folksonomy is a neologism for a practice of collaborative categorization using simple tags in a flat namespace. This feature has begun appearing in a variety of social software. At present, the best examples of online folksonomies are social bookmarking sites like del.icio.us, a bookmark sharing site, Flickr, for photo sharing, and 43 Things, for goal sharing.

What I've found interesting about current implementations of folksonomies is that they are blogging with content other than straight text. del.icio.us is basically a linkblog and Flickr isn't much different from a photoblog/moblog. The innovation in these sites is in merging the category metadata from the different entries so that users can browse all the links or photos which match a specified keyword. For example, here are recent links added to del.icio.us with the tag 'chicken' and recent photos added to Flickr with the tag 'chicken'. Both sites not only allow browsing all entries that match a particular tag but also go as far as allowing one to subscribe to particular tags as an RSS feed.

I've watched with growing bemusement as certain people have started to debate whether folksonomies will replace traditional categorization mechanisms. Posts such as The Innovator's Lemma  by Clay Shirky, Put the social back into social tagging by David Weinberger and it's the social network, stupid! by Liz Lawley go back and forth about this very issue. This discussion reminds me of the article Don't Let Architecture Astronauts Scare You by Joel Spolsky where he wrote

A recent example illustrates this. Your typical architecture astronaut will take a fact like "Napster is a peer-to-peer service for downloading music" and ignore everything but the architecture, thinking it's interesting because it's peer to peer, completely missing the point that it's interesting because you can type the name of a song and listen to it right away.

All they'll talk about is peer-to-peer this, that, and the other thing. Suddenly you have peer-to-peer conferences, peer-to-peer venture capital funds, and even peer-to-peer backlash with the imbecile business journalists dripping with glee as they copy each other's stories: "Peer To Peer: Dead!"

I think Clay is jumping several steps ahead to conclude that explicit classification schemes will give way to categorization by users. The one thing people are ignoring in this debate (as in all technical debates) is that the various implementations of folksonomies are popular because of the value they provide to the user. When all is said and done, del.icio.us is basically a linkblog and Flickr isn't much different from a photoblog/moblog. This provides inherent value to the user, and as a side effect [from the user's perspective] each new post becomes part of an ecosystem of posts on the same topic which can then be browsed by others. It isn't clear to me that this dynamic exists everywhere else explicit classification schemes are used today.

One thing that is clear to me is that personal publishing via RSS and the various forms of blogging have found a way to trample all the arguments against metadata in Cory Doctorow's Metacrap article from so many years ago. Once there is incentive for the metadata to be accurate and it is cheap to create there is no reason why some of the scenarios that were decried as utopian by Cory Doctorow in his article can't come to pass. So far only personal publishing has provided the value to end users to make both requirements (accurate & cheap to create) come true.

Postscript: Coincidentally I just noticed a post entitled Meet the new tag soup  by Phil Ringnalda pointing out that emphasizing end-user value is needed to woo people to create accurate metadata in the case of using semantic markup in HTML. So far most of the arguments I've seen for semantic markup [or even XHTML for that matter] have been academic. It would be interesting to see what actual value to end users is possible with semantic markup or whether it really has been pointless geekery as I've suspected all along. 


 

Categories: Technology

January 21, 2005
@ 01:44 AM

My article on Cω is finally published. It appeared on XML.com as Introducing Comega while it showed up as the next installment of my Extreme XML column on MSDN with the title An Overview of Cω: Integrating XML into Popular Programming Languages. It was also mentioned on Slashdot in the story Microsoft Research's C-Omega

I'll be following this with an overview of ECMAScript for XML (E4X) in a couple of months.


 

Categories: Technology

O'Reilly Emerging Technology Conference. It looks like I'm going to be attending the O'Reilly Emerging Technology Conference in San Diego. If you're there and want to find me I'll be with the rest of the exhibitors hanging out with MSR Social Computing Group talking about the social software applications that have been and are being built by MSN with input from the folks at Microsoft Research. I should also be at Tuesday's after party.

 

Categories: Technology

If you've been following the blogosphere you should know by now that the Google, Yahoo! and MSN search engines decided to start honoring the rel="nofollow" attribute on links to mean that the linked page shouldn't get any increased ranking from that link. This is intended to reduce the incentive for comment spammers who've been flooding weblogs with links to their websites in comment fields. There is another side effect of the existence of this element which is pointed out by Shelley Powers in her post The Other Shoe on Nofollow where she writes

I expected this reason to use nofollow would take a few weeks at least, but not the first day. Scoble is happy about the other reason for nofollow: being able to link to something in your writing and not give ‘google juice’ to the linked.

Now, he says, I can link to something I dislike and use the power of my link in order to punish the linked, but it won’t push them into a higher search result status.

Dave Winer started this, in a way. He would give sly hints about what people have said and done, leaving you knowing that an interesting conversation was going on elsewhere, but you’re only hearing one side of it. When you’d ask him for a link so you could see other viewpoints, he would reply that "…he didn’t want to give the other party Google juice." Now I imagine that *he’ll link with impunity–other than the fact that Technorati and Blogdex still follow the links. For now, of course. I imagine within a week, Technorati will stop counting links with nofollow implemented. Blogdex will soon follow, I’m sure.

Is this so bad? In a way, yes it is. It’s an abuse of the purpose of the tag, which was agreed on to discourage comment spammers. More than that, though, it’s an abuse of the the core nature of this environment, where our criticism of another party, such as a weblogger, came with the price of a link. Now, even that price is gone.

I don't see this as an abuse of the tag; I see it as fixing a bug in Google's PageRank algorithm. It's always seemed broken to me that Google assumes that any link to a source is meant to convey that the target is authoritative. Many times people link to websites they don't consider authoritative on the topic they are discussing. This notion of 'the price of a link' has been based on a design flaw in Google's PageRank algorithm. Social norms should direct social behavior, not bugs in software.
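For reference, the simplified PageRank formula from Brin and Page's original paper makes that assumption explicit: a page's rank is a damped sum over the ranks of the pages linking to it, with no way to express that a link is a criticism rather than an endorsement.

PR(A) = (1 - d) + d * ( PR(T_1)/C(T_1) + ... + PR(T_n)/C(T_n) )

Here T_1 through T_n are the pages that link to A, C(T_i) is the number of outbound links on T_i, and d is a damping factor (0.85 in the original paper).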

A post entitled Nofollow Sucks on the Aimless Words blog has the following statement

Consider what the wholesale implementation of this new web standard means within the blogosphere. "nofollow" is English for NO FOLLOW and common sense dictates that when spider finds this tag it will not follow the subsequent link.

The author of the blog post later retracts his statements, but it does bring up an interesting point. Robert Scoble highlights the fact that it didn't take a standards committee to come up with this, just backchannel conversations that took a few hours. However as Tim Ewald recently wrote in his post "Make it easy for people to pay you"

The value of the standardization process is that it digs issues - architectural, security, reliability, scalability, etc. - and addresses them. It also makes the language more tighter and less vague

The Aimless Words weblog points out that it is unclear to anyone who isn't party to whatever conversations went on between Google, MSN, Yahoo and others exactly what the semantics of rel='nofollow' on a link are. Given that it is highly unlikely that all three search engines even use the same ranking algorithms, I'm not even sure what it means for them to say the link doesn't contribute to the ranking of the site. Will the penalty that Yahoo search applies to such sites be the same in Google and MSN search? Some sort of spec or spec text would be nice to see instead of the 'trust us' which seems to be what is emanating from all the parties involved at the current time.

PS: I was wondering why I never saw the posts about this on the Google blog  in RSS Bandit and it turned out to be because the Google Blog atom feeds are malformed XML. Hopefully they'll fix this soon.


 

Categories: Technology

January 19, 2005
@ 04:36 PM

The New Yorker has an article by Seymour Hersh entitled THE COMING WARS: What the Pentagon can now do in secret where he discusses alleged plans the US administration has for invading Iran in the near term. The article is scary reading, but the part that had me most stunned is the following excerpt

In my interviews over the past two months, I was given a much harsher view. The hawks in the Administration believe that it will soon become clear that the Europeans’ negotiated approach cannot succeed, and that at that time the Administration will act. "We’re not dealing with a set of National Security Council option papers here," the former high-level intelligence official told me. "They’ve already passed that wicket. It’s not if we’re going to do anything against Iran. They’re doing it."

The immediate goals of the attacks would be to destroy, or at least temporarily derail, Iran’s ability to go nuclear. But there are other, equally purposeful, motives at work. The government consultant told me that the hawks in the Pentagon, in private discussions, have been urging a limited attack on Iran because they believe it could lead to a toppling of the religious leadership. "Within the soul of Iran there is a struggle between secular nationalists and reformers, on the one hand, and, on the other hand, the fundamentalist Islamic movement," the consultant told me. "The minute the aura of invincibility which the mullahs enjoy is shattered, and with it the ability to hoodwink the West, the Iranian regime will collapse"—like the former Communist regimes in Romania, East Germany, and the Soviet Union. Rumsfeld and Wolfowitz share that belief, he said.

"The idea that an American attack on Iran’s nuclear facilities would produce a popular uprising is extremely illinformed," said Flynt Leverett, a Middle East scholar who worked on the National Security Council in the Bush Administration. "You have to understand that the nuclear ambition in Iran is supported across the political spectrum, and Iranians will perceive attacks on these sites as attacks on their ambitions to be a major regional player and a modern nation that’s technologically sophisticated." Leverett, who is now a senior fellow at the Saban Center for Middle East Policy, at the Brookings Institution, warned that an American attack, if it takes place, "will produce an Iranian backlash against the United States and a rallying around the regime."

This sounds suspiciously like the same reasoning that claimed Iraqis would welcome the US-led invasion with open arms. I know the saying "those who do not learn from history are doomed to repeat it" is a cliché, but this is getting ridiculous. Maybe someone should get these folks a copy of DJ Green Lantern's Shade 45: Sirius Bizness mixtape and put on track 10, where Immortal Technique opens up the second verse with

They say the rebels in Iraq still fight for Saddam,
But that's bullshit i'll show you why it's totally wrong,
Cuz if another country invaded the hood tonight,
It'd be warfare through Harlem and Washington Heights
I wouldn't be fightin' for Bush or white americas dream,
I'd be fightin' for my peoples survival and self esteem,
I wouldn't fight for racist churches from the south my nigga,
I'd be fightin' to be keep the occupation out my nigga,

It doesn't take an expert in Middle East history with a Ph.D. to figure this stuff out. The continual waste of life and resources going on in the Middle East due to the Bush administration's misadventures completely turns my stomach.


 

Categories: Ramblings

January 18, 2005
@ 04:44 PM

After several months of waiting for The Game's new album, The Documentary is finally out. I knew subscribing to Amazon's RSS feeds would come in handy.

G-G-G-G-G-Unit!!!


 

Categories: Ramblings

It had to happen sooner or later. MyMSN now supports adding RSS or Atom 0.3 feeds as content sources for your home page. RSS/Atom content modules can be customized to show articles from 1 to 365 days old, display from 1 to 30 articles, and show article summaries either as a tooltip or inline following the article headline.

Below are screenshots of me test-driving the new features:

  1. MyMSN homepage with option to add RSS feeds as content modules
  2. Adding Jeremy Zawodny's Atom feed to my MyMSN homepage
  3. Jeremy Zawodny's Atom feed and Robert Scoble's RSS feed as part of my homepage

In addition the MyMSN folks also provide a handy way to create a link that enables people to add a feed to their MyMSN front page. The following link will add my RSS feed to your MyMSN front page

http://my.msn.com/addtomymsn.armx?id=rss&ut=http://www.25hoursaday.com/weblog/SyndicationService.asmx/GetRss&ru=http://www.25hoursaday.com/weblog

Just as with other Web-based aggregators there is also a handy button one can add to a website to enable one-click subscription to your RSS feed. This brings the number of buttons I need to add to my homepage to a grand total of four: an Add to MyMSN button, an Add to MyYahoo button, an Add to Bloglines button, and an Add to Newsgator button.

There is also an MSN RSS Directory that contains links to the RSS feeds produced by MSN properties such as MSNBC, MSN Autos and MSN Music.  

I'm really glad to see this ship. When I was first hired onto MSN I was supposed to work with the MyMSN folks on this effort but eventually things changed. Even though I haven't been directly responsible for this feature I've been in touch with the folks driving it and I think it is totally killer that Microsoft has finally officially cast a stone in the XML syndication waters.

Great work all around.


 

Categories: MSN

Derek has a post entitled Search is not Search where he alludes to conversations we had about my post Apples and Oranges: WinFS and Google Desktop Search. His blog post reminds me why I'm so disappointed that the benefits of adding structured metadata capabilities to file systems are being equated with desktop search tools that are a slightly better incarnation of the Unix grep command. Derek wrote

I was reminded of that conversation today, when catching up on a recent-ish publication from MIT's Haystack team: The Perfect Search Engine is Not Enough: A Study of Orienteering Behavior in Directed Search. One of the main points of the paper is that people tend not to use 'search' (think Google), even when they have enough information for search to likely be useful. Often they will instead go to a known location from which they believe they can find the information they are looking for.

For me the classic example is searching for music. While I tend to store my mp3s in a consistent directory structure such that the song's filename is the actual name of the song, I almost never use a generic directory search to find a song. I tend to think of songs as "song name: Conga Fury, band: Juno Reactor", or something like that, so when I'm looking for Conga Fury, I am more likely to walk the album subdirectories under my Juno Reactor directory, than I am to search for file "Conga Fury.mp3". The above paper talks a bit about why, and I think another key aspect that they don't mention is that search via navigation leverages our brain's innate fuzzy computation abilities. I may not remember how to spell "Conga Fury" or may think that it was "Conga Furvor", but by navigating to my solution, such inaccuracies are easily dealt with.

As Derek's example shows, comparing the scenarios enabled by a metadata-based file system against those enabled by desktop search is like comparing navigating one's music library using iTunes versus using Google Desktop Search or MSN Desktop Search to locate audio files.

Joe Wilcox (of Jupiter Research) seems to have reached a similar conclusion based on my reading of his post Yes, We're on the Road to Cairo where he wrote

WinFS could have anchored Microsoft's plans to unify search across the desktop, network and the Internet. Further delay creates opportunity for competitors like Google to deliver workable products. It's now obvious that rather than provide placeholder desktop search capabilities until Longhorn shipped, MSN will be Microsoft's major provider on the Windows desktop. That's assuming people really need the capability. Colleague Eric Peterson and I chatted about desktop search on Friday. Neither of us is convinced any of the current approaches hit the real consumer need. I see that as making more meaningful disparate bits of information and complex content types, like digital photos, music or videos.

WinFS promised to hit that need, particularly in Microsoft public demonstrations of Longhorn's capabilities. Now the onus and opportunity will fall on Apple, which plans to release metadata search capabilities with Mac OS 10.4 (a.k.a. "Tiger") in 2005. Right now, metadata holds the best promise of delivering more meaningful search and making sense of all the digital content piling up on consumers' and Websites' hard drives. But there are no standards around metadata. Now is the time for vendors to rally around a standard. No standard is a big problem. Take for example online music stores like iTunes, MSN Music or Napster, which all tag metadata slightly differently. Digital cameras capture some metadata about pictures, but not necessarily the same way. Then there are consumers using photo software to create their own custom metadata tags when they import photos.

I agree with his statements about where the real consumer need lies but disagree when he states that no standards around metadata exist. Music files have ID3 and digital images have EXIF. The problem isn't a lack of standards but instead a lack of support for these standards which is a totally different beast.

I was gung ho about WinFS because it looked like Microsoft was going to deliver a platform that made it easy for developers to build applications that took advantage of the rich metadata inherent in user documents and digital media. Of course, this would require applications that created content (e.g. digital cameras) to actually generate such metadata which they don't today. I find it sad to read posts like Robert Scoble's Desktop Search Reviewers' Guide where he wrote

2) Know what it can and can't do. For instance, desktop search today isn't good at finding photos. Why? Because when you take a photo the only thing that the computer knows about that file is the name and some information that the camera puts into the file (like the date it was taken, the shutter speed, etc). And the file name is usually something like DSC0050.jpg so that really isn't going to help you search for it. Hint: put your photos into a folder with a name like "wedding photos" and then your desktop search can find your wedding photos.

What is so depressing about this post is that it costs very little for the digital camera or its associated software to tag JPEG files with comments like 'wedding photos' as part of the EXIF data which would then make them accessible to various applications including desktop search tools. 
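To put the cost in perspective, here is a rough sketch of what it takes to stamp a descriptive comment into a photo's EXIF data using GDI+ from C#. This isn't from any shipping product; it assumes the JPEG already carries at least one property item to clone (GDI+ exposes no public PropertyItem constructor) and the file path is made up. Any camera import wizard could do the equivalent at the moment the user types in an album name like 'wedding photos'.

using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Text;

class ExifTagger {
  // Writes a descriptive comment into a JPEG's ImageDescription tag (0x010E).
  static void TagPhoto(string path, string comment) {
    using (Image photo = Image.FromFile(path)) {
      PropertyItem item = photo.PropertyItems[0]; // clone an existing item since PropertyItem can't be constructed directly
      item.Id = 0x010E;                           // EXIF/TIFF ImageDescription
      item.Type = 2;                              // ASCII
      item.Value = Encoding.ASCII.GetBytes(comment + "\0");
      item.Len = item.Value.Length;
      photo.SetPropertyItem(item);
      photo.Save(path + ".tagged.jpg", ImageFormat.Jpeg); // save a copy; the original stays locked while open
    }
  }

  static void Main() {
    TagPhoto(@"C:\photos\DSC0050.jpg", "wedding photos"); // hypothetical file
  }
}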

Perhaps the solution isn't expending resources to build a metadata platform that will be ignored by applications that create content today but instead giving these applications incentive to generate this metadata. For example, once I bought an iPod I became very careful to ensure that the ID3 information on the MP3s I'd load on it were accurate since I had a poor user experience otherwise.

I wonder what the iPod for digital photography is going to be. Maybe Microsoft should be investing in building such applications instead of boiling the oceans with efforts like WinFS which aim to ship everything including the kitchen sink in version 1.0.  


 

Categories: Technology

About two weeks ago there was an interview on C|Net with Bill Gates entitled Gates taking a seat in your den where he mentioned there had been 1 million MSN Spaces created in our first month. About a week later Mike Torres blogged that the number had risen to 1.5 million MSN Spaces created. In response to both of these statements about the growth of MSN Spaces I've seen a couple of detractors complaining about our adoption numbers. A prototypical example of these kinds of comments is the following post by Ed Brill entitled Gates: close to a million people on MSN Spaces. Ed Brill wrote

I made this comment on Scoble's blog, here for y'all as well...
Not to take this too far afield, but this is one of those fascinating examples of how MS is so good at staying "on message", but how bad it makes them look when that message lacks credibility. Those of us in the blogging community look at this "1 million" number with an extremely crooked eye, no offense to Mike Torres and his work. We all know someone who created an MSN Space only for the purpose of checking it out, and will never use it again. We know there are people who blog elsewhere that created Spaces because it's more free web space. We know that there are "people" who created more than one space, just like "people" have more than one Hotmail account. But BillG says "1 million" and the choir says "yea, verily."
...
It's a fascinating culture to observe from the outside, and it often works. But when the claim is too far afield, it does nothing to help the corporate image and credibility. (In this case, neither did BillG's comment that "So no big problem; it's not that people have stopped using IE").

I was quite surprised by this outburst given that quoting the number of unique user accounts is common practice for online services. In fact in a recent press release from Six Apart entitled Weblogging Software Leader Six Apart Acquires LiveJournal it is stated

Six Apart, makers of the highly acclaimed Movable Type publishing platform and TypePad personal weblogging service, today announced that it has acquired Danga Interactive, Inc., the operators of the popular service LiveJournal, for an undisclosed amount of stock and cash. With the acquisition, Six Apart solidifies its position as the industry's recognized leader in weblogging software across all markets, and LiveJournal can continue its rapid growth trajectory under Six Apart's umbrella. As of today, the combined user base of both companies exceeds 6.5 million users, with thousands more added daily.

The 6.5 million user number above is calculated from about 1 million TypePad accounts and about 5.5 million LiveJournal accounts. Of course, anyone with a web browser can go to the LiveJournal statistics page where it states they currently have about 2.5 million active blogs out of 5.7 million blogs. In fact, according to the statistics page over 1.5 million blogs have never been updated. This means over 20% of the blogs on LiveJournal didn't get past the first post.

This isn't meant to single out LiveJournal especially since according to the Perseus blog survey from a few years ago, LiveJournal's retention numbers are the best in the industry. In fact, the Perseus blog survey estimates that about 66% of blogs are eventually abandoned. This is something that everyone on the MSN Spaces team is aware of and which Bill Gates himself alluded to in his interview that got Ed Brill so upset. Specifically Bill Gates said

Well, actually I think the biggest blogging statistic I know, which really blew me away, is that we've got close to a million people setting up blogs (Web logs) with the Spaces capability that's connected up to Messenger. Now, with blogs, you always have to be careful. The decay rate of "I started and I stopped" or "I started and nobody visited" is fairly high, but as RSS (Really Simple Syndication) has gotten more sophisticated and value-added search capabilities have come along, this thing is really maturing.

Given that caveat I'm not really sure what more Ed Brill expects. Given that MSN Spaces has been in beta for less than 2 months we don't have meaningful 'active' user numbers yet, although from our daily stats it seems our ratio of active users is at least in line with the rest of the industry.

One of the unfortunate things about working for Microsoft is that no matter what we do we tend to get attacked. Eventually one learns to pick out the useful feedback from the noise of the 'I hate Microsoft' crowd.


 

Categories: MSN

I recently found a complaint from Danny Glasser, a dev manager on my team, about how Netflix's RSS feeds appear in RSS Bandit in his post Netflix sucks less?. He wrote

Netflix has recently created RSS feeds for subscribers' current queues and recent rental activity, so in theory I can exchange the URLs with friends and view their queues in an RSS aggregator.  I've been playing with this a bit and unfortunately it doesn't render particularly well in RSS Bandit.  It doesn't sort nicely and old entries aren't expired properly.  I'm not sure if this is true with other aggregators but I suppose I could ask Dare

I decided to take a look at the various Netflix RSS feeds and the problem became instantly obvious. Below is an excerpted version of the Netflix Top 100 RSS feed which I'll use to discuss the various problems with syndicating lists in RSS.

<rss version="2.0">
  <channel>
    <title>Netflix Top 100</title>
    <ttl>20160</ttl>
    <link>http://www.netflix.com/Top100</link>
    <description>Top 100 Netflix movies, published every 2 weeks.</description>
    <language>en-us</language>
    <item>
      <title>1- Mystic River</title>
      <link>http://www.netflix.com/MovieDisplay?movieid=60031232&amp;trkid=134852</link>
      <description><![CDATA[Three childhood friends, Sean (Kevin Bacon), Dave (Tim Robbins) and Jimmy (Sean Penn) are reunited in Boston 25 years later when they are linked together in the murder investigation of Jimmy's daughter. ]]></description>
    </item>
    <item>
      <title>2- The Last Samurai</title>
      <link>http://www.netflix.com/MovieDisplay?movieid=60031274&amp;trkid=134852</link>
      <description><![CDATA[Tom Cruise stars as Captain Nathan Algren in this epic movie set in 1870s Japan. ]]></description>
    </item>
    <item>
      <title>3- Something's Gotta Give</title>
      <link>http://www.netflix.com/MovieDisplay?movieid=60031278&amp;trkid=134852</link>
      <description><![CDATA[Sixty and still sexy, Harry (Jack Nicholson) is having the time of his life, wining, dining and bedding women half his age.]]></description>
    </item>
  </channel>
</rss>

There are several problems with the above feed. The first is a combination of two things: no mechanism (such as GUIDs) is provided for uniquely identifying items in the feed, and the items carry no dates. The problem manifests itself when the top 100 list is refreshed two weeks from now. Using the above feed as an example, imagine that a new entry becomes number 1, thus moving Mystic River and The Last Samurai one notch down. Now several things break at once.

The first problem is that the user has no way of grouping together top 100 lists for each week, so I can't have last month's top 100 list and this week's top 100 list in my aggregator in any sort of meaningful way. Even if there were dates, the fact that there are no GUIDs means that the aggregator will likely use the <link> element to uniquely identify an item when determining whether the user has seen it or not. This means that only the new entrant to the list will be marked as unread, while movies that were already in the list and have been seen remain unhighlighted.

I can see arguments for both viewpoints. On the one hand, Netflix may expect that the aggregator should always have 100 items in it, with only the new entrants in the list being marked as unread and the positions of movies changing from week to week. On the other hand, a user may want to keep the top 100 feeds for each time period in their aggregator so they can see a timeline of the movie rankings. In that case, every two weeks there should be 100 new items waiting for the user. Unfortunately neither of these happens in RSS Bandit or a number of other aggregators with Netflix's current implementation. Instead old entries in the feed and new entries show up munged together, with no separation based on date, so users can't group by date.

Another problem is that the link to the movie's page is the only thing used to uniquely identify the item. So when the feed is fetched and the position of a movie changes (i.e. the title changes), instead of creating a new item in the aggregator, RSS Bandit assumes it is a post whose title has been changed and simply updates the item in place. This makes sense in 99% of aggregator scenarios, where changing the title usually means a typo was fixed in a blog post. However in the Netflix case this means a movie will always show up with its most recent position in the top 100 list. BUT once the movie leaves the list (i.e. is dropped from the feed) the movie will remain at the last position seen in the feed within the aggregator.
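To make the failure concrete, here is a rough sketch (not RSS Bandit's actual code) of the identity heuristic most aggregators use when deciding whether an item has been seen before: use the <guid> if one is present, otherwise fall back to the <link>. Since the Netflix feed supplies no GUIDs and a movie's link never changes, an item whose title goes from "2- The Last Samurai" to "3- The Last Samurai" maps to the same key and is treated as an edit rather than a new item.

using System.Xml;

class ItemIdentity {
  // Returns the key used to decide whether an <item> is new or an update to an existing one.
  static string GetItemKey(XmlElement item) {
    XmlNode guid = item.SelectSingleNode("guid");
    if (guid != null && guid.InnerText.Length != 0)
      return guid.InnerText;   // explicit identity supplied by the publisher

    XmlNode link = item.SelectSingleNode("link");
    if (link != null)
      return link.InnerText;   // fallback; the Netflix feed always ends up here

    XmlNode title = item.SelectSingleNode("title");
    return (title != null) ? title.InnerText : string.Empty;
  }
}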

The second problem is that there is no way to tell the aggregator how to sort the list of movies. Sorting by title won't work because it would be an alphabetical sort, and ditto for using the description. Even if there were dates, using those for sorting wouldn't make much sense either. Ideally there would have to be some way for an item to specify its position relative to the other items that appear in the list with it at a given point in time. Again, this would require that dates be attached to the items in the feed.

There are a number of issues raised by the Netflix problem. One could look at the problem as an indication that there should be an item expiry mechanism in RSS, so the aggregator knows to dump the list every 2 weeks and refresh it with the new list. Others could argue that this could be solved by giving each item a unique ID independent of the movie and specifying its date as well as a sort position. This would allow the user to track changing lists over time even if the same item appears in the list multiple times.

I don't think I've seen anyone raise any of the various problems with the Netflix feeds online. This is surprising since I'd be hard pressed to imagine how any aggregator does the 'right' thing with these feeds. More importantly the Netflix feeds show a significant hole in RSS as well as syndication formats like Atom whose primary goal seems to be RSS feature parity.

I'm going to bring this up on the RSS-AggDev mailing list and see what the other aggregator developers think about this problem.


 

January 12, 2005
@ 02:28 PM

It's begun to spread around the blogosphere that MSN has added support for RSS to a couple more of its web offerings. Yesterday on the MSN Search weblog, Brady announced that there are now RSS Feeds for Search Results on the MSN Search beta site. The URL below returns an RSS feed containing the first 20 items for a search for 'rss bandit'.

http://beta.search.msn.com/results.aspx?q=rss+bandit&format=rss&count=20

Looking at the results returned using Rex Swain's HTTP Viewer, it seems the feed doesn't return Last-Modified or ETag HTTP headers. This means that every time an aggregator queries the feed it downloads the full XML document even if nothing has changed in the search results since the last query. So as not to waste bandwidth on the client side I'll probably specify that the MSN Search feeds should only be fetched once a day. One surprising thing is that sponsored links don't show up in the search results. I'd have expected that they would, given that they are often relevant to the search as well.
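For what it's worth, below is a minimal sketch of the conditional GET an aggregator would like to issue against this feed; the cached header values are hypothetical. Because the feed supplies neither Last-Modified nor ETag, there is nothing to echo back in If-Modified-Since or If-None-Match, so the server can never answer 304 Not Modified and every poll downloads the full document.

using System;
using System.Net;

class ConditionalFetch {
  static void Main() {
    string url = "http://beta.search.msn.com/results.aspx?q=rss+bandit&format=rss&count=20";
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);

    // Values an aggregator would have cached from a previous fetch (made up here).
    request.IfModifiedSince = new DateTime(2005, 1, 11);
    request.Headers.Add("If-None-Match", "\"some-cached-etag\"");

    try {
      using (HttpWebResponse response = (HttpWebResponse)request.GetResponse()) {
        Console.WriteLine("Feed changed; downloaded {0} bytes", response.ContentLength);
      }
    }
    catch (WebException ex) {
      HttpWebResponse response = ex.Response as HttpWebResponse;
      if (response != null && response.StatusCode == HttpStatusCode.NotModified)
        Console.WriteLine("304 Not Modified - nothing to download");
      else
        throw;
    }
  }
}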

This is a totally cool feature. The MSN Search folks are doing good things.


 

Categories: MSN

I've been playing around with the photo album in my MSN Space and have begun to get interested in online photo sharing. I've never been big on taking pictures. The last time I took pictures was on my vacation in Hawaii with the ex last year, but I didn't even get them after the breakup. Before that it was Freaknik in 1998. However after playing around with the MSN Spaces photo album I feel like sharing some pics other than RSS Bandit screenshots as part of my space. I'd definitely appreciate any tips from folks out there on purchasing a digital camera.

Once I was done geeking out about the MSN Spaces photo album I decided to check out what other hosted blogging services provided with regards to photo sharing. This is where I found out about Hello and BloggerBot. For those who aren't aware of it, Hello is an application for sharing images with people in real-time. A sort of instant messaging client with a photo slideshow feature. The BloggerBot feature of Hello allows you to post images to your blog hosted on Blogger.com from the Hello application. This integration makes sense since the company that created Hello was recently purchased by Google.

During my next daily rap session with Mike about Spaces, I brought up the photo sharing features of Hello and its integration with Blogger. Mike pointed out that a similar user experience was already possible using MSN. This is where I first learned about MSN Premium. The MSN Premium service is an MSN offering that provides a bunch of value adds to browsing the Web for under $10 a month. It includes a firewall, anti-virus software, Encarta, Microsoft Money, Outlook plugins and a number of photo management features. I tried the service yesterday and so far I like it. The MSN Outlook Connector which allows you to access Hotmail from Outlook is quite nice.

The photo sharing features of MSN Premium come in a couple of flavors. The first part is MSN Messenger Photo Swap, which enables you to initiate a photo sharing session with any MSN Messenger user. This seems to provide an equivalent experience to the real-time photo sharing features in Hello. Here is a screenshot of Mike Torres using Messenger Photo Swap to show me his vacation pics. The second major photo sharing feature of MSN Premium is called Photo Email. With Photo Email you can send photo slideshows to people as regular HTML email. The email slide shows are a compressed version of a slide show of the full-resolution images hosted on an automatically generated Web site, which is linked to from the email. People can then view the full slide show and either download the images for printing or order prints online. Here is a screenshot of a Photo Email I sent to myself of a modified version of RSS Bandit.

The ActiveX slideshow control used to host the images on the automatically generated website is extremely similar to the one used by MSN Spaces. It shouldn't be too hard to send some sort of MSN Spaces photo email to invite people to view the photo album on your Space. I should remember to add this as a feature request on the MSN Spaces Wiki.

Then there is still the question of how one sends a picture to their MSN Spaces blog as a blog posting, the same way Hello allows one to do so using BloggerBot. The answer is the email posting feature of MSN Spaces. Simply enable Mobile Publishing on the Mobile Settings tab of the Settings page of your MSN Space. Enter an email address (e.g. your mobile phone email if you are a moblogger) and turn on “publish immediately.” Enter a secret word. You can now blog directly to that email address (e.g. carnage4life.blogthis@spaces.msn.com) with a photo attachment and/or text. The subject of the e-mail becomes the subject of the post.


 

Categories: MSN

The folks behind FeedBurner have a blog post about RSS Market Share which discusses the distribution of aggregators they see polling their most popular feeds. They write

...RSS Client market is not yet consolidating, it's expanding. There were 409 different clients polling the top 800 FeedBurner feeds in September and now there are 719 different clients. FeedBurner actively catalogs the behavior and specifications for hundreds of these user-agents...

...This list is heavily skewed toward aggregators used on blog feeds, since most of our feeds are from blogs. This list might read quite differently for more traditional media feeds such as Reuters, NYT, CNET, etc. On a similar theme, individual publishers will notice that the overall market share may be wildly different from their own feed's market share. Simply removing our top 10 feeds from this data results in a wildly different market share list, possibly because of clients that ship with one or more of our top 10 feeds as a default. All of this pointing to the caution not to read too much into this single data point. We could make qualifications about everything on the list. Your mileage may vary, caveat emptor, mea culpa, c'est la vie..

Top 20 RSS clients across FeedBurner's most highly subscribed 800 feeds as of January 6, 2005

Aggregator Name (Market Share Percentage)
1. Bloglines (32.86%)
2. NetNewsWire (16.95%)
3. Firefox Live Bookmarks (7.78%)
4. Pluck (7.20%)
5. NewsGator Online (4.45%)
6. (not identified) (4.07%)*
7. FeedDemon (3.83%)
8. SharpReader (3.27%)
9. My Yahoo (2.58%)
10. iPodder (2.42%)
11. NewsGator (2.23%)
12. Thunderbird (2.13%)
13. RSS Bandit (1.12%)
14. NewsFire (1.05%)
15. iPodderX (1.02%)
16. Sage (0.71%)
17. FeedReader (0.67%)
18. RssReader (0.54%)
19. LiveJournal (0.46%)
20. Opera RSS Reader (0.45%)

Although interesting, their numbers probably aren't reflective of the reality of the RSS aggregator market share. LiveJournal has over 5 million accounts with at least half of them being active users. I suspect there are far more people using their LiveJournal friends page as an RSS aggregator than the entire top 10 list combined.

However this does bring up a question I've been considering for a while: what should be the default feeds in an RSS Bandit installation? Besides the various RSS Bandit feeds, we also subscribe the user to the RSS feeds for Microsoft Watch, Yahoo! News, BBC, Rolling Stone, Slashdot, Boing Boing and InstaPundit. I've been considering removing a few of these feeds, such as InstaPundit, since I don't read it regularly and the one or two times I have read it I didn't think much of it. I've also considered adding more blogs I read, such as Robert Scoble's or Dave Winer's.

Given that RSS Bandit is moderately popular with about 50,000 downloads of the most recent version and about 130,000 total downloads over the past year I'm sure we'd be contributing a decent amount of readership to whatever feeds we install as default. Therefore I'd like some ideas from our users on what you think the best mix of feeds should be for folks installing RSS Bandit for the first time which in certain cases may be their first RSS aggregator.


 

Categories: RSS Bandit

Doc Searls has a post entitled Resistance isn't futile where he writes

Russell Beattie says "it's game over for a lot of Microsoft competitors." I don't buy it, and explained why in a comment that's still pending moderation. (When the link's up, I'll put it here.)

Meanwhile, I agree with what Phillip Swann (who is to TVs what Russell is to mobile devices) says about efforts by Microsoft and others to turn the TV into a breed of PC:

...it's not going to happen, no matter how much money is spent in the effort. Americans believe the TV is for entertainment and the PC is for work. New TV features that enhance the viewing experience, such as Digital Video Recorders, High-Definition TV, Video on Demand, Internet TV (the kind that streams Net-based video to the television, expanding programming choices) and some Interactive TV features (and, yes, just some), will succeed. Companies that focus on those features will also succeed.

But the effort to force viewers to perform PC tasks on the TV will crash faster than a new edition of a buggy PC software. I realize that doesn't speak to all of Russell's points, or to more than a fraction of Microsoft's agenda in the consumer electronics world; but it makes a critical distinction (which I boldfaced, above) that's extremely important, and hard to see when you're coming from the PC world.

It seems Doc Searls is ignoring the truth around him. Millions of people [including myself] watch TV by interacting with a PC via TiVo and other PVRs. I haven't met anyone who, after using a PVR, wants to go back to regular TV. As is common with most Microsoft detractors, Doc Searls is confusing the problems with v1/v2 of a product with the long-term vision for the product. People used to say the same things about Windows CE and PalmOS, but now Microsoft has taken the lead in the handheld market.

The current crop of Windows Media Centers has its issues, many of which have even been pointed out by Microsoft employees. However it is a big leap to translate that into the claim that people don't want more sophistication out of their television-watching experience. TiVo has already taught us that they do. The question is who will be providing the best experience possible when the market matures?


 

Categories: Technology

Recently Ted Leung posted a blog entry entitled Linguistic futures where he summarized a number of recent discussions in the blogosphere about potential new features for the current crop of popular programming languages. He wrote

1. Metaprogramming facilities

Ian Bicking and Bill Clementson were the primary sources on this particular discussion. Ian takes up the simplicity argument, which is that metaprogramming is hard and should be limited -- of course, this gets you things like Python 2.4 decorators, which some people love, and some people hate. Bill Mill hates decorators so much that he wrote the redecorator, a tool for replacing decorators with their "bodies". 

2. Concurrency

Tim Bray and Herb Sutter provided the initial spark here. The basic theme is that the processor vendors are finding it really hard to keep the clock speed increases going (that's actually been a trend for all of 2004), so they're going to start putting more cores on a chip... But the big take away for software is that uniprocessors are going to get better a lot more slowly than we are used to. So that means that uniprocessor efficiency matters again, and the finding concurrency in your program is also going to be important. This impacts the design of programming languages as well as the degree of skill required to really get performance out of the machine...

Once that basic theme went out, then people started digging up relevant information. Patrick Logan produced information on Erlang, Mozart, ACE, Doug Lea, and more. Brian McCallister wrote about futures and then discovered that they are already in Java 5.

It seems to me that Java has the best support for threaded programming. The dynamic languages seem to be behind on this, which must change if these predictions hold up. 

3. Optional type checking in Python

Guido van Rossum did a pair of posts on this topic. The second post is the scariest because he starts talking about generic types in Python, and after seeing the horror that is Java and C# generics, it doesn't leave me with warm fuzzies.

Patrick Logan, PJE, and Oliver Steele had worthwhile commentary on the whole mess. Oliver did a good job of breaking out all the issues, and he worked for quite a while on Dylan which had optional type declarations. PJE seems to want types in order to do interfaces and interface adaptation, and Patrick's position seems to be that optional type declarations were an artifact of the technology, but now we have type inference so we should use that instead. 

Coincidentally, I recently finished writing an article about Cω, which integrates both optional typing via type inference and concurrency into C#. My article indirectly discusses the existence of type inference in Cω but doesn't go into much detail, and I don't mention the concurrency extensions at all, primarily due to space constraints. I'll give a couple of examples of both features in this blog post.

Type inference in Cω allows one to write code such as

public static void Main(){
  x = 5; 
  Console.WriteLine(x.GetType()); //prints "System.Int32"
}

This feature is extremely beneficial when writing queries using the SQL-based operators in Cω. Type inference allows one to turn the following Cω code

public static void Main(){

  struct{SqlString ContactName; SqlString Phone;} row;
  
  struct{SqlString ContactName; SqlString Phone;}* rows = select
            ContactName, Phone from DB.Customers;
 
  foreach( row in rows) {
      Console.WriteLine("{0}'s phone number is {1}", row.ContactName, row.PhoneNumber);
   }
}

to

public static void Main(){

  foreach( row in select ContactName, Phone from DB.Customers ) {
      Console.WriteLine("{0}'s phone number is {1}", row.ContactName, row.Phone);
   }
}

In the latter code fragment the type of the row variable is inferred so it doesn't have to be declared. The variable seems dynamically typed but really isn't, since the type checking is still done at compile time. This seems to offer the best of both worlds: the programmer can write code as if it were dynamically typed yet is warned of type errors at compile time when a type mismatch occurs.
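
Here's a rough sketch of the kind of mismatch the compiler would catch even though no types are declared. This is my own illustration rather than an example from the Cω documentation:

public static void Main(){
  x = 5;          // x is inferred to be System.Int32
  x = "five";     // error caught at compile time: a string cannot be assigned
                  // to a variable whose inferred type is int
}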

As for concurrent programming, many C# developers have embraced the power of using delegates for asynchronous operations. This is one place where I think C# and the .NET Framework did a much better job than the Java language and the JVM. If Ted likes what exists in the Java world, I bet he'll be blown away by the concurrent programming techniques in C# and .NET; a quick sketch of the asynchronous delegate pattern is shown below.
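
The following is a minimal sketch of that pattern; the Worker delegate and Square method are made-up names used purely for illustration:

delegate int Worker(int n);

static int Square(int n) { return n * n; }

public static void Main(){
  Worker w = new Worker(Square);

  // BeginInvoke runs the call on a thread pool thread and returns immediately
  IAsyncResult ar = w.BeginInvoke(5, null, null);

  // ... do other work here while the call is in flight ...

  int result = w.EndInvoke(ar);   // blocks until the asynchronous call completes
  Console.WriteLine(result);      // prints 25
}

Cω takes the support for asynchronous programming further by adding mechanisms for tying methods together in the same way a delegate and its callbacks are tied together. Take the following class definition as an example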

public class Buffer {
   public async Put(string s);
   public string Get() & Put(string s) { return s; }
}

In the Buffer class a call to the Get() method blocks until a corresponding call to the Put() method has been made. Once this happens the parameters to the Put() method are treated as local variable declarations in the body of the Get() method and then the code block runs. A call to the Put() method, on the other hand, returns immediately while its arguments are queued as inputs to a matching call to the Get() method. This assumes that each Put() call is eventually matched by a Get() call and vice versa.
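
As a rough sketch of how this might be used (my own illustration, not an example from the Cω documentation):

public static void Main(){
  Buffer buf = new Buffer();

  buf.Put("hello");               // returns immediately; "hello" is queued
  Console.WriteLine(buf.Get());   // matches the queued Put() and prints "hello"

  // had Get() been called first, it would have blocked until another thread called Put()
}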

There are a lot more complicated examples in the documentation available on the Cω website.


 

Categories: Technology

An RSS Bandit user has created the RSS Bandit Flagged Item Merge Utility. The description of its usage states

If you use RSSBandit as your feedreader, you've probably used the Flag Item feature to store interesting articles for future access. However, if you use RSSBandit on multiple computers, your flagged item lists aren't synchronized. While you could upload your data to a central location, I find it more convenient to use a USB Memory Stick to keep these flagged item lists. I wrote this utility to synchronize from various lists to your main list. Get it here.

Usage: Select your main flagitems.xml file as the RSS Bandit Flag File and the file you want to merge into it as the Import File and hit Start. The application will display the items that it imported successfully or failed.

Although it is quite cool to see people writing tools to work with RSS Bandit, the fact is that you can already get this functionality out of RSS Bandit without resorting to external tools. One of the options in the Remote Storage tab of the Tools->Options dialog is 'File Share'. Although this option states that it only works on network shares, it works on local drives as well. This means you can select an external drive such as "H:\", or whatever drive letter the USB keychain maps to, and then synchronize with that.

That way, all you need to synchronize RSS Bandit between two machines is your USB keychain. I guess this means we should probably update the text of that dialog to explain that it works with any drive, not just network shares.


 

Categories: RSS Bandit

C|Net News has an interview with Bill Gates entitled Gates taking a seat in your den. One of his most interesting answers from my perspective was his take on Microsoft and blogging. The question and his answer are excerpted below

One of the big phenomena of the year has been Web logging. Has the growth surprised you?

Well, actually I think the biggest blogging statistic I know, which really blew me away, is that we've got close to a million people setting up blogs with the Spaces capability that's connected up to Messenger.

Now, with blogs, you always have to be careful. The decay rate of "I started and I stopped" or "I started and nobody visited" is fairly high, but as RSS (Really Simple Syndication) has gotten more sophisticated and value-added search capabilities have come along, this thing is really maturing.

And we've done some things in Japan and Korea that are unique blog experiments. The Spaces thing is a worldwide effort. It's a great phenomena, and it's sort of built on e-mail, and so we need to integrate more blogging capability into the e-mail world--and as we do the next generation of Outlook, you'll see that. We need to integrate it more into our SharePoint, which is our collaboration Office platform, and then, as I discussed, MSN is embracing it so that instead of thinking about, "OK, I go to one community to do photos, one community to do social networking, one community to do this," we say, "Hey," off of Messenger, which has got your buddy list already, then, "Let's let you do the photos and the social networking and everything--but starting in an integrated way off of Messenger."

I have also been quite impressed by our signup rate; it has totally exceeded expectations. As BillG says above, we at MSN have been thinking a lot about the problems facing the existing social software landscape and how we can create the best place on the Web for people to communicate, share their experiences and interact with friends, family and strangers who may one day become friends or family. You guys haven't seen anything yet.

It's going to be a fun ride.
 

Categories: Mindless Link Propagation | MSN

I was quite surprised to find out that my blog was mentioned in the Wall Street Journal. For those who don't have a WSJ subscription, below is an excerpt of the story (***fair use***)

The rivalry between Google Inc. and Microsoft Corp. has been heating up since the Redmond, Wash., software behemoth last year unveiled its own search-engine technology. But tension between the two recently flared amid an online scrap about Google's use of open-source software.

The scuffle started with a Dec. 29 Web log post by Krzysztof Kowalczyk entitled "Google -- we take it all, give nothing back," in which the former Microsoft employee accused Google of freeloading. Mr. Kowalczyk, who now works at PalmOne Inc., cited a blog post by Google executive -- and former Microsoft staffer himself -- Adam Bosworth in which Mr. Bosworth called for open-source programmers to build better database software that Google and other big companies could use.

Mr. Kowalczyk wrote in his blog that Google gets an estimated tens of millions of dollars worth of software for free thanks to open-source developers, who release their programs without charge. And he alleged that Google gives little back to open source in return: "Microsoft creates more open-source code than Google." Microsoft staffer Dare Obasanjo excerpted portions of Mr. Kowalczyk's post on his personal blog and also took issue with at least one element of Mr. Bosworth's blogged response.

Mr. Bosworth fired back, posting in the comments section of Mr. Obasanjo's blog. "For Microsoft to condemn those of us who benefit from Open Source is rich," he wrote.

Spokesmen for Google and Microsoft declined to comment on the exchange. The Microsoft spokesman said the company "treats blogs as individuals expressing their independent opinion."

For those who missed the discussion, you can find the original posts on my work blog under the titles Google and Open Source and More on Google and Open Source.


 

January 5, 2005
@ 03:54 PM

I finished my first article since switching jobs this weekend. It's tentatively titled Integrating XML into Popular Programming Languages: An Overview of Cω and should show up on both XML.com and my Extreme XML column on MSDN at the end of the month. I had initially planned to do the overview of Cω (C-Omega) for MSDN and a combined article about ECMAScript for XML (E4X) & Cω for XML.com, but it turned out that just an article on Cω was already fairly long. My plan is to follow up with an E4X piece in a couple of months. For the geeks in the audience who are a little curious as to exactly what the heck Cω is, here's an introduction to one of the sections of the article to whet your appetite.

The Cω Type System

The goal of the Cω type system is to bridge the gap between relational, object and XML data access by creating a type system that is a combination of all three data models. Instead of adding built-in XML or relational types to the C# language, the approach favored by the Cω type system has been to make certain general changes to the C# type system that render it more conducive to programming against both structured relational data and semi-structured XML data.

A number of the changes to C# made in Cω make it more conducive to programming against strongly typed XML, specifically XML constrained using W3C XML Schema. Several concepts from XML and XML Schema have analogous features in Cω. Concepts such as document order, the distinction between elements and attributes, having multiple fields with the same name but different values, and content models that specify a choice of types for a given field all exist in Cω. A number of these concepts are handled by traditional Object<->XML mapping technologies, but often awkwardly. Cω aims to make programming against strongly typed XML as natural as programming against arrays or strings in traditional programming languages.
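
To make that a little more concrete, here is a rough, from-memory sketch of the structural types Cω supports. This is my own illustration rather than an excerpt from the article, the field names are invented, and the syntax should be treated as approximate:

public class Contact {
   struct{
      string name;            // an ordered field, roughly analogous to an element
      string* emailAddress;   // a repeated field -- zero or more email addresses
      string? nickname;       // an optional field
      choice{ int extensionNumber; string fullNumber; } phone;   // a field whose content is one of several types
   } details;
}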

I got a lot of good feedback on the article from a couple of excellent reviewers, including the father of X#/Xen himself, Erik Meijer. For those not in the know, X#/Xen was merged with Polyphonic C# to create Cω. Almost all of my article focuses on the aspects of Cω inherited from X#/Xen.


 

Categories: XML

January 2, 2005
@ 04:20 AM

In response to Krzysztof Kowalczyk's post entitled Google - we take it all, give nothing back and some of the responses to that post, Adam Bosworth has fired off a missive entitled We all stand on the shoulders of giants. He writes

Recently I pointed out that databases aren't evolving ideally for the needs of modern services and customers who which must support change and massive scale without downtime. This post was savaged by an odd alliance; the shrill invective of the Microsoft apparachiks perhaps sensing an opportunity to take the focus away from Ballmer's remorseless attack on all that is not Microsoft (but most especially on Open Source) and certain Open Source denizens themselves who see fit to attack Google for not "giving back" enough apparently unaware that all software benefits in almost infinite measure from that which comes before. As usual the extremes find common ground in a position that ignores common sense, reason, and civility.
...
It would seem that these cacophonous critics, yammering about giving back and sweepingly ignoring the 100's of billions of times people use and appreciate what Google gives them for free every day from Search to Scholar to Blogger to gMail to Picasa, do not understand this basic fact.

It seems Adam Bosworth's position is that Google gives back to the Open Source community by not charging for access to Google or Blogger. This seems to imply that advertising-supported services like MSN Search, Hotmail and MSN Spaces are some sort of charity as opposed to the businesses they actually are.

Mr. Bosworth's statements seem to make a number of the observations made by Krzysztof Kowalczyk in his recent post Google - comments on comments more interesting. Krzysztof wrote

More importantly, Chris DiBona, formerly a Slashdot editor and contributor to a book on open source, now a Google employee, calls me ignorant and lazy for not knowing about Google’s open source contributions.

Maybe I am. However:

  • I do follow my share of open source projects (a bad addiction, really) and I’ve never seen a Google employee participating in them. Which, of course, proves nothing but one data point is better than zero.
  • I did ask on my weblog for pointers to Google’s contributions. Despite temporary popularity of my blog, no-one sent me any.
  • I’ve read all the weblog posts commenting on my piece and no-one else in blogosphere was any less ignorant or lazy.

All that leads me to believe that Google’s contribution, if not a mythical creature, is not that easy to find.

Chris promises a list of Google’s contributions in “coming months". I would rather have it now. The good thing about promising to do something months from today is that you don’t have to do it. You can just rely on the fact that everybody will forget that you’ve made such promise.

I think no additional commentary is necessary. Krzysztof's post and Adam's response speak for themselves.