We are now feature complete for the next release of RSS Bandit. Interested users can download it at RssBandit.1.3.0.18.Wolverine.Beta.zip. All bugs specific to the previous alpha release have been fixed. The next steps for us are to finish working on documentation and translations, and to fix any bugs reported during the beta, with the intention of shipping a final release in the next two weeks.

Check out a screenshot of the Wolverine Beta, which shows some of the new features such as newspaper styles and the additional columns in the list view. The major new features are listed below, but there are a couple of minor ones, such as proxy exclusion lists and better handling of SSL certificate issues, which aren't listed. There will be a comprehensive list of bug fixes and new features in the announcement for the final release.

NEW FEATURES (compared to v1.2.0.117)

  • Newspaper styles: Ability to view all unread posts in feeds and categories or all posts in a search folder in a Newspaper view. This view uses templates that are in the same format as those used by FeedDemon so one can use RSS Bandit newspaper styles in FeedDemon and vice versa. One can choose to specify a particular stylesheet for a given feed or category. For example, one could use the slashdot.fdxsl stylesheet for viewing Slashdot, headlines.fdxsl for viewing news sites in the News category and outlook-express.fdxsl for viewing blog posts.  

  • Column Chooser: Ability to specify which columns are shown in the list view from a choice of Headline, Topic, Date, Author, Comment Count, Enclosures, Feed Title and Flag. One can choose to specify a column layout for a particular feed or category. For example, one could use the layout {Headline, Author, Topic, Date, Comment Count} for a site like Slashdot but use one like {Headline, Topic, Date} for a traditional blog that doesn't have comments.

  • Category Properties: It is now possible to specify certain properties for all feeds within a category such as how often they should be updated or how long posts should be kept before being removed. 

  • Identities: One can create multiple personas with associated information (homepage, name, email address, signature, etc) for use when posting comments to weblogs that support the CommentAPI.

  • del.icio.us Integration: Users who have accounts on the del.icio.us service can upload links to items of interest directly from RSS Bandit.  

  • Skim Mode: Added an option to 'Mark All Items As Read on Exiting a Feed'.

  • Search Folder Improvements: Made the following additions to the context menu for search folders: 'New Search Folder', 'Refresh Search', 'Mark All Items As Read' and 'Rename Search Folder'. Also, deleting a search folder now prompts the user to prevent accidental deletion.

  • Item Deletion: News items can be deleted by either using the [Del] key or using the context menu. Deleted items go to the "Deleted Items" special folder and can be deleted permanently by emptying the folder or restored to the original feed at a later date.

  • UI Improvements: Tabbed browsers now use a multicolored border reminiscent of Microsoft OneNote.

POSTPONED FEATURES (will be in the NightCrawler release)

  • NNTP Support
  • Synchronizing state should happen automatically on startup/shutdown
  • Applying search filters to the list view 
  • Provide a way to export feeds from a category

 

Categories: RSS Bandit

When I used to work on the XML team at Microsoft, there were a number of people I interacted with who were so smart that I felt I learned something new every time I walked into their office. These folks include

  • Michael Brundage - social software geek before it was a hip buzzword, XML query engine expert and now working on the next generation of XBox

  • Joshua Allen - semantic web and RDF enthusiast; if not for him I'd dismiss the semantic web as a pipe dream evangelized by a bunch of theoreticians who wouldn't know real code if it jumped them in the parking lot and defecated on their shoes. He now works at MSN, but not on anything related to what I work on

  • Erik Meijer - programming language god and the leading mind behind X#, Xen and Cω. He is a co-inventor on all my patent applications, most of which started off with me coming into his office to geek out about something I was going to blog about

  • Derek Denny-Brown - XML expert from back when Tim Bray and co. were still trying to figure out what to name it, and one heckuva cool cat

Anyway, that was a bit of a digression before posting the link mentioned in the title of this post. Michael Brundage has an essay entitled Working at Microsoft where he provides his opinions on the good, the bad, and the in-between of working at Microsoft. One key insight is that Microsoft tends to have good upper management and poor middle management. This insight strikes very close to home, but I know better than to give examples of the latter in a public blog post. Rest assured it is very true, and its effects have cost the company millions, if not billions, of dollars.


 

Categories: Life in the B0rg Cube

One of my biggest assumptions about software development was shattered when I started working on the XML team at Microsoft: the assumption that standards bodies know what they are doing and produce specifications that are indisputable. I've since come to realize that the problems of design by committee affect illustrious names such as the W3C and IETF just like everyone else. These problems become even more pernicious when trying to combine technologies defined in multiple specifications to produce a coherent end-to-end application.

An example of the problems caused by contradictions in core specifications of the World Wide Web is summarized in Mark Pilgrim's article, XML on the Web Has Failed. The issue raised in his article is that determining the encoding to use when processing an XML document retrieved off the Web via HTTP, such as an RSS feed, is defined in at least three specifications which somewhat contradict each other: XML 1.0, HTTP 1.0/1.1 and RFC 3023. The bottom line is that most XML processors, including those produced by Microsoft, ignore one or more of these specifications. In fact, if applications suddenly started following all these specifications to the letter, a large number of the XML documents on the Web would be considered invalid. In Mark Pilgrim's article, 40% of 5,000 RSS feeds chosen at random would be considered invalid even though they'd work in almost all RSS aggregators and be considered well-formed by most XML parsers, including the System.Xml.XmlTextReader class in the .NET Framework and MSXML.
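To make the contradiction concrete, the precedence rules from those specifications can be sketched in a few lines of Python. The function name is mine, and this glosses over BOM sniffing and other details of XML 1.0 Appendix F:

```python
import re

def detect_encoding(content_type: str, body: bytes) -> str:
    """Sketch of the character-encoding precedence rules for an XML
    document retrieved over HTTP, per RFC 3023 and XML 1.0."""
    # 1. A charset parameter in the HTTP Content-Type header wins.
    m = re.search(r'charset\s*=\s*"?([\w.-]+)"?', content_type, re.I)
    if m:
        return m.group(1).lower()
    # 2. For text/xml with no charset, RFC 3023 mandates us-ascii --
    #    even if the XML declaration says otherwise. This is the rule
    #    most processors ignore in practice.
    if content_type.split(";")[0].strip().lower() == "text/xml":
        return "us-ascii"
    # 3. Otherwise fall back to the encoding pseudo-attribute in the
    #    XML declaration.
    m = re.search(rb'^<\?xml[^>]*encoding\s*=\s*["\']([\w.-]+)["\']', body)
    if m:
        return m.group(1).decode("ascii").lower()
    # 4. Default per XML 1.0 when nothing else is specified.
    return "utf-8"
```

Rule 2 is the one that, applied strictly, invalidates a huge swath of real-world feeds that every aggregator happily consumes, which is exactly Pilgrim's point.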

The newest example of XML specifications that should work together but instead become a case of putting square pegs in round holes is Daniel Cazzulino's article, W3C XML Schema and XInclude: impossible to use together???, which points out:

The problem stems from the fact that XInclude (as per the spec) adds the xml:base attribute to included elements to signal their origin, and the same can potentially happen with xml:lang. Now, the W3C XML Schema spec says:

3.4.4 Complex Type Definition Validation Rules

Validation Rule: Element Locally Valid (Complex Type)
...

3 For each attribute information item in the element information item's [attributes] excepting those whose [namespace name] is identical to http://www.w3.org/2001/XMLSchema-instance and whose [local name] is one of type, nil, schemaLocation or noNamespaceSchemaLocation, the appropriate case among the following must be true:

And then goes on to detail that everything else needs to be declared explicitly in your schema, including xml:lang and xml:base, therefore :S:S:S.

So, either you modify all your schemas so that each and every element includes those attributes (either by inheriting from a base type or using an attribute group reference), or your validation is bound to fail if someone decides to include something. Note that even if you could modify all your schemas, sometimes it means you will also have to modify their semantics, as a simple-typed element which you may have (with the type inheriting from xs:string for example) now has to become a complex type with a simple content model only to accommodate the attributes. Ouch!!! And what's worse, if you're generating your API from the schema using tools such as xsd.exe or the much better XsdCodeGen custom tool, the new API will look very different, and you may have to make substantial changes to your application code.

This is an important issue that should be solved in .NET v2, or XInclude will be condemned to poor adoption in .NET. I don't know how other platforms will solve the W3C inconsistency, but I've logged this as a bug and I'm proposing that a property is added to the XmlReaderSettings class to specify that XML Core attributes should be ignored for validation, such as XmlReaderSettings.IgnoreXmlCoreAttributes = true. Note that there are a lot of Ignore* properties already so it would be quite natural.

I believe it is a significant bug in W3C XML Schema that it requires schema authors to declare up front in their schemas where xml:lang or xml:base may occur in their documents. Since I used to be the program manager for XML Schema technologies in the .NET Framework, this issue would have fallen on my plate. I spoke to Dave Remy, who took over my old responsibilities, and he's posted his opinion about the issue in his post XML Attributes and XML Schema. Based on the discussion in the comments to his post, it seems the members of my old team are torn on whether to go with a flag or try to push an errata through the W3C. My opinion is that they should do both. Pushing an errata through the W3C is a time-consuming process, and in the meantime using XInclude in combination with XML Schema is significantly crippled on the .NET Framework (or on any other platform that supports both technologies). Sometimes you have to do the right thing for customers instead of being ruled by the letter of standards organizations, especially when it is clear they have made a mistake.
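For reference, the schema-side workaround Cazzulino describes, declaring the XML core attributes once in an attribute group and referencing it from every complex type that might receive an xml:base from XInclude, looks roughly like this (a hypothetical sketch, not a schema from any shipping product):

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:xml="http://www.w3.org/XML/1998/namespace">
  <!-- Pull in the W3C-provided schema for the xml: namespace -->
  <xs:import namespace="http://www.w3.org/XML/1998/namespace"
             schemaLocation="http://www.w3.org/2001/xml.xsd"/>

  <!-- Declare the core attributes once... -->
  <xs:attributeGroup name="xmlCoreAttributes">
    <xs:attribute ref="xml:base"/>
    <xs:attribute ref="xml:lang"/>
  </xs:attributeGroup>

  <!-- ...then reference the group from each complex type -->
  <xs:complexType name="Section">
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
    </xs:sequence>
    <xs:attributeGroup ref="xmlCoreAttributes"/>
  </xs:complexType>
</xs:schema>
```

Even this only helps for elements that are already complex types; as the quote above notes, simple-typed elements have to be rewritten as complex types with simple content, which is where the real pain lies.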

Please vote for this bug on the MSDN Product Feedback Center.


 

Categories: XML

So I just read an interesting post about Technorati Tags on Shelley Powers's blog entitled Cheap Eats at the Semantic Web Café. As I read Shelley's post I kept feeling a strong sense of déjà vu which I couldn't shake. If you were using the Web in the 1990s, then the following descriptions of Technorati Tags taken from their homepage should be quite familiar.

What's a tag?

Think of a tag as a simple category name. People can categorize their posts, photos, and links with any tag that makes sense.

....

The rest of the Technorati Tag pages is made up of blog posts. And those come from you! Anyone with a blog can contribute to Technorati Tag pages. There are two ways to contribute:

  • If your blog software supports categories and RSS/Atom (like Movable Type, WordPress, TypePad, Blogware, Radio), just use the included category system and make sure you're publishing RSS/Atom and we'll automatically include your posts! Your categories will be read as tags.
  • If your blog software does not support those things, don't worry, you can still play. To add your post to a Technorati Tag page, all you have to do is "tag" your post by including a special link. Like so:
    <a href="http://technorati.com/tag/[tagname]" rel="tag">[tagname]</a>
    The [tagname] can be anything, but it should be descriptive. Please only use tags that are relevant to the post. No need to include the brackets. Just make sure to include rel="tag".

    Also, you don't have to link to Technorati. You can link to any URL that ends in something tag-like. These tag links would also be included on our Tag pages:
    <a href="http://apple.com/ipod" rel="tag">iPod</a>
    <a href="http://en.wikipedia.org/wiki/Gravity" rel="tag">Gravity</a>
    <a href="http://flickr.com/photos/tags/chihuahua" rel="tag">Chihuahua</a>
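Mechanically, indexing these tags amounts to scanning posts for links marked rel="tag", something like this sketch (the class name and the last-path-segment heuristic are mine, not Technorati's documented behavior):

```python
from html.parser import HTMLParser

class TagLinkExtractor(HTMLParser):
    """Collects (tag-name, href) pairs from links marked rel="tag",
    roughly the way a tag-indexing service might process a post."""
    def __init__(self):
        super().__init__()
        self.tags = []
        self._in_tag_link = False
        self._href = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("rel") == "tag":
            self._in_tag_link = True
            self._href = attrs.get("href", "")

    def handle_endtag(self, tag):
        if tag == "a" and self._in_tag_link:
            # Treat the last path segment of the href as the tag name.
            name = self._href.rstrip("/").rsplit("/", 1)[-1]
            self.tags.append((name, self._href))
            self._in_tag_link = False

parser = TagLinkExtractor()
parser.feed('<p>I love my <a href="http://apple.com/ipod" rel="tag">iPod</a> '
            'and my <a href="http://flickr.com/photos/tags/chihuahua" '
            'rel="tag">Chihuahua</a>.</p>')
```

Note how little machinery there is: the "tag" is whatever string the author chose to put at the end of a URL, with no vocabulary and no verification.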

If you weren't using the Web in the 1990s this may seem new and wonderful to you, but the fact is we've all seen this before. The so-called Technorati Tags are glorified HTML META tags with all their attendant problems. The reason all the arguments in Shelley's blog post seemed so familiar is that a number of them are the same ones Cory Doctorow made in his article Metacrap from so many years ago. All the problems with META tags are still valid today, the most important being that people lie, especially spammers, and that even well-intentioned people tend to categorize things incorrectly or according to their prejudices.

META tags simply couldn't scale to match the needs of the World Wide Web and are mostly ignored by search engines today. I wonder why people think that dressing up an old idea with new buzzwords (*cough* folksonomies *cough* social tagging *cough*) somehow turns a bad old idea into a good new one.


 

Categories: Technology

I noticed an eWeek article this morning titled Microsoft Won't Bundle Desktop Search with Windows which has had me scratching my head all morning. The article contains the following excerpts:

Microsoft Corp. has no immediate plans to integrate desktop search into its operating system, a company executive said at a conference here this weekend.
...
Indeed, while including desktop search in Windows might seem like a logical step to many, "there's no immediate plan to do that as far as I know," Kroese said. "That would have to be a Bill G. [Microsoft chairman and chief software architect Bill Gates] and the lawyers' decision."

I thought Windows already had desktop search. In fact, Jon Udell of InfoWorld recently provided a screencast in his blog post Where was desktop search when we needed it? which shows off the capabilities of the built-in Windows desktop search, which seems almost on par with the recent offerings from MSN, Google and the like.

Now I'm left wondering what the eWeek article means. Does it mean there aren't any plans to replace the annoying animated dog with the MSN butterfly? That Microsoft has access to a time machine and will go back and rip out desktop search from the operating system, including the annoying animated dog? Or is there some other obvious conclusion that can be drawn from the facts in the article that I have failed to grasp?

The technology press really disappoints me sometimes.


 

Categories: Technology

I had promised myself I wouldn't get involved in this debate, but it seems every day I see more and more people giving credence to bad ideas and misconceptions. The debate I am talking about is one-click subscription to RSS feeds, which Dave Winer recently called the Yahoo! problem. Specifically, Dave Winer wrote:

Yahoo sends emails to bloggers with RSS feeds saying, hey if you put this icon on your weblog you'll get more subscribers. It's true you will. Then Feedster says the same thing, and Bloglines, etc etc. Hey I did it too, back when Radio was pretty much the only show in town, you can see the icon to the right, if you click on it, it tries to open a page on your machine so you can subscribe to it. I could probably take the icon down by now, most Radio users probably are subscribed to Scripting News, since it is one of the defaults. But it's there for old time sake, for now.

Anyway, all those logos, when will it end? I can't imagine that Microsoft is far behind, and then someday soon CNN is going to figure out that they can have their own branded aggregator for their own users (call me if you want my help, I have some ideas about this) and then MSNBC will follow, and Fox, etc. Sheez even Best Buy and Circuit City will probably have a "Click here to subscribe to this in our aggregator" button before too long.

That's the problem.

Currently I have four such one-click subscription buttons on my homepage: Add to MyMSN, Add to MyYahoo, Add to Bloglines, and Add to Newsgator. Personally, I don't see this as a problem since I expect market forces and common sense to come into play here. But let's see what Dave Winer proposes as a solution.

Ask the leading vendors, for example, Bloglines, Yahoo, FeedDemon, Google, Microsoft, and publishers, AP, CNN, Reuters, NY Times, Boing Boing, etc to contribute financially to the project, and to agree to participate once it's up and running.
...
Hire Bryan Bell to design a really cool icon that says "Click here to subscribe to this site" without any brand names. The icon is linked to a server that has a confirmation dialog, adds a link to the user's OPML file, which is then available to the aggregator he or she uses. No trick here, the technology is tried and true. We did it in 2003 with feeds.scripting.com.

This 'solution' to the 'problem' boggled my mind. So every time someone wants to subscribe to an RSS feed it should go through a centralized server? The privacy implications of this alone are significant, let alone the creation of a central point of failure. In fact, Dave Winer recently posted a link that highlights a problem with centralized services related to RSS.

Besides my issues with the 'solution', I have quite a few problems with the so-called 'Yahoo problem' as defined. The first problem is that it only affects web-based aggregators that don't have a client application installed on the user's desktop. Desktop aggregators already have a de facto one-click subscription mechanism via the feed URI scheme, which is supported by at least a dozen of the most popular aggregators across multiple platforms, including the next version of Apple's web browser Safari. Recently both Brent Simmons (in his post feed: protocol) and Nick Bradbury (in his post Really Simple Subscription) reiterated this fact. In fact, all four services whose buttons I have on my personal blog can utilize the feed URI scheme as a subscription mechanism since all four have desktop applications they can piggyback this functionality onto: Yahoo and MSN have toolbars, Newsgator has its desktop aggregator and Bloglines has its notifier.
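The feed URI scheme needs no central server precisely because it piggybacks on the URL itself: a publisher's one-click subscription link is just the feed URL with a feed: prefix, which the OS hands to whatever aggregator registered the scheme. A minimal sketch (the helper name is hypothetical, and aggregators historically accepted more than one spelling of the scheme):

```python
def to_feed_uri(url: str) -> str:
    """Rewrite an http feed URL into the feed: URI scheme used for
    one-click subscription by desktop aggregators (sketch)."""
    if url.startswith("http://"):
        return "feed:" + url
    # Already a feed: URI (or some other scheme) -- leave it alone.
    return url
```

Clicking such a link launches the user's chosen aggregator with the feed URL; no registry, no OPML server, no privacy leak.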

The second problem I have with this is that it aims to stifle an expression of competition in the marketplace. If the Yahoo! aggregator becomes popular enough that every website puts a button for it beside a link to their RSS feed, then Yahoo! deserves kudos for spreading its reach, since it is unlikely this would happen without some value being provided by its service. I don't think Yahoo! should be attacked for its success, nor should that success be termed some sort of 'problem' for the RSS world to fix.

As I mentioned earlier in this post, I was going to ignore this re-re-reiteration of the one-click subscription debate until I saw rational folks like Tim Bray, in his post One-Click Subscription, actually discussing the centralized RSS subscription repository idea without giving it the incredulity it deserves. Of course, I also see some upturned eyebrows at this proposal from folks like Bill de hÓra in his post A registry for one click subscription, anyone?


 

Coincidentally, just as I finished reading a post by Tim Bray about Private Syndication, I got two bug reports filed almost simultaneously about RSS Bandit's support for secure RSS feeds. The first was SSL challenge for non-root certs, where the user complained that instead of prompting the user when there is a problem with an SSL certificate, as browsers do, we simply fail. One could argue that this is the right thing to do, especially when you have folks like Tim Bray suggesting that bank transactions and medical records should be flowing through RSS. However, given the precedent set by web browsers, we'll probably be changing our behavior. The second bug was that RSS Bandit doesn't support cookies. Many services use cookies to track authenticated users as well as to provide individual views tailored to a user. Although a number of folks consider cookies a privacy issue, most popular websites use them and they are mostly harmless. I'll likely fix this bug in the next couple of weeks.

These bug reports, in combination with a couple more issues I've had to deal with while writing code to process RSS feeds in RSS Bandit, have given me the inspiration for my next Extreme XML column. I suspect there's a lot of mileage that can be obtained from an article that dives deep into the various issues one deals with while processing XML on the Web (DTDs, proxies, cookies, SSL, unexpected markup, etc.) using RSS as the concrete example. Any of the readers of my column care to comment on whether they'd like to see such an article and, if so, what they'd want to see covered?
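As a taste of the plumbing such an article would cover, here is a sketch (in Python rather than the .NET code RSS Bandit actually uses; the function name is mine) of the two behaviors from those bug reports: a cookie jar so authenticated feeds keep their session, and an opt-in escape hatch for feeds served with problematic SSL certificates:

```python
import ssl
import urllib.request
from http.cookiejar import CookieJar

def make_feed_opener(ignore_cert_errors: bool = False):
    """Build a urllib opener for fetching feeds. A real aggregator
    should prompt the user before accepting a bad certificate rather
    than flip a global switch -- this is only a sketch."""
    # Cookie support: services that authenticate users or tailor
    # views per user need their cookies echoed back on each fetch.
    handlers = [urllib.request.HTTPCookieProcessor(CookieJar())]
    if ignore_cert_errors:
        # Escape hatch for self-signed or non-root certificates.
        ctx = ssl.create_default_context()
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
        handlers.append(urllib.request.HTTPSHandler(context=ctx))
    return urllib.request.build_opener(*handlers)
```

The design question raised by the first bug report is exactly where the browser precedent bites: failing hard is safer, but prompting the user is what everyone expects.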


 

Categories: RSS Bandit | XML

Mike has a post entitled 5 Things I Dislike About MSN Spaces where he lists the top 5 issues he has with the service. 

Five things I dislike about MSN Spaces:

1. I can't use BlogJet to post to my blog today.  Not a huge deal, but I love this little tool.  Web browsers (Firefox included!) just don't do it for me in the way of text editing.

2. We don't have enough themes, which means there isn't enough differentiation between spaces.  People come in a lot of flavors.

3. We don't have comments on anything other than blog entries, which means a lot of people are using comments to either a) comment on photos, or b) comment on the space itself.

4. The Statistics page doesn't roll-up RSS aggregator "hits", only web page-views.  I want to know who is reading this post in various aggregators.

5. The recent blog entry on the MSN Messenger 7.0 beta contact card doesn't show enough information to compel me to visit the user's space.  

Since I'm the eternal copycat, I also decided to put together a top 5 list of issues I'd like to see fixed in the service. Some of them I am in the process of fixing, and some I'm still nagging folks like Mike to fix since they are UI issues and have nothing to do with my areas of responsibility. Here's my list:

Five things I'd like to see fixed in MSN Spaces:

1. I can't post to my blog using standard weblog tools such as BlogJet, w.bloggar or Flickr. I now have 3 blogs and I'd love to author my posts in one place then have them automatically posted to all three. My MSN Spaces blog prevents this from happening today.

2. The text control for posting to your blog should support editing raw HTML as well as WYSIWYG editing. I find it stunning that open source projects like dasBlog and Community Server::Blogs have a more full-featured edit control for posting to one's blog than MSN Spaces.

3. The user experience around managing and interacting with my blogroll is subpar. I'd love to be able to upload my aggregator subscription list as an OPML file to my MSN Spaces blogroll. While we're at it, it would be cool to render MSN Spaces blogs differently in my blogroll and in links in my posts, perhaps with a pawn like the LiveJournal user pawn which links to a user's profile.

4. I want to be able to upload music lists from other playlist formats besides Windows Media Player. I have a bunch of playlists in WinAmp and iTunes which I want to upload to my MSN Space but don't have the patience to transcribe. It is cool that we allow uploading Windows Media Player playlists, but what about iTunes and WinAmp?

5. I need more ways to find other MSN Spaces that I might be interested in. The recently updated spaces and newly created spaces widgets on the MSN Spaces homepage aren't cutting it for me. In addition, once that is fixed it would also be cool if it was made really simple to subscribe to interesting MSN Spaces in my RSS aggregator of choice using a one-click subscription mechanism.


 

Categories: MSN

I've been doing a bit of reading about folksonomies recently. The definition of folksonomy in Wikipedia currently reads:

Folksonomy is a neologism for a practice of collaborative categorization using simple tags in a flat namespace. This feature has begun appearing in a variety of social software. At present, the best examples of online folksonomies are social bookmarking sites like del.icio.us, a bookmark sharing site, Flickr, for photo sharing, and 43 Things, for goal sharing.

What I've found interesting about current implementations of folksonomies is that they are blogging with content other than straight text. del.icio.us is basically a linkblog and Flickr isn't much different from a photoblog/moblog. The innovation in these sites is in merging the category metadata from the different entries so that users can browse all the links or photos which match a specified keyword. For example, here are recent links added to del.icio.us with the tag 'chicken' and recent photos added to Flickr with the tag 'chicken'. Both sites not only allow browsing all entries that match a particular tag but also go as far as allowing one to subscribe to particular tags as an RSS feed.

I've watched with growing bemusement as certain people have started to debate whether folksonomies will replace traditional categorization mechanisms. Posts such as The Innovator's Lemma  by Clay Shirky, Put the social back into social tagging by David Weinberger and it's the social network, stupid! by Liz Lawley go back and forth about this very issue. This discussion reminds me of the article Don't Let Architecture Astronauts Scare You by Joel Spolsky where he wrote

A recent example illustrates this. Your typical architecture astronaut will take a fact like "Napster is a peer-to-peer service for downloading music" and ignore everything but the architecture, thinking it's interesting because it's peer to peer, completely missing the point that it's interesting because you can type the name of a song and listen to it right away.

All they'll talk about is peer-to-peer this, that, and the other thing. Suddenly you have peer-to-peer conferences, peer-to-peer venture capital funds, and even peer-to-peer backlash with the imbecile business journalists dripping with glee as they copy each other's stories: "Peer To Peer: Dead!"

I think Clay is jumping several steps ahead to conclude that explicit classification schemes will give way to categorization by users. The one thing people are ignoring in this debate (as in all technical debates) is that the various implementations of folksonomies are popular because of the value they provide to the user. When all is said and done, del.icio.us is basically a linkblog and Flickr isn't much different from a photoblog/moblog. This provides inherent value to the user, and as a side effect [from the user's perspective] each new post becomes part of an ecosystem of posts on the same topic which can then be browsed by others. It isn't clear to me that this dynamic exists everywhere else explicit classification schemes are used today.

One thing that is clear to me is that personal publishing via RSS and the various forms of blogging have found a way to trample all the arguments against metadata in Cory Doctorow's Metacrap article from so many years ago. Once there is incentive for the metadata to be accurate and it is cheap to create there is no reason why some of the scenarios that were decried as utopian by Cory Doctorow in his article can't come to pass. So far only personal publishing has provided the value to end users to make both requirements (accurate & cheap to create) come true.

Postscript: Coincidentally, I just noticed a post entitled Meet the new tag soup by Phil Ringnalda pointing out that emphasizing end-user value is needed to woo people into creating accurate metadata in the case of using semantic markup in HTML. So far most of the arguments I've seen for semantic markup [or even XHTML for that matter] have been academic. It would be interesting to see what actual value to end users is possible with semantic markup, or whether it really has been pointless geekery as I've suspected all along.


 

Categories: Technology

January 21, 2005
@ 01:44 AM

My article on Cω is finally published. It appeared on XML.com as Introducing Comega, while it showed up as the next installment of my Extreme XML column on MSDN with the title An Overview of Cω: Integrating XML into Popular Programming Languages. It was also mentioned on Slashdot in the story Microsoft Research's C-Omega.

I'll be following this with an overview of ECMAScript for XML (E4X) in a couple of months.


 

Categories: Technology