Don Smith has a post entitled One parameter to rule them all: Part 2 where he writes

In my post yesterday, I recommended that Web Service operations should only have a single input parameter rather than multiple for the sake of versioning.
...
Suppose we want to send version information to the operation. In the example above, there's nowhere for it to go. "Why would we want to send version information to the operation?" Because as we version the operation, we should include version information so it's obvious what changed in each version. We can't get that information by simply adding new parameters to the method signature. I'm not going to go into it here (I will in my article) but there is an approach Doug talks about here about how to version datatypes that contain this version metadata. By using a single uber-parameter, we can include the traditional parameters as well as any metadata we need to send like version information. Then, instead of having to check for nulls, we can switch on the version metadata.

To me this seems to imply a simplistic view of the problems inherent in versioning XML Web Service end points and it also pushes unnecessary API clutter on users of the end point. So imagine that in v1 of the end point, the GetTemp method takes a city and state, then in v2 an optional zip code is added. I'd rather have developers know they can call either

MyService.GetTemp("Redmond", "WA");
MyService.GetTemp("Redmond", "WA", "98052"):

instead of

MyService.GetTemp(new TemperatureRequest("1.0", "Redmond", "WA"));
MyService.GetTemp(new TemperatureRequest("2.0", "Redmond", "WA", "98052"));

For one, the latter is less straightforward than the former. The developer also has to know, without any help from the API, which properties of the TemperatureRequest are required, which are mutually exclusive, and whether primitive values in the object should be initialized to defaults. Of course, all this will be in the documentation, but APIs should be designed so their behavior is intuitive, not so that one is required to read all the documentation before one can get things done.
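To make the clutter concrete, here's a sketch of the server-side dispatch that the "switch on the version metadata" approach implies. The class shape and version strings are my own invention for illustration, not Don's actual design:

using System;

// Hypothetical sketch of the "switch on the version metadata" dispatch
// an uber-parameter implies on the server side. Not Don's actual code.
public class TemperatureRequest
{
    public string Version;  // "1.0" or "2.0" -- the version metadata
    public string City;
    public string State;
    public string ZipCode;  // only meaningful in "2.0" requests
}

public class TemperatureService
{
    public double GetTemp(TemperatureRequest request)
    {
        switch (request.Version)
        {
            case "1.0":
                // Nothing in the type system tells the caller that
                // ZipCode is ignored for version "1.0" requests.
                return LookupByCity(request.City, request.State);
            case "2.0":
                return LookupByZip(request.ZipCode);
            default:
                throw new ArgumentException("Unknown version: " + request.Version);
        }
    }

    double LookupByCity(string city, string state) { return 42.0; /* stub */ }
    double LookupByZip(string zip) { return 42.0; /* stub */ }
}

With the overloads, all of this bookkeeping is expressed in the method signatures instead of in documentation.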

The other reason I like the former is that it carries the impression that the XML Web Service end point is unchanging while the latter does not. This is not really a technical reason, but it makes it less likely that the designer of the service will change the behavior of that particular method at a specific end point if its contract never changes.

My dayjob now involves designing XML Web Service end points so you'll probably start seeing more XML Web Services related posts from me.


 

Categories: XML Web Services

Tim Bray has a post entitled More Relax where he writes

I often caution people against relying too heavily on schema validation. “After all,” I say, “there is lots of obvious run-time checking that schemas can’t do, for example, verifying a part number.” It turns out I was wrong; with a little extra work, you can wire in part-number validation—or pretty well anything else—to RelaxNG. Elliotte Rusty Harold explains how. Further evidence, if any were required, that RelaxNG is the world’s best schema language, and that anyone who’s using XML but not RelaxNG should be nervous.

Elliotte Rusty Harold's article shows how to plug custom datatype validation into Java RELAX NG validators. This enables one to enforce complex constraints on simple types, such as "the content of an element is correctly spelled, as determined by consulting a dictionary file" or "the number is prime", to take examples from ERH's article.

Early in the design of version 2.0 of the System.Xml namespace in the .NET Framework we considered creating a System.Xml.IXmlType interface. This interface would basically represent the logic for plugging one's custom types into the XSD validation engine. After a couple of months and a number of interesting discussions between Joshua, Andy and myself we got rid of it.
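For the curious, here's a rough reconstruction of the general shape of the idea; the member names are my guesses from memory for illustration, not the actual scrapped design:

using System.Xml;

// Hypothetical reconstruction of the scrapped System.Xml.IXmlType idea:
// user code supplies the validation logic for a custom simple type.
// Member names are illustrative only.
public interface IXmlType
{
    // The XSD built-in type this custom simple type restricts.
    XmlQualifiedName BaseTypeName { get; }

    // Is the lexical value valid for this type? e.g. "the number is
    // prime" or "the word appears in a dictionary file".
    bool IsValid(string lexicalValue);

    // Convert a valid lexical value into its typed representation.
    object ParseValue(string lexicalValue);
}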

There were two reasons we got rid of this functionality. The simple reason was that there wasn't much demand for it. Whenever people complained about the limitations of XSD validation it was usually due to its inability to define co-occurrence constraints (i.e. if some element or attribute has a certain value then the expected content should be blah) and other aspects of complex type validation, not a need for finer grained simple type validation.

The other reason was that many of our technologies use XSD primarily as a type system, not as a validation language. XSD schemas are already used to generate .NET Framework classes via the System.Xml.Serialization.XmlSerializer and relational tables via the System.Data.DataSet, and there were already impedance mismatches between these domains and XSD; for example, if one defined a type as xs:nonNegativeInteger this constraint was not honored in the generated C#/VB.NET classes created by the XmlSerializer or in the relational tables created by the DataSet. Then there was the additional wrinkle that at the time we were working on XQuery, which used XSD as its type system, so if people could add their own simple types we didn't just have to worry about validation but also about how query operators would work on them. What would addition, multiplication or subtraction of such types mean? How would type promotion, casting or polymorphism work with some user's custom type defined outside the rules of XSD?

Eventually we scrapped the class as having too much cost for too little benefit.

This reminds me of Bob DuCharme's XML 2004 talk Documents vs. Data, Schemas vs. Schemas where he advised people on how to view RELAX NG and XSD. He advised viewing RELAX NG as a document validation language and considering XSD as a datatyping language. I tend to agree although I'd probably have injected something in there about using XSD + Schematron for document validation so one could get the best of both worlds.  


 

Categories: XML

In Jeremy Zawodny's post Are bloggers really that dumb? he writes

I'm not sure if I should be insulted, disappointed, or both. There's a lot of noise out there right now about some dummy data that ended up on the Target.com website. Steve Rubel goes so far as to call this a PR crisis in the blogosphere. Even Scoble is complaining.

Jesus Fu@%ing Christ, people. It's a stupid mistake. Are we too screwed up to realize that companies are composed of people and that people sometimes make mistakes? I don't know about you, but I see really big differences between this and the Kryptonite "pick a lock with a ball point pen" crisis. (Hint: It actually was a crisis.)

Any regular reader of Boing Boing has seen this issue before. First it was the Target sells "anal massage" hubbub which turned out to be the result of the fact that Target.com basically mirrors content from Amazon.com but leaves off significant identifying information. Now it's the fact that Target.com is mirroring the page for the book 'Marijuana' published by Rosen Publishing. I guess bloggers are hungry to be the next one who discovers a scandal as juicy as the Trent Lott or Dan Rather fiascos.


 

November 29, 2004
@ 03:09 AM

Before Jay-Z: Independent Women & Bills Bills Bills

After Jay-Z: Cater 2 U


 

Categories: Ramblings

November 26, 2004
@ 08:03 PM

Several months ago I wrote a draft spec entitled Synchronization of Information Aggregators using Markup (SIAM) which was aimed at providing a lightweight mechanism for aggregators to synchronize state across multiple machines. There was a flurry of discussion about this between myself and a number of other aggregator authors on the [now defunct] information_aggregators mailing list.

Although there was some interest amongst aggregator authors, there wasn't much incentive to implement the functionality, for a couple of reasons: it makes it easier for users to migrate between aggregators, which payware aggregator authors aren't enthusiastic about, and there was no server side infrastructure for supporting such functionality. Ideally this feature would have been supported by a web service end point exposed by a person's weblog or online aggregator. So not much came of it.

Since then I've implemented syncing in RSS Bandit in an application specific manner, as have the authors of Shrook and NewsGator. There is also the Bloglines Sync API, which provides a web service end point for doing limited syncing of feed state and whose limitations I pointed out in my post Thoughts on the Bloglines Sync API.

This post is primarily meant to answer questions asked by Dmitry Jemerov, the creator of Syndirella who is now the dev lead for JetBrains' Omea Reader.


 

November 26, 2004
@ 05:52 PM

This morning I updated the RSS Bandit road map. Mostly we punted a bunch of low priority features to a future release codenamed 'Nightcrawler'. We also took one or two features out of the 'Wolverine' release. The current plan is to ship a beta of 'Wolverine' next month with the big features being the ability to delete items, newspaper views that are 100% compatible with FeedDemon Newspapers and being able to read & post to USENET newsgroups. Afterwards a final version will show up at the end of January or early in February at the latest.

There are also dozens of bug fixes in the 'Wolverine' release. Thanks a lot to the numerous users who took the time to submit bug reports and have been patiently waiting for a release with their fixes. We are now approaching the end game, a lot of the hard work's been done in the underlying infrastructure and the rest of the work left is fairly straightforward.


 

Categories: RSS Bandit

November 26, 2004
@ 03:58 PM

Thanks to my sister coming over for Thanksgiving I find out that G-Unit will be performing in concert in Nigeria. The interesting bit for me is that since my mom works for a TV station back home she'll get to meet 50 Cent and crew personally. While talking to her on the phone this morning it took all my restraint to not ask her to get an autograph for me.

I felt like I was twelve years old. :)


 

Categories: Ramblings

November 23, 2004
@ 04:17 AM

For the past few years I've used the citation search feature of CiteSeer to look for references to papers or articles I'd written and could only come up with one: Concurrency And Computation: Practice And Experience. Running the same search on Google Scholar comes back with 12 papers which reference articles or papers I've written.

As I expected, my C# vs. Java comparison was my most referenced article. Speaking of which, it looks like it is about time I got cranking on updating the document to take into account Tiger and Whidbey. All I need now is some Java expert [preferably a Sun employee] to agree to review it from the Java perspective.

I am definitely curious as to how Google could come up with a more extensive database of published papers than CiteSeer. Interesting.


 

Categories: Ramblings

I was procrastinating this morning instead of doing any real work and stumbled on a feature request in the RSS Bandit feature request database on SourceForge asking that we support adding links to http://del.icio.us from RSS Bandit. For those who are unfamiliar with the site, del.icio.us is a social bookmarks manager. It allows you to easily add sites you like to your personal collection of links, to categorize those sites with keywords, and to share your collection not only between your own browsers and machines, but also with others.

Since RSS Bandit supports the IBlogExtension plugin interface I thought it would make more sense to implement this as a plugin that should just work in any .NET Framework based RSS aggregator that supports it. It took about two hours to put it together and now you can simply drop the DeliciousPlugin.dll file in the plugin folder of RSS Bandit, SharpReader or NewsGator and get the ability to post item links to http://del.icio.us.  

Download it from here: DeliciousPlugin.zip

Configuration
The first thing you have to do is configure the plugin by specifying your username and password. There is also the option of changing the del.icio.us API URL, which isn't needed at the current time; the only reason it is there is that the del.icio.us API documentation states the URL will change in the near future. The screenshots below show what this looks like

and this is the configuration dialog

Posting Links
The dialog box for posting enables you to edit the URL, description and associated tags before submitting to the site. If any of these fields isn't filled in, it is considered an error and no submission is made. Below is a screenshot of the post dialog.

Known Issues
There seems to be a problem posting URLs that contain the '#' character. The website accepts the links without error but they don't show up in your inbox. I'd appreciate any pointers from anyone who can tell me what I did wrong.
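My guess, and it is only a guess, is that the '#' isn't being percent-encoded when the link is put in the query string, so the server treats everything after it as a fragment and drops it. A minimal sketch of the suspected fix, assuming the plugin calls the standard posts/add endpoint (the method and parameter names below are illustrative, not the actual plugin code):

using System;
using System.Net;
using System.Web; // HttpUtility lives in System.Web.dll

class DeliciousPost
{
    // Sketch: percent-encode each query parameter so a '#' in the
    // bookmarked link becomes %23 rather than being parsed as the
    // start of a fragment and silently discarded.
    static void AddPost(string apiUrl, string user, string password,
                        string link, string description)
    {
        string requestUrl = apiUrl + "/posts/add"
            + "?url=" + HttpUtility.UrlEncode(link)
            + "&description=" + HttpUtility.UrlEncode(description);
        HttpWebRequest request = (HttpWebRequest) WebRequest.Create(requestUrl);
        request.Credentials = new NetworkCredential(user, password);
        using (WebResponse response = request.GetResponse())
        {
            // del.icio.us returns a small XML document indicating success
        }
    }
}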


 

Categories: RSS Bandit

I just noticed the eWeek article MSN Hotmail Nears Storage Finish Line which reports

Microsoft Corp.'s Internet division on Thursday started offering 250 MB of storage to new users of free Hotmail accounts in the United States and eight other countries. New accounts previously received 2MB of storage. As for current Hotmail users, the majority has gained the added storage and the rest will be upgraded over the next few weeks, said Brooke Richardson, MSN lead product manager. Hotmail has about 187 million customers worldwide.
...
New Hotmail users will get the storage in two steps. They first will receive 25MB of e-mail storage as MSN verifies that the accounts are for legitimate senders of e-mail and not spammers, Richardson said. After 30 days, they will gain the full 250MB of storage.

The increased storage also comes with an increase in the maximum attachment size to 10MB for free accounts.

A new photo-sharing feature in Hotmail lets users browse thumbnails of digital images and include multiple photos in an e-mail with one click, Richardson said. The feature also compressed the image files.

The article doesn't mention the eight other countries where the large Hotmail inbox feature has been deployed; they are the U.K., Australia, Canada, France, Germany, Italy, Japan, and Spain.

I am curious as to how much of a deterrent the 30 day waiting period will be to spammers. You'd think that using CAPTCHA technologies to prevent automated sign ups would get rid of most spammers but it seems like they are still a problem.


 

Categories: Mindless Link Propagation | MSN

Adam Bosworth has posted his ISCOC04 talk on his weblog. The post is interesting although I disagreed with various bits and pieces of it. Below are some comments in response to various parts of his talk

On the one hand we have RSS 2.0 or Atom. The documents that are based on these formats are growing like a bay weed. Nobody really cares which one is used because they are largely interoperable. Both are essentially lists of links to content with interesting associated metadata. Both enable a model for capturing reputation, filtering, stand-off annotation, and so on. There was an abortive attempt to impose a rich abstract analytic formality on this community under the aegis of RDF and RSS 1.0. It failed. It failed because it was really too abstract, too formal, and altogether too hard to be useful to the shock troops just trying to get the job done. Instead RSS 2.0 and Atom have prevailed and are used these days to put together talk shows and play lists (podcasting) photo albums (Flickr), schedules for events, lists of interesting content, news, shopping specials, and so on. There is a killer app for it, Blogreaders/RSS Viewers.

Although RSS 2.0 seems to be edging out RSS 1.0, I wouldn't say RSS 1.0 has failed per se, and I definitely wouldn't say it failed for being too formal and abstract. In my opinion it failed because it was more complex with no tangible benefit, which is the same reason XHTML has failed when compared to HTML. This doesn't necessarily mean that more rigid systems will fail to take hold when compared to less rigid ones; if that were so, we'd never have seen the shift from C to C++ and then from C++ to C#/Java.

Secondly, it seems Adam is throwing out some Google spin here by trying to lump the nascent and currently in-progress Atom format in the same group as RSS 2.0. In fact, if not for Google jumping on the Atom bandwagon it would be even more of an intellectual curiosity than RSS 1.0.

As I said earlier, I remember listening many years ago to someone saying contemptuously that HTML would never succeed because it was so primitive. It succeeded, of course, precisely because it was so primitive. Today, I listen to the same people at the same companies say that XML over HTTP can never succeed because it is so primitive. Only with SOAP and SCHEMA and so on can it succeed. But the real magic in XML is that it is self-describing. The RDF guys never got this because they were looking for something that has never been delivered, namely universal truth. Saying that XML couldn't succeed because the semantics weren't known is like saying that Relational Databases couldn't succeed because the semantics weren't known or Text Search cannot succeed for the same reason. But there is a germ of truth in this assertion. It was and is hard to tell anything about the XML in a universal way. It is why Infopath has had to jump through so many contorted hoops to enable easy editing. By contrast, the RSS model is easy with an almost arbitrary set of known properties for an item in a list such as the name, the description, the link, and mime type and size if it is an enclosure. As with HTML, there is just enough information to be useful. Like HTML, it can be extended when necessary, but most people do it judiciously. Thus Blogreaders and aggregators can effortlessly show the content and understanding that the value is in the information. Oh yes, there is one other difference between Blogreaders and Infopath. They are free. They understand that the value is in the content, not the device.

Lots of stuff to agree with and disagree with here. Taking it from the top, the assertion that XML is self-describing is a myth. XML is a way to attach labels to islands of data; the labels are only useful if you know what they mean. Where XML shines is that one can start with a limited set of labels that are widely understood (title, link, description) but attach data with labels that are less likely to be understood (wfw:commentRss, annotate:reference, ent:cloud) without harming the system. My recent talk at XML 2004, Designing XML Formats: Versioning vs. Extensibility, was on the importance of this and how to bring this flexibility to the straitjacketed world of XML Schema.
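To make that concrete, here's a typical RSS item mixing widely understood labels with an extension label; the URLs and values are invented for illustration. A consumer that only knows title, link and description can simply skip the rest:

<item>
  <title>Designing XML Formats</title>
  <link>http://example.com/post</link>
  <description>Versioning vs. extensibility</description>
  <!-- extension element: consumers that don't know this label ignore it -->
  <wfw:commentRss xmlns:wfw="http://wellformedweb.org/CommentAPI/">
    http://example.com/post/comments.rss
  </wfw:commentRss>
</item>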

I also wonder who the people are that claim that XML over HTTP will never succeed. XML over HTTP already has succeeded in a lot of settings. However I'd question that it is all you need. The richer the set of interactions allowed by the web site, the more an API is needed. Google, Amazon and eBay all have XML-based APIs. Every major blogging tool has an XML-based API even though those same tools are using vanilla XML over HTTP for serving RSS feeds. XML over HTTP can succeed in a lot of settings, but as the richness of the interaction between client and server grows, so does the need for a more powerful infrastructure.

The issue is knowing how to pick the right tool for the job. You don't need the complexity of the entire WS-* stack to build a working system. I know a number of people at Microsoft realize that this message needs to get out more, which is why you've begun to see things like Don Box's WS-Why Talk and the WS Kernel.

What has been new is information overload. Email long ago became a curse. Blogreaders only exacerbate the problem. I can't even imagine the video or audio equivalent because it will be so much harder to filter through. What will be new is people coming together to rate, to review, to discuss, to analyze, and to provide 100,000 Zagat's, models of trust for information, for goods, and for services. Who gives the best buzz cut in Flushing? We see it already in eBay. We see it in the importance of the number of deals and the ratings for people selling used books on Amazon. As I said in my blog, My mother never complains that she needs a better client for Amazon. Instead, her interest is in better community tools, better book lists, easier ways to see the book lists, more trust in the reviewers, librarian discussions since she is a librarian, and so on.
This is what will be new. In fact it already is. You want to see the future. Don't look at Longhorn. Look at Slashdot. 500,000 nerds coming together everyday just to manage information overload. Look at BlogLines. What will be the big enabler? Will it be Attention.XML as Steve Gillmor and Dave Sifry hope? Or something else less formal and more organic? It doesn't matter. The currency of reputation and judgment is the answer to the tragedy of the commons and it will find a way. This is where the action will be. Learning Avalon or Swing isn't going to matter. Machine learning and inference and data mining will. For the first time since computers came along, AI is the mainstream.

I tend to agree with most of this although I'm unsure why he feels the need to knock Longhorn and Java. What he seems to be overlooking is that part of the information overload problem is the prevalence of poor data visualization and user interface metaphors for dealing with significant amounts of data. I now believe that one of the biggest mistakes I made in the initial design of RSS Bandit was modelling it after mail readers like Outlook, even though I knew lots of people who had difficulty managing the flood of email they get using them. This is why the next version of RSS Bandit will borrow a leaf from FeedDemon along with some other tricks I have up my sleeve.

A lot of what I do in RSS Bandit is made easy due to the fact that it's built on the .NET Framework and not C++/MFC so I wouldn't be as quick to knock next generation GUI frameworks as Adam is. Of course, now that he works for a Web company the browser is king.


 

Categories: Syndication Technology | XML

November 19, 2004
@ 08:33 AM

My XML in the .NET Framework: Past, Present & Future talk went well yesterday. The room was full and people seemed to like what they heard. The audience was most enamored with the upcoming System.Xml.Schema.XmlSchemaInference class that provides the ability to generate schemas from sample documents and the new XSLT debugger.

It was nice having people walk up to me yesterday to tell me how much they liked my talk from the previous day. There were even a couple of RSS Bandit users who walked up to me to tell me how much they liked it. This was definitely my best XML conference experience.

Arpan did comment on the irony of me giving more talks about XML after leaving the XML team at Microsoft than when I was on the team. :)


 

Categories: Ramblings | XML

November 18, 2004
@ 07:12 PM

My XML 2004 talk, Designing XML Formats: Versioning vs. Extensibility, went over well yesterday. Lots of interesting questions were asked during the Q&A session for my talk and the following talk by Dave Orchard, Achieving Distributed Extensibility and Versioning.

One issue that came up during the discussions after our talk was the cost/benefit of using a mustUnderstand construct in an XML format similar to the SOAP mustUnderstand attribute. The primary benefit of having such a construct is that it enables third parties to create mandatory extensions to an XML format. However, there are a number of costs to having such a construct (a SOAP example follows the list below):

  1. Entire Element or Document Must Be Read: A processor that just wants to extract a subset of the data in the document still has to parse the entire document and see if there are any mustUnderstand constructs before it can process the document. This increases the cost of processing instances of the format.
  2. Ambiguity as to what is Meant by 'Understand': The concept of what it means to "understand" an XML vocabulary is context specific. For example, should a stylesheet that pretty prints an XML document fail because the format contains a mustUnderstand construct that is not explicitly handled by the stylesheet? A mustUnderstand construct is particularly limiting since it forces all consumers to fail, even though there may be some consumers that can still use the format without explicitly understanding certain elements or attributes in the document.
  3. Causes Confusion for Intermediaries: In certain cases, a format may be processed by an intermediary on the way to the client from the server. For example, HTTP requests often pass through proxy servers and there are also web-based aggregators of RSS/Atom feeds such as Feedster & PubSub which can then be subscribed to by other aggregators. In such cases, it is ambiguous whether intermediaries are expected to fail if a construct which isn't explicitly handled is labelled as mustUnderstand or whether they are expected to pass it on with that label to third party aggregators. In fact, certain formats have separate mustUnderstand constructs for hop-to-hop versus end-to-end transmission.
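For reference, here's what such a construct looks like in SOAP 1.1; the auth:token header is an invented example:

<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <!-- Because this header is marked soap:mustUnderstand="1", any
         processor that does not recognize it must fault instead of
         processing the message. -->
    <auth:token soap:mustUnderstand="1"
                xmlns:auth="http://example.com/auth">opaque-value</auth:token>
  </soap:Header>
  <soap:Body>
    <!-- payload -->
  </soap:Body>
</soap:Envelope>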

From my perspective, the cost of having a mustUnderstand construct is often not worth the benefits provided. This wasn't explicitly in my talk but is a conclusion I came to recently which I expanded upon during the Q&A session.


 

Categories: XML

November 17, 2004
@ 01:13 PM

Recently I've been having the same problems with my iPod that Omar Shahine described in his post PlaysForSure

So, here is the landscape today. I have an iPod, it's beautiful, small, light and has a great out of box experience. I plug it into a Mac or a PC with iTunes installed and the rest is mostly magic. iTunes can automatically communicate with the iPod, sync all my music over firewire and charge the device at the same time. However, my iPod seems to think that after hours and hours of charging the battery is half full. As you use it though the battery meter increases before it decreases. If I leave the iPod sitting for a few days, via osmosis or some process, the battery drains. So most of the time when I want to use it, I can't cause it's dead. It also won't even last for a complete transatlantic flight.

I love my iPod but this is beginning to get old. It looks like it's time I replaced the battery; at least the price seems to be only about $30.00. Anyone out there have any experience with replacing their iPod battery?


 

Categories: Ramblings

November 16, 2004
@ 06:56 AM

I picked up a copy of Halo 2 from the Microsoft company store last week. It's definitely a great game but nothing revolutionary. It's the original Halo with more guns where you also get to play as one of the Covenant in campaign mode. The graphics are excellent, the sound is great and the gameplay is about the same. The outdoor levels where you ride around on a Warthog are just as cool as before and the treks around the mazes in the indoor levels also tend to get just as repetitive as before. The game truly shines in multiplayer mode and may finally be the incentive for me to set up XBox Live at home, given that the kit has been sitting on my coffee table for about half a year.

I also picked up the AudioVox SMT 5600 last week. I've wanted a Windows Smartphone for months because I'd gotten to the point where having access to my Outlook inbox and calendar on the go was becoming more and more necessary. I picked the AudioVox 5600 based on some favorable comments from Robert Scoble which were echoed by someone from the MSN Messenger team during an impromptu cross team meeting. I totally love the phone and take back all the snide comments I used to make about folks like Russell Beattie who are always singing the praises of mobile phones that do more than make voice calls. I even used the camera on my phone today while sight seeing in Washington, DC. However, unlike Scoble and my new boss I don't see this phone or anything like it replacing my iPod anytime soon.

That's two Microsoft-related personal purchases that I'd heartily recommend to a friend. Excellent.


 

I checked in the basic infrastructure for adding support for deleting items in RSS Bandit this weekend and Torsten made a first pass at the UI. The screenshot below shows the feature as currently checked into CVS. The main pieces left are to ensure that this works smoothly with synchronization so that if I sync from home my work instance of RSS Bandit knows which items I deleted while at home.


 

Categories: RSS Bandit

This morning I tried out the MSN Search Beta and was suitably impressed. There were some availability issues last night which led some to proclaim the new MSN search: an unmitigated disaster. However today things are running fine.

I tried the following queries on both services and got some interesting results:

  1. "dare obasanjo"

    Google results:
    http://www.25hoursaday.com/weblog/ - my current personal weblog
    http://www.kuro5hin.org/user/Carnage4Life/diary - my former personal weblog
    http://blogs.msdn.com/dareobasanjo - my current work-related weblog
    http://www.xml.com/pub/au/142 - my author page on XML.com
    http://msdn.microsoft.com/library/en-us/dnexxml/html/xml01202003.asp - the most popular article from my Extreme XML column on MSDN

    MSN Search (beta) results:
    http://www.25hoursaday.com/weblog/ - my current personal weblog
    http://blogs.msdn.com/dareobasanjo - my current work-related weblog
    http://www.xml.com/pub/au/142 - my author page on XML.com
    http://www.rssbandit.org/ow.asp?DareObasanjo - my personal page on the RSS Bandit wiki
    http://www.afriguru.com/2004/dare-obasanjo.html - a blog post that refers to me as a Nigerian XML expert

  2. "rss bandit" OR rssbandit

    Google results:
    http://www.rssbandit.org/ - the RSS Bandit webpage
    http://www.gotdotnet.com/Community/Workspaces/Workspace.aspx?id=cb8d3173-9f65-46fe-bf17-122e3703bb00 - the former RSS Bandit project page on GotDotNet Workspaces
    http://sourceforge.net/projects/rssbandit - the current RSS Bandit project page on SourceForge
    http://www.kuro5hin.org/story/2003/5/16/135349/207 - a post in my former blog about RSS Bandit
    http://weblogs.asp.net/rosherove/archive/2004/01/04/47392.aspx - a review of RSS Bandit by Roy Osherove

    MSN Search (beta) results:
    http://www.rssbandit.org/ - the RSS Bandit webpage
    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnexxml/html/xml02172003.asp - the first article I wrote about RSS Bandit on MSDN
    http://www.marketingwithrss.com/rss-bandit-or-rssbandit/ - SPAM
    http://www.25hoursaday.com/weblog/CategoryView.aspx?category=RSS%20Bandit - posts from the RSS Bandit category of my current weblog
    http://directory.google.com/Top/Reference/Libraries/Library_and_Information_Science/Technical_Services/Cataloguing/Metadata/RDF/Applications/RSS/News_Readers/ - a catalog of RSS readers

The results from MSN Search were pertinent, and in some cases more so than the Google ones. Although MSN Search did allow a spam entry to make it into the top 5 results, it also returned a link to all the RSS Bandit related posts on my blog, which Google didn't pick up. Well, it doesn't look so inconceivable anymore that Microsoft will give Google a run for their money.

Then just wait until you see the launch version of MSN Spaces and compare it to Blogger (although I'd prefer comparisons to LiveJournal or TypePad). The next few years are going to be fun.


 

Categories: MSN

November 11, 2004
@ 03:54 PM

Yesterday I literally stopped being the face of XML at Microsoft. From now on if you go to http://www.microsoft.com/xml or http://msdn.microsoft.com/xml you won't see my work blog or my picture welcoming you to the XML Developer Center at MSDN. It's a particularly bittersweet experience. I fought with MSDN for about a year and a half to get that site launched and for a while I felt that it was my baby. The new owner of the site, Irwin, is a great guy and I'm sure he'll do excellent things with it.

Speaking of transitions, I'm still trying to fit in at MSN. It's interesting going from being extremely knowledgeable about all the technologies I'm responsible for to returning blank stares when asked about some aspect of a spec I now own. Hopefully I'll have some downtime at next week's XML 2004 conference to bone up on the various specs about our backend infrastructure so I don't seem so clueless at the next feature costing meeting. :)

So far the new job's been awesome. Great people and the features I'm working on are killer. Best of all I not only get to deliver features for MSN Spaces but also Hotmail, MSN Messenger and even MyMSN. Of course, this means I attend lots and lots of cross-team meetings. Yay, fun...


 

Categories: Life in the B0rg Cube | MSN

November 10, 2004
@ 06:00 AM

A comment in Slashdot pointed out to me that it's been a day full of good news. On this day we find out that

  1. Halo 2 Released
  2. Firefox 1.0 Released
  3. U.S. Attorney General John Ashcroft Resigns

The last one is spectacularly good news [unless he is nominated as a Supreme Court Justice].


 

I saw The Incredibles this weekend and it was great. The animation was amazing, the story top notch and it had the right ratio of action to humor. This is probably one of the best super hero movies I've ever seen.

Rating: ***** out of *****


 

Categories: Movie Review

November 9, 2004
@ 01:28 PM

A recent post entitled Finally, a Use for Atom by Charles Miller got me thinking about the usefulness, or lack thereof, of the IETF Atom effort. It seems I wasn't the only one who started thinking this, given a mail thread started by Tim Bray on the atom-syntax list entitled Posted PaceDeclareVictoryOnFormat where he writes

To: Atom WG <atom-syntax@xxxxxxx>
Subject: Posted PaceDeclareVictoryOnFormat
From: Tim Bray <Tim.Bray@xxxxxxx>
Date: Mon, 08 Nov 2004 14:13:17 -0800

See http://www.intertwingly.net/wiki/pie/PaceDeclareVictoryOnFormat

The world can use Atom, sooner rather than later. The return-on-investment of further WG time invested in polishing something that's already pretty good is starting to be very unattractive. Particularly when the Protocol draft seriously needs work and progress.

Note that this has not been formally placed at the front of the queue yet. -Tim

I posted some comments to the thread that reflect the same opinions as my post Mr. Safe's Guide to the RSS vs. ATOM debate: the relationship between the Atom syndication format and RSS is the same as that between XHTML and HTML; geeks will like it, but there's no real concrete reason to use it over the old stuff, which already works pretty well for the most part.

However I also reiterated that I think the Atom API is a worthwhile addition to the world of blogging technologies. I listed the problems with the current crop of blog posting APIs, such as the Blogger API and MetaWeblog API, in my post What's Wrong with the MetaWeblog API? from a year and a half ago:

Security: The MetaWeblog API has no concept of security. Passwords are sent in plaintext as parameters to XML-RPC functions (i.e. they are sent in plain text on the wire as part of the XML message).

Strongly Coupled To XML-RPC: RSS and Joe Gregorio's CommentAPI have shown that one can build applications that retrieve and send XML documents from client to server directly using HTTP GET and POST instead of going through an added layer of indirection by using explicit RPC mechanisms.
...
I also believe that the API should not just be tied to XML-RPC but should have interfaces that utilize the XML Web Services family of technologies like SOAP and WSDL. There are many products and toolkits that support SOAP/WSDL/etc plus more are being built every day. It makes little sense to me that almost everywhere else in the software industry people are either exchanging XML documents using RESTian principles (i.e. HTTP GET and POST) or the XML Web Services family of technologies but when it comes to web content related technologies there is this anachronism where an arbitrarily different methodology is used.

Limited Functionality: The MetaWeblog API only allows one to either post and edit blog entries, fetch information about a specific user or change the website template. This is a drop in the bucket considering all the things one would like to do with a weblog engine which can be supported by the engine.

As time has passed some of my opinions on this matter have mellowed. Security is a big problem and one I don't think can be ignored. The fact that existing APIs depend on XML-RPC instead of more accepted industry practices such as using RESTian principles or SOAP+WSDL isn't that great but it isn't that big a deal. The issue of limited functionality is probably something that has to be lived with since for the API to be widely adopted it has to support a lowest common denominator of features. As long as the API can be extended then the fact that there isn't some functionality in the core isn't that bad.

So for me, the high order bit is security. I can see at least two ways to solve this problem, listed in order of least disruptive to most disruptive (a sketch of the relevant .NET code follows the list):

  1. Blog editing tools and blog vendors moving towards using XML-RPC over HTTPS/SSL, or at least using digest HTTP authentication instead of plain HTTP.
  2. Blog editing tools and blog vendors moving towards using the Atom API over HTTPS/SSL, or at least using digest HTTP authentication instead of plain HTTP.
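For illustration, here's roughly what option 1 looks like with the .NET Framework networking classes; the endpoint URL is a placeholder and writing the payload is elided:

using System;
using System.Net;

class SecureBlogPost
{
    static WebRequest CreateRequest(string endpoint, string user, string password)
    {
        // HTTPS protects the whole conversation; failing that, digest
        // authentication at least keeps the password itself from
        // crossing the wire in plaintext.
        Uri uri = new Uri(endpoint); // e.g. "https://example.com/xmlrpc"
        CredentialCache credentials = new CredentialCache();
        credentials.Add(uri, "Digest", new NetworkCredential(user, password));
        WebRequest request = WebRequest.Create(uri);
        request.Credentials = credentials;
        request.Method = "POST";
        request.ContentType = "text/xml"; // XML-RPC call or Atom entry
        return request;
    }
}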

A number of blog hosting services such as Blogger/Google and SixApart have moved towards doing (2) above. However it is unclear to me how much this has been embraced by builders of popular blog editing tools such as BlogJet and w::bloggar. Looking at the list of Atom Enabled client software I only see aggregators listed but not blog editing tools.

So I was curious, are there any major blog editing tools that support the Atom API? If not, do these tools support using the Blogger/MetaWeblog API over HTTPS/SSL? If not, is there any interest in doing the former or the latter any time soon?


 

There was a recent posting on the RSS Bandit bug forum entitled Weblog comments do not appear in listview where the author wrote

I use Wordpress to generate my site feed (http://www.chaoszone.org/index.xml). Wordpress supports wfw:commentRSS, but RSS Bandit is unable to show comments for posts in this feed inside the listview (the way it does for Dare's and other MSDN weblogs). Instead, I have to click through to go to the comments page.

I've since tweaked the basic Wordpress RSS2 generator routine to include slash:comments -- a temp feed is available here: http://www.chaoszone.org/rss-with-slashcomments.xml

Since the element in question is supported by RSS Bandit and I can view comments just fine in any dasBlog or Community Server::Blogs (formerly .TEXT) weblog, I assumed this was a bug in WordPress. Eventually I tracked down the problem to an issue of capitalization.

The element in question is called wfw:commentRss in Chris Sells' original specification but was incorrectly transcribed as wfw:commentRSS in Joe Gregorio's list of the wfw namespace elements. Since it looks like at least one blog tool uses the latter capitalization, the next version of RSS Bandit will support both versions of the element. This means that comments to blog posts on WordPress-based blogs will now show up in the next version of RSS Bandit.
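Under the hood the fix amounts to a case-insensitive match on the element name. Here's a minimal sketch of the idea, not the actual RSS Bandit parsing code:

using System;
using System.Xml;

class CommentRssSniffer
{
    const string WfwNamespace = "http://wellformedweb.org/CommentAPI/";

    // Accept both the spec's "commentRss" and the widely copied
    // "commentRSS" when scanning a feed for the comment feed URL.
    static string FindCommentFeed(XmlReader reader)
    {
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element
                && reader.NamespaceURI == WfwNamespace
                && String.Compare(reader.LocalName, "commentRss", true) == 0)
            {
                return reader.ReadElementString();
            }
        }
        return null;
    }
}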


 

Categories: RSS Bandit

November 4, 2004
@ 06:10 PM

Many times when implementing XML specifications I've come up against features that just seem infeasible or impractical to implement. However none of them have given me nightmares the way they have my friend Mike Vernal, a program manager on the Indigo team at Microsoft. In his post could you stop the noise, i'm trying to get some rest ... he talks about spending nights tossing and turning, having nightmares about how the SOAP mustUnderstand header attribute should be processed. In Mike's post More SOAP Sleepness he mentions having sleepless nights worrying about the behavior of SOAP intermediaries as described in Section 2.7: Relaying SOAP Messages.

This isn't to say I didn't have sleepless nights over implementing XML specifications when I worked on the XML team at Microsoft. One of the issues that consumed a lot more of my time than was reasonable is explained in Derek Denny-Brown's post Loving and Hating XML Namespaces

Namespaces and your XML store
For example, load this document into your favorite XML store API (DOM/XmlBeans/etc)
 <book title='Loving and Hating XML Namespaces'>
   <author>Derek Denny-Brown</author>
 </book>
Then add the attribute named "xmlns" with value "http://book" to the <book> element. What should happen? Should that change the namespaces of the <book> and <author> elements? Then what happens if someone adds the element <ISBN> (with no namespace) under <book>? Should the new element automatically acquire the namespace "http://book", or should the fact that you added it with no namespace mean that it preserves its association with the empty namespace?

In MSXML, we tried to completely disallow editing of namespace declarations, and mostly succeeded. There was one case, which I missed, and we have never been able to fix it because so many people found it and exploited it. The W3C's XML DOM spec basically says that element/attribute namespaces are assigned when the nodes are created, and never change, but is not clear about what happens when a namespace declaration is edited.

Then there is the problem of edits that introduce elements in a namespace that does not have an existing namespace declaration:
<a xmlns:p="http://p/">
  <b>
    ...
      <c p:x="foo"/>
    ...
  </b>
</a>
If you add attribute "p:z" in namespace "bar" to element <b>, what should happen to the p:x attribute on <c>? Should the implementations scan the entire content of <b> just in case there is a reference to prefix "p"?

Or what about conflicts? Add attribute "p:z" in namespace "bar" to the below sample... what should happen?
<a xmlns:p="http://p/" p:a="foo"/>

This problem really annoyed me while I was the PM for the System.Xml.XmlDocument class and the short-lived System.Xml.XPath.XPathDocument2. In the former, I found out that once you started adding, modifying and deleting namespace declarations the results would most likely be counter-intuitive and just plain wrong. Of course, the original W3C DOM spec existed before XML namespaces and trying to merge them in after the fact was probably a bad idea. With the latter class, it seemed the best we could do was try and prevent editing namespace nodes as much as possible. This is the track we decided to follow with the newly editable System.Xml.XPath.XPathNavigator class in the .NET Framework.
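Derek's first question is easy to try for yourself. As I recall the System.Xml behavior (worth verifying against your version of the .NET Framework), namespaces are assigned at node creation and never change, which produces the counter-intuitive result below:

using System;
using System.Xml;

class NamespaceEditDemo
{
    static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<book title='Loving and Hating XML Namespaces'>" +
                    "<author>Derek Denny-Brown</author></book>");
        XmlElement book = doc.DocumentElement;

        // Add the declaration from Derek's example. The in-memory nodes
        // keep the namespace they were created with...
        book.SetAttribute("xmlns", "http://book");
        Console.WriteLine(book.NamespaceURI);  // still the empty namespace

        // ...but the serialized form now carries the declaration, so
        // reparsing doc.OuterXml puts <book> and <author> in
        // "http://book" -- the tree and its own output disagree.
        Console.WriteLine(doc.OuterXml);
    }
}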

This isn't the most sleep depriving issue I had to deal with when trying to reflect the decisions in various XML specifications in .NET Framework APIs. Unsurprisingly, the spec that caused the most debate amongst our team when trying to figure out how to implement its features over an XML store was the W3C XML Schema Recommendation part 1: Structures. The specific area was the section on contributions to the Post Schema Validation Infoset and the specific infoset contribution which caused so much consternation was the validity property.

After schema validation an XML element or attribute should have additional metadata added to it related to validation, such as its type, its default value specified in the schema if any, and whether it is valid or not according to its type. Although the validity property is trivial to implement on a read-only API such as the System.Xml.XmlReader class, it was unclear what the right way would be to expose this in other representations of XML data such as the System.Xml.XmlDocument class. The basic problem is "What happens to the validity property of the element or attribute, and those of all its ancestors, once the node is updated?". Once I change the value of an age element which is typed as an integer from 17 to seventeen, what should happen? Should the DOM test every edit to make sure it is valid for that type and reject it otherwise? Should the edit be allowed but the validity property of the element and all its ancestors be changed? What if there is a name element with required first and last elements and the user wants to delete the first element and replace it with a different one? How would that be reflected with regards to the validity property of the name element?
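To make the dilemma concrete, here's a sketch using the .NET 1.x validation classes; person.xml and person.xsd are hypothetical files, with the age element declared as xs:int:

using System.Xml;

class ValidityDemo
{
    static void Main()
    {
        // Validate on load; assume person.xsd types the <age> element as xs:int.
        XmlValidatingReader validator =
            new XmlValidatingReader(new XmlTextReader("person.xml"));
        validator.ValidationType = ValidationType.Schema;
        validator.Schemas.Add(null, "person.xsd");

        XmlDocument doc = new XmlDocument();
        doc.Load(validator); // validation errors surface here

        // Now the edit: 17 -> "seventeen". Should this throw? Should a
        // validity flag flip on <age> and every ancestor? Should nothing
        // happen until revalidation? Those were the options we debated.
        XmlNode age = doc.SelectSingleNode("//age");
        age.InnerText = "seventeen";
    }
}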

None of the answers we came up with were satisfactory. In the end, we were stuck between a rock and a hard place so we made the compromise choice. I believe we debated this issue every other month for about a year.


 

Categories: XML

November 3, 2004
@ 05:45 PM

According to the CTV article Election voter turnout highest in 30 years 

About 120 million people cast ballots, or just under 60 percent of eligible voters — the highest percentage turnout since 1968, said Curtis Gans, director of the nonpartisan Committee for the Study of the American Electorate.

In the 2000 election, when Republican George W. Bush squeaked out a win over Democrat Al Gore, slightly more than 54 per cent of eligible voters, or about 105.4 million, voted.

Gans said the modern record for voter turnout was 1960, when 65 per cent of those eligible cast ballots to back Democrat John Kennedy over Republican Richard Nixon.

More detailed figures were expected later Wednesday.

Still, despite the high turnout, 2004 did not prove to be a breakout year for young voters.

Exit polls indicated that fewer than one in 10 voters who came out on Tuesday were 18 to 24. That's about the same proportion of the electorate as in 2000.

It looks like P. Diddy's Vote Or Die campaign didn't get the MTV demographic out after all. However I was quite amused to see him as one of the election pundits on MSNBC last night.


 

Yesterday I found out that a shortage of styrofoam cups in the kitchen that we experienced in our building was actually occurring all over the Redmond campus. Some of us joked last night that this was the latest in the string of penny-wise, pound-foolish cost cutting moves in the same vein as only having office supplies on one floor of the building.

This morning I realized that moving all of a particular resource to one floor to "cut costs" was actually an example used in The Dilbert Principle in a section entitled Companies That Turn On Themselves.

I wonder what other Dilbert-esque cost cutting moves folks out there have experienced? Post your favorites in the comments to this post.


 

Categories: Life in the B0rg Cube

I've been watching the hype about podcasting with some wariness but it looks like it is here to stay. I just noticed that Greg Reinacker (NewsGator) and Nick Bradbury (FeedDemon) have announced that they will have better support for RSS 2.0 enclosures and thus podcasting. This weekend I also planted the roots of podcasting support in RSS Bandit; Torsten will likely finish this work once he is done with the GUI for NNTP newsgroup support.

Speaking of podcasting and RSS 2.0 enclosures, I agree 100% with Joshua Allen's points in his post, History of Podcasting. He wrote

Dave Winer doesn't want to end up like Eric Bina, written out of the history of a creation he helped usher into reality.  Adam steps up to make sure Dave gets credit.  This time, there is less reason to worry.  First, the WWW (which Eric helped enable) is now an independent and democratic public record which can triangulate the major media.  And blogs, which Dave helped enable, are one source of that public record.  The public record shows that Dave was planning “Radio” via RSS for a very long time.  Dave has talked about these ideas for a long time, but I have to admit that I wasn't quite prepared for how fast it would actually happen.  I believe credit goes to Adam for such a fast and effective bootstrap, but it also proves that all of the work on RSS laid a good foundation for quick incremental innovation.  

I also think that one of the major success factors was that the nattering nabobs ignored podcasting and dismissed it until it was too late to inject their stop energy.  Many of the nabobs were so convinced of their own stories about “RSS is broken”,  that it never occurred to them that something like podcasting could be successful.  They were so busy trying to reinvent RSS that they ignored an idea that Dave has been giving away for free for years. 

There's a lot of innovation and interesting end user applications that can be built on RSS today. However many XML syndication geeks are prideful and would rather reinvent the wheel than use existing technology to solve real world problems.


 

November 1, 2004
@ 03:02 PM

Yesterday I found out my car had been broken into the previous night. I can't get over the fact that I had my car parked on the street at Pioneer Square until almost 2AM on Sunday, but it gets broken into in the supposedly secure underground parking garage of my apartment complex.

The wave of emotions washing over me these past 12 hours has been interesting. The ones that have stuck so far are the sense of violation and the anger. I've learned the hard way to never leave anything important in my car, thinking that because it's in the trunk it'll be "safe".

Bah, I need to get ready for work.


 

Categories: Ramblings