Sunday, 22 February 2004 - Dare Obasanjo's weblog

February 22, 2004

@ 09:08 PM

Yesterday my mom and I went on a train ride that is often billed as a way for couples to spend a special occasion. The train was full of couples celebrating anniversaries, birthdays and other special occasions. Quite a number of couples were making out openly at the end of the train ride, whose main features are a picturesque dinner on the train and a stop with a tour of a local winery.

One of the less romantic aspects of this train ride is that for the most part you have to share a table with another couple, facing them. This means they get to overhear and interrupt your conversation. The couple we shared our table with were celebrating the guy's birthday and his girlfriend was treating him to a special day out that ended with the train ride. After we got back on the train from the winery tour the unexpected happened. They were engaged in conversation and he was comparing her favorably to ex-girlfriends, then all of a sudden he got down on one knee and pulled out a box with a ring in it. After a stunned silence she took it, said some words softly then said “I appreciate the sentiment but the timing is inappropriate” and handed it back. This was followed by her voicing her concerns about his ability to support them and him rattling off how much he made a month plus various bonuses, etc. I think it went downhill from there.

All through this I was staring out the window trying to make small talk with my mom but failing miserably. If and whenever I do end up proposing to someone I've definitely learned a thing or two about what not to do.

Categories: Ramblings

February 22, 2004

@ 06:32 PM

Comments [1]

Newspaper Views for Reading RSS Feeds

Jon Udell writes in his entry Heads, decks, and leads: revisited

Yesterday, for example, Steve Gillmor told me that he's feeling overwhelmed by thousands of unread items in NetNewsWire. Yet I never feel that way. I suspect that's because I'm reading in batches of 100 (in the Radio UserLand feedreader). I scan each batch quickly. Although opinions differ as to whether or not a feed should be truncated, my stance (which I'm reversing today) has been that truncation is a useful way to achieve the effect you get when scanning the left column of the Wall Street Journal's front page. Of the 100 items, I'll typically only want to read several. I open them into new Mozilla tabs, then go back and read them. Everybody's different, but for me -- and given how newspapers work, I suspect for many others too -- it's useful to separate the acts of scanning and reading. When I'm done with the batch, I click once to delete all 100 items.

and in today's post entitled Different strokes he writes

I agree. In trying to illustrate a point about scanning versus reading, I'm afraid I fanned the flames of the newsreader-style versus browser-style debate. In fact, the two modes can be complementary. I just bought the full version of NetNewsWire, which exploits that synergy as Brent describes. So does FeedDemon, which this posting prompted me to re-explore.

This highlights a conflict between the traditional 3-pane aggregators that follow the mail or news reader model, which implies that every post is important and should be read one by one, and web-style aggregators like Radio UserLand that present blogs in a unified web-based view reminiscent of an aggregated blog or newspaper. On the RSS Bandit wiki there's a wishlist item that reads

Newspaper view. A summary of unread feed items, formatted by a XSLT stylesheet and displayed as HTML/PDF. Inspired by Don Park. also here

which was originally added by Torsten. He never got around to adding this feature because he felt it wasn't that useful after all. I never implemented it because one would have to provide a way to interact with posts from this newspaper view (i.e. mark them as read or deleted, view comments, etc) which either translates to Javascript coding or running a local web server. Neither of the options was palatable.

This morning I downloaded FeedDemon to see how it got around these problems for its newspaper view. I found out that it does the obvious thing: it doesn't. From what I gather there is an option to 'mark all items in a channel as read' once you leave the channel. So once you close the newspaper view it assumes every post that showed up in it was read. A heavy-handed approach but it probably works for the most part.
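To make the trade-off concrete, here is a minimal Python sketch of a newspaper view that takes the same heavy-handed route: render every unread item into one HTML page, then mark everything as read when the view is closed. The FeedItem class and both function names are hypothetical, invented for this illustration; this is not how FeedDemon or RSS Bandit is actually implemented.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class FeedItem:
    """Hypothetical stand-in for a post in an aggregator's cache."""
    title: str
    link: str
    summary: str
    read: bool = False


def render_newspaper(feed_title, items):
    """Return one HTML fragment containing every unread item of a feed."""
    parts = [f"<h1>{escape(feed_title)}</h1>"]
    for item in items:
        if item.read:
            continue
        href = escape(item.link, quote=True)
        parts.append(f'<h2><a href="{href}">{escape(item.title)}</a></h2>')
        # The summary is escaped instead of being trusted as markup,
        # an assumption of this sketch that avoids ill-formed feed HTML.
        parts.append(f"<p>{escape(item.summary)}</p>")
    return "\n".join(parts)


def close_newspaper(items):
    """Treat everything that could have been shown as read."""
    for item in items:
        item.read = True
```

Because the page has no per-item state, no JavaScript or local web server is needed; the cost is that a post you merely scrolled past is counted as read.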

Looks like something else to add to the RSS Bandit TODO list.

I've been thinking that something like this is necessary after reading Robert Scoble's post 1296 newsfeeds +are+ sustainable where he wrote

Here's my workflow:

At about 5 p.m. every day I tell NewsGator to get me my feeds. It is downloading them in the background as I speak.

Then I open each folder that's bold...

Then I only read the headlines. I'm getting very good at ignoring headlines with subjects like "isn't my cat cute?" See, that's another productivity point. Robin probably assumes I read all the crap that people post. I don't. I only read those things that MIGHT be interesting. If I find a headline that's interesting, then I scan the article it is associated with. I don't read it. Just scan at that point. Usually that means reading the first paragraph and scanning the rest for later.

I've found that reading headlines isn't always the best way to find good stuff and wouldn't mind a way to quickly scan all the articles in a category that goes beyond eyeballing a bunch of headlines. However I'm going to avoid Jon Udell's advice that XHTML-izing all the HTML content in feeds is the way to get there. Been there, done that, not going back. The approach used by FeedDemon is a step in the right direction and doesn't require absorbing the problems that come with trying to convert the ill-formed markup that typically shows up in feeds to XHTML.

Categories: RSS Bandit | XML

February 21, 2004

@ 09:15 PM

Comments [2]

Microsoft XML Web Services Architect Joins the Atom Effort

In his post entitled Back in the Saddle Don Box writes

My main takeaway was that it's time to get on board with Atom - Sam is a master cat herder and I for one am ready to join the other kittens.

This is good news. Anyone who's read my blog can probably discern that I think the ATOM syndication format is a poorly conceived waste of effort that unnecessarily fragments the website syndication world. On the other hand the ATOM API, especially the bits about SOAP enabled clients, is a welcome upgrade to the existing landscape of blog posting/editing APIs.

My experiences considering how to implement the ATOM API in RSS Bandit have highlighted one or two places where the API seems 'problematic' which actually point more to holes in the XML Web Services architecture than actual problems with the API. The two scenarios that come most readily to mind are

Currently if a user wants to post to their blog using client software they need to configure all sorts of technical settings such as which API to use, port numbers, end point URLs and a lot more. For example, look at what one has to do to post to a dasBlog weblog from w.bloggar. Ideally, the end user should just be able to point their client at their blog URL (e.g. http://www.25hoursaday.com/weblog) and it figures out the rest.
The current ATOM specs describe a technique for discovering the web service end points a blog exposes which involves downloading the HTML page and parsing out all the <link> tags. I've disagreed with this approach in the past but the fact is that it does get the job done.

What this situation has pointed out to me is that there is no generic way to go up to a website and find out what XML Web Service end points it exposes. For example, if you wanted to find all the publicly available Web Services provided by Microsoft you'd have to read Aaron Skonnard's A Survey of Publicly Available Web Services at Microsoft instead of somehow discovering this programmatically. Maybe this is what UDDI was designed for?
Different blogs allow different syntax for posting comments. I've lost count of the number of times I've posted a comment to a blog and wanted to provide a link but couldn't tell whether to just use a naked URL (http://www.example.com) or a hyperlink (<a href="http://www.example.com">example link</a>). Being that RSS Bandit has supported the CommentAPI for a while now I've constantly been frustrated by the inability to tell what kind of markup or markup subset the blog allows in comments. A couple of blogs provide formatting rules when one is posting a comment but there really is no programmatic way to discover this.
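The <link>-based discovery described in the first scenario can be sketched in a few lines of Python. This is only an illustration built on the standard library's html.parser module; the rel values and example.com URLs in the sample page are assumptions made up for the sketch rather than quotations from the ATOM specs.

```python
from html.parser import HTMLParser

# Made-up page: the rel values and URLs are assumptions for this sketch.
PAGE = """
<html><head>
<link rel="stylesheet" href="/style.css">
<link rel="service.post" href="http://example.com/atom/post">
<link rel="service.feed" href="http://example.com/atom/feed">
</head><body><p>My blog</p></body></html>
"""


class LinkCollector(HTMLParser):
    """Collect the attributes of every <link> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag == "link":
            self.links.append(dict(attrs))


def discover_endpoints(html_text):
    """Map each rel value found in the page to the href it advertises."""
    collector = LinkCollector()
    collector.feed(html_text)
    return {
        link["rel"]: link["href"]
        for link in collector.links
        if link.get("rel") and link.get("href")
    }
```

A client pointed at a blog URL could download the page, call discover_endpoints on it and look up the relation it needs. Note that this only works for end points the page author chose to advertise, which is exactly the gap described above.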

Another class of capabilities I'd like to discover dynamically is which features a blog supports. For instance, the ATOM API spec used to have a 'Search facet' which was removed because it seemed that many people thought it'd be onerous to implement. What I'd have preferred would have been for it to be optional; then clients could dynamically discover whether the ATOM end point had search capabilities and if so how rich they were.

The limitation here is that there isn't a generic way to discover and enunciate the fine grained capabilities of an XML Web Service end point. At least not one I am familiar with.

It would be nice to see what someone like Don Box can bring to the table in showing how to architect and implement such a loosely coupled XML Web Service based system on the World Wide Web.

Categories: Technology | XML

February 21, 2004

@ 01:23 PM

Comments [9]

RSS Bandit Future Release Plans

Torsten and I have fixed about a dozen bugs since the v1.2.0.90 release and implemented one or two minor features. There are two major issues we'd like to tackle in the next few weeks then ship a minor release then work on the next major version of RSS Bandit. The issues are both performance related

High Memory Consumption: We don't consume as much memory as the other free .NET Framework based aggregator, SharpReader, which I've seen consume over 100MB of memory but we do stay in the 30MB to 60MB range which is excessive. I'm pretty sure I have a good idea what the prime culprits are for the memory issues but currently can't think of a good way to reduce the memory consumption without removing or degrading some features. However our perf goals are to reduce those numbers by half in the next few weeks.
Feeds with Lots of Posts Take too Long to Load: This is related to one of the culprits in the previous problem. If you are subscribed to a feed such as Weblogs @ ASP.NET which gets 50-100 posts a day which translates to about 1500-3000 posts a month then there is a perceptible slowdown in how long it takes to load the listview when you click on the feed.

We'll be fixing bugs and implementing minor features along the way while getting the aforementioned performance issues under control. Once we are satisfied with the perf, we'll have a beta period and then ship a release. This should be within the next month.

After this there are a number of significant features we have slated such as NNTP support, subscription harmonization using SIAM, translations to multiple languages (German being the first) and better integration with IE (such as supporting the Favorites menu). The release where these features show up will be in 2 or 3 months.

In the meantime, Torsten and I will be discussing RSS Bandit development in our blogs and on the RSS Bandit mailing list.

Categories: RSS Bandit

February 21, 2004

@ 12:39 PM

Comments [0]

It's the Superficial Features that Make Me Smile

Torsten figured out how to place background images in our XSLT styled themes for RSS Bandit this past week. As a celebration, we'll be including the following theme in the default install.

Personally, I'd have preferred Torsten's Christina Aguilera theme. :)

Categories: RSS Bandit

February 20, 2004

@ 05:22 PM

Comments [2]

The Impedance Mismatch between W3C XML Schema and the CLR

Daniel Cazzulino is writing about W3C XML Schema type system < - > CLR type system and has an informal poll at the bottom of his article where he writes

We all agree that many concepts in WXS don't map to anything existing in OO languages, such as derivation by restriction, content-ordering (i.e. sequence vs choice), etc. However, in the light of the tools the .NET Framework makes available to map XML to objects, we usually have to analyze WXS (used to define the structure of that very XML instance to be mapped) and its relation with our classes
In this light, I'm conducting a survey about developer's view on the relation of the XSD type system and the .NET one. Ignoring some of the more advanced (I could add cumbersome and confusing) features of WXS, would you say that both type systems fit nicely with each other?

I find the question at the end of his post which I highlighted to be highly tautological. His question is basically, “If you ignore the parts where they don't fit well together do the CLR and XSD type system fit well together?”. Well if you ignore the parts where they don't then the only answer is YES. In reality many developers don't have the freedom to ignore parts of XSD they don't want to support especially when utilizing XML Web Services designed by others.

There are two primary ways one can utilize the XmlSerializer which maps between XSD and CLR types

XML Serialization of Object State: In this case the developer is only interested in ensuring that the state of his classes can be converted to XML. This is a fairly simple problem because the expressiveness of the CLR is a subset of that of W3C XML Schema. Any object's state could be mapped to an element of complex type containing a sequence or choice of other nested elements that are either nested simple types or complex types.

Even then there are limitations in the XmlSerializer which make this cumbersome, such as the fact that it only serializes public fields and public read/write properties, so private or read-only state is skipped. But that is just a design decision that can be revisited in future releases.
Conversion of XML to Objects: This is the scenario where a developer converts an XML schema to CLR objects to make them easier to program against. This is particularly common in XML Web Services scenarios, which is what the XmlSerializer was originally designed for. In this scenario the conversion tool has to contend with the breadth of features in the XML Schema: Structures and XML Schema: Datatypes recommendations.

There are enough discrepancies between the W3C XML Schema type system and that of the CLR to fill a Ph.D thesis. I touched on some of these in my article XML Serialization in the .NET Framework such as
Q: What aspects of W3C XML Schema are not supported by the XmlSerializer during conversion of schemas to classes?

A: The XmlSerializer does not support the following:
- Any of the simple type restriction facets besides enumeration.
- Namespace based wildcards.
- Identity constraints.
- Substitution groups.
- Blocked elements or types.
After gaining more experience with working with the XmlSerializer and talking to a number of customers I wrote some more about the impedance mismatches in my article XML Schema Design Patterns: Is Complex Type Derivation Unnecessary? specifically
For usage scenarios where a schema is used to create strongly typed XML, derivation by restriction is problematic. The ability to restrict optional elements and attributes does not exist in the relational model or in traditional concepts of type derivation from OOP languages. The example from the previous section where the email element is optional in the base type, but cannot appear in the derived type, is incompatible with the notion of derivation in an object oriented sense, while also being similarly hard to model using tables in a relational database.

Similarly changing the nillability of a type through derivation is not a capability that maps to relational or OOP models. On the other hand, the example that doesn't use derivation by restriction can more straightforwardly be modeled as classes in an OOP language or as relational tables. This is important given that it reduces the impedance mismatch which occurs when attempting to map the contents of an XML document into a relational database or convert an XML document into an instance of an OOP class.

I'm not the only one at Microsoft who's written about this impedance mismatch or tried to solve it. Gavin Bierman, Wolfram Schulte and Erik Meijer wrote in their paper Programming with Circles, Triangles and Rectangles an entire section about this mismatch. Below are links to descriptions of a couple of the mismatches they found most interesting
The mismatch between XML and object data-models
     Edge-labelled vs. Node-labelled
     Attributes versus elements
     Elements versus complex and simple types
     Multiple occurrences of the same child element
     Anonymous types
     Substitution groups vs derivation and the closed world assumption
     Namespaces, namespaces as values
     Occurrence constraints part of container instead of type
     Mixed content

There is a lot of discussion one could have about the impedance mismatch between the CLR type system and the XSD type system but one thing you can't say is that it doesn't exist or that it can be ignored if building schema-centric applications.

In conclusion, the brief summary is that if one is mapping objects to XML for the purpose of serializing their state then there is a good match between the CLR & XSD type systems since the XSD type system is more expressive than the CLR type system. On the other hand, if one is trying to go from XSD to the CLR type system there are significant impedance mismatches, some of which are limitations of the current tools (e.g. XmlSerializer could code gen range checks for derivation by restriction of simple types or uniqueness tests for identity constraints) while others are fundamental differences between the XSD type system and object oriented programming, such as the difference between derivation by restriction in XSD and type derivation.
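As a rough idea of what a generated range check could look like, here is a sketch in Python rather than C#. It assumes a hypothetical schema in which an xs:int is restricted with minInclusive 0 and maxInclusive 100; the Percentage class is invented for illustration and is not something the XmlSerializer or its tools produce today.

```python
class Percentage:
    """Hypothetical class generated from an xs:int restricted to 0..100."""

    MIN_INCLUSIVE = 0    # stands in for the minInclusive facet
    MAX_INCLUSIVE = 100  # stands in for the maxInclusive facet

    def __init__(self, value):
        self.value = value  # routed through the checked setter below

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new_value):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(new_value, bool) or not isinstance(new_value, int):
            raise TypeError("value must be an integer")
        if not self.MIN_INCLUSIVE <= new_value <= self.MAX_INCLUSIVE:
            raise ValueError(
                f"{new_value} is outside "
                f"{self.MIN_INCLUSIVE}..{self.MAX_INCLUSIVE}"
            )
        self._value = new_value
```

The facets survive as runtime checks rather than as part of the type, which is the tooling half of the mismatch. Restricting away an optional element in a derived complex type has no equally simple translation.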

Categories: XML

February 20, 2004

@ 07:07 AM

Comments [10]

Never ascribe to malice that which can be explained by incompetence

Recently ZDNet ran an article entitled Google spurns RSS for rising blog format where it stated

The search giant, which acquired Blogger.com last year, began allowing the service's million-plus members to syndicate their online diaries to other Web sites last month. To implement the feature, it chose the new Atom format instead of the widely used, older RSS.

I've seen some discussion about the fact that Google only provides feeds for certain blogs in the ATOM 0.3 syndication format which is an interim draft of the spec that is part of an effort being driven by Sam Ruby to replace RSS and related technologies. When I first read this I ignored it because I didn't have any Blogger.com feeds that were of interest to me. This changed today. This afternoon I found out that Steve Saxon, the author of the excellent article XPath Querying Over Objects with ObjectXPathNavigator had a Blogger.com blog that only provided an ATOM feed. Being that I use RSS Bandit as my aggregator of choice I cannot subscribe to his feed nor can I use a large percentage of the existing news aggregators to read Steve's feed.

What I find particularly stunning about Google's decision is that they have removed support for an existing, widely supported format in favor of an interim draft of a format which, according to Sam Ruby's slides for the O'Reilly Emerging Technology Conference, is several months away from being completed. An appropriate analogy for what Google has done would be AOL abandoning support for HTML and changing all of its websites to use the May 6th 2003 draft of the XHTML 2.0 spec. It simply makes no sense.

Some people, such as Dave Winer believe Google is engaging in such user unfriendly behavior for malicious reasons but given that Google doesn't currently ship a news aggregator there doesn't seem to be much of a motive there (Of course, this changes once they ship one). I recently stumbled across an article entitled The Basic Laws of Human Stupidity which described the following 5 laws

Always and inevitably everyone underestimates the number of stupid individuals in circulation.
The probability that a certain person be stupid is independent of any other characteristic of that person.
A stupid person is a person who causes losses to another person or to a group of persons while himself deriving no gain and even possibly incurring losses.
Non-stupid people always underestimate the damaging power of stupid individuals. In particular non-stupid people constantly forget that at all times and places and under any circumstances to deal and/or associate with stupid people always turns out to be a costly mistake.
A stupid person is the most dangerous type of person.

The only question now is Is Google crazy, or crazy like a fox? and only time will tell the answer to that question.

Categories: Technology

February 18, 2004

@ 06:26 PM

Comments [0]

Mr. Safe's Guide to the RSS vs. ATOM debate

Dave Winer recently wrote that at least one person has asked if it is safe to ignore Atom in his weblog. If you are a cautious person like Tim Bray's Mr. Safe or you fit more on the right than the left side of the Technology Adoption Life Cycle then you are probably wondering why you should want to support the Atom syndication format over one of the many flavors of RSS. There are two parts to this question, depending on whether you are a producer of syndication feeds or a consumer of syndication feeds.

The Safe Syndication Producer's Perspective
An RSS feed is a regularly updated XML document that contains metadata about a news source and the content in it. Minimally an RSS feed consists of a channel that represents the news source, which has a title, link, and description that describe the news source. Additionally, an RSS feed typically contains one or more item elements that represent individual news items, each of which should have a title, link, or description.
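The structure just described is small enough to show whole. The feed below is a made-up RSS 2.0 document (the titles and example.com URLs are placeholders), and the Python function reads it back with the standard library's ElementTree as a sketch of how little a consumer needs to do.

```python
import xml.etree.ElementTree as ET

# A made-up minimal feed: one channel with one item.
FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://www.example.com/</link>
    <description>A made-up news source.</description>
    <item>
      <title>First story</title>
      <link>http://www.example.com/1</link>
      <description>Something happened.</description>
    </item>
  </channel>
</rss>"""


def summarize(feed_xml):
    """Pull the channel metadata and item headlines out of an RSS feed."""
    channel = ET.fromstring(feed_xml).find("channel")
    return {
        "title": channel.findtext("title"),
        "link": channel.findtext("link"),
        "description": channel.findtext("description"),
        # findtext returns None when an item omits an optional element.
        "items": [
            (item.findtext("title"), item.findtext("link"))
            for item in channel.findall("item")
        ],
    }
```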

There are two primary flavors of RSS: Dave Winer's family of specifications (the most popular being RSS 0.91 & RSS 2.0) and the RDF-based RSS 1.0. The more popular is Dave Winer's family of specifications, which have been adopted by a number of well-known organizations such as Yahoo! News, the BBC, Rolling Stone magazine, the Microsoft Developer Network (MSDN), the Oracle Technology Network (OTN), the Sun Developer Network and Apple's iTunes Music Store. According to Syndic8, which tracks over 50,000 RSS feeds, RSS 0.91, RSS 1.0 & RSS 2.0 each have about 30% of the RSS market share.

Most news aggregators support all 3 major versions of RSS although few actually take advantage of the fact that RSS 1.0 is an RDF vocabulary. If all one wants is simple syndication of news items then RSS 0.91 should be satisfactory. If one plans to use extensions to the core RSS specification that expose application or domain specific functionality such as the ability to post comments, one can use one of the many RSS modules in combination with RSS 2.0. The only advantage that RSS 1.0 gives over RSS 0.91/RSS 2.0 is that it is an RDF vocabulary and thus fits nicely into the dream of the Semantic Web.

The Atom syndication format can be considered to be a more sophisticated implementation of the ideas in RSS 2.0. It adds richer syndication capabilities such as the ability to put binary formats such as Word documents and Powerpoint documents in feeds and formalizes some of the best practices in the RSS world around putting [X]HTML in feeds.

The average user of a news aggregator will not be able to tell the difference between an Atom or RSS feed from their aggregator if it supports both. However users of aggregators that don't support Atom will not be able to subscribe to feeds in that format. In a few years, the differences between RSS and Atom will most likely be like those between RSS 1.0 and RSS 0.91/RSS 2.0: only of interest to a handful of XML syndication geeks. Even then the simplest and safest bet would still be to use RSS as a syndication format. This is the same as the fact that even though the W3C has published XHTML 1.0 & XHTML 1.1 and is working on XHTML 2.0, the safest bet to get the widest reach with the least problems is to publish a website in HTML 3.2 or HTML 4.01.

The Safe Syndication Consumer's Perspective
If you plan to consume feeds from a wide variety of sources then you should endeavor to support as many syndication formats as possible. The more formats a feed consumer supports the more content is available for its users.

Based on their current popularity, degree of support and ease of implementation one should consider supporting the major syndication formats in the following order of priority

RSS 0.91/RSS 2.0
RSS 1.0
Atom

RSS 0.91 support is the simplest to implement and most widely supported by websites while Atom is the most difficult to implement being the most complex and will be least supported by websites in the coming years.
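Supporting several formats starts with telling them apart, which can be done from the root element alone. The Python sketch below is an illustration under that assumption; the function name and the returned labels are invented, and a real aggregator would go on to hand the document to a format-specific parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def detect_format(feed_xml):
    """Guess the syndication format of a feed from its root element."""
    root = ET.fromstring(feed_xml)
    # ElementTree spells a namespaced tag as "{namespace}local-name".
    if root.tag == "rss":
        return "RSS 0.9x/2.0"
    if root.tag == f"{{{RDF_NS}}}RDF":
        return "RSS 1.0"
    # Match Atom on the local name only, since drafts of the format
    # may not all share one namespace URI (an assumption of this sketch).
    if root.tag.rsplit("}", 1)[-1] == "feed":
        return "Atom"
    return "unknown"
```

The checks run in the same order as the priority list above, so the most common formats are recognized first.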

Categories: XML

February 18, 2004

@ 04:42 PM

Comments [3]

Introductions: My Day Job

This is here mainly for me to be able to look back on in a few years and for any new readers of my blog who wonder what I actually do at Microsoft.

I am a program manager for the WebData XML team. The WebData team is part of the SQL Server Product Unit and produces Microsoft's major data access technologies, including MDAC, MSXML, ADO.NET, System.Xml, ObjectSpaces and the WinFS API.

As a technical program manager I am responsible for the nitty gritty of the design of the classes in the following namespaces in the .NET Framework

System.Xml.Schema
System.Xml.XPath
System.Xml (I was sharing this with Joshua before he left our team)

Nitty gritty design details means stuff like triaging bug fixes, designing new features or new classes, writing specifications, and interacting with internal & external customers to discover their likes and dislikes about the APIs in question.

I am also the community lead for the WebData XML team which means I am responsible for things like the XML Most Valuable Professional (MVP) program and the upcoming MSDN XML Developer Center. For the MVP program I am the primary point of contact between my team and the Microsoft MVP program and with our MVPs. I am also one of the folks who approves or rejects nominees. As for the developer center, I am the equivalent of what MSDN likes to call a “content strategist” which basically means I am responsible for the content on the site. For the most part I am also the primary point of contact between my team and MSDN.

If you have any issues or questions related to the aforementioned aspects of my job at Microsoft (e.g. bug reports, feature requests or questions about writing for MSDN) feel free to ping me on my work email address. If you don't know it you should be able to find it from a minute or two of Googling.

Categories: Life in the B0rg Cube

February 18, 2004

@ 06:28 AM

Comments [3]

WinFS as a Digital Media Store

Chris Sells writes

On his quest to find "non-bad WinFS scenarios" (ironically, because he was called out by another Microsoft employee -- I love it when we fight in public : ), Jeremy Mazner, Longhorn Technical Evangelist, starts with his real life use of Windows Movie Maker and trying to find music to use as a soundtrack. Let Jeremy know what you think.

I think the scenario is compelling. In fact, the only issue I have with the WinFS scenario that Jeremy outlines is that he implies that the metadata about music files that Windows Media Player exposes is tied to the application, but in truth most of it is tied to the actual media files as regular file info [file location, date modified, etc] or as ID3 tags [album, genre, artist, etc]. This means that there doesn't even need to be explicit inter-application sharing of data.

If the file system had a notion of a music item which exposed the kind of information one sees in ID3 tags, and which was also exposed by the shell in standard ways, then you could do lots of interesting things with music metadata without even trying hard. I also think it's quite compelling because metadata attached to music files is such low-hanging fruit: one can get immediate value out of it, and it exists today on the average person's machine.
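To show how directly that metadata sits in the file, here is a Python sketch that reads an ID3v1 tag, the older fixed-layout variant stored in the last 128 bytes of an MP3 file. Choosing ID3v1 is an assumption made for brevity; ID3v2 tags live at the start of the file and have a different, more involved structure. The function name is invented for this illustration.

```python
import struct

# ID3v1 layout: "TAG", title, artist, album, year, comment, genre byte.
ID3V1_LAYOUT = "<3s30s30s30s4s30sB"  # 128 bytes in total


def parse_id3v1(data):
    """Return the ID3v1 fields found at the end of raw MP3 bytes, or None."""
    if len(data) < 128 or data[-128:-125] != b"TAG":
        return None
    _, title, artist, album, year, _comment, genre = struct.unpack(
        ID3V1_LAYOUT, data[-128:]
    )

    def text(raw):
        # Fields are fixed width and padded with NUL bytes or spaces.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(title),
        "artist": text(artist),
        "album": text(album),
        "year": text(year),
        "genre": genre,  # numeric index into the ID3v1 genre list
    }
```

Any application, or the file system itself, can recover these fields without asking the player that wrote them.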

Categories: Technology


"You can buy cars but you can't buy respect in the hood" - Curtis Jackson
